<div align="center">
# 🚀 🔥 Fractional GPU! ⚡ 📣
## Run multiple containers on the same GPU with driver level memory limitation ✨ and compute time-slicing 🎊
`🌟 Leave a star to support the project! 🌟`
</div>
## 🔰 Introduction
Sharing high-end GPUs, or even prosumer and consumer GPUs, between multiple users is the most cost-effective
way to accelerate AI development. Unfortunately, until now the
only existing solution applied to MIG/Slicing high-end GPUs (A100+) and required Kubernetes. <br>
🔥 🎉 Welcome To Container Based Fractional GPU For Any Nvidia Card! 🎉 🔥 <br>
We present pre-packaged containers supporting CUDA 11.x & CUDA 12.x with a pre-built hard memory limit!
This means multiple containers can be launched on the same GPU, ensuring one user cannot allocate the entire host GPU memory!
(No more greedy processes grabbing the entire GPU memory! Finally we have a driver-level hard memory limit.)
## ⚡ Installation
Pick the container that works for you and launch it:
```bash
docker run -it --gpus 0 --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
```
To verify that the fractional GPU memory limit is working correctly, run the following inside the container:
```bash
nvidia-smi
```
Here is an example output from an A100 GPU:
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  A100-PCIE-40GB                 Off | 00000000:01:00.0 Off |                  N/A |
| 32%   33C    P0              66W / 250W |      0MiB /  8128MiB |      3%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
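Because the limit is baked into each container, several limited containers can share one physical GPU. A minimal sketch (the container names and the explicit `device=0` selector below are illustrative only):

```bash
# launch two independently limited containers on the same physical GPU (GPU 0)
docker run -d --name job-a --gpus '"device=0"' --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb sleep infinity
docker run -d --name job-b --gpus '"device=0"' --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb sleep infinity

# each container reports at most ~8 GiB of GPU memory
docker exec job-a nvidia-smi
```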
### 🐳 Containers
| Memory Limit | CUDA Ver | Ubuntu Ver | Docker Image |
|:-------------:|:--------:|:----------:|:----------------------------------------:|
| 12 GiB | 12.3 | 22.04 | `clearml/fractional-gpu:u22-cu12.3-12gb` |
| 12 GiB | 12.3 | 20.04 | `clearml/fractional-gpu:u20-cu12.3-12gb` |
| 12 GiB | 11.7 | 22.04 | `clearml/fractional-gpu:u22-cu11.7-12gb` |
| 12 GiB | 11.1 | 20.04 | `clearml/fractional-gpu:u20-cu11.1-12gb` |
| 8 GiB | 12.3 | 22.04 | `clearml/fractional-gpu:u22-cu12.3-8gb` |
| 8 GiB | 12.3 | 20.04 | `clearml/fractional-gpu:u20-cu12.3-8gb` |
| 8 GiB | 11.7 | 22.04 | `clearml/fractional-gpu:u22-cu11.7-8gb` |
| 8 GiB | 11.1 | 20.04 | `clearml/fractional-gpu:u20-cu11.1-8gb` |
| 4 GiB | 12.3 | 22.04 | `clearml/fractional-gpu:u22-cu12.3-4gb` |
| 4 GiB | 12.3 | 20.04 | `clearml/fractional-gpu:u20-cu12.3-4gb` |
| 4 GiB | 11.7 | 22.04 | `clearml/fractional-gpu:u22-cu11.7-4gb` |
| 4 GiB | 11.1 | 20.04 | `clearml/fractional-gpu:u20-cu11.1-4gb` |
| 2 GiB | 12.3 | 22.04 | `clearml/fractional-gpu:u22-cu12.3-2gb` |
| 2 GiB | 12.3 | 20.04 | `clearml/fractional-gpu:u20-cu12.3-2gb` |
| 2 GiB | 11.7 | 22.04 | `clearml/fractional-gpu:u22-cu11.7-2gb` |
| 2 GiB | 11.1 | 20.04 | `clearml/fractional-gpu:u20-cu11.1-2gb` |
> [!IMPORTANT]
>
> You must run the container with `--pid=host`!

> [!NOTE]
>
> **`--pid=host`** is required to allow the driver to differentiate between the container's
> processes and other host processes when limiting memory / utilization usage.

> [!TIP]
>
> **[ClearML-Agent](https://clear.ml/docs/latest/docs/clearml_agent/) users: add `--pid=host` to the `agent.extra_docker_arguments` section in your [config file](https://github.com/allegroai/clearml-agent/blob/c9fc092f4eea9c3890d582aa2a098c3c2f39ce72/docs/clearml.conf#L190)**
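For reference, a minimal sketch of how that entry could look in `clearml.conf` (merge it into your existing `agent` section; see the linked config file for the surrounding layout):

```
agent {
    extra_docker_arguments: ["--pid=host"]
}
```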
## 🔩 Customization
Build your own containers that inherit from the original containers.

You can find a few examples [here](https://github.com/allegroai/clearml-fractional-gpu/tree/main/examples).
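For example, a minimal Dockerfile sketch that inherits the hard memory limit from one of the pre-built images (the package and script below are placeholders only):

```dockerfile
# inherit the driver-level memory limit from the pre-built base image
FROM clearml/fractional-gpu:u22-cu12.3-8gb

# add your own dependencies and code on top (placeholders)
RUN apt-get update && apt-get install -y python3-pip && pip3 install torch
COPY train.py /workspace/train.py

CMD ["python3", "/workspace/train.py"]
```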
## ☸ Kubernetes
Fractional GPU containers can be used on bare-metal executions as well as Kubernetes PODs.
Yes! By using one of the Fractional GPU containers you can limit the memory consumption of your Job/Pod and
easily share GPUs without fearing they will crash one another due to out-of-memory errors!

Here's a simple Kubernetes POD template:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-pod
  labels:
    app: trainme
spec:
  hostPID: true
  containers:
    - name: train-container
      image: clearml/fractional-gpu:u22-cu12.3-8gb
      command: ['python3', '-c', 'import torch; print(f"Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}")']
```
> [!IMPORTANT]
>
> You must run the pod with `hostPID: true`!

> [!NOTE]
>
> **`hostPID: true`** is required to allow the driver to differentiate between the pod's
> processes and other host processes when limiting memory / utilization usage.
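After applying the template, the free-memory value printed by the pod's command (capped by the container limit) can be read back from the pod logs. A sketch, assuming the template above is saved as a hypothetical `train-pod.yaml`:

```bash
# apply the pod template and read back the reported free GPU memory
kubectl apply -f train-pod.yaml
kubectl logs train-pod
```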
## 🔌 Support & Limitations
The containers support Nvidia drivers <= `545.x.x`.
We will keep updating & supporting new drivers as they continue to be released.

**Supported GPUs**: RTX series 10, 20, 30, 40, A series, and Data-Center P100, A100, A10/A40, L40/s, H100

**Limitations**: Windows host machines are currently not supported. If this is important for you, leave a request in the [Issues](https://github.com/allegroai/clearml-fractional-gpu/issues) section.
## ❓ FAQ
- **Q**: Will running `nvidia-smi` inside the container report the local processes' GPU consumption? <br>
**A**: Yes, `nvidia-smi` communicates directly with the low-level drivers and reports both the accurate container GPU memory usage and the container's local memory limitation. <br>
Notice that the reported GPU utilization is the global (i.e. host-side) GPU utilization and not the specific local container's GPU utilization.
- **Q**: How do I make sure my Python / PyTorch / TensorFlow code is actually memory limited? <br>
**A**: For PyTorch you can run: <br>
```python
import torch
print(f'Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}')
```
Numba example:
```python
from numba import cuda
print(f'Free GPU Memory: {cuda.current_context().get_memory_info()}')
```
- **Q**: Can the limitation be broken by a user? <br>
**A**: We are sure a malicious user will find a way. It was never our intention to protect against malicious users. <br>
If you have a malicious user with access to your machines, fractional GPUs are not your number 1 problem 😃
- **Q**: How can I programmatically detect the memory limitation? <br>
**A**: You can check the OS environment variable `GPU_MEM_LIMIT_GB` (see the example below). <br>
Notice that changing it will not remove or reduce the limitation.
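For example, a minimal check from Python (nothing here is specific to the fractional GPU containers beyond the variable name above):

```python
import os

# the container exposes its memory cap via this environment variable
limit_gb = os.environ.get('GPU_MEM_LIMIT_GB')
print(f'GPU memory limit: {limit_gb} GB')
```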
- **Q**: Is running the container **with** `--pid=host` secure / safe? <br>
**A**: It should be both secure and safe. The main caveat from a security perspective is that
a container process can see any command line running on the host system.
If a process command line contains a "secret" then yes, this might become a potential data leak.
Notice that passing "secrets" on the command line is ill-advised, and hence we do not consider it a security risk.
That said, if security is key, the enterprise edition (see below) eliminates the need to run with `--pid=host` and is thus fully secure.
- **Q**: Can you run the container **without** `--pid=host`? <br>
**A**: You can! But you will have to use the enterprise version of the clearml-fractional-gpu container
(otherwise the memory limit is applied system-wide instead of container-wide). If this feature is important for you, please contact [ClearML sales & support](https://clear.ml/contact-us).
## 📄 License
The license to use ClearML is granted for research or development purposes only. ClearML may be used for educational, personal, or internal commercial use.

An expanded Commercial license for use within a product or service is available as part of the [ClearML](https://clear.ml) Scale or Enterprise solution.
## 🤖 Commercial & Enterprise version
ClearML offers enterprise and commercial licenses adding many additional features on top of fractional GPUs;
these include orchestration, priority queues, quota management, compute cluster dashboard,
dataset management & experiment management, as well as enterprise-grade security and support.

Learn more about [ClearML Orchestration](https://clear.ml) or talk to us directly at [ClearML sales](https://clear.ml/contact-us).
## 📡 How can I help?
Tell everyone about it! #ClearMLFractionalGPU

Join our [Slack Channel](https://joinslack.clear.ml/)

Tell us when things are not working, and help us debug it on the [Issues Page](https://github.com/allegroai/clearml-fractional-gpu/issues)
## 🌟 Credits
This product is brought to you by the ClearML team with ❤️