diff --git a/README.md b/README.md
index e3c4a51..5021f10 100644
--- a/README.md
+++ b/README.md
@@ -10,21 +10,42 @@
 ## 🔰 Introduction
 
 Sharing high-end GPUs or even prosumer & consumer GPUs between multiple users is the most cost-effective
-way to accelerate AI development. Unfortunately until now the
-only solution existed applied for MIG/Slicing high-end GPUs (A100+) and required Kubernetes,
+way to accelerate AI development. Unfortunately, until now the
+only existing solution applied to MIG/Slicing high-end GPUs (A100+) and required Kubernetes,
+
 🔥 🎉 Welcome To Container Based Fractional GPU For Any Nvidia Card! 🎉 🔥
+
 We present pre-packaged containers supporting CUDA 11.x & CUDA 12.x with pre-built hard memory limitation!
-This means multiple containers can be launched on the same GPU ensuring one user cannot allocate the entire host GPU memory!
-(no more greedy processes grabbing the entire GPU memory! finally we have a driver level hard limiting memory option)
+This means multiple containers can be launched on the same GPU, ensuring one user cannot allocate the entire host GPU memory!
+(No more greedy processes grabbing the entire GPU memory! Finally, we have a driver-level hard memory limit option.)
+
+## 🚀 Offerings
+
+ClearML offers several options to optimize GPU resource utilization by partitioning GPUs:
+* [**Dynamic GPU Slicing**](https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus#dynamic-gpu-fractions):
+On-demand GPU slicing per task for both MIG and non-MIG devices (available under the ClearML Enterprise plan):
+  * [Bare Metal deployment](https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus#bare-metal-deployment)
+  * [Kubernetes deployment](https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus#kubernetes-deployment)
+* **Container-based Memory Limits** (**this repository**): Use pre-packaged containers with built-in memory limits to
+run multiple containers on the same GPU (available as part of the ClearML open source offering).
+* [**Kubernetes-based Static MIG Slicing**](https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus#kubernetes-static-mig-fractions):
+Set up Kubernetes support for NVIDIA MIG (Multi-Instance GPU) to define GPU
+fractions for specific workloads (available as part of the ClearML open source offering).
+
+With these options, ClearML enables running AI workloads with optimized hardware utilization and workload performance.
+This repository covers container-based fractional GPUs. For more information on ClearML's fractional GPU offerings, see
+the [ClearML documentation](https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus).
+
+![Fractional GPU diagram](docs/fractional_gpu_diagram.png)
 
 ## ⚡ Installation
 
-Pick the container that works for you and launch it
+Pick the container that works for you and launch it:
 ```bash
 docker run -it --gpus 0 --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
 ```
 
-To verify fraction gpu memory limit is working correctly, run inside the container:
+To verify the fractional GPU memory limit is working correctly, run inside the container:
 ```bash
 nvidia-smi
 ```
@@ -89,15 +110,15 @@ processes and other host processes when limiting memory / utilization usage
 
 ## 🔩 Customization
 
-Build your own containers and inherit form the original containers
+Build your own containers and inherit from the original containers.
 
-You can find a few examples [here](https://github.com/allegroai/clearml-fractional-gpu/docker-examples).
+You can find a few examples [here](https://github.com/allegroai/clearml-fractional-gpu/tree/main/examples).
 
 ## ☸ Kubernetes
 
 Fractional GPU containers can be used on bare-metal executions as well as Kubernetes PODs.
-Yes! By using one the Fractional GPU containers you can limit the memory consumption your Job/Pod and
-allow you to easily share GPUs without fearing they will memory crash one another!
+Yes! By using one of the Fractional GPU containers you can limit the memory consumption of your Job/Pod and
+easily share GPUs without fearing they will memory crash one another!
 
 Here's a simple Kubernetes POD template:
 ```yaml
@@ -127,12 +148,12 @@ processes and other host processes when limiting memory / utilization usage
 
 ## 🔌 Support & Limitations
 
-The containers support Nvidia drivers <= `545.x.x`
+The containers support Nvidia drivers <= `545.x.x`.
 We will keep updating & supporting new drivers as they continue to be released
 
 **Supported GPUs**: RTX series 10, 20, 30, 40, A series, and Data-Center P100, A100, A10/A40, L40/s, H100
 
-**Limitations**: Windows Host machines are currently not supported, if this is important for you, leave a request in the [Issues](/issues) section
+**Limitations**: Windows Host machines are currently not supported. If this is important for you, leave a request in the [Issues](/issues) section.
 
 ## ❓ FAQ
 
@@ -140,7 +161,7 @@ We will keep updating & supporting new drivers as they continue to be released
 **A**: Yes, `nvidia-smi` is communicating directly with the low-level drivers and reports both accurate container GPU memory as well as the container local memory limitation.
 Notice GPU utilization will be the global (i.e. host side) GPU utilization and not the specific local container GPU utilization.
 
-- **Q**: How do I make sure my Python / Pytorch / Tensorflow are actually memory limited
+- **Q**: How do I make sure my Python / PyTorch / TensorFlow are actually memory limited?
**A**: For PyTorch you can run:
 ```python
 import torch
@@ -153,8 +174,8 @@ print(f'Free GPU Memory: {cuda.current_context().get_memory_info()}')
 ```
 
 - **Q**: Can the limitation be broken by a user?
-**A**: We are sure a malicious user will find a way. It was never our intention to protect against malicious users,
-if you have a malicious user with access to your machines, fractional gpus are not your number 1 problem 😃
+**A**: We are sure a malicious user will find a way. It was never our intention to protect against malicious users.
+If you have a malicious user with access to your machines, fractional GPUs are not your number 1 problem 😃
 
 - **Q**: How can I programmatically detect the memory limitation?
**A**: You can check the OS environment variable `GPU_MEM_LIMIT_GB`.
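+For example, a minimal sketch in Python (assuming the variable simply holds the numeric limit in gigabytes, e.g. `8`):
+```python
+import os
+
+# GPU_MEM_LIMIT_GB is set inside the fractional GPU container; the plain-number-in-GB value format is an assumption
+limit_gb = os.environ.get("GPU_MEM_LIMIT_GB")
+if limit_gb is None:
+    print("No fractional GPU memory limit detected")
+else:
+    print(f"GPU memory is limited to {float(limit_gb)} GB")
+```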
@@ -164,12 +185,12 @@ Notice that changing it will not remove or reduce the limitation.
 
 **A**: It should be both secure and safe. The main caveat from a security perspective is that a container process
 can see any command line running on the host system.
 If a process command line contains a "secret" then yes, this might become a potential data leak.
-Notice that passing "secrets" in command line is ill-advised, and hence we do not consider it a security risk.
-That said if security is key, the enterprise edition (see below) eliminate the need to run with `pid-host` and thus fully secure
+Notice that passing "secrets" in the command line is ill-advised, and hence we do not consider it a security risk.
+That said, if security is key, the enterprise edition (see below) eliminates the need to run with `--pid=host` and is thus fully secure.
 
 - **Q**: Can you run the container **without** `--pid=host` ?
-**A**: You can! but you will have to use the enterprise version of the clearml-fractional-gpu container
-(otherwise the memory limit is applied system wide instead of container wide). If this feature is important for you, please contact [ClearML sales & support](https://clear.ml/contact-us)
+**A**: You can! But you will have to use the enterprise version of the clearml-fractional-gpu container
+(otherwise the memory limit is applied system-wide instead of container-wide). If this feature is important for you, please contact [ClearML sales & support](https://clear.ml/contact-us).
 
 ## 📄 License
 
@@ -183,12 +204,14 @@ An expanded Commercial license for use within a product or service is available
 
 ClearML offers enterprise and commercial license adding many additional features on top of fractional GPUs, these include orchestration, priority queues, quota management, compute cluster dashboard, dataset management & experiment management, as well as enterprise grade security and support.
 
-Learn more about [ClearML Orchestration](https://clear.ml) or talk to us directly at [ClearML sales](https://clear.ml/contact-us)
+Learn more about [ClearML Orchestration](https://clear.ml) or talk to us directly at [ClearML sales](https://clear.ml/contact-us).
 
 ## 📡 How can I help?
 
 Tell everyone about it! #ClearMLFractionalGPU
+
 Join our [Slack Channel](https://joinslack.clear.ml/)
+
 Tell us when things are not working, and help us debug it on the [Issues Page](https://github.com/allegroai/clearml-fractional-gpu/issues)
 
 ## 🌟 Credits
 
diff --git a/docs/fractional_gpu_diagram.png b/docs/fractional_gpu_diagram.png
new file mode 100644
index 0000000..9df2c90
Binary files /dev/null and b/docs/fractional_gpu_diagram.png differ