Mirror of https://github.com/clearml/clearml-docs
Reorder fractional GPU page (#875)
parent 3c31b09793 · commit ce6cfecc8a

@ -6,161 +6,15 @@
GPUs to them. In order to optimize your compute resource usage, you can partition GPUs into slices, letting a single GPU
device run multiple isolated workloads on separate slices that will not impact each other, and will only use the
fraction of GPU memory allocated to them.

ClearML provides several GPU slicing options to optimize compute resource utilization:
* [Dynamic GPU Slicing](#dynamic-gpu-fractions): On-demand GPU slicing per task for both MIG and non-MIG devices (**Available under the ClearML Enterprise plan**):
  * [Bare Metal deployment](#bare-metal-deployment)
  * [Kubernetes deployment](#kubernetes-deployment)
* [Container-based Memory Limits](#container-based-memory-limits): Use pre-packaged containers with built-in memory
  limits to run multiple containers on the same GPU (**Available as part of the ClearML open source offering**)
* [Kubernetes-based Static MIG Slicing](#kubernetes-static-mig-fractions): Set up Kubernetes support for NVIDIA MIG
  (Multi-Instance GPU) to define GPU fractions for specific workloads (**Available as part of the ClearML open source offering**)

## Dynamic GPU Fractions

:::important Enterprise Feature

@ -175,6 +29,8 @@ simultaneously without worrying that one task will use all of the GPU's memory.

You can dynamically slice GPUs on [bare metal](#bare-metal-deployment) or on [Kubernetes](#kubernetes-deployment), for
both MIG-enabled and non-MIG devices.



### Bare Metal Deployment
1. Install the required packages:

@ -211,7 +67,7 @@ the number of GPUs configured to the queue.

Let’s say that four tasks are enqueued, one task for each of the above queues (`dual_gpus`, `quarter_gpu`, `half_gpu`,
`single_gpu`). The agent will first pull the task from the `dual_gpus` queue since it is listed first, and will run it
using 2 GPUs. It will next run the tasks from `quarter_gpu` and `half_gpu`--both will run on the remaining available
GPU. This leaves the task in the `single_gpu` queue. Currently, 2.75 GPUs out of the 3 are in use, so the task will only
be pulled and run when enough GPUs become available.
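
The queue-to-fraction mapping described above is configured on the agent's command line. A hedged sketch (flag names and queue syntax recalled from the ClearML Enterprise docs; verify against your version):

```bash
# One agent manages GPUs 0-2 and serves four queues, each mapped to a GPU share
clearml-agent daemon --dynamic-gpus --gpus 0-2 \
  --queue dual_gpus=2 quarter_gpu=0.25 half_gpu=0.5 single_gpu=1
```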

### Kubernetes Deployment

@ -356,3 +212,148 @@ Where `<gpu_fraction_value>` must be set to one of the following values:
* "0.625"
* "0.750"
* "0.875"

## Container-based Memory Limits

Use [`clearml-fractional-gpu`](https://github.com/allegroai/clearml-fractional-gpu)'s pre-packaged containers with
built-in hard memory limitations. Workloads running in these containers will only be able to use up to the container's
memory limit. Multiple isolated workloads can run on the same GPU without impacting each other.



### Usage

#### Manual Execution

1. Choose the container with the appropriate memory limit. ClearML supports CUDA 11.x and CUDA 12.x with memory limits
   ranging from 2 GB to 12 GB (see the [clearml-fractional-gpu repository](https://github.com/allegroai/clearml-fractional-gpu/blob/main/README.md#-containers) for the full list).
1. Launch the container:

   ```bash
   docker run -it --gpus 0 --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
   ```

   This example runs the ClearML Ubuntu 22 with CUDA 12.3 container on GPU 0, which is limited to use up to 8 GB of its memory.

   :::note
   `--pid=host` is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage.
   :::
1. Run the following command inside the container to verify that the fractional GPU memory limit is working correctly:

   ```bash
   nvidia-smi
   ```

   Here is the expected output for the previous 8 GB-limited example on an A100:
   ```bash
   +---------------------------------------------------------------------------------------+
   | NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
   |-----------------------------------------+----------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
   |                                         |                      |               MIG M. |
   |=========================================+======================+======================|
   |   0  A100-PCIE-40GB                Off  | 00000000:01:00.0 Off |                  N/A |
   | 32%   33C    P0             66W / 250W  |      0MiB / 8128MiB  |      3%      Default |
   |                                         |                      |             Disabled |
   +-----------------------------------------+----------------------+----------------------+

   +---------------------------------------------------------------------------------------+
   | Processes:                                                                             |
   |  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
   |        ID   ID                                                              Usage      |
   |=======================================================================================|
   +---------------------------------------------------------------------------------------+
   ```
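
Beyond `nvidia-smi`, you can also confirm the limit from code. A minimal sketch, assuming PyTorch is installed in the container (it is not necessarily part of the base image):

```bash
# Prints (free_bytes, total_bytes) as seen by CUDA inside the limited container;
# the total should reflect the ~8 GB cap rather than the physical GPU memory
python3 -c "import torch; print(torch.cuda.mem_get_info())"
```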

#### Remote Execution

You can set a ClearML Agent to execute tasks in a fractional GPU container. Set an agent’s default container via its
command line. For example, all tasks pulled from the `default` queue by this agent will be executed in the Ubuntu 22
with CUDA 12.3 container, which is limited to use up to 8 GB of its memory:

```bash
clearml-agent daemon --queue default --docker clearml/fractional-gpu:u22-cu12.3-8gb
```
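
You can also specify the container per task from the command line. A hedged sketch using `clearml-task` (the project, task name, and script are illustrative placeholders):

```bash
# Creates a task from a local script and enqueues it; the agent watching the
# queue will run it inside the specified fractional GPU container
clearml-task --project examples --name fractional-gpu-test \
  --script train.py --queue default \
  --docker clearml/fractional-gpu:u22-cu12.3-8gb
```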

The agent’s default container can be overridden via the UI:
1. Clone the task
1. Set the Docker image in the cloned task's **Execution** tab > **Container** section

   

1. Enqueue the cloned task

The task will be executed in the container specified in the UI.

For more information, see [Docker Mode](clearml_agent_execution_env.md#docker-mode).

#### Fractional GPU Containers on Kubernetes

Fractional GPU containers can be used to limit the memory consumption of your Kubernetes Job/Pod, and have multiple
containers share GPU devices without interfering with each other.

For example, the following configures a K8s pod to run using the `clearml/fractional-gpu:u22-cu12.3-8gb` container,
which limits the pod to 8 GB of the GPU's memory:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-pod
  labels:
    app: trainme
spec:
  hostPID: true
  containers:
  - name: train-container
    image: clearml/fractional-gpu:u22-cu12.3-8gb
    command: ['python3', '-c', 'import torch; print(f"Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}")']
```

:::note
`hostPID: true` is required to allow the driver to differentiate between the pod's processes and other host processes
when limiting memory usage.
:::
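
To try the spec above, apply it and read back the pod's output. A minimal sketch, assuming the manifest is saved as `train-pod.yaml` (a hypothetical filename) and the image includes PyTorch:

```bash
# Create the pod, then print the memory report from its single command
kubectl apply -f train-pod.yaml
kubectl logs train-pod   # may take a moment until the container has started
```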

### Custom Container

Build your own custom fractional GPU container by inheriting from one of ClearML's containers: in your Dockerfile, make
sure to include `FROM <clearml_container_image>` so your container inherits from the relevant ClearML container.
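
For example, a minimal sketch of building such a container (the installed packages are illustrative, not required):

```bash
# Write a Dockerfile that inherits from a ClearML fractional GPU image,
# then build a custom image on top of it
cat > Dockerfile <<'EOF'
FROM clearml/fractional-gpu:u22-cu12.3-8gb
# illustrative additions only -- install whatever your workload needs
RUN apt-get update && apt-get install -y python3-pip && pip3 install clearml
EOF
docker build -t my-fractional-gpu:u22-cu12.3-8gb .
```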

See example custom Dockerfiles in the [clearml-fractional-gpu repository](https://github.com/allegroai/clearml-fractional-gpu/tree/main/examples).

## Kubernetes Static MIG Fractions

Set up NVIDIA MIG (Multi-Instance GPU) support for Kubernetes to define GPU fraction profiles for specific workloads
through your NVIDIA device plugin.

The ClearML Agent Helm chart lets you specify a pod template for each queue, which describes the resources that the pod
will use. The template should specify the requested GPU slices under `containers.resources.limits` to have the pods use
the defined resources. For example, the following configures a K8s pod to run a `3g.20gb` MIG device:
```yaml
# tf-benchmarks-mixed.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-benchmarks-mixed
spec:
  restartPolicy: Never
  containers:
  - name: tf-benchmarks-mixed
    image: ""
    command: []
    args: []
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 1
  nodeSelector: # optional
    nvidia.com/gpu.product: A100-SXM4-40GB
```
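
Once a manifest like this is applied, you can confirm that the MIG slice was allocated. A minimal sketch (resource names as in the example above):

```bash
# Create the pod and check that the requested MIG resource was granted
kubectl apply -f tf-benchmarks-mixed.yaml
kubectl describe pod tf-benchmarks-mixed | grep -i "nvidia.com/mig"
```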

When tasks are added to the relevant queue, the agent pulls the task and creates a pod to execute it, using the
specified GPU slice.

For example, the following configures tasks from the `default` queue to use `1g.5gb` MIG slices:
```yaml
agentk8sglue:
  queue: default
  # …
  basePodTemplate:
    # …
    resources:
      limits:
        nvidia.com/gpu: 1
    nodeSelector:
      nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
```
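
These values take effect when installing or upgrading the agent's Helm release. A hedged sketch, assuming the chart comes from the `allegroai` Helm repository and your overrides live in `values.override.yaml` (an illustrative filename):

```bash
# Deploy (or update) the ClearML Agent with the queue's MIG pod template
helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
helm repo update
helm upgrade --install clearml-agent allegroai/clearml-agent -f values.override.yaml
```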