Reorder fractional GPU page (#875)
GPUs to them. In order to optimize your compute resource usage, you can partition GPUs into slices: have a GPU
device run multiple isolated workloads on separate slices that will not impact each other, and will only use the
fraction of GPU memory allocated to them.

ClearML provides several GPU slicing options to optimize compute resource utilization:

* [Dynamic GPU Slicing](#dynamic-gpu-fractions): On-demand GPU slicing per task for both MIG and non-MIG devices (**Available under the ClearML Enterprise plan**):
  * [Bare Metal deployment](#bare-metal-deployment)
  * [Kubernetes deployment](#kubernetes-deployment)
* [Container-based Memory Limits](#container-based-memory-limits): Use pre-packaged containers with built-in memory
  limits to run multiple containers on the same GPU (**Available as part of the ClearML open source offering**)
* [Kubernetes-based Static MIG Slicing](#kubernetes-static-mig-fractions): Set up Kubernetes support for NVIDIA MIG
  (Multi-Instance GPU) to define GPU fractions for specific workloads (**Available as part of the ClearML open source offering**)

## Dynamic GPU Fractions

:::important Enterprise Feature
Dynamic GPU slicing is available under the ClearML Enterprise plan.
:::
@ -175,6 +29,8 @@ simultaneously without worrying that one task will use all of the GPU's memory.
You can dynamically slice GPUs on [bare metal](#bare-metal-deployment) or on [Kubernetes](#kubernetes-deployment), for
both MIG-enabled and non-MIG devices.

![Fractional GPU diagram](../img/fractional_gpu_diagram.png)

### Bare Metal Deployment

1. Install the required packages:
@ -211,7 +67,7 @@ the number of GPUs configured to the queue.
Let’s say that four tasks are enqueued, one task for each of the above queues (`dual_gpus`, `quarter_gpu`, `half_gpu`,
`single_gpu`). The agent will first pull the task from the `dual_gpus` queue since it is listed first, and will run it
using 2 GPUs. It will next run the tasks from `quarter_gpu` and `half_gpu`--both will run on the remaining available
GPU. This leaves the task in the `single_gpu` queue. Currently, 2.75 GPUs out of the 3 are in use so the task will only
be pulled and run when enough GPUs become available.
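
To make the arithmetic concrete, here is a purely illustrative sketch of that allocation order (not ClearML's actual scheduler code; the queue names and fractions simply mirror the example above):

```python
# Illustrative only: an agent managing 3 physical GPUs serves the four
# example queues in the order they are listed.
total_gpus = 3.0
queues = [("dual_gpus", 2.0), ("quarter_gpu", 0.25), ("half_gpu", 0.5), ("single_gpu", 1.0)]

in_use = 0.0
for queue_name, gpus_needed in queues:
    if total_gpus - in_use >= gpus_needed:
        in_use += gpus_needed  # enough capacity left, so the task starts
        print(f"{queue_name}: running on {gpus_needed} GPU(s), {total_gpus - in_use:.2f} GPU free")
    else:
        print(f"{queue_name}: waiting, only {total_gpus - in_use:.2f} GPU free")
```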

### Kubernetes Deployment
@ -356,3 +212,148 @@ Where `<gpu_fraction_value>` must be set to one of the following values:
* "0.625"
* "0.750"
* "0.875"

## Container-based Memory Limits

Use [`clearml-fractional-gpu`](https://github.com/allegroai/clearml-fractional-gpu)'s pre-packaged containers with
built-in hard memory limitations. Workloads running in these containers will only be able to use up to the container's
memory limit. Multiple isolated workloads can run on the same GPU without impacting each other.

![Fractional GPU diagram](../img/fractional_gpu_diagram.png)

### Usage

#### Manual Execution

1. Choose the container with the appropriate memory limit. ClearML supports CUDA 11.x and CUDA 12.x with memory limits
   ranging from 2 GB to 12 GB (see the [clearml-fractional-gpu repository](https://github.com/allegroai/clearml-fractional-gpu/blob/main/README.md#-containers) for the full list).
1. Launch the container:

   ```bash
   docker run -it --gpus 0 --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
   ```

   This example runs the ClearML Ubuntu 22 with CUDA 12.3 container on GPU 0, which is limited to use up to 8GB of its memory.

   :::note
   `--pid=host` is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage.
   :::
1. Run the following command inside the container to verify that the fractional GPU memory limit is working correctly:

   ```bash
   nvidia-smi
   ```

   Here is the expected output for the previous, 8GB limited, example on an A100:
   ```bash
   +---------------------------------------------------------------------------------------+
   | NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
   |-----------------------------------------+----------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
   |                                         |                      |               MIG M. |
   |=========================================+======================+======================|
   |   0  A100-PCIE-40GB                 Off | 00000000:01:00.0 Off |                  N/A |
   | 32%   33C    P0              66W / 250W |      0MiB /  8128MiB |      3%      Default |
   |                                         |                      |             Disabled |
   +-----------------------------------------+----------------------+----------------------+

   +---------------------------------------------------------------------------------------+
   | Processes:                                                                             |
   |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
   |        ID   ID                                                             Usage      |
   |=======================================================================================|
   +---------------------------------------------------------------------------------------+
   ```
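
You can also confirm the limit from Python inside the container. Assuming PyTorch is installed in your environment, `torch.cuda.mem_get_info()` should report a total of roughly 8 GB rather than the GPU's full physical memory:

```python
import torch

# (free, total) GPU memory in bytes, as seen from inside the
# memory-limited container
free, total = torch.cuda.mem_get_info()
print(f"Free: {free / 1024**3:.2f} GiB, Total: {total / 1024**3:.2f} GiB")
```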

#### Remote Execution

You can set a ClearML Agent to execute tasks in a fractional GPU container. Set an agent’s default container via its
command line. For example, all tasks pulled from the `default` queue by this agent will be executed in the Ubuntu 22
with CUDA 12.3 container, which is limited to use up to 8GB of its memory:

```bash
clearml-agent daemon --queue default --docker clearml/fractional-gpu:u22-cu12.3-8gb
```

The agent’s default container can be overridden via the UI:
1. Clone the task
1. Set the Docker in the cloned task's **Execution** tab > **Container** section

   ![Container section](../img/fractional_gpu_remote_session_ui.png)

1. Enqueue the cloned task

The task will be executed in the container specified in the UI.

For more information, see [Docker Mode](clearml_agent_execution_env.md#docker-mode).
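
The same override can also be applied from code rather than the UI. The following is a minimal sketch using the ClearML SDK; the task ID, task name, and queue name are placeholders, not values from this guide:

```python
from clearml import Task

# Clone a template task, point the clone at the fractional GPU container,
# and enqueue it (the programmatic equivalent of the UI steps above).
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="memory-limited run")
cloned.set_base_docker("clearml/fractional-gpu:u22-cu12.3-8gb")
Task.enqueue(cloned, queue_name="default")
```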

#### Fractional GPU Containers on Kubernetes

Fractional GPU containers can be used to limit the memory consumption of your Kubernetes Job/Pod, and have multiple
containers share GPU devices without interfering with each other.

For example, the following configures a K8s pod to run using the `clearml/fractional-gpu:u22-cu12.3-8gb` container,
which limits the pod to 8 GB of the GPU's memory:
```
apiVersion: v1
kind: Pod
metadata:
  name: train-pod
  labels:
    app: trainme
spec:
  hostPID: true
  containers:
    - name: train-container
      image: clearml/fractional-gpu:u22-cu12.3-8gb
      command: ['python3', '-c', 'import torch; print(f"Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}")']
```

:::note
`hostPID: true` is required to allow the driver to differentiate between the pod's processes and other host processes
when limiting memory usage.
:::

### Custom Container

Build your own custom fractional GPU container by inheriting from one of ClearML's containers: in your Dockerfile, make
sure to include `FROM <clearml_container_image>` so the container will inherit from the relevant container.

See example custom Dockerfiles in the [clearml-fractional-gpu repository](https://github.com/allegroai/clearml-fractional-gpu/tree/main/examples).
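
For instance, a minimal custom Dockerfile might look like the following; the base image tag and the extra package installed here are illustrative only:

```
# Inherit the hard memory limit from one of ClearML's fractional GPU containers
FROM clearml/fractional-gpu:u22-cu12.3-8gb

# Add your own dependencies on top of the memory-limited base image
RUN pip install --no-cache-dir clearml
```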

## Kubernetes Static MIG Fractions

Set up NVIDIA MIG (Multi-Instance GPU) support for Kubernetes to define GPU fraction profiles for specific workloads
through your NVIDIA device plugin.
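
Once the device plugin is configured, you can check that a node actually advertises the expected MIG resources. This is a generic Kubernetes check, not ClearML-specific, and `<node-name>` is a placeholder:

```bash
# Lists the MIG resources exposed by the node, e.g. nvidia.com/mig-3g.20gb
kubectl describe node <node-name> | grep "nvidia.com/"
```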

The ClearML Agent Helm chart lets you specify a pod template for each queue, which describes the resources that the pod
will use. The template should specify the requested GPU slices under `containers.resources.limits` to have the pods use
the defined resources. For example, the following configures a K8s pod to run a `3g.20gb` MIG device:
```
# tf-benchmarks-mixed.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-benchmarks-mixed
spec:
  restartPolicy: Never
  containers:
    - name: tf-benchmarks-mixed
      image: ""
      command: []
      args: []
      resources:
        limits:
          nvidia.com/mig-3g.20gb: 1
  nodeSelector: #optional
    nvidia.com/gpu.product: A100-SXM4-40GB
```
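
If you want to try this pod template directly, outside of ClearML, you can apply the manifest with `kubectl` (using the file name from the comment above):

```bash
kubectl apply -f tf-benchmarks-mixed.yaml
```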

When tasks are added to the relevant queue, the agent pulls the task and creates a pod to execute it, using the
specified GPU slice.

For example, the following configures tasks from the default queue to use `1g.5gb` MIG slices:
```
agentk8sglue:
  queue: default
  # …
  basePodTemplate:
    # …
    resources:
      limits:
        nvidia.com/gpu: 1
    nodeSelector:
      nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
```
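
These values are applied when installing or upgrading the ClearML Agent Helm chart. For example, assuming the override above is saved to a values file (the release name, chart reference, and file name are placeholders):

```bash
helm upgrade --install clearml-agent <clearml-agent-chart> -f values.override.yaml
```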