clearml-docs/docs/clearml_agent/clearml_agent_dynamic_gpus.md
2024-07-15 15:53:41 +03:00

1.7 KiB

title
Dynamic GPU Allocation

:::important Enterprise Feature This feature is available under the ClearML Enterprise plan :::

The ClearML Enterprise server supports dynamic allocation of GPUs based on queue properties. Agents can spin multiple Tasks from different queues based on the number of GPUs the queue needs.

dynamic-gpus enables dynamic allocation of GPUs based on queue properties. To configure the number of GPUs for a queue, use the --gpus flag to specify the active GPUs, and use the --queue flag to specify the queue name and number of GPUs:

clearml-agent daemon --dynamic-gpus --gpus 0-2 --queue dual_gpus=2 single_gpu=1

Example

Let's say a server has three queues:

  • dual_gpu
  • quad_gpu
  • opportunistic

An agent can be spun on multiple GPUs (for example: 8 GPUs, --gpus 0-7), and then attached to multiple queues that are configured to run with a certain amount of resources:

clearml-agent daemon --dynamic-gpus --gpus 0-7 --queue quad_gpu=4 dual_gpu=2 

The agent can now spin multiple Tasks from the different queues based on the number of GPUs configured to the queue. The agent will pick a Task from the quad_gpu queue, use GPUs 0-3 and spin it. Then it will pick a Task from the dual_gpu queue, look for available GPUs again and spin on GPUs 4-5.

Another option for allocating GPUs:

clearml-agent daemon --dynamic-gpus --gpus 0-7 --queue dual=2 opportunistic=1-4

Notice that a minimum and maximum value of GPUs is specified for the opportunistic queue. This means the agent will pull a Task from the opportunistic queue and allocate up to 4 GPUs based on availability (i.e. GPUs not currently being used by other agents).