mirror of https://github.com/clearml/clearml-docs (synced 2025-06-26 18:17:44 +00:00)
Merge branch 'main' of https://github.com/allegroai/clearml-docs into sdk_uv
docs/clearml_agent/clearml_agent_deployment_bare_metal.md (new file, 136 lines)
---
title: Manual Deployment
---

## Spinning Up an Agent
You can spin up an agent on any machine, whether an on-prem server or a cloud instance. When spinning up an agent, you assign it to service one or more queues. Utilize the machine by enqueuing tasks to a queue that the agent is servicing; the agent will pull and execute them.

:::tip cross-platform execution
ClearML Agent is platform-agnostic. When using ClearML Agent to execute tasks cross-platform, set platform-specific environment variables before launching the agent.

For example, to run an agent on an ARM device, set the core type environment variable before spinning up the agent:

```bash
export OPENBLAS_CORETYPE=ARMV8
clearml-agent daemon --queue <queue_name>
```
:::

### Executing an Agent
To execute an agent listening to a queue, run:

```bash
clearml-agent daemon --queue <queue_name>
```

### Executing in Background
To execute an agent in the background, run:

```bash
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
```

### Stopping Agents
To stop an agent running in the background, run:

```bash
clearml-agent daemon <arguments> --stop
```

### Allocating Resources
To specify the GPUs associated with an agent, add the `--gpus` flag.

:::info Docker Mode
Make sure to include the `--docker` flag, as GPU management through the agent is only supported in [Docker Mode](clearml_agent_execution_env.md#docker-mode).
:::

To execute multiple agents on the same machine (usually assigning a different GPU to each agent), run:

```bash
clearml-agent daemon --gpus 0 --queue default --docker
clearml-agent daemon --gpus 1 --queue default --docker
```

To allocate more than one GPU, provide a comma-separated list of GPUs:

```bash
clearml-agent daemon --gpus 0,1 --queue dual_gpu --docker
```
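Each agent on a shared machine should be given a disjoint set of GPUs. As a small illustrative sketch (not an official ClearML utility), the per-agent daemon commands can be generated from a GPU list like this:

```python
def daemon_commands(gpu_ids, gpus_per_agent, queue="default"):
    """Generate one `clearml-agent daemon` command per disjoint GPU group."""
    groups = [gpu_ids[i:i + gpus_per_agent]
              for i in range(0, len(gpu_ids), gpus_per_agent)]
    return [
        f"clearml-agent daemon --gpus {','.join(map(str, g))} --queue {queue} --docker"
        for g in groups
    ]

# Four GPUs, two per agent -> two daemon commands servicing the same queue
for cmd in daemon_commands([0, 1, 2, 3], 2, queue="dual_gpu"):
    print(cmd)
```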

### Queue Prioritization
A single agent can listen to multiple queues. By default, priority is set by the order in which the queues are specified:

```bash
clearml-agent daemon --queue high_q low_q
```

The agent first tries to pull a Task from the `high_q` queue, and only if it is empty does it try to pull from the `low_q` queue.

To make the agent pull from all queues equally, add the `--order-fairness` flag:

```bash
clearml-agent daemon --queue group_a group_b --order-fairness
```

The agent will pull from the `group_a` queue, then from `group_b`, then back to `group_a`, and so on. This ensures that neither `group_a` nor `group_b` can starve the other of resources.
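The two pulling strategies can be sketched as follows. This is an illustrative simulation of the scheduling behavior described above, not ClearML Agent's actual implementation (the queue names and `pull` helpers are hypothetical):

```python
from collections import deque
from itertools import cycle

def pull_by_priority(queues):
    """Strict priority: always take from the first non-empty queue.
    Relies on dict insertion order matching the --queue argument order."""
    for q in queues.values():
        if q:
            return q.popleft()
    return None

def make_fair_puller(queues):
    """Round-robin (--order-fairness): rotate across queues on every pull."""
    order = cycle(queues)
    def pull():
        for _ in range(len(queues)):
            q = queues[next(order)]
            if q:
                return q.popleft()
        return None
    return pull

queues = {"high_q": deque(["t1", "t2"]), "low_q": deque(["t3"])}
print(pull_by_priority(queues))  # tasks in high_q are drained first
```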

### SSH Access
By default, ClearML Agent maps the host's `~/.ssh` into the container's `/root/.ssh` directory (configurable, see [clearml.conf](../configs/clearml_conf.md#docker_internal_mounts)).

If you want to use existing auth sockets with ssh-agent, verify that your host ssh-agent is working correctly:

```commandline
echo $SSH_AUTH_SOCK
```

You should see a path to a temporary file, something like this:

```console
/tmp/ssh-<random>/agent.<random>
```

Then run your `clearml-agent` in Docker mode. It will automatically detect the `SSH_AUTH_SOCK` environment variable and mount the socket into any container it spins up.

You can also explicitly set the `SSH_AUTH_SOCK` environment variable when executing an agent. The following command executes an agent in Docker mode, assigns it to service a queue, and gives it access to the SSH socket provided in the environment variable:

```bash
SSH_AUTH_SOCK=<file_socket> clearml-agent daemon --gpus <your config> --queue <your queue name> --docker
```
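As a quick sanity check, the socket detection can be scripted before launching the agent. This is a generic shell sketch, not part of ClearML Agent itself:

```shell
# Sanity-check for an ssh-agent socket before launching the agent in Docker mode
if [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "${SSH_AUTH_SOCK}" ]; then
    msg="ssh-agent socket found: ${SSH_AUTH_SOCK}"
else
    msg="no ssh-agent socket; containers will rely on the mounted ~/.ssh keys"
fi
echo "$msg"
# Then launch the agent, e.g.:
# SSH_AUTH_SOCK=<file_socket> clearml-agent daemon --queue default --docker
```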

## Google Colab
ClearML Agent can run on a [Google Colab](https://colab.research.google.com/) instance. This lets users leverage the compute resources provided by Google Colab and send tasks for execution on it.

Check out [this tutorial](../guides/ide/google_colab.md) on how to run a ClearML Agent on Google Colab!

## Explicit Task Execution
ClearML Agent can also execute specific tasks directly, without listening to a queue.

### Execute a Task without Queue
Execute a Task with a `clearml-agent` worker without a queue:

```bash
clearml-agent execute --id <task-id>
```

### Clone a Task and Execute the Cloned Task
Clone the specified Task and execute the clone with a `clearml-agent` worker without a queue:

```bash
clearml-agent execute --id <task-id> --clone
```

### Execute a Task inside a Docker Container
Execute a Task with a `clearml-agent` worker using a Docker container, without a queue:

```bash
clearml-agent execute --id <task-id> --docker
```

## Debugging
Run a `clearml-agent` daemon in foreground mode, sending all output to the console:

```bash
clearml-agent daemon --queue default --foreground
```
docs/clearml_agent/clearml_agent_deployment_k8s.md (new file, 51 lines)
---
title: Kubernetes
---

Agents can be deployed bare-metal or as Docker containers in a Kubernetes cluster. ClearML Agent adds missing scheduling capabilities to Kubernetes, enabling more flexible automation from code while leveraging all of ClearML Agent's features.

ClearML Agent is deployed onto a Kubernetes cluster using **Kubernetes-Glue**, which maps ClearML jobs directly to Kubernetes jobs. This allows seamless task execution and resource allocation across your cluster.

## Deployment Options
You can deploy ClearML Agent onto Kubernetes using one of the following methods:

1. **ClearML Agent Helm Chart (Recommended)**:
   Use the [ClearML Agent Helm Chart](https://github.com/clearml/clearml-helm-charts/tree/main/charts/clearml-agent) to spin up an agent pod acting as a controller. This is the recommended and scalable approach.

2. **K8s Glue Script**:
   Run a [K8s Glue script](https://github.com/clearml/clearml-agent/blob/master/examples/k8s_glue_example.py) on a Kubernetes CPU node. This approach is less scalable and typically suited for simpler use cases.

## How It Works
The ClearML Kubernetes-Glue performs the following:
- Pulls jobs from the ClearML execution queue.
- Prepares a Kubernetes job based on a provided YAML template.
- Inside each job pod, the `clearml-agent`:
  - Installs the required environment for the task.
  - Executes and monitors the task process.
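The flow above can be sketched as a minimal polling loop. This is an illustrative simulation of the glue logic (the class, queue contents, and job naming are hypothetical), not the actual Kubernetes-Glue code:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class K8sJob:
    name: str
    template: str  # stands in for the rendered YAML job template

@dataclass
class GlueSimulator:
    queue: List[str] = field(default_factory=list)   # enqueued ClearML task IDs
    submitted: List[K8sJob] = field(default_factory=list)

    def poll_once(self) -> Optional[K8sJob]:
        """Pull one task from the queue and 'submit' a Kubernetes job for it."""
        if not self.queue:
            return None
        task_id = self.queue.pop(0)
        # The real glue renders the provided YAML template and applies it to the
        # cluster; inside the resulting pod, clearml-agent installs the task's
        # environment, then executes and monitors it.
        job = K8sJob(name=f"clearml-id-{task_id}", template="<rendered-yaml>")
        self.submitted.append(job)
        return job

sim = GlueSimulator(queue=["abc123", "def456"])
while sim.poll_once() is not None:
    pass
print([j.name for j in sim.submitted])
```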

:::important Enterprise Features
ClearML Enterprise adds advanced Kubernetes features:
- **Multi-Queue Support**: Service multiple ClearML queues within the same Kubernetes cluster.
- **Pod-Specific Templates**: Define resource configurations per queue using pod templates.

For example, you can configure resources for different queues as shown below:

```yaml
agentk8sglue:
  queues:
    example_queue_1:
      templateOverrides:
        nodeSelector:
          nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
        resources:
          limits:
            nvidia.com/gpu: 1
    example_queue_2:
      templateOverrides:
        nodeSelector:
          nvidia.com/gpu.product: A100-SXM4-40GB
        resources:
          limits:
            nvidia.com/gpu: 2
```
:::
docs/clearml_agent/clearml_agent_deployment_slurm.md (new file, 107 lines)
---
title: Slurm
---

:::important Enterprise Feature
Slurm Glue is available under the ClearML Enterprise plan.
:::

Agents can be deployed bare-metal or inside [`Singularity`](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) containers in Linux clusters managed with Slurm.

ClearML Agent Slurm Glue maps jobs to Slurm batch scripts: associate a ClearML queue with a batch script template, and when a Task is pushed into the queue it will be converted and executed as an `sbatch` job according to the batch template specification attached to the queue.

1. Install the Slurm Glue on a machine where you can run `sbatch` / `squeue`, etc.:

   ```
   pip3 install -U --extra-index-url https://*****@*****.allegro.ai/repository/clearml_agent_slurm/simple clearml-agent-slurm
   ```

2. Create a batch template. Make sure to set the `SBATCH` variables to the resources you want to attach to the queue.
   The script below sets up an agent to run bare-metal, creating a virtual environment per job. For example:

   ```
   #!/bin/bash
   # available template variables (default value separator ":")
   # ${CLEARML_QUEUE_NAME}
   # ${CLEARML_QUEUE_ID}
   # ${CLEARML_WORKER_ID}
   # complex template variables (default value separator ":")
   # ${CLEARML_TASK.id}
   # ${CLEARML_TASK.name}
   # ${CLEARML_TASK.project.id}
   # ${CLEARML_TASK.hyperparams.properties.user_key.value}

   # example
   #SBATCH --job-name=clearml_task_${CLEARML_TASK.id}   # Job name DO NOT CHANGE
   #SBATCH --ntasks=1                                   # Run on a single CPU
   # #SBATCH --mem=1mb                                  # Job memory request
   # #SBATCH --time=00:05:00                            # Time limit hrs:min:sec
   #SBATCH --output=task-${CLEARML_TASK.id}-%j.log
   #SBATCH --partition debug
   #SBATCH --cpus-per-task=1
   #SBATCH --priority=5
   #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}

   ${CLEARML_PRE_SETUP}

   echo whoami $(whoami)

   ${CLEARML_AGENT_EXECUTE}

   ${CLEARML_POST_SETUP}
   ```

   Notice: if you are using Slurm with Singularity container support, replace `${CLEARML_AGENT_EXECUTE}` in the batch template with `singularity exec ${CLEARML_AGENT_EXECUTE}`. For additional required settings, see [Slurm with Singularity](#slurm-with-singularity).

   :::tip
   You can override the default values of a Slurm job template via the ClearML Web UI. The following line in the template sets the `nodes` value to the ClearML Task's `num_nodes` user property:
   ```
   #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}
   ```
   This user property can be modified in the UI, in the task's **CONFIGURATION > User Properties** section; when the task is executed, the modified value will be used.
   :::

3. Launch the ClearML Agent Slurm Glue and assign the Slurm configuration to a ClearML queue. For example, the following associates the `default` queue with the `slurm.example.template` script, so any jobs pushed to this queue will use the resources set by that script:

   ```
   clearml-agent-slurm --template-files slurm.example.template --queue default
   ```

   You can also pass multiple templates and queues. For example:

   ```
   clearml-agent-slurm --template-files slurm.template1 slurm.template2 --queue queue1 queue2
   ```
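The template variables above, with their ":" default-value separator, behave roughly like the following sketch. This is an illustrative re-implementation of the substitution rule described in the template comments, not the actual Slurm Glue code:

```python
import re

def render_template(text: str, values: dict) -> str:
    """Replace ${NAME} / ${NAME:default} placeholders, falling back to the
    default after ':' when NAME is missing (e.g. an unset user property)."""
    pattern = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")
    def repl(m):
        name, default = m.group(1), m.group(2)
        if name in values:
            return str(values[name])
        return default if default is not None else m.group(0)
    return pattern.sub(repl, text)

line = "#SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}"
print(render_template(line, {}))  # no property set -> default of 1 is used
print(render_template(line, {"CLEARML_TASK.hyperparams.properties.num_nodes.value": 4}))
```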

## Slurm with Singularity
If you are running Slurm with Singularity container support, set the following:

1. Make sure your `sbatch` template contains:
   ```
   singularity exec ${CLEARML_AGENT_EXECUTE}
   ```
   Additional Singularity arguments can be added, for example:
   ```
   singularity exec --uts ${CLEARML_AGENT_EXECUTE}
   ```
2. Set the default Singularity container to use in your [clearml.conf](../configs/clearml_conf.md) file:
   ```
   agent.default_docker.image="shub://repo/hello-world"
   ```
   Or:
   ```
   agent.default_docker.image="docker://ubuntu"
   ```
3. Add `--singularity-mode` to the command line, for example:
   ```
   clearml-agent-slurm --singularity-mode --template-files slurm.example_singularity.template --queue default
   ```
@@ -1,48 +0,0 @@
---
title: Building Docker Containers
---

## Exporting a Task into a Standalone Docker Container

### Task Container

Build a Docker container that, when launched, executes a specific experiment, or a clone (copy) of that experiment.

- Build a Docker container that at launch will execute a specific Task:

  ```bash
  clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point reuse_task
  ```

- Build a Docker container that at launch will clone a Task specified by Task ID, and will execute the newly cloned Task:

  ```bash
  clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point clone_task
  ```

- Run the built Docker container by executing:

  ```bash
  docker run <new-docker-name>
  ```

Check out [this tutorial](../guides/clearml_agent/executable_exp_containers.md) for building executable experiment containers.

### Base Docker Container

Build a Docker container according to the execution environment of a specific task:

```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name>
```

You can add the Docker container as the base Docker image of a task (experiment), using one of the following methods:

- Using the **ClearML Web UI** - see [Base Docker image](../webapp/webapp_exp_tuning.md#base-docker-image) on the "Tuning Experiments" page.
- In the ClearML configuration file - use the [`agent.default_docker`](../configs/clearml_conf.md#agentdefault_docker) options.

Check out [this tutorial](../guides/clearml_agent/exp_environment_containers.md) for building a Docker container replicating the execution environment of an existing task.
@@ -1,8 +1,9 @@
---
title: Dynamic GPU Allocation
---

:::important Enterprise Feature
Dynamic GPU allocation is available under the ClearML Enterprise plan.
:::

The ClearML Enterprise server supports dynamic allocation of GPUs based on queue properties.

@@ -21,7 +22,7 @@ clearml-agent daemon --dynamic-gpus --gpus 0-2 --queue dual_gpus=2 single_gpu=1
Make sure to include the `--docker` flag, as dynamic GPU allocation is only supported in [Docker Mode](clearml_agent_execution_env.md#docker-mode).
:::

#### Example

Let's say a server has three queues:
* `dual_gpu`
@@ -2,18 +2,18 @@
title: Environment Caching
---

ClearML Agent caches virtual environments so when running tasks multiple times, there's no need to spend time reinstalling pre-installed packages. To make use of the cached virtual environments, enable the virtual environment reuse mechanism.

## Virtual Environment Reuse

The virtual environment reuse feature may reduce task startup time dramatically.

By default, ClearML uses the package manager's environment caching. This means that even if no new packages need to be installed, checking the list of packages can take a long time.

ClearML has a virtual environment reuse mechanism which, when enabled, allows using environments as-is without resolving installed packages. This means that when executing multiple tasks with the same package dependencies, the same environment will be used.

:::note
@@ -23,6 +23,7 @@ but can be overridden by command-line arguments.
|**CLEARML_CUDNN_VERSION** | Sets the CUDNN version to be used |
|**CLEARML_CPU_ONLY** | Force CPU-only mode |
|**CLEARML_DOCKER_SKIP_GPUS_FLAG** | Skips the GPUs flag (support for Docker v18) |
|**CLEARML_AGENT_DOCKER_ARGS_FILTERS**| Set a whitelist of allowed Docker arguments. Only arguments matching the specified patterns can be used when running a task. Use `shlex.split` whitespace-separated format. For example: `CLEARML_AGENT_DOCKER_ARGS_FILTERS="^--env$ ^-e$"`|
|**CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV** | Hide Docker environment variables containing secrets when printing out the Docker command. When printed, the variable values will be replaced by `********`. See [`agent.hide_docker_command_env_vars`](../configs/clearml_conf.md#hide_docker) |
|**CLEARML_AGENT_DISABLE_SSH_MOUNT** | Disables the automatic `.ssh` mount into the Docker container |
|**CLEARML_AGENT_FORCE_CODE_DIR**| Allows overriding the remote execution code directory to bypass repository cloning and use a repo already available where the remote agent is running. |
@@ -38,7 +39,7 @@ but can be overridden by command-line arguments.
|**CLEARML_AGENT_EXTRA_DOCKER_ARGS** | Overrides extra Docker args configuration |
|**CLEARML_AGENT_EXTRA_DOCKER_LABELS** | List of labels to add to the Docker container. See [Docker documentation](https://docs.docker.com/config/labels-custom-metadata/). |
|**CLEARML_EXTRA_PIP_INSTALL_FLAGS**| List of additional flags to use when the agent installs packages. For example: `CLEARML_EXTRA_PIP_INSTALL_FLAGS=--use-deprecated=legacy-resolver` for a single flag, or `CLEARML_EXTRA_PIP_INSTALL_FLAGS="--use-deprecated=legacy-resolver --no-warn-conflicts"` for multiple flags|
|**CLEARML_AGENT_EXTRA_PYTHON_PATH** | Sets extra Python path |
|**CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE** | Overrides initial server connection behavior (`true` by default); allows an explicit number to specify the number of connect retries |
|**CLEARML_AGENT_NO_UPDATE** | Boolean. Set to `1` to skip the agent update in the k8s pod container before the agent executes the task |
|**CLEARML_AGENT_K8S_HOST_MOUNT / CLEARML_AGENT_DOCKER_HOST_MOUNT** | Specifies the Agent's mount point for Docker / K8s |
@@ -47,7 +48,7 @@ but can be overridden by command-line arguments.
|**CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE**| Sets the PyTorch resolving mode. The options are: <ul><li>`none` - No resolving. Install PyTorch like any other package</li><li>`pip` (default) - Sets the extra index based on the CUDA version and lets pip resolve</li><li>`direct` - Resolve a direct link to the PyTorch wheel by parsing the pytorch.org pip repository and matching the automatically detected CUDA version with the required PyTorch wheel. If the exact CUDA version is not found for the required PyTorch wheel, it will try a lower CUDA version until a match is found</li></ul> |
|**CLEARML_AGENT_DEBUG_INFO** | Provides additional debug information for a specific context (currently only the `docker` value is supported) |
|**CLEARML_AGENT_CHILD_AGENTS_COUNT_CMD** | Provides an alternate bash command to list child agents while working in services mode |
|**CLEARML_AGENT_SKIP_PIP_VENV_INSTALL** | Instead of creating a new virtual environment inheriting from the system packages, use an existing virtual environment and install missing packages directly into it. Specify the Python binary of the existing virtual environment. For example: `CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/home/venv/bin/python` |
|**CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL** | If set to `1`, the agent will not install any required Python packages and will just use the preexisting Python environment to run the task. |
|
||||
|**CLEARML_AGENT_VENV_CACHE_PATH** | Overrides venv cache folder configuration |
|
||||
|**CLEARML_MULTI_NODE_SINGLE_TASK**| Control how multi-node resource monitoring is reported. The options are: <ul><li>`-1` - Only master node's (rank zero) console/resources are reported</li><li>`1` - Graph per node i.e. machine/GPU graph for every node (console output prefixed with RANK)</li><li>`2` - Series per node under a unified machine resource graph, graph per type of resource e.g. CPU/GPU utilization (console output prefixed with RANK)</li></ul>|
|
||||
|
||||
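Environment variables like these are typically exported in the shell before spinning up the agent. A minimal sketch (the venv path and queue name are placeholders):

```shell
# Reuse an existing virtual environment instead of creating a new one per task
# (path is a placeholder -- point it at your own venv's Python binary)
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/home/venv/bin/python

# Launch the agent; it inherits the variables exported above
# (queue name is a placeholder)
clearml-agent daemon --queue default
```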
@@ -5,7 +5,7 @@ ClearML Agent has two primary execution modes: [Virtual Environment Mode](#virtu

## Virtual Environment Mode
In Virtual Environment Mode, the agent creates a virtual environment for the task, installs the required Python
packages based on the task specification, clones the code repository, applies the uncommitted changes, and finally
executes the code while monitoring it. This mode uses smart caching so packages and environments can be reused over
multiple tasks (see [Virtual Environment Reuse](clearml_agent_env_caching.md#virtual-environment-reuse)).
@@ -27,7 +27,7 @@ If you are using pyenv to control the environment where you use ClearML Agent, y

* Install poetry with the deprecated `get-poetry.py` installer
:::

## Docker Mode
:::note notes
* Docker Mode is only supported on Linux.
* Docker Mode requires Docker service v19.03 or higher to be installed.
@@ -43,9 +43,12 @@ ClearML Agent uses the provided default Docker container, which can be overridde

:::tip Setting Docker Container via UI
You can set the Docker container via the UI:
1. Clone the task
2. Set the Docker in the cloned task's **Execution** tab **> Container** section

   

   

3. Enqueue the cloned task

The task will be executed in the container specified in the UI.
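The default container can also be set when spinning up the agent rather than per task; a minimal sketch (the image tag and queue name are placeholders):

```shell
# Run the agent in Docker Mode; tasks pulled from the queue will execute
# inside the specified container image (image tag is a placeholder)
clearml-agent daemon --queue default --docker nvidia/cuda:12.3.0-runtime-ubuntu22.04
```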
@@ -214,7 +214,7 @@ Where `<gpu_fraction_value>` must be set to one of the following values:

* "0.875"

## Container-based Memory Limits
Use [`clearml-fractional-gpu`](https://github.com/clearml/clearml-fractional-gpu)'s pre-packaged containers with
built-in hard memory limitations. Workloads running in these containers will only be able to use up to the container's
memory limit. Multiple isolated workloads can run on the same GPU without impacting each other.
@@ -223,7 +223,7 @@ memory limit. Multiple isolated workloads can run on the same GPU without impact

#### Manual Execution

1. Choose the container with the appropriate memory limit. ClearML supports CUDA 11.x and CUDA 12.x with memory limits
ranging from 2 GB to 12 GB (see the [clearml-fractional-gpu repository](https://github.com/clearml/clearml-fractional-gpu/blob/main/README.md#-containers) for the full list).
1. Launch the container:

   ```bash
   docker run -it --gpus '"device=0"' --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
   ```

   This example runs the ClearML Ubuntu 22 with CUDA 12.3 container on GPU 0, which is limited to use up to 8GB of its memory.
   :::note
   `--pid=host` is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage
   :::
1. Run the following command inside the container to verify that the fractional GPU memory limit is working correctly:

   ```bash
   nvidia-smi
   ```
@@ -273,7 +273,8 @@ The agent’s default container can be overridden via the UI:

1. Clone the task
1. Set the Docker in the cloned task's **Execution** tab > **Container** section

   

   

1. Enqueue the cloned task
@@ -311,7 +312,7 @@ when limiting memory usage.

Build your own custom fractional GPU container by inheriting from one of ClearML's containers: in your Dockerfile, make
sure to include `FROM <clearml_container_image>` so the container will inherit from the relevant container.

See example custom Dockerfiles in the [clearml-fractional-gpu repository](https://github.com/clearml/clearml-fractional-gpu/tree/main/examples).
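For example, a minimal Dockerfile sketch (the base image tag and the installed packages are illustrative assumptions):

```dockerfile
# Inherit the hard memory limit from a ClearML fractional-GPU container
# (tag is a placeholder -- pick one from the repository's container list)
FROM clearml/fractional-gpu:u22-cu12.3-8gb

# Layer your own dependencies on top of the inherited memory-limited image
RUN pip install -U pip && pip install clearml
```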
## Kubernetes Static MIG Fractions
Set up NVIDIA MIG (Multi-Instance GPU) support for Kubernetes to define GPU fraction profiles for specific workloads
@@ -3,10 +3,10 @@ title: ClearML Agent CLI

---

The following page provides a reference to `clearml-agent`'s CLI commands:
* [build](#build) - Create a worker environment without executing a task.
* [config](#config) - List your ClearML Agent configuration data.
* [daemon](#daemon) - Run a worker daemon listening to a queue for tasks to execute.
* [execute](#execute) - Execute a task locally, without a queue.
* [list](#list) - List the current workers.
@@ -32,6 +32,8 @@ clearml-agent build [-h] --id TASK_ID [--target TARGET]

### Parameters

<div className="tbl-cmd">

|Name | Description| Mandatory |
|---|----|---|
|`--id`| Build a worker environment for this Task ID.|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|

@@ -49,6 +51,8 @@ clearml-agent build [-h] --id TASK_ID [--target TARGET]

|`-O`| Compile optimized pyc code (see [Python documentation](https://docs.python.org/3/using/cmdline.html#cmdoption-O)). Repeat for more optimization.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--target`| The target folder for the virtual environment and source code that will be used at launch.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|

</div>
## config
List your ClearML Agent configuration.

@@ -59,7 +63,7 @@ clearml-agent config [-h]

## daemon

Use the `daemon` command to spin up an agent on any machine: on-prem and/or cloud instance. When spinning up an agent,
assign it a queue(s) to service, and when tasks are added to its queues, the agent will pull and execute them.

With the `daemon` command you can configure your agent's behavior: allocate resources, prioritize queues, set it to run
in a Docker container, and more.
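For example, a typical invocation combining several of these options (queue names are placeholders):

```shell
# Service two queues (listed in priority order), execute tasks inside
# Docker containers, and detach the agent process to the background
clearml-agent daemon --queue high_priority low_priority --docker --detached
```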
@@ -80,6 +84,8 @@ clearml-agent daemon [-h] [--foreground] [--queue QUEUES [QUEUES ...]] [--order-

### Parameters

<div className="tbl-cmd">

|Name | Description| Mandatory |
|---|----|---|
|`--child-report-tags`| List of tags to send with the status reports from the worker that executes a task.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|

@@ -106,6 +112,8 @@ clearml-agent daemon [-h] [--foreground] [--queue QUEUES [QUEUES ...]] [--order-

|`--uptime`| Specify uptime for clearml-agent in `<hours> <days>` format. For example, use `17-20 TUE` to set Tuesday's uptime to 17-20. <br/><br/>NOTES:<ul><li>This feature is available under the ClearML Enterprise plan</li><li>Make sure to configure only `--uptime` or `--downtime`, but not both.</li></ul>|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--use-owner-token`| Generate and use the task owner's token for the execution of the task.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|

</div>
## execute

Use the `execute` command to set an agent to build and execute specific tasks directly without listening to a queue.

@@ -123,6 +131,8 @@ clearml-agent execute [-h] --id TASK_ID [--log-file LOG_FILE] [--disable-monitor

### Parameters

<div className="tbl-cmd">

|Name | Description| Mandatory |
|---|----|---|
|`--id`| The ID of the Task to build|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|

@@ -141,6 +151,8 @@ clearml-agent execute [-h] --id TASK_ID [--log-file LOG_FILE] [--disable-monitor

|`--require-queue`| If the specified task is not queued, the execution will fail (used for 3rd party scheduler integration, e.g. K8s, SLURM, etc.)|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--standalone-mode`| Do not use any network connections, assume everything is pre-installed|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|

</div>

## list

List information about all active workers.
@@ -1,120 +0,0 @@

---
title: Scheduling Working Hours
---
:::important Enterprise Feature
This feature is available under the ClearML Enterprise plan.
:::

The Agent scheduler enables scheduling working hours for each agent. During working hours, a worker will actively poll
queues for Tasks, fetching and executing them. Outside working hours, a worker will be idle.
Schedule workers by:

* Setting configuration file options
* Running `clearml-agent` from the command line (overrides configuration file options)

Override worker schedules by:

* Setting runtime properties to force a worker on or off
* Tagging a queue on or off

## Running clearml-agent with a Schedule (Command Line)

Set a schedule for a worker from the command line when running `clearml-agent`. Two properties enable setting working hours:

:::warning
Use only one of these properties
:::
* `uptime` - Time span during which a worker will actively poll a queue(s) for Tasks, and execute them. Outside this
time span, the worker will be idle.
* `downtime` - Time span during which a worker will be idle. Outside this time span, the worker will actively poll and
execute Tasks.

Define `uptime` or `downtime` as `"<hours> <days>"`, where:

* `<hours>` - A span of hours (`00-23`) or a single hour. A single hour defines a span from that hour to midnight.
* `<days>` - A span of days (`SUN-SAT`) or a single day.

Use `-` for a span, and `,` to separate individual values. To span from before midnight to after midnight, use two spans.
For example:

* `"20-23 SUN"` - 8 PM to 11 PM on Sundays.
* `"20-23 SUN,TUE"` - 8 PM to 11 PM on Sundays and Tuesdays.
* `"20-23 SUN-TUE"` - 8 PM to 11 PM on Sundays, Mondays, and Tuesdays.
* `"20 SUN"` - 8 PM to midnight on Sundays.
* `"20-00,00-08 SUN"` - 8 PM to midnight and midnight to 8 AM on Sundays.
* `"20-00 SUN", "00-08 MON"` - 8 PM on Sundays to 8 AM on Mondays (spans from before midnight to after midnight).
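Putting the format together, a sketch of launching an agent with an uptime schedule (the queue name is a placeholder; scheduling is an Enterprise feature):

```shell
# Poll and execute tasks only from 5 PM to 8 PM, Sunday through Tuesday
clearml-agent daemon --queue default --uptime "17-20 SUN-TUE"
```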
## Setting Worker Schedules in the Configuration File

Set a schedule for a worker using configuration file options. The options are:

:::warning
Use only one of these properties
:::

* `agent.uptime`
* `agent.downtime`

Use the same time span format for days and hours as is used in the command line.

For example, set a worker's schedule as 5 PM to 8 PM on Sunday through Tuesday, and 1 PM to 10 PM on Wednesday:

```
agent.uptime: ["17-20 SUN-TUE", "13-22 WED"]
```
## Overriding Worker Schedules Using Runtime Properties

Runtime properties override the command line uptime / downtime properties. The runtime properties are:

:::warning
Use only one of these properties
:::

* `force:on` - Pull and execute Tasks until the property expires.
* `force:off` - Prevent pulling and execution of Tasks until the property expires.

Currently, these runtime properties can only be set using a ClearML REST API call to the `workers.set_runtime_properties`
endpoint, as follows:

* The body of the request must contain the `worker-id`, and the runtime property to add.
* An expiry date is optional. Use the format `"expiry":<time>`. For example, `"expiry":86400` will set an expiry of 24 hours.
* To delete the property, set the expiry date to zero: `"expiry":0`.

For example, to force a worker on for 24 hours:

```
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"worker":"<worker_id>","runtime_properties":[{"key": "force", "value": "on", "expiry": 86400}]}' http://<api-server-hostname-or-ip>:8008/workers.set_runtime_properties
```
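The same request can also be sketched from Python. This is a hedged example: it assumes `workers.set_runtime_properties` is exposed on the APIClient in the same way other endpoints are, and the worker ID is a placeholder.

```python
# Build the same payload the curl example sends:
# force the worker on, expiring after 24 hours (86400 seconds)
runtime_properties = [{"key": "force", "value": "on", "expiry": 86400}]

# Assumed APIClient call mirroring the REST endpoint
# (requires clearml to be installed and credentials configured):
# from clearml.backend_api.session.client import APIClient
# client = APIClient()
# client.workers.set_runtime_properties(
#     worker="<worker_id>", runtime_properties=runtime_properties)
```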
## Overriding Worker Schedules Using Queue Tags

Queue tags override command line and runtime properties. The queue tags are the following:

:::warning
Use only one of these properties
:::

* `force_workers:on` - Any worker listening to the queue will keep pulling Tasks from the queue.
* `force_workers:off` - Prevent all workers listening to the queue from pulling Tasks from the queue.

Currently, you can set queue tags using a ClearML REST API call to the `queues.update` endpoint, or the
APIClient. The body of the call must contain the `queue-id` and the tags to add.

For example, force workers on for a queue using the APIClient:

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
client.queues.update(queue="<queue_id>", tags=["force_workers:on"])
```

Or, force workers on for a queue using the REST API:

```bash
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"queue":"<queue_id>","tags":["force_workers:on"]}' http://<api-server-hostname-or-ip>:8008/queues.update
```
@@ -9,7 +9,7 @@ If ClearML was previously configured, follow [this](#adding-clearml-agent-to-a-c

ClearML Agent specific configurations
:::

To install [ClearML Agent](../clearml_agent.md), execute
```bash
pip install clearml-agent
```

@@ -27,7 +27,7 @@ it can't do that when running from a virtual environment.

clearml-agent init
```

The setup wizard prompts for ClearML credentials (see [here](../webapp/settings/webapp_settings_profile.md#clearml-api-credentials) about obtaining credentials).
```
Please create new clearml credentials through the settings page in your `clearml-server` web app,
or create a free account at https://app.clear.ml/settings/webapp-configuration

@@ -146,7 +146,7 @@ In case a `clearml.conf` file already exists, add a few ClearML Agent specific c

   worker_id: ""
}
```
View a complete ClearML Agent configuration file sample, including an `agent` section, [here](https://github.com/clearml/clearml-agent/blob/master/docs/clearml.conf).

1. Save the configuration.