For more details, see [Kubernetes integration](https://github.com/allegroai/clearml-agent#kubernetes-integration-optional).

### Slurm

:::important Enterprise Feature
Slurm Glue is available under the ClearML Enterprise plan
:::

Agents can be deployed bare-metal or inside [`Singularity`](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html)
containers in Linux clusters managed with Slurm.

ClearML Agent Slurm Glue maps jobs to Slurm batch scripts: associate a ClearML queue with a batch script template, then
when a Task is pushed into the queue, it is converted and executed as an `sbatch` job according to the sbatch
template specification attached to the queue.

1. Install the Slurm Glue on a machine where you can run `sbatch` / `squeue` etc.:

   ```
   pip3 install -U --extra-index-url https://*****@*****.allegro.ai/repository/clearml_agent_slurm/simple clearml-agent-slurm
   ```
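
   As a sanity check, you can confirm that both the Slurm client tools and the glue are available on this machine (a minimal sketch; it assumes the standard `--version` and `--help` flags are supported):

   ```
   # Slurm client tools the glue relies on
   sbatch --version
   squeue --version
   # the glue's command-line entry point
   clearml-agent-slurm --help
   ```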

1. Create a new batch template. Make sure to set the `SBATCH` variables to the resources you want to attach to the queue.
   For example, the script below sets up an agent to run bare-metal, creating a virtual environment per job:

   ```
   #!/bin/bash
   # available template variables (default value separator ":")
   # ${CLEARML_QUEUE_NAME}
   # ${CLEARML_QUEUE_ID}
   # ${CLEARML_WORKER_ID}.
   # complex template variables (default value separator ":")
   # ${CLEARML_TASK.id}
   # ${CLEARML_TASK.name}
   # ${CLEARML_TASK.project.id}
   # ${CLEARML_TASK.hyperparams.properties.user_key.value}

   # example
   #SBATCH --job-name=clearml_task_${CLEARML_TASK.id}  # Job name DO NOT CHANGE
   #SBATCH --ntasks=1                                  # Run on a single CPU
   # #SBATCH --mem=1mb                                 # Job memory request
   # #SBATCH --time=00:05:00                           # Time limit hrs:min:sec
   #SBATCH --output=task-${CLEARML_TASK.id}-%j.log
   #SBATCH --partition debug
   #SBATCH --cpus-per-task=1
   #SBATCH --priority=5
   #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}

   ${CLEARML_PRE_SETUP}

   echo whoami $(whoami)

   ${CLEARML_AGENT_EXECUTE}

   ${CLEARML_POST_SETUP}
   ```

   Notice: If you are using Slurm with Singularity container support, replace `${CLEARML_AGENT_EXECUTE}` in the batch
   template with `singularity exec ${CLEARML_AGENT_EXECUTE}`. For additional required settings, see [Slurm with Singularity](#slurm-with-singularity).

   :::tip
   You can override the default values of a Slurm job template via the ClearML Web UI. The following line in the
   template sets the `nodes` value to be the ClearML Task's `num_nodes` user property:

   ```
   #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}
   ```

   This user property can be modified in the UI, in the task's **CONFIGURATION > User Properties** section, and when the
   task is executed the modified value will be used.
   :::

1. Launch the ClearML Agent Slurm Glue and assign the Slurm configuration to a ClearML queue. For example, the following
   associates the `default` queue with the `slurm.example.template` script, so any jobs pushed to this queue will use the
   resources set by that script:

   ```
   clearml-agent-slurm --template-files slurm.example.template --queue default
   ```

   You can also pass multiple templates and queues. For example:

   ```
   clearml-agent-slurm --template-files slurm.template1 slurm.template2 --queue queue1 queue2
   ```
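
With the glue running, any Task pushed into an associated queue is launched as an `sbatch` job. As a quick end-to-end check, you can create and enqueue a Task from an existing script with the `clearml-task` CLI (a minimal sketch; the project name and script path are placeholders):

```
# creates a Task from a local script and pushes it into the Slurm-mapped queue
clearml-task --project slurm-examples --name sbatch-test --script train.py --queue default
```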

#### Slurm with Singularity

If you are running Slurm with Singularity container support, set the following:

1. Make sure your `sbatch` template contains:

   ```
   singularity exec ${CLEARML_AGENT_EXECUTE}
   ```

   Additional Singularity arguments can be added, for example:

   ```
   singularity exec --uts ${CLEARML_AGENT_EXECUTE}
   ```

1. Set the default Singularity container to use in your [clearml.conf](configs/clearml_conf.md) file:

   ```
   agent.default_docker.image="shub://repo/hello-world"
   ```

   or

   ```
   agent.default_docker.image="docker://ubuntu"
   ```

1. Add `--singularity-mode` to the command line, for example:

   ```
   clearml-agent-slurm --singularity-mode --template-files slurm.example_singularity.template --queue default
   ```

### Explicit Task Execution

ClearML Agent can also execute specific tasks directly, without listening to a queue.
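
For example, the following runs a single Task by ID on the current machine (a minimal sketch; replace the placeholder with a real Task ID):

```
clearml-agent execute --id <task-id>
```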