Restructure ClearML Agent pages (#873)

This commit is contained in:
pollfly 2024-07-15 15:53:41 +03:00 committed by GitHub
parent b31452f6a1
commit f6781628e0
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
47 changed files with 1216 additions and 1200 deletions

View File

@ -24,7 +24,7 @@ VS Code remote sessions use ports 8878 and 8898 respectively.
## Prerequisites
* `clearml` installed and configured. See [Getting Started](../getting_started/ds/ds_first_steps.md) for details.
* At least one `clearml-agent` running on a remote host. See [installation](../clearml_agent.md#installation) for details.
* At least one `clearml-agent` running on a remote host. See [installation](../clearml_agent/clearml_agent_setup.md#installation) for details.
* An SSH client installed on your machine. To verify, open your terminal and execute `ssh`. If you did not receive an
error, you are good to go.
@ -142,7 +142,7 @@ sessions:
maxServices: 20
```
For more information, see [Kubernetes](../clearml_agent.md#kubernetes).
For more information, see [Kubernetes](../clearml_agent/clearml_agent_deployment.md#kubernetes).
### Installing Requirements

File diff suppressed because it is too large

View File

@ -0,0 +1,271 @@
---
title: Deployment
---
## Spinning Up an Agent
You can spin up an agent on any machine: on-prem and/or cloud instance. When spinning up an agent, you assign it to
service one or more queues. Utilize the machine by enqueuing tasks to a queue that the agent is servicing; the agent
will pull and execute the tasks.
:::tip cross-platform execution
ClearML Agent is platform agnostic. When using the ClearML Agent to execute experiments cross-platform, set platform-specific
environment variables before launching the agent.
For example, to run an agent on an ARM device, set the core type environment variable before spinning up the agent:
```bash
export OPENBLAS_CORETYPE=ARMV8
clearml-agent daemon --queue <queue_name>
```
:::
### Executing an Agent
To execute an agent listening to a queue, run:
```bash
clearml-agent daemon --queue <queue_name>
```
### Executing in Background
To execute an agent in the background, run:
```bash
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
```
### Stopping Agents
To stop an agent running in the background, run:
```bash
clearml-agent daemon <arguments> --stop
```
### Allocating Resources
To specify the GPUs associated with the agent, add the `--gpus` flag.
To execute multiple agents on the same machine (usually assigning a different GPU to each agent), run:
```bash
clearml-agent daemon --detached --queue default --gpus 0
clearml-agent daemon --detached --queue default --gpus 1
```
To allocate more than one GPU, provide a list of GPUs:
```bash
clearml-agent daemon --gpus 0,1 --queue dual_gpu
```
### Queue Prioritization
A single agent can listen to multiple queues. The priority is set by their order.
```bash
clearml-agent daemon --detached --queue high_q low_q --gpus 0
```
This ensures the agent first tries to pull a Task from the `high_q` queue, and only if it is empty will the agent try to pull
from the `low_q` queue.
To make sure an agent pulls from all queues equally, add the `--order-fairness` flag.
```bash
clearml-agent daemon --detached --queue group_a group_b --order-fairness --gpus 0
```
The agent will pull from the `group_a` queue, then from the `group_b` queue, then back to `group_a`, and so on. This ensures
that neither `group_a` nor `group_b` can starve the other of resources.
### SSH Access
By default, ClearML Agent maps the host's `~/.ssh` into the container's `/root/.ssh` directory (configurable,
see [clearml.conf](../configs/clearml_conf.md#docker_internal_mounts)).
If you want to use existing auth sockets with ssh-agent, you can verify your host ssh-agent is working correctly with:
```commandline
echo $SSH_AUTH_SOCK
```
You should see a path to a temporary file, something like this:
```console
/tmp/ssh-<random>/agent.<random>
```
Then run your `clearml-agent` in Docker mode, which will automatically detect the `SSH_AUTH_SOCK` environment variable
and mount the socket into any container it spins up.
You can also explicitly set the `SSH_AUTH_SOCK` environment variable when executing an agent. The command below will
execute an agent in Docker mode and assign it to service a queue. The agent will have access to
the SSH socket provided in the environment variable.
```
SSH_AUTH_SOCK=<file_socket> clearml-agent daemon --gpus <your config> --queue <your queue name> --docker
```
## Kubernetes
Agents can be deployed bare-metal or as Docker containers in a Kubernetes cluster. ClearML Agent adds the missing scheduling
capabilities to Kubernetes, allows for more flexible automation from code, and gives access to all of ClearML Agent's
features.
ClearML Agent is deployed onto a Kubernetes cluster through its Kubernetes-Glue which maps ClearML jobs directly to K8s
jobs:
* Use the [ClearML Agent Helm Chart](https://github.com/allegroai/clearml-helm-charts/tree/main/charts/clearml-agent) to
spin up an agent pod acting as a controller (see the installation sketch after this list). Alternatively (less recommended), run a [k8s glue script](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py)
on a K8s CPU node
* The ClearML K8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on the provided YAML
template)
* Inside each job pod the `clearml-agent` will install the ClearML task's environment and run and monitor the experiment's
process
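For reference, the following is a minimal sketch of installing the agent controller with the Helm chart. The chart
repository URL, release name, namespace, and the `--set` keys used for the server credentials are assumptions here;
check the [clearml-helm-charts](https://github.com/allegroai/clearml-helm-charts) README and the chart's `values.yaml`
for the authoritative names:
```bash
# Sketch only: deploy the ClearML Agent k8s glue controller with Helm
# (the --set keys below are assumptions; consult the chart's values.yaml)
helm repo add clearml https://clearml.github.io/clearml-helm-charts
helm repo update
helm install clearml-agent clearml/clearml-agent \
  --namespace clearml --create-namespace \
  --set clearml.agentk8sglueKey="<ACCESS_KEY>" \
  --set clearml.agentk8sglueSecret="<SECRET_KEY>"
```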
:::important Enterprise Feature
The ClearML Enterprise plan supports K8s servicing multiple ClearML queues, and lets you provide a pod template for each
queue that describes the resources its pods should use.
For example, the following configures which resources to use for `example_queue_1` and `example_queue_2`:
```yaml
agentk8sglue:
  queues:
    example_queue_1:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 1
        nodeSelector:
          nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
    example_queue_2:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 2
        nodeSelector:
          nvidia.com/gpu.product: A100-SXM4-40GB
```
:::
## Slurm
:::important Enterprise Feature
Slurm Glue is available under the ClearML Enterprise plan
:::
Agents can be deployed bare-metal or inside [`Singularity`](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html)
containers in Linux clusters managed with Slurm.
ClearML Agent Slurm Glue maps jobs to Slurm batch scripts: associate a ClearML queue with a batch script template, and
when a Task is pushed into the queue, it will be converted and executed as an `sbatch` job according to the sbatch
template specification attached to the queue.
1. Install the Slurm Glue on a machine where you can run `sbatch` / `squeue` etc.
```
pip3 install -U --extra-index-url https://*****@*****.allegro.ai/repository/clearml_agent_slurm/simple clearml-agent-slurm
```
1. Create a batch template. Make sure to set the `SBATCH` variables to the resources you want to attach to the queue.
For example, the following script sets up the agent to run bare-metal, creating a virtual environment per job:
```
#!/bin/bash
# available template variables (default value separator ":")
# ${CLEARML_QUEUE_NAME}
# ${CLEARML_QUEUE_ID}
# ${CLEARML_WORKER_ID}.
# complex template variables (default value separator ":")
# ${CLEARML_TASK.id}
# ${CLEARML_TASK.name}
# ${CLEARML_TASK.project.id}
# ${CLEARML_TASK.hyperparams.properties.user_key.value}
# example
#SBATCH --job-name=clearml_task_${CLEARML_TASK.id} # Job name DO NOT CHANGE
#SBATCH --ntasks=1 # Run on a single CPU
# #SBATCH --mem=1mb # Job memory request
# #SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=task-${CLEARML_TASK.id}-%j.log
#SBATCH --partition debug
#SBATCH --cpus-per-task=1
#SBATCH --priority=5
#SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}
${CLEARML_PRE_SETUP}
echo whoami $(whoami)
${CLEARML_AGENT_EXECUTE}
${CLEARML_POST_SETUP}
```
Notice: If you are using Slurm with Singularity container support, replace `${CLEARML_AGENT_EXECUTE}` in the batch
template with `singularity exec ${CLEARML_AGENT_EXECUTE}`. For additional required settings, see [Slurm with Singularity](#slurm-with-singularity).
:::tip
You can override the default values of a Slurm job template via the ClearML Web UI. The following command in the
template sets the `nodes` value to be the ClearML Task's `num_nodes` user property:
```
#SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}
```
This user property can be modified in the UI, in the task's **CONFIGURATION > User Properties** section, and when the
task is executed, the modified value will be used.
:::
1. Launch the ClearML Agent Slurm Glue and assign the Slurm configuration to a ClearML queue. For example, the following
associates the `default` queue to the `slurm.example.template` script, so any jobs pushed to this queue will use the
resources set by that script.
```
clearml-agent-slurm --template-files slurm.example.template --queue default
```
You can also pass multiple templates and queues. For example:
```
clearml-agent-slurm --template-files slurm.template1 slurm.template2 --queue queue1 queue2
```
### Slurm with Singularity
If you are running Slurm with Singularity container support, set the following:
1. Make sure your `sbatch` template contains:
```
singularity exec ${CLEARML_AGENT_EXECUTE}
```
Additional singularity arguments can be added, for example:
```
singularity exec --uts ${CLEARML_AGENT_EXECUTE}
```
1. Set the default Singularity container to use in your [clearml.conf](../configs/clearml_conf.md) file:
```
agent.default_docker.image="shub://repo/hello-world"
```
Or
```
agent.default_docker.image="docker://ubuntu"
```
1. Add `--singularity-mode` to the command line, for example:
```
clearml-agent-slurm --singularity-mode --template-files slurm.example_singularity.template --queue default
```
## Explicit Task Execution
ClearML Agent can also execute specific tasks directly, without listening to a queue.
### Execute a Task without Queue
Execute a Task with a `clearml-agent` worker without a queue.
```bash
clearml-agent execute --id <task-id>
```
### Clone a Task and Execute the Cloned Task
Clone the specified Task and execute the cloned Task with a `clearml-agent` worker without a queue.
```bash
clearml-agent execute --id <task-id> --clone
```
### Execute Task inside a Docker
Execute a Task with a `clearml-agent` worker using a Docker container without a queue.
```bash
clearml-agent execute --id <task-id> --docker
```
## Debugging
Run a `clearml-agent` daemon in foreground mode, sending all output to the console.
```bash
clearml-agent daemon --queue default --foreground
```

View File

@ -0,0 +1,48 @@
---
title: Building Docker Containers
---
## Exporting a Task into a Standalone Docker Container
### Task Container
Build a Docker container that when launched executes a specific experiment, or a clone (copy) of that experiment.
- Build a Docker container that at launch will execute a specific Task:
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point reuse_task
```
- Build a Docker container that at launch will clone a Task specified by Task ID, and will execute the newly cloned Task:
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point clone_task
```
- Run the built Docker container by executing:
```bash
docker run <new-docker-name>
```
Check out [this tutorial](../guides/clearml_agent/executable_exp_containers.md) for building executable experiment
containers.
### Base Docker Container
Build a Docker container according to the execution environment of a specific task.
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name>
```
You can add the Docker container as the base Docker image to a task (experiment), using one of the following methods:
- Using the **ClearML Web UI** - See [Base Docker image](../webapp/webapp_exp_tuning.md#base-docker-image) on the "Tuning
Experiments" page.
- In the ClearML configuration file - Use the [`agent.default_docker`](../configs/clearml_conf.md#agentdefault_docker)
configuration options (see the sketch after this list).
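As a sketch of the configuration-file route, you could append the setting to your `clearml.conf` as follows
(`<new-docker-name>` is a placeholder; adjust the path if your configuration file lives elsewhere):
```bash
# Sketch: make the newly built image the agent's default base container
# HOCON dotted keys appended at the end of the file override earlier values
cat >> ~/clearml.conf <<'EOF'
agent.default_docker.image: "<new-docker-name>"
EOF
```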
Check out [this tutorial](../guides/clearml_agent/exp_environment_containers.md) for building a Docker container
replicating the execution environment of an existing task.

View File

@ -0,0 +1,46 @@
---
title: Dynamic GPU Allocation
---
:::important Enterprise Feature
This feature is available under the ClearML Enterprise plan
:::
The ClearML Enterprise server supports dynamic allocation of GPUs based on queue properties.
Agents can spin multiple Tasks from different queues based on the number of GPUs the queue
needs.
`dynamic-gpus` enables dynamic allocation of GPUs based on queue properties.
To configure the number of GPUs for a queue, use the `--gpus` flag to specify the active GPUs, and use the `--queue`
flag to specify the queue name and number of GPUs:
```console
clearml-agent daemon --dynamic-gpus --gpus 0-2 --queue dual_gpus=2 single_gpu=1
```
## Example
Let's say a server has three queues:
* `dual_gpu`
* `quad_gpu`
* `opportunistic`
An agent can be spun on multiple GPUs (for example: 8 GPUs, `--gpus 0-7`), and then attached to multiple
queues that are configured to run with a certain amount of resources:
```console
clearml-agent daemon --dynamic-gpus --gpus 0-7 --queue quad_gpu=4 dual_gpu=2
```
The agent can now spin multiple Tasks from the different queues based on the number of GPUs configured to the queue.
The agent will pick a Task from the `quad_gpu` queue, use GPUs 0-3 and spin it. Then it will pick a Task from the `dual_gpu`
queue, look for available GPUs again and spin on GPUs 4-5.
Another option for allocating GPUs:
```console
clearml-agent daemon --dynamic-gpus --gpus 0-7 --queue dual_gpu=2 opportunistic=1-4
```
Notice that a minimum and maximum value of GPUs is specified for the `opportunistic` queue. This means the agent
will pull a Task from the `opportunistic` queue and allocate up to 4 GPUs based on availability (i.e. GPUs not currently
being used by other agents).

View File

@ -0,0 +1,33 @@
---
title: Environment Caching
---
ClearML Agent caches virtual environments so that when running experiments multiple times, there's no need to spend time
reinstalling the same packages. To make use of the cached virtual environments, enable the virtual environment reuse mechanism.
## Virtual Environment Reuse
The virtual environment reuse feature may reduce experiment startup time dramatically.
By default, ClearML uses the package manager's environment caching. This means that even if no
new packages need to be installed, checking the list of packages can take a long time.
ClearML has a virtual environment reuse mechanism which, when enabled, allows using environments as-is without resolving
installed packages. This means that when executing multiple experiments with the same package dependencies,
the same environment will be used.
:::note
ClearML does not support environment reuse when using Poetry package manager
:::
To enable environment reuse, modify the `clearml.conf` file and unmark the `venvs_cache` section.
```
venvs_cache: {
    # maximum number of cached venvs
    max_entries: 10
    # minimum required free space to allow for cache entry, disable by passing 0 or negative value
    free_space_threshold_gb: 2.0
    # unmark to enable virtual environment caching
    # path: ~/.clearml/venvs-cache
},
```

View File

@ -6,7 +6,7 @@ This page lists the available environment variables for configuring ClearML Agen
In addition to the environment variables listed below, ClearML also supports **dynamic environment variables** to override
any configuration option that appears in the [`agent`](../configs/clearml_conf.md#agent-section) section of the `clearml.conf`.
For more information, see [Dynamic Environment Variables](../clearml_agent.md#dynamic-environment-variables).
For more information, see [Dynamic Environment Variables](../clearml_agent/clearml_agent_setup.md#dynamic-environment-variables).
:::info
ClearML's environment variables override the [clearml.conf file](../configs/clearml_conf.md), SDK, and
@ -16,7 +16,7 @@ but can be overridden by command-line arguments.
|Name| Description |
|---|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|**CLEARML_DOCKER_IMAGE** | Sets the default docker image to use when running an agent in [Docker mode](../clearml_agent.md#docker-mode) |
|**CLEARML_DOCKER_IMAGE** | Sets the default docker image to use when running an agent in [Docker mode](../clearml_agent/clearml_agent_execution_env.md#docker-mode) |
|**CLEARML_WORKER_NAME** | Sets the Worker's name |
|**CLEARML_WORKER_ID** | Sets the Worker ID |
|**CLEARML_CUDA_VERSION** | Sets the CUDA version to be used |

View File

@ -0,0 +1,70 @@
---
title: Execution Environments
---
ClearML Agent has two primary execution modes: [Virtual Environment Mode](#virtual-environment-mode) and [Docker Mode](#docker-mode).
## Virtual Environment Mode
In Virtual Environment Mode, the agent creates a virtual environment for the experiment, installs the required Python
packages based on the task specification, clones the code repository, applies the uncommitted changes and finally
executes the code while monitoring it. This mode uses smart caching so packages and environments can be reused over
multiple tasks (see [Virtual Environment Reuse](clearml_agent_env_caching.md#virtual-environment-reuse)).
ClearML Agent supports working with one of the following package managers:
* [`pip`](https://en.wikipedia.org/wiki/Pip_(package_manager)) (default)
* [`conda`](https://docs.conda.io/en/latest/)
* [`poetry`](https://python-poetry.org/)
To change the package manager used by the agent, edit the [`package_manager.type`](../configs/clearml_conf.md#agentpackage_manager)
field in the `clearml.conf`. If extra channels are needed for `conda`, add the missing channels in the
`package_manager.conda_channels` field in the `clearml.conf`.
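If you prefer not to edit the file directly, the same field can also be overridden with a dynamic agent environment
variable, as sketched below (see [Dynamic Environment Variables](clearml_agent_setup.md#dynamic-environment-variables)):
```bash
# Sketch: select conda as this agent's package manager via a dynamic environment variable
# (equivalent to setting agent.package_manager.type in clearml.conf)
export CLEARML_AGENT__AGENT__PACKAGE_MANAGER__TYPE="conda"
clearml-agent daemon --queue default
```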
:::note Using Poetry with Pyenv
Some versions of poetry (using `install-poetry.py`) do not respect `pyenv global`.
If you are using pyenv to control the environment where you use ClearML Agent, you can:
* Use poetry v1.2 and above (which fixes [this issue](https://github.com/python-poetry/poetry/issues/5077))
* Install poetry with the deprecated `get-poetry.py` installer
:::
## Docker Mode
:::note notes
* Docker Mode is only supported in Linux.
* Docker Mode requires Docker service v19.03 or higher installed.
:::
When executing the ClearML Agent in Docker mode, it will:
1. Run the provided Docker container
1. Install ClearML Agent in the container
1. Execute the Task in the container, and monitor the process.
ClearML Agent uses the provided default Docker container, which can be overridden from the UI.
:::tip Setting Docker Container via UI
You can set the docker container via the UI:
1. Clone the experiment
2. Set the Docker in the cloned task's **Execution** tab **> Container** section
![Container section](../img/webapp_exp_container.png)
3. Enqueue the cloned task
The task will be executed in the container specified in the UI.
:::
All ClearML Agent flags (such as `--gpus` and `--foreground`) are applicable to Docker mode as well.
* To execute ClearML Agent in Docker mode, run:
```bash
clearml-agent daemon --queue <execution_queue_to_pull_from> --docker [optional default docker image to use]
```
* To use the current `clearml-agent` version in the Docker container, instead of the latest `clearml-agent` version that is
automatically installed, pass the `--force-current-version` flag:
```bash
clearml-agent daemon --queue default --docker --force-current-version
```
* For Kubernetes, specify a host mount on the daemon host. Do not use the host mount inside the Docker container.
Set the environment variable `CLEARML_AGENT_K8S_HOST_MOUNT`.
For example:
```
CLEARML_AGENT_K8S_HOST_MOUNT=/mnt/host/data:/root/.clearml
```

View File

@ -0,0 +1,358 @@
---
title: Fractional GPUs
---
Some tasks that you send for execution need a minimal amount of compute and memory, but you end up allocating entire
GPUs to them. In order to optimize your compute resource usage, you can partition GPUs into slices. You can have a GPU
device run multiple isolated workloads on separate slices that will not impact each other, and will only use the
fraction of GPU memory allocated to them.
ClearML provides several GPU slicing options to optimize compute resource utilization:
* [Container-based Memory Limits](#container-based-memory-limits): Use pre-packaged containers with built-in memory
limits to run multiple containers on the same GPU (**Available as part of the ClearML open source offering**)
* [Kubernetes-based Static MIG Slicing](#kubernetes-static-mig-fractions): Set up Kubernetes support for NVIDIA MIG
(Multi-Instance GPU) to define GPU fractions for specific workloads (**Available as part of the ClearML open source offering**)
* Dynamic GPU Slicing: On-demand GPU slicing per task for both MIG and non-MIG devices (**Available under the ClearML Enterprise plan**):
* [Bare Metal deployment](#bare-metal-deployment)
* [Kubernetes deployment](#kubernetes-deployment)
## Container-based Memory Limits
Use [`clearml-fractional-gpu`](https://github.com/allegroai/clearml-fractional-gpu)'s pre-packaged containers with
built-in hard memory limitations. Workloads running in these containers will only be able to use up to the container's
memory limit. Multiple isolated workloads can run on the same GPU without impacting each other.
![Fractional GPU diagram](../img/fractional_gpu_diagram.png)
### Usage
#### Manual Execution
1. Choose the container with the appropriate memory limit. ClearML supports CUDA 11.x and CUDA 12.x with memory limits
ranging from 2 GB to 12 GB (see [clearml-fractional-gpu repository](https://github.com/allegroai/clearml-fractional-gpu/blob/main/README.md#-containers) for full list).
1. Launch the container:
```bash
docker run -it --gpus 0 --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash
```
This example runs the ClearML Ubuntu 22 with CUDA 12.3 container on GPU 0, which is limited to use up to 8GB of its memory.
:::note
`--pid=host` is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage
:::
1. Run the following command inside the container to verify that the fractional GPU memory limit is working correctly:
```bash
nvidia-smi
```
Here is the expected output for the previous, 8GB limited, example on an A100:
```bash
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 A100-PCIE-40GB Off | 00000000:01:00.0 Off | N/A |
| 32% 33C P0 66W / 250W | 0MiB / 8128MiB | 3% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
#### Remote Execution
You can set a ClearML Agent to execute tasks in a fractional GPU container. Set an agent's default container via its
command line. For example, all tasks pulled from the `default` queue by this agent will be executed in the Ubuntu 22
with CUDA 12.3 container, which is limited to use up to 8GB of its memory:
```bash
clearml-agent daemon --queue default --docker clearml/fractional-gpu:u22-cu12.3-8gb
```
The agent's default container can be overridden via the UI:
1. Clone the task
1. Set the Docker in the cloned task's **Execution** tab > **Container** section
![Task container](../img/fractional_gpu_task_container.png)
1. Enqueue the cloned task
The task will be executed in the container specified in the UI.
For more information, see [Docker Mode](clearml_agent_execution_env.md#docker-mode).
#### Fractional GPU Containers on Kubernetes
Fractional GPU containers can be used to limit the memory consumption of your Kubernetes Job/Pod, and have multiple
containers share GPU devices without interfering with each other.
For example, the following configures a K8s pod to run using the `clearml/fractional-gpu:u22-cu12.3-8gb` container,
which limits the pod to 8 GB of the GPU's memory:
```
apiVersion: v1
kind: Pod
metadata:
  name: train-pod
  labels:
    app: trainme
spec:
  hostPID: true
  containers:
  - name: train-container
    image: clearml/fractional-gpu:u22-cu12.3-8gb
    command: ['python3', '-c', 'import torch; print(f"Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}")']
```
:::note
`hostPID: true` is required to allow the driver to differentiate between the pod's processes and other host processes
when limiting memory usage.
:::
### Custom Container
Build your own custom fractional GPU container by inheriting from one of ClearML's containers: In your Dockerfile, make
sure to include `From <clearml_container_image>` so the container will inherit from the relevant container.
See example custom Dockerfiles in the [clearml-fractional-gpu repository](https://github.com/allegroai/clearml-fractional-gpu/tree/main/examples).
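For illustration, a custom image of this kind might be built roughly as follows (a sketch; the additional pip packages
are placeholders, not a requirement):
```bash
# Sketch: build a custom image that inherits from one of ClearML's fractional GPU containers
cat > Dockerfile <<'EOF'
FROM clearml/fractional-gpu:u22-cu12.3-8gb
RUN pip3 install --no-cache-dir numpy pandas
EOF
docker build -t my-fractional-gpu:u22-cu12.3-8gb .
```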
## Kubernetes Static MIG Fractions
Set up NVIDIA MIG (Multi-Instance GPU) support for Kubernetes to define GPU fraction profiles for specific workloads
through your NVIDIA device plugin.
The ClearML Agent Helm chart lets you specify a pod template for each queue which describes the resources that the pod
will use. The template should specify the requested GPU slices under `containers.resources.limits` to have the pods use
the defined resources. For example, the following configures a K8s pod to run a `3g.20gb` MIG device:
```
# tf-benchmarks-mixed.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-benchmarks-mixed
spec:
  restartPolicy: Never
  containers:
    - name: tf-benchmarks-mixed
      image: ""
      command: []
      args: []
      resources:
        limits:
          nvidia.com/mig-3g.20gb: 1
  nodeSelector: # optional
    nvidia.com/gpu.product: A100-SXM4-40GB
```
When tasks are added to the relevant queue, the agent pulls the task and creates a pod to execute it, using the
specified GPU slice.
For example, the following configures tasks from the default queue to use `1g.5gb` MIG slices:
```
agentk8sglue:
  queue: default
  # …
  basePodTemplate:
    # …
    resources:
      limits:
        nvidia.com/gpu: 1
    nodeSelector:
      nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
```
## Dynamic GPU Fractions
:::important Enterprise Feature
Dynamic GPU slicing is available under the ClearML Enterprise plan.
:::
ClearML dynamic GPU fractions provide on-the-fly, per task GPU slicing, without having to set up containers or
pre-configure tasks with memory limits. Specify a GPU fraction for a queue in the agent invocation, and every task the
agent pulls from the queue will run on a container with the specified limit. This way you can safely run multiple tasks
simultaneously without worrying that one task will use all of the GPU's memory.
You can dynamically slice GPUs on [bare metal](#bare-metal-deployment) or on [Kubernetes](#kubernetes-deployment), for
both MIG-enabled and non-MIG devices.
### Bare Metal Deployment
1. Install the required packages:
```bash
pip install clearml-agent clearml-agent-fractional-gpu
```
1. Start the ClearML agent with dynamic GPU allocation. Use `--gpus` to specify the active GPUs, and use the `--queue`
flag to specify the queue name(s) and number (or fraction) of GPUs to allocate to them.
```
clearml-agent daemon --dynamic-gpus --gpus 0,1 --queue half_gpu=0.5
```
The agent can utilize 2 GPUs (GPUs 0 and 1). Every task enqueued to the `half_gpu` queue will be run by the agent and
only allocated 50% GPU memory (i.e. 4 tasks can run concurrently).
:::note
You can allocate GPUs for a queue's tasks by specifying either a fraction of a single GPU in increments as small as 0.125
(e.g. 0.125, 0.25, 0.50, etc.) or whole GPUs (e.g. 1, 2, 4, etc.). However, you cannot specify fractions greater than
one GPU (e.g. 1.25).
:::
You can set up multiple queues, each allocated a different number of GPUs per task. Note that the order that the queues
are listed is their order of priority, so the agent will service tasks from the first listed queue before servicing
subsequent queues:
```
clearml-agent daemon --dynamic-gpus --gpus 0-2 --queue dual_gpus=2 quarter_gpu=0.25 half_gpu=0.5 single_gpu=1
```
This agent will utilize 3 GPUs (GPUs 0, 1, and 2). The agent can spin multiple jobs from the different queues based on
the number of GPUs configured to the queue.
#### Example Workflow
Let's say that four tasks are enqueued, one task for each of the above queues (`dual_gpus`, `quarter_gpu`, `half_gpu`,
`single_gpu`). The agent will first pull the task from the `dual_gpus` queue since it is listed first, and will run it
using 2 GPUs. It will next run the tasks from `quarter_gpu` and `half_gpu`; both will run on the remaining available
GPU. This leaves the task in the `single_gpu` queue. Currently, 2.75 GPUs out of the 3 are in use, so the task will only
be pulled and run when enough GPUs become available.
### Kubernetes Deployment
ClearML supports fractional GPUs on Kubernetes through custom Enterprise Helm Charts for both MIG and non-MIG devices:
* `clearml-dynamic-mig-operator` for [MIG devices](#mig-enabled-gpus)
* `clearml-fractional-gpu-injector` for [non-MIG devices](#non-mig-devices)
For either setup, you can define in your Enterprise ClearML Agent Helm chart the resource requirements of tasks sent to
each queue. When a task is enqueued in ClearML, it translates into a Kubernetes pod running on the designated device
with the specified fractional resource as defined in the Agent Helm chart.
#### MIG-enabled GPUs
The **ClearML Dynamic MIG Operator** (CDMO) chart enables running AI workloads on K8s with optimized hardware utilization
and workload performance by facilitating MIG GPU partitioning. Make sure you have a [MIG capable GPU](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus).
##### Prepare Cluster
* Install the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator):
```
helm repo add nvidia https://helm.ngc.nvidia.com
helm repo update
helm install -n gpu-operator \
gpu-operator \
nvidia/gpu-operator \
--create-namespace \
--set migManager.enabled=false \
--set mig.strategy=mixed
```
* Enable MIG support:
1. Enable dynamic MIG support on your cluster by running the following command on all nodes used for training (run for each GPU ID in your cluster):
```
nvidia-smi -i <gpu_id> -mig 1
```
1. Reboot the node if required.
1. Add the following label to all nodes that will be used for training:
```
kubectl label nodes <node-name> "cdmo.clear.ml/gpu-partitioning=mig"
```
##### Configure ClearML Queues
The ClearML Enterprise plan supports K8s servicing multiple ClearML queues, and lets you provide a pod template for each
queue that describes the resources its pods should use.
In the `values.yaml` file, set the resource requirements of each ClearML queue. For example, the following configures
what resources to use for the `default025` and the `default050` queues:
```
agentk8sglue:
  queues:
    default025:
      templateOverrides:
        labels:
          required-resources: "0.25"
        resources:
          limits:
            nvidia.com/mig-1g.10gb: 1
    default050:
      templateOverrides:
        labels:
          required-resources: "0.50"
        resources:
          limits:
            nvidia.com/mig-1g.10gb: 1
```
#### Non-MIG Devices
The **Fractional GPU Injector** chart enables running AI workloads on K8s in an optimized way, allowing you to use
fractional GPUs on non-MIG devices.
##### Requirements
Install the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) through the Helm chart. Make sure `timeSlicing`
is enabled.
For example:
```
devicePlugin:
  config:
    name: device-plugin-config
    create: true
    default: "any"
    data:
      any: |-
        version: v1
        flags:
          migStrategy: none
        sharing:
          timeSlicing:
            renameByDefault: false
            failRequestsGreaterThanOne: false
            resources:
              - name: nvidia.com/gpu
                replicas: 4
```
The number of replicas is the maximum number of slices on a GPU.
##### Configure ClearML Queues
In the `values.yaml` file, set the resource requirements of each ClearML queue. When a task is enqueued to the queue,
it translates into a Kubernetes pod running on the designated device with the specified resource slice. The queues must
be configured with specific labels and annotations. For example, the following configures the `default0500` queue to use
50% of a GPU and the `default0250` queue to use 25% of a GPU:
```
agentk8sglue:
  queues:
    default0500:
      templateOverrides:
        labels:
          required-resources: "0.5"
          clearml-injector/fraction: "0.500"
        resources:
          limits:
            nvidia.com/gpu: 1
            clear.ml/fraction-1: "0.5"
      queueSettings:
        maxPods: 10
    default0250:
      templateOverrides:
        labels:
          required-resources: "0.25"
          clearml-injector/fraction: "0.250"
        resources:
          limits:
            nvidia.com/gpu: 1
            clear.ml/fraction-1: "0.25"
      queueSettings:
        maxPods: 10
```
If a pod has a label matching the pattern `clearml-injector/fraction: "<gpu_fraction_value>"`, the injector will
configure that pod to utilize the specified fraction of the GPU:
```
labels:
  clearml-injector/fraction: "<gpu_fraction_value>"
```
Where `<gpu_fraction_value>` must be set to one of the following values:
* "0.125"
* "0.250"
* "0.375"
* "0.500"
* "0.625"
* "0.750"
* "0.875"

View File

@ -0,0 +1,8 @@
---
title: Google Colab
---
ClearML Agent can run on a [Google Colab](https://colab.research.google.com/) instance. This lets users leverage the
compute resources provided by Google Colab and send experiments for execution on them.
Check out [this tutorial](../guides/ide/google_colab.md) on how to run a ClearML Agent on Google Colab!

View File

@ -0,0 +1,120 @@
---
title: Scheduling Working Hours
---
:::important Enterprise Feature
This feature is available under the ClearML Enterprise plan
:::
The Agent scheduler enables scheduling working hours for each Agent. During working hours, a worker will actively poll
queues for Tasks, fetch and execute them. Outside working hours, a worker will be idle.
Schedule workers by:
* Setting configuration file options
* Running `clearml-agent` from the command line (overrides configuration file options)
Override worker schedules by:
* Setting runtime properties to force a worker on or off
* Tagging a queue on or off
## Running clearml-agent with a Schedule (Command Line)
Set a schedule for a worker from the command line when running `clearml-agent`. Two properties enable setting working hours:
:::warning
Use only one of these properties
:::
* `uptime` - Time span during which a worker will actively poll a queue(s) for Tasks, and execute them. Outside this
time span, the worker will be idle.
* `downtime` - Time span during which a worker will be idle. Outside this time span, the worker will actively poll and
execute Tasks.
Define `uptime` or `downtime` as `"<hours> <days>"`, where:
* `<hours>` - A span of hours (`00-23`) or a single hour. A single hour defines a span from that hour to midnight.
* `<days>` - A span of days (`SUN-SAT`) or a single day.
Use `-` for a span, and `,` to separate individual values. To span before midnight to after midnight, use two spans.
For example:
* `"20-23 SUN"` - 8 PM to 11 PM on Sundays.
* `"20-23 SUN,TUE"` - 8 PM to 11 PM on Sundays and Tuesdays.
* `"20-23 SUN-TUE"` - 8 PM to 11 PM on Sundays, Mondays, and Tuesdays.
* `"20 SUN"` - 8 PM to midnight on Sundays.
* `"20-00,00-08 SUN"` - 8 PM to midnight and midnight to 8 AM on Sundays
* `"20-00 SUN", "00-08 MON"` - 8 PM on Sundays to 8 AM on Mondays (spans from before midnight to after midnight).
## Setting Worker Schedules in the Configuration File
Set a schedule for a worker using configuration file options. The options are:
:::warning
Use only one of these properties
:::
* ``agent.uptime``
* ``agent.downtime``
Use the same time span format for days and hours as is used in the command line.
For example, set a worker's schedule from 5 PM to 8 PM on Sunday through Tuesday, and 1 PM to 10 PM on Wednesday.
```
agent.uptime: ["17-20 SUN-TUE", "13-22 WED"]
```
## Overriding Worker Schedules Using Runtime Properties
Runtime properties override the command line uptime / downtime properties. The runtime properties are:
:::warning
Use only one of these properties
:::
* `force:on` - Pull and execute Tasks until the property expires.
* `force:off` - Prevent pulling and execution of Tasks until the property expires.
Currently, these runtime properties can only be set using a ClearML REST API call to the `workers.set_runtime_properties`
endpoint, as follows:
* The body of the request must contain the `worker-id`, and the runtime property to add.
* An expiry date is optional. Use the format `"expiry":<time>`. For example, `"expiry":86400` will set an expiry of 24 hours.
* To delete the property, set the expiry date to zero, `"expiry":0`.
For example, to force a worker on for 24 hours:
```
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"worker":"<worker_id>","runtime_properties":[{"key": "force", "value": "on", "expiry": 86400}]}' http://<api-server-hostname-or-ip>:8008/workers.set_runtime_properties
```
## Overriding Worker Schedules Using Queue Tags
Queue tags override command line and runtime properties. The queue tags are the following:
:::warning
Use only one of these properties
:::
* ``force_workers:on`` - Any worker listening to the queue will keep pulling Tasks from the queue.
* ``force_workers:off`` - Prevent all workers listening to the queue from pulling Tasks from the queue.
Currently, you can set queue tags using a ClearML REST API call to the ``queues.update`` endpoint, or the
APIClient. The body of the call must contain the ``queue-id`` and the tags to add.
For example, force workers on for a queue using the APIClient:
```python
from clearml.backend_api.session.client import APIClient
client = APIClient()
client.queues.update(queue="<queue_id>", tags=["force_workers:on"])
```
Or, force workers on for a queue using the REST API:
```bash
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"queue":"<queue_id>","tags":["force_workers:on"]}' http://<api-server-hostname-or-ip>:8008/queues.update
```

View File

@ -0,0 +1,38 @@
---
title: Services Mode
---
ClearML Agent supports a **Services Mode** where, as soon as a task is launched off of its queue, the agent moves on to the
next task without waiting for the previous one to complete. This mode is intended for running resource-sparse tasks that
are usually idling, such as periodic cleanup services or a [pipeline controller](../references/sdk/automation_controller_pipelinecontroller.md).
To run a `clearml-agent` in services mode, run:
```bash
clearml-agent daemon --services-mode --queue services --create-queue --docker <docker_name> --cpu-only
```
To limit the number of simultaneous tasks run in services mode, pass the maximum number immediately after the
`--services-mode` option (for example: `--services-mode 5`).
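Putting it together, an agent limited to five concurrent service tasks would be launched as follows (mirroring the
command above):
```bash
# Run at most 5 service tasks concurrently on this agent
clearml-agent daemon --services-mode 5 --queue services --create-queue --docker <docker_name> --cpu-only
```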
:::note Notes
* `services-mode` currently only supports Docker mode. Each service is spun up in its own Docker container.
* The default `clearml-server` configuration already runs a single `clearml-agent` in services mode that listens to the
`services` queue.
:::
Launch a service task like any other task, by enqueuing it to the appropriate queue.
:::warning
Do not enqueue training or inference tasks into the services queue. They will put an unnecessary load on the server.
:::
## Setting Server Credentials
Self-hosted [ClearML Server](../deploying_clearml/clearml_server.md) comes by default with a services queue.
By default, the server is open and does not require username and password, but it can be [password-protected](../deploying_clearml/clearml_server_security.md#user-access-security).
If it is password-protected, the services agent will need to be configured with server credentials (associated with a user).
To do that, set these environment variables on the ClearML Server machine with the appropriate credentials:
```
CLEARML_API_ACCESS_KEY
CLEARML_API_SECRET_KEY
```
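For example, when launching the services agent manually, the credentials can be exported before starting the daemon
(the values below are placeholders):
```bash
# Placeholders: use credentials created in the ClearML Web UI settings page
export CLEARML_API_ACCESS_KEY="<access_key>"
export CLEARML_API_SECRET_KEY="<secret_key>"
clearml-agent daemon --services-mode --queue services --create-queue --docker <docker_name> --cpu-only
```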

View File

@ -0,0 +1,163 @@
---
title: Setup
---
## Installation
:::note
If ClearML was previously configured, follow [these instructions](#adding-clearml-agent-to-a-configuration-file) to add
ClearML Agent-specific configuration options
:::
To install ClearML Agent, execute:
```bash
pip install clearml-agent
```
:::info
Install ClearML Agent as a system Python package and not in a Python virtual environment.
An agent that runs in Virtual Environment Mode or Conda Environment Mode needs to create virtual environments, and
it can't do that when running from a virtual environment.
:::
## Configuration
1. In a terminal session, execute:
```bash
clearml-agent init
```
The setup wizard prompts for ClearML credentials (see [here](../webapp/webapp_profile.md#clearml-credentials) about obtaining credentials).
```
Please create new clearml credentials through the settings page in your `clearml-server` web app,
or create a free account at https://app.clear.ml/settings/webapp-configuration
In the settings > workspace page, press "Create new credentials", then press "Copy to clipboard".
Paste copied configuration here:
```
If the setup wizard's response indicates that a configuration file already exists, follow the instructions [here](#adding-clearml-agent-to-a-configuration-file).
The wizard does not edit or overwrite existing configuration files.
1. At the command prompt `Paste copied configuration here:`, copy and paste the ClearML credentials and press **Enter**.
The setup wizard confirms the credentials.
```
Detected credentials key="********************" secret="*******"
```
1. Press **Enter** to accept the default server URL, which is detected from the credentials, or enter a ClearML web server URL.
A secure protocol, https, must be used. **Do not use http.**
```
WEB Host configured to: [https://app.clear.ml]
```
:::note
If you are using a self-hosted ClearML Server, the default URL will use your domain.
:::
1. Do the same for the API server and file server URLs.
1. The wizard responds with your configuration:
```
CLEARML Hosts configuration:
Web App: https://app.clear.ml
API: https://api.clear.ml
File Store: https://files.clear.ml
Verifying credentials ...
Credentials verified!
```
1. Enter your Git username and password. Leave blank for SSH key authentication or when only using public repositories.
These credentials are needed so the agent can clone your repositories.
```
Enter git username for repository cloning (leave blank for SSH key authentication): []
Enter password for user '<username>':
```
The setup wizard confirms your git credentials.
```
Git repository cloning will be using user=<username> password=<password>
```
1. Enter an additional artifact repository, or press **Enter** if not required.
This is needed for installing Python packages that are not in PyPI.
```
Enter additional artifact repository (extra-index-url) to use when installing python packages (leave blank if not required):
```
The setup wizard completes.
```
New configuration stored in /home/<username>/clearml.conf
CLEARML-AGENT setup completed successfully.
```
The configuration file location depends upon the operating system:
* Linux - `~/clearml.conf`
* Mac - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
1. Optionally, configure ClearML options for **ClearML Agent** (default docker, package manager, etc.). See the [ClearML Configuration Reference](../configs/clearml_conf.md)
and the [ClearML Agent Environment Variables reference](../clearml_agent/clearml_agent_env_var.md).
:::note
The ClearML Enterprise server provides a [configuration vault](../webapp/webapp_profile.md#configuration-vault), the contents
of which are categorically applied on top of the agent-local configuration
:::
### Adding ClearML Agent to a Configuration File
If a `clearml.conf` file already exists, add a few ClearML Agent-specific configurations to it.<br/>
**Adding ClearML Agent to a ClearML configuration file:**
1. Open the ClearML configuration file for editing. Depending upon the operating system, it is:
* Linux - `~/clearml.conf`
* Mac - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
1. After the `api` section, add your `agent` section. For example:
```
agent {
    # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
    git_user=""
    git_pass=""
    # all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
    git_host=""
    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
    force_git_ssh_protocol: false
    # unique name of this worker, if None, created based on hostname:process_id
    # Overridden with os environment: CLEARML_WORKER_NAME
    worker_id: ""
}
```
View a complete ClearML Agent configuration file sample including an `agent` section [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf).
1. Save the configuration.
### Dynamic Environment Variables
Dynamic ClearML Agent environment variables can be used to override any configuration setting that appears in the [`agent`](../configs/clearml_conf.md#agent-section)
section of the `clearml.conf`.
The environment variable's name should be `CLEARML_AGENT__AGENT__<configuration-path>`, where `<configuration-path>`
represents the full path to the configuration field being set. Elements of the configuration path should be separated by
`__` (double underscore). For example, set the `CLEARML_AGENT__AGENT__DEFAULT_DOCKER__IMAGE` environment variable to
deploy an agent with a different value from the one specified for `agent.default_docker.image` in the `clearml.conf`.
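A minimal sketch of such an override (the image name is a placeholder):
```bash
# Overrides agent.default_docker.image for this agent only; keep the value quoted (see the note below)
export CLEARML_AGENT__AGENT__DEFAULT_DOCKER__IMAGE="nvidia/cuda:11.8.0-runtime-ubuntu22.04"
clearml-agent daemon --queue default --docker
```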
:::note NOTES
* Since configuration fields may contain JSON-parsable values, make sure to always quote strings (otherwise the agent
might fail to parse them)
* To comply with environment variables standards, it is recommended to use only upper-case characters in
environment variable keys. For this reason, ClearML Agent will always convert the configuration path specified in the
dynamic environment variable's key to lower-case before overriding configuration values with the environment variable
value.
:::

View File

@ -71,7 +71,7 @@ execute the tasks in the GPU queue.
#### Docker
Every task a cloud instance pulls will be run inside a docker container. When setting up an autoscaler app instance,
you can specify a default container to run the tasks inside. If the task has its own container configured, it will
override the autoscalers default docker image (see [Base Docker Image](../clearml_agent.md#base-docker-container)).
override the autoscalers default docker image (see [Base Docker Image](../clearml_agent/clearml_agent_docker.md#base-docker-container)).
#### Git Configuration
If your code is saved in a private repository, you can add your Git credentials so the ClearML Agents running on your

View File

@ -482,7 +482,7 @@ match_rules: [
**`agent.package_manager.use_conda_base_env`** (*bool*)
* When set to `True`, installation will be performed into the base Conda environment. Use in [Docker mode](../clearml_agent.md#docker-mode).
* When set to `True`, installation will be performed into the base Conda environment. Use in [Docker mode](../clearml_agent/clearml_agent_execution_env.md#docker-mode).
___

View File

@ -20,7 +20,7 @@ but can be overridden by command-line arguments.
|**CLEARML_LOG_ENVIRONMENT** | List of Environment variable names. These environment variables will be logged in the ClearML task's configuration hyperparameters `Environment` section. When executed by a ClearML agent, these values will be set in the task's execution environment. |
|**CLEARML_TASK_NO_REUSE** | Boolean. <br/> When set to `1`, a new task is created for every execution (see Task [reuse](../clearml_sdk/task_sdk.md#task-reuse)). |
|**CLEARML_CACHE_DIR** | Set the path for the ClearML cache directory, where ClearML stores all downloaded content. |
|**CLEARML_DOCKER_IMAGE** | Sets the default docker image to use when running an agent in [Docker mode](../clearml_agent.md#docker-mode). |
|**CLEARML_DOCKER_IMAGE** | Sets the default docker image to use when running an agent in [Docker mode](../clearml_agent/clearml_agent_execution_env.md#docker-mode). |
|**CLEARML_LOG_LEVEL** | Sets the ClearML package's log verbosity. Log levels adhere to [Python log levels](https://docs.python.org/3/library/logging.config.html#configuration-file-format): CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET |
|**CLEARML_SUPPRESS_UPDATE_MESSAGE** | Boolean. <br/> When set to `1`, suppresses new ClearML package version availability message. |
|**CLEARML_DEFAULT_OUTPUT_URI** | The default output destination for model checkpoints (snapshots) and artifacts. |

View File

@ -29,7 +29,7 @@ Use the ClearML Web UI to:
For detailed information about the ClearML Web UI, see [User Interface](../webapp/webapp_overview.md).
ClearML Server also comes with a [services agent](../clearml_agent.md#services-mode) preinstalled.
ClearML Server also comes with a [services agent](../clearml_agent/clearml_agent_services_mode.md) preinstalled.
## Deployment

View File

@ -20,7 +20,7 @@ The agent also supports overriding parameter values on-the-fly without code modi
ClearML [Hyperparameter Optimization](hpo.md) is implemented).
An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you
can allocate several GPUs to an agent and use the rest for a different workload, even through another agent (see [Dynamic GPU Allocation](../clearml_agent.md#dynamic-gpu-allocation)).
can allocate several GPUs to an agent and use the rest for a different workload, even through another agent (see [Dynamic GPU Allocation](../clearml_agent/clearml_agent_dynamic_gpus.md)).
@ -81,7 +81,7 @@ The Agent supports the following running modes:
* **Virtual Environment Mode** - The agent creates a new virtual environment for the experiment, installs the required
python packages based on the Task specification, clones the code repository, applies the uncommitted changes and
finally executes the code while monitoring it. This mode uses smart caching so packages and environments can be reused
over multiple tasks (see [Virtual Environment Reuse](../clearml_agent.md#virtual-environment-reuse)).
over multiple tasks (see [Virtual Environment Reuse](../clearml_agent/clearml_agent_env_caching.md#virtual-environment-reuse)).
ClearML Agent supports using the following package managers: `pip` (default), `conda`, `poetry`.

View File

@ -47,7 +47,7 @@ that you need.
accessed, [compared](../../webapp/webapp_exp_comparing.md) and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks.
It can even [build](../../clearml_agent.md#exporting-a-task-into-a-standalone-docker-container) the docker container for you!
It can even [build](../../clearml_agent/clearml_agent_docker.md#exporting-a-task-into-a-standalone-docker-container) the docker container for you!
- [ClearML Pipelines](../../pipelines/pipelines.md) ensure that steps run in the same order,
programmatically chaining tasks together, while giving an overview of the execution pipeline's status.

View File

@ -44,7 +44,7 @@ pip install clearml
CLEARML_CONFIG_FILE = MyOtherClearML.conf
```
For more information about running experiments inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent.md#deployment)
For more information about running experiments inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md)
and [ClearML Agent Reference](../../clearml_agent/clearml_agent_ref.md).
</Collapsible>

View File

@ -53,8 +53,8 @@ required python packages, and execute and monitor the process.
(or even multiple queues), but only a single agent will pull a Task to be executed.
:::tip Agent Deployment Modes
ClearML Agents can be deployed in Virtual Environment Mode or Docker Mode. In [virtual environment mode](../../clearml_agent.md#execution-environments),
the agent creates a new venv to execute an experiment. In [Docker mode](../../clearml_agent.md#docker-mode),
ClearML Agents can be deployed in Virtual Environment Mode or Docker Mode. In [virtual environment mode](../../clearml_agent/clearml_agent_execution_env.md),
the agent creates a new venv to execute an experiment. In [Docker mode](../../clearml_agent/clearml_agent_execution_env.md#docker-mode),
the agent executes an experiment inside a Docker container. For more information, see [Running Modes](../../fundamentals/agents_and_queues.md#running-modes).
:::

View File

@ -8,7 +8,7 @@ on a remote or local machine, from a remote repository and your local machine.
### Prerequisites
- [`clearml`](../../getting_started/ds/ds_first_steps.md) Python package installed and configured
- [`clearml-agent`](../../clearml_agent.md#installation) running on at least one machine (to execute the experiment), configured to listen to `default` queue
- [`clearml-agent`](../../clearml_agent/clearml_agent_setup.md#installation) running on at least one machine (to execute the experiment), configured to listen to `default` queue
### Executing Code from a Remote Repository

View File

@ -8,7 +8,7 @@ run, will automatically execute the [keras_tensorboard.py](https://github.com/al
script.
## Prerequisites
* [`clearml-agent`](../../clearml_agent.md#installation) installed and configured
* [`clearml-agent`](../../clearml_agent/clearml_agent_setup.md#installation) installed and configured
* [`clearml`](../../getting_started/ds/ds_first_steps.md#install-clearml) installed and configured
* [clearml](https://github.com/allegroai/clearml) repo cloned (`git clone https://github.com/allegroai/clearml.git`)

View File

@ -10,7 +10,7 @@ A use case for this would be manual hyperparameter optimization, where a base ta
be used when running optimization tasks.
## Prerequisites
* [`clearml-agent`](../../clearml_agent.md#installation) installed and configured
* [`clearml-agent`](../../clearml_agent/clearml_agent_setup.md#installation) installed and configured
* [`clearml`](../../getting_started/ds/ds_first_steps.md#install-clearml) installed and configured
* [clearml](https://github.com/allegroai/clearml) repo cloned (`git clone https://github.com/allegroai/clearml.git`)
@ -66,7 +66,7 @@ Make use of the container you've just built by having a ClearML agent make use o
of the new Docker image, `new_docker`. See [Tuning Experiments](../../webapp/webapp_exp_tuning.md) for more task
modification options.
1. Enqueue the cloned experiment to the `default` queue.
1. Launch a `clearml-agent` in [Docker Mode](../../clearml_agent.md#docker-mode) and assign it to the `default` queue:
1. Launch a `clearml-agent` in [Docker Mode](../../clearml_agent/clearml_agent_execution_env.md#docker-mode) and assign it to the `default` queue:
```console
clearml-agent daemon --docker --queue default
```

View File

@ -9,7 +9,7 @@ where a `clearml-agent` will run and spin an instance of the remote session.
## Prerequisites
* `clearml-session` package installed (`pip install clearml-session`)
* At least one `clearml-agent` running on a **remote** host. See [installation details](../../clearml_agent.md#installation).
* At least one `clearml-agent` running on a **remote** host. See [installation details](../../clearml_agent/clearml_agent_setup.md#installation).
Configure the `clearml-agent` to listen to the `default` queue (`clearml-agent daemon --queue default`)
* An SSH client installed on machine being used. To verify, open terminal, execute `ssh`, and if no error is received,
it should be good to go.

View File

@ -197,7 +197,7 @@ object, setting the following optimization parameters:
## Running as a Service
The optimization can run as a service, if the `run_as_service` argument is set to `true`. For more information about
running as a service, see [Services Mode](../../../clearml_agent.md#services-mode).
running as a service, see [Services Mode](../../../clearml_agent/clearml_agent_services_mode.md).
```python
# if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization

View File

@ -14,7 +14,7 @@ up new instances when there aren't enough to execute pending tasks.
Run the ClearML AWS autoscaler in one of these ways:
* Run the [aws_autoscaler.py](https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py)
script locally
* Launch through your [`services` queue](../../clearml_agent.md#services-mode)
* Launch through your [`services` queue](../../clearml_agent/clearml_agent_services_mode.md)
:::note Default AMI
By default, the autoscaler service uses the `NVIDIA Deep Learning AMI v20.11.0-46a68101-e56b-41cd-8e32-631ac6e5d02b` AMI.
@ -140,7 +140,7 @@ Execution log https://app.clear.ml/projects/142a598b5d234bebb37a57d692f5689f/exp
```
### Remote Execution
Using the `--remote` command line option will enqueue the autoscaler to your [`services` queue](../../clearml_agent.md#services-mode)
Using the `--remote` command line option will enqueue the autoscaler to your [`services` queue](../../clearml_agent/clearml_agent_services_mode.md)
once the configuration wizard is complete:
```bash
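# a sketch of the assumed invocation; once the wizard completes, the task is enqueued to the services queue
python aws_autoscaler.py --remote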
@ -162,7 +162,7 @@ page under **HYPERPARAMETERS > General**.
The task can be reused to launch another autoscaler instance: clone the task, then edit its parameters for the instance
types and budget configuration, and enqueue the task for execution (you'll typically want to use a ClearML Agent running
in [services mode](../../clearml_agent.md#services-mode) for such service tasks).
in [services mode](../../clearml_agent/clearml_agent_services_mode.md) for such service tasks).
### Console

View File

@ -55,7 +55,7 @@ an `APIClient` object that establishes a session with the ClearML Server, and ac
The experiment's hyperparameters are explicitly logged to ClearML using the [`Task.connect`](../../references/sdk/task.md#connect)
method. View them in the WebApp, in the experiment's **CONFIGURATION** page under **HYPERPARAMETERS > General**.
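
A minimal sketch of that explicit logging (the project, task, and parameter names here are hypothetical):

```python
from clearml import Task

# a minimal sketch; project, task, and parameter names are hypothetical
task = Task.init(project_name="DevOps", task_name="Cleanup Service")
params = {"run_interval_hours": 24, "delete_threshold_days": 30}
params = task.connect(params)  # shows up under HYPERPARAMETERS > General in the UI
```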
The task can be reused. Clone the task, edit its parameters, and enqueue the task to run in ClearML Agent [services mode](../../clearml_agent.md#services-mode).
The task can be reused. Clone the task, edit its parameters, and enqueue the task to run in ClearML Agent [services mode](../../clearml_agent/clearml_agent_services_mode.md).
![Cleanup service configuration](../../img/example_cleanup_configuration.png)

View File

@ -44,7 +44,7 @@ to your needs and enqueue for execution directly from the ClearML UI.
Run the monitoring service in one of these ways:
* Run locally
* Run in ClearML Agent [services mode](../../clearml_agent.md#services-mode)
* Run in ClearML Agent [services mode](../../clearml_agent/clearml_agent_services_mode.md)
To run the monitoring service:
@ -85,7 +85,7 @@ page under **HYPERPARAMETERS > Args**.
![Monitoring configuration](../../img/examples_slack_config.png)
The task can be reused to launch another monitor instance: clone the task, edit its parameters, and enqueue the task for
execution (you'll typically want to use a ClearML Agent running in [services mode](../../clearml_agent.md#services-mode)
execution (you'll typically want to use a ClearML Agent running in [services mode](../../clearml_agent/clearml_agent_services_mode.md)
for such service tasks).
## Console

View File

@ -10,7 +10,7 @@ example script.
* Clone the [clearml](https://github.com/allegroai/clearml) repository.
* Install the [requirements](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/requirements.txt)
for the TensorFlow examples.
* Have **ClearML Agent** [installed and configured](../../clearml_agent.md#installation).
* Have **ClearML Agent** [installed and configured](../../clearml_agent/clearml_agent_setup.md#installation).
## Step 1: Run the Experiment

View File

@ -78,7 +78,7 @@ See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -104,7 +104,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
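
A rough SDK sketch of this clone, edit, and enqueue flow (the task ID, parameter name, and queue name are placeholders):

```python
from clearml import Task

# a rough sketch; the task ID, parameter name, and queue name are placeholders
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="clone with overridden value")
cloned.set_parameter("Args/learning_rate", 0.001)  # overrides the hard-coded value at runtime
Task.enqueue(cloned, queue_name="default")         # an agent serving "default" picks it up
```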
### Executing a Task Remotely

View File

@ -76,7 +76,7 @@ See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -102,7 +102,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -76,7 +76,7 @@ See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -102,7 +102,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -88,7 +88,7 @@ and debug samples, plots, and scalars logged to TensorBoard
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -114,7 +114,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -77,7 +77,7 @@ See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -103,7 +103,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -73,7 +73,7 @@ See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -99,7 +99,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -97,7 +97,7 @@ additional tools, like argparse, TensorBoard, and matplotlib:
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -123,7 +123,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -103,7 +103,7 @@ See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -129,7 +129,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -79,7 +79,7 @@ additional tools, like Matplotlib:
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -105,7 +105,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -90,7 +90,7 @@ ClearML's automatic logging of parameters defined using `absl.flags`
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -116,7 +116,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -104,7 +104,7 @@ additional tools, like Matplotlib and scikit-learn:
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent) listens to designated queues and when a task is enqueued,
uncommitted changes etc.). The [ClearML Agent](../clearml_agent.md) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
@ -130,7 +130,7 @@ with the new configuration on a remote machine:
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent).
The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md).
### Executing a Task Remotely

View File

@ -16,7 +16,7 @@ meets resource needs:
* [ClearML Session CLI](apps/clearml_session.md) - Launch an interactive JupyterLab, VS Code, and SSH session on a remote machine:
* Automatically store and sync your [interactive session workspace](apps/clearml_session.md#storing-and-synchronizing-workspace)
* Replicate a previously executed experiment's execution environment and [interactively execute and debug](apps/clearml_session.md#starting-a-debugging-session) it on a remote session
* Develop directly inside your Kubernetes pods ([see ClearML Agent](clearml_agent.md#kubernetes))
* Develop directly inside your Kubernetes pods ([see ClearML Agent](clearml_agent/clearml_agent_deployment.md#kubernetes))
* And more!
* GUI Applications (available under ClearML Enterprise Plan) - These apps provide local links to access JupyterLab or
VS Code on a remote machine over a secure and encrypted SSH connection, letting you use the IDE as if you're running

View File

@ -38,7 +38,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
* **Workers Prefix** (optional) - A prefix added to workers' names, associating them with this autoscaler
* **Polling Interval** (optional) - Time period in minutes at which the designated queue is polled for new tasks
* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
[Docker mode](../../clearml_agent.md#docker-mode)
[Docker mode](../../clearml_agent/clearml_agent_execution_env.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected: Default Docker image in which the
ClearML Agent will run. Provide an image stored in a Docker artifactory so instances can automatically fetch it
* **Compute Resources**

View File

@ -39,7 +39,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
* Git User
* Git Password / Personal Access Token
* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
[Docker mode](../../clearml_agent.md#docker-mode)
[Docker mode](../../clearml_agent/clearml_agent_execution_env.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected. Default Docker image in which the ClearML Agent will run. Provide an image stored in a
Docker artifactory so VM instances can automatically fetch it
* **Compute Resources**

View File

@ -88,7 +88,7 @@ using to set up an environment (`pip` or `conda`) are available. Select which re
### Container
The Container section lists the following information:
* Image - a pre-configured Docker that ClearML Agent will use to remotely execute this experiment (see [Building Docker containers](../clearml_agent.md#exporting-a-task-into-a-standalone-docker-container))
* Image - a pre-configured Docker that ClearML Agent will use to remotely execute this experiment (see [Building Docker containers](../clearml_agent/clearml_agent_docker.md))
* Arguments - add Docker arguments
* Setup shell script - a bash script to be executed inside the Docker before setting up the experiment's environment

View File

@ -70,7 +70,7 @@ Select source code by changing any of the following:
#### Base Docker Image
Select a pre-configured Docker that **ClearML Agent** will use to remotely execute this experiment (see [Building Docker containers](../clearml_agent.md#exporting-a-task-into-a-standalone-docker-container)).
Select a pre-configured Docker that **ClearML Agent** will use to remotely execute this experiment (see [Building Docker containers](../clearml_agent/clearml_agent_docker.md)).
**To add, change, or delete a base Docker image:**

View File

@ -36,7 +36,13 @@ module.exports = {
{'ClearML Fundamentals': ['fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters', 'fundamentals/artifacts', 'fundamentals/logger', 'fundamentals/agents_and_queues',
'fundamentals/hpo']},
{'ClearML SDK': ['clearml_sdk/clearml_sdk', 'clearml_sdk/task_sdk', 'clearml_sdk/model_sdk', 'clearml_sdk/apiclient_sdk']},
'clearml_agent',
{'ClearML Agent':
['clearml_agent', 'clearml_agent/clearml_agent_setup', 'clearml_agent/clearml_agent_deployment',
'clearml_agent/clearml_agent_execution_env', 'clearml_agent/clearml_agent_env_caching',
'clearml_agent/clearml_agent_dynamic_gpus', 'clearml_agent/clearml_agent_fractional_gpus',
'clearml_agent/clearml_agent_services_mode', 'clearml_agent/clearml_agent_docker',
'clearml_agent/clearml_agent_google_colab', 'clearml_agent/clearml_agent_scheduling'
]},
{'Cloud Autoscaling': [
'cloud_autoscaling/autoscaling_overview',
{'Autoscaler Apps': [