diff --git a/docs/fundamentals/agents_and_queues.md b/docs/fundamentals/agents_and_queues.md
index 2b5396ec..f379b18c 100644
--- a/docs/fundamentals/agents_and_queues.md
+++ b/docs/fundamentals/agents_and_queues.md
@@ -9,23 +9,24 @@ The ClearML Agent is the base for **Automation** in ClearML and can be leveraged
 (e.g. a [monitor and alert service](https://github.com/allegroai/clearml/tree/master/examples/services/monitoring)) and more.
 
 ## What Does a ClearML Agent Do?
-An agent (also referred to as a Worker) allows users to execute code on any machine it's installed on, thus facilitating the
-scaling of data science work beyond one's own machine.
+An agent (also referred to as a worker) allows users to execute code on any machine it's installed on, thus facilitating the
+scaling of data science work beyond one's own machine.
+
 The agent takes care of deploying the code to the target machine as well as setting up the entire execution environment:
 from installing required packages to setting environment variables,
-all leading to executing the code (supporting both virtual environment or flexible docker container configurations)
+all leading to executing the code (supporting both virtual environment and flexible docker container configurations).
-The Agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (This is also the foundation on which
-ClearML [Hyper Parameter Optimization](hpo.md) is implemented).
-An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you can allocate several GPUs to an agent and use the rest for a different workload
-(even through another agent).
+The agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (this is also the foundation on which
+ClearML [Hyperparameter Optimization](hpo.md) is implemented).
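To make the on-the-fly parameter override concrete, here is a minimal plain-Python sketch. It is illustrative only (not the ClearML API): an agent-style runner merges backend-supplied override values (e.g. edited in the UI, or generated by an HPO controller) into the task's recorded defaults before launching the code, so no code change is needed per run:

```python
def resolve_params(defaults, overrides):
    """Return the effective parameters: defaults updated by any known overrides.

    Unknown override keys are ignored, mirroring the idea that only recorded
    hyperparameters can be overridden.
    """
    effective = dict(defaults)
    effective.update({k: v for k, v in overrides.items() if k in defaults})
    return effective

defaults = {"learning_rate": 0.01, "batch_size": 32}
overrides = {"learning_rate": 0.001}  # hypothetical override supplied per run
print(resolve_params(defaults, overrides))
# {'learning_rate': 0.001, 'batch_size': 32}
```

The same resolution step, repeated with different override sets, is the basic mechanism a hyperparameter search builds on.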
+
+An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you
+can allocate several GPUs to an agent and use the rest for a different workload, even through another agent (see [Dynamic GPU Allocation](../clearml_agent.md#dynamic-gpu-allocation)).
 
 ## What is a Queue?
-A ClearML queue is an ordered list of Tasks scheduled for execution.
-A queue can be serviced by one or multiple ClearML agents.
+A ClearML queue is an ordered list of Tasks scheduled for execution. A queue can be serviced by one or multiple agents.
 Agents servicing a queue pull the queued tasks in order and execute them.
 
 A ClearML Agent can service multiple queues in either of the following modes:
@@ -50,18 +51,19 @@ The diagram above demonstrates a typical flow where an agent executes a task:
 1. Set up the python environment and required packages.
 1. The task's script/code is executed.
 
-While the agent is running, it continuously reports system metrics to the ClearML Server (These can be monitored in the **Workers and Queues** page).
+While the agent is running, it continuously reports system metrics to the ClearML Server (these can be monitored in the
+[**Workers and Queues**](../webapp/webapp_workers_queues.md) page).
 
 ## Resource Management
-Installing an Agent on machines allows it to monitor all the machine's status (GPU \ CPU \ Memory \ Network \ Disk IO).
-When managing multiple machines, this allows users to have an overview of their entire HW resources. What is the status of each machine, what is the expected workload
-on each machine and so on.
+Installing an agent on a machine allows it to monitor the machine's status (GPU / CPU / Memory / Network / Disk IO).
+When managing multiple machines, this gives users an overview of their HW resources: the status of each machine,
+the expected workload on each machine, etc.
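The pull-in-order behavior of a queue can be sketched in a few lines of plain Python. This is illustrative only (not the ClearML API): a queue is an ordered list of tasks, and each agent servicing it pulls the oldest entry and executes it:

```python
from collections import deque

def pull_next(queue):
    """An agent pulls the oldest enqueued task, or None if the queue is empty."""
    return queue.popleft() if queue else None

# Hypothetical task identifiers, enqueued in order
queue = deque(["task-a", "task-b", "task-c"])

print(pull_next(queue))  # task-a  (first in, first out)
print(pull_next(queue))  # task-b
```

Because agents only ever pull from the head, several agents can safely service the same queue, and each task is executed exactly once.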
 ![image](../img/agents_queues_resource_management.png)
 
 You can organize your queues according to resource usage. Say you have a single-GPU machine. You can create a queue called
-"single-gpu-queue" and assign the machine's agent, as well as other single-GPU agents to that queue. This way you will know
+`single-gpu-queue` and assign the machine's agent, as well as other single-GPU agents, to that queue. This way you will know
 that Tasks assigned to that queue will be executed by a single GPU machine.
 
 While the agents are up and running in your machines, you can access these resources from any machine by enqueueing a
@@ -76,7 +78,7 @@ Agents can be deployed bare-metal, with multiple instances allocating
 specific GPUs to the agents. They can also be deployed as dockers in a Kubernetes cluster.
 
 The Agent has three running modes:
-- Docker mode: The agent spins a docker image based on the Task’s definition then inside the docker the agent will clone
+- Docker Mode: The agent spins up a docker container based on the Task’s definition; then, inside the container, the agent will clone
 the specified repository/code, apply the original execution’s uncommitted changes, install the required python packages
 and start executing the code while monitoring it.
 - Virtual Environment Mode: The agent creates a new virtual environment for the experiment, installs the required python
diff --git a/docs/guides/optimization/hyper-parameter-optimization/examples_hyperparam_opt.md b/docs/guides/optimization/hyper-parameter-optimization/examples_hyperparam_opt.md
index 430581dd..5c02f9ac 100644
--- a/docs/guides/optimization/hyper-parameter-optimization/examples_hyperparam_opt.md
+++ b/docs/guides/optimization/hyper-parameter-optimization/examples_hyperparam_opt.md
@@ -187,8 +187,7 @@ Specify the remaining parameters, including the time limit per Task (minutes), p
 
 ## Running as a Service
 The optimization can run as a service, if the `run_as_service` argument is set to `true`. For more information about
-running as a service, see [ClearML Agent services container](../../../clearml_agent.md#services-mode)
-on "Concepts and Architecture" page.
+running as a service, see [Services Mode](../../../clearml_agent.md#services-mode).
 
 ```python
 # if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization
diff --git a/docs/webapp/webapp_exp_table.md b/docs/webapp/webapp_exp_table.md
index 1ef9fe4c..5888d220 100644
--- a/docs/webapp/webapp_exp_table.md
+++ b/docs/webapp/webapp_exp_table.md
@@ -124,7 +124,7 @@ that allow each operation.
 | Action | Description | States Valid for the Action | State Transition |
 |---|---|---|---|
 | View details | View experiment details in the experiments table, the [info panel](webapp_exp_track_visual#info-panel) (keep the experiments table in view), or the [full screen details view](webapp_exp_track_visual#full-screen-details-view). | Any state | None |
-| Manage a queue | If an experiment is *Pending* in a queue, view the utilization of that queue, manage that queue (remove experiments and change the order of experiments), and view information about the worker(s) listening to the queue. See the [Workers and queues](webapp_workers_queues) page. | *Enqueued* | None |
+| Manage a queue | If an experiment is *Pending* in a queue, view the utilization of that queue, manage that queue (remove experiments and change the order of experiments), and view information about the worker(s) listening to the queue. See the [Workers and Queues](webapp_workers_queues) page. | *Enqueued* | None |
 | View a worker | If an experiment is *Running*, view resource utilization, worker details, and queues to which a worker is listening. | *Running* | None |
 | Share | For **ClearML Hosted Service** users only, [share](webapp_exp_sharing) an experiment and its model with a **ClearML Hosted Service** user in another workspace. | Any state | None |
 | Archive | To more easily work with active experiments, move an experiment to the archive. See [Archiving](webapp_archiving). | Any state | None |
diff --git a/docs/webapp/webapp_exp_track_visual.md b/docs/webapp/webapp_exp_track_visual.md
index a13392e3..3c4ace0a 100644
--- a/docs/webapp/webapp_exp_track_visual.md
+++ b/docs/webapp/webapp_exp_track_visual.md
@@ -29,7 +29,7 @@ table / full screen**.
 
 ### Info Panel
 
-The info panel keeps the experiment table in view so that [experiment actions](webapp_exp_table#clearml-actions-from-the-experiments-table)
+The info panel keeps the experiment table in view so that [experiment actions](webapp_exp_table#experiment-actions)
 can be performed from the table (as well as the menu in the info panel).