Small edits (#121)

pollfly 2021-11-27 15:51:04 +02:00 committed by GitHub
parent 8f70c7cdc8
commit 8ddd32b4d3
4 changed files with 20 additions and 19 deletions


@@ -9,23 +9,24 @@ The ClearML Agent is the base for **Automation** in ClearML and can be leveraged
(e.g. a [monitor and alert service](https://github.com/allegroai/clearml/tree/master/examples/services/monitoring)) and more.
## What Does a ClearML Agent Do?
-An agent (also referred to as a Worker) allows users to execute code on any machine it's installed on, thus facilitating the
-scaling of data science work beyond one's own machine.
+An agent (also referred to as a worker) allows users to execute code on any machine it's installed on, thus facilitating the
+scaling of data science work beyond one's own machine.
The agent takes care of deploying the code to the target machine as well as setting up the entire execution environment:
from installing required packages to setting environment variables,
-all leading to executing the code (supporting both virtual environment or flexible docker container configurations)
+all leading to executing the code (supporting both virtual environment or flexible docker container configurations).
-The Agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (This is also the foundation on which
-ClearML [Hyper Parameter Optimization](hpo.md) is implemented).
-An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you can allocate several GPUs to an agent and use the rest for a different workload
-(even through another agent).
+The agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (this is also the foundation on which
+ClearML [Hyperparameter Optimization](hpo.md) is implemented).
+An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you
+can allocate several GPUs to an agent and use the rest for a different workload, even through another agent (see [Dynamic GPU Allocation](../clearml_agent.md#dynamic-gpu-allocation)).
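For a rough sense of that no-code flow, a finished experiment can be cloned, given new parameter values, and queued for an agent entirely from the SDK. A minimal sketch follows; the task ID, parameter name, and queue name are placeholders, and the `General/` prefix assumes the script connected its arguments in the usual way:

```python
from clearml import Task

# Placeholder ID of an experiment to use as a template
template = Task.get_task(task_id="<template_task_id>")

# Clone it and override a hyperparameter without touching the code
cloned = Task.clone(source_task=template, name="lr=0.01 variant")
cloned.set_parameter("General/learning_rate", 0.01)  # assumes args were connected under "General"

# Queue the clone; an agent servicing this queue will pick it up and run it
Task.enqueue(cloned, queue_name="default")
```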
## What is a Queue?
-A ClearML queue is an ordered list of Tasks scheduled for execution.
-A queue can be serviced by one or multiple ClearML agents.
+A ClearML queue is an ordered list of Tasks scheduled for execution. A queue can be serviced by one or multiple agents.
Agents servicing a queue pull the queued tasks in order and execute them.
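For example, a script can hand itself off to whichever agent services a given queue. A minimal sketch, assuming a configured ClearML setup and a placeholder queue name:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# Stop executing locally and enqueue this task instead; the agent servicing
# the queue pulls it, recreates the environment, and runs the script remotely
task.execute_remotely(queue_name="default", exit_process=True)

# ...the rest of the script runs only on the agent's machine...
```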
A ClearML Agent can service multiple queues in either of the following modes:
@@ -50,18 +51,19 @@ The diagram above demonstrates a typical flow where an agent executes a task:
1. Set up the python environment and required packages.
1. The task's script/code is executed.
-While the agent is running, it continuously reports system metrics to the ClearML Server (These can be monitored in the **Workers and Queues** page).
+While the agent is running, it continuously reports system metrics to the ClearML Server (these can be monitored in the
+[**Workers and Queues**](../webapp/webapp_workers_queues.md) page).
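The same worker information is also reachable programmatically. A minimal sketch using the API client, assuming a configured `clearml.conf` and that the endpoint and field names match recent clearml versions:

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# List the currently registered workers (agents); field names are assumptions
for worker in client.workers.get_all():
    print(worker.id)
```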
## Resource Management
-Installing an Agent on machines allows it to monitor all the machine's status (GPU \ CPU \ Memory \ Network \ Disk IO).
-When managing multiple machines, this allows users to have an overview of their entire HW resources. What is the status of each machine, what is the expected workload
-on each machine and so on.
+Installing an Agent on machines allows it to monitor all the machine's status (GPU / CPU / Memory / Network / Disk IO).
+When managing multiple machines, this allows users to have an overview of their HW resources: the status of each machine,
+the expected workload on each machine, etc.
![image](../img/agents_queues_resource_management.png)
You can organize your queues according to resource usage. Say you have a single-GPU machine. You can create a queue called
"single-gpu-queue" and assign the machine's agent, as well as other single-GPU agents to that queue. This way you will know
`single-gpu-queue` and assign the machine's agent, as well as other single-GPU agents to that queue. This way you will know
that Tasks assigned to that queue will be executed by a single GPU machine.
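For instance, directing an experiment at that single-GPU hardware is just a matter of enqueuing it there. A minimal sketch with a placeholder task ID:

```python
from clearml import Task

# Placeholder ID of an experiment that needs a single GPU
task = Task.get_task(task_id="<experiment_id>")

# Put it on the single-GPU queue; one of the agents assigned to that queue runs it
Task.enqueue(task, queue_name="single-gpu-queue")
```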
While the agents are up and running in your machines, you can access these resources from any machine by enqueueing a
@@ -76,7 +78,7 @@ Agents can be deployed bare-metal, with multiple instances allocating
specific GPUs to the agents. They can also be deployed as dockers in a Kubernetes cluster.
The Agent has three running modes:
-- Docker mode: The agent spins a docker image based on the Tasks definition then inside the docker the agent will clone
+- Docker Mode: The agent spins a docker image based on the Tasks definition then inside the docker the agent will clone
the specified repository/code, apply the original executions uncommitted changes, install the required python packages
and start executing the code while monitoring it.
- Virtual Environment Mode: The agent creates a new virtual environment for the experiment, installs the required python


@@ -187,8 +187,7 @@ Specify the remaining parameters, including the time limit per Task (minutes), p
## Running as a Service
The optimization can run as a service, if the `run_as_service` argument is set to `true`. For more information about
-running as a service, see [ClearML Agent services container](../../../clearml_agent.md#services-mode)
-on "Concepts and Architecture" page.
+running as a service, see [Services Mode](../../../clearml_agent.md#services-mode).
```python
# if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization


@@ -124,7 +124,7 @@ that allow each operation.
| Action | Description | States Valid for the Action | State Transition |
|---|---|---|---|
| View details | View experiment details in the experiments table, the [info panel](webapp_exp_track_visual#info-panel) (keep the experiments table in view), or the [full screen details view](webapp_exp_track_visual#full-screen-details-view). | Any state | None |
-| Manage a queue | If an experiment is *Pending* in a queue, view the utilization of that queue, manage that queue (remove experiments and change the order of experiments), and view information about the worker(s) listening to the queue. See the [Workers and queues](webapp_workers_queues) page. | *Enqueued* | None |
+| Manage a queue | If an experiment is *Pending* in a queue, view the utilization of that queue, manage that queue (remove experiments and change the order of experiments), and view information about the worker(s) listening to the queue. See the [Workers and Queues](webapp_workers_queues) page. | *Enqueued* | None |
| View a worker | If an experiment is *Running*, view resource utilization, worker details, and queues to which a worker is listening. | *Running* | None |
| Share | For **ClearML Hosted Service** users only, [share](webapp_exp_sharing) an experiment and its model with a **ClearML Hosted Service** user in another workspace. | Any state | None |
| Archive | To more easily work with active experiments, move an experiment to the archive. See [Archiving](webapp_archiving). | Any state | None |


@@ -29,7 +29,7 @@ table / full screen**.
### Info Panel
-The info panel keeps the experiment table in view so that [experiment actions](webapp_exp_table#clearml-actions-from-the-experiments-table)
+The info panel keeps the experiment table in view so that [experiment actions](webapp_exp_table#experiment-actions)
can be performed from the table (as well as the menu in the info panel).
<details className="cml-expansion-panel screenshot">