mirror of https://github.com/clearml/clearml-docs (synced 2025-04-02 00:33:56 +00:00)
Small edits (#121)
parent 8f70c7cdc8
commit 8ddd32b4d3
@@ -9,23 +9,24 @@ The ClearML Agent is the base for **Automation** in ClearML and can be leveraged
 (e.g. a [monitor and alert service](https://github.com/allegroai/clearml/tree/master/examples/services/monitoring)) and more.

 ## What Does a ClearML Agent Do?
-An agent (also referred to as a Worker) allows users to execute code on any machine it's installed on, thus facilitating the
-scaling of data science work beyond one's own machine.
+An agent (also referred to as a worker) allows users to execute code on any machine it's installed on, thus facilitating the
+scaling of data science work beyond one's own machine.

 The agent takes care of deploying the code to the target machine as well as setting up the entire execution environment:
 from installing required packages to setting environment variables,
-all leading to executing the code (supporting both virtual environment or flexible docker container configurations)
+all leading to executing the code (supporting both virtual environment or flexible docker container configurations).

-The Agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (This is also the foundation on which
-ClearML [Hyper Parameter Optimization](hpo.md) is implemented).
-An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you can allocate several GPUs to an agent and use the rest for a different workload
-(even through another agent).
+The agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (this is also the foundation on which
+ClearML [Hyperparameter Optimization](hpo.md) is implemented).

+An agent can be associated with specific GPUs, enabling workload distribution. For example, on a machine with 8 GPUs you
+can allocate several GPUs to an agent and use the rest for a different workload, even through another agent (see [Dynamic GPU Allocation](../clearml_agent.md#dynamic-gpu-allocation)).



 ## What is a Queue?

-A ClearML queue is an ordered list of Tasks scheduled for execution.
-A queue can be serviced by one or multiple ClearML agents.
+A ClearML queue is an ordered list of Tasks scheduled for execution. A queue can be serviced by one or multiple agents.
 Agents servicing a queue pull the queued tasks in order and execute them.

 A ClearML Agent can service multiple queues in either of the following modes:
@@ -50,18 +51,19 @@ The diagram above demonstrates a typical flow where an agent executes a task:
 1. Set up the python environment and required packages.
 1. The task's script/code is executed.

-While the agent is running, it continuously reports system metrics to the ClearML Server (These can be monitored in the **Workers and Queues** page).
+While the agent is running, it continuously reports system metrics to the ClearML Server (these can be monitored in the
+[**Workers and Queues**](../webapp/webapp_workers_queues.md) page).

 ## Resource Management
-Installing an Agent on machines allows it to monitor all the machine's status (GPU \ CPU \ Memory \ Network \ Disk IO).
-When managing multiple machines, this allows users to have an overview of their entire HW resources. What is the status of each machine, what is the expected workload
-on each machine and so on.
+Installing an Agent on machines allows it to monitor all the machine's status (GPU / CPU / Memory / Network / Disk IO).
+When managing multiple machines, this allows users to have an overview of their HW resources: the status of each machine,
+the expected workload on each machine, etc.

 ![Workers and Queues page](../img/agents_queues_resource_management.png)


 You can organize your queues according to resource usage. Say you have a single-GPU machine. You can create a queue called
-"single-gpu-queue" and assign the machine's agent, as well as other single-GPU agents to that queue. This way you will know
+`single-gpu-queue` and assign the machine's agent, as well as other single-GPU agents to that queue. This way you will know
 that Tasks assigned to that queue will be executed by a single GPU machine.

 While the agents are up and running in your machines, you can access these resources from any machine by enqueueing a
@@ -76,7 +78,7 @@ Agents can be deployed bare-metal, with multiple instances allocating
 specific GPUs to the agents. They can also be deployed as dockers in a Kubernetes cluster.

 The Agent has three running modes:
-- Docker mode: The agent spins a docker image based on the Task’s definition then inside the docker the agent will clone
+- Docker Mode: The agent spins a docker image based on the Task’s definition then inside the docker the agent will clone
 the specified repository/code, apply the original execution’s uncommitted changes, install the required python packages
 and start executing the code while monitoring it.
 - Virtual Environment Mode: The agent creates a new virtual environment for the experiment, installs the required python
@@ -187,8 +187,7 @@ Specify the remaining parameters, including the time limit per Task (minutes), p
 ## Running as a Service

 The optimization can run as a service, if the `run_as_service` argument is set to `true`. For more information about
-running as a service, see [ClearML Agent services container](../../../clearml_agent.md#services-mode)
-on "Concepts and Architecture" page.
+running as a service, see [Services Mode](../../../clearml_agent.md#services-mode).

 ```python
 # if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization
@@ -124,7 +124,7 @@ that allow each operation.
 | Action | Description | States Valid for the Action | State Transition |
 |---|---|---|---|
 | View details | View experiment details in the experiments table, the [info panel](webapp_exp_track_visual#info-panel) (keep the experiments table in view), or the [full screen details view](webapp_exp_track_visual#full-screen-details-view). | Any state | None |
-| Manage a queue | If an experiment is *Pending* in a queue, view the utilization of that queue, manage that queue (remove experiments and change the order of experiments), and view information about the worker(s) listening to the queue. See the [Workers and queues](webapp_workers_queues) page. | *Enqueued* | None |
+| Manage a queue | If an experiment is *Pending* in a queue, view the utilization of that queue, manage that queue (remove experiments and change the order of experiments), and view information about the worker(s) listening to the queue. See the [Workers and Queues](webapp_workers_queues) page. | *Enqueued* | None |
 | View a worker | If an experiment is *Running*, view resource utilization, worker details, and queues to which a worker is listening. | *Running* | None |
 | Share | For **ClearML Hosted Service** users only, [share](webapp_exp_sharing) an experiment and its model with a **ClearML Hosted Service** user in another workspace. | Any state | None |
 | Archive | To more easily work with active experiments, move an experiment to the archive. See [Archiving](webapp_archiving). | Any state | None |
@@ -29,7 +29,7 @@ table / full screen**.

 ### Info Panel

-The info panel keeps the experiment table in view so that [experiment actions](webapp_exp_table#clearml-actions-from-the-experiments-table)
+The info panel keeps the experiment table in view so that [experiment actions](webapp_exp_table#experiment-actions)
 can be performed from the table (as well as the menu in the info panel).

 <details className="cml-expansion-panel screenshot">