Refactor MLOps First Steps page (#125)

:::note
This tutorial assumes that you've already [signed up](https://app.community.clear.ml) to ClearML
:::

ClearML provides tools for **automation**, **orchestration**, and **tracking**, all key in performing effective MLOps.

Effective MLOps relies on the ability to scale work beyond one's own computer. Moving from your own machine can be time-consuming.
Even assuming that you have all the drivers and applications installed, you still need to manage multiple Python environments
for different packages / package versions, or worse, manage different Docker containers for different package versions.

Not to mention, when working on remote machines, executing experiments, tracking what's running where, and making sure machines
are fully utilized at all times become daunting tasks.

This can create overhead that derails you from your core work!

ClearML Agent was designed to deal with such issues and more! It is a tool responsible for executing experiments on
remote machines: on-premises or in the cloud! ClearML Agent provides the means to reproduce and track experiments on
your machine of choice through the ClearML WebApp with no need for additional code.

The agent will set up the environment for a specific Task's execution (inside a Docker container, or bare-metal), install the
required Python packages, and execute & monitor the process.

## Set up an Agent

1. Let's install the agent!

   ```bash
   pip install clearml-agent
   ```

1. Connect the agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:

   ```bash
   clearml-agent init
   ```

   :::note
   If you've already created credentials, you can copy-paste the default agent section from [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L15) (this is optional; if the section is not provided, the default values will be used).
   :::

1. Start the agent's daemon and assign it to a [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue).

   ```bash
   clearml-agent daemon --queue default
   ```

   A queue is an ordered list of Tasks that are scheduled for execution. The agent will pull Tasks from its assigned
   queue (`default` in this case), and execute them one after the other. Multiple agents can listen to the same queue
   (or even multiple queues), but only a single agent will pull a Task to be executed.

## Clone an Experiment

Experiments already in the system can be reproduced for validation, or used as a baseline for further experimentation.
Cloning a task duplicates the task's configuration, but not its outputs.

**To clone an experiment in the ClearML WebApp:**
1. Click on any project card to open its [experiments table](../../webapp/webapp_exp_table.md).
1. Right-click one of the experiments in the table.
1. Click **Clone** in the context menu, which will open a **CLONE EXPERIMENT** window.
1. Click **CLONE** in the window.

The newly cloned experiment will appear and its info panel will slide open. The cloned experiment is in draft mode, so
it can be modified. You can edit the Git / code references, control the Python packages to be installed, specify the
Docker container image to be used, or change the hyperparameters and configuration files. See [Modifying Experiments](../../webapp/webapp_exp_tuning.md#modifying-experiments) for more information about editing experiments in the UI.

You can clone an experiment from our [examples](https://app.community.clear.ml/projects/764d8edf41474d77ad671db74583528d/experiments) project and enqueue it to a queue!

## Enqueue an Experiment

Once you have set up an experiment, it is now time to execute it.

**To execute an experiment through the ClearML WebApp:**
1. Right-click your draft experiment (the context menu is also available through the <img src="/docs/latest/icons/ico-bars-menu.svg" className="icon size-md space-sm" />
   button on the top right of the experiment's info panel).
1. Click **ENQUEUE**, which will open the **ENQUEUE EXPERIMENT** window.
1. In the window, select `default` in the queue menu.
1. Click **ENQUEUE**.

This action pushes the experiment into the `default` queue. The experiment's status becomes *Pending* until an agent
assigned to the queue fetches it, at which time the experiment's status becomes *Running*. The agent executes the
experiment, and the experiment can be [tracked and its results visualized](../../webapp/webapp_exp_track_visual.md).
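
As a small preview of the programmatic interface described next, this status transition can also be checked from code.
A minimal sketch, assuming the task ID is known (the ID and the exact status strings are illustrative):

```python
from clearml import Task

# 'aabbcc' is a placeholder - use the ID shown in the experiment's info panel
task = Task.get_task(task_id='aabbcc')
# e.g. 'queued' while pending, 'in_progress' once an agent picks it up
print(task.get_status())
```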

## Programmatic Interface

The cloning, modifying, and enqueuing actions described above can also be performed programmatically.

### First Steps

#### Access Previously Executed Experiments
All Tasks in the system can be accessed through their unique Task ID, or based on their properties using the [`Task.get_task`](../../references/sdk/task.md#taskget_task)
method. For example:

```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
```

Once a specific Task object has been obtained, it can be cloned, modified, and more. See [Advanced Usage](#advanced-usage).

#### Clone an Experiment

To duplicate an experiment, use the [`Task.clone`](../../references/sdk/task.md#taskclone) method, and input either a
Task object or the Task's ID as the `source_task` argument.

```python
cloned_task = Task.clone(source_task=executed_task)
```

#### Enqueue an Experiment

To enqueue the task, use the [`Task.enqueue`](../../references/sdk/task.md#taskenqueue) method, and input the Task object
with the `task` argument, and the queue to push the task into with `queue_name`.

```python
Task.enqueue(task=cloned_task, queue_name='default')
```

### Advanced Usage

Before execution, there are a variety of programmatic methods which can be used to manipulate a task object.

#### Modify Hyperparameters

[Hyperparameters](../../fundamentals/hyperparameters.md) are an integral part of Machine Learning code as they let you
control the code without directly modifying it. Hyperparameters can be added from anywhere in your code, and ClearML
supports multiple ways to obtain them!

Users can programmatically change cloned experiments' parameters. For example:
```python
from clearml import Task

cloned_task = Task.clone(source_task='aabbcc')
cloned_task.set_parameter(name='internal/magic', value=42)
Task.enqueue(cloned_task, queue_name='default')
```
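
To inspect the parameters a task carries before changing them, a minimal sketch along these lines can help (the task ID
is a placeholder, and the printed keys depend on your own task):

```python
from clearml import Task

# 'aabbcc' is a placeholder - use one of your own task IDs
task = Task.get_task(task_id='aabbcc')

# get_parameters returns the task's hyperparameters as a flattened
# dictionary, e.g. {'Args/batch_size': '32', ...}
params = task.get_parameters()
print(params)
```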

#### Report Artifacts

Artifacts are files created by your task. Users can upload [multiple types of data](../../fundamentals/artifacts.md#logging-artifacts),
objects and files to a task anywhere from code.

```python
import numpy as np
from clearml import Task

# upload a local file as an artifact (the file name here is illustrative)
Task.current_task().upload_artifact(name='a_file', artifact_object='local_file.bin')
# upload an object as an artifact
Task.current_task().upload_artifact(name='numpy', artifact_object=np.ones((4, 4)))
```

Artifacts serve as a great way to pass and reuse data between tasks. Artifacts can be [retrieved](../../fundamentals/artifacts.md#using-artifacts)
by accessing the Task that created them. These artifacts can be modified and uploaded to other tasks.

```python
from clearml import Task

executed_task = Task.get_task(task_id='aabbcc')
# download a local copy of a file artifact
local_file = executed_task.artifacts['a_file'].get_local_copy()
# get an object artifact directly
a_numpy = executed_task.artifacts['numpy'].get()
```
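
For instance, here is a minimal sketch of that round trip, assuming it runs inside a ClearML-initialized task (the task
ID and artifact names are illustrative):

```python
from clearml import Task

# retrieve an artifact created by a previously executed task
executed_task = Task.get_task(task_id='aabbcc')
a_numpy = executed_task.artifacts['numpy'].get()

# modify it, then upload the result as an artifact of the current task
Task.current_task().upload_artifact(name='numpy_doubled', artifact_object=a_numpy * 2)
```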

By facilitating the communication of complex objects between tasks, artifacts serve as the foundation of ClearML's [Data Management](../../clearml_data/clearml_data.md)
and [pipeline](../../fundamentals/pipelines.md) solutions.

#### Log Models

Logging models into the model repository is the easiest way to integrate the development process directly with production.
Any model stored by a supported framework (Keras / TF / PyTorch / Joblib etc.) will be automatically logged into ClearML.
ClearML also offers methods to explicitly log models. Models can be automatically stored on a preferred storage medium
(S3 bucket, Google Storage, etc.).
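
As an illustrative sketch of the automatic logging, assuming scikit-learn and joblib are installed (the project, task,
and file names are made up), saving a model through a supported framework is enough for it to be registered as the
task's output model:

```python
import joblib
from clearml import Task
from sklearn.linear_model import LogisticRegression

task = Task.init(project_name='examples', task_name='model logging sketch')

model = LogisticRegression()
# saving through a supported framework (joblib here) is captured
# automatically and registered as an output model of the task
joblib.dump(model, 'model.pkl')
```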

#### Log Metrics

Log as many metrics as you want from your processes using the [Logger](../../fundamentals/logger.md) module. This
improves the visibility of your processes' progress. Use the `Logger` class to report scalars and plots. For example:

```python
from clearml import Logger

# the title, series, value, and iteration below are illustrative values
Logger.current_logger().report_scalar(
    title='train',
    series='loss',
    value=0.26,
    iteration=100
)
```

You can also retrieve reported scalars for programmatic analysis:

```python
from clearml import Task

executed_task = Task.get_task(task_id='aabbcc')
# get a summary of the min/max/last value of all reported scalars
min_max_values = executed_task.get_last_scalar_metrics()
# get detailed graphs of all scalars
full_scalars = executed_task.get_reported_scalars()
```

#### Query Experiments

You can also search and query Tasks in the system. Use the [`Task.get_tasks`](../../references/sdk/task.md#taskget_tasks)
method to retrieve Task objects and filter based on the specific values of the Task - status, parameters, metrics and more!

```python
from clearml import Task

# the project name, name pattern, and status filter below are illustrative
tasks = Task.get_tasks(
    project_name='examples',
    task_name='train',
    task_filter={'status': ['published']}
)
```

#### Manage Your Data

Data is probably one of the biggest factors that determines the success of a project. Associating a model's data with
the model's configuration, code, and results (such as accuracy) is key to deducing meaningful insights into model behavior.

[ClearML Data](../../clearml_data/clearml_data.md) allows you to version your data, so it's never lost, fetch it from every
machine with minimal code changes, and associate data to experiment results.

Logging data can be done via command line, or programmatically. If any preprocessing code is involved, ClearML logs it
as well! Once data is logged, it can be used by other experiments.
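
As a sketch of the programmatic route (the dataset name, project, and folder path are illustrative), the `Dataset`
class from the ClearML SDK can create, upload, and later fetch a versioned dataset:

```python
from clearml import Dataset

# create a new dataset version and add a local folder to it
dataset = Dataset.create(dataset_name='my_dataset', dataset_project='examples')
dataset.add_files(path='data/')

# upload the files and close the version so other experiments can use it
dataset.upload()
dataset.finalize()

# later, from any machine, fetch a local (cached) copy of the dataset
local_path = Dataset.get(dataset_name='my_dataset', dataset_project='examples').get_local_copy()
```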