Initial commit

This commit is contained in:
allegroai
2021-05-14 02:48:51 +03:00
parent dc5a4e8a0d
commit 77c9a91a95
645 changed files with 37481 additions and 14 deletions


@@ -0,0 +1,39 @@
---
title: Best Practices
---
In short - **automate everything** :)
From training models to data processing to deploying to production.
## Development - Preparing for Automation
Track everything: there is nothing that is not worth having visibility into.
If you are afraid of clutter, use the archive option and set up your own cleanup service (see [how](../../guides/services/cleanup_service)).
- Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iterative improvement of your code and workflow.
- Create per-project [leaderboards](../../webapp/webapp_exp_track_visual.md) based on custom columns
(hyperparameters and performance metrics such as accuracy), and bookmark them (the full URL will always reproduce the same view and table).
- Share experiments with your colleagues and team-leaders.
Invite more people to see how your project is progressing, and suggest they add metric reporting for their own.
These metrics can later be part of your own in-house monitoring solution. Don't let good data go to waste :)
## Clone Tasks
To define a Task in ClearML, we have two options:
- Run the actual code with a `Task.init` call. This will create and auto-populate the Task in ClearML (including Git repo, Python packages, command line, etc.).
- Register a local or remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md).
Once we have a Task in ClearML, we can clone it and edit its definition in the UI, then launch it on one of our nodes with [ClearML Agent](../../clearml_agent.md).
## Advanced Automation
- Create daily or weekly cron jobs for retraining the best-performing models.
- Set up data monitoring and scheduling, and launch inference jobs to test performance on any newly arriving dataset.
- Once there are two or more experiments that run one after another, group them together into a [pipeline](../../fundamentals/pipelines.md).
## Manage your data
Use [ClearML Data](../../clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
Make datasets machine agnostic (i.e. store the original dataset in a shared storage location, e.g. a shared folder, S3, GS, or Azure).
ClearML Data supports efficient Dataset storage and caching, with differential and compressed storage.
## Scale Your Work
Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent on machines (remote or local) and manage
the training workload with it. <br/>
Improve team collaboration with transparent resource monitoring: always know what is running where.


@@ -0,0 +1,146 @@
---
title: First Steps
---
:::note
This tutorial assumes that you've already [signed up](https://app.community.clear.ml) to ClearML
:::
MLOps is all about automation! We'll discuss the need for automation and the tools ClearML offers for automation, orchestration, and tracking!<br/>
Effective MLOps relies on being able to scale work beyond one's own computer. Moving off your own machine can be inefficient:
even assuming that all the drivers and applications are installed, you still need to manage multiple Python environments
for different packages and package versions, or worse, manage different Docker images for different package versions.<br/>
Not to mention that, when working on remote machines, executing experiments, tracking what's running where, and making sure the machines are fully utilized at all times
becomes a daunting task.<br/>
This can create overhead that derails you from the core work!
ClearML Agent was designed to deal with these issues and more! It is a module responsible for executing experiments
on remote machines, on premises or in the cloud!<br/>
It sets up the environment for the specific Task (inside a Docker container or on bare metal), installs the required Python packages, and executes and monitors the process itself.
## Spin up an Agent
First, let's install the agent!
```bash
pip install clearml-agent
```
Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:
```bash
clearml-init
```
:::note
If you've already created credentials, you can copy-paste the default agent section from [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L15). This is optional: if the section is not provided, the default values will be used.
:::
Start the agent's daemon. The agent will start pulling Tasks from the assigned queue (`default` in our case) and execute them one after the other.
```bash
clearml-agent daemon --queue default
```
## Clone an Experiment
Creating a new "job" to be executed is essentially cloning a Task in the system, then enqueuing the clone in one of the execution queues for an agent to execute.
When cloning a Task, we create another copy of the Task in *draft* mode, allowing us to edit the Task's environment definitions. <br/>
We can edit the git and code references, control the Python packages to be installed, specify the Docker container image to be used, or change the hyperparameters and configuration files.
Once we are done, enqueuing the Task in one of the execution queues schedules it for execution by an agent listening to that queue.
Multiple agents can listen to the same queue (or even multiple queues), but only a single agent will pick up each Task for execution.
You can clone an experiment from our [examples](https://app.community.clear.ml/projects/764d8edf41474d77ad671db74583528d/experiments) project and enqueue it to a queue!
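To make the queue semantics concrete, here is a minimal sketch that uses only the Python standard library. It is a simplified model, not ClearML's implementation: the `run_agents` function, the agent names, and the task IDs are all made up for illustration. It shows several workers listening to one queue while each Task is handed to exactly one of them.

```python
import queue
import threading


def run_agents(task_ids, num_agents=3):
    """Simulate several agents pulling Tasks from one shared queue."""
    task_queue = queue.Queue()
    for task_id in task_ids:
        task_queue.put(task_id)

    executed = []            # (agent_name, task_id) pairs, in completion order
    lock = threading.Lock()  # protects the shared `executed` list

    def agent(name):
        while True:
            try:
                # get_nowait() removes the item atomically, so two agents
                # can never receive the same Task
                task_id = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this agent stops
            with lock:
                executed.append((name, task_id))

    threads = [
        threading.Thread(target=agent, args=(f"agent-{i}",))
        for i in range(num_agents)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed


# hypothetical task IDs, used only for this simulation
result = run_agents(["task-a", "task-b", "task-c", "task-d"])
print(result)
```

Which agent ends up with which Task varies between runs, but every Task appears exactly once in the result.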
### Accessing Previously Executed Experiments
All executed Tasks in the system can be accessed based on the unique Task ID, or by searching for the Task based on its properties.
For example:
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
```
## Log Hyperparameters
Hyperparameters are an integral part of machine learning code, as they let you control the code without directly modifying it.<br/>
Hyperparameters can be added from anywhere in your code, and ClearML supports [multiple](../../fundamentals/hyperparameters.md) ways to obtain them!
ClearML also allows users to change and track hyperparameter values without changing the code itself.
When a cloned experiment is executed by an Agent, it will override the default values with new ones.
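As a minimal sketch of one way to log hyperparameters, the function below connects a plain dictionary to the Task. The parameter names, their values, and the project and task names are assumptions chosen for illustration, and the function needs a configured ClearML server to actually run.

```python
# Illustrative defaults; the names and values are assumptions
DEFAULT_PARAMS = {"learning_rate": 0.01, "batch_size": 32}


def train(project_name="examples", task_name="hyperparameter sketch"):
    # Imported lazily so the module can be loaded without ClearML configured
    from clearml import Task

    task = Task.init(project_name=project_name, task_name=task_name)
    # connect() registers the dict as the Task's hyperparameters. When an agent
    # executes a clone, the returned dict holds the values edited for that clone.
    params = task.connect(dict(DEFAULT_PARAMS))
    print("training with", params)
    return params
```

Reading values from the returned `params` object, rather than from the original defaults, is what lets an edited clone run with new values without any code change.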
It's also possible to programmatically change the parameters of cloned experiments.
For example:
```python
from clearml import Task
cloned_task = Task.clone(source_task='aabbcc')  # ID (or Task object) of the Task to clone
cloned_task.set_parameter(name='internal/magic', value=42)
Task.enqueue(cloned_task, queue_name='default')
```
## Logging Artifacts
Artifacts are a great way to pass and reuse data between Tasks in the system.
From anywhere in the code you can upload [multiple](../../fundamentals/artifacts.md#logging-artifacts) types of data, objects, and files.
Artifacts are the basis of ClearML's [Data Management](../../clearml_data.md) solution and a way to communicate complex objects between different
stages of a [pipeline](../../fundamentals/pipelines.md).
```python
import numpy as np
from clearml import Task
Task.current_task().upload_artifact(name='a_file', artifact_object='local_file.bin')
Task.current_task().upload_artifact(name='numpy', artifact_object=np.ones((4, 4)))  # np.ones takes the shape as a tuple
```
### Using Artifacts
Artifacts can be retrieved by [accessing](../../fundamentals/artifacts.md#uing-artifacts) the Task that created them.
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
# artifact as a file
local_file = executed_task.artifacts['a_file'].get_local_copy()
# artifact as object
a_numpy = executed_task.artifacts['numpy'].get()
```
### Models
Models are a special type of artifact that is logged automatically.
Logging models into the model repository is the easiest way to integrate the development process directly with production.<br/>
Any model stored by the supported frameworks (Keras, TensorFlow, PyTorch, Joblib) will be automatically logged into ClearML.
Models can be automatically stored on a preferred storage medium (S3 bucket, Google Storage, etc.).
## Log Metrics
Log as many metrics as you can from your processes! This improves visibility into their progress.
Use the `Logger` class to report scalars and plots.
```python
from clearml import Logger
counter = 0  # placeholder iteration index; use your training loop's step counter
Logger.current_logger().report_scalar(title='metric', series='variant', value=13.37, iteration=counter)
```
You can later analyze the reported scalars:
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
# get a summary of the min/max/last value of all reported scalars
min_max_values = executed_task.get_last_scalar_metrics()
# get detailed graphs of all scalars
full_scalars = executed_task.get_reported_scalars()
```
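The summary returned by `get_last_scalar_metrics()` is a nested dictionary. The sketch below assumes the layout `{title: {series: {'last': ..., 'min': ..., 'max': ...}}}` and flattens it into rows that are easier to print or sort. The `flatten_scalar_summary` helper and the sample values are made up for illustration and do not come from a real experiment.

```python
def flatten_scalar_summary(summary):
    """Turn {title: {series: {'last': .., 'min': .., 'max': ..}}} into flat rows."""
    rows = []
    for title, series_map in sorted(summary.items()):
        for series, stats in sorted(series_map.items()):
            rows.append(
                (title, series, stats.get("min"), stats.get("max"), stats.get("last"))
            )
    return rows


# Hand-written sample standing in for executed_task.get_last_scalar_metrics()
sample_summary = {
    "loss": {
        "train": {"last": 0.21, "min": 0.21, "max": 1.37},
        "validation": {"last": 0.35, "min": 0.30, "max": 1.50},
    },
    "accuracy": {"validation": {"last": 0.91, "min": 0.40, "max": 0.92}},
}

rows = flatten_scalar_summary(sample_summary)
for title, series, lowest, highest, last in rows:
    print(f"{title}/{series}: min={lowest} max={highest} last={last}")
```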
## Track Experiments
You can also search and query Tasks in the system.
Use the `Task.get_tasks` call to retrieve Tasks objects and filter based on the specific values of the Task - status, parameters, metrics and more!
```python
from clearml import Task
tasks = Task.get_tasks(project_name='examples', task_name='partial_name_match', task_filter={'status': ['in_progress']})
```
## Manage Your Data
Data is probably one of the biggest factors that determines the success of a project.
Associating the data a model used with the model's configuration, code, and results (such as accuracy) is key to deducing meaningful insights into how
models behave. <br/>
[ClearML Data](../../clearml_data.md) allows you to version your data so it's never lost, fetch it from any machine with minimal code changes,
and associate data with experiment results.
Logging data can be done via command line, or via code. If any preprocessing code is involved, ClearML logs it as well!<br/>
Once data is logged, it can be used by other experiments.


@@ -0,0 +1,103 @@
---
title: Next Steps
---
Once Tasks are defined and in the ClearML system, they can be chained together to create Pipelines.
Pipelines provide users with a greater level of abstraction and automation, with Tasks running one after the other.<br/>
Tasks can interface with other Tasks in the pipeline and leverage other Tasks' work products.<br/>
We'll go through a scenario where users create a Dataset, process the data then consume it with another task, all running as a pipeline.
## Building Tasks
### Dataset Creation
Let's assume we have some code that extracts data from a production Database into a local folder.
Our goal is to create an immutable copy of the data to be used by further steps:
```bash
clearml-data create --project data --name dataset_v1
clearml-data sync --folder ./from_production
```
We could also add a Tag `latest` to the Dataset, marking it as the latest version.
### Preprocessing Data
The second step is to preprocess the data. First we need to access it, then we want to modify it,
and lastly we want to create a new version of the data.
```python
from clearml import Task, Dataset
# create a task for the data processing part
task = Task.init(project_name='data', task_name='ingest', task_type=Task.TaskTypes.data_processing)
# get the v1 dataset
dataset = Dataset.get(dataset_project='data', dataset_name='dataset_v1')
# get a local mutable copy of the dataset
dataset_folder = dataset.get_mutable_local_copy(target_folder='work_dataset', overwrite=True)
# change some files in the `./work_dataset` folder
...
# create a new version of the dataset containing the modified files
new_dataset = Dataset.create(
    dataset_project='data', dataset_name='dataset_v2',
    parent_datasets=[dataset],
    use_current_task=True,  # keeps the creation code and the actual dataset artifacts on the same Task
)
new_dataset.sync_folder(local_path=dataset_folder)
new_dataset.upload()
new_dataset.finalize()
# now let's remove the previous dataset tag
dataset.tags = []
new_dataset.tags = ['latest']
```
We passed the `parent_datasets` argument when we created v2 of the Dataset, so v2 inherits all of the parent version's content.
This not only helps us trace back dataset changes with full genealogy, but also makes our storage more efficient,
as it only stores the files that were changed or added relative to the parent versions.
When we later need access to the Dataset, it will automatically merge the files from all parent versions
in a fully automatic and transparent process, as if they were always part of the requested Dataset.
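The idea of storing only the differences and merging parent versions on access can be sketched in plain Python. This is a simplified model under stated assumptions, not ClearML Data's actual storage format: the `resolve_files` helper, the file names, and the contents are hypothetical, and file removal is not modeled.

```python
def resolve_files(version, versions):
    """Merge a version's files with those inherited from all of its parents."""
    merged = {}
    for parent in versions[version]["parents"]:
        merged.update(resolve_files(parent, versions))
    # files stored in this version override anything inherited from parents
    merged.update(versions[version]["files"])
    return merged


# Each version stores only the files that were added or changed in it
versions = {
    "dataset_v1": {"parents": [], "files": {"a.csv": "a1", "b.csv": "b1"}},
    "dataset_v2": {"parents": ["dataset_v1"], "files": {"b.csv": "b2", "c.csv": "c1"}},
}

full_view = resolve_files("dataset_v2", versions)
print(full_view)
```

Here `dataset_v2` physically holds two files, yet resolving it yields three: the unchanged `a.csv` comes from the parent, while `b.csv` is taken from the newer version.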
### Training
We can now train our model with the **latest** Dataset we have in the system.
We will do that by getting the instance of the Dataset based on the `latest` tag
(if by any chance we have two Datasets with the same tag we will get the newest).
Once we have the dataset we can request a local copy of the data. All local copy requests are cached,
which means that if we are accessing the same dataset multiple times we will not have any unnecessary downloads.
```python
from clearml import Task, Dataset
# create a task for the model training
task = Task.init(project_name='data', task_name='train', task_type=Task.TaskTypes.training)
# get the latest dataset with the tag `latest` (tags are passed as a list)
dataset = Dataset.get(dataset_tags=['latest'])
# get a cached copy of the Dataset files
dataset_folder = dataset.get_local_copy()
# train our model here
```
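The local copy is an ordinary folder on disk, so the training code can read it with standard tools. As a minimal sketch, the hypothetical helper below collects the data files under that folder; the `.csv` suffix is an assumption about what the dataset contains.

```python
from pathlib import Path


def list_data_files(dataset_folder, suffix=".csv"):
    """Return the relative paths of all files with the given suffix, sorted."""
    folder = Path(dataset_folder)
    return sorted(
        path.relative_to(folder).as_posix()
        for path in folder.rglob(f"*{suffix}")
        if path.is_file()
    )
```

For example, `list_data_files(dataset_folder)` could feed the file list into whatever data loader the training code uses.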
## Building the Pipeline
Now that we have the data creation step and the training step, let's create a pipeline that, when executed,
will run the first step and then the second.
It is important to remember that pipelines are Tasks by themselves and can also be automated by other pipelines (i.e. pipelines of pipelines).
```python
from clearml.automation.controller import PipelineController
pipe = PipelineController(
    always_create_task=True,
    pipeline_project='data', pipeline_name='pipeline demo',
)
pipe.add_step(
    name='step 1 data',
    base_task_id='cbc84a74288e459c874b54998d650214',  # Put the data step's task ID here
)
pipe.add_step(
    name='step 2 train',
    parents=['step 1 data', ],
    base_task_id='cbc84a74288e459c874b54998d650214',  # Put the training step's task ID here
)
```
We could also pass the parameters from one step to the other (for example `Task.id`).
See more in the full pipeline documentation [here](../../fundamentals/pipelines.md).
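As a hedged sketch of passing a value between steps, the helper below adds the training step with a `parameter_override` entry that references the first step's Task ID. The `add_training_step` function and the hyperparameter name `General/dataset_task_id` are hypothetical, and the `${<step name>.id}` reference syntax is an assumption that should be checked against the pipeline documentation for your ClearML version.

```python
def add_training_step(pipe, base_task_id, data_step_name="step 1 data"):
    """Add a training step that receives the Task ID of the data step."""
    pipe.add_step(
        name="step 2 train",
        parents=[data_step_name],
        base_task_id=base_task_id,
        # The '${...}' reference is expected to be resolved by the controller
        # at run time; 'General/dataset_task_id' is a made-up parameter name
        parameter_override={
            "General/dataset_task_id": "${%s.id}" % data_step_name,
        },
    )
```

The training code would then read that hyperparameter and use it to locate the output of the previous step.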