mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
Initial commit
12
docs/getting_started/architecture.md
Normal file
@@ -0,0 +1,12 @@
|
||||
---
|
||||
title: ClearML Modules
|
||||
---
|
||||
|
||||
- **ClearML Python Package** (clearml) for integrating **ClearML** into your existing code base.
- **ClearML Server** (clearml-server) stores experiment, model, and workflow data, and supports the Web UI experiment manager. It is also the control plane for MLOps.
- **ClearML Agent** (clearml-agent) is the MLOps orchestration agent, enabling experiment and workflow reproducibility and scalability.
- **ClearML Data** (clearml-data) provides data management and versioning on top of file systems / object storage.
- **ClearML Session** (clearml-session) launches remote instances of Jupyter Notebooks and VSCode.
|
||||
solutions combined with the clearml-server control plane.
|
||||
|
||||

|
||||
70
docs/getting_started/ds/best_practices.md
Normal file
@@ -0,0 +1,70 @@
|
||||
---
|
||||
title: Best Practices
|
||||
---
|
||||
|
||||
This section explains why we designed ClearML the way we did and how that design reflects on ML \ DL workflows.
While ClearML was designed to fit into any workflow, we feel that working as described below helps organize your workflow and prepares it to scale in the long term.
|
||||
|
||||
:::important
|
||||
What follows is only our opinion. ClearML was designed to fit into any workflow, whether it conforms to our way or not!
|
||||
:::
|
||||
|
||||
## Develop Locally
|
||||
|
||||
**Work on a machine that is easily manageable!**
|
||||
|
||||
During early stages of model development, while code is still being modified heavily, this is the usual setup we'd expect to see used by data scientists:
|
||||
|
||||
- A local development machine, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster iterations - this is used for writing the training pipeline code, ensuring it knows to parse the data
|
||||
and there are no glaring bugs.
|
||||
- A workstation with a GPU, usually with a limited amount of memory for small batch-sizes. This is used to train the model and ensure the model we chose makes sense and that the training
|
||||
procedure works. Can be used to provide initial models for testing.
|
||||
|
||||
The abovementioned setups might be folded into each other and that's great! If you have a GPU machine for each researcher that's awesome!
|
||||
The goal of this phase is to get the code, dataset and environment set up so we can start digging to find the best model!
|
||||
|
||||
- [ClearML SDK](../../clearml_sdk.md) should be integrated into your code (check out our [getting started](ds_first_steps.md) guide).
This helps visualize the results and track progress.
- [ClearML Agent](../../clearml_agent.md) helps move your work to other machines without the hassle of rebuilding the environment every time,
while also providing a simple queue interface that lets you drop in your experiments to be executed one by one
(great for ensuring that the GPUs are churning during the weekend).
- [ClearML Session](../../apps/clearml_session.md) helps with developing on remote machines, just like you'd develop on your local laptop!
|
||||
|
||||
## Train Remotely
|
||||
|
||||
In this phase, we scale our training efforts, and try to come up with the best code \ parameter \ data combination that
|
||||
yields the best performing model for our task!
|
||||
|
||||
- The real training (usually) should **not** be executed on your development machine.
|
||||
- Training sessions should be launched and monitored from a web UI.
|
||||
- You should continue coding while experiments are being executed without interrupting them.
|
||||
- Stop optimizing your code because your machine struggles, and run it on a beefier machine (cloud \ on-prem).
|
||||
|
||||
Visualization and comparison dashboards help keep you sane! In this stage we usually have a docker container with all the binaries
that we need.
|
||||
- [ClearML SDK](../../clearml_sdk.md) ensures that all the metrics, parameters and Models are automatically logged and can later be
accessed, [compared](../../webapp/webapp_exp_comparing.md) and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
applies code patches, manages parameters (including overriding them on the fly), executes the code and queues multiple tasks.
It can even [build](../../clearml_agent.md#buildingdockercontainers) the docker container for you!
- [ClearML Pipelines](../../fundamentals/pipelines.md) ensures that steps run in the same order,
programmatically chaining tasks together, while giving an overview of the execution pipeline's status.<br/>
|
||||
|
||||
**Your entire environment should magically be able to run on any machine, without you working hard.**
|
||||
|
||||
## Track EVERYTHING
|
||||
|
||||
We believe that you should track everything! From obscure parameters to weird metrics, it's impossible to know what will end up
|
||||
improving our results later on!
|
||||
|
||||
- Make sure experiments are reproducible! ClearML logs code, parameters, environment in a single, easily searchable place.
|
||||
- Development is not linear. Configuration \ Parameters should not be stored in your git:
they are temporary, and we constantly change them. But we still need to log them because, who knows, one day...
- Uncommitted changes to your code should be stored for later forensics in case that magic number actually saved the day. Not every line change should be committed.
- Mark potentially good experiments and make them the new baseline for comparison.
|
||||
|
||||
## Visibility Matters
|
||||
|
||||
While it's possible to track experiments with one tool and pipeline them with another, we believe that having
everything under the same roof benefits you greatly! It's
|
||||
52
docs/getting_started/ds/ds_first_steps.md
Normal file
@@ -0,0 +1,52 @@
|
||||
---
|
||||
title: First Steps
|
||||
---
|
||||
|
||||
|
||||
## Install ClearML
|
||||
|
||||
First, [sign up for free](https://app.community.clear.ml)
|
||||
|
||||
Install the clearml python package:
|
||||
```bash
pip install clearml
```
|
||||
|
||||
Connect your computer to the server by [creating credentials](https://app.community.clear.ml/profile), then run the command below and follow the setup instructions:
```bash
clearml-init
```
|
||||
|
||||
|
||||
## Auto-log experiment
|
||||
|
||||
In ClearML, experiments are organized as [Tasks](../../fundamentals/task).
|
||||
|
||||
ClearML will automatically log your experiment and code once you integrate the ClearML [SDK](../../clearml_sdk.md) with your code.
|
||||
At the beginning of your code, import the clearml package:
|
||||
|
||||
```python
from clearml import Task
```
|
||||
|
||||
:::note
|
||||
To ensure full automatic logging it is recommended to import the ClearML package at the top of your entry script.
|
||||
:::
|
||||
|
||||
Then initialize the Task object in your `main()` function, or the beginning of the script.
|
||||
|
||||
```python
task = Task.init(project_name='great project', task_name='best experiment')
```
|
||||
|
||||
Task names are not unique; it's possible to have multiple experiments with the same name.
|
||||
If the project does not already exist, a new one will be created automatically.
|
||||
|
||||
|
||||
**That’s it!** You are done integrating ClearML with your code :)
|
||||
|
||||
Now, [command-line arguments](../../fundamentals/hyperparameters.md#argument-parser), [console output](../../fundamentals/logger#types-of-logged-results) as well as Tensorboard and Matplotlib will automatically be logged in the UI under the created Task.
|
||||
<br/>
|
||||
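To make the above concrete, here is a minimal sketch of an entry script. The `--epochs` and `--lr` arguments and the training loop are placeholders invented for illustration; only the `from clearml import Task` import and the `Task.init` call come from ClearML. The import is wrapped in `try`/`except` purely so the sketch can be loaded on a machine without ClearML installed.

```python
import argparse

try:
    from clearml import Task  # imported at the top of the entry script
except ImportError:  # lets the sketch load on machines without ClearML
    Task = None


def parse_args(argv=None):
    """Parse the (illustrative) command-line arguments of the training script."""
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--lr", type=float, default=0.01)
    return parser.parse_args(argv)


def main(argv=None):
    if Task is None:
        raise RuntimeError("ClearML is not installed; run `pip install clearml` first")
    # From here on ClearML records the run; the arguments parsed below and the
    # printed lines are expected to appear under this Task in the UI.
    Task.init(project_name="great project", task_name="best experiment")
    args = parse_args(argv)
    for epoch in range(args.epochs):
        print(f"epoch={epoch} lr={args.lr}")  # stand-in for real training
```

Calling `main()` from the script's entry point runs the toy loop under a newly created Task.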
|
||||
Sit back, relax, and watch your models converge :) or continue to see what else can be done with ClearML [here](ds_second_steps.md).
|
||||
|
||||
170
docs/getting_started/ds/ds_second_steps.md
Normal file
@@ -0,0 +1,170 @@
|
||||
---
|
||||
title: Next Steps
|
||||
---
|
||||
|
||||
So, we've [already](ds_first_steps.md) installed ClearML's python package and ran our first experiment!
|
||||
|
||||
Now, we'll learn how to track Hyperparameters, Artifacts and Metrics!
|
||||
|
||||
## Accessing Experiments
|
||||
|
||||
Every previously executed experiment is stored as a Task.
|
||||
A Task has a project and a name; both can be changed after the experiment has been executed.
|
||||
A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and will always locate the same Task in the system.<br/>
|
||||
It's possible to retrieve a Task object programmatically by querying the system based on either the Task ID,
|
||||
or project & name combination. It's also possible to query tasks based on their properties, like Tags.
|
||||
|
||||
```python
from clearml import Task
prev_task = Task.get_task(task_id='123456deadbeef')
```
|
||||
|
||||
Once we have a Task object we can query the state of the Task, get its Model, scalars, parameters, etc.
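
As a rough illustration, the helper below gathers a few of these properties into one dictionary. `summarize_task` is a hypothetical name introduced here, and it assumes the object it receives exposes `get_status()`, `get_parameters()` and `get_last_scalar_metrics()`, which is our understanding of the ClearML Task interface. The `FakeTask` class is only an offline stand-in so the sketch runs without a server.

```python
def summarize_task(task):
    """Collect a few commonly inspected properties from a Task-like object."""
    return {
        "status": task.get_status(),
        "parameters": task.get_parameters(),
        "last_scalars": task.get_last_scalar_metrics(),
    }


class FakeTask:
    """Offline stand-in with made-up values; replace with Task.get_task(...) in a real run."""

    def get_status(self):
        return "completed"

    def get_parameters(self):
        return {"General/epochs": "3"}

    def get_last_scalar_metrics(self):
        return {"loss": {"train": {"last": 0.1, "min": 0.1, "max": 1.0}}}


demo_summary = summarize_task(FakeTask())
print(demo_summary["status"])
```

With ClearML configured, `summarize_task(Task.get_task(task_id=...))` would be the equivalent call against a real Task.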
|
||||
|
||||
## Log Hyperparameters
|
||||
|
||||
For full reproducibility, it's paramount to save Hyperparameters for each experiment. Since Hyperparameters can have substantial impact
on Model performance, saving and comparing these between experiments is sometimes the key to understanding model behavior.
|
||||
|
||||
ClearML supports logging `argparse` module arguments out of the box, so once ClearML is integrated into the code, it automatically logs all parameters provided to the argument parser.<br/>
It's also possible to log parameter dictionaries (very useful when parsing an external config file and storing it as a dict object),
whole configuration files, or even custom objects or [Hydra](https://hydra.cc/docs/intro/) configurations!
|
||||
|
||||
```python
params_dictionary = {'epochs': 3, 'lr': 0.4}
task.connect(params_dictionary)
```
|
||||
|
||||
Check [this](../../fundamentals/hyperparameters.md) out for all Hyperparameter logging options.
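
For the external config file case mentioned above, a sketch might look like the following. The JSON format, the file name and the helper names `load_params` / `connect_params` are assumptions made for illustration; `task.connect` is the same call shown in the snippet above.

```python
import json
import tempfile
from pathlib import Path


def load_params(config_path):
    """Read a JSON config file into a plain dict of hyperparameters."""
    with open(config_path, "r", encoding="utf-8") as config_file:
        return json.load(config_file)


def connect_params(task, config_path):
    """Load the file and register the resulting dict with the given Task."""
    params = load_params(config_path)
    task.connect(params)
    return params


# Offline demonstration of the loading step, using a temporary config file.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    config_path.write_text(json.dumps({"epochs": 3, "lr": 0.4}), encoding="utf-8")
    demo_params = load_params(config_path)

print(demo_params)
```

In a real script, `connect_params(task, "config.json")` would be called right after `Task.init`, so the values are logged before training starts.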
|
||||
|
||||
## Log Artifacts
|
||||
|
||||
ClearML allows you to easily store the output products of an experiment - Model snapshot \ weights file, a preprocessing of your data, feature representation of data and more!
|
||||
|
||||
Essentially, artifacts are files (or python objects) uploaded from a script and stored alongside the Task.
These Artifacts can be easily accessed through the web UI or programmatically.
Artifacts can be stored anywhere: on the ClearML server, on any object storage solution, or in a shared folder.<br/>
See all [storage capabilities](../../integrations/storage).
|
||||
|
||||
|
||||
### Adding artifacts
|
||||
|
||||
Uploading a local file containing the preprocessed results of the data:
|
||||
```python
task.upload_artifact(name='data', artifact_object='/path/to/preprocess_data.csv')
```
|
||||
|
||||
We can also upload an entire folder with all its content by passing the folder path (the folder will be zipped and uploaded as a single zip file):
```python
task.upload_artifact(name='folder', artifact_object='/path/to/folder/')
```
|
||||
|
||||
Lastly, we can upload an instance of an object. Numpy / Pandas / PIL Images are supported, and are stored in npz / csv.gz / jpg formats respectively.
If the object type is unknown, ClearML pickles it and uploads the pickle file.

```python
task.upload_artifact(name='features', artifact_object=my_numpy_matrix)
```
|
||||
|
||||
Check out all [artifact logging](../../fundamentals/artifacts.md) options.
|
||||
|
||||
### Using Artifacts
|
||||
|
||||
Logged Artifacts can be used by other Tasks, whether it's a pretrained Model or processed data.
To use an Artifact, we first have to get an instance of the Task that originally created it,
then we either download it and get its path, or get the Artifact object directly.<br/>
For example, using previously generated preprocessed data:
|
||||
|
||||
```python
preprocess_task = Task.get_task(task_id='preprocessing_task_id')
local_csv = preprocess_task.artifacts['data'].get_local_copy()
```
|
||||
|
||||
`task.artifacts` is a dictionary where the keys are the Artifact names and the values are the Artifact objects.
Calling `get_local_copy()` returns a local cached copy of the artifact,
which means that the next time we execute the code we will not need to download the artifact again.
Calling `get()` returns the deserialized (unpickled) object.
|
||||
Check out the [artifacts retrieval](https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py) example code.
|
||||
|
||||
### Models
|
||||
|
||||
Models are a special kind of artifact.
Models created by popular frameworks (such as PyTorch, TensorFlow, Scikit-learn) are automatically logged by ClearML.
All snapshots are automatically logged. To make sure the model snapshot itself is also automatically uploaded (instead of only recording its local path),
we need to pass a storage location for the model files to be uploaded to.
|
||||
|
||||
For example, uploading all snapshots to our S3 bucket:
```python
task = Task.init(project_name='examples', task_name='storing model', output_uri='s3://my_models/')
```
|
||||
|
||||
From now on, whenever the framework (TF / Keras / PyTorch etc.) stores a snapshot, the model file will automatically be uploaded to our bucket under a specific folder for the experiment.

Loading models by a framework is also logged by the system. These models appear under the “Input Models” section, under the Artifacts tab.
|
||||
|
||||
Check out model snapshots examples for [TF](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py),
|
||||
[PyTorch](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py),
|
||||
[Keras](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py),
|
||||
[Scikit-Learn](https://github.com/allegroai/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py).
|
||||
|
||||
#### Loading Models
|
||||
Loading a previously trained model is quite similar to loading artifacts.
|
||||
|
||||
```python
prev_task = Task.get_task(task_id='the_training_task')
last_snapshot = prev_task.models['output'][-1]
local_weights_path = last_snapshot.get_local_copy()
```
|
||||
|
||||
As before, we first get the instance of the Task that trained the original weights files, then we query the Task for its output models (a list of snapshots) and get the latest snapshot.
|
||||
:::note
|
||||
When using TensorFlow, the snapshots are stored in a folder, meaning `local_weights_path` will point to a folder containing our requested snapshot.
|
||||
:::
|
||||
As with Artifacts, all models are cached, meaning that the next time we run this code, no model will need to be downloaded.
Once one of the frameworks loads the weights file, the running Task will automatically be updated, with “Input Model” pointing directly to the original training Task’s Model.
This feature allows you to easily get a full genealogy of every model trained and used by your system!
|
||||
|
||||
## Log Metrics
|
||||
|
||||
Full metrics logging is the key to finding the best performing model!
|
||||
By default, everything that's reported to Tensorboard & Matplotlib is automatically captured and logged.<br/>
|
||||
Since not all metrics are tracked that way, it's also possible to manually report metrics using the `logger` object.<br/>
|
||||
It's possible to log everything, from time series data to confusion matrices to HTML, Audio and Video, to custom plotly graphs! Everything goes!<br/>
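
As a small sketch of manual reporting, the loop below sends one scalar per iteration through a logger's `report_scalar(title, series, value, iteration)` call. The `toy_loss` function and the `RecordingLogger` class are invented for illustration so the example runs offline; in a real script the logger would be obtained from the Task (for example via `task.get_logger()`).

```python
import math


def toy_loss(iteration):
    """Stand-in for a real training loss: decays smoothly toward zero."""
    return math.exp(-0.1 * iteration)


def report_losses(logger, num_iterations=5):
    """Report one scalar point per iteration through a ClearML-style logger."""
    for i in range(num_iterations):
        logger.report_scalar(title="loss", series="train", value=toy_loss(i), iteration=i)


class RecordingLogger:
    """Offline stand-in that only records calls instead of sending them to a server."""

    def __init__(self):
        self.points = []

    def report_scalar(self, title, series, value, iteration):
        self.points.append((title, series, value, iteration))


demo_logger = RecordingLogger()
report_losses(demo_logger)
print(len(demo_logger.points), "points reported")
```

Points that share a `title` are expected to be drawn on the same graph, with one line per `series`.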
|
||||
|
||||

|
||||
|
||||
Once everything is neatly logged and displayed, using the [comparison tool](../../webapp/webapp_exp_comparing) makes it easy to find the best configuration!
|
||||
|
||||
|
||||
## Track Experiments
|
||||
|
||||
The experiment table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or the entire development.
|
||||
|
||||

|
||||
|
||||
|
||||
### Creating Leaderboards
|
||||
The [experiments table](../../webapp/webapp_exp_table.md) can be customized to your own needs, adding desired views of parameters, metrics and tags.
|
||||
It's possible to filter and sort based on parameters and metrics, so creating custom views is simple and flexible.
|
||||
|
||||
Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights.
|
||||
|
||||
It can also be used as a live leaderboard, showing the best performing experiments' status, updated in real time.
|
||||
This is helpful to monitor your projects' progress, and share it across the organization.<br/>
|
||||
Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or send an exact view of a specific experiment or a comparison view.<br/>
|
||||
It's also possible to tag Tasks for visibility and filtering, allowing you to add more information on the execution of the experiment.
|
||||
Later you can search based on task name and tag in the search bar, and filter experiments based on their tags, parameters, status and more.
|
||||
|
||||
## What's next?
|
||||
|
||||
This covers the Basics of ClearML! Running through this guide we've learned how to log Parameters, Artifacts and Metrics!
|
||||
|
||||
If you want to learn more look at how we see the data science process in our [best practices](best_practices.md) page,
|
||||
or check these pages out:
|
||||
|
||||
- Scale your work and deploy [ClearML Agents](../../clearml_agent.md)
|
||||
- Develop on remote machines with [ClearML Session](../../apps/clearml_session.md)
|
||||
- Structure your work and put it into [Pipelines](../../fundamentals/pipelines.md)
|
||||
- Improve your experiments with [HyperParameter Optimization](https://github.com/allegroai/clearml/tree/master/examples/optimization/hyper-parameter-optimization)
|
||||
- Check out ClearML's integrations to [external libraries](../../integrations/libraries.md).
|
||||
|
||||
|
||||
21
docs/getting_started/main.md
Normal file
@@ -0,0 +1,21 @@
|
||||
---
|
||||
id: main
|
||||
title: What is ClearML?
|
||||
slug: /
|
||||
---
|
||||
|
||||
ClearML is an open source platform that automates and simplifies developing and managing machine learning solutions
|
||||
for thousands of data science teams all over the world.
|
||||
It is designed as an end-to-end MLOps suite allowing you to focus on developing your ML code & automation,
|
||||
while ClearML ensures your work is reproducible and scalable.
|
||||
|
||||
## What can you do with ClearML?
|
||||
|
||||
- Track and upload metrics and models with only 2 lines of code
|
||||
- Create a bot that sends you a Slack message whenever your model improves in accuracy
- Automatically scale AWS instances according to your resource needs
|
||||
- Reproduce experiments with 3 mouse clicks
|
||||
- Much More!
|
||||
|
||||
#### Who We Are
|
||||
ClearML is supported by you :heart: and by the team behind [allegro.ai](https://www.allegro.ai), where we build even more MLOps for enterprise companies.
|
||||
39
docs/getting_started/mlops/mlops_best_practices.md
Normal file
@@ -0,0 +1,39 @@
|
||||
---
|
||||
title: Best Practices
|
||||
---
|
||||
|
||||
In short - **automate everything** :)
|
||||
From training models to data processing to deploying to production.
|
||||
|
||||
## Development - Preparing for Automation
|
||||
In short, track everything: there is nothing that is not worth having visibility into.
If you are afraid of clutter, use the archive option and set up your own cleanup service (see how [here](../../guides/services/cleanup_service)).
|
||||
|
||||
- Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iterative improvement of your code \ workflow.
- Create per-project [leaderboards](../../webapp/webapp_exp_track_visual.md) based on custom columns
(hyperparameters and performance accuracy), and bookmark them (the full URL will always reproduce the same view & table).
- Share experiments with your colleagues and team leaders.
Invite more people to see how your project is progressing, and suggest they add metric reporting for their own.
These metrics can later be part of your own in-house monitoring solution. Don't let good data go to waste :)
|
||||
|
||||
## Clone Tasks
|
||||
To define a Task in ClearML we have two options:
- Run the actual code with a `Task.init` call. This will create and auto-populate the Task in ClearML (including Git repo / Python packages / command line etc.).
- Register a local / remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md).

Once we have a Task in ClearML, we can clone and edit its definition in the UI, then launch it on one of our nodes with [ClearML Agent](../../clearml_agent.md).
|
||||
|
||||
## Advanced Automation
|
||||
- Create daily/weekly cron jobs for retraining the best performing models.
- Create data monitoring & scheduling, and launch inference jobs to test performance on any newly arriving dataset.
- Once there are two or more experiments that run one after another, group them together into a [pipeline](../../fundamentals/pipelines.md).
|
||||
|
||||
## Manage your data
|
||||
Use [ClearML Data](../../clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
Make datasets machine agnostic (i.e. store the original dataset in a shared storage location, e.g. shared-folder/S3/GS/Azure).
ClearML Data supports efficient Dataset storage and caching, differential & compressed.
|
||||
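One way to picture storing only the differences between dataset versions is the sketch below. It is a conceptual illustration in plain Python and not ClearML Data's actual storage format: the function names and the in-memory `name -> bytes` representation are assumptions, and file removals are deliberately left out.

```python
import hashlib


def digest(data):
    """Content hash used to decide whether a file changed."""
    return hashlib.sha256(data).hexdigest()


def delta(parent_files, child_files):
    """Return only the files that are new or changed relative to the parent version."""
    return {
        name: data
        for name, data in child_files.items()
        if name not in parent_files or digest(parent_files[name]) != digest(data)
    }


def merge(parent_files, stored_delta):
    """Rebuild the full child version from the parent plus the stored delta."""
    merged = dict(parent_files)
    merged.update(stored_delta)
    return merged


v1 = {"a.csv": b"1,2", "b.csv": b"3,4"}
v2 = {"a.csv": b"1,2", "b.csv": b"3,5", "c.csv": b"6"}
stored = delta(v1, v2)  # only the changed and the added file need to be stored
print(sorted(stored))
```

Here the second version stores two files instead of three, and `merge(v1, stored)` reconstructs the complete second version when it is needed.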
|
||||
## Scale Your Work
|
||||
Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent on machines (remote or local) and manage
the training workload with it. <br/>
Improve team collaboration through transparent resource monitoring: always know what is running where.
|
||||
146
docs/getting_started/mlops/mlops_first_steps.md
Normal file
@@ -0,0 +1,146 @@
|
||||
---
|
||||
title: First Steps
|
||||
---
|
||||
|
||||
:::note
|
||||
This tutorial assumes that you've already [signed up](https://app.community.clear.ml) to ClearML
|
||||
:::
|
||||
|
||||
MLOps is all about automation! We'll discuss the need for automation and the Tools ClearML offers for automation, orchestration and tracking!<br/>
|
||||
|
||||
Effective MLOps relies on being able to scale work beyond one's own computer. Moving off your own machine can be inefficient:
even assuming that you have all the drivers and applications installed, you still need to manage multiple python environments
for different packages \ package versions, or worse, manage different docker containers for different package versions.<br/>
Not to mention that, when working on remote machines, executing experiments, tracking what's running where, and making sure the machines are fully utilized at all times
becomes a daunting task.<br/>
This can create overhead that derails you from the core work!
|
||||
|
||||
ClearML Agent was designed to deal with these and more! It is a module responsible for executing experiments
on remote machines, on premises or in the cloud!<br/>
It sets up the environment for the specific Task (inside a docker container or on bare metal), installs the required python packages, and executes and monitors the process itself.
|
||||
|
||||
## Spin up an Agent
|
||||
|
||||
First, let's install the agent!
|
||||
|
||||
```bash
pip install clearml-agent
```
|
||||
|
||||
Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:
|
||||
|
||||
```bash
clearml-init
```
|
||||
|
||||
:::note
|
||||
If you've already created credentials, you can copy-paste the default agent section from [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L15). This is optional: if the section is not provided, the default values will be used.
|
||||
:::
|
||||
|
||||
Start the agent's daemon. The agent will start pulling Tasks from the assigned queue (`default` in our case) and execute them one after the other.

```bash
clearml-agent daemon --queue default
```
|
||||
|
||||
## Clone an Experiment
|
||||
Creating a new "job" to be executed essentially means cloning a Task in the system, then enqueuing the cloned Task in one of the execution queues for an agent to execute.
When cloning a Task we create another copy of the Task in *draft* mode, allowing us to edit the Task's environment definitions. <br/>
We can edit the git \ code references, control the python packages to be installed, specify the docker container image to be used, or change the hyper-parameters and configuration files.
Once we are done, we enqueue the Task in one of the execution queues, where it waits for an agent to pick it up.
|
||||
Multiple agents can listen to the same queue (or even multiple queues), but only a single agent will pick the Task to be executed.
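
The "many listeners, one executor" behaviour can be pictured with a plain-Python analogy. The sketch below is not how the ClearML server implements its queues; it only uses the standard library's thread-safe `queue.Queue` to show several simulated agents draining one shared queue, with every task handed to exactly one of them. All names in it are illustrative.

```python
import queue
import threading


def run_agents(task_ids, num_agents=3):
    """Let several simulated agents drain one shared queue.

    Returns a list of (task_id, agent_name) pairs, one per executed task.
    """
    pending = queue.Queue()
    for task_id in task_ids:
        pending.put(task_id)

    executions = []
    lock = threading.Lock()

    def agent(name):
        while True:
            try:
                # get_nowait() hands each queued item to exactly one caller
                task_id = pending.get_nowait()
            except queue.Empty:
                return  # queue drained; a real daemon would keep polling instead
            with lock:
                executions.append((task_id, name))

    threads = [threading.Thread(target=agent, args=(f"agent-{i}",)) for i in range(num_agents)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return executions


executions = run_agents([f"task-{n}" for n in range(10)])
print(sorted(task_id for task_id, _ in executions))
```

Which agent ends up with which task varies from run to run, but no task is ever executed twice.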
|
||||
|
||||
You can clone an experiment from our [examples](https://app.community.clear.ml/projects/764d8edf41474d77ad671db74583528d/experiments) project and enqueue it to a queue!
|
||||
|
||||
### Accessing Previously Executed Experiments
|
||||
All executed Tasks in the system can be accessed based on the unique Task ID, or by searching for the Task based on its properties.
|
||||
For example:
|
||||
|
||||
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
```
|
||||
|
||||
## Log Hyperparameters
|
||||
Hyperparameters are an integral part of Machine Learning code, as they let you control the code without directly modifying it.<br/>
|
||||
Hyperparameters can be added from anywhere in your code, and ClearML supports [multiple](../../fundamentals/hyperparameters.md) ways to obtain them!
|
||||
|
||||
ClearML also allows users to change and track hyperparameter values without changing the code itself.
|
||||
When a cloned experiment is executed by an Agent, it will override the default values with new ones.
|
||||
|
||||
It's also possible to programmatically change cloned experiments' parameters.
For example:
```python
from clearml import Task
# clone an existing Task by its ID; the clone is created in draft mode
cloned_task = Task.clone(source_task='aabbcc')
cloned_task.set_parameter(name='internal/magic', value=42)
Task.enqueue(cloned_task, queue_name='default')
```
|
||||
|
||||
|
||||
## Logging Artifacts
|
||||
Artifacts are a great way to pass and reuse data between Tasks in the system.
From anywhere in the code you can upload [multiple](../../fundamentals/artifacts.md#logging-artifacts) types of data, objects and files.
Artifacts are the base of ClearML's [Data Management](../../clearml_data.md) solution, and a way to communicate complex objects between different
stages of a [pipeline](../../fundamentals/pipelines.md).
|
||||
|
||||
```python
import numpy as np
from clearml import Task
Task.current_task().upload_artifact(name='a_file', artifact_object='local_file.bin')
# np.ones expects the shape as a single tuple
Task.current_task().upload_artifact(name='numpy', artifact_object=np.ones((4, 4)))
```
|
||||
|
||||
|
||||
### Using Artifacts
|
||||
Artifacts can be retrieved by [accessing](../../fundamentals/artifacts.md#using-artifacts) the Task that created them.
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
# artifact as a file
local_file = executed_task.artifacts['a_file'].get_local_copy()
# artifact as object
a_numpy = executed_task.artifacts['numpy'].get()
```
|
||||
|
||||
### Models
|
||||
Models are a special type of artifact that's automatically logged.
Logging models into the model repository is the easiest way to integrate the development process directly with production.<br/>
Any model stored by the supported frameworks (Keras \ TF \ PyTorch \ Joblib) will be automatically logged into ClearML.
Models can be automatically stored on a preferred storage medium (S3 bucket, Google Storage, etc.).
|
||||
|
||||
## Log Metrics
|
||||
Log as many metrics from your processes as you can! It improves visibility into their progress.
Use the `Logger` class to report scalars and plots.
```python
from clearml import Logger
counter = 0  # iteration index, e.g. the current training step
Logger.current_logger().report_scalar(title='metric', series='variant', value=13.37, iteration=counter)
```
|
||||
|
||||
You can later analyze the reported scalars:
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
# get a summary of the min/max/last value of all reported scalars
min_max_values = executed_task.get_last_scalar_metrics()
# get detailed graphs of all scalars
full_scalars = executed_task.get_reported_scalars()
```
|
||||
|
||||
## Track Experiments
|
||||
You can also search and query Tasks in the system.
Use the `Task.get_tasks` call to retrieve Task objects and filter based on specific values of the Task: status, parameters, metrics and more!
```python
from clearml import Task
tasks = Task.get_tasks(project_name='examples', task_name='partial_name_match', task_filter={'status': 'in_progress'})
```
|
||||
|
||||
## Manage Your Data
|
||||
Data is probably one of the biggest factors that determines the success of a project.
|
||||
Associating the data a model used to the model's configuration, code and results (such as accuracy) is key to deducing meaningful insights into how
|
||||
models behave. <br/>
|
||||
[ClearML Data](../../clearml_data.md) allows you to version your data so it's never lost, fetch it from every machine with minimal code changes,
and associate data with experiment results.
|
||||
Logging data can be done via command line, or via code. If any preprocessing code is involved, ClearML logs it as well!<br/>
|
||||
Once data is logged, it can be used by other experiments.
|
||||
|
||||
103
docs/getting_started/mlops/mlops_second_steps.md
Normal file
@@ -0,0 +1,103 @@
|
||||
---
|
||||
title: Next Steps
|
||||
---
|
||||
|
||||
Once Tasks are defined and in the ClearML system, they can be chained together to create Pipelines.
|
||||
Pipelines provide users with a greater level of abstraction and automation, with Tasks running one after the other.<br/>
|
||||
Tasks can interface with other Tasks in the pipeline and leverage other Tasks' work products.<br/>
|
||||
We'll go through a scenario where users create a Dataset, process the data then consume it with another task, all running as a pipeline.
|
||||
|
||||
|
||||
## Building Tasks
|
||||
### Dataset Creation
|
||||
|
||||
Let's assume we have some code that extracts data from a production Database into a local folder.
|
||||
Our goal is to create an immutable copy of the data to be used by further steps:
|
||||
|
||||
```bash
|
||||
clearml-data create --project data --name dataset_v1
|
||||
clearml-data sync --folder ./from_production
|
||||
```
|
||||
|
||||
We could also add a Tag `latest` to the Dataset, marking it as the latest version.
|
||||
|
||||
### Preprocessing Data
|
||||
The second step is to preprocess the data. First we need to access it, then we want to modify it,
|
||||
and lastly we want to create a new version of the data.
|
||||
|
||||
```python
|
||||
from clearml import Task, Dataset

# create a task for the data processing part
|
||||
task = Task.init(project_name='data', task_name='ingest', task_type='data_processing')
|
||||
|
||||
# get the v1 dataset
|
||||
dataset = Dataset.get(dataset_project='data', dataset_name='dataset_v1')
|
||||
|
||||
# get a local mutable copy of the dataset
|
||||
dataset_folder = dataset.get_mutable_local_copy(target_folder='work_dataset', overwrite=True)
|
||||
# change some files in the `./work_dataset` folder
|
||||
...
|
||||
# create a new version of the dataset with the modified files
|
||||
new_dataset = Dataset.create(
|
||||
dataset_project='data', dataset_name='dataset_v2',
|
||||
parent_datasets=[dataset],
|
||||
use_current_task=True, # this will make sure we have the creation code and the actual dataset artifacts on the same Task
|
||||
)
|
||||
new_dataset.sync_folder(local_path=dataset_folder)
|
||||
new_dataset.upload()
|
||||
new_dataset.finalize()
|
||||
# now let's remove the previous dataset tag
|
||||
dataset.tags = []
|
||||
new_dataset.tags = ['latest']
|
||||
```
|
||||
|
||||
We passed the `parent_datasets` argument when we created v2 of the Dataset, so v2 inherits all of the parent version's content.
|
||||
This will not only help us in tracing back dataset changes with full genealogy, but will also make our storage more efficient,
|
||||
as it will only store the files that were changed or added relative to the parent versions.
|
||||
When we later need access to the Dataset, it will merge the files from all parent versions
|
||||
in a fully automatic and transparent process, as if they were always part of the requested Dataset.
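
The parent-version merge described above can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: `ToyDatasetVersion`, `own_files`, and `resolve` are hypothetical names invented for illustration, they are not part of the ClearML API, and file deletions are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDatasetVersion:
    """Hypothetical stand-in for a dataset version (not the ClearML Dataset class)."""
    name: str
    # only the files added or changed in this version: path -> content hash
    own_files: Dict[str, str] = field(default_factory=dict)
    parents: List["ToyDatasetVersion"] = field(default_factory=list)

    def resolve(self) -> Dict[str, str]:
        """Merge the parents' files first, then overlay this version's changes."""
        merged: Dict[str, str] = {}
        for parent in self.parents:
            merged.update(parent.resolve())
        merged.update(self.own_files)
        return merged


v1 = ToyDatasetVersion("dataset_v1", {"a.csv": "hash-a1", "b.csv": "hash-b1"})
# v2 stores only the changed file (b.csv) and the new file (c.csv)
v2 = ToyDatasetVersion("dataset_v2", {"b.csv": "hash-b2", "c.csv": "hash-c1"}, parents=[v1])

full_view = v2.resolve()
print(sorted(full_view))
```

In this sketch v2 holds two entries of its own, yet resolving it yields all three files, which is the storage saving the paragraph above refers to.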
|
||||
|
||||
### Training
|
||||
We can now train our model with the **latest** Dataset we have in the system.
|
||||
We will do that by getting the instance of the Dataset based on the `latest` tag
|
||||
(if by any chance we have two Datasets with the same tag we will get the newest).
|
||||
Once we have the dataset we can request a local copy of the data. All local copy requests are cached,
|
||||
which means that if we are accessing the same dataset multiple times we will not have any unnecessary downloads.
|
||||
|
||||
```python
|
||||
from clearml import Task, Dataset

# create a task for the model training
|
||||
task = Task.init(project_name='data', task_name='train', task_type='training')
|
||||
|
||||
# get the latest dataset with the tag `latest`
|
||||
dataset = Dataset.get(dataset_tags=['latest'])
|
||||
|
||||
# get a cached copy of the Dataset files
|
||||
dataset_folder = dataset.get_local_copy()
|
||||
|
||||
# train our model here
|
||||
```
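
The caching behavior mentioned above (repeated requests for the same dataset do not trigger repeated downloads) can be sketched with a toy cache keyed by dataset ID. This is an illustration only: `ToyDatasetCache` and its `downloads` counter are hypothetical, and ClearML's actual cache layout and eviction policy are not shown here.

```python
import tempfile
from pathlib import Path


class ToyDatasetCache:
    """Hypothetical cache: one folder per dataset ID, filled on first request."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.downloads = 0  # counts simulated downloads

    def get_local_copy(self, dataset_id: str) -> Path:
        target = self.root / dataset_id
        if not target.exists():
            # simulate a download the first time this dataset is requested
            target.mkdir(parents=True)
            (target / "data.txt").write_text("placeholder")
            self.downloads += 1
        return target


cache = ToyDatasetCache(tempfile.mkdtemp())
first = cache.get_local_copy("abc123")
second = cache.get_local_copy("abc123")  # served from the cache folder
print(first == second, cache.downloads)
```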
|
||||
|
||||
## Building the Pipeline
|
||||
|
||||
Now that we have the data processing step and the training step, let's create a pipeline that, when executed,
|
||||
runs the data processing step first and then the training step.
|
||||
It is important to remember that pipelines are Tasks by themselves and can also be automated by other pipelines (i.e. pipelines of pipelines).
|
||||
|
||||
```python
|
||||
from clearml.automation.controller import PipelineController

pipe = PipelineController(
|
||||
always_create_task=True,
|
||||
pipeline_project='data', pipeline_name='pipeline demo',
|
||||
)
|
||||
|
||||
pipe.add_step(
|
||||
name='step 1 data',
|
||||
base_task_id='cbc84a74288e459c874b54998d650214', # Replace with the ID of the data processing Task
|
||||
)
|
||||
pipe.add_step(
|
||||
name='step 2 train',
|
||||
parents=['step 1 data', ],
|
||||
base_task_id='cbc84a74288e459c874b54998d650214', # Replace with the ID of the training Task
|
||||
)

# launch the pipeline
pipe.start()
|
||||
```
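
The execution order implied by the `parents` argument can be reasoned about as a dependency graph. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); it only illustrates the ordering idea and is not how `PipelineController` is implemented.

```python
from graphlib import TopologicalSorter

# step name -> names of the steps it depends on (mirrors the `parents` argument)
steps = {
    "step 1 data": [],
    "step 2 train": ["step 1 data"],
}

# static_order() yields every step after all of its parents
execution_order = list(TopologicalSorter(steps).static_order())
print(execution_order)
```

A step with no parents can start immediately; any other step waits until all of its parents have finished.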
|
||||
|
||||
We could also pass the parameters from one step to the other (for example `Task.id`).
|
||||
See more in the full pipeline documentation [here](../../fundamentals/pipelines.md).
|
||||