Restructure docs for platform components and use case clarity (#1048)

This commit is contained in:
Noam Wasersprung
2025-02-23 17:33:55 +02:00
committed by GitHub
parent 535e08efa8
commit 567af28632
128 changed files with 4370 additions and 1404 deletions


@@ -24,7 +24,7 @@ During early stages of model development, while code is still being modified hea
These setups can be folded into each other and that's great! If you have a GPU machine for each researcher, that's awesome!
The goal of this phase is to get your code, dataset, and environment set up, so you can start digging to find the best model!
- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [Getting Started](ds_first_steps.md)).
- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [ClearML Setup](../../clearml_sdk/clearml_sdk_setup.md)).
This helps visualize the results and track progress.
- [ClearML Agent](../../clearml_agent.md) helps move your work to other machines without the hassle of rebuilding the environment every time,
while also creating a simple queue interface that lets you drop your tasks to be executed one by one
@@ -47,7 +47,7 @@ that you need.
accessed, [compared](../../webapp/webapp_exp_comparing.md) and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks.
It can even [build](../../clearml_agent/clearml_agent_docker.md#exporting-a-task-into-a-standalone-docker-container) the docker container for you!
It can even [build](../../clearml_agent/clearml_agent_docker_exec#exporting-a-task-into-a-standalone-docker-container) the docker container for you!
- [ClearML Pipelines](../../pipelines/pipelines.md) ensure that steps run in the same order,
programmatically chaining tasks together, while giving an overview of the execution pipeline's status.


@@ -1,142 +0,0 @@
---
title: First Steps
---
## Install ClearML
First, [sign up for free](https://app.clear.ml).
Install the `clearml` Python package:
```bash
pip install clearml
```
## Connect ClearML SDK to the Server
### Local Python
1. Execute the following command to run the ClearML setup wizard:
```bash
clearml-init
```
:::note
The wizard does not edit or overwrite existing configuration files, so the above command will not work if a `clearml.conf`
file already exists.
:::
<Collapsible type="info" title="Learn about creating multiple ClearML configuration files">
Additional ClearML configuration files can be created, for example, to use inside Docker containers when executing
a Task.
Use the `--file` option for `clearml-init`.
```bash
clearml-init --file MyOtherClearML.conf
```
and then specify it using the `CLEARML_CONFIG_FILE` environment variable inside the container:
```bash
export CLEARML_CONFIG_FILE=MyOtherClearML.conf
```
For more information about running tasks inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md)
and [ClearML Agent Reference](../../clearml_agent/clearml_agent_ref.md).
</Collapsible>
1. The setup wizard prompts for ClearML credentials.
```console
Please create new clearml credentials through the settings page in your `clearml-server` web app (e.g. http://localhost:8080//settings/workspace-configuration),
or create a free account at https://app.clear.ml/settings/workspace-configuration
In the settings page, press "Create new credentials", then press "Copy to clipboard".
Paste copied configuration here:
```
1. Get ClearML credentials. Open the ClearML Web UI in a browser. On the [**SETTINGS > WORKSPACE**](https://app.clear.ml/settings/workspace-configuration)
page, click **Create new credentials**.
The **LOCAL PYTHON** tab shows the data required by the setup wizard (a copy to clipboard action is available on
hover).
1. At the command prompt `Paste copied configuration here:`, paste the copied ClearML credentials.
The setup wizard verifies the credentials.
```console
Detected credentials key="********************" secret="*******"
CLEARML Hosts configuration:
Web App: https://app.<your-domain>
API: https://api.<your-domain>
File Store: https://files.<your-domain>
Verifying credentials ...
Credentials verified!
New configuration stored in /home/<username>/clearml.conf
CLEARML setup completed successfully.
```
Now you can integrate ClearML into your code! Continue [here](#auto-log-experiment).
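Because `clearml-init` will not touch an existing `clearml.conf`, it can help to check which file the SDK is likely to read before running the wizard. The sketch below assumes the lookup order described on this page: the path in `CLEARML_CONFIG_FILE` if that variable is set, otherwise `clearml.conf` in your home directory (where the wizard stored it in the output above). `locate_clearml_config` is a hypothetical helper, not part of the `clearml` package.

```python
import os
from pathlib import Path


def locate_clearml_config():
    """Return (path, exists) for the configuration file the SDK will likely read.

    Assumption (hypothetical helper): a CLEARML_CONFIG_FILE environment variable
    takes precedence; otherwise the default is ~/clearml.conf.
    """
    explicit = os.environ.get("CLEARML_CONFIG_FILE")
    path = Path(explicit).expanduser() if explicit else Path.home() / "clearml.conf"
    return path, path.is_file()


if __name__ == "__main__":
    path, exists = locate_clearml_config()
    if exists:
        # clearml-init will not overwrite this file; edit it, or create a
        # separate one with `clearml-init --file <name>.conf`.
        print(f"Existing configuration found: {path}")
    else:
        print(f"No configuration at {path}; run clearml-init to create it.")
```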
### Jupyter Notebook
To use ClearML with Jupyter Notebook, you need to configure ClearML Server access credentials for your notebook.
1. Get ClearML credentials. Open the ClearML Web UI in a browser. On the [**SETTINGS > WORKSPACE**](https://app.clear.ml/settings/workspace-configuration)
page, click **Create new credentials**. The **JUPYTER NOTEBOOK** tab shows the commands required to configure your
notebook (a copy to clipboard action is available on hover).
1. Add these commands to your notebook.
Now you can use ClearML in your notebook!
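The exact commands to use are the ones shown on the **JUPYTER NOTEBOOK** tab. As a sketch, assuming those commands set the standard ClearML environment variables (`CLEARML_API_HOST`, `CLEARML_WEB_HOST`, `CLEARML_FILES_HOST`, `CLEARML_API_ACCESS_KEY`, `CLEARML_API_SECRET_KEY`), the same configuration can be applied from a notebook cell in plain Python. `configure_clearml_env` is a hypothetical helper, the default host URLs assume the hosted ClearML service, and the key arguments are placeholders for the values copied from the web UI.

```python
import os


def configure_clearml_env(access_key, secret_key,
                          api_host="https://api.clear.ml",
                          web_host="https://app.clear.ml",
                          files_host="https://files.clear.ml"):
    """Export ClearML server settings as environment variables (hypothetical helper).

    Run this in a notebook cell before creating a task. The default hosts
    assume the hosted ClearML service; pass your own server's URLs otherwise.
    """
    os.environ.update({
        "CLEARML_API_HOST": api_host,
        "CLEARML_WEB_HOST": web_host,
        "CLEARML_FILES_HOST": files_host,
        "CLEARML_API_ACCESS_KEY": access_key,
        "CLEARML_API_SECRET_KEY": secret_key,
    })


# Usage (placeholders, replace with the values from the JUPYTER NOTEBOOK tab):
# configure_clearml_env("<access_key>", "<secret_key>")
```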
## Auto-log Experiment
In ClearML, experiments are organized as [Tasks](../../fundamentals/task.md).
ClearML automatically logs your task and code, including outputs and parameters from popular ML frameworks,
once you integrate the ClearML [SDK](../../clearml_sdk/clearml_sdk.md) with your code. To control what ClearML automatically logs, see this [FAQ](../../faq.md#controlling_logging).
At the beginning of your code, import the `clearml` package:
```python
from clearml import Task
```
:::tip Full Automatic Logging
To ensure full automatic logging, it is recommended to import the `clearml` package at the top of your entry script.
:::
Then initialize the Task object in your `main()` function, or at the beginning of the script:
```python
task = Task.init(project_name='great project', task_name='best task')
```
If the project does not already exist, a new one is created automatically.
The console should display the following output:
```console
ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a
2021-07-25 13:59:09
ClearML results page: https://app.clear.ml/projects/4043a1657f374e9298649c6ba72ad233/experiments/1ca59ef1f86d44bd81cb517d529d9e5a/output/log
2021-07-25 13:59:16
```
**That's it!** You are done integrating ClearML with your code :)
Now, [command-line arguments](../../fundamentals/hyperparameters.md#tracking-hyperparameters), [console output](../../fundamentals/logger.md#types-of-logged-results), and TensorBoard and Matplotlib outputs are automatically logged in the UI under the created Task.
Sit back, relax, and watch your models converge :) or continue to see what else can be done with ClearML [here](ds_second_steps.md).
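The task ID printed in the console output above is the same immutable identifier you later pass to `Task.get_task(task_id=...)`. Inside the script itself, `task.id` holds it. When only a captured console log is available (for example, from a CI job), a small parser like the hypothetical one below can recover it. It assumes IDs are 32 lowercase hexadecimal characters, as in the sample output.

```python
import re

# Matches the ID in lines such as "ClearML Task: created new task id=<32 hex chars>".
# Assumption: task IDs are 32 lowercase hexadecimal characters.
TASK_ID_PATTERN = re.compile(r"task id=([0-9a-f]{32})")


def extract_task_id(console_text):
    """Return the first task ID found in captured console output, or None."""
    match = TASK_ID_PATTERN.search(console_text)
    return match.group(1) if match else None
```

The returned string can be passed directly to `Task.get_task(task_id=...)`.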
## YouTube Playlist
Or watch the **Getting Started** playlist on ClearML's YouTube Channel!
[![Watch the video](https://img.youtube.com/vi/bjWwZAzDxTY/hqdefault.jpg)](https://www.youtube.com/watch?v=bjWwZAzDxTY&list=PLMdIlCuMqSTnoC45ME5_JnsJX0zWqDdlO&index=2)


@@ -1,193 +0,0 @@
---
title: Next Steps
---
So, you've already [installed ClearML's Python package](ds_first_steps.md) and run your first task!
Now, you'll learn how to track Hyperparameters, Artifacts, and Metrics!
## Accessing Tasks
Every previously executed experiment is stored as a Task.
A Task's project and name can be changed after it has been executed.
A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and always locates the same Task in the system.
Retrieve a Task object programmatically by querying the system based on either the Task ID,
or project and name combination. You can also query tasks based on their properties, like tags (see [Querying Tasks](../../clearml_sdk/task_sdk.md#querying--searching-tasks)).
```python
from clearml import Task

prev_task = Task.get_task(task_id='123456deadbeef')
```
Once you have a Task object, you can query the state of the Task, get its model(s), scalars, parameters, etc.
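Both lookup styles described above can be wrapped in one helper. This is a minimal sketch: `fetch_task` is hypothetical, and it takes the task class as a parameter (normally `clearml.Task`) only so it can be exercised without a ClearML server. It prefers the task ID because, unlike the project and name, the ID can never change.

```python
def fetch_task(task_cls, task_id=None, project_name=None, task_name=None):
    """Look up a task by ID, or by project and task name (hypothetical helper).

    `task_cls` is normally clearml.Task; it is a parameter only so the helper
    can be tried without a ClearML server.
    """
    if task_id:
        # The ID is immutable, so it always locates the same task.
        return task_cls.get_task(task_id=task_id)
    if project_name and task_name:
        # Project and task names can be changed after execution.
        return task_cls.get_task(project_name=project_name, task_name=task_name)
    raise ValueError("Pass task_id, or both project_name and task_name")


# Usage with the real SDK:
# from clearml import Task
# prev_task = fetch_task(Task, project_name='great project', task_name='best task')
```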
## Log Hyperparameters
For full reproducibility, it's paramount to save each task's hyperparameters. Since hyperparameters can have substantial impact
on model performance, saving and comparing them between tasks is sometimes the key to understanding model behavior.
ClearML supports logging `argparse` module arguments out of the box, so once ClearML is integrated into the code, it automatically logs all parameters provided to the argument parser.
You can also log parameter dictionaries (very useful when parsing an external configuration file and storing as a dict object),
whole configuration files, or even custom objects or [Hydra](https://hydra.cc/docs/intro/) configurations!
```python
from clearml import Task

task = Task.init(project_name='great project', task_name='best task')

params_dictionary = {'epochs': 3, 'lr': 0.4}
task.connect(params_dictionary)
```
See [Configuration](../../clearml_sdk/task_sdk.md#configuration) for all hyperparameter logging options.
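When hyperparameters live in an external file, one approach is to parse the file into a dictionary and connect that dictionary. The sketch below assumes a JSON file and a hypothetical `load_params` helper that layers the file's values over defaults; ClearML itself does not require this structure.

```python
import json
from pathlib import Path

# Hypothetical defaults, mirroring the params_dictionary example above.
DEFAULT_PARAMS = {"epochs": 3, "lr": 0.4}


def load_params(config_path=None, defaults=None):
    """Return a new dict: defaults, overridden by values from a JSON file if given."""
    params = dict(DEFAULT_PARAMS if defaults is None else defaults)
    if config_path is not None:
        # Raises FileNotFoundError for a wrong path instead of silently using defaults.
        params.update(json.loads(Path(config_path).read_text()))
    return params


# Usage (hypothetical file name):
# params = load_params("params.json")
# task.connect(params)
```

Failing loudly on a missing file is a deliberate choice here: a typo in the path would otherwise run the task with defaults while looking configured.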
## Log Artifacts
ClearML lets you easily store the output products of a task: model snapshots and weights files, preprocessed data, feature representations of data, and more!
Essentially, artifacts are files (or Python objects) uploaded from a script and are stored alongside the Task.
These artifacts can be easily accessed by the web UI or programmatically.
Artifacts can be stored on the ClearML server, in any object storage solution, or in a shared folder.
See all [storage capabilities](../../integrations/storage.md).
### Adding Artifacts
Upload a local file containing the preprocessed results of the data:
```python
task.upload_artifact(name='data', artifact_object='/path/to/preprocess_data.csv')
```
You can also upload an entire folder with all its content by passing the folder path. The folder is zipped and uploaded as a single zip file.
```python
task.upload_artifact(name='folder', artifact_object='/path/to/folder/')
```
Lastly, you can upload an object instance. NumPy arrays, pandas DataFrames, and PIL images are stored in `npz`, `csv.gz`, and `jpg` formats, respectively.
If the object type is unknown, ClearML pickles it and uploads the pickle file.
```python
import numpy as np

numpy_object = np.eye(100, 100)
task.upload_artifact(name='features', artifact_object=numpy_object)
```
For more artifact logging options, see [Artifacts](../../clearml_sdk/task_sdk.md#artifacts).
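Since objects of unrecognized types are pickled before upload, an object that cannot be pickled (such as a lambda, an open file handle, or a generator) will likely fail to upload. A quick local check like the hypothetical `is_picklable` below can catch this before calling `task.upload_artifact()`. It only tests pickling with the standard library and makes no claim about how ClearML serializes the object internally.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize `obj`."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Usage before uploading an object of a type ClearML does not recognize:
# if is_picklable(my_object):
#     task.upload_artifact(name='my_object', artifact_object=my_object)
```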
### Using Artifacts
Logged artifacts can be used by other Tasks, whether it's a pre-trained Model or processed data.
To use an artifact, first you have to get an instance of the Task that originally created it,
then you either download it and get its path, or get the artifact object directly.
For example, to use previously generated preprocessed data:
```python
from clearml import Task

preprocess_task = Task.get_task(task_id='preprocessing_task_id')
local_csv = preprocess_task.artifacts['data'].get_local_copy()
```
`task.artifacts` is a dictionary where the keys are the artifact names and the values are the artifact objects.
Calling `get_local_copy()` returns the path to a locally cached copy of the artifact, so the next time you execute the code,
the artifact does not need to be downloaded again.
Calling `get()` returns the artifact as a deserialized Python object.
Check out the [artifacts retrieval](https://github.com/clearml/clearml/blob/master/examples/reporting/artifacts_retrieval.py) example code.
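The caching behavior of `get_local_copy()` described above follows a familiar pattern: download on the first request, then reuse the local file. The sketch below is a simplified illustration of that pattern, not ClearML's implementation. `get_cached_copy` and its `download` callback are hypothetical, and it ignores concerns a real cache must handle, such as interrupted downloads and eviction.

```python
from pathlib import Path


def get_cached_copy(name, cache_dir, download):
    """Return a local path for `name`, calling download(dest) only on a cache miss.

    Hypothetical, simplified illustration of download-once caching.
    """
    dest = Path(cache_dir) / name
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        download(dest)  # e.g. fetch the file from remote storage into `dest`
    return dest
```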
### Models
Models are a special kind of artifact.
Models created by popular frameworks (such as PyTorch, TensorFlow, scikit-learn) are automatically logged by ClearML.
All snapshots are automatically logged, but by default only their local paths are recorded. To automatically upload the snapshot files as well,
pass a storage location for them through the `output_uri` parameter.
For example, upload all snapshots to an S3 bucket:
```python
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='storing model',
    output_uri='s3://my_models/'
)
```
Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to a task-specific folder in the bucket.
Loading models by a framework is also logged by the system; these models appear in a task's **Artifacts** tab,
under the "Input Models" section.
Check out the model snapshot examples for [TensorFlow](https://github.com/clearml/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py),
[PyTorch](https://github.com/clearml/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py),
[Keras](https://github.com/clearml/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py),
and [scikit-learn](https://github.com/clearml/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py).
#### Loading Models
Loading a previously trained model is quite similar to loading artifacts.
```python
from clearml import Task

prev_task = Task.get_task(task_id='the_training_task')
last_snapshot = prev_task.models['output'][-1]
local_weights_path = last_snapshot.get_local_copy()
```
As with artifacts, first get the instance of the task that trained the original weights file, then query the task for its output models (a list of snapshots) and take the latest snapshot.
:::note
Using TensorFlow, the snapshots are stored in a folder, meaning the `local_weights_path` will point to a folder containing your requested snapshot.
:::
As with artifacts, all models are cached, meaning the next time you run this code, no model needs to be downloaded.
Once a framework loads the weights file, the running task is automatically updated with an "Input Model" pointing directly to the original training Task's Model.
This lets you easily get a full genealogy of every model trained and used by your system!
## Log Metrics
Full metrics logging is the key to finding the best performing model!
By default, ClearML automatically captures and logs everything reported to TensorBoard and Matplotlib.
Since not all metrics are tracked that way, you can also manually report metrics using a [`Logger`](../../fundamentals/logger.md) object.
You can log everything from time series data and confusion matrices to HTML, audio, video, and custom Plotly graphs. Everything goes!
![Experiment plots](../../img/report_plotly.png#light-mode-only)
![Experiment plots](../../img/report_plotly_dark.png#dark-mode-only)
Once everything is neatly logged and displayed, use the [comparison tool](../../webapp/webapp_exp_comparing.md) to find the best configuration!
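For metrics that do not pass through TensorBoard or Matplotlib, values can be reported explicitly through the task's `Logger`. The sketch below assumes the `Logger.report_scalar(title, series, value, iteration)` method; `report_epoch_metrics` is a hypothetical helper that reports one point per metric for a given epoch.

```python
def report_epoch_metrics(logger, epoch, metrics, series="train"):
    """Report each metric as one scalar point at iteration `epoch`.

    `logger` is expected to be a ClearML Logger, for example task.get_logger().
    Each metric name becomes a plot title; `series` names the line on that plot.
    """
    for title, value in metrics.items():
        logger.report_scalar(title=title, series=series, value=value, iteration=epoch)


# Usage with the real SDK (loss and acc stand in for your computed values):
# from clearml import Task
# task = Task.init(project_name='great project', task_name='best task')
# for epoch in range(3):
#     report_epoch_metrics(task.get_logger(), epoch, {"loss": loss, "accuracy": acc})
```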
## Track Tasks
The task table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or your entire development effort.
![Task table](../../img/webapp_experiment_table.png#light-mode-only)
![Task table](../../img/webapp_experiment_table_dark.png#dark-mode-only)
### Creating Leaderboards
Customize the [task table](../../webapp/webapp_exp_table.md) to fit your own needs, adding desired views of parameters, metrics, and tags.
You can filter and sort based on parameters and metrics, so creating custom views is simple and flexible.
Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights.
It can also be used as a live leaderboard, showing the best performing tasks' status, updated in real time.
This is helpful to monitor your projects' progress, and to share it across the organization.
Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or to send an exact view of a specific task or a comparison page.
You can also tag Tasks to add information about their execution and to make them easier to find and filter.
Later, you can search by task name in the search bar and filter tasks by their tags, parameters, status, and more.
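The leaderboard views above are built in the web UI, but the same ranking can be sketched in code. The example below assumes each task's metrics come from `Task.get_last_scalar_metrics()`, which returns a nested `{title: {series: {'last': ..., 'min': ..., 'max': ...}}}` dictionary. `rank_tasks` is a hypothetical helper, and tasks that never reported the chosen scalar are left out.

```python
def rank_tasks(task_metrics, title, series, key="last", higher_is_better=True):
    """Rank (task_name, metrics) pairs by one reported scalar (hypothetical helper).

    Assumes `metrics` has the shape {title: {series: {"last"/"min"/"max": value}}},
    as returned by Task.get_last_scalar_metrics(). Tasks without the scalar are skipped.
    """
    scored = []
    for name, metrics in task_metrics:
        value = metrics.get(title, {}).get(series, {}).get(key)
        if value is not None:
            scored.append((name, value))
    return sorted(scored, key=lambda item: item[1], reverse=higher_is_better)


# Usage with the real SDK (project, title, and series names are placeholders):
# from clearml import Task
# tasks = Task.get_tasks(project_name='great project')
# board = rank_tasks([(t.name, t.get_last_scalar_metrics()) for t in tasks],
#                    title='accuracy', series='validation')
```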
## What's Next?
This covers the basics of ClearML! Running through this guide, you've learned how to log Parameters, Artifacts, and Metrics!
If you want to learn more, see how we approach the data science process on our [best practices](best_practices.md) page,
or check out these pages:
- Scale your work and deploy [ClearML Agents](../../clearml_agent.md)
- Develop on remote machines with [ClearML Session](../../apps/clearml_session.md)
- Structure your work and put it into [Pipelines](../../pipelines/pipelines.md)
- Improve your tasks with [Hyperparameter Optimization](../../fundamentals/hpo.md)
- Check out ClearML's integrations with your favorite ML frameworks like [TensorFlow](../../integrations/tensorflow.md),
[PyTorch](../../integrations/pytorch.md), [Keras](../../integrations/keras.md),
and more.
## YouTube Playlist
All these tips and tricks are also covered in ClearML's **Getting Started** series on YouTube. Go check it out :)
[![Watch the video](https://img.youtube.com/vi/kyOfwVg05EM/hqdefault.jpg)](https://www.youtube.com/watch?v=kyOfwVg05EM&list=PLMdIlCuMqSTnoC45ME5_JnsJX0zWqDdlO&index=3)