Reorder and edit fundamentals/task sections (#5)

Refactor Task description - Clarify workflow and lifecycle
This commit is contained in:
pollfly 2021-07-08 14:50:12 +03:00 committed by GitHub
parent 9e7dfb49c6
commit cce29c3c1e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23


@@ -2,119 +2,123 @@
title: Task / Experiment
---
**ClearML Task** lies at the heart of ClearML's experiment manager. A Task is an object that holds
all the execution information: Code, Environment, Parameters, Artifacts, and Results.
A Task is a single code execution session. To transform an existing script into a Task, one must call [Task.init()](../references/sdk/task.md#taskinit)
which creates a Task object that automatically captures:
* Git information
* Python environment
* Parameters in the code
* Uncommitted code
* Outputs of the execution (e.g. console output, TensorBoard, logs, etc.)
A Task can represent an experiment, a step in a workflow, a workflow controller,
or any custom implementation you choose.
Previously executed Tasks can be accessed and utilized with code. It's possible to copy a Task multiple times and modify its:
* Arguments
* Environment (e.g. repo commit ID, Python packages)
* Configurations (e.g. command line arguments, configuration files, etc.)
To transform an existing script into a **ClearML Task**, one must call the [Task.init()](../references/sdk/task.md#taskinit) method
and specify a task name and its project. This creates a Task object that automatically captures code execution
information as well as execution outputs.
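For example, a minimal sketch (the project and task names below are placeholders):

```python
from clearml import Task

# Initializing a Task starts capturing the execution information and outputs
task = Task.init(
    project_name='examples',     # created automatically if it does not exist
    task_name='my experiment'
)
```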
All the information captured by a task is by default uploaded to the [ClearML Server](../deploying_clearml/clearml_server.md)
and it can be visualized in the [ClearML WebApp](../webapp/webapp_overview.md) (UI). ClearML can also be configured to upload
model checkpoints, artifacts, and charts to cloud storage (see [Storage](../integrations/storage.md)).
Tasks in the system can be accessed and utilized with code. To [access a task](#accessing-tasks), it can be identified either by a
project name & task name combination or by a unique ID.
It's possible to copy ([clone](../webapp/webapp_exp_reproducing.md)) a task multiple times and to modify it for re-execution.
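For instance, a task already in the system might be retrieved with `Task.get_task` (a sketch; the names and ID below are placeholders):

```python
from clearml import Task

# Fetch a task by its project & task name combination...
task = Task.get_task(project_name='examples', task_name='my experiment')

# ...or by its unique ID
same_task = Task.get_task(task_id='<task-id>')
```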
### Projects and Sub Projects
In ClearML, Tasks are organized into projects. Projects are logical entities (similar to folders) that group tasks. Users can decide
how to group tasks, but different models or objectives are usually grouped into different projects.
Projects can be further divided into sub-projects (and sub-sub-projects, etc.)
just like files and subdirectories on a computer, making experiment organization easier.
## Task sections
A Task is composed of multiple sections, linked together for traceability.
After a Task has been initialized, it's possible to track, visualize, and, depending on its status, edit Task details, including:
* [Execution Information](#execution)
* [Configuration Parameters](#configuration)
* [Artifacts](#artifacts).
The sections of **ClearML Task** are made up of the information that a task captures and stores, which consists of code
execution details and execution outputs. This information is used for tracking
and visualizing results, reproducing, tuning, and comparing experiments, and executing workflows.
The captured [code execution information](../webapp/webapp_exp_track_visual.md#execution-details) includes:
* Git information
* Uncommitted code modifications
* Python environment
* Execution [configuration](../webapp/webapp_exp_track_visual.md#configuration)
### Execution
The environment for executing the experiment.
The captured [execution output](../webapp/webapp_exp_track_visual.md#experiment-results) includes:
* [Console output](../webapp/webapp_exp_track_visual.md#console)
* [Scalars](../webapp/webapp_exp_track_visual.md#scalars)
* [Plots](../webapp/webapp_exp_track_visual.md#other-plots)
* [Debug samples](../webapp/webapp_exp_track_visual.md#debug-samples)
* [Models](artifacts.md#models)
#### Source code
- Repository / Commit - Saves a reference to the git repository and specific commit ID of the current experiment.
- Script Path - Stores the entry point script for the experiment.
- Working Directory - The working directory for the current experiment. This is relative to the root git repository folder.
To view a more in-depth description of each task section, see [Tracking Experiments and Visualizing Results](../webapp/webapp_exp_track_visual.md).
#### Uncommitted changes
Stores the uncommitted changes of the current experiment. If the experiment has no git repository, the entire
experiment script file is stored here (ClearML stores only a single file; when using more than a single script for
an experiment, please use git :smile:).
## Task types
#### Installed packages
Stores a list of all the packages that the experiment uses, including their specific versions.
Only directly imported packages appear here, to make sure the important packages and versions used
by the experiment are captured.
The section itself is fully compatible with the Python `requirements.txt` standard, and is fully editable.
Tasks have a *type* attribute, which denotes their purpose (Training / Testing / Data processing). This helps to further
organize projects and ensure tasks are easy to [search and find](#querying--searching-tasks). The default task type is *training*.
Available task types are:
- Experimentation
#### Base docker image
Specify the required docker image for remote execution of the code (see [ClearML Agent](../clearml_agent)).
A remote machine will execute the entire experiment inside the requested docker container.
It's also possible to add parameters for the docker execution. For example:
`nvcr.io/nvidia/pytorch:20.11-py3 --ipc=host`
- *training*, *testing*, *inference*
#### Output destination
The storage target to which all models / snapshots are automatically uploaded. This applies
mostly when an experiment is executed by an agent; read more on [Agents](../clearml_agent.md) and [Storage](../integrations/storage) integration.
- Other workflows
- *controller*, *optimizer*
- *monitor*, *service*, *application*
- *data_processing*, *qc*
- *custom*
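The type can be set when a task is created, for example through the `task_type` argument of `Task.init` (a sketch; the names are placeholders):

```python
from clearml import Task

# Create a data-processing task instead of the default 'training' type
task = Task.init(
    project_name='examples',
    task_name='preprocess dataset',
    task_type=Task.TaskTypes.data_processing
)
```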
### Configuration
A configuration is a set of arguments / dictionaries / files used to define the experiment (read more [here](hyperparameters)).
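As a minimal sketch, a parameter dictionary might be attached with `task.connect` (the dictionary contents are placeholders):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='config demo')

# Connect a parameter dictionary; values become editable in the UI,
# and remote executions feed any modified values back into the code
params = {'batch_size': 32, 'learning_rate': 0.001}
params = task.connect(params)
```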
## Task lifecycle
#### User properties
An editable key / value store that enables adding information to an experiment after execution, making it easier to search / filter.
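A sketch of adding such properties with `Task.set_user_properties` (the property names and values are placeholders):

```python
from clearml import Task

task = Task.get_task(project_name='examples', task_name='my experiment')

# Attach searchable key / value properties, even after the run has finished
task.set_user_properties(backbone='resnet50', dataset_version='v2')
```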
ClearML Tasks are created in one of the following ways:
* Manually running code that is instrumented with the ClearML SDK and invokes `Task.init()`.
* Cloning an existing task.
* Creating a task via CLI using [clearml-task](../apps/clearml_task.md).
#### Hyperparameters
- Args - Command line arguments of the experiment process. `argparse` values are automatically detected and logged here.
- Environment - Specific [Environment variables](../configs/env_vars.md) to be logged.
- General - The default section name for a general purpose dictionary of parameters that are logged. See the `name`
parameter of [`Task.connect`](../references/sdk/task#connect).
- *user_section* - Custom sections for logged Python dictionaries & objects.
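For example, `argparse` values are picked up automatically once `Task.init` is called (a sketch):

```python
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=10)

task = Task.init(project_name='examples', task_name='argparse demo')

# The parsed values are automatically logged under the Args section
args = parser.parse_args()
```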
### Logging Task Information
#### Configuration object
- General - Default section for a dictionary or configuration file to store as plain text configuration. Modifiable when executed
by an agent.
- *user_section* - Multiple configuration files (or dictionaries) are supported; name each configuration section. Modifiable
when executed by an agent.
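A configuration dictionary or file might be attached with `task.connect_configuration` (a sketch; the section names and file path are placeholders):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='config objects')

# Attach a configuration dictionary under a named section
model_config = task.connect_configuration({'layers': 4}, name='model')

# Attach a configuration file under another named section
data_config_path = task.connect_configuration('data.yaml', name='data')
```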
![Logging ClearML Task information diagram](../img/clearml_logging_diagram.png)
### Artifacts
Artifacts are a way to store the outputs of an experiment, and later use those outputs as inputs in other processes.
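For example, an object might be stored as an artifact with `task.upload_artifact` (a sketch; the names are placeholders):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact demo')

# Upload an object; it is stored on the configured storage target
task.upload_artifact(name='eval_results', artifact_object={'accuracy': 0.92})
```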
The above diagram describes how execution information is recorded when running code instrumented with ClearML:
See more information on [Artifacts](artifacts).
1. Once a ClearML Task is initialized, ClearML automatically logs the complete environment information
including:
* Source code
* Python environment
* Configuration parameters.
1. As the execution progresses, any outputs produced are recorded including:
* Console logs
* Metrics and graphs
* Models and other artifacts
1. Once the script terminates, the task will change its status to either `Completed`, `Failed`, or `Aborted` (see [Task states](#task-states) below).
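The resulting state can also be inspected from code, e.g. with `task.get_status` (a sketch):

```python
from clearml import Task

task = Task.get_task(project_name='examples', task_name='my experiment')

# Returns the current state as a string, e.g. 'completed' or 'failed'
print(task.get_status())
```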
#### Models
- **Input Model** - Any model weights file loaded by the experiment will appear here.
- **Output Model** - Any stored weights file / model will be logged here. This is useful for searching and connecting output models to
inference pipelines for production automation.
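As a sketch, assuming the task has already recorded models, they might be accessed through the `Task.models` property:

```python
from clearml import Task

task = Task.get_task(project_name='examples', task_name='my experiment')

# 'input' and 'output' each hold the models recorded for the task
for model in task.models['output']:
    print(model.name, model.url)
```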
All information logged can be viewed in the [task details UI](../webapp/webapp_exp_track_visual.md).
### Results
Results recorded in the task. Supports text, graphs, plots, images, audio, and more, including automatic reports by TensorBoard and Matplotlib.
See [logger](logger).
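Results can also be reported explicitly through the task's logger, for example (a sketch):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='manual reporting')
logger = task.get_logger()

# Report a point of a scalar series; it appears under the Scalars tab
logger.report_scalar(title='loss', series='train', value=0.26, iteration=1)
```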
### Cloning Tasks
![ClearML Task lifecycle diagram](../img/clearml_task_life_cycle_diagram.png)
#### Console
Stdout and stderr outputs will appear here automatically.
The above diagram demonstrates how a previously run task can be used as a baseline for experimentation:
#### Scalars
Any time-series graphs appear here, such as TensorBoard scalars, scalar reporting from code, and machine performance (CPU / GPU / network, etc.).
1. A previously run task is cloned, creating a new task, in `Draft` mode (see [Task states](#task-states) below).
The new task retains all the source task's configuration. The original task's outputs are not carried over.
1. The new task's configuration is modified to reflect the desired parameters for the new execution.
1. The new task is enqueued for execution.
1. A `clearml-agent` servicing the queue pulls the new task and executes it (where ClearML again logs all the execution outputs).
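Programmatically, the same flow might look like this (a sketch; the task names and queue name are placeholders):

```python
from clearml import Task

baseline = Task.get_task(project_name='examples', task_name='my experiment')

# Clone the baseline; the copy starts in Draft mode and is editable
cloned = Task.clone(source_task=baseline, name='my experiment - variant')

# Modify the clone's configuration as needed, then enqueue it for an agent
Task.enqueue(cloned, queue_name='default')
```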
#### Plots
Non-time-series plots appear here, such as TensorBoard histograms / distributions and Matplotlib plots (with the exception of `imshow` plots). <br/>
It's also possible to report plots directly to ClearML (e.g. 2D / 3D scatter plots, tables, generic Plotly objects, etc.).
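For example, a 2D scatter plot might be reported like this (a sketch):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='plot demo')

# Report a 2D scatter plot; each entry is an (x, y) pair
scatter = [[0, 1], [1, 3], [2, 2]]
task.get_logger().report_scatter2d(
    title='example scatter', series='series A',
    iteration=0, scatter=scatter, xaxis='x', yaxis='y', mode='markers'
)
```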
## Task states
#### Debug samples
Any media (image / audio / HTML) is saved here.
Media reported to TensorBoard is saved here, as well as images shown with `matplotlib.pyplot.imshow`.<br/>
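Debug samples can also be reported manually, for example (a sketch using a random image):

```python
import numpy as np
from clearml import Task

task = Task.init(project_name='examples', task_name='debug samples')

# Report an image as a debug sample
image = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)
task.get_logger().report_image(title='debug', series='random', iteration=0, image=image)
```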
The state of a Task represents its stage in the Task lifecycle. It indicates whether the Task is read-write (editable) or
read-only. For each state, a state transition indicates which actions can be performed on an experiment, and the new state
after performing an action.
The following table describes the Task states and state transitions.
| State | Description / Usage | State Transition |
|---|---|---|
| *Draft* | The experiment is editable. Only experiments in *Draft* mode are editable. The experiment is not running locally or remotely. | If the experiment is enqueued for a [worker](../fundamentals/agents_and_queues.md) to fetch and execute, the state becomes *Pending*. |
| *Pending* | The experiment was enqueued and is waiting in a queue for a worker to fetch and execute it. | If the experiment is dequeued, the state becomes *Draft*. |
| *Running* | The experiment is running locally or remotely. | If the experiment is manually or programmatically terminated, the state becomes *Aborted*. |
| *Completed* | The experiment ran and terminated successfully. | If the experiment is reset or cloned, the state of the reset experiment or the newly cloned experiment becomes *Draft*. Resetting deletes the logs and output of a previous run. Cloning creates an exact, editable copy. |
| *Failed* | The experiment ran and terminated with an error. | The same as *Completed*. |
| *Aborted* | The experiment ran, and was manually or programmatically terminated. | The same as *Completed*. |
| *Published* | The experiment is read-only. Publish an experiment to prevent changes to its inputs and outputs. | A *Published* experiment cannot be reset. If it is cloned, the state of the newly cloned experiment becomes *Draft*. |
It's also possible to manually report media / links that an experiment produces using the Logger interface. See [`Logger.report_media`](../references/sdk/logger.md#report_media).<br/>
## Usage
@@ -145,13 +149,14 @@ task = Task.init(
)
```
Once a Task is created, the Task object can be accessed from anywhere in the code by calling [`Task.current_task()`](../references/sdk/task.md#taskcurrent_task).
If multiple Tasks need to be created in the same process (for example, for logging multiple manual runs),
make sure to close a Task before initializing a new one. To close a task, simply call `task.close()`
(see example [here](https://github.com/allegroai/clearml/blob/master/examples/advanced/multiple_tasks_single_process.py)).
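A minimal sketch of that pattern:

```python
from clearml import Task

first = Task.init(project_name='examples', task_name='run 1')
# ... do some work ...
first.close()  # close before creating the next Task in this process

second = Task.init(project_name='examples', task_name='run 2')
```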
When initializing a Task, its project needs to be specified. If the project entered does not exist, it will be created.
Projects can be divided into sub-projects, just like folders are broken into sub-folders.
For example:
```python
Task.init(project_name='main_project/sub_project', task_name='test')
@@ -169,6 +174,7 @@ In order to mitigate the clutter that a multitude of debugging Tasks might creat
* The previous Task execution did not have any artifacts / models
It's possible to always create a new Task by passing `reuse_last_task_id=False`.
See full `Task.init` documentation [here](../references/sdk/task.md#taskinit).
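For example, forcing a brand-new task (a sketch):

```python
from clearml import Task

# Force a new task instead of reusing the last unused draft
task = Task.init(
    project_name='examples', task_name='debug run', reuse_last_task_id=False
)
```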
### Empty Task Creation
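A sketch of creating a task entry without running its code, using `Task.create` (the names are placeholders):

```python
from clearml import Task

# Create a new task without executing anything locally
task = Task.create(project_name='examples', task_name='empty task')
```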
@@ -226,7 +232,7 @@ task_list = Task.get_tasks(
task_filter=None # Optional[Dict]
)
```
We can search for tasks either by their UUID or by their project / name combination.
It's possible to also filter Tasks by passing filtering rules to `task_filter`.
For example:
```python
@@ -295,6 +301,8 @@ task.execute_remotely(
Once the function is called on the machine, it will stop the local process and enqueue the current Task into the *default*
queue. From there, an agent will be able to pick it up and launch it.
See the [Remote Execution](https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py) example.
#### Remote Function Execution
A specific function can also be launched on a remote machine with `create_function_task`.
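A sketch of launching a function as a separate Task (the function and its arguments are placeholders):

```python
from clearml import Task

def process(batch_size=32):
    print('processing with batch size', batch_size)

task = Task.init(project_name='examples', task_name='main task')

# Create a new Task that will execute `process`, overriding its arguments
func_task = task.create_function_task(
    process, func_name='process', task_name='process data', batch_size=64
)
```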
@@ -318,74 +326,3 @@ Like any other arguments, they can be changed from the UI or programmatically.
Function Tasks must be created from within a regular Task, created by calling `Task.init()`
:::