Tasks
ClearML Task lies at the heart of ClearML's experiment manager.
A Task is a single code execution session, which can represent an experiment, a step in a workflow, a workflow controller, or any custom implementation you choose.
To transform an existing script into a ClearML Task, call the Task.init() method and specify a task name and its project. This creates a Task object that automatically captures code execution information as well as execution outputs.
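For example, a minimal sketch (the project and task names below are placeholders):

```python
from clearml import Task

# Initialize a task; from this point ClearML captures code, environment, and outputs.
# "examples" and "hello clearml" are placeholder project / task names.
task = Task.init(project_name="examples", task_name="hello clearml")

# ... the rest of the script runs unchanged ...
```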
All the information captured by a task is by default uploaded to the ClearML Server, and it can be visualized in the ClearML WebApp (UI). ClearML can also be configured to upload model checkpoints, artifacts, and charts to cloud storage (see Storage). Additionally, you can work with tasks in Offline Mode, in which all information is saved in a local folder (see Storing Task Data Offline).
In the UI and code, tasks are grouped into projects, which are logical entities similar to folders. Users can decide how to group tasks, though different models or objectives are usually grouped into different projects.
Existing tasks can be accessed and used programmatically. A task can be identified either by a project name and task name combination, or by its unique ID.
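For example, a minimal sketch of fetching an existing task (names and the ID are placeholders):

```python
from clearml import Task

# Fetch a task by project and task name (placeholder names)
task = Task.get_task(project_name="examples", task_name="hello clearml")

# Or fetch it directly by its unique ID (placeholder ID)
same_task = Task.get_task(task_id="<task-id>")
```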
It's possible to copy (clone) a task multiple times and to modify it for re-execution.
Task Sections
A ClearML task's sections are made up of the information that the task captures and stores: code execution details and execution outputs. This information is used for tracking and visualizing results; reproducing, tuning, and comparing experiments; and executing workflows.
The captured code execution information includes:
- Git information
- Uncommitted code modifications
- Python environment
- Execution configuration and hyperparameters
The captured execution output includes:
- Console logs
- Metrics and graphs
- Models and other artifacts
To view a more in-depth description of each task section, see Tracking Experiments and Visualizing Results.
Execution Configuration
ClearML logs a task's hyperparameters, whether they are specified as command-line arguments, environment variables, or code-level variables. This allows experiments to be reproduced, and their hyperparameters and results to be saved and compared, which is key to understanding model behavior.
Hyperparameters can be added from anywhere in your code, and ClearML provides multiple ways to log them. If you specify your parameters using popular Python packages, such as argparse and click, all you need to do is initialize a task, and ClearML will automatically log the parameters. ClearML also provides methods to explicitly report parameters.
When a task is executed by a ClearML agent, ClearML's instrumentation of your code lets you use the ClearML UI to override the originally specified parameter values.
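For instance, a minimal sketch of both automatic and explicit hyperparameter logging (the parameter names and values are placeholders):

```python
from argparse import ArgumentParser
from clearml import Task

# Initializing the task is enough for ClearML to capture argparse arguments
task = Task.init(project_name="examples", task_name="hyperparameter logging")

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)  # placeholder argument
args = parser.parse_args()  # logged automatically once parsed

# Explicitly log additional parameters as a dictionary (placeholder values)
params = {"batch_size": 64, "epochs": 10}
params = task.connect(params)  # values may be overridden when run by an agent
```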
See Hyperparameters for more information.
Artifacts
ClearML lets you store experiments' output products as artifacts that can later be accessed and used through the web UI or programmatically.
ClearML provides methods to easily track files generated throughout your experiments' execution, such as:
- Numpy objects
- Pandas DataFrames
- PIL images
- Files and folders
- Python objects
- and more!
Most importantly, ClearML also logs experiments’ input and output models as well as interim model snapshots (see Models).
Logging Artifacts
ClearML provides an explicit logging interface that supports manually reporting a variety of artifacts. Any type of artifact can be logged to a task using the Task.upload_artifact method.
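A minimal sketch (the artifact names and contents are placeholders):

```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact logging")

# Upload a pandas DataFrame as an artifact (placeholder name and contents)
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
task.upload_artifact(name="predictions", artifact_object=df)

# A local file path can be uploaded the same way (placeholder path)
task.upload_artifact(name="config file", artifact_object="config.yaml")
```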
See more details in the Artifacts Reporting example.
ClearML can be configured to upload artifacts to any of the supported types of storage, which include local and shared folders, AWS S3 buckets, Google Cloud Storage, and Azure Storage. For more information, see Storage.
:::note Debug Sample Storage
Debug samples are handled differently; see Logger.set_default_upload_destination.
:::
Accessing Artifacts
Artifacts that have been logged can be accessed by other tasks by first getting the task the artifact is attached to, and then retrieving the artifact with one of the following methods:
- get_local_copy() - caches the files for later use and returns a path to the cached file.
- get() - returns the Python object.
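A minimal sketch of retrieving artifacts from another task (the task and artifact names are placeholders and match the upload sketch above):

```python
from clearml import Task

# Get the task the artifacts are attached to (placeholder names)
source_task = Task.get_task(project_name="examples", task_name="artifact logging")

# Retrieve the DataFrame object directly
df = source_task.artifacts["predictions"].get()

# Or get a local path to a cached copy of the uploaded file
local_path = source_task.artifacts["config file"].get_local_copy()
```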
See more details in the Using Artifacts example.
Task Types
Tasks have a type attribute, which denotes their purpose (e.g. training / testing / data processing). This helps to further organize projects and ensure tasks are easy to search and find. Available task types are:
- training (default) - Training a model
- testing - Testing a component, for example model performance
- inference - Model inference job (e.g. offline / batch model execution)
- controller - A task that lays out the logic for other tasks’ interactions, manual or automatic (e.g. a pipeline controller)
- optimizer - A specific type of controller for optimization tasks (e.g. hyperparameter optimization)
- service - Long lasting or recurring service (e.g. server cleanup, auto ingress, sync services etc.)
- monitor - A specific type of service for monitoring
- application - A task implementing custom applicative logic, like auto-scaler or clearml-session
- data_processing - Any data ingress / preprocessing (see ClearML Data)
- qc - Quality Control (e.g. evaluating model performance vs. blind dataset)
- custom - A task not matching any of the above
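A task's type can be set when it is initialized; a minimal sketch (project and task names are placeholders):

```python
from clearml import Task

# Create a data processing task instead of the default "training" type
task = Task.init(
    project_name="examples",
    task_name="prepare dataset",
    task_type=Task.TaskTypes.data_processing,
)
```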
Task Lifecycle
ClearML Tasks are created in one of the following ways:
- Manually running code that is instrumented with the ClearML SDK and invokes Task.init().
- Cloning an existing task.
- Creating a task via CLI using clearml-task.
Logging Task Information
The above diagram describes how execution information is recorded when running code instrumented with ClearML:
- Once a ClearML Task is initialized, ClearML automatically logs the complete environment information, including:
  - Source code
  - Python environment
  - Configuration parameters
- As the execution progresses, any outputs produced are recorded, including:
  - Console logs
  - Metrics and graphs
  - Models and other artifacts
- Once the script terminates, the task changes its status to either Completed, Failed, or Aborted (see Task States below).
All information logged can be viewed in the task details UI.
Cloning Tasks
The above diagram demonstrates how a previously run task can be used as a baseline for experimentation:
- A previously run task is cloned, creating a new task in Draft mode (see Task States below). The new task retains all the source task's configuration; the original task's outputs are not carried over.
- The new task's configuration is modified to reflect the desired parameters for the new execution.
- The new task is enqueued for execution.
- A clearml-agent servicing the queue pulls the new task and executes it (where ClearML again logs all the execution outputs).
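This flow can also be driven programmatically; a minimal sketch, assuming a queue named "default" exists and the task names and parameter are placeholders:

```python
from clearml import Task

# Clone a previously run task (placeholder names); the clone starts in Draft mode
template = Task.get_task(project_name="examples", task_name="hello clearml")
cloned = Task.clone(source_task=template, name="hello clearml - variant")

# Modify the clone's configuration before execution (placeholder parameter name)
cloned.set_parameter("General/lr", 0.01)

# Enqueue the clone; an agent servicing the queue will pull and execute it
Task.enqueue(task=cloned, queue_name="default")
```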
Task States
The state of a task represents its stage in the task lifecycle. It indicates whether the task is read-write (editable) or read-only. For each state, a state transition indicates which actions can be performed on an experiment, and the new state after performing an action.
The following table describes the task states and state transitions.
State | Description / Usage | State Transition |
---|---|---|
Draft | The experiment is editable. Only experiments in Draft mode are editable. The experiment is not running locally or remotely. | If the experiment is enqueued for a worker to fetch and execute, the state becomes Pending. |
Pending | The experiment was enqueued and is waiting in a queue for a worker to fetch and execute it. | If the experiment is dequeued, the state becomes Draft. |
Running | The experiment is running locally or remotely. | If the experiment is manually or programmatically terminated, the state becomes Aborted. |
Completed | The experiment ran and terminated successfully. | If the experiment is reset or cloned, the state of the reset experiment or newly cloned experiment becomes Draft. Resetting deletes the logs and output of a previous run. Cloning creates an exact, editable copy. |
Failed | The experiment ran and terminated with an error. | The same as Completed. |
Aborted | The experiment ran, and was manually or programmatically terminated. The server's non-responsive task monitor aborts a task automatically after no activity has been detected for a specified time interval (configurable). | The same as Completed. |
Published | The experiment is read-only. Publish an experiment to prevent changes to its inputs and outputs. | A Published experiment cannot be reset. If it is cloned, the state of the newly cloned experiment becomes Draft. |
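For example, a minimal sketch of inspecting and resetting a task's state programmatically (the project and task names are placeholders):

```python
from clearml import Task

task = Task.get_task(project_name="examples", task_name="hello clearml")

# Returns the current state as a string, e.g. "completed" or "failed"
print(task.get_status())

# Reset a non-published task back to Draft; this deletes the previous run's logs and outputs
task.reset()
```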
SDK Interface
See the task SDK interface for an overview of the most basic Pythonic methods of the Task class.
See the Task reference page for a complete list of available methods.