
Task / Experiment

A ClearML Task lies at the heart of ClearML's experiment manager. A Task is an object that holds all the execution information: Code, Environment, Parameters, Artifacts, and Results.

A Task is a single code execution session. To transform an existing script into a Task, call Task.init(), which creates a Task object that automatically captures:

  • Git information
  • Python environment
  • Parameters in the code
  • Uncommitted code
  • Outputs of the execution (e.g. console output, TensorBoard, logs, etc.)

Previously executed Tasks can be accessed and utilized with code. It's possible to copy a Task multiple times and modify its:

  • Arguments
  • Environment (e.g. repo commit ID, Python packages)
  • Configurations (e.g. command line arguments, configuration files, etc.)

In ClearML, Tasks are organized into projects, and Tasks can be identified either by a project name & task name combination or by a unique ID.

Projects and Sub Projects

In ClearML, Tasks are organized into projects. Projects are logical entities (similar to folders) that group tasks. Users can decide how to group tasks, but different models or objectives are usually grouped into different projects. Projects can be further divided into sub-projects (and sub-sub-projects, etc.) just like files and subdirectories on a computer, making experiment organization easier.

Task sections

A Task is comprised of multiple sections, linked together for traceability. After a Task has been initialized, it's possible to track, visualize, and, depending on its status, edit Task details, including:

Execution

The environment for executing the experiment.

Source code:

  • Repository / Commit - Saves a reference to the git repository and specific commit ID of the current experiment.
  • Script Path - Stores the entry point script for the experiment.
  • Working Directory - The working directory for the current experiment. This is relative to the root git repository folder.

Uncommitted changes

Stores the uncommitted changes of the current experiment. If the experiment is not part of a git repository, the entire experiment script file is stored here instead (ClearML only stores a single file; when using more than a single script for an experiment, please use git 😄).

Installed packages

Stores a list of all the packages that the experiment is using, including the specific version of the packages. Only directly imported packages will appear here. This is done to make sure the important packages and versions used by the experiment are captured. The section itself is fully compatible with the Python requirements.txt standard, and is fully editable.
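
If a specific package needs to be recorded explicitly (for example, one that is not directly imported), Task.add_requirements can be called before Task.init. A minimal sketch; the package name and version below are only illustrative:

from clearml import Task

# explicitly record a package (and optionally pin its version) before Task.init is called
Task.add_requirements('pandas', '1.3.0')

task = Task.init(project_name='example', task_name='explicit requirements')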

Base docker image

Specify the required Docker image for remote execution of the code (see ClearML Agent). A remote machine will execute the entire experiment inside the requested Docker container. It's also possible to add parameters for the Docker execution. For example: nvcr.io/nvidia/pytorch:20.11-py3 --ipc=host
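
The base image can also be set from code with Task.set_base_docker; a minimal sketch, reusing the example image and arguments above:

from clearml import Task

task = Task.init(project_name='example', task_name='docker example')
# request a Docker image (plus optional docker arguments) for remote execution by an agent
task.set_base_docker('nvcr.io/nvidia/pytorch:20.11-py3 --ipc=host')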

Output destination

The storage target to which all models / snapshots are automatically uploaded. This is applicable mostly when an experiment is executed by an agent; read more on Agents and Storage integration here.
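
The destination can also be set when the Task is created, for example through the output_uri argument of Task.init; a minimal sketch (the bucket path is only an illustration):

from clearml import Task

# models / snapshots created by this Task will be uploaded to the given storage target
task = Task.init(
    project_name='example',
    task_name='training with uploads',
    output_uri='s3://my-bucket/training-outputs'
)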

Configuration

Configurations are a set of arguments / dictionaries / files used to define the experiment (read more here).

User properties

Editable key / value store, which enables adding information to an experiment after execution, making it easier to search / filter.
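
User properties can also be set programmatically with Task.set_user_properties; a minimal sketch (the property names and values are illustrative):

from clearml import Task

task = Task.init(project_name='example', task_name='user properties')
# attach searchable key / value properties to the experiment
task.set_user_properties(dataset_version='v2.1', reviewed_by='alice')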

Hyperparameters

  • Args - Command line arguments of the experiment process. argparse values are automatically detected and logged here.
  • Environment - Specific Environment variables to be logged.
  • General - The default section name for a general purpose dictionary of parameters that are logged. See the name parameter of Task.connect.
  • user_section - Custom section name for Python dictionaries & objects that are logged (see the sketch below).
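
A minimal sketch of logging a parameter dictionary under a custom hyperparameter section with Task.connect (the parameter names and values are illustrative):

from clearml import Task

task = Task.init(project_name='example', task_name='hyperparameters')

params = {'batch_size': 64, 'learning_rate': 0.001}
# log the dictionary under a custom section in the Hyperparameters tab
task.connect(params, name='user_section')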

Configuration object:

  • General - Default section for a dictionary or configuration file stored as plain text configuration. Modifiable when executed by an agent.
  • user_section - Supports multiple configuration files (or dictionaries); name each configuration section (as shown below). Modifiable when executed by an agent.
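
Configuration objects can be attached with Task.connect_configuration, which accepts either a dictionary or a configuration file; a minimal sketch using a dictionary (the values are illustrative):

from clearml import Task

task = Task.init(project_name='example', task_name='configuration objects')

model_config = {'num_layers': 4, 'dropout': 0.25}
# store the dictionary as a named configuration object
task.connect_configuration(model_config, name='user_section')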

Artifacts

Artifacts are a way to store the outputs of an experiment, and later use those outputs as inputs in other processes.

See more information on Artifacts.
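
For example, a Pandas DataFrame can be uploaded as an artifact by one Task and retrieved later by another process (names and values are illustrative):

from clearml import Task
import pandas as pd

task = Task.init(project_name='example', task_name='artifacts example')

df = pd.DataFrame({'epoch': [1, 2], 'accuracy': [0.71, 0.78]})
# store the DataFrame as an artifact named 'results'
task.upload_artifact(name='results', artifact_object=df)

# in another process, retrieve the artifact from the stored Task
source_task = Task.get_task(project_name='example', task_name='artifacts example')
local_copy = source_task.artifacts['results'].get_local_copy()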

Models

  • Input Model - Any model weights file loaded by the experiment will appear here.
  • Output Model - Any stored weights file / model will be logged here. This is useful for searching and connecting output models to inference pipelines for production automation.

Results

Results recorded in the task. Supports text, graphs, plots, images, audio, and more, including automatic reports from TensorBoard and Matplotlib. See Logger.

Console

Stdout and stderr outputs will appear here automatically.

Scalars

Any time-series graphs appear here, such as TensorBoard scalars, scalar reporting from code, and machine performance (CPU / GPU / network, etc.).
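
Scalars can also be reported explicitly through the Logger interface; a minimal sketch (the title and series names are illustrative):

from clearml import Task

task = Task.init(project_name='example', task_name='scalar reporting')
logger = task.get_logger()

for iteration in range(10):
    # each call adds one point to the 'train' series of the 'loss' graph
    logger.report_scalar(title='loss', series='train', value=1.0 / (iteration + 1), iteration=iteration)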

Plots

Non-time-series plots appear here, such as TensorBoard histograms / distributions and Matplotlib plots (with the exception of imshow plots).
It's also possible to report plots directly to ClearML (e.g. 2D / 3D scatter plots, tables, generic Plotly objects, etc.).
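
For example, a 2D scatter plot can be reported directly; a minimal sketch (the data points are illustrative):

from clearml import Task

task = Task.init(project_name='example', task_name='plot reporting')
logger = task.get_logger()

scatter = [[0, 1], [1, 3], [2, 2], [3, 5]]  # list of (x, y) pairs
# report a non-time-series 2D scatter plot to the Plots section
logger.report_scatter2d(
    title='example scatter',
    series='series A',
    scatter=scatter,
    iteration=0,
    mode='markers'
)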

Debug samples

Any reported media (image / audio / HTML) is saved here. Media reported to TensorBoard is saved here, as are images shown with matplotlib.pyplot.imshow.

It's also possible to manually report media or links that an experiment produces, using the Logger interface. See Logger.report_media.
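
A minimal sketch of reporting a local image file as a debug sample (the file path and titles are illustrative):

from clearml import Task

task = Task.init(project_name='example', task_name='media reporting')
logger = task.get_logger()

# upload a local file and show it under the Task's debug samples
logger.report_media(
    title='confusion matrix',
    series='validation',
    iteration=0,
    local_path='confusion_matrix.png'
)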

Usage

Task Creation

Task.init() is the main method used to create Tasks in ClearML. It will create a Task, and populate it with:

  • A link to the running git repository (including commit ID and local uncommitted changes)
  • Python packages used (i.e. directly imported Python packages, and the versions available on the machine)
  • Argparse arguments (default and specific to the current execution)
  • Reports to TensorBoard & Matplotlib, and model checkpoints
from clearml import Task


task = Task.init(
    project_name='example', 
    task_name='task template', 
    task_type=None,
    tags=None,
    reuse_last_task_id=True,
    continue_last_task=False,
    output_uri=None,
    auto_connect_arg_parser=True,
    auto_connect_frameworks=True,
    auto_resource_monitoring=True,
    auto_connect_streams=True,    
)

Once a Task is created, the Task object can be accessed from anywhere in the code by calling Task.current_task().

If multiple Tasks need to be created in the same process (for example, for logging multiple manual runs), make sure to close a Task before initializing a new one. To close a Task, simply call task.close() (see example here).
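
A minimal sketch of creating two Tasks sequentially in the same process:

from clearml import Task

first_task = Task.init(project_name='example', task_name='first run')
# ... log whatever belongs to the first run ...
first_task.close()

# a new Task can now be initialized in the same process
second_task = Task.init(project_name='example', task_name='second run')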

Projects can be divided into sub-projects, just like folders are broken into subfolders. For example:

Task.init(project_name='main_project/sub_project', task_name='test')

Nesting projects works on multiple levels. For example: project_name=main_project/sub_project/sub_sub_project

Task Reuse

Every Task.init call will create a new Task for the current execution. In order to mitigate the clutter that a multitude of debugging Tasks might create, a Task will be reused if:

  • The last time it was executed (on this machine) was under 72 hours ago (configurable, see sdk.development.task_reuse_time_window_in_hours in the sdk.development section of the ClearML configuration reference)
  • The previous Task execution did not have any artifacts/models

It's possible to always create a new Task by passing reuse_last_task_id=False. See full Task.init documentation here.
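
For example, to force a new Task on every execution:

from clearml import Task

# always create a new Task, even if the previous run would otherwise be reused
task = Task.init(
    project_name='example',
    task_name='always new',
    reuse_last_task_id=False
)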

Empty Task Creation

A Task can also be created without executing the code itself. Unlike the runtime detections, all the environment and configuration details need to be provided explicitly.

For example:

task = Task.create(
    project_name='example', 
    task_name='task template',
    repo='https://github.com/allegroai/clearml.git',
    branch='master',
    script='examples/reporting/html_reporting.py',
    working_directory='.',
    docker=None,
)

See Task.create in the Python SDK reference.

Accessing Tasks

A Task can be identified by its project and name, and by a unique identifier (UUID string). The name and project of a Task can be changed after an experiment has been executed, but its ID can't be changed.

Programmatically, Task objects can be retrieved by querying the system based on either the Task ID or a project and name combination. If a project / name combination is used, and multiple Tasks have the exact same name, the function will return the last modified Task.

For example:

  • Accessing a Task object with a Task ID:
a_task = Task.get_task(task_id='123456deadbeef')
  • Accessing a Task with a project / name:
a_task = Task.get_task(project_name='examples', task_name='artifacts')

Once a Task object is obtained, it's possible to query the state of the Task, reported scalars, etc. The Task's outputs, such as artifacts and models, can also be retrieved.
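
For example, a sketch of inspecting a retrieved Task's status, scalars, and artifacts (assuming the Task has already logged them):

from clearml import Task

a_task = Task.get_task(project_name='examples', task_name='artifacts')

print(a_task.get_status())                  # e.g. 'completed'
print(a_task.get_last_scalar_metrics())     # last reported scalar values
print(list(a_task.artifacts.keys()))        # names of the logged artifacts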

Querying \ Searching Tasks

Searching and filtering Tasks can be done via the web UI, but also programmatically. Input search parameters into the Task.get_tasks method, which returns a list of Task objects that match the search.

For example:

task_list = Task.get_tasks(
    task_ids=None,  # type: Optional[Sequence[str]]
    project_name=None,  # type: Optional[str]
    task_name=None,  # type: Optional[str]
    task_filter=None  # type: Optional[Dict]
)

Tasks can be searched for by either their UUID or their project / name combination. It's also possible to filter Tasks by passing filtering rules to task_filter. For example:

task_filter={
    # only Tasks with tag `included_tag` and without tag `excluded_tag`
    'tags': ['included_tag', '-excluded_tag'],
    # filter out archived Tasks
    'system_tags': ['-archived'],
    # only completed & published Tasks
    'status': ['completed', 'published'],
    # only training type Tasks
    'type': ['training'],
    # match text in Task comment or task name
    'search_text': 'reg_exp_text'
}

Cloning & Executing Tasks

Once a Task object is created, it can be copied (cloned). Task.clone returns a copy of the original Task (source_task). By default, the cloned Task is added to the same project as the original, and it's called "Clone Of ORIGINAL_NAME", but the name / project / comment of the cloned Task can be directly overridden.

cloned = Task.clone(
    source_task=task,  # type: Optional[Union[Task, str]]
    # override default name
    name='newly created task',  # type: Optional[str]
    comment=None,  # type: Optional[str]
    # insert cloned Task into a different project
    project=None,  # type: Optional[str]
)

A cloned Task starts in draft mode, so its Task configurations can be edited (see Task.set_parameters). Once a Task is modified, launch it by pushing it into an execution queue; a ClearML Agent will then pull it from the queue and execute the Task.

Task.enqueue(
    task=task,  # type: Union[Task, str]
    queue_name='default',  # type: Optional[str] 
    queue_id=None  # type: Optional[str]
)

See enqueue example.

Advanced Remote Execution

A compelling workflow is:

  1. Running code on the development machine for a few iterations, or just setting up the environment.
  2. Moving the execution to a beefier remote machine for the actual training.

For example, to stop the current manual execution, and then re-run it on a remote machine, simply add the following function call to the code:

task.execute_remotely(
    queue_name='default',  # type: Optional[str]
    clone=False,  # type: bool
    exit_process=True  # type: bool
)

Once the function is called on the machine, it will stop the local process and enqueue the current Task into the default queue. From there, an agent will be able to pick it up and launch it.

Remote Function Execution

A specific function can also be launched on a remote machine with create_function_task.

For example:

def run_me_remotely(some_argument):
    print(some_argument)

a_func_task = task.create_function_task(
    func=run_me_remotely,  # type: Callable
    func_name='func_id_run_me_remotely',  # type:Optional[str]
    task_name='a func task',  # type:Optional[str]
    # everything below will be passed directly to our function as arguments
    some_argument=123
)

Arguments passed to the function will be automatically logged under the Function section in the Hyperparameters tab. Like any other arguments, they can be changed from the UI or programmatically.

:::note
Function Tasks must be created from within a regular Task, created by calling Task.init()
:::

Task Lifecycle

ClearML Tasks are created in one of the following ways:

  • Manually running code that is instrumented with the ClearML SDK and invokes Task.init().
  • Cloning an existing task.
  • Creating a task via CLI using clearml-task.

Logging Task Information

image

The above diagram describes how execution information is recorded when running code instrumented with ClearML:

  1. Once a ClearML Task is initialized, ClearML automatically logs the complete environment information including:
    • Source code
    • Python environment
    • Configuration parameters.
  2. As the execution progresses, any outputs produced are recorded including:
    • Console logs
    • Metrics and graphs
    • Models and other artifacts
  3. Once the script terminates, the Task will change its status to either Completed, Failed, or Aborted.

All information logged can be viewed in the task details UI.

Cloning Tasks

image

The above diagram demonstrates how a previously run task can be used as a baseline for experimentation:

  1. A previously run task is cloned, creating a new task in draft mode.
    The new task retains all of the source task's configuration. The original task's outputs are not carried over.
  2. The new task's configuration is modified to reflect the desired parameters for the new execution.
  3. The new task is enqueued for execution.
  4. A clearml-agent servicing the queue pulls the new task and executes it (where ClearML again logs all of the execution outputs).

Task states

The state of a Task represents its stage in the Task lifecycle. It indicates whether the Task is read-write (editable) or read-only. For each state, a state transition indicates which actions can be performed on an experiment, and the new state after performing an action.

The following table describes the Task states and state transitions.

| State | Description / Usage | State Transition |
|---|---|---|
| Draft | The experiment is editable. Only experiments in Draft mode are editable. The experiment is not running locally or remotely. | If the experiment is enqueued for a worker to fetch and execute, the state becomes Pending. |
| Pending | The experiment was enqueued and is waiting in a queue for a worker to fetch and execute it. | If the experiment is dequeued, the state becomes Draft. |
| Running | The experiment is running locally or remotely. | If the experiment is manually or programmatically terminated, the state becomes Aborted. |
| Completed | The experiment ran and terminated successfully. | If the experiment is reset or cloned, the state of the newly cloned or reset experiment becomes Draft. Resetting deletes the logs and output of a previous run. Cloning creates an exact, editable copy. |
| Failed | The experiment ran and terminated with an error. | The same as Completed. |
| Aborted | The experiment ran, and was manually or programmatically terminated. | The same as Completed. |
| Published | The experiment is read-only. Publish an experiment to prevent changes to its inputs and outputs. | A Published experiment cannot be reset. If it is cloned, the state of the newly cloned experiment becomes Draft. |

Task types

Tasks also have a type attribute, which denotes their purpose (Training / Testing / Data processing). This helps to further organize projects and ensure Tasks are easy to search and find. The default Task type is training. Available Task types are:

  • Experimentation

    • training, testing, inference
  • Other workflows

    • controller, optimizer
    • monitor, service, application
    • data_processing, qc
    • custom
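
A minimal sketch of setting the Task type at creation time, using the Task.TaskTypes enum:

from clearml import Task

# create a data-processing Task instead of the default 'training' type
task = Task.init(
    project_name='example',
    task_name='preprocess dataset',
    task_type=Task.TaskTypes.data_processing
)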