clearml-docs/task.md at 7e907a0ffcf93a6af9a88fe9ef84b2df1d1227da

mirror of https://github.com/clearml/clearml-docs synced 2025-01-31 06:27:22 +00:00

2021-07-13 17:03:38 +03:00

Task / Experiment

ClearML Task lies at the heart of ClearML's experiment manager.

A Task is a single code execution session, which can represent an experiment, a step in a workflow, a workflow controller, or any custom implementation you choose.

To transform an existing script into a ClearML Task, one must call the Task.init() method and specify a task name and its project. This creates a Task object that automatically captures code execution information as well as execution outputs.

All the information captured by a task is by default uploaded to the ClearML Server and it can be visualized in the ClearML WebApp (UI). ClearML can also be configured to upload model checkpoints, artifacts, and charts to cloud storage (see Storage).

In the UI and code, tasks are grouped into projects, which are logical entities similar to folders. Users can decide how to group tasks, though different models or objectives are usually grouped into different projects. Projects can be divided into sub-projects (and sub-sub-projects, etc.) just like files and subdirectories on a computer, making experiment organization easier. In the WebApp, every project has an Overview tab, where a project description can be written and shared.

Tasks that are in the system can be accessed and utilized with code. To access a task, it can be identified either by a project name & task name combination or by a unique ID.

It's possible to copy (clone) a task multiple times and to modify it for re-execution.

Task sections

The sections of ClearML Task are made up of the information that a task captures and stores, which consists of code execution details and execution outputs. This information is used for tracking and visualizing results, reproducing, tuning, and comparing experiments, and executing workflows.

The captured code execution information includes:

Git information
Uncommitted code modifications
Python environment
Execution configuration

The captured execution output includes:

Console logs
Metrics and graphs
Models and other artifacts

For a more in-depth description of each task section, see Tracking Experiments and Visualizing Results.

Task types

Tasks have a type attribute, which denotes their purpose (for example, training, testing, or data processing). This helps to further organize projects and makes tasks easier to search for and find. The default task type is training. Available task types are:

Experimentation
- training, testing, inference
Other workflows
- controller, optimizer
- monitor, service, application
- data_processing, qc
- custom

Task lifecycle

ClearML Tasks are created in one of the following ways:

Manually running code that is instrumented with the ClearML SDK and invokes Task.init().
Cloning an existing task.
Creating a task via CLI using clearml-task.

Logging Task Information

The above diagram describes how execution information is recorded when running code instrumented with ClearML:

Once a ClearML Task is initialized, ClearML automatically logs the complete environment information, including:
- Source code
- Python environment
- Configuration parameters
As the execution progresses, any outputs produced are recorded, including:
- Console logs
- Metrics and graphs
- Models and other artifacts
Once the script terminates, the task changes its status to Completed, Failed, or Aborted (see Task states below).

All information logged can be viewed in the task details UI.

Cloning Tasks

The above diagram demonstrates how a previously run task can be used as a baseline for experimentation:

A previously run task is cloned, creating a new task in Draft mode (see Task states below).
The new task retains all the source task's configuration. The original task's outputs are not carried over.
The new task's configuration is modified to reflect the desired parameters for the new execution.
The new task is enqueued for execution.
A clearml-agent servicing the queue pulls the new task and executes it (where ClearML again logs all the execution outputs).

Task states

The state of a Task represents its stage in the Task lifecycle. It indicates whether the Task is read-write (editable) or read-only. For each state, a state transition indicates which actions can be performed on an experiment, and the new state after performing an action.

The following table describes the Task states and state transitions.

State	Description / Usage	State Transition
Draft	The experiment is editable. Only experiments in Draft mode are editable. The experiment is not running locally or remotely.	If the experiment is enqueued for a worker to fetch and execute, the state becomes Pending.
Pending	The experiment was enqueued and is waiting in a queue for a worker to fetch and execute it.	If the experiment is dequeued, the state becomes Draft.
Running	The experiment is running locally or remotely.	If the experiment is manually or programmatically terminated, the state becomes Aborted.
Completed	The experiment ran and terminated successfully.	If the experiment is reset or cloned, the state of the reset experiment or the newly cloned experiment becomes Draft. Resetting deletes the logs and output of a previous run. Cloning creates an exact, editable copy.
Failed	The experiment ran and terminated with an error.	The same as Completed.
Aborted	The experiment ran, and was manually or programmatically terminated.	The same as Completed.
Published	The experiment is read-only. Publish an experiment to prevent changes to its inputs and outputs.	A Published experiment cannot be reset. If it is cloned, the state of the newly cloned experiment becomes Draft.
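
The transitions listed in the table can also be written down as a small lookup, which may help when scripting around task states. This is only an illustrative sketch: the action labels (`enqueue`, `dequeue`, `abort`, `reset`) and the helper names are assumptions made for this example and are not part of the ClearML SDK. Only transitions stated in the table are included.

```python
# Transitions transcribed from the table above. The action labels are
# hypothetical names chosen for this sketch, not ClearML identifiers.
TRANSITIONS = {
    ("draft", "enqueue"): "pending",
    ("pending", "dequeue"): "draft",
    ("running", "abort"): "aborted",
    ("completed", "reset"): "draft",
    ("failed", "reset"): "draft",
    ("aborted", "reset"): "draft",
}

# Per the table, only experiments in Draft mode are editable.
EDITABLE_STATES = {"draft"}


def next_state(state, action):
    """Return the state that follows `action`, or raise if it is not listed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not listed for state {state!r}"
        ) from None


def is_editable(state):
    """Return True if a task in `state` can be edited."""
    return state in EDITABLE_STATES
```

Because a Published experiment cannot be reset, `next_state("published", "reset")` raises `ValueError` in this sketch.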

Usage

Task Creation

Task.init() is the main method used to create Tasks in ClearML. It will create a Task, and populate it with:

A link to the running git repository (including commit ID and local uncommitted changes)
Python packages used (i.e. directly imported Python packages, and the versions available on the machine)
Argparse arguments (default and specific to the current execution)
Reports to TensorBoard and Matplotlib, and model checkpoints.

from clearml import Task


task = Task.init(
    project_name='example', 
    task_name='task template', 
    task_type=None,
    tags=None,
    reuse_last_task_id=True,
    continue_last_task=False,
    output_uri=None,
    auto_connect_arg_parser=True,
    auto_connect_frameworks=True,
    auto_resource_monitoring=True,
    auto_connect_streams=True,    
)

Once a Task is created, the Task object can be accessed from anywhere in the code by calling Task.current_task().

If multiple Tasks need to be created in the same process (for example, for logging multiple manual runs), make sure to close each Task before initializing a new one. To close a Task, simply call task.close() (see example here).
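
A minimal sketch of creating several Tasks in one process might look like the following. The function name, the trial names, and the `task_cls` parameter are assumptions made for this example. `task_cls` is only a hook that lets the sketch be exercised with a stand-in class; when it is omitted, the real `clearml.Task` is used, which requires a configured ClearML installation.

```python
def run_manual_trials(trial_names, project_name="example", task_cls=None):
    """Create one Task per trial, closing each before starting the next."""
    if task_cls is None:
        # Imported lazily; requires clearml to be installed and configured.
        from clearml import Task as task_cls

    task_ids = []
    for name in trial_names:
        task = task_cls.init(
            project_name=project_name,
            task_name=name,
            reuse_last_task_id=False,  # force a fresh Task for every trial
        )
        try:
            task_ids.append(task.id)
            # ... run and log the trial here ...
        finally:
            # Close the Task so the next Task.init() call can start a new one.
            task.close()
    return task_ids
```

The `try` / `finally` block ensures the Task is closed even if a trial raises, so a failure in one trial does not leave a Task open when the next one is initialized.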

When initializing a Task, its project needs to be specified. If the project entered does not exist, it will be created. Projects can be divided into sub-projects, just like folders are broken into sub-folders. For example:

Task.init(project_name='main_project/sub_project', task_name='test')

Nesting projects works on multiple levels. For example: project_name='main_project/sub_project/sub_sub_project'

Task Reuse

Every Task.init call will create a new Task for the current execution. In order to mitigate the clutter that a multitude of debugging Tasks might create, a Task will be reused if:

The last time it was executed (on this machine) was under 72 hours ago (configurable, see sdk.development.task_reuse_time_window_in_hours in the sdk.development section of the ClearML configuration reference)
The previous Task execution did not have any artifacts / models

It's possible to always create a new Task by passing reuse_last_task_id=False.
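
The reuse conditions above can be summarized as a small predicate. This is an illustration of the stated rules only, not ClearML's actual implementation: the function and parameter names are hypothetical, and it assumes that both listed conditions must hold for a Task to be reused.

```python
from datetime import datetime, timedelta


def would_reuse_task(last_run, now, had_artifacts_or_models,
                     reuse_last_task_id=True, window_hours=72.0):
    """Return True if, under the rules described above, a Task would be reused."""
    if not reuse_last_task_id:
        # Passing reuse_last_task_id=False always creates a new Task.
        return False
    ran_recently = (now - last_run) < timedelta(hours=window_hours)
    return ran_recently and not had_artifacts_or_models


now = datetime(2024, 1, 4, 12, 0)
# Last run 5 hours ago with no artifacts or models: eligible for reuse.
print(would_reuse_task(now - timedelta(hours=5), now, False))
# Last run 5 hours ago, but it stored an artifact: a new Task is created.
print(would_reuse_task(now - timedelta(hours=5), now, True))
```

Here `window_hours` stands in for the `sdk.development.task_reuse_time_window_in_hours` setting.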

See full Task.init documentation here.

Empty Task Creation

A Task can also be created without executing the code itself. In this case nothing is detected at runtime, so all the environment and configuration details need to be provided explicitly.

For example:

task = Task.create(
    project_name='example', 
    task_name='task template',
    repo='https://github.com/allegroai/clearml.git',
    branch='master',
    script='examples/reporting/html_reporting.py',
    working_directory='.',
    docker=None,
)

See Task.create in the Python SDK reference.

Accessing Tasks

A Task can be identified by its project and name, and by a unique identifier (UUID string). The name and project of a Task can be changed after an experiment has been executed, but its ID can't be changed.

Programmatically, Task objects can be retrieved by querying the system based on either the Task ID or a project and name combination. If a project / name combination is used, and multiple Tasks have the exact same name, the function will return the last modified Task.

For example:

Accessing a Task object with a Task ID:

a_task = Task.get_task(task_id='123456deadbeef')

Accessing a Task with a project / name:

a_task = Task.get_task(project_name='examples', task_name='artifacts')

Once a Task object is obtained, it's possible to query the state of the Task, reported scalars, etc. The Task's outputs, such as artifacts and models, can also be retrieved.
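
As a rough sketch, a retrieved Task could be summarized as follows. The function name and the shape of the returned dictionary are assumptions made for this example. The `task_cls` parameter is a hook for exercising the sketch with a stand-in class; by default the real `clearml.Task` is imported, which requires a configured ClearML installation.

```python
def summarize_task(task_id, task_cls=None):
    """Collect a few commonly queried properties of an existing Task."""
    if task_cls is None:
        # Imported lazily; requires clearml to be installed and configured.
        from clearml import Task as task_cls

    task = task_cls.get_task(task_id=task_id)
    return {
        "id": task.id,
        "name": task.name,
        # Current state of the Task, as a string
        "status": task.get_status(),
        # Names of the artifacts registered on the Task
        "artifacts": sorted(task.artifacts.keys()),
        # Titles of the scalar graphs the Task reported
        "scalar_titles": sorted(task.get_reported_scalars().keys()),
    }
```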

Querying / Searching Tasks

Tasks can be searched and filtered in the web UI, and also programmatically. Pass search parameters to the Task.get_tasks method, which returns a list of Task objects that match the search.

For example:

task_list = Task.get_tasks(
    task_ids=None,  # type: Optional[Sequence[str]]
    project_name=None,  # type: Optional[str]
    task_name=None,  # type: Optional[str]
    task_filter=None,  # type: Optional[Dict]
)

It's possible to also filter Tasks by passing filtering rules to task_filter. For example:

task_filter={
    # only Tasks with tag `included_tag` and without tag `excluded_tag`
    'tags': ['included_tag', '-excluded_tag'],
    # filter out archived Tasks
    'system_tags': ['-archived'],
    # only completed & published Tasks
    'status': ['completed', 'published'],
    # only training type Tasks
    'type': ['training'],
    # match text in Task comment or task name
    'search_text': 'reg_exp_text'
}
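
One way to assemble such a filter and pass it to `Task.get_tasks` is sketched below. The helper names (`build_task_filter`, `find_tasks`) and their parameters are hypothetical and exist only in this example. The `task_cls` parameter is a hook for trying the sketch with a stand-in class; by default the real `clearml.Task` is imported.

```python
def build_task_filter(include_tags=(), exclude_tags=(), statuses=(),
                      include_archived=False):
    """Build a task_filter dict using the keys shown above."""
    task_filter = {}
    # A leading '-' marks a tag that matching Tasks must not have.
    tags = list(include_tags) + ["-" + tag for tag in exclude_tags]
    if tags:
        task_filter["tags"] = tags
    if not include_archived:
        task_filter["system_tags"] = ["-archived"]
    if statuses:
        task_filter["status"] = list(statuses)
    return task_filter


def find_tasks(project_name, task_filter, task_cls=None):
    """Return the Tasks in `project_name` that match `task_filter`."""
    if task_cls is None:
        # Imported lazily; requires clearml to be installed and configured.
        from clearml import Task as task_cls
    return task_cls.get_tasks(project_name=project_name, task_filter=task_filter)
```

For example, `build_task_filter(["included_tag"], ["excluded_tag"], ["completed", "published"])` produces the `tags`, `system_tags`, and `status` entries of the filter shown above.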

Cloning & Executing Tasks

Once a Task object is created, it can be copied (cloned). Task.clone returns a copy of the original Task (source_task). By default, the cloned Task is added to the same project as the original and is named "Clone Of ORIGINAL_NAME", but the name, project, and comment of the cloned Task can be overridden directly.

cloned = Task.clone(
    source_task=task,  # type: Optional[Union[Task, str]]
    # override default name
    name='newly created task',  # type: Optional[str]
    comment=None,  # type: Optional[str]
    # insert cloned Task into a different project
    project=None,  # type: Optional[str]
)

A cloned Task starts in Draft mode, so its configuration can be edited (see Task.set_parameters). Once the Task is modified, launch it by pushing it into an execution queue. A ClearML Agent then pulls it from the queue and executes it.

Task.enqueue(
    task=task,  # type: Union[Task, str]
    queue_name='default',  # type: Optional[str] 
    queue_id=None  # type: Optional[str]
)

See enqueue example.
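
Putting the steps together, a clone, modify, and enqueue flow might be sketched as follows. The function name, the clone name, and the parameter key used in the usage note are assumptions made for this example. The `task_cls` parameter is a hook for exercising the sketch with a stand-in class; by default the real `clearml.Task` is imported, which requires a configured ClearML installation and an agent servicing the queue.

```python
def clone_and_enqueue(source_task_id, overrides, queue_name="default",
                      task_cls=None):
    """Clone a Task, override some of its parameters, and enqueue the clone."""
    if task_cls is None:
        # Imported lazily; requires clearml to be installed and configured.
        from clearml import Task as task_cls

    # The clone starts in Draft mode, so it can still be edited.
    cloned = task_cls.clone(source_task=source_task_id, name="clone with overrides")

    # Read the current parameters, apply the overrides, and write them back.
    params = dict(cloned.get_parameters() or {})
    params.update(overrides)
    cloned.set_parameters(params)

    # Push the edited clone into the queue for an agent to pick up.
    task_cls.enqueue(task=cloned, queue_name=queue_name)
    return cloned
```

A call such as `clone_and_enqueue('123456deadbeef', {'Args/learning_rate': '0.01'})` would use a hypothetical parameter key. The actual keys depend on what the source Task logged.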

Advanced Remote Execution

A compelling workflow is:

Running code on the development machine for a few iterations, or just setting up the environment.
Moving the execution to a beefier remote machine for the actual training.

For example, to stop the current manual execution, and then re-run it on a remote machine, simply add the following function call to the code:

task.execute_remotely(
    queue_name='default',  # type: Optional[str]
    clone=False,  # type: bool
    exit_process=True  # type: bool
)

Once the function is called on the machine, it will stop the local process and enqueue the current Task into the default queue. From there, an agent will be able to pick it up and launch it.

See the Remote Execution example.

Remote Function Execution

A specific function can also be launched on a remote machine with create_function_task.

For example:

def run_me_remotely(some_argument):
    print(some_argument)

a_func_task = task.create_function_task(
    func=run_me_remotely,  # type: Callable
    func_name='func_id_run_me_remotely',  # type: Optional[str]
    task_name='a func task',  # type: Optional[str]
    # everything below will be passed directly to our function as arguments
    some_argument=123
)

Arguments passed to the function will be automatically logged under the Function section in the Hyperparameters tab. Like any other arguments, they can be changed from the UI or programmatically.

:::note
Function Tasks must be created from within a regular Task, created by calling Task.init().
:::
