Mirror of https://github.com/clearml/clearml-docs, synced 2025-03-03 10:42:51 +00:00
Clarify pipelines (#10)
commit 5662c42a64 (parent 13b7ac4d82)
title: Pipelines
---

Users can automate [Tasks](task) to run consecutively or according to some logic by putting the tasks into a pipeline.
Tasks in a pipeline can leverage other tasks' work products such as artifacts and parameters.

Pipelines are controlled by a *Controller Task* that holds the logic of the pipeline execution steps.

## How do pipelines work?

Before running a pipeline, we need to configure a Controller Task, in which the pipeline is defined. Pipelines are made
up of steps. Each step consists of a task that already exists in the ClearML Server and is used as a template. The
user decides the controlling logic of the step interactions, whether it is a simple
[DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) or more complex custom logic.

Once the pipeline is running, it starts sequentially launching the steps configured in the Controller. In each step, the template task
is cloned, and the cloned task is sent for execution. Depending on the specifications laid out in the Controller Task, a
step's parameters can be overridden, and / or a step can use a previous step's work products.

Callbacks can be utilized to control pipeline execution flow. A callback can be defined
to be called before and / or after the execution of every task in a pipeline. Additionally, there is an option to
create customized, step-specific callbacks.


## Simple DAG Pipelines

For simple, DAG-based logic, use the off-the-shelf [`PipelineController`](../references/sdk/automation_controller_pipelinecontroller.md) class to define the DAG (see an example
[here](../guides/pipeline/pipeline_controller)). Once the `PipelineController` object is populated and configured,
we can start the pipeline, which begins executing the steps in succession and then waits until the pipeline is completed.
The pipeline control logic is processed in a background thread.
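
For illustration, a minimal sketch of such a controller, loosely based on the linked example, might look like this. The project, task, and queue names are placeholders, and the `${...}` artifact reference syntax and exact arguments are assumptions that may differ between ClearML versions:

```python
from clearml import Task
from clearml.automation.controller import PipelineController

# The Controller Task holds and executes the DAG logic
task = Task.init(project_name='examples', task_name='pipeline demo',
                 task_type=Task.TaskTypes.controller)

# Each step clones an existing task in the ClearML Server and uses it as a template
pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)
pipe.add_step(name='stage_data', base_task_project='examples',
              base_task_name='pipeline step 1 dataset artifact')
pipe.add_step(name='stage_train', parents=['stage_data'],
              base_task_project='examples', base_task_name='pipeline step 2 train model',
              parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}'})

# Start the pipeline (the control logic runs in a background thread) and wait for it to complete
pipe.start()
pipe.wait()
pipe.stop()
```

Here `stage_train` both overrides a parameter and consumes the dataset artifact produced by `stage_data`, as described above.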
## Custom Pipelines

In cases where a DAG is insufficient (for example, when needing to launch a pipeline and then, if performance is inadequate,
rerun it), users can apply custom logic, using generic methods to enqueue tasks, implemented in Python code.
The custom logic of the pipeline sits in the *Controller Task*'s script.

Custom pipelines usually involve cloning template tasks, modifying their parameters, and manually enqueuing
them to queues (for execution by [agents](../clearml_agent.md)). It's possible to create custom logic that controls inputs
(e.g. overriding hyperparameters and artifacts) and acts upon task outputs, such as reported metrics.

A simple custom pipeline may look like this:

```python
from clearml import Task

# Create the Controller Task that holds the pipeline logic
task = Task.init('examples', 'Simple Controller Task', task_type=Task.TaskTypes.controller)

# Get a reference to the template task to pipe to.
first_task = Task.get_task(project_name='PROJECT NAME', task_name='TASK NAME')

# Clone the template task. This creates a task with status Draft whose parameters can be modified.
cloned_first_task = Task.clone(source_task=first_task, name='Auto generated cloned task')

# Override the cloned task's parameters
cloned_first_task.set_parameters({'key': 'value'})

# Enqueue the cloned task for execution by an agent
Task.enqueue(cloned_first_task.id, queue_name='QUEUE NAME')

# Here comes custom logic
# ...

# Get a reference to the next template task to pipe to.
next_task = Task.get_task(project_name='SECOND PROJECT NAME', task_name='SECOND TASK NAME')

# Clone the template task. This creates a task with status Draft whose parameters can be modified.
cloned_task = Task.clone(source_task=next_task, name='Second Cloned Task')

Task.enqueue(cloned_task.id, queue_name='QUEUE NAME')
```
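
To act upon a step's outputs, the custom logic can wait for the enqueued task to finish and inspect what it reported before deciding what to launch next. A minimal sketch, assuming the step reports a scalar under hypothetical `'accuracy'`/`'validation'` names; the methods are from the ClearML `Task` API, but the exact structure of the returned metrics dictionary may vary between versions:

```python
# Block until the cloned task finishes (by default this raises if the task failed)
cloned_first_task.wait_for_status()

# Read the scalars the task reported and decide how to continue the pipeline
metrics = cloned_first_task.get_last_scalar_metrics()
accuracy = metrics.get('accuracy', {}).get('validation', {}).get('last', 0)
if accuracy < 0.9:
    # e.g. clone the template again with different parameters, or stop here
    pass
```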

See further examples of custom pipelines in the [task piping example](../guides/automation/task_piping.md) and in the
[ClearML automation examples](https://github.com/allegroai/clearml/tree/master/examples/automation).

:::note
We recommend enqueuing Pipeline Controller Tasks into a
[services](agents_and_queues.md#services-agent--queue) queue
:::

Callback functions can be specified to be called in the steps of a `PipelineController` object.
There is an option to define a callback to be called before and / or after every step in the pipeline,
using the `step_task_created_callback` or the `step_task_completed_callback` parameters of the [`start`](../references/sdk/automation_controller_pipelinecontroller.md#start)
method. Alternatively, step-specific callback functions can be specified with the `pre_execute_callback` and / or
`post_execute_callback` parameters of the [`add_step`](../references/sdk/automation_controller_pipelinecontroller.md#add_step)
method.
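
For example, a step-specific callback pair might look like the following sketch. The callback signatures shown are assumptions based on the SDK reference linked above and may differ between ClearML versions; `pipe` is a `PipelineController` like the one in the earlier sketch:

```python
def pre_execute(pipeline, node, parameters):
    # Called just before the step's cloned task is enqueued;
    # returning False skips launching this step.
    print('launching step {} with parameters {}'.format(node.name, parameters))
    return True

def post_execute(pipeline, node):
    # Called after the step's task has completed; node.executed holds the executed task's id.
    print('step {} finished, executed task id: {}'.format(node.name, node.executed))

pipe.add_step(name='stage_train', parents=['stage_data'],
              base_task_project='examples', base_task_name='pipeline step 2 train model',
              pre_execute_callback=pre_execute, post_execute_callback=post_execute)
```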
## Advanced Pipelines

Since a pipeline *Controller Task* is itself a ClearML Task, it can be used as a pipeline step. This makes it possible to create
more complicated workflows, such as pipelines running other pipelines, or a pipeline running multiple tasks concurrently.

For example, it could be useful to have one pipeline for data preparation, which triggers a second pipeline that trains
networks.

It could also be useful to run a pipeline that runs tasks concurrently, training multiple networks with different hyperparameter
values simultaneously. See the [Tabular training pipeline](../guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline.md)
example of a pipeline with concurrent steps.
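
As a sketch of the concurrent case (same assumptions as the earlier `PipelineController` example, with placeholder project, task, and queue names): two steps that share the same parent have no dependency on each other, so the controller can launch them at the same time.

```python
from clearml import Task
from clearml.automation.controller import PipelineController

# Controller Task for this pipeline (names are placeholders)
Task.init(project_name='examples', task_name='concurrent pipeline demo',
          task_type=Task.TaskTypes.controller)

pipe = PipelineController(default_execution_queue='default')
pipe.add_step(name='stage_data', base_task_project='examples',
              base_task_name='pipeline step 1 dataset artifact')

# Both training steps depend only on 'stage_data', so once it completes
# the controller can run them concurrently with different hyperparameters
pipe.add_step(name='train_model_a', parents=['stage_data'],
              base_task_project='examples', base_task_name='train model',
              parameter_override={'General/learning_rate': 0.01})
pipe.add_step(name='train_model_b', parents=['stage_data'],
              base_task_project='examples', base_task_name='train model',
              parameter_override={'General/learning_rate': 0.001})

pipe.start()
pipe.wait()
pipe.stop()
```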