mirror of
https://github.com/clearml/clearml-docs
synced 2025-05-15 09:56:15 +00:00
Clarify pipelines (#10)
parent
13b7ac4d82
commit
5662c42a64
@@ -2,73 +2,68 @@
|
|||||||
title: Pipelines
|
title: Pipelines
|
||||||
---
|
---
|
||||||
|
|
||||||
Users can automate [Tasks](task) to run consecutively or according to some logic by putting the Tasks into a pipeline.
|
Users can automate [Tasks](task) to run consecutively or according to some logic by putting the tasks into a pipeline.
|
||||||
Tasks in a pipeline can leverage other tasks' work products such as artifacts and parameters.
|
Tasks in a pipeline can leverage other tasks' work products such as artifacts and parameters.
|
||||||
|
|
||||||
Pipelines are controlled by a *Controller Task* that holds the logic of the pipeline execution steps.
|
Pipelines are controlled by a *Controller Task* that holds the logic of the pipeline execution steps.
|
||||||
|
|
||||||
## How do pipelines work?
|
## How do pipelines work?
|
||||||
|
|
||||||
Before running a pipeline, we need to configure a Controller Task, in which the pipeline is defined. The user decides the controlling logic, whether it be simple
|
Before running a pipeline, we need to configure a Controller Task, in which the pipeline is defined. Pipelines are made
|
||||||
([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) or complex custom logic.
|
up of steps. Each step consists of a task that already exists in the ClearML Server and is used as a template. The
|
||||||
|
user decides the controlling logic of the step interactions, whether it be simple ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph))
|
||||||
|
or more complex.
|
||||||
|
|
||||||
Once the pipeline is running, it first clones existing Tasks (called templates) and then sends the cloned Tasks for execution
|
Once the pipeline is running, it starts sequentially launching the steps configured in the Controller. In each step, the template task
|
||||||
according to the pipeline's control logic.
|
is cloned, and the cloned task is sent for execution. Depending on the specifications laid out in the Controller Task, a
|
||||||
|
step's parameters can be overridden, and / or a step can use a previous step's work products.
|
||||||
|
|
||||||
|
Callbacks can be utilized to control pipeline execution flow. A callback can be defined
|
||||||
|
to be called before and / or after the execution of every task in a pipeline. Additionally, there is an option to
|
||||||
|
create customized, step-specific callbacks.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
## Simple DAG Pipelines
|
## Simple DAG Pipelines
|
||||||
|
|
||||||
For a simple, DAG based logic, use the off-the-shelf `PipelineController` class to define the DAG (see an example [here](../guides/pipeline/pipeline_controller)). Once PipelineController object is populated and configured,
|
For a simple, DAG based logic, use the off-the-shelf [`PipelineController`](../references/sdk/automation_controller_pipelinecontroller.md) class to define the DAG (see an example
|
||||||
we can start the pipeline, which will launch its first steps, then it waits until the pipeline is completed.
|
[here](../guides/pipeline/pipeline_controller)). Once the `PipelineController` object is populated and configured,
|
||||||
|
we can start the pipeline, which will begin executing the steps in succession, then it waits until the pipeline is completed.
|
||||||
The pipeline control logic is processed in a background thread.
|
The pipeline control logic is processed in a background thread.
|
||||||
|
|
||||||
## Custom Pipelines
|
|
||||||
|
|
||||||
In cases where a DAG is insufficient (for example when needing to launch one pipeline, then, if performance is inadequate, rerun pipeline again),
|
|
||||||
users can apply custom logic, using a generic methods to enqueue Tasks, implemented in python code.
|
|
||||||
|
|
||||||
The logic of the pipeline sits in a *Controller Task*.
|
|
||||||
Since a pipeline *Controller Task* is a Task on its own, it's possible to have pipelines running other pipelines.
|
|
||||||
This gives users greater degrees of freedom for automation.
|
|
||||||
|
|
||||||
Custom pipelines usually involves cloning existing Tasks (Template Tasks), modiftying their parameters and manually enqueuing
|
|
||||||
them to queues (For execution by [agents](../clearml_agent.md). Since it's possible to control Task's execution (Including
|
|
||||||
overriding Hyperparameters and Artifacts) and get output metrics, it's possible to create custom logic that controls inputs and acts upon outputs.
|
|
||||||
|
|
||||||
A simple Custom pipeline may look like this:
|
|
||||||
|
|
||||||
```python
|
|
||||||
task = Task.init('examples', 'Simple Controller Task', task_type=Task.TaskTypes.controller)
|
|
||||||
|
|
||||||
# Get a reference to the task to pipe to.
|
|
||||||
first_task = Task.get_task(project_name='PROJECT NAME', task_name='TASK NAME')
|
|
||||||
|
|
||||||
# Clone the task to pipe to. This creates a task with status Draft whose parameters can be modified.
|
|
||||||
cloned_first_task = Task.clone(source_task=first_task, name='Auto generated cloned task')
|
|
||||||
|
|
||||||
cloned_first_task.set_parameters({'key':val})
|
|
||||||
|
|
||||||
Task.enqueue(cloned_first_task.id, queue_name='QUEUE NAME')
|
|
||||||
|
|
||||||
# Here comes custom logic
|
|
||||||
#
|
|
||||||
#
|
|
||||||
###
|
|
||||||
|
|
||||||
# Get a reference to the task to pipe to.
|
|
||||||
next_task = Task.get_task(project_name='SECOND PROJECT NAME', task_name='SECOND TASK NAME')
|
|
||||||
|
|
||||||
# Clone the task to pipe to. This creates a task with status Draft whose parameters can be modified.
|
|
||||||
cloned_task = Task.clone(source_task=next_task, name='Second Cloned Task')
|
|
||||||
|
|
||||||
Task.enqueue(cloned_task.id, queue_name='QUEUE NAME')
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
See an example for custom pipelines [here](https://github.com/allegroai/clearml/tree/master/examples/automation)
|
|
||||||
|
|
||||||
:::note
|
:::note
|
||||||
We recommend enqueuing Pipeline Controller Tasks into a
|
We recommend enqueuing Pipeline Controller Tasks into a
|
||||||
[services](agents_and_queues#services-agent--queue) queue
|
[services](agents_and_queues.md#services-agent--queue) queue
|
||||||
:::
|
:::
|
||||||
|
|
||||||
|
Callback functions can be specified to be called in the steps of a `PipelineController` object.
|
||||||
|
There is an option to define a callback to be called before and / or after every step in the pipeline,
|
||||||
|
using the `step_task_created_callback` or the `step_task_completed_callback` parameters of the [`start`](../references/sdk/automation_controller_pipelinecontroller.md#start)
|
||||||
|
method. Alternatively, step-specific callback functions can be specified with the `pre_execute_callback` and / or
|
||||||
|
`post_execute_callback` parameters of the [`add_step`](../references/sdk/automation_controller_pipelinecontroller.md#add_step)
|
||||||
|
method.
|
||||||
|
|
||||||
|
## Advanced pipelines
|
||||||
|
|
||||||
|
Since a pipeline *Controller Task* is itself a ClearML Task, it can be used as a pipeline step and can be used to create
|
||||||
|
more complicated workflows, such as pipelines running other pipelines, or a pipeline running multiple tasks concurrently.
|
||||||
|
|
||||||
|
For example, it could be useful to have one pipeline for data preparation, which triggers a second pipeline that trains
|
||||||
|
networks.
|
||||||
|
|
||||||
|
It could also be useful to run a pipeline that runs tasks concurrently, training multiple networks with different hyperparameter
|
||||||
|
values simultaneously. See the [Tabular training pipeline](../guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline.md)
|
||||||
|
example of a pipeline with concurrent steps.
|
||||||
|
|
||||||
|
## Custom Pipelines
|
||||||
|
|
||||||
|
In cases where a DAG is insufficient (for example, when needing to launch one pipeline, then, if performance is inadequate,
|
||||||
|
rerun pipeline again), users can apply custom logic, using generic methods to enqueue tasks, implemented in python code.
|
||||||
|
The custom logic of the pipeline sits in the *Controller Task*'s script.
|
||||||
|
|
||||||
|
Custom pipelines usually involve cloning template tasks, modifying their parameters, and manually enqueuing
|
||||||
|
them to queues (for execution by [agents](../clearml_agent.md)). It's possible to create custom logic that controls inputs
|
||||||
|
(e.g. overriding hyperparameters and artifacts) and acts upon task outputs.
|
||||||
|
|
||||||
|
See an example of a custom pipeline [here](../guides/automation/task_piping.md).
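
The custom-logic pattern described above (clone a template, override its parameters, run it, and rerun if the result is inadequate) can be sketched in plain Python. This is a toy model only and does not use the ClearML SDK: `clone_task`, `run_with_retries`, `fake_execute`, and the dictionary layout of a task are hypothetical names introduced for illustration, and `execute` merely stands in for enqueuing a cloned task and waiting for its output metric.

```python
import copy


def clone_task(template, overrides):
    """Return a draft copy of a template task with some parameters overridden."""
    draft = copy.deepcopy(template)  # the template itself is never modified
    draft["parameters"].update(overrides)
    draft["status"] = "draft"
    return draft


def run_with_retries(template, candidate_overrides, execute, threshold):
    """Clone and run the template once per candidate until the score is adequate.

    `execute` is a hypothetical stand-in for enqueuing a task and
    reading back its output metric.
    """
    history = []
    for overrides in candidate_overrides:
        draft = clone_task(template, overrides)
        score = execute(draft)
        history.append((draft["parameters"], score))
        if score >= threshold:
            break  # performance is adequate, so stop rerunning
    return history


# Hypothetical template task, represented as a plain dictionary.
template = {
    "name": "train",
    "parameters": {"lr": 0.1, "epochs": 5},
    "status": "completed",
}


def fake_execute(task):
    # Stand-in for a real run: pretend smaller learning rates score higher.
    return round(1.0 - task["parameters"]["lr"], 2)


history = run_with_retries(
    template,
    [{"lr": 0.5}, {"lr": 0.2}, {"lr": 0.05}],
    fake_execute,
    threshold=0.75,
)
```

In a real Controller Task, the cloning and execution steps would presumably be replaced by the SDK's own clone and enqueue calls, with the retry decision driven by the metrics the cloned task reports.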
|
||||||
|