Mirror of https://github.com/clearml/clearml-docs, synced 2025-03-03 10:42:51 +00:00
Clarify pipelines (#10)
commit 5662c42a64 (parent 13b7ac4d82)
title: Pipelines
---

Users can automate [Tasks](task) to run consecutively or according to some logic by putting the tasks into a pipeline.
Tasks in a pipeline can leverage other tasks' work products such as artifacts and parameters.

Pipelines are controlled by a *Controller Task* that holds the logic of the pipeline execution steps.

## How do pipelines work?

Before running a pipeline, we need to configure a Controller Task, in which the pipeline is defined. Pipelines are made
up of steps. Each step consists of a task that already exists in the ClearML Server and is used as a template. The
user decides the controlling logic of the step interactions, whether it is a simple
[DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) or more complex custom logic.

Once the pipeline is running, it starts sequentially launching the steps configured in the Controller. In each step, the template task
is cloned, and the cloned task is sent for execution. Depending on the specifications laid out in the Controller Task, a
step's parameters can be overridden, and / or a step can use a previous step's work products.

Callbacks can be utilized to control pipeline execution flow. A callback can be defined
to be called before and / or after the execution of every task in a pipeline. Additionally, there is an option to
create customized, step-specific callbacks.


## Simple DAG Pipelines

For simple, DAG-based logic, use the off-the-shelf [`PipelineController`](../references/sdk/automation_controller_pipelinecontroller.md) class to define the DAG (see an example
[here](../guides/pipeline/pipeline_controller)). Once the `PipelineController` object is populated and configured,
we can start the pipeline, which begins executing the steps in succession and then waits until the pipeline is completed.
The pipeline control logic is processed in a background thread.
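
For illustration, a minimal sketch of such a controller, loosely based on the linked example, might look like this. The project, task, and queue names are placeholders, and the `${...}` artifact reference syntax and exact arguments are assumptions that may differ between ClearML versions:

```python
from clearml import Task
from clearml.automation.controller import PipelineController

# The Controller Task holds and executes the DAG logic
task = Task.init(project_name='examples', task_name='pipeline demo',
                 task_type=Task.TaskTypes.controller)

# Each step clones an existing task in the ClearML Server and uses it as a template
pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)
pipe.add_step(name='stage_data', base_task_project='examples',
              base_task_name='pipeline step 1 dataset artifact')
pipe.add_step(name='stage_train', parents=['stage_data'],
              base_task_project='examples', base_task_name='pipeline step 2 train model',
              parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}'})

# Start the pipeline (the control logic runs in a background thread) and wait for it to complete
pipe.start()
pipe.wait()
pipe.stop()
```

Here `stage_train` both overrides a parameter and consumes the dataset artifact produced by `stage_data`, as described above.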
## Custom Pipelines

In cases where a DAG is insufficient (for example, when needing to launch a pipeline and then, if performance is inadequate,
rerun it), users can apply custom logic, using generic methods to enqueue tasks, implemented in Python code.
The custom logic of the pipeline sits in the *Controller Task*'s script.

Custom pipelines usually involve cloning template tasks, modifying their parameters, and manually enqueuing
them to queues (for execution by [agents](../clearml_agent.md)). It's possible to create custom logic that controls inputs
(e.g. overriding hyperparameters and artifacts) and acts upon task outputs, such as reported metrics.

A simple custom pipeline may look like this:

```python
from clearml import Task

# Create the Controller Task that holds the pipeline logic
task = Task.init('examples', 'Simple Controller Task', task_type=Task.TaskTypes.controller)

# Get a reference to the template task to pipe to.
first_task = Task.get_task(project_name='PROJECT NAME', task_name='TASK NAME')

# Clone the template task. This creates a task with status Draft whose parameters can be modified.
cloned_first_task = Task.clone(source_task=first_task, name='Auto generated cloned task')

# Override the cloned task's parameters
cloned_first_task.set_parameters({'key': 'value'})

# Enqueue the cloned task for execution by an agent
Task.enqueue(cloned_first_task.id, queue_name='QUEUE NAME')

# Here comes custom logic
# ...

# Get a reference to the next template task to pipe to.
next_task = Task.get_task(project_name='SECOND PROJECT NAME', task_name='SECOND TASK NAME')

# Clone the template task. This creates a task with status Draft whose parameters can be modified.
cloned_task = Task.clone(source_task=next_task, name='Second Cloned Task')

Task.enqueue(cloned_task.id, queue_name='QUEUE NAME')
```
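
To act upon a step's outputs, the custom logic can wait for the enqueued task to finish and inspect what it reported before deciding what to launch next. A minimal sketch, assuming the step reports a scalar under hypothetical `'accuracy'`/`'validation'` names; the methods are from the ClearML `Task` API, but the exact structure of the returned metrics dictionary may vary between versions:

```python
# Block until the cloned task finishes (by default this raises if the task failed)
cloned_first_task.wait_for_status()

# Read the scalars the task reported and decide how to continue the pipeline
metrics = cloned_first_task.get_last_scalar_metrics()
accuracy = metrics.get('accuracy', {}).get('validation', {}).get('last', 0)
if accuracy < 0.9:
    # e.g. clone the template again with different parameters, or stop here
    pass
```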

See further examples of custom pipelines in the [task piping example](../guides/automation/task_piping.md) and in the
[ClearML automation examples](https://github.com/allegroai/clearml/tree/master/examples/automation).

:::note
We recommend enqueuing Pipeline Controller Tasks into a
[services](agents_and_queues.md#services-agent--queue) queue
:::

Callback functions can be specified to be called in the steps of a `PipelineController` object.
There is an option to define a callback to be called before and / or after every step in the pipeline,
using the `step_task_created_callback` or the `step_task_completed_callback` parameters of the [`start`](../references/sdk/automation_controller_pipelinecontroller.md#start)
method. Alternatively, step-specific callback functions can be specified with the `pre_execute_callback` and / or
`post_execute_callback` parameters of the [`add_step`](../references/sdk/automation_controller_pipelinecontroller.md#add_step)
method.
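
For example, a step-specific callback pair might look like the following sketch. The callback signatures shown are assumptions based on the SDK reference linked above and may differ between ClearML versions; `pipe` is a `PipelineController` like the one in the earlier sketch:

```python
def pre_execute(pipeline, node, parameters):
    # Called just before the step's cloned task is enqueued;
    # returning False skips launching this step.
    print('launching step {} with parameters {}'.format(node.name, parameters))
    return True

def post_execute(pipeline, node):
    # Called after the step's task has completed; node.executed holds the executed task's id.
    print('step {} finished, executed task id: {}'.format(node.name, node.executed))

pipe.add_step(name='stage_train', parents=['stage_data'],
              base_task_project='examples', base_task_name='pipeline step 2 train model',
              pre_execute_callback=pre_execute, post_execute_callback=post_execute)
```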
## Advanced Pipelines

Since a pipeline *Controller Task* is itself a ClearML Task, it can be used as a pipeline step. This makes it possible to create
more complicated workflows, such as pipelines running other pipelines, or a pipeline running multiple tasks concurrently.

For example, it could be useful to have one pipeline for data preparation, which triggers a second pipeline that trains
networks.

It could also be useful to run a pipeline that runs tasks concurrently, training multiple networks with different hyperparameter
values simultaneously. See the [Tabular training pipeline](../guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline.md)
example of a pipeline with concurrent steps.
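
As a sketch of the concurrent case (same assumptions as the earlier `PipelineController` example, with placeholder project, task, and queue names): two steps that share the same parent have no dependency on each other, so the controller can launch them at the same time.

```python
from clearml import Task
from clearml.automation.controller import PipelineController

# Controller Task for this pipeline (names are placeholders)
Task.init(project_name='examples', task_name='concurrent pipeline demo',
          task_type=Task.TaskTypes.controller)

pipe = PipelineController(default_execution_queue='default')
pipe.add_step(name='stage_data', base_task_project='examples',
              base_task_name='pipeline step 1 dataset artifact')

# Both training steps depend only on 'stage_data', so once it completes
# the controller can run them concurrently with different hyperparameters
pipe.add_step(name='train_model_a', parents=['stage_data'],
              base_task_project='examples', base_task_name='train model',
              parameter_override={'General/learning_rate': 0.01})
pipe.add_step(name='train_model_b', parents=['stage_data'],
              base_task_project='examples', base_task_name='train model',
              parameter_override={'General/learning_rate': 0.001})

pipe.start()
pipe.wait()
pipe.stop()
```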