---
title: Pipelines
---

Users can automate [Tasks](task) to run consecutively or according to some logic by putting the tasks into a pipeline.
Tasks in a pipeline can leverage other tasks' work products, such as artifacts and parameters.
Pipelines are controlled by a *Controller Task* that holds the logic of the pipeline's execution steps.

## How do pipelines work?

Before running a pipeline, we need to configure a Controller Task, in which the pipeline is defined. Pipelines are made
up of steps. Each step consists of a task that already exists in the ClearML Server and is used as a template. The
user decides the controlling logic of the step interactions, whether it be a simple
[DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) or more complex logic.

Once the pipeline is running, it starts sequentially launching the steps configured in the Controller. In each step, the
template task is cloned, and the clone is sent for execution. Depending on the specifications laid out in the Controller
Task, a step's parameters can be overridden, and/or a step can use a previous step's work products.

Callbacks can be utilized to control pipeline execution flow.
A callback can be defined
to be called before and/or after the execution of every task in a pipeline. Additionally, there is an option to
create customized, step-specific callbacks.

![Pipeline chart](../img/fundamentals_pipeline.png)

## Simple DAG Pipelines

For simple, DAG-based logic, use the off-the-shelf [`PipelineController`](../references/sdk/automation_controller_pipelinecontroller.md)
class to define the DAG (see an example [here](../guides/pipeline/pipeline_controller)). Once the `PipelineController`
object is populated and configured, we can start the pipeline; it will begin executing the steps in succession and then
wait until the pipeline is completed. The pipeline control logic is processed in a background thread.
:::note
We recommend enqueuing Pipeline Controller Tasks into a
[services](agents_and_queues.md#services-agent--queue) queue.
:::

Callback functions can be specified for the steps of a `PipelineController` object.
A callback can be defined to be called before and/or after every step in the pipeline,
using the `step_task_created_callback` or the `step_task_completed_callback` parameters of the
[`start`](../references/sdk/automation_controller_pipelinecontroller.md#start) method.
Alternatively, step-specific callback functions can be specified with the `pre_execute_callback` and/or
`post_execute_callback` parameters of the [`add_step`](../references/sdk/automation_controller_pipelinecontroller.md#add_step)
method.

## Advanced Pipelines

Since a pipeline *Controller Task* is itself a ClearML Task, it can be used as a pipeline step. This makes it possible
to create more complicated workflows, such as pipelines running other pipelines, or a pipeline running multiple tasks
concurrently.

For example, it can be useful to have one pipeline for data preparation that triggers a second pipeline that trains
networks.

It can also be useful to run a pipeline whose steps run concurrently, training multiple networks with different
hyperparameter values simultaneously. See the [Tabular training pipeline](../guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline.md)
example of a pipeline with concurrent steps.

## Custom Pipelines

In cases where a DAG is insufficient (for example, when a pipeline needs to be launched and then, if its performance is
inadequate, rerun), users can apply custom logic, using generic methods to enqueue tasks, implemented in Python code.
The custom logic of the pipeline sits in the *Controller Task*'s script.

Custom pipelines usually involve cloning template tasks, modifying their parameters, and manually enqueuing
them to queues (for execution by [agents](../clearml_agent.md)). It's possible to create custom logic that controls inputs
(e.g. overriding hyperparameters and artifacts) and acts upon task outputs.

See an example of a custom pipeline [here](../guides/automation/task_piping.md).