---
title: Pipeline from Functions
---

The [pipeline_from_functions.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_functions.py)
example script demonstrates the creation of a pipeline using the [PipelineController](../../references/sdk/automation_controller_pipelinecontroller.md)
class.

This example creates a pipeline incorporating four tasks, three of which are created from functions:
* `step_one` - Downloads and processes data.
* `step_two` - Further processes the data from `step_one`.
* `step_three` - Uses the processed data from `step_two` to train a model.

The fourth task is the pipeline task, which is created when the pipeline is launched.

The step functions are registered as pipeline steps when they are added to the pipeline controller.

When the pipeline steps are executed, corresponding ClearML Tasks are created. For this reason, each function that makes
up a pipeline step needs to be self-contained. Notice that all package imports inside the function are automatically
logged as required packages for the pipeline execution step.
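
As a minimal, hedged sketch (not the exact code from the example script), a self-contained step function might look like
this, with all of its imports inside the function body. The pickle file structure assumed here (a dict with `data` and
`feature_names` keys) is illustrative only:

```python
def step_one(pickle_data_url):
    # Imports live inside the function so the step is self-contained;
    # ClearML logs them as required packages for this step's task.
    import pickle

    import pandas as pd
    from clearml import StorageManager

    # Download the pickle file to a local cache and load it
    local_path = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_path, 'rb') as f:
        data = pickle.load(f)

    # The returned object is stored as the step's output artifact
    # (named via `function_return` when the step is added to the pipeline)
    return pd.DataFrame(data['data'], columns=data['feature_names'])
```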

## Pipeline Controller

1. Create the [PipelineController](../../references/sdk/automation_controller_pipelinecontroller.md) object.

   ```python
   from clearml import PipelineController

   pipe = PipelineController(
       name='pipeline demo',
       project='examples',
       version='0.0.1',
       add_pipeline_tags=False,
   )
   ```

1. Set the default execution queue to be used. All the pipeline steps will be enqueued for execution in this queue
   (unless overridden by the `execution_queue` parameter of the `add_function_step` method; a sketch of such an
   override appears after this list).

   ```python
   pipe.set_default_execution_queue('default')
   ```

1. Add a pipeline-level parameter that can be referenced from any step in the pipeline (see `step_one` below).

   ```python
   pipe.add_parameter(
       name='url',
       description='url to pickle file',
       default='https://github.com/allegroai/events/raw/master/odsc20-east/generic/iris_dataset.pkl'
   )
   ```

1. Build the pipeline (see [`PipelineController.add_function_step`](../../references/sdk/automation_controller_pipelinecontroller.md#add_function_step)
   for complete reference).

   The first step in the pipeline uses the `step_one` function, taking as its input the pipeline-level argument defined
   above. Its return object will be stored as an artifact under the name `data_frame`.

   ```python
   pipe.add_function_step(
       name='step_one',
       function=step_one,
       function_kwargs=dict(pickle_data_url='${pipeline.url}'),
       function_return=['data_frame'],
       cache_executed_step=True,
   )
   ```

   The second step in the pipeline uses the `step_two` function, taking as its input the first step's output. This
   reference implicitly defines the pipeline structure, making `step_one` the parent step of `step_two`.

   Its return object will be stored as an artifact under the name `processed_data`.

   ```python
   pipe.add_function_step(
       name='step_two',
       # parents=['step_one'],  # the pipeline will automatically detect the dependencies based on the kwargs inputs
       function=step_two,
       function_kwargs=dict(data_frame='${step_one.data_frame}'),
       function_return=['processed_data'],
       cache_executed_step=True,
   )
   ```

   The third step in the pipeline uses the `step_three` function, taking as its input the second step's output. This
   reference implicitly defines the pipeline structure, making `step_two` the parent step of `step_three`.

   Its return object will be stored as an artifact under the name `model`:

   ```python
   pipe.add_function_step(
       name='step_three',
       # parents=['step_two'],  # the pipeline will automatically detect the dependencies based on the kwargs inputs
       function=step_three,
       function_kwargs=dict(data='${step_two.processed_data}'),
       function_return=['model'],
       cache_executed_step=True,
   )
   ```

1. Run the pipeline.

   ```python
   pipe.start()
   ```

   The pipeline will be launched remotely, through the `services` queue, unless otherwise specified (see the sketch
   below).
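
As a brief, hedged sketch of the two override points mentioned above (the queue name `gpu_queue` is hypothetical and
not part of the example script), a step can be routed to its own queue, and the controller's launch can be pointed at
an explicit queue or run locally for debugging:

```python
# Variant of the step_three call above: route this one step to a dedicated queue.
# 'gpu_queue' is an assumed queue name.
pipe.add_function_step(
    name='step_three',
    function=step_three,
    function_kwargs=dict(data='${step_two.processed_data}'),
    function_return=['model'],
    execution_queue='gpu_queue',  # overrides the default execution queue for this step
)

# Launch variants: name the controller's queue explicitly, or, for debugging,
# run the controller (and optionally each step) on the local machine.
pipe.start(queue='services')
# pipe.start_locally(run_pipeline_steps_locally=True)
```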

## WebApp

When the experiment is executed, the console output displays the task ID, and links to the pipeline controller task
page and pipeline page.

```
ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
```

The pipeline run's page contains the pipeline's structure, the execution status of every step, as well as the run's
configuration parameters and output.

![Pipeline DAG](../../img/pipeline_from_functions_DAG.png)

To view a run's complete information, click **Full details** at the bottom of the **Run Info** panel, which will open
the pipeline's [controller task page](../../webapp/webapp_exp_track_visual.md).

Click a step to see an overview of its details.

![Pipeline step info](../../img/pipeline_from_functions_step_info.png)

## Console and Code

Click **DETAILS** to view a log of the pipeline controller's console output.

![Pipeline console](../../img/pipeline_from_functions_console.png)

Click a step to view its console output. You can also view the selected step's code by clicking **CODE** at the top of
the console log.

![Pipeline step code](../../img/pipeline_from_functions_code.png)