# Update pipeline examples (#229)
@@ -40,23 +40,36 @@ The sections below describe in more detail what happens in the controller task and in each step task.

   ```python
   # ... (the start of the PipelineController(...) call is outside this hunk)
       add_pipeline_tags=False,
   )
   ```

1. Set the default execution queue to be used. All the pipeline steps will be enqueued for execution in this queue.

   ```python
   pipe.set_default_execution_queue('default')
   ```

1. Build the pipeline (see the [PipelineController.add_step](../../references/sdk/automation_controller_pipelinecontroller.md#add_step)
   method for complete reference):

   The pipeline's [first step](#step-1---downloading-the-data) uses the pre-existing task
   `pipeline step 1 dataset artifact` in the `examples` project. The step uploads local data and stores it as an artifact.

   ```python
   pipe.add_step(
       name='stage_data',
       base_task_project='examples',
       base_task_name='pipeline step 1 dataset artifact'
   )
   ```

   The [second step](#step-2---processing-the-data) uses the pre-existing task `pipeline step 2 process dataset` in
   the `examples` project. The second step depends upon the first step's completion, which is designated by setting
   the first step as its parent.

   Custom configuration values specific to this step's execution are defined through the `parameter_override`
   parameter, where the first step's artifact is fed into the second step (a fuller sketch follows the code block below).

   Special pre-execution and post-execution logic is added for this step through the `pre_execute_callback` and
   `post_execute_callback` parameters respectively.

   ```python
   pipe.add_step(
       # ... (arguments elided here)
       post_execute_callback=post_execute_callback_example
   )
   ```
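
   The middle of the call is elided here. Based on the parameter and artifact names described above, the full call
   plausibly looks like the following sketch (the exact values are illustrative):

   ```python
   pipe.add_step(
       name='stage_process',
       parents=['stage_data', ],
       base_task_project='examples',
       base_task_name='pipeline step 2 process dataset',
       # feed Step 1's 'dataset' artifact URL into the 'General/dataset_url' parameter,
       # and override the test size
       parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}',
                           'General/test_size': 0.25},
       pre_execute_callback=pre_execute_callback_example,
       post_execute_callback=post_execute_callback_example
   )
   ```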

   The [third step](#step-3---training-the-network) uses the pre-existing task `pipeline step 3 train model` in the
   `examples` project. The step uses Step 2's artifacts (see the sketch below).

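   A minimal sketch of adding this step, taken from the earlier revision of this example (the override passes the ID
   of Step 2's cloned task, not the base task):

   ```python
   pipe.add_step(
       name='stage_train',
       parents=['stage_process', ],
       base_task_project='examples',
       base_task_name='pipeline step 3 train model',
       # '${stage_process.id}' resolves to the ID of Step 2's cloned task
       parameter_override={'General/dataset_task_id': '${stage_process.id}'}
   )
   ```
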
1. Run the pipeline.

   ```python
   pipe.start()
   ```

   The pipeline launches remotely, through the `services` queue, unless otherwise specified.

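   To control where the controller itself runs, a short sketch using the `start` and `start_locally` methods of the
   PipelineController SDK (the queue name here is just an example):

   ```python
   # run the controller logic in the local process instead of enqueuing it
   pipe.start_locally()

   # or enqueue the controller on an explicit queue
   pipe.start(queue='services')
   ```
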
## Step 1 - Downloading the Data

The pipeline's first step ([step1_dataset_artifact.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py))
does the following:

1. Download data using [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy):

   ```python
   # simulate local dataset, download one, so we have something local
   local_iris_pkl = StorageManager.get_local_copy(
       remote_url='https://github.com/allegroai/events/raw/master/odsc20-east/generic/iris_dataset.pkl'
   )
   ```

1. Store the data as an artifact named `dataset` using [`Task.upload_artifact`](../../references/sdk/task.md#upload_artifact):

   ```python
   # add and upload local file containing our toy dataset
   task.upload_artifact('dataset', artifact_object=local_iris_pkl)
   ```

## Step 2 - Processing the Data

The pipeline's second step ([step2_data_processing.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step2_data_processing.py))
does the following:

1. Connect its configuration parameters with the ClearML task:

   ```python
   args = {
       # ... (the dict entries are outside this hunk)
   }
   task.connect(args)
   ```
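
   The dict contents are elided in the diff above. For illustration only, a plausible shape using the parameter names
   this example references (`dataset_url`, `test_size`; the default values are assumptions):

   ```python
   args = {
       'dataset_url': '',   # filled in by the pipeline controller via parameter_override
       'test_size': 0.2,    # assumed default; the controller also overrides the test size
   }
   task.connect(args)
   ```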

1. Download the data created in the previous step (specified through the `dataset_url` parameter) using
   [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy):

   ```python
   iris_pickle = StorageManager.get_local_copy(remote_url=args['dataset_url'])
   ```

1. Generate testing and training sets from the data and store them as artifacts:

   ```python
   task.upload_artifact('X_train', X_train)
       # ...
   ```

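   The remaining splits (`X_test`, `y_train`, `y_test`) are uploaded the same way. For context, a minimal sketch of
   producing such splits, assuming scikit-learn (the actual script may differ):

   ```python
   from sklearn.model_selection import train_test_split

   # 'test_size' mirrors the parameter the pipeline controller can override
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=args['test_size'])
   ```
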
## Step 3 - Training the Network

The pipeline's third step ([step3_train_model.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step3_train_model.py))
does the following:

1. Connect its configuration parameters with the ClearML task. This allows the [pipeline controller](#the-pipeline-controller)
   to override the `dataset_task_id` value as the pipeline is run.

   ```python
   # Arguments
   # ... (the args dict itself is outside this hunk)
   task.connect(args)
   ```
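
   For illustration, the connected dictionary plausibly holds only the `dataset_task_id` key (the empty default is an
   assumption); the controller replaces it with `'${stage_process.id}'` at runtime:

   ```python
   args = {
       'dataset_task_id': '',  # overridden by the controller with the ID of Step 2's cloned task
   }
   task.connect(args)
   ```
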
1. Clone the base task and enqueue it using [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely).

   ```python
   task.execute_remotely()
   ```

1. Access the data created in the previous step's task:

   ```python
   dataset_task = Task.get_task(task_id=args['dataset_task_id'])
   # X_train, X_test, and y_train are fetched the same way (elided by the diff)
   y_test = dataset_task.artifacts['y_test'].get()
   ```

1. Train the network and log plots, as sketched below.

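   For context, a minimal sketch of such a training step, assuming scikit-learn's `LogisticRegression` and a
   matplotlib plot (the actual script's model and plots may differ; shown figures are picked up by ClearML's
   automatic logging):

   ```python
   from sklearn.linear_model import LogisticRegression
   import matplotlib.pyplot as plt

   model = LogisticRegression(solver='liblinear')  # assumed model choice
   model.fit(X_train, y_train)

   # any figure that is shown is captured and logged by ClearML
   plt.scatter(X_test[:, 0], X_test[:, 1], c=model.predict(X_test))
   plt.title('Predictions on the test set')
   plt.show()
   ```
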
## Running the Pipeline

**To run the pipeline:**

1. If the pipeline step tasks do not yet exist, run their code to create the ClearML tasks:

   ```bash
   python step1_dataset_artifact.py
   python step2_data_processing.py
   python step3_train_model.py
   ```

1. Run the pipeline controller:

   ```bash
   python pipeline_from_tasks.py
   ```

   :::note
   If you enqueue a Task, make sure an [agent](../../clearml_agent.md) is assigned to the queue, so
   it will execute the Task.
   :::
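
   For example, a minimal sketch of serving the `default` queue with an agent (assuming `clearml-agent` is installed
   and configured):

   ```bash
   clearml-agent daemon --queue default
   ```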

## WebApp

When the experiment is executed, the terminal returns the task ID, and links to the pipeline controller task page and
pipeline page.

```
ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
```

The pipeline run's page contains the pipeline's structure, the execution status of every step, as well as the run's
configuration parameters and output.



To view a run's complete information, click **Full details** on the bottom of the **Run Info** panel, which will open
the pipeline's [controller task page](../../webapp/webapp_exp_track_visual.md).

Click a step to see its summary information.



### Console

Click **DETAILS** to view a log of the pipeline controller's console output.



Click on a step to view its console output.
@@ -69,33 +69,33 @@ To run the pipeline, call the pipeline controller function.

## WebApp

When the experiment is executed, the terminal returns the task ID, and links to the pipeline controller task page and pipeline page.

```
ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
```

The pipeline run's page contains the pipeline's structure, the execution status of every step, as well as the run's
configuration parameters and output.


To view a run's complete information, click **Full details** on the bottom of the **Run Info** panel, which will open the
pipeline's [controller task page](../../webapp/webapp_exp_track_visual.md).

Click a step to see an overview of its details.


|

|
||||||

### Console and Code

Click **DETAILS** to view a log of the pipeline controller's console output.


Click on a step to view its console output. You can also view the selected step's code by clicking **CODE**
on top of the console log.



@@ -106,32 +106,33 @@ logged as required packages for the pipeline execution step.

The pipeline will be launched remotely, through the `services` queue, unless otherwise specified.

## WebApp

When the experiment is executed, the terminal returns the task ID, and links to the pipeline controller task page and pipeline page.

```
ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
```


|
The pipeline run’s page contains the pipeline’s structure, the execution status of every step, as well as the run’s
|
||||||
|
configuration parameters and output.
|
||||||
|
|
||||||

To view a run's complete information, click **Full details** on the bottom of the **Run Info** panel, which will open the
pipeline's [controller task page](../../webapp/webapp_exp_track_visual.md).

Click a step to see an overview of its details.


|

|
||||||

### Console and Code

Click **DETAILS** to view a log of the pipeline controller's console output.


Click on a step to view its console output. You can also view the selected step's code by clicking **CODE**
on top of the console log.


New image files added in this PR:

* `docs/img/examples_pipeline_from_decorator_DAG.png` (135 KiB)
* `docs/img/examples_pipeline_from_decorator_code.png` (195 KiB)
* `docs/img/examples_pipeline_from_decorator_console.png` (215 KiB)
* `docs/img/examples_pipeline_from_decorator_step_info.png` (144 KiB)
* `docs/img/examples_pipeline_from_tasks_DAG.png` (137 KiB)
* `docs/img/examples_pipeline_from_tasks_console.png` (179 KiB)
* `docs/img/examples_pipeline_from_tasks_step_info.png` (156 KiB)
* `docs/img/pipeline_from_functions_DAG.png` (76 KiB)
* `docs/img/pipeline_from_functions_code.png` (136 KiB)
* `docs/img/pipeline_from_functions_console.png` (116 KiB)
* `docs/img/pipeline_from_functions_step_info.png` (86 KiB)