diff --git a/docs/guides/pipeline/pipeline_controller.md b/docs/guides/pipeline/pipeline_controller.md
index 316e2db4..9f6796eb 100644
--- a/docs/guides/pipeline/pipeline_controller.md
+++ b/docs/guides/pipeline/pipeline_controller.md
@@ -40,29 +40,42 @@ The sections below describe in more detail what happens in the controller task a
         add_pipeline_tags=False,
     )
     ```
-   * `name` - Name the pipeline controller task
-   * `project` - Project where pipeline controller and tasks will be stored
-   * `version` - Provide a pipeline version. If `auto_version_bump` is set to `True`, then the version number will be
-     automatically bumped if the same version already exists.
-   * `add_pipeline_tags` - If `True`, then all pipeline steps are tagged with `pipe: `
-
-1. Add Step 1 using the [PipelineController.add_step](../../references/sdk/automation_controller_pipelinecontroller.md#add_step)
-   method.
-
-   ```python
-   pipe.add_step(name='stage_data', base_task_project='examples', base_task_name='pipeline step 1 dataset artifact')
-   ```
-
-   * `name` - The name of Step 1 (`stage_data`).
-   * `base_task_project` and `base_task_name` - Step 1's base Task to clone (the cloned Task will be executed when the pipeline runs).
-1. Add Step 2.
+1. Set the default execution queue. All pipeline steps will be enqueued for execution in this queue.
+
+   ```python
+   pipe.set_default_execution_queue('default')
+   ```
+1. Build the pipeline (see the [PipelineController.add_step](../../references/sdk/automation_controller_pipelinecontroller.md#add_step)
+   method for a complete reference):
+
+   The pipeline’s [first step](#step-1---downloading-the-data) uses the pre-existing task
+   `pipeline step 1 dataset artifact` in the `examples` project. The step uploads local data and stores it as an artifact.
+
+   ```python
    pipe.add_step(
-       name='stage_process',
-       parents=['stage_data', ],
+       name='stage_data',
        base_task_project='examples',
+       base_task_name='pipeline step 1 dataset artifact'
+   )
+   ```
+
+   The [second step](#step-2---processing-the-data) uses the pre-existing task `pipeline step 2 process dataset` in
+   the `examples` project. The second step’s dependency upon the first step’s completion is designated by setting the
+   first step as its parent.
+
+   Configuration values specific to this step’s execution are defined through the `parameter_override` parameter,
+   through which the first step’s artifact is fed into the second step.
+
+   Special pre-execution and post-execution logic is added for this step through the `pre_execute_callback` and
+   `post_execute_callback` parameters respectively (sketched below).
+
+   ```python
+   pipe.add_step(
+       name='stage_process',
+       parents=['stage_data', ],
+       base_task_project='examples',
        base_task_name='pipeline step 2 process dataset',
        parameter_override={
            'General/dataset_url': '${stage_data.artifacts.dataset.url}',
@@ -71,69 +84,44 @@ The sections below describe in more detail what happens in the controller task a
        pre_execute_callback=pre_execute_callback_example,
        post_execute_callback=post_execute_callback_example
    )
-   ```
-   In addition to the parameters included in Step 1, input the following:
-   * `parents` - The names of the steps, which the current step depends upon their completion to begin execution. In this
-     instance, the execution of Step 2 (`stage_process`) depends upon the completion of Step 1 (`stage_data`).
-   * `parameter_override` - Pass the URL of the data artifact from Step 1 to Step 2. Override the value of the parameter
-     whose key is `dataset_url` (in the parameter group named `General`). Override it with the URL of the artifact named
-     `dataset`. Also override the test size.
+   ```
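+
+   As a reference, a minimal sketch of what these two callbacks can look like, modeled on the example script
+   (the function bodies here are illustrative; the signatures and the skip-on-`False` behavior are the interface):
+
+   ```python
+   def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
+       # called before the step is launched; returning False skips the step
+       print('Cloning Task id={} with parameters: {}'.format(
+           a_node.base_task_id, current_param_override))
+       return True
+
+   def post_execute_callback_example(a_pipeline, a_node):
+       # called after the step completes; a_node.executed holds the executed task's ID
+       print('Completed Task id={}'.format(a_node.executed))
+   ```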
-   :::important Syntax of the parameter_override Value
-   For other examples of ``parameter_override`` syntax, see [PipelineController.add_step](../../references/sdk/automation_controller_pipelinecontroller.md#add_step).
-   :::
+
+   The [third step](#step-3---training-the-network) uses the pre-existing task `pipeline step 3 train model` in the
+   `examples` project. The step uses Step 2’s artifacts.
+
+   ```python
+   pipe.add_step(
+       name='stage_train',
+       parents=['stage_process', ],
+       base_task_project='examples',
+       base_task_name='pipeline step 3 train model',
+       parameter_override={'General/dataset_task_id': '${stage_process.id}'}
+   )
+   ```
-   * `pre_execute_callback` - The pipeline controller will execute the input callback function before the pipeline step is
-     executed. If the callback function returns `False`, the pipeline step will be skipped.
-   * `post_execute_callback` - The pipeline controller will execute the input callback function after the pipeline step is
-     executed.
-
-1. Add Step 3.
-
-   ```python
-   pipe.add_step(
-       name='stage_train',
-       parents=['stage_process', ],
-       base_task_project='examples',
-       base_task_name='pipeline step 3 train model',
-       parameter_override={'General/dataset_task_id': '${stage_process.id}'})
-   ```
-
-   * `name` - The name of Step 3 (`stage_train`).
-   * `parents` - The start of Step 3 (`stage_train`) depends upon the completion of Step 2 (`stage_process`).
-   * `parameter_override` - Pass the ID of the Step 2 Task to the Step 3 Task. This is the ID of the cloned Task, not the base Task.
-
 1. Run the pipeline.
+
    ```python
-   # Starting the pipeline (in the background)
    pipe.start()
    ```
+
+   The pipeline launches remotely, through the `services` queue, unless otherwise specified.
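+
+   To launch the controller through a different queue, pass the queue name explicitly when starting it. A minimal
+   sketch (the `queue` argument of `PipelineController.start` defaults to `services`; the queue name here is
+   hypothetical):
+
+   ```python
+   # enqueue the controller itself on a specific queue instead of 'services';
+   # assumes an agent is servicing a queue named 'controllers'
+   pipe.start(queue='controllers')
+   ```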
+
 ## Step 1 - Downloading the Data
 
-In the Step 1 Task ([step1_dataset_artifact.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py)):
-1. Clone base Task and enqueue it for execution using [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely).
-   ```python
-   task.execute_remotely()
-   ```
+The pipeline’s first step ([step1_dataset_artifact.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py))
+does the following:
 
-1. Download data and store it as an artifact named `dataset`. This is the same artifact name used in `parameter_override`
-when the [`add_step`](../../references/sdk/automation_controller_pipelinecontroller.md#add_step) method is called in the pipeline controller.
-
+1. Download data using [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy):
+
    ```python
    # simulate local dataset, download one, so we have something local
    local_iris_pkl = StorageManager.get_local_copy(
        remote_url='https://github.com/allegroai/events/raw/master/odsc20-east/generic/iris_dataset.pkl'
    )
-
+   ```
+1. Store the data as an artifact named `dataset` using [`Task.upload_artifact`](../../references/sdk/task.md#upload_artifact):
+   ```python
    # add and upload local file containing our toy dataset
    task.upload_artifact('dataset', artifact_object=local_iris_pkl)
    ```
 
 ## Step 2 - Processing the Data
 
-In the Step 2 Task ([step2_data_processing.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step2_data_processing.py)):
-1. Create a parameter dictionary and connect it to the Task.
+The pipeline’s second step ([step2_data_processing.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step2_data_processing.py))
+does the following:
+
+1. Connect its configuration parameters with the ClearML task:
 
    ```python
    args = {
@@ -147,22 +135,14 @@ In the Step 2 Task ([step2_data_processing.py](https://github.com/allegroai/clea
    task.connect(args)
    ```
 
-   The parameter `dataset_url` is the same parameter name used by `parameter_override` when the [`add_step`](../../references/sdk/automation_controller_pipelinecontroller.md#add_step)
-   method is called in the pipeline controller.
-
-1. Clone base Task and enqueue it for execution using [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely).
-
-   ```python
-   task.execute_remotely()
-   ```
-
-1. Later in Step 2, the Task uses the URL in the parameter dictionary to get the data.
+1. Download the data created in the previous step (specified through the `dataset_url` parameter) using
+   [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy):
 
    ```python
    iris_pickle = StorageManager.get_local_copy(remote_url=args['dataset_url'])
    ```
 
-1. Task Processes data and then stores the processed data as artifacts.
+1. Generate testing and training sets from the data and store them as artifacts.
 
    ```python
    task.upload_artifact('X_train', X_train)
@@ -173,8 +153,10 @@ In the Step 2 Task ([step2_data_processing.py](https://github.com/allegroai/clea
 
 ## Step 3 - Training the Network
 
-In the Step 3 Task ([step3_train_model.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step3_train_model.py)):
-1. Create a parameter dictionary and connect it to the Task.
+The pipeline’s third step ([step3_train_model.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step3_train_model.py))
+does the following:
+1. Connect its configuration parameters with the ClearML task. This allows the [pipeline controller](#the-pipeline-controller)
+   to override the `dataset_task_id` value as the pipeline is run.
 
   ```python
   # Arguments
@@ -183,16 +165,14 @@ In the Step 3 Task ([step3_train_model.py](https://github.com/allegroai/clearml/
   }
   task.connect(args)
   ```
-
-   The parameter `dataset_task_id` is later overridden by the ID of the Step 2 Task (cloned Task, not base Task).
-
-1. Clone the Step 3 base Task and enqueue it using [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely).
+
+1. Clone the base task and enqueue it using [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely).
 
   ```python
   task.execute_remotely()
   ```
 
-1. Use the Step 2 Task ID to get the processed data stored in artifacts.
+1. Access the data created by the previous step.
 
   ```python
   dataset_task = Task.get_task(task_id=args['dataset_task_id'])
@@ -202,25 +182,24 @@ In the Step 3 Task ([step3_train_model.py](https://github.com/allegroai/clearml/
   y_test = dataset_task.artifacts['y_test'].get()
   ```
 
-1. Train the network and log plots, along with ClearML automatic logging.
+1. Train the network and log plots (sketched below).
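+
+   A condensed sketch of the training stage (the classifier and plot here are illustrative, not the script’s exact
+   code; any matplotlib figure created in the task is captured by ClearML’s automatic logging):
+
+   ```python
+   from sklearn.linear_model import LogisticRegression
+   import matplotlib.pyplot as plt
+
+   # illustrative model choice; the example script's hyperparameters may differ
+   model = LogisticRegression(solver='liblinear', multi_class='auto')
+   model.fit(X_train, y_train)
+
+   # the figure is logged automatically under the task's RESULTS > PLOTS
+   plt.scatter(X_test[:, 0], X_test[:, 1], c=model.predict(X_test))
+   plt.title('Model predictions on the test set')
+   plt.show()
+   ```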
 
 ## Running the Pipeline
 
 **To run the pipeline:**
 
-1. Run the script for each of the steps, if the script has not run once before.
-
-       python step1_dataset_artifact.py
-       python step2_data_processing.py
-       python step3_train_model.py
-
-1. Run the pipeline controller one of the following two ways:
-
-   * Run the script.
-
-         python pipeline_from_tasks.py
-
-   * Remotely execute the Task - If the Task `pipeline demo` in the project `examples` already exists in ClearML Server, clone it and enqueue it to execute.
+1. If the pipeline step tasks do not yet exist, run their scripts to create the ClearML tasks.
+
+   ```bash
+   python step1_dataset_artifact.py
+   python step2_data_processing.py
+   python step3_train_model.py
+   ```
+
+1. Run the pipeline controller.
+
+   ```bash
+   python pipeline_from_tasks.py
+   ```
 
 :::note
 If you enqueue a Task, make sure an [agent](../../clearml_agent.md) is assigned to the queue, so
@@ -228,6 +207,34 @@ In the Step 3 Task ([step3_train_model.py](https://github.com/allegroai/clearml/
 :::
 
-The plot appears in **RESULTS** > **PLOTS** describing the pipeline. Hover over a step in the pipeline, and view the name of the step and the parameters overridden by the step.
+## WebApp
+
+When the experiment is executed, the terminal returns the task ID and links to the pipeline controller task page and
+pipeline page.
+
+```
+ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
+ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
+ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
+```
+
+The pipeline run’s page contains the pipeline’s structure, the execution status of every step, as well as the run’s
+configuration parameters and output.
+
+![Pipeline DAG](../../img/examples_pipeline_from_tasks_DAG.png)
+
+To view a run’s complete information, click **Full details** at the bottom of the **Run Info** panel, which will open
+the pipeline’s [controller task page](../../webapp/webapp_exp_track_visual.md).
+
+Click a step to see its summary information.
+
+![Pipeline step info](../../img/examples_pipeline_from_tasks_step_info.png)
+
+### Console
+
+Click **DETAILS** to view a log of the pipeline controller’s console output.
+
+![Pipeline console](../../img/examples_pipeline_from_tasks_console.png)
+
+Click on a step to view its console output.
 
-![image](../../img/pipeline_controller_01.png)
\ No newline at end of file
diff --git a/docs/guides/pipeline/pipeline_decorator.md b/docs/guides/pipeline/pipeline_decorator.md
index 7b9a4a3a..cca16709 100644
--- a/docs/guides/pipeline/pipeline_decorator.md
+++ b/docs/guides/pipeline/pipeline_decorator.md
@@ -69,33 +69,33 @@ To run the pipeline, call the pipeline controller function.
 
 ## WebApp
 
-### Pipeline Controller
-The pipeline controller’s **CONFIGURATION** page contains the pipeline structure and step definitions in its **Configuration Objects**
-section.
+When the experiment is executed, the terminal returns the task ID and links to the pipeline controller task page and pipeline page.
 
-The **Pipeline** configuration object contains the pipeline structure and execution parameters.
+```
+ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
+ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
+ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
+```
 
-![Pipeline configuration](../../img/pipeline_decorator_configurations.png)
+The pipeline run’s page contains the pipeline’s structure, the execution status of every step, as well as the run’s
+configuration parameters and output.
 
-An additional configuration object per pipeline step contains the step’s definitions and execution parameters.
+![Pipeline DAG](../../img/examples_pipeline_from_decorator_DAG.png)
 
-The pipeline controller’s **RESULTS > PLOTS** page provides summary details for the pipeline execution.
+To view a run’s complete information, click **Full details** at the bottom of the **Run Info** panel, which will open the
+pipeline’s [controller task page](../../webapp/webapp_exp_track_visual.md).
 
-The **Execution Flow** graphically summarizes the pipeline's execution. Hover over each step to view its details.
+Click a step to see an overview of its details.
 
-![Pipeline execution flow plot](../../img/pipeline_decorator_plot_1.png)
+![Pipeline step info](../../img/examples_pipeline_from_decorator_step_info.png)
 
-The **Execution Details** table provides the pipeline execution details in table format.
+### Console and Code
 
-![Pipeline execution details plot](../../img/pipeline_decorator_plot_2.png)
+Click **DETAILS** to view a log of the pipeline controller’s console output.
 
-### Pipeline Steps
-Each function step’s arguments are stored in their respective task’s **CONFIGURATION > HYPER PARAMETERS > kwargs**.
+![Pipeline console](../../img/examples_pipeline_from_decorator_console.png)
 
-![Pipeline step configuration](../../img/pipeline_decorator_step_configuration.png)
+Click on a step to view its console output. You can also view the selected step’s code by clicking **CODE**
+at the top of the console log.
 
-
-Values that were listed in the `return_values`parameter of the `@PipelineDecorator.component` decorator are stored as
-artifacts in the relevant step's task. These artifacts can be viewed in the step task’s **ARTIFACTS** tab.
-
-![Pipeline step artifacts](../../img/pipeline_decorator_step_artifacts.png)
+![Pipeline step code](../../img/examples_pipeline_from_decorator_code.png)
diff --git a/docs/guides/pipeline/pipeline_functions.md b/docs/guides/pipeline/pipeline_functions.md
index ac20a7fa..5fee3358 100644
--- a/docs/guides/pipeline/pipeline_functions.md
+++ b/docs/guides/pipeline/pipeline_functions.md
@@ -106,32 +106,33 @@ logged as required packages for the pipeline execution step.
 
 The pipeline will be launched remotely, through the `services` queue, unless otherwise specified.
 
 ## WebApp
-### Pipeline Controller
-The pipeline controller’s **CONFIGURATION** page contains the pipeline structure and step definitions in its **Configuration Objects**
-section.
+When the experiment is executed, the terminal returns the task ID and links to the pipeline controller task page and pipeline page.
 
-The **Pipeline** configuration object contains the pipeline structure and execution parameters.
+```
+ClearML Task: created new task id=bc93610688f242ecbbe70f413ff2cf5f
+ClearML results page: https://app.clear.ml/projects/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f/output/log
+ClearML pipeline page: https://app.clear.ml/pipelines/462f48dba7b441ffb34bddb783711da7/experiments/bc93610688f242ecbbe70f413ff2cf5f
+```
 
-![Pipeline configuration](../../img/pipeline_function_config.png)
+The pipeline run’s page contains the pipeline’s structure, the execution status of every step, as well as the run’s
+configuration parameters and output.
 
-An additional configuration object per pipeline step contains the step’s definitions and execution parameters.
+![Pipeline DAG](../../img/pipeline_from_functions_DAG.png)
 
-The pipeline controller’s **RESULTS > PLOTS** page provides summary details for the pipeline execution.
+To view a run’s complete information, click **Full details** at the bottom of the **Run Info** panel, which will open the
+pipeline’s [controller task page](../../webapp/webapp_exp_track_visual.md).
 
-The **Execution Flow** graphically summarizes the pipeline's execution. Hover over each step to view its details.
+Click a step to see an overview of its details.
 
-![Pipeline execution flow plot](../../img/pipeline_decorator_plot_1.png)
+![Pipeline step info](../../img/pipeline_from_functions_step_info.png)
 
-The **Execution Details** table provides the pipeline execution details in table format.
+### Console and Code
 
-![pipeline execution details plot](../../img/pipeline_function_plot.png)
+Click **DETAILS** to view a log of the pipeline controller’s console output.
 
-### Pipeline Steps
-Each function step’s arguments are stored in their respective task’s **CONFIGURATION > HYPER PARAMETERS > kwargs**.
+![Pipeline console](../../img/pipeline_from_functions_console.png)
 
-![Pipeline step configurations](../../img/pipeline_function_step_configuration.png)
+Click on a step to view its console output. You can also view the selected step’s code by clicking **CODE**
+at the top of the console log.
 
-Values that were listed in the `return_values`parameter of the `PipelineDecorator.component` decorator are stored as
-artifacts in the relevant step's task. These artifacts can be viewed in the step task’s ARTIFACTS tab.
-
-![Pipeline step artifacts](../../img/pipeline_decorator_step_artifacts.png)
+![Pipeline step code](../../img/pipeline_from_functions_code.png)
diff --git a/docs/img/examples_pipeline_from_decorator_DAG.png b/docs/img/examples_pipeline_from_decorator_DAG.png
new file mode 100644
index 00000000..d0742530
Binary files /dev/null and b/docs/img/examples_pipeline_from_decorator_DAG.png differ
diff --git a/docs/img/examples_pipeline_from_decorator_code.png b/docs/img/examples_pipeline_from_decorator_code.png
new file mode 100644
index 00000000..90977c7b
Binary files /dev/null and b/docs/img/examples_pipeline_from_decorator_code.png differ
diff --git a/docs/img/examples_pipeline_from_decorator_console.png b/docs/img/examples_pipeline_from_decorator_console.png
new file mode 100644
index 00000000..0b26de13
Binary files /dev/null and b/docs/img/examples_pipeline_from_decorator_console.png differ
diff --git a/docs/img/examples_pipeline_from_decorator_step_info.png b/docs/img/examples_pipeline_from_decorator_step_info.png
new file mode 100644
index 00000000..f5c24832
Binary files /dev/null and b/docs/img/examples_pipeline_from_decorator_step_info.png differ
diff --git a/docs/img/examples_pipeline_from_tasks_DAG.png b/docs/img/examples_pipeline_from_tasks_DAG.png
new file mode 100644
index 00000000..e9ace7c6
Binary files /dev/null and b/docs/img/examples_pipeline_from_tasks_DAG.png differ
diff --git a/docs/img/examples_pipeline_from_tasks_console.png b/docs/img/examples_pipeline_from_tasks_console.png
new file mode 100644
index 00000000..7a509ea3
Binary files /dev/null and b/docs/img/examples_pipeline_from_tasks_console.png differ
diff --git a/docs/img/examples_pipeline_from_tasks_step_info.png b/docs/img/examples_pipeline_from_tasks_step_info.png
new file mode 100644
index 00000000..0eeb2f8e
Binary files /dev/null and b/docs/img/examples_pipeline_from_tasks_step_info.png differ
diff --git a/docs/img/pipeline_from_functions_DAG.png b/docs/img/pipeline_from_functions_DAG.png
new file mode 100644
index 00000000..65750816
Binary files /dev/null and b/docs/img/pipeline_from_functions_DAG.png differ
diff --git a/docs/img/pipeline_from_functions_code.png b/docs/img/pipeline_from_functions_code.png
new file mode 100644
index 00000000..f4ca5409
Binary files /dev/null and b/docs/img/pipeline_from_functions_code.png differ
diff --git a/docs/img/pipeline_from_functions_console.png b/docs/img/pipeline_from_functions_console.png
new file mode 100644
index 00000000..88c6299b
Binary files /dev/null and b/docs/img/pipeline_from_functions_console.png differ
diff --git a/docs/img/pipeline_from_functions_step_info.png b/docs/img/pipeline_from_functions_step_info.png
new file mode 100644
index 00000000..e4fb7e7f
Binary files /dev/null and b/docs/img/pipeline_from_functions_step_info.png differ