Clarify pipeline step caching (#474)

Repository: https://github.com/clearml/clearml-docs
Commit bc06f92614 (parent 2cf096f7ec)
@@ -57,9 +57,15 @@ pipeline Task. To enable the automatic logging, use the `monitor_metrics`, `moni
 when creating a pipeline step.
 
 ### Pipeline Step Caching
-The Pipeline controller also offers step caching, meaning, reusing outputs of previously executed pipeline steps, in the
-case of exact same step code, and the same step input values. By default, pipeline steps are not cached. Enable caching
-when creating a pipeline step.
+The Pipeline controller supports step caching, meaning, reusing outputs of previously executed pipeline steps.
+
+Cached pipeline steps are reused when they meet the following criteria:
+* The step code is the same, including environment setup (components in the task's [Execution](../webapp/webapp_exp_track_visual.md#execution)
+section, like required packages and docker image)
+* The step input arguments are unchanged, including step arguments and parameters (anything logged to the task's [Configuration](../webapp/webapp_exp_track_visual.md#configuration)
+section)
+
+By default, pipeline steps are not cached. Enable caching when creating a pipeline step (for example, see [@PipelineDecorator.component](pipelines_sdk_function_decorators.md#pipelinedecoratorcomponent)).
 
 When a step is cached, the step code is hashed, alongside the step’s parameters (as passed in runtime), into a single
 representing hash string. The pipeline first checks if a cached step exists in the system (archived Tasks will not be used
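For illustration, a minimal sketch of the caching behavior described above, using the decorator interface. The pipeline name, project, and URL are placeholders; `step_one` and the `data_frame` artifact follow the example referenced in the next hunk.

```python
from clearml import PipelineDecorator

# Caching sketch: with cache=True, the step's code and its runtime arguments are
# hashed; a later run that produces the same hash reuses the stored step outputs.
@PipelineDecorator.component(return_values=["data_frame"], cache=True)
def step_one(pickle_data_url: str, extra: int = 43):
    import pandas as pd  # required packages are inferred from imports inside the function
    # ... download `pickle_data_url` and build a DataFrame ...
    return pd.DataFrame({"extra": [extra]})

@PipelineDecorator.pipeline(name="caching example", project="examples", version="0.0.1")
def run_pipeline(url: str):
    # Same step code + same input values -> the cached step is reused.
    # Changing the code, its required packages, or any input value triggers a rerun.
    return step_one(pickle_data_url=url)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # optional: run the steps locally for quick testing
    run_pipeline(url="https://example.com/data.pkl")
```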
@@ -88,8 +88,9 @@ def step_one(pickle_data_url: str, extra: int = 43):
 * `return_values` - The artifact names for the step’s corresponding ClearML task to store the step’s returned objects.
 In the example above, a single object is returned and stored as an artifact named `data_frame`
 * `name` (Optional) - The name for the pipeline step. If not provided, the function name is used
-* `cache` - If `True`, the pipeline controller checks if an identical step with the same parameters was already executed.
-If found, its outputs are used instead of rerunning the step.
+* `cache` - If `True`, the pipeline controller checks if a step with the same code (including setup, see task [Execution](../webapp/webapp_exp_track_visual.md#execution)
+section) and input arguments was already executed. If found, the cached step's outputs are used
+instead of rerunning the step.
 * `packages` - A list of required packages or a local requirements.txt file. Example: `["tqdm>=2.1", "scikit-learn"]` or
 `"./requirements.txt"`. If not provided, packages are automatically added based on the imports used inside the function.
 * `execution_queue` (Optional) - Queue in which to enqueue the specific step. This overrides the queue set with the
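As a sketch of how these parameters fit together (the values below are illustrative, not taken from this commit):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(
    name="step_one",                            # defaults to the function name if omitted
    return_values=["data_frame"],               # artifact name for the returned object
    cache=True,                                 # reuse outputs when code + inputs are unchanged
    packages=["pandas>=2.0", "scikit-learn"],   # or a path such as "./requirements.txt"
    execution_queue="cpu_queue",                # overrides the pipeline's default queue
)
def step_one(pickle_data_url: str, extra: int = 43):
    import pandas as pd
    # ... load the pickled data and return it as a DataFrame ...
    return pd.DataFrame({"extra": [extra]})
```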
@@ -75,7 +75,9 @@ pipe.add_step(
 * One of the following:
   * `base_task_project` and `base_task_name` - Project and name of the base task to clone
   * `base_task_id` - ID of the base task to clone
-* `cache_executed_step` – If `True`, the controller will check if an identical task with the same parameters was already executed. If it was found, its outputs will be used instead of launching a new task.
+* `cache_executed_step` – If `True`, the controller will check if an identical task with the same code (including setup,
+e.g. required packages, docker image, etc.) and input arguments was already executed. If found, the cached step's
+outputs are used instead of launching a new task.
 * `execution_queue` (Optional) - the queue to use for executing this specific step. If not provided, the task will be sent to the default execution queue, as defined on the class
 * `parents` – Optional list of parent steps in the pipeline. The current step in the pipeline will be sent for execution only after all the parent steps have been executed successfully.
 * `parameter_override` - Dictionary of parameters and values to override in the current step. See [parameter_override](#parameter_override).
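A hedged sketch of a task-based step with `cache_executed_step` enabled; the project, task, and parameter names are placeholders:

```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.set_default_execution_queue("default")

pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="pipeline step 1 dataset artifact",
    # Reuse a previously executed task when its code (required packages, docker
    # image, etc.) and input arguments are identical:
    cache_executed_step=True,
    parameter_override={"General/dataset_url": "https://example.com/dataset.csv"},
)

pipe.add_step(
    name="stage_process",
    parents=["stage_data"],  # runs only after stage_data completes successfully
    base_task_project="examples",
    base_task_name="pipeline step 2 process dataset",
    cache_executed_step=True,
    parameter_override={"General/dataset_task_id": "${stage_data.id}"},
)

pipe.start(queue="services")
```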
@@ -141,8 +143,10 @@ pipe.add_function_step(
 * `function_kwargs` (optional) - A dictionary of function arguments and default values which are translated into task
 hyperparameters. If not provided, all function arguments are translated into hyperparameters.
 * `function_return` - The names for storing the pipeline step’s returned objects as artifacts in its ClearML task.
-* `cache_executed_step` - If `True`, the controller checks if an identical task with the same parameters was already
-executed. If it was found, its outputs are used instead of launching a new task.
+* `cache_executed_step` - If `True`, the controller will check if an identical task with the same code
+(including setup, see task [Execution](../webapp/webapp_exp_track_visual.md#execution)
+section) and input arguments was already executed. If found, the cached step's
+outputs are used instead of launching a new task.
 * `parents` – Optional list of parent steps in the pipeline. The current step in the pipeline will be sent for execution
 only after all the parent steps have been executed successfully.
 * `pre_execute_callback` & `post_execute_callback` - Control pipeline flow with callback functions that can be called
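And a similar sketch for a function-based step; the step function and its arguments are placeholders:

```python
from clearml import PipelineController

def prepare_data(pickle_data_url: str, extra: int = 43):
    import pandas as pd
    # ... download and prepare the data ...
    return pd.DataFrame({"extra": [extra]})

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.set_default_execution_queue("default")

pipe.add_function_step(
    name="prepare_data",
    function=prepare_data,
    # Arguments become the step task's hyperparameters:
    function_kwargs=dict(pickle_data_url="https://example.com/data.pkl", extra=43),
    # The returned object is stored as an artifact named "data_frame":
    function_return=["data_frame"],
    # Reuse a previously executed task when its code (including setup) and
    # input arguments are identical:
    cache_executed_step=True,
)

pipe.start_locally(run_pipeline_steps_locally=True)
```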