mirror of https://github.com/clearml/clearml (synced 2025-03-13 15:20:50 +00:00)
parent 4a30e32f73
commit 28c54524d8
@@ -180,14 +180,15 @@ class PipelineController(object):
 :param add_run_number: If True (default), add the run number of the pipeline to the pipeline name.
 Example, the second time we launch the pipeline "best pipeline", we rename it to "best pipeline #2"
 :param retry_on_failure: Integer (number of retries) or Callback function that returns True to allow a retry
-- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
-- Callable: A function called on node failure. Takes as parameters:
-the PipelineController instance, the PipelineController.Node that failed and an int
-representing the number of previous retries for the node that failed
-The function must return a `bool`: True if the node should be retried and False otherwise.
-If True, the node will be re-queued and the number of retries left will be decremented by 1.
-By default, if this callback is not specified, the function will be retried the number of
-times indicated by `retry_on_failure`.
+
+- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
+- Callable: A function called on node failure. Takes as parameters:
+the PipelineController instance, the PipelineController.Node that failed and an int
+representing the number of previous retries for the node that failed.
+The function must return ``True`` if the node should be retried and ``False`` otherwise.
+If True, the node will be re-queued and the number of retries left will be decremented by 1.
+By default, if this callback is not specified, the function will be retried the number of
+times indicated by `retry_on_failure`.
 
 .. code-block:: py
 
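For readers of this hunk, the callable form of `retry_on_failure` can be sketched as follows. This is a minimal illustration of the three-argument contract described in the docstring, not the example that follows `.. code-block:: py` in the source (that body is outside the hunk). `MAX_RETRIES`, the stand-in node object and its `name` attribute are assumptions made so the sketch runs without a ClearML server.

```python
from types import SimpleNamespace

MAX_RETRIES = 3  # assumption: illustrative retry budget, not a ClearML default


def retry_on_failure_callback(pipeline, node, retries):
    """Decide whether a failed node should be re-queued.

    pipeline: the PipelineController instance (unused in this sketch)
    node: the PipelineController.Node that failed
    retries: number of previous retries for this node
    """
    # `name` is assumed to be an attribute of the failed node
    node_name = getattr(node, "name", None)
    print("node {!r} failed after {} previous retries".format(node_name, retries))
    # Returning True asks for a retry; returning False gives up
    return retries < MAX_RETRIES


# Stand-in for a real node, used only to exercise the callback locally
fake_node = SimpleNamespace(name="stage_train")
first_attempt = retry_on_failure_callback(None, fake_node, 0)
exhausted = retry_on_failure_callback(None, fake_node, MAX_RETRIES)
```

Under these assumptions the callback allows a retry on the first failure and refuses once `MAX_RETRIES` previous retries have been recorded.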
@@ -391,18 +392,14 @@ class PipelineController(object):
 The current step in the pipeline will be sent for execution only after all the parent nodes
 have been executed successfully.
 :param parameter_override: Optional parameter overriding dictionary.
-The dict values can reference a previously executed step using the following form '${step_name}'
-Examples:
-- Artifact access
-parameter_override={'Args/input_file': '${<step_name>.artifacts.<artifact_name>.url}' }
-- Model access (last model used)
-parameter_override={'Args/input_file': '${<step_name>.models.output.-1.url}' }
-- Parameter access
-parameter_override={'Args/input_file': '${<step_name>.parameters.Args/input_file}' }
-- Pipeline Task argument (see `Pipeline.add_parameter`)
-parameter_override={'Args/input_file': '${pipeline.<pipeline_parameter>}' }
-- Task ID
-parameter_override={'Args/input_file': '${stage3.id}' }
+The dict values can reference a previously executed step using the following form '${step_name}'. Examples:
+
+- Artifact access ``parameter_override={'Args/input_file': '${<step_name>.artifacts.<artifact_name>.url}' }``
+- Model access (last model used) ``parameter_override={'Args/input_file': '${<step_name>.models.output.-1.url}' }``
+- Parameter access ``parameter_override={'Args/input_file': '${<step_name>.parameters.Args/input_file}' }``
+- Pipeline Task argument (see `Pipeline.add_parameter`) ``parameter_override={'Args/input_file': '${pipeline.<pipeline_parameter>}' }``
+- Task ID ``parameter_override={'Args/input_file': '${stage3.id}' }``
+
 :param configuration_overrides: Optional, override Task configuration objects.
 Expected dictionary of configuration object name and configuration object content.
 Examples:
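The `parameter_override` reference forms listed in this hunk are plain strings, so a dictionary combining several of them can be built without any ClearML import. The sketch below only assembles the dictionary. The step name, artifact name and every key other than `Args/input_file` are hypothetical, chosen for illustration.

```python
# Hypothetical step and artifact names, used only for illustration
PREVIOUS_STEP = "stage_data"
ARTIFACT_NAME = "dataset"

parameter_override = {
    # URL of an artifact produced by the previous step
    "Args/input_file": "${%s.artifacts.%s.url}" % (PREVIOUS_STEP, ARTIFACT_NAME),
    # URL of the last output model of the previous step (hypothetical key)
    "Args/model_url": "${%s.models.output.-1.url}" % PREVIOUS_STEP,
    # Value of a parameter set on the previous step (hypothetical key)
    "Args/batch_size": "${%s.parameters.Args/batch_size}" % PREVIOUS_STEP,
    # Task ID of the previous step (hypothetical key)
    "Args/upstream_task": "${%s.id}" % PREVIOUS_STEP,
}
```

The `${...}` placeholders are left unresolved here; per the docstring, they reference a previously executed step, so resolution would happen inside the pipeline rather than in this snippet.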
@@ -410,19 +407,13 @@ class PipelineController(object):
 {'General': 'configuration file content'}
 {'OmegaConf': YAML.dumps(full_hydra_dict)}
 :param task_overrides: Optional task section overriding dictionary.
-The dict values can reference a previously executed step using the following form '${step_name}'
-Examples:
-- get the latest commit from a specific branch
-task_overrides={'script.version_num': '', 'script.branch': 'main'}
-- match git repository branch to a previous step
-task_overrides={'script.branch': '${stage1.script.branch}', 'script.version_num': ''}
-- change container image
-task_overrides={'container.image': 'nvidia/cuda:11.6.0-devel-ubuntu20.04',
-'container.arguments': '--ipc=host'}
-- match container image to a previous step
-task_overrides={'container.image': '${stage1.container.image}'}
-- reset requirements (the agent will use the "requirements.txt" inside the repo)
-task_overrides={'script.requirements.pip': ""}
+The dict values can reference a previously executed step using the following form '${step_name}'. Examples:
+
+- get the latest commit from a specific branch ``task_overrides={'script.version_num': '', 'script.branch': 'main'}``
+- match git repository branch to a previous step ``task_overrides={'script.branch': '${stage1.script.branch}', 'script.version_num': ''}``
+- change container image ``task_overrides={'container.image': 'nvidia/cuda:11.6.0-devel-ubuntu20.04', 'container.arguments': '--ipc=host'}``
+- match container image to a previous step ``task_overrides={'container.image': '${stage1.container.image}'}``
+- reset requirements (the agent will use the "requirements.txt" inside the repo) ``task_overrides={'script.requirements.pip': ""}``
 :param execution_queue: Optional, the queue to use for executing this specific step.
 If not provided, the task will be sent to the default execution queue, as defined on the class
 :param monitor_metrics: Optional, log the step's metrics on the pipeline Task.
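The `task_overrides` examples in this hunk can be combined into one dictionary. The sketch below merges two of them: following a branch at its latest commit, and reusing the container image of a previous step. `branch_override` is a hypothetical helper, not part of ClearML, and `stage1` is the step name already used in the docstring examples.

```python
def branch_override(branch):
    """Hypothetical helper: follow the latest commit of `branch`."""
    # An empty version_num alongside a branch mirrors the docstring's
    # "get the latest commit from a specific branch" example
    return {"script.version_num": "", "script.branch": branch}


# Start from the branch override, then reuse the image of a previous step
task_overrides = dict(branch_override("main"))
task_overrides["container.image"] = "${stage1.container.image}"
```

Keeping each override in its own small dictionary makes it easier to reuse the same combination across several steps.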
@@ -497,14 +488,15 @@ class PipelineController(object):
 :param base_task_factory: Optional, instead of providing a pre-existing Task,
 provide a Callable function to create the Task (returns Task object)
 :param retry_on_failure: Integer (number of retries) or Callback function that returns True to allow a retry
-- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
-- Callable: A function called on node failure. Takes as parameters:
-the PipelineController instance, the PipelineController.Node that failed and an int
-representing the number of previous retries for the node that failed
-The function must return a `bool`: True if the node should be retried and False otherwise.
-If True, the node will be re-queued and the number of retries left will be decremented by 1.
-By default, if this callback is not specified, the function will be retried the number of
-times indicated by `retry_on_failure`.
+
+- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
+- Callable: A function called on node failure. Takes as parameters:
+the PipelineController instance, the PipelineController.Node that failed and an int
+representing the number of previous retries for the node that failed.
+The function must return ``True`` if the node should be retried and ``False`` otherwise.
+If True, the node will be re-queued and the number of retries left will be decremented by 1.
+By default, if this callback is not specified, the function will be retried the number of
+times indicated by `retry_on_failure`.
 
 .. code-block:: py
 
@@ -759,16 +751,16 @@ class PipelineController(object):
 was already executed. If it was found, use it instead of launching a new Task.
 Default: False, a new cloned copy of base_task is always used.
 Notice: If the git repo reference does not have a specific commit ID, the Task will never be used.
-
 :param retry_on_failure: Integer (number of retries) or Callback function that returns True to allow a retry
-- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
-- Callable: A function called on node failure. Takes as parameters:
-the PipelineController instance, the PipelineController.Node that failed and an int
-representing the number of previous retries for the node that failed
-The function must return a `bool`: True if the node should be retried and False otherwise.
-If True, the node will be re-queued and the number of retries left will be decremented by 1.
-By default, if this callback is not specified, the function will be retried the number of
-times indicated by `retry_on_failure`.
+
+- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
+- Callable: A function called on node failure. Takes as parameters:
+the PipelineController instance, the PipelineController.Node that failed and an int
+representing the number of previous retries for the node that failed.
+The function must return ``True`` if the node should be retried and ``False`` otherwise.
+If True, the node will be re-queued and the number of retries left will be decremented by 1.
+By default, if this callback is not specified, the function will be retried the number of
+times indicated by `retry_on_failure`.
 
 .. code-block:: py
 
@@ -972,9 +964,9 @@ class PipelineController(object):
 :param configuration: The configuration. This is usually the configuration used in the model training process.
 Specify one of the following:
 
 - A dictionary/list - A dictionary containing the configuration. ClearML stores the configuration in
 the **ClearML Server** (backend), in a HOCON format (JSON-like format) which is editable.
 - A ``pathlib2.Path`` string - A path to the configuration file. ClearML stores the content of the file.
 A local path must be relative path. When executing a pipeline remotely in a worker, the contents brought
 from the **ClearML Server** (backend) overwrites the contents of the file.
 
@@ -1174,7 +1166,7 @@ class PipelineController(object):
 def is_successful(self, fail_on_step_fail=True, fail_condition="all"):
 # type: (bool, str) -> bool
 """
-Evaluate whether or not the pipeline is successful
+Evaluate whether the pipeline is successful.
 
 :param fail_on_step_fail: If True (default), evaluate the pipeline steps' status to assess if the pipeline
 is successful. If False, only evaluate the controller
@@ -3087,14 +3079,15 @@ class PipelineDecorator(PipelineController):
 :param add_run_number: If True (default), add the run number of the pipeline to the pipeline name.
 Example, the second time we launch the pipeline "best pipeline", we rename it to "best pipeline #2"
 :param retry_on_failure: Integer (number of retries) or Callback function that returns True to allow a retry
-- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
-- Callable: A function called on node failure. Takes as parameters:
-the PipelineController instance, the PipelineController.Node that failed and an int
-representing the number of previous retries for the node that failed
-The function must return a `bool`: True if the node should be retried and False otherwise.
-If True, the node will be re-queued and the number of retries left will be decremented by 1.
-By default, if this callback is not specified, the function will be retried the number of
-times indicated by `retry_on_failure`.
+
+- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
+- Callable: A function called on node failure. Takes as parameters:
+the PipelineController instance, the PipelineController.Node that failed and an int
+representing the number of previous retries for the node that failed.
+The function must return ``True`` if the node should be retried and ``False`` otherwise.
+If True, the node will be re-queued and the number of retries left will be decremented by 1.
+By default, if this callback is not specified, the function will be retried the number of
+times indicated by `retry_on_failure`.
 
 .. code-block:: py
 
@@ -3947,22 +3940,25 @@ class PipelineDecorator(PipelineController):
 pass
 
 Parameters would be stored as:
-- paramA: sectionA/paramA
-- paramB: sectionB/paramB
-- paramC: sectionB/paramC
-- paramD: Args/paramD
+
+- paramA: sectionA/paramA
+- paramB: sectionB/paramB
+- paramC: sectionB/paramC
+- paramD: Args/paramD
+
 :param start_controller_locally: If True, start the controller on the local machine. The steps will run
 remotely if `PipelineDecorator.run_locally` or `PipelineDecorator.debug_pipeline` are not called.
 Default: False
 :param retry_on_failure: Integer (number of retries) or Callback function that returns True to allow a retry
-- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
-- Callable: A function called on node failure. Takes as parameters:
-the PipelineController instance, the PipelineController.Node that failed and an int
-representing the number of previous retries for the node that failed
-The function must return a `bool`: True if the node should be retried and False otherwise.
-If True, the node will be re-queued and the number of retries left will be decremented by 1.
-By default, if this callback is not specified, the function will be retried the number of
-times indicated by `retry_on_failure`.
+
+- Integer: In case of node failure, retry the node the number of times indicated by this parameter.
+- Callable: A function called on node failure. Takes as parameters:
+the PipelineController instance, the PipelineController.Node that failed and an int
+representing the number of previous retries for the node that failed.
+The function must return ``True`` if the node should be retried and ``False`` otherwise.
+If True, the node will be re-queued and the number of retries left will be decremented by 1.
+By default, if this callback is not specified, the function will be retried the number of
+times indicated by `retry_on_failure`.
 
 .. code-block:: py
 
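The "Parameters would be stored as" list in this hunk implies a simple naming rule: a parameter assigned to a section is stored as `<section>/<param>`, and one without a section falls back to `Args/<param>`. The docstring lines that define the section assignment are cut off above `pass`, so the shape of the `section_map` argument below is an assumption, and `stored_names` is a hypothetical helper rather than ClearML code.

```python
def stored_names(param_names, section_map):
    """Return the '<section>/<param>' storage name for each parameter.

    section_map is assumed to map a section name to the list of
    parameters it holds; unlisted parameters default to 'Args'.
    """
    # Invert the map so each parameter points at its section
    section_of = {
        param: section
        for section, params in section_map.items()
        for param in params
    }
    return {
        param: "{}/{}".format(section_of.get(param, "Args"), param)
        for param in param_names
    }


section_map = {"sectionA": ["paramA"], "sectionB": ["paramB", "paramC"]}
result = stored_names(["paramA", "paramB", "paramC", "paramD"], section_map)
```

With this input, `result` reproduces the four storage names listed in the docstring, including the `Args/paramD` fallback.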