From 24cf6a06f05fc8963a8c5a43df96addc72a0673b Mon Sep 17 00:00:00 2001 From: pollfly <75068813+pollfly@users.noreply.github.com> Date: Thu, 21 Nov 2024 09:47:55 +0200 Subject: [PATCH] Add Artifacts fundamentals page (#970) --- docs/fundamentals/artifacts.md | 98 ++++++++----------- docs/fundamentals/models.md | 78 +++++++++++++++ docs/fundamentals/projects.md | 2 +- docs/fundamentals/task.md | 4 +- .../frameworks/pytorch/model_updating.md | 2 +- .../settings/webapp_settings_access_rules.md | 2 +- sidebars.js | 8 +- 7 files changed, 128 insertions(+), 66 deletions(-) create mode 100644 docs/fundamentals/models.md diff --git a/docs/fundamentals/artifacts.md b/docs/fundamentals/artifacts.md index 22e207c0..f9820d43 100644 --- a/docs/fundamentals/artifacts.md +++ b/docs/fundamentals/artifacts.md @@ -1,78 +1,58 @@ --- -title: Models +title: Artifacts --- -ClearML supports tracking, updating, and visualizing models. +**Artifacts** are objects associated with ClearML [tasks](task.md) that are logged to ClearML, so they can later be +easily accessed, modified, and used. -Models are stored in ClearML as experiment artifacts, but unlike other artifacts that are dependent on their creating -task, models are independent entities with their own unique ID. Models can be accessed directly with a model object or -indirectly via their creating task. This property makes Models a standalone entry that can be used as an artifactory -interface. +Task artifacts support built-in serialization for a wide range of object types, such as: +* Numpy arrays (`.npz`) +* Pandas DataFrames +* PIL images (converted to `.jpg`) +* Files and folders +* Python objects +* and more -## Automatically Logging Models +ClearML also logs your tasks' input and output models as well as interim model checkpoints. Model artifacts also have +unique ClearML Model IDs (see [Models](models.md)). 
-Once integrated into code, ClearML automatically logs and tracks models and any snapshots created by the following -frameworks: -* [TensorFlow](../integrations/tensorflow.md) -* [Keras](../integrations/keras.md) -* [PyTorch](../integrations/pytorch.md) -* [scikit-learn](../integrations/scikit_learn.md) (only using joblib) -* [XGBoost](../integrations/xgboost.md) (only using joblib) -* [Fast.ai](../integrations/fastai.md) -* [MegEngine](../integrations/megengine.md) -* [CatBoost](../integrations/catboost.md) -* [MONAI](../integrations/monai.md) +Artifacts allow you to: +* **Track Task Inputs**: Record non source-controlled data to reproduce your workflows. +* **Compare Outputs**: Easily access model snapshots. +* **Build Elaborate Workflows**: Implement pipelines by using the outputs of one task as inputs to another (e.g. a data +cleaning task logs its clean dataset for use by a subsequent training task). -When a supported framework loads a weights file, the running task will be automatically updated, with its input model -pointing directly to the original training task's model. +## Logging Artifacts +ClearML automatically logs artifacts created by popular frameworks, including TensorFlow and PyTorch. See [supported frameworks](../clearml_sdk/task_sdk.md#automatic-logging). -## Manually Logging Models +You can also log any other object using [`Task.upload_artifact()`](../references/sdk/task.md#upload_artifact). See +the [Artifacts Reporting](../guides/reporting/artifacts.md) example for details. -### Output Models +ClearML can be configured to upload artifacts to any supported types of storage, which include local and shared folders, +AWS S3 buckets, Google Cloud Storage, and Azure Storage (see [Storage](../integrations/storage.md)). -ClearML stores training results as output models.
The `OutputModel` object is instantiated with a task object as an -argument (see [`task`](../references/sdk/model_outputmodel.md) parameter), so it's automatically registered as the Task's -output model. Since OutputModel objects are connected to tasks, the models are traceable in experiments. +## Updating Artifacts Dynamically -Output models are read-write so weights can be updated throughout training. Additionally, users can specify a model's -network design and label enumeration. Once an output model is registered, it can be used as the input model for another -experiment. +ClearML can automatically update artifacts as their contents change while your task is running through the use of +[`register_artifact()`](../references/sdk/task.md#register_artifact). -The snapshots of manually uploaded models aren't automatically captured, but ClearML provides methods to update them -through a `Task` or `OutputModel` object. +## Accessing Artifacts +Task artifacts can be accessed by other tasks. To use an artifact, first retrieve the `Task` that created it. Then use +one of the following methods: +* `get_local_copy()`: Caches the file for later use and returns its path. +* `get()`: Directly retrieves the Python object associated with the artifact. -### Input Models - -ClearML provides flexibility for explicitly connecting input models and experimentation, including: - -* Importing pre-trained models from external sources such as Amazon AWS, GIT repositories, PyTorch, and TensorFlow -* Using standalone models already registered in ClearML by previously run experiments -* Defining your own input models in scripts - -## Setting Upload Destination - -* ClearML automatically captures the storage path of Models created by supported frameworks. By default, it stores the - local path they are saved to. -* Upload destinations can be specified explicitly on a per OutputModel or per experiment basis.
Alternatively, the upload - destination of all OutputModels can be specified in the ClearML [configuration file](../configs/clearml_conf.md). +For more information, see [Using Artifacts](../clearml_sdk/task_sdk.md#using-artifacts). ## WebApp Interface +Artifacts appear under the **ARTIFACTS** tab of a Task. Each artifact's location is displayed in the **FILE PATH** field: +* **Locally stored artifacts**: Include an option to copy the artifact’s location for accessibility (since web +applications are prohibited from accessing the local disk for security reasons) +* **Network stored artifacts**: Display a download action to retrieve files from URLs (e.g., `https://`, `s3://`). -In the ClearML's web UI, model information can be located through a project's Model Table or through the model's creating -task. - -Models associated with a task appear in the task's **ARTIFACTS** tab. To see further model details, including design, -label enumeration, lineage, and general information, click the model name, which is a hyperlink to the -[model's detail page](../webapp/webapp_model_viewing.md). - -Models can also be accessed through their associated project's [Model Table](../webapp/webapp_model_table.md), where all -the models associated with a project are listed. - -![WebApp Model](../img/fundamentals_models.png) +![WebApp Artifacts section](../img/webapp_tracking_30.png) ## SDK Interface - -See the [Models SDK interface](../clearml_sdk/model_sdk.md) for an overview for using the most basic Pythonic methods of the model -classes. See a detailed list of all available methods in the [Model](../references/sdk/model_model.md), [OutputModel](../references/sdk/model_outputmodel.md), and [InputModel](../references/sdk/model_inputmodel.md) -reference pages. +See the [Artifacts](../clearml_sdk/task_sdk.md#artifacts) section in the Task SDK page for an overview of how to work +with ClearML Artifacts using Pythonic methods. 
diff --git a/docs/fundamentals/models.md b/docs/fundamentals/models.md new file mode 100644 index 00000000..2313cf3b --- /dev/null +++ b/docs/fundamentals/models.md @@ -0,0 +1,78 @@ +--- +title: Models +--- + +ClearML supports tracking, updating, and visualizing models. + +Models are stored in ClearML as experiment artifacts, but unlike other artifacts that are dependent on their creating +task, models are independent entities with their own unique ID. Models can be accessed directly with a model object or +indirectly via their creating task. This property makes Models a standalone entry that can be used as an artifactory +interface. + +## Automatically Logging Models + +Once integrated into code, ClearML automatically logs and tracks models and any snapshots created by the following +frameworks: +* [TensorFlow](../integrations/tensorflow.md) +* [Keras](../integrations/keras.md) +* [PyTorch](../integrations/pytorch.md) +* [scikit-learn](../integrations/scikit_learn.md) (only using joblib) +* [XGBoost](../integrations/xgboost.md) (only using joblib) +* [Fast.ai](../integrations/fastai.md) +* [MegEngine](../integrations/megengine.md) +* [CatBoost](../integrations/catboost.md) +* [MONAI](../integrations/monai.md) + +When a supported framework loads a weights file, the running task will be automatically updated, with its input model +pointing directly to the original training task's model. + +## Manually Logging Models + +### Output Models + +ClearML stores training results as output models. The `OutputModel` object is instantiated with a task object as an +argument (see [`task`](../references/sdk/model_outputmodel.md) parameter), so it's automatically registered as the Task's +output model. Since OutputModel objects are connected to tasks, the models are traceable in experiments. + +Output models are read-write so weights can be updated throughout training. Additionally, users can specify a model's +network design and label enumeration. 
Once an output model is registered, it can be used as the input model for another +experiment. + +The snapshots of manually uploaded models aren't automatically captured, but ClearML provides methods to update them +through a `Task` or `OutputModel` object. + +### Input Models + +ClearML provides flexibility for explicitly connecting input models and experimentation, including: + +* Importing pre-trained models from external sources such as Amazon AWS, GIT repositories, PyTorch, and TensorFlow +* Using standalone models already registered in ClearML by previously run experiments +* Defining your own input models in scripts + +## Setting Upload Destination + +* ClearML automatically captures the storage path of Models created by supported frameworks. By default, it stores the + local path they are saved to. +* Upload destinations can be specified explicitly on a per OutputModel or per experiment basis. Alternatively, the upload + destination of all OutputModels can be specified in the ClearML [configuration file](../configs/clearml_conf.md). + +## WebApp Interface + +In the ClearML's web UI, model information can be located through a project's Model Table or through the model's creating +task. + +Models associated with a task appear in the task's **ARTIFACTS** tab. To see further model details, including design, +label enumeration, lineage, and general information, click the model name, which is a hyperlink to the +[model's detail page](../webapp/webapp_model_viewing.md). + +Models can also be accessed through their associated project's [Model Table](../webapp/webapp_model_table.md), where all +the models associated with a project are listed. + +![WebApp Model](../img/fundamentals_models.png) + +## SDK Interface + +See [the Models SDK interface](../clearml_sdk/model_sdk.md) for an overview for using the most basic Pythonic methods of the model +classes. 
See a detailed list of all available methods in the [Model](../references/sdk/model_model.md), [OutputModel](../references/sdk/model_outputmodel.md), and [InputModel](../references/sdk/model_inputmodel.md) +reference pages. + diff --git a/docs/fundamentals/projects.md b/docs/fundamentals/projects.md index 295594f7..87ca4cde 100644 --- a/docs/fundamentals/projects.md +++ b/docs/fundamentals/projects.md @@ -2,7 +2,7 @@ title: Projects --- -Projects are contextual containers for [tasks](task.md) and [models](artifacts.md) (as well as [dataviews](../hyperdatasets/dataviews.md) +Projects are contextual containers for [tasks](task.md) and [models](models.md) (as well as [dataviews](../hyperdatasets/dataviews.md) when Hyper-Datasets are enabled), providing a logical structure similar to file system folders. An often useful method is to categorize components into projects according to models or objectives. Grouping into projects helps in identifying tasks, models, and dataviews when queried. diff --git a/docs/fundamentals/task.md b/docs/fundamentals/task.md index 20a2d3ba..54aeff4e 100644 --- a/docs/fundamentals/task.md +++ b/docs/fundamentals/task.md @@ -58,7 +58,7 @@ The captured [execution output](../webapp/webapp_exp_track_visual.md#experiment- * [Scalars](../webapp/webapp_exp_track_visual.md#scalars) * [Plots](../webapp/webapp_exp_track_visual.md#plots) * [Debug samples](../webapp/webapp_exp_track_visual.md#debug-samples) -* [Models](artifacts.md) +* [Models](models.md) For a more in-depth description of each task section, see [Tracking Experiments and Visualizing Results](../webapp/webapp_exp_track_visual.md). @@ -92,7 +92,7 @@ ClearML provides methods to easily track files generated throughout your experim - and more! Most importantly, ClearML also logs experiments' input and output models as well as interim model snapshots (see -[Models](artifacts.md)). +[Models](models.md)). 
#### Logging Artifacts ClearML provides an explicit logging interface that supports manually reporting a variety of artifacts. Any type of diff --git a/docs/guides/frameworks/pytorch/model_updating.md b/docs/guides/frameworks/pytorch/model_updating.md index cc7397a7..6d9471da 100644 --- a/docs/guides/frameworks/pytorch/model_updating.md +++ b/docs/guides/frameworks/pytorch/model_updating.md @@ -68,7 +68,7 @@ model.update_design(config_dict=model_config_dict) ## Updating Models To update a model, use [`OutputModel.update_weights()`](../../../references/sdk/model_outputmodel.md#update_weights). -This uploads the model to the set storage destination (see [Setting Upload Destination](../../../fundamentals/artifacts.md#setting-upload-destination)), +This uploads the model to the set storage destination (see [Setting Upload Destination](../../../fundamentals/models.md#setting-upload-destination)), and registers that location to the task as the output model. ```python diff --git a/docs/webapp/settings/webapp_settings_access_rules.md b/docs/webapp/settings/webapp_settings_access_rules.md index 7f40055e..869a1442 100644 --- a/docs/webapp/settings/webapp_settings_access_rules.md +++ b/docs/webapp/settings/webapp_settings_access_rules.md @@ -11,7 +11,7 @@ service accounts, and/or user groups have access permissions to the following wo * [Projects](../../fundamentals/projects.md) * [Tasks](../../fundamentals/task.md) -* [Models](../../fundamentals/artifacts.md) +* [Models](../../fundamentals/models.md) * [Dataviews](../../hyperdatasets/dataviews.md) * [Datasets](../../hyperdatasets/dataset.md) * [Queues](../../fundamentals/agents_and_queues.md#what-is-a-queue) diff --git a/sidebars.js b/sidebars.js index 83c8cfb1..f9526ce2 100644 --- a/sidebars.js +++ b/sidebars.js @@ -33,8 +33,12 @@ module.exports = { ] } ]}]}, - {'ClearML Fundamentals': ['fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters', 'fundamentals/artifacts', 'fundamentals/logger', 
'fundamentals/agents_and_queues', - 'fundamentals/hpo']}, + {'ClearML Fundamentals': [ + 'fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters', + 'fundamentals/artifacts', 'fundamentals/models', 'fundamentals/logger', 'fundamentals/agents_and_queues', + 'fundamentals/hpo' + ] + }, { type: 'category', collapsible: true,