Add Artifacts fundamentals page (#970)

2025-06-26 18:17:44 +00:00 · 2024-11-21 09:47:55 +02:00 · 2024-11-21 09:47:55 +02:00 · 24cf6a06f0
commit 24cf6a06f0
parent 055762e101
7 changed files with 128 additions and 66 deletions
--- a/docs/fundamentals/artifacts.md
+++ b/docs/fundamentals/artifacts.md
@ -1,78 +1,58 @@
 ---
-title: Models
+title: Artifacts
 ---

-ClearML supports tracking, updating, and visualizing models.
+**Artifacts** are objects associated with ClearML [tasks](task.md) that are logged to ClearML, so they can later be 
+easily accessed, modified, and used. 

-Models are stored in ClearML as experiment artifacts, but unlike other artifacts that are dependent on their creating 
-task, models are independent entities with their own unique ID. Models can be accessed directly with a model object or 
-indirectly via their creating task. This property makes Models a standalone entry that can be used as an artifactory 
-interface.
+Task artifacts support built-in serialization for a wide range of object types, such as:
+* Numpy arrays (`.npz`)
+* Pandas DataFrames
+* PIL images (converted to `.jpg`)
+* Files and folders
+* Python objects
+* and more

-## Automatically Logging Models 
+ClearML also logs your tasks' input and output models as well as interim model checkpoints. Unlike other artifacts, 
+model artifacts have their own unique ClearML Model IDs (see [Models](models.md)).

-Once integrated into code, ClearML automatically logs and tracks models and any snapshots created by the following 
-frameworks:
-* [TensorFlow](../integrations/tensorflow.md)
-* [Keras](../integrations/keras.md)
-* [PyTorch](../integrations/pytorch.md)
-* [scikit-learn](../integrations/scikit_learn.md) (only using joblib)
-* [XGBoost](../integrations/xgboost.md) (only using joblib)
-* [Fast.ai](../integrations/fastai.md)
-* [MegEngine](../integrations/megengine.md)
-* [CatBoost](../integrations/catboost.md)
-* [MONAI](../integrations/monai.md)
+Artifacts allow you to:
+* **Track Task Inputs**: Record data that is not under source control so you can reproduce your workflows.
+* **Compare Outputs**: Easily access model snapshots.
+* **Build Elaborate Workflows**: Implement pipelines by using the outputs of one task as inputs to another (e.g. a data 
+cleaning task logs its clean dataset for use by a subsequent training task).

-When a supported framework loads a weights file, the running task will be automatically updated, with its input model 
-pointing directly to the original training task's model.
+## Logging Artifacts
+ClearML automatically logs artifacts created by popular frameworks, including TensorFlow and PyTorch. See [supported frameworks](../clearml_sdk/task_sdk.md#automatic-logging).

-## Manually Logging Models
+You can also log any other object using [`Task.upload_artifact()`](../references/sdk/task.md#upload_artifact). See 
+the [Artifacts Reporting](../guides/reporting/artifacts.md) example for details.

-### Output Models
+ClearML can be configured to upload artifacts to any supported storage type, including local and shared folders, 
+AWS S3 buckets, Google Cloud Storage, and Azure Storage (see [Storage](../integrations/storage.md)).

-ClearML stores training results as output models. The `OutputModel` object is instantiated with a task object as an 
-argument (see [`task`](../references/sdk/model_outputmodel.md) parameter), so it's automatically registered as the Task's 
-output model. Since OutputModel objects are connected to tasks, the models are traceable in experiments.
+## Updating Artifacts Dynamically

-Output models are read-write so weights can be updated throughout training. Additionally, users can specify a model's 
-network design and label enumeration. Once an output model is registered, it can be used as the input model for another 
-experiment.
+ClearML can automatically update artifacts as their contents change while your task is running. To enable this, use 
+[`register_artifact()`](../references/sdk/task.md#register_artifact).

-The snapshots of manually uploaded models aren't automatically captured, but ClearML provides methods to update them 
-through a `Task` or `OutputModel` object.
+## Accessing Artifacts
+Task artifacts can be accessed by other tasks. To use an artifact, first retrieve the `Task` that created it. Then use 
+one of the following methods:
+* `get_local_copy()`: Caches the file for later use and returns its path.
+* `get()`: Directly retrieves the Python object associated with the artifact.

-### Input Models  
-
-ClearML provides flexibility for explicitly connecting input models and experimentation, including:
-
-* Importing pre-trained models from external sources such as Amazon AWS, GIT repositories, PyTorch, and TensorFlow
-* Using standalone models already registered in ClearML by previously run experiments
-* Defining your own input models in scripts
-  
-## Setting Upload Destination
-
-* ClearML automatically captures the storage path of Models created by supported frameworks. By default, it stores the 
-  local path they are saved to.
-* Upload destinations can be specified explicitly on a per OutputModel or per experiment basis. Alternatively, the upload 
-  destination of all OutputModels can be specified in the ClearML [configuration file](../configs/clearml_conf.md). 
+For more information, see [Using Artifacts](../clearml_sdk/task_sdk.md#using-artifacts).

 ## WebApp Interface
+Artifacts appear under the **ARTIFACTS** tab of a Task. Each artifact's location is displayed in the **FILE PATH** field:
+* **Locally stored artifacts**: Include an option to copy the artifact’s location, since web applications cannot 
+access the local disk for security reasons.
+* **Network stored artifacts**: Display a download action to retrieve files from URLs (e.g., `https://`, `s3://`).

-In the ClearML's web UI, model information can be located through a project's Model Table or through the model's creating 
-task.
-
-Models associated with a task appear in the task's **ARTIFACTS** tab. To see further model details, including design, 
-label enumeration, lineage, and general information, click the model name, which is a hyperlink to the 
-[model's detail page](../webapp/webapp_model_viewing.md).
-
-Models can also be accessed through their associated project's [Model Table](../webapp/webapp_model_table.md), where all 
-the models associated with a project are listed.
-
-![WebApp Model](../img/fundamentals_models.png) 
+![WebApp Artifacts section](../img/webapp_tracking_30.png)

 ## SDK Interface
-
-See the [Models SDK interface](../clearml_sdk/model_sdk.md) for an overview for using the most basic Pythonic methods of the model 
-classes. See a detailed list of all available methods in the [Model](../references/sdk/model_model.md), [OutputModel](../references/sdk/model_outputmodel.md), and [InputModel](../references/sdk/model_inputmodel.md) 
-reference pages. 
+See the [Artifacts](../clearml_sdk/task_sdk.md#artifacts) section in the Task SDK page for an overview of how to work 
+with ClearML Artifacts using Pythonic methods. 
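
As a rough illustration of the **Logging Artifacts** and **Updating Artifacts Dynamically** sections added above, the sketch below uploads a static object with `Task.upload_artifact()` and registers a DataFrame with `Task.register_artifact()`. The project, task, and artifact names and the `build_report()` helper are hypothetical. Calling `log_artifacts()` assumes `clearml` and `pandas` are installed and a ClearML server is configured.

```python
def build_report(scores):
    """Summarize a list of numeric scores into a plain dict (hypothetical helper)."""
    return {
        "count": len(scores),
        "best": max(scores),
        "mean": sum(scores) / len(scores),
    }


def log_artifacts(scores):
    """Log one static artifact and one dynamically updated artifact to a task."""
    # Imported lazily so build_report() stays usable without ClearML installed.
    import pandas as pd
    from clearml import Task

    # Project and task names are placeholders.
    task = Task.init(project_name="examples", task_name="artifact logging sketch")

    # Static artifact: uploaded once, as it exists at call time.
    task.upload_artifact(name="score report", artifact_object=build_report(scores))

    # Dynamic artifact: register_artifact() takes a pandas DataFrame, which
    # ClearML monitors for changes while the task is running.
    frame = pd.DataFrame({"score": scores})
    task.register_artifact(name="score table", artifact=frame)

    # Later in-place changes to the registered DataFrame are what get tracked.
    frame.loc[len(frame)] = [0.0]

    task.close()
```

`upload_artifact()` suits objects that are final when logged, while `register_artifact()` is intended for a DataFrame that keeps changing during the run.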

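The retrieval flow from the **Accessing Artifacts** section of the new page might look like the following sketch: retrieve the creating task, look the artifact up by name, then call `get()` or `get_local_copy()`. The function names and the `task_id` / `artifact_name` arguments are hypothetical. `load_from_server()` assumes `clearml` is installed and the task exists on a configured server.

```python
def load_artifact(source_task, artifact_name, as_object=True):
    """Return an artifact from an already retrieved task.

    as_object=True  -> the deserialized Python object, via get()
    as_object=False -> the path of a locally cached copy, via get_local_copy()
    """
    artifact = source_task.artifacts[artifact_name]
    return artifact.get() if as_object else artifact.get_local_copy()


def load_from_server(task_id, artifact_name, as_object=True):
    """Retrieve the task that created the artifact, then load the artifact."""
    # Imported lazily so load_artifact() can be exercised without ClearML.
    from clearml import Task

    source_task = Task.get_task(task_id=task_id)
    return load_artifact(source_task, artifact_name, as_object=as_object)
```

Keeping the lookup separate from the server call is only a convenience for this sketch, since it lets the lookup logic be tried against a stand-in task object.
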
--- a/docs/fundamentals/models.md
+++ b/docs/fundamentals/models.md
@ -0,0 +1,78 @@
+---
+title: Models
+---
+
+ClearML supports tracking, updating, and visualizing models.
+
+Models are stored in ClearML as experiment artifacts, but unlike other artifacts that are dependent on their creating 
+task, models are independent entities with their own unique ID. Models can be accessed directly with a model object or 
+indirectly via their creating task. This property makes Models a standalone entry that can be used as an artifactory 
+interface.
+
+## Automatically Logging Models 
+
+Once integrated into code, ClearML automatically logs and tracks models and any snapshots created by the following 
+frameworks:
+* [TensorFlow](../integrations/tensorflow.md)
+* [Keras](../integrations/keras.md)
+* [PyTorch](../integrations/pytorch.md)
+* [scikit-learn](../integrations/scikit_learn.md) (only using joblib)
+* [XGBoost](../integrations/xgboost.md) (only using joblib)
+* [Fast.ai](../integrations/fastai.md)
+* [MegEngine](../integrations/megengine.md)
+* [CatBoost](../integrations/catboost.md)
+* [MONAI](../integrations/monai.md)
+
+When a supported framework loads a weights file, the running task will be automatically updated, with its input model 
+pointing directly to the original training task's model.
+
+## Manually Logging Models
+
+### Output Models
+
+ClearML stores training results as output models. The `OutputModel` object is instantiated with a task object as an 
+argument (see [`task`](../references/sdk/model_outputmodel.md) parameter), so it's automatically registered as the Task's 
+output model. Since OutputModel objects are connected to tasks, the models are traceable in experiments.
+
+Output models are read-write so weights can be updated throughout training. Additionally, users can specify a model's 
+network design and label enumeration. Once an output model is registered, it can be used as the input model for another 
+experiment.
+
+The snapshots of manually uploaded models aren't automatically captured, but ClearML provides methods to update them 
+through a `Task` or `OutputModel` object.
+
+### Input Models  
+
+ClearML provides flexibility for explicitly connecting input models and experimentation, including:
+
+* Importing pre-trained models from external sources such as Amazon AWS, GIT repositories, PyTorch, and TensorFlow
+* Using standalone models already registered in ClearML by previously run experiments
+* Defining your own input models in scripts
+  
+## Setting Upload Destination
+
+* ClearML automatically captures the storage path of Models created by supported frameworks. By default, it stores the 
+  local path they are saved to.
+* Upload destinations can be specified explicitly on a per OutputModel or per experiment basis. Alternatively, the upload 
+  destination of all OutputModels can be specified in the ClearML [configuration file](../configs/clearml_conf.md). 
+
+## WebApp Interface
+
+In ClearML's web UI, model information can be located through a project's Model Table or through the model's creating 
+task.
+
+Models associated with a task appear in the task's **ARTIFACTS** tab. To see further model details, including design, 
+label enumeration, lineage, and general information, click the model name, which is a hyperlink to the 
+[model's detail page](../webapp/webapp_model_viewing.md).
+
+Models can also be accessed through their associated project's [Model Table](../webapp/webapp_model_table.md), where all 
+the models associated with a project are listed.
+
+![WebApp Model](../img/fundamentals_models.png) 
+
+## SDK Interface
+
+See [the Models SDK interface](../clearml_sdk/model_sdk.md) for an overview of the most basic Pythonic methods of the model 
+classes. For a detailed list of all available methods, see the [Model](../references/sdk/model_model.md), [OutputModel](../references/sdk/model_outputmodel.md), and [InputModel](../references/sdk/model_inputmodel.md) 
+reference pages. 
+
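
For the **Manually Logging Models** section of the relocated Models page, a minimal sketch of registering an output model could look like this. It assumes `clearml` is installed and a server is configured. The project and task names, the `label_enumeration()` helper, and the function arguments are hypothetical.

```python
def label_enumeration(class_names):
    """Map class names to integer IDs (hypothetical helper)."""
    return {name: index for index, name in enumerate(class_names)}


def register_output_model(weights_path, class_names, design):
    """Attach an OutputModel to a task and set its labels, design, and weights."""
    # Imported lazily so label_enumeration() stays usable without ClearML.
    from clearml import OutputModel, Task

    # Project and task names are placeholders.
    task = Task.init(project_name="examples", task_name="manual model logging sketch")

    # Passing the task registers the model as that task's output model.
    model = OutputModel(task=task)

    # Label enumeration and network design, as described in the section above.
    model.update_labels(label_enumeration(class_names))
    model.update_design(config_dict=design)

    # Upload the weights file to the configured upload destination.
    model.update_weights(weights_filename=weights_path)

    return model.id
```

The returned model ID is the unique identifier that lets another task reference this model as its input model.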
--- a/docs/fundamentals/projects.md
+++ b/docs/fundamentals/projects.md
@ -2,7 +2,7 @@
 title: Projects
 ---

-Projects are contextual containers for [tasks](task.md) and [models](artifacts.md) (as well as [dataviews](../hyperdatasets/dataviews.md) 
+Projects are contextual containers for [tasks](task.md) and [models](models.md) (as well as [dataviews](../hyperdatasets/dataviews.md) 
 when Hyper-Datasets are enabled), providing a logical structure similar to file system folders. 
 An often useful method is to categorize components into projects according to models or objectives. 
 Grouping into projects helps in identifying tasks, models, and dataviews when queried.
--- a/docs/fundamentals/task.md
+++ b/docs/fundamentals/task.md
@ -58,7 +58,7 @@ The captured [execution output](../webapp/webapp_exp_track_visual.md#experiment-
 * [Scalars](../webapp/webapp_exp_track_visual.md#scalars)
 * [Plots](../webapp/webapp_exp_track_visual.md#plots)
 * [Debug samples](../webapp/webapp_exp_track_visual.md#debug-samples)
-* [Models](artifacts.md) 
+* [Models](models.md) 

 For a more in-depth description of each task section, see [Tracking Experiments and Visualizing Results](../webapp/webapp_exp_track_visual.md).

@ -92,7 +92,7 @@ ClearML provides methods to easily track files generated throughout your experim
 - and more!

 Most importantly, ClearML also logs experiments' input and output models as well as interim model snapshots (see 
-[Models](artifacts.md)).
+[Models](models.md)).

 #### Logging Artifacts 
 ClearML provides an explicit logging interface that supports manually reporting a variety of artifacts. Any type of 
--- a/docs/guides/frameworks/pytorch/model_updating.md
+++ b/docs/guides/frameworks/pytorch/model_updating.md
@ -68,7 +68,7 @@ model.update_design(config_dict=model_config_dict)

 ## Updating Models
 To update a model, use [`OutputModel.update_weights()`](../../../references/sdk/model_outputmodel.md#update_weights). 
-This uploads the model to the set storage destination (see [Setting Upload Destination](../../../fundamentals/artifacts.md#setting-upload-destination)), 
+This uploads the model to the set storage destination (see [Setting Upload Destination](../../../fundamentals/models.md#setting-upload-destination)), 
 and registers that location to the task as the output model. 

 ```python
--- a/docs/webapp/settings/webapp_settings_access_rules.md
+++ b/docs/webapp/settings/webapp_settings_access_rules.md
@ -11,7 +11,7 @@ service accounts, and/or user groups have access permissions to the following wo
 
 * [Projects](../../fundamentals/projects.md)
 * [Tasks](../../fundamentals/task.md) 
-* [Models](../../fundamentals/artifacts.md)
+* [Models](../../fundamentals/models.md)
 * [Dataviews](../../hyperdatasets/dataviews.md) 
 * [Datasets](../../hyperdatasets/dataset.md)
 * [Queues](../../fundamentals/agents_and_queues.md#what-is-a-queue) 
--- a/sidebars.js
+++ b/sidebars.js
@ -33,8 +33,12 @@ module.exports = {
                ]
            }        
        ]}]},
-        {'ClearML Fundamentals': ['fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters', 'fundamentals/artifacts', 'fundamentals/logger', 'fundamentals/agents_and_queues',
-            'fundamentals/hpo']},
+        {'ClearML Fundamentals': [
+            'fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters',
+            'fundamentals/artifacts', 'fundamentals/models', 'fundamentals/logger', 'fundamentals/agents_and_queues',
+            'fundamentals/hpo'
+            ]
+        },
        {
            type: 'category',
            collapsible: true,