---
title: Artifacts Reporting
---
The [artifacts.py](https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py) example demonstrates
uploading objects (other than models) to storage as experiment artifacts.

These artifacts include:
* Pandas DataFrames
* Local files
* Dictionaries
* Folders
* Numpy objects
* Image files

Artifacts can be uploaded and dynamically tracked, or uploaded without tracking.

Configure ClearML for uploading artifacts to any of the supported types of storage, which include local and shared folders,
S3 buckets, Google Cloud Storage, and Azure Storage (the destination for debug samples is
[configured separately](../../references/sdk/logger.md#set_default_upload_destination)). Configure ClearML in any of the following ways:

* In the configuration file, set [`default_output_uri`](../../configs/clearml_conf.md#config_default_output_uri).
* In code, when [initializing a Task](../../references/sdk/task.md#taskinit), use the `output_uri` parameter.
* In the **ClearML Web UI**, when [modifying an experiment](../../webapp/webapp_exp_tuning.md#output-destination).
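
As a minimal sketch of the in-code option, the snippet below passes `output_uri` to `Task.init()`. The project and task names are the ones this example script uses. The bucket URI is a placeholder, and `init_kwargs` and `init_task` are hypothetical helpers introduced here only so that the arguments can be inspected without a ClearML server.

```python
def init_kwargs(output_uri):
    """Build the keyword arguments for Task.init() (hypothetical helper)."""
    return {
        'project_name': 'examples',
        'task_name': 'artifacts example',
        # Destination for uploaded artifacts, e.g. a shared folder or a bucket URI
        'output_uri': output_uri,
    }


def init_task(output_uri):
    """Create the task; needs the clearml package and a configured server."""
    from clearml import Task  # imported lazily so the sketch loads without clearml
    return Task.init(**init_kwargs(output_uri))


# Placeholder destination; replace it with your own storage location
kwargs = init_kwargs('s3://my-bucket/artifacts')
print(kwargs['output_uri'])
```

Calling `init_task('s3://my-bucket/artifacts')` in an environment where ClearML is configured would then create the task with that upload destination.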

When the script runs, it creates an experiment named `artifacts example` in the `examples` project.

ClearML reports artifacts in the **ClearML Web UI** **>** experiment details **>** **ARTIFACTS** tab.

![Experiment artifacts](../../img/examples_reporting_03.png)

## Dynamically Tracked Artifacts

ClearML supports uploading and dynamically tracking Pandas DataFrames. Use [`Task.register_artifact()`](../../references/sdk/task.md#register_artifact)
to add a DataFrame to a task. If the DataFrame is modified, ClearML automatically syncs and uploads the changes.

For example:

```python
df = pd.DataFrame(
    {
        'num_legs': [2, 4, 8, 0],
        'num_wings': [2, 0, 0, 0],
        'num_specimen_seen': [10, 2, 1, 8]
    },
    index=['falcon', 'dog', 'spider', 'fish']
)

# Register Pandas object as artifact to watch
# (it will be monitored in the background and automatically synced and uploaded)
task.register_artifact(
    name='train',
    artifact=df,
    metadata={'counting': 'legs', 'max legs': 68}
)
```

By modifying the artifact and calling [`Task.get_registered_artifacts()`](../../references/sdk/task.md#get_registered_artifacts)
to retrieve it, you can see ClearML tracking the changes:

```python
# change the artifact object in place
# (df.sample() on its own returns a new DataFrame and leaves df untouched)
df.loc['dog', 'num_specimen_seen'] = 3
# or access it from anywhere using the Task's get_registered_artifacts()
Task.current_task().get_registered_artifacts()['train'].sample(frac=0.5, replace=True, random_state=1)
```
## Artifacts Without Tracking

ClearML supports several types of objects that can be uploaded without being tracked. Use the
[`Task.upload_artifact()`](../../references/sdk/task.md#upload_artifact) method.

Artifacts without tracking include:
* Pandas DataFrames
* Local files
* Dictionaries (stored as JSON files)
* Numpy objects (stored as NPZ files)
* Image files (stored as PNG files)
* Folders (stored as ZIP files)
* Wildcards (stored as ZIP files)

### Pandas DataFrames
```python
# add and upload pandas.DataFrame (onetime snapshot of the object)
task.upload_artifact(name='Pandas', artifact_object=df)
```
### Local Files
```python
# add and upload local file artifact
task.upload_artifact(
    name='local file',
    artifact_object=os.path.join(
        'data_samples',
        'dancing.jpg'
    )
)
```
### Dictionaries
```python
# add and upload dictionary stored as JSON
task.upload_artifact(name='dictionary', artifact_object=df.to_dict())
```
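
Because dictionaries are stored as JSON, it can help to check beforehand how a dictionary serializes. The sketch below uses only Python's standard `json` module, not ClearML itself, whose serialization details may differ. The `animals` dictionary is an illustrative stand-in shaped like the output of `df.to_dict()`.

```python
import json

# Same shape as df.to_dict(): {column -> {index -> value}}
animals = {
    'num_legs': {'falcon': 2, 'dog': 4, 'spider': 8, 'fish': 0},
    'num_wings': {'falcon': 2, 'dog': 0, 'spider': 0, 'fish': 0},
}

# Serialize to JSON text and read it back
serialized = json.dumps(animals, indent=2)
restored = json.loads(serialized)
print(restored == animals)  # True

# JSON object keys are always strings, so non-string keys come back as strings
by_id = {1: 'falcon', 2: 'dog'}
restored_by_id = json.loads(json.dumps(by_id))
print(restored_by_id)  # {'1': 'falcon', '2': 'dog'}
```

If a dictionary uses non-string keys, expect them to come back as strings when the stored JSON is read.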
### Numpy Objects
```python
# add and upload Numpy Object (stored as .npz file)
task.upload_artifact(name='Numpy Eye', artifact_object=np.eye(100, 100))
```
### Image Files
```python
# add and upload Image (stored as .png file)
im = Image.open(os.path.join('data_samples', 'dancing.jpg'))
task.upload_artifact(name='pillow_image', artifact_object=im)
```
### Folders
```python
# add and upload a folder, artifact_object should be the folder path
task.upload_artifact(name='local folder', artifact_object=os.path.join('data_samples'))
```
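
A folder artifact is stored as a single ZIP file. To preview roughly what such an archive holds, the sketch below zips a temporary folder using only the standard library. It is a local illustration under assumed file names, not ClearML's own packaging code, which may lay out the archive differently.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a small stand-in for the data_samples folder (file names are made up)
    folder = os.path.join(workdir, 'data_samples')
    os.makedirs(os.path.join(folder, 'nested'))
    for rel in ('a.txt', os.path.join('nested', 'b.txt')):
        with open(os.path.join(folder, rel), 'w') as f:
            f.write('sample')

    # Zip the folder contents; the archive is written next to the folder, not inside it
    archive_path = shutil.make_archive(
        os.path.join(workdir, 'local_folder'), 'zip', root_dir=folder
    )

    # List the archived files, skipping directory entries
    with zipfile.ZipFile(archive_path) as zf:
        archived = sorted(n for n in zf.namelist() if not n.endswith('/'))

print(archived)  # ['a.txt', 'nested/b.txt']
```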
### Wildcards
```python
# add and upload a wildcard
task.upload_artifact(name='wildcard jpegs', artifact_object=os.path.join('data_samples', '*.jpg'))
```
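
To see which files a pattern such as `*.jpg` would select before uploading, the wildcard can be expanded locally. The sketch below uses the standard `glob` and `zipfile` modules to collect the matches into a ZIP file. It illustrates the idea only and is not ClearML's implementation; apart from `dancing.jpg`, the file names are made up.

```python
import glob
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in folder with two JPEG files and one unrelated file
    folder = os.path.join(workdir, 'data_samples')
    os.makedirs(folder)
    for name in ('dancing.jpg', 'picasso.jpg', 'notes.txt'):
        with open(os.path.join(folder, name), 'wb') as f:
            f.write(b'placeholder')

    # Expand the shell-style wildcard into concrete file paths
    matches = sorted(glob.glob(os.path.join(folder, '*.jpg')))

    # Bundle only the matching files into one archive
    archive_path = os.path.join(workdir, 'wildcard_jpegs.zip')
    with zipfile.ZipFile(archive_path, 'w') as zf:
        for path in matches:
            zf.write(path, arcname=os.path.basename(path))

    with zipfile.ZipFile(archive_path) as zf:
        bundled = sorted(zf.namelist())

print(bundled)  # ['dancing.jpg', 'picasso.jpg']
```

Files that do not match the pattern, such as `notes.txt` here, are left out of the archive.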