From 054eb2ad543c5bb0d186c5282fc4df0cd88ad612 Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Tue, 26 Dec 2023 15:50:12 +0200
Subject: [PATCH] Add dataset reporting info (#740)

---
 docs/clearml_data/clearml_data_sdk.md | 28 +++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/docs/clearml_data/clearml_data_sdk.md b/docs/clearml_data/clearml_data_sdk.md
index 8f85627c..9e5ce96e 100644
--- a/docs/clearml_data/clearml_data_sdk.md
+++ b/docs/clearml_data/clearml_data_sdk.md
@@ -216,6 +216,34 @@ For example:
 dataset.remove_files(dataset_path="*.csv", recursive=True)
 ```
 
+## Dataset Preview
+
+Add informative metrics, plots, or media to a dataset. Use [`Dataset.get_logger()`](../references/sdk/dataset.md#get_logger)
+to access the dataset's logger object, then add information to the dataset using the methods
+of the [logger](../references/sdk/logger.md) object.
+
+You can add dataset summaries (such as [tables](../references/sdk/logger.md#report_table)) to create a preview
+of the stored data for better visibility, or attach statistics generated by the data ingestion process.
+
+For example:
+
+```python
+# Attach a table to the dataset
+dataset.get_logger().report_table(
+    title="Raw Dataset Metadata", series="Raw Dataset Metadata", csv="path/to/csv"
+)
+
+# Attach a histogram of per-class sample counts to the dataset
+dataset.get_logger().report_histogram(
+    title="Class distribution",
+    series="Class distribution",
+    values=histogram_data,
+    iteration=0,
+    xlabels=histogram_data.index.tolist(),
+    yaxis="Number of samples",
+)
+```
+
 ## Uploading Files
 
 To upload the dataset files to network storage, use the [`Dataset.upload`](../references/sdk/dataset.md#upload) method.