From 95020aa759aeab1115b5af20f6eeac60e89eaab5 Mon Sep 17 00:00:00 2001 From: pollfly <75068813+pollfly@users.noreply.github.com> Date: Wed, 1 Sep 2021 12:48:30 +0300 Subject: [PATCH] Rewrite ClearML Data docs (#56) --- docs/clearml_data.md | 269 +++++++++++++++++++++++-------------------- 1 file changed, 144 insertions(+), 125 deletions(-) diff --git a/docs/clearml_data.md b/docs/clearml_data.md index e2838475..6cc78ea8 100644 --- a/docs/clearml_data.md +++ b/docs/clearml_data.md @@ -12,24 +12,34 @@ ClearML Data Management solves two important challenges: **We believe Data is not code**. It should not be stored in a git tree, because progress on datasets is not always linear. Moreover, it can be difficult and inefficient to find on a git tree the commit associated with a certain version of a dataset. -A `clearml-data` dataset is a collection of files, stored on a central storage location (S3 \ GS \ Azure \ Network Storage). +A `clearml-data` dataset is a collection of files, stored on a central storage location (S3 / GS / Azure / Network Storage). Datasets can be set up to inherit from other datasets, so data lineages can be created, -and users can track when and how their data changes.
-Dataset changes are stored using differentiable storage, meaning a version will store the change-set from its previous dataset parents
+and users can track when and how their data changes.
+
+Dataset changes are stored using differentiable storage, meaning a version will store the change-set from its previous dataset parents.
 Local copies of datasets are always cached, so the same data never needs to be downloaded twice.
-When a dataset is pulled it will automatically pull all parent datasets and merge them into one output folder for you to work with
+When a dataset is pulled it will automatically pull all parent datasets and merge them into one output folder for you to work with.

-ClearML-data offers two interfaces:
+ClearML Data offers two interfaces:
 - `clearml-data` - CLI utility for creating, uploading, and managing datasets.
 - `clearml.Dataset` - A python interface for creating, retrieving, managing, and using datasets.

-## Creating a Dataset
+## Setup
+
+`clearml-data` comes built-in with our `clearml` python package! Just check out the [getting started](getting_started/ds/ds_first_steps.md) guide for more info!
+
+## Workflow
+Below is an example of a workflow that uses ClearML Data's command line tool to create a dataset and then integrates the dataset into code
+using ClearML Data's python interface.
+
+### Creating a Dataset

Using the `clearml-data` CLI, users can create datasets using the following commands:
```bash
clearml-data create --project dataset_example --name initial_version
clearml-data add --files data_folder
+clearml-data close
```

The commands will do the following:
@@ -40,13 +50,15 @@ The commands will do the following:
1. All the files from the "data_folder" folder will be added to the dataset and uploaded
by default to the [ClearML server](deploying_clearml/clearml_server.md).

+
+1. The dataset will be finalized, making it immutable and ready to be consumed.
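The same three steps can also be sketched with the python interface. This is a rough equivalent rather than a prescribed flow: it assumes a configured ClearML environment with a reachable server, and reuses the illustrative project and dataset names from the CLI example above.

```python
# Rough SDK equivalent of the CLI workflow above (illustrative names,
# assumes clearml is installed and clearml.conf points at a server)
from clearml import Dataset

dataset = Dataset.create(dataset_name="initial_version", dataset_project="dataset_example")
dataset.add_files(path="data_folder")  # stage local files for upload
dataset.upload()                       # push files to the default file server
dataset.finalize()                     # close the version, making it immutable
```

`Dataset.create`, `add_files`, `upload`, and `finalize` mirror the CLI's `create`, `add`, and `close` steps.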
:::note
`clearml-data` is stateful and remembers the last created dataset so there's no need to specify a specific dataset ID unless
we want to work on another dataset.
:::

-## Using a Dataset
+### Using a Dataset

Now in our python code, we can access and use the created dataset from anywhere:
```python
@@ -60,17 +72,11 @@
We have all our files in the same folder structure under `local_path`. It is that simple!

The next step is to set the dataset_id as a parameter for our code and voilà! We can now train on any dataset we have in the system.

-## Setup
-
-`clearml-data` comes built-in with our `clearml` python package! Just check out the [getting started](getting_started/ds/ds_first_steps.md) guide for more info!
+## CLI Options

-## Usage
-
-### CLI
-
-It's possible to manage datasets (create \ modify \ upload \ delete) with the `clearml-data` command line tool.
+It's possible to manage datasets (create / modify / upload / delete) with the `clearml-data` command line tool.

-#### Creating a Dataset
+### Creating a Dataset
```bash
clearml-data create --project <project_name> --name <dataset_name> --parents [<existing_dataset_id>]
```
@@ -80,10 +86,10 @@
Creates a new dataset.
|Name|Description|Optional| |---|---|---| -|name |Dataset's name| | -|project|Dataset's project| | -|parents|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| | -|tags |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| | +|name |Dataset's name| No | +|project|Dataset's project| No | +|parents|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| Yes | +|tags |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| Yes| :::important clearml-data works in a stateful mode so once a new dataset is created, the following commands @@ -92,7 +98,7 @@ do not require the `--id` flag.
-#### Add Files to Dataset
+### Add Files
```bash
clearml-data add --id <dataset_id> --files <files/folders_to_add>
```
It's possible to add individual files or complete folders.
|Name|Description|Optional| |---|---|---| -|id | Dataset's ID. Default: previously created / accessed dataset| | -|files|Files / folders to add. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json` | | -|dataset-folder | Dataset base folder to add the files to in the dataset. Default: dataset root| | -|non-recursive | Disable recursive scan of files | | -|verbose | Verbose reporting | | +|id | Dataset's ID. Default: previously created / accessed dataset| Yes | +|files|Files / folders to add. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json` | No | +|dataset-folder | Dataset base folder to add the files to in the dataset. Default: dataset root| Yes | +|non-recursive | Disable recursive scan of files | Yes | +|verbose | Verbose reporting | Yes|
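As an illustration of the flags above (the local paths are made up), a couple of typical invocations:

```bash
# add all JPGs matching a wildcard, plus a whole folder of JSON files
clearml-data add --files ~/data/*.jpg ~/data/json

# place a file under a specific folder inside the dataset instead of its root
clearml-data add --files ~/labels.csv --dataset-folder metadata
```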
-#### Remove Files From Dataset
+### Remove Files
```bash
clearml-data remove --id <dataset_id> --files <files/folders_to_remove>
```
@@ -119,33 +125,14 @@

|Name|Description|Optional|
|---|---|---|
-|id | Dataset's ID. Default: previously created / accessed dataset| |
-|files | Files / folders to remove (wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`). Notice: file path is the path within the dataset, not the local path.| |
-|non-recursive | Disable recursive scan of files | |
-|verbose | Verbose reporting | |
+|id | Dataset's ID. Default: previously created / accessed dataset| Yes |
+|files | Files / folders to remove (wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`). Notice: file path is the path within the dataset, not the local path.| No |
+|non-recursive | Disable recursive scan of files | Yes |
+|verbose | Verbose reporting | Yes|
-#### Finalize Dataset -```bash -clearml-data close --id -``` -Finalizes the dataset and makes it ready to be consumed. -It automatically uploads all files that were not previously uploaded. -Once a dataset is finalized, it can no longer be modified. - -**Parameters** - -|Name|Description|Optional| -|---|---|---| -|id| Dataset's ID. Default: previously created / accessed dataset| | -|storage| Remote storage to use for the dataset files. Default: files_server | | -|disable-upload | Disable automatic upload when closing the dataset | | -|verbose | Verbose reporting | | - -
-#### Upload Dataset' Content
+### Upload Dataset Content
```bash
clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
```
@@ -157,13 +144,32 @@
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://bucket`

|Name|Description|Optional|
|---|---|---|
-|id| Dataset's ID. Default: previously created / accessed dataset| |
-|storage| Remote storage to use for the dataset files. Default: files_server | |
-|verbose | Verbose reporting | |
+|id| Dataset's ID. Default: previously created / accessed dataset| Yes |
+|storage| Remote storage to use for the dataset files. Default: files_server | Yes |
+|verbose | Verbose reporting | Yes|
-#### Sync Local Folder +### Finalize Dataset +```bash +clearml-data close --id +``` +Finalizes the dataset and makes it ready to be consumed. +It automatically uploads all files that were not previously uploaded. +Once a dataset is finalized, it can no longer be modified. + +**Parameters** + +|Name|Description|Optional| +|---|---|---| +|id| Dataset's ID. Default: previously created / accessed dataset| Yes | +|storage| Remote storage to use for the dataset files. Default: files_server | Yes | +|disable-upload | Disable automatic upload when closing the dataset | Yes | +|verbose | Verbose reporting | Yes| + +
### Sync Local Folder
```
clearml-data sync [--id <dataset_id>] --folder <folder_location> [--parents '<parent_id>']
```
@@ -180,19 +186,19 @@
This command also uploads the data and finalizes the dataset automatically.

**Parameters**

|Name|Description|Optional|
|---|---|---|
-|id| Dataset's ID. Default: previously created / accessed dataset| |
-|folder|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`||
-|storage|Remote storage to use for the dataset files. Default: files_server ||
-|parents|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset||
-|project|If creating a new dataset, specify the dataset's project name||
-|name|If creating a new dataset, specify the dataset's name||
-|tags|Dataset user tags||
-|skip-close|Do not auto close dataset after syncing folders||
-|verbose | Verbose reporting ||
+|id| Dataset's ID. Default: previously created / accessed dataset| Yes |
+|folder|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|No|
+|storage|Remote storage to use for the dataset files. Default: files_server |Yes|
+|parents|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|Yes|
+|project|If creating a new dataset, specify the dataset's project name|Yes|
+|name|If creating a new dataset, specify the dataset's name|Yes|
+|tags|Dataset user tags|Yes|
+|skip-close|Do not auto close dataset after syncing folders|Yes|
+|verbose | Verbose reporting |Yes|
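For example, a local folder that is refreshed periodically could be pushed as a brand new dataset version in a single command (the project, name, and folder values here are illustrative):

```bash
# create, upload, and finalize a dataset version from a local folder in one shot
clearml-data sync --project dataset_example --name nightly_version --folder ./data_folder
```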
-#### List Dataset Content
+### List Dataset Content
```bash
clearml-data list [--id <dataset_id>]
```
@@ -201,15 +207,15 @@

|Name|Description|Optional|
|---|---|---|
-|id|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset||
-|project|Specify dataset project name (if used instead of ID, dataset name is also required)||
-|name|Specify dataset name (if used instead of ID, dataset project is also required)||
-|filter|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`||
-|modified|Only list file changes (add / remove / modify) introduced in this version||
+|id|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset|Yes|
+|project|Specify dataset project name (if used instead of ID, dataset name is also required)|Yes|
+|name|Specify dataset name (if used instead of ID, dataset project is also required)|Yes|
+|filter|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`|Yes|
+|modified|Only list file changes (add / remove / modify) introduced in this version|Yes|
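To illustrate how a folder / wildcard filter narrows a listing, here is a rough sketch of the matching idea; `filter_dataset_files` is a hypothetical helper written for this example, not part of the `clearml` package:

```python
# Illustrative sketch of folder / wildcard filtering, in the spirit of --filter.
# filter_dataset_files is hypothetical and NOT a clearml API.
from fnmatch import fnmatch

def filter_dataset_files(paths, patterns):
    """Keep paths matching any wildcard pattern, or sitting under a plain folder pattern."""
    matched = []
    for path in paths:
        for pattern in patterns:
            if fnmatch(path, pattern) or path.startswith(pattern.rstrip("/") + "/"):
                matched.append(path)
                break  # a path is listed once even if several filters match
    return matched

files = [
    "folder/date_2021-01.json",
    "folder/readme.txt",
    "folder/sub-folder/img.jpg",
]
print(filter_dataset_files(files, ["folder/date_*.json", "folder/sub-folder"]))
# ['folder/date_2021-01.json', 'folder/sub-folder/img.jpg']
```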
-#### Delete a Dataset
+### Delete Dataset
```
clearml-data delete [--id <dataset_id>]
```
@@ -221,12 +227,12 @@
This does not work on datasets with children.

**Parameters**

|Name|Description|Optional|
|---|---|---|
-|id|ID of dataset to be deleted. Default: previously created / accessed dataset that hasn't been finalized yet||
-|force|Force dataset deletion even if other dataset versions depend on it|||
+|id|ID of dataset to be deleted. Default: previously created / accessed dataset that hasn't been finalized yet|Yes|
+|force|Force dataset deletion even if other dataset versions depend on it|Yes|
-#### Search for a Dataset
+### Search for a Dataset
```
clearml-data search [--name <name>] [--project <project_name>] [--tags <tag>]
```

Datasets can be searched by project, name, ID, and tags.
@@ -245,98 +251,111 @@
-### Python API +### Compare Two Datasets -All API commands should be imported with
-`from clearml import Dataset` +``` +clearml-data compare [--source SOURCE] [--target TARGET] +``` +Compare two datasets (target vs. source). The command returns a comparison summary that looks like this: -#### `Dataset.get(dataset_id=DS_ID).get_local_copy()` - -Returns a path to dataset in cache, and downloads it if it is not already in cache. +``` +Comparison summary: 4 files removed, 3 files modified, 0 files added +``` **Parameters** |Name|Description|Optional| |---|---|---| -|use_soft_links|If True, use soft links. Default: False on Windows, True on Posix systems|| -|raise_on_error|If True, raise exception if dataset merging failed on any file|| +|source|Source dataset id (used as baseline)|No| +|target|Target dataset id (compare against the source baseline dataset)|No| +|verbose|Verbose report all file changes (instead of summary)|Yes| -
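Since each version records per-file hashes, the summary can be thought of as a diff over two path-to-hash mappings. The following is an illustrative sketch of that idea only; `compare_versions` and the hash values are made up for this example, not ClearML internals:

```python
# Illustrative only: a version comparison reduced to diffing two
# {path: content_hash} mappings. compare_versions is hypothetical.
def compare_versions(source, target):
    removed = [p for p in source if p not in target]
    added = [p for p in target if p not in source]
    modified = [p for p in source if p in target and source[p] != target[p]]
    # same flat "N files ..." phrasing as the CLI summary shown above
    return (f"Comparison summary: {len(removed)} files removed, "
            f"{len(modified)} files modified, {len(added)} files added")

source = {"a.csv": "hash1", "b.csv": "hash2", "c.csv": "hash3"}
target = {"a.csv": "hash1", "b.csv": "hash9"}
print(compare_versions(source, target))
# Comparison summary: 1 files removed, 1 files modified, 0 files added
```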
+### Merge Datasets -#### `Dataset.get(dataset_id=DS_ID).get_mutable_local_copy()` +``` +clearml-data squash --name NAME --ids [IDS [IDS ...]] +``` -Downloads the dataset to a specific folder (non-cached). If the folder already has contents, specify whether to overwrite -its contents with the dataset contents. +Squash (merge) multiple datasets into a single dataset version. **Parameters** |Name|Description|Optional| |---|---|---| -|target_folder|Local target folder for the writable copy of the dataset|| -|overwrite|If True, recursively delete the contents of the target folder before creating a copy of the dataset. If False (default) and target folder contains files, raise exception or return None|| -|raise_on_error|If True, raise exception if dataset merging failed on any file|| +|name|Create squashed dataset name|No| +|ids|Source dataset IDs to squash (merge down)|No| +|storage|Remote storage to use for the dataset files. Default: files_server |Yes| +|verbose|Verbose report all file changes (instead of summary)|Yes| -
+### Verify Dataset -#### `Dataset.create()` +``` +clearml-data verify [--id ID] [--folder FOLDER] +``` -Create a new dataset. - -Parent datasets can be specified, and the new dataset inherits all of its parent's content. Multiple dataset parents can -be listed. Merging of parent datasets is done based on the list's order, where each parent can override overlapping files -in the previous parent dataset. +Verify that the dataset content matches the data from the local source. **Parameters** |Name|Description|Optional| |---|---|---| -|dataset_name|Name of the new dataset|| -|dataset_project|The project containing the dataset. If not specified, infer project name from parent datasets. If there is no parent dataset, then this value is required|| -|parent_datasets|Expand a parent dataset by adding / removing files|| -|use_current_task|If True, the dataset is created on the current Task. Default: False|| +|id|Specify dataset ID. Default: previously created/accessed dataset|Yes| +|folder|Specify dataset local copy (if not provided the local cache folder will be verified)|Yes| +|filesize| If True, only verify file size and skip hash checks (default: false)|Yes| +|verbose|Verbose report all file changes (instead of summary)|Yes| -
+### Get a Dataset -#### `Dataset.add_files()` +``` +clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite] +``` -Add files or folder into the current dataset. +Get a local copy of a dataset. By default, you get a read only cached folder, but you can get a mutable copy by using the +`--copy` flag. **Parameters** |Name|Description|Optional| |---|---|---| -|path|Add a folder / file to the dataset|| -|wildcard|Add only a specific set of files based on wildcard matching. Wildcard matching can be a single string or a list of wildcards, for example: `~/data/*.jpg`, `~/data/json`|| -|local_base_folder|Files will be located based on their relative path from local_base_folder|| -|dataset_path|Where in the dataset the folder / files should be located|| -|recursive|If True, match all wildcard files recursively|| -|verbose| If True, print to console files added / modified|| +|id| Specify dataset ID. Default: previously created / accessed dataset|Yes| +|copy| Get a writable copy of the dataset to a specific output folder|Yes| +|link| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|Yes| +|overwrite| If True, overwrite the target folder|Yes| +|verbose| Verbose report all file changes (instead of summary)|Yes| -
+### Publish a Dataset + +``` +clearml-data publish --id ID +``` + +Publish the dataset for public use. The dataset must be [finalized](#finalize-dataset) before it is published. -#### `Dataset.upload()` -Start file uploading, the function returns when all files are uploaded. **Parameters** |Name|Description|Optional| |---|---|---| -|show_progress|If True, show upload progress bar|| -|verbose|If True, print verbose progress report|| -|output_url|Target storage for the compressed dataset (default: file server). Examples: `s3://bucket/data`, `gs://bucket/data` , `azure://bucket/data`, `/mnt/share/data` || -|compression|Compression algorithm for the Zipped dataset file (default: ZIP_DEFLATED)|| +|id| The dataset task id to be published.|No| -
-#### `Dataset.finalize()` -Closes the dataset and marks it as *Completed*. After a dataset has been closed, it can no longer be modified. -Before closing a dataset, its files must first be uploaded. -**Parameters** +## Python API -|Name|Description|Optional| -|---|---|---| -|verbose|If True, print verbose progress report|| -|raise_on_error|If True, raise exception if dataset finalizing failed|| +It's also possible to manage a dataset using ClearML Data's python interface. +All API commands should be imported with: + +```python +from clearml import Dataset +``` + +See all API commands in the [Dataset](references/sdk/dataset.md) reference page. + +## Tutorials + +Take a look at the ClearML Data tutorials: +* [Dataset Management with CLI and SDK](guides/data%20management/data_man_cifar_classification) +* [Dataset Management with CLI](guides/data%20management/data_man_simple) +* [Folder Sync with CLI](guides/data%20management/data_man_folder_sync)