We have all our files in the same folder structure under `local_path`. It is that simple!<br/>
The next step is to set the `dataset_id` as a parameter for our code, and voilà! We can now train on any dataset we have in
the system.
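
As a minimal sketch of passing the dataset ID as a parameter, the script below reads it from the command line and resolves it to a local folder. The `--dataset-id` flag name and both helper names are illustrative choices, not something this guide prescribes. The sketch assumes the SDK's `Dataset.get()` and `get_local_copy()` calls and a configured ClearML environment.

```python
import argparse


def parse_args(argv=None):
    """Parse the dataset ID from the command line (the flag name is illustrative)."""
    parser = argparse.ArgumentParser(description="Train on a ClearML dataset")
    parser.add_argument("--dataset-id", required=True, help="ID of the dataset to train on")
    return parser.parse_args(argv)


def resolve_local_path(dataset_id):
    """Return a local folder containing the files of the given dataset."""
    # Imported lazily so argument parsing works even without a ClearML setup.
    from clearml import Dataset

    return Dataset.get(dataset_id=dataset_id).get_local_copy()


# Usage (requires a configured ClearML environment):
#   args = parse_args()
#   local_path = resolve_local_path(args.dataset_id)
```

Keeping the ID out of the code means the same training script can be pointed at a different dataset without any edits.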
## Setup
`clearml-data` comes built-in with our `clearml` Python package! Check out the [getting started](getting_started/ds/ds_first_steps.md) guide for more info.
## Usage
### CLI
It's possible to manage datasets (create / modify / upload / delete) with the `clearml-data` command line tool.
|parents|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/> |
|tags |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
:::important
clearml-data works in a stateful mode so once a new dataset is created, the following commands
|files|Files / folders to add. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json` | <img src="/icons/ico-optional-no.svg" className="icon size-md center-md"/> |
|dataset-folder | Dataset base folder to add the files to in the dataset. Default: dataset root| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/> |
|files | Files / folders to remove (wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`). Notice: file path is the path within the dataset, not the local path.| <img src="/icons/ico-optional-no.svg" className="icon size-md center-md"/> |
|storage| Remote storage to use for the dataset files. Default: files_server | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/> |
|disable-upload | Disable automatic upload when closing the dataset | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/> |
|storage| Remote storage to use for the dataset files. Default: files_server | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/> |
|folder|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md"/>|
|storage|Remote storage to use for the dataset files. Default: files_server |<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|parents|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|project|If creating a new dataset, specify the dataset's project name|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|name|If creating a new dataset, specify the dataset's name|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|tags|Dataset user tags|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|skip-close|Do not auto close dataset after syncing folders|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|id|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|project|Specify dataset project name (if used instead of ID, dataset name is also required)|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|name|Specify dataset name (if used instead of ID, dataset project is also required)|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|filter|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|modified|Only list file changes (add / remove / modify) introduced in this version|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
<br/>
#### Delete a Dataset
```
clearml-data delete [--id <dataset_id_to_delete>]
```
Deletes an entire dataset from ClearML. This can also be used to delete a newly created dataset.
This does not work on datasets with children.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id|ID of dataset to be deleted. Default: previously created / accessed dataset that hasn't been finalized yet|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|force|Force dataset deletion even if other dataset versions depend on it|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
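
When deletion is scripted, the command can be assembled from the two parameters in the table. The sketch below is illustrative only: `build_delete_command` is a hypothetical helper, `abc123` is a made-up dataset ID, and it assumes the `force` parameter is passed as a `--force` flag in the same way `id` is passed as `--id`.

```python
import shlex


def build_delete_command(dataset_id=None, force=False):
    """Assemble the argument list for `clearml-data delete`."""
    cmd = ["clearml-data", "delete"]
    if dataset_id is not None:
        # Without --id, the CLI defaults to the previously created / accessed dataset.
        cmd += ["--id", dataset_id]
    if force:
        # Assumption: the `force` parameter is exposed as a `--force` flag.
        cmd.append("--force")
    return cmd


cmd = build_delete_command(dataset_id="abc123", force=True)  # hypothetical ID
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```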
Returns a path to dataset in cache, and downloads it if it is not already in cache.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|use_soft_links|If True, use soft links. Default: False on Windows, True on Posix systems|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|raise_on_error|If True, raise exception if dataset merging failed on any file|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
Downloads the dataset to a specific folder (non-cached). If the folder already has contents, specify whether to overwrite
its contents with the dataset contents.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|target_folder|Local target folder for the writable copy of the dataset|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md"/>|
|overwrite|If True, recursively delete the contents of the target folder before creating a copy of the dataset. If False (default) and target folder contains files, raise exception or return None|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|raise_on_error|If True, raise exception if dataset merging failed on any file|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
<br/>
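
The difference between the cached copy and the writable copy can be sketched as follows. This assumes the two methods described above are `get_local_copy()` and `get_mutable_local_copy()` on a dataset object; `fetch_copies` and the `./data_copy` folder are hypothetical names used only for illustration.

```python
def fetch_copies(dataset, target_folder, overwrite=False):
    """Return (cached_path, writable_path) for a ClearML dataset object."""
    # Copy served from the local cache (downloaded if not already cached).
    cached_path = dataset.get_local_copy()
    # Independent, writable copy in a folder of our choosing.
    writable_path = dataset.get_mutable_local_copy(
        target_folder=target_folder,
        overwrite=overwrite,  # True: clear the target folder first
        raise_on_error=True,
    )
    return cached_path, writable_path


# Usage (requires a configured ClearML environment):
#   from clearml import Dataset
#   ds = Dataset.get(dataset_id="<dataset_id>")
#   cached, writable = fetch_copies(ds, "./data_copy", overwrite=True)
```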
#### `Dataset.create()`
Create a new dataset.
Parent datasets can be specified, and the new dataset inherits all of its parents' content. Multiple dataset parents can
be listed. Parent datasets are merged in the order they are listed, so each parent can override overlapping files
from the parents listed before it.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|dataset_name|Name of the new dataset|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md"/>|
|dataset_project|The project containing the dataset. If not specified, infer project name from parent datasets. If there is no parent dataset, then this value is required|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|parent_datasets|Expand a parent dataset by adding / removing files|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|use_current_task|If True, the dataset is created on the current Task. Default: False|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
<br/>
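
The parent merge order described for `Dataset.create()` can be illustrated with a toy model. This is not ClearML's implementation: each parent is represented here as a plain dictionary mapping a file path to a made-up version label, and `merge_parents` is a hypothetical helper.

```python
def merge_parents(parents):
    """Merge parent file maps in list order; later parents override earlier ones."""
    merged = {}
    for files in parents:
        merged.update(files)
    return merged


# Two toy parents that both contain data/train.csv.
parent_a = {"data/train.csv": "a-v1", "data/labels.json": "a-v1"}
parent_b = {"data/train.csv": "b-v1", "data/extra.csv": "b-v1"}

merged = merge_parents([parent_a, parent_b])
print(merged["data/train.csv"])  # prints "b-v1": the later parent wins
print(sorted(merged))  # all three paths are present
```

Reversing the list order would make `parent_a` the winner for the overlapping file, which is why the order of `parent_datasets` matters.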
#### `Dataset.add_files()`
Add files or a folder to the current dataset.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|path|Add a folder / file to the dataset|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md"/>|
|wildcard|Add only a specific set of files based on wildcard matching. Wildcard matching can be a single string or a list of wildcards, for example: `~/data/*.jpg`, `~/data/json`|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|local_base_folder|Files will be located based on their relative path from local_base_folder|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|dataset_path|Where in the dataset the folder / files should be located|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|recursive|If True, match all wildcard files recursively|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
|verbose| If True, print to console files added / modified|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
<br/>
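
A short sketch of how the `add_files()` parameters above combine. The helper name `add_images`, the `*.jpg` pattern, and the `images` target folder are illustrative assumptions; the function expects an already created dataset object.

```python
def add_images(dataset, folder):
    """Add all .jpg files under `folder` to the dataset's `images` folder."""
    return dataset.add_files(
        path=folder,            # local folder to scan
        wildcard="*.jpg",       # only pick up matching files
        dataset_path="images",  # location inside the dataset
        recursive=True,         # also match files in sub-folders
        verbose=True,           # print added / modified files
    )


# Usage (requires a configured ClearML environment):
#   from clearml import Dataset
#   ds = Dataset.create(dataset_name="<name>", dataset_project="<project>")
#   add_images(ds, "<local_folder>")
```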
#### `Dataset.upload()`
Start uploading the dataset's files. The function returns when all files are uploaded.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|show_progress|If True, show upload progress bar|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md"/>|
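
Putting the three calls together, a minimal create, add, and upload flow might look like the sketch below. `build_dataset` is a hypothetical helper, and the `dataset_cls` argument exists only so the flow can be exercised without a ClearML server. By default the real `clearml.Dataset` class is used, which requires a configured ClearML environment.

```python
def build_dataset(name, project, folder, dataset_cls=None):
    """Create a dataset, add a local folder to it, and upload the files."""
    if dataset_cls is None:
        # Default to the real ClearML class; imported lazily on purpose.
        from clearml import Dataset as dataset_cls

    dataset = dataset_cls.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=folder)
    # Blocks until all files are uploaded.
    dataset.upload(show_progress=True)
    return dataset


# Usage (requires a configured ClearML environment):
#   ds = build_dataset("<name>", "<project>", "<local_folder>")
```

Closing the dataset is a separate step and is not covered by this sketch.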