---
title: CLI
---

:::important
This page covers `clearml-data`, ClearML's file-based data management solution.
See [Hyper-Datasets](../hyperdatasets/overview.md) for ClearML's advanced queryable dataset management solution.
:::

The `clearml-data` utility is a CLI tool for controlling and managing your data with ClearML.

This page provides a reference for `clearml-data`'s CLI commands.

### Creating a Dataset

```bash
clearml-data create --project PROJECT --name NAME [--parents PARENTS]
```

Creates a new dataset.

:::tip Locating Dataset ID
To locate a dataset's ID, go to the dataset task's info panel in the [WebApp](../webapp/webapp_overview.md). At the top of the panel,
to the right of the dataset task name, click `ID` and the dataset ID appears.
:::

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Dataset's name|No|
|`--project`|Dataset's project|No|
|`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered|Yes|
|`--tags`|Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets|Yes|

:::info
`clearml-data` works in a stateful mode, so once a new dataset is created, the following commands
do not require the `--id` flag.
:::

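As a minimal sketch of the stateful behavior described above, the snippet below creates a dataset and then adds files without passing `--id`. The project name, dataset name, tag, and `./data` path are hypothetical, and the `run` helper with its `DRY_RUN` switch is not part of `clearml-data`: it only prints each command unless `DRY_RUN=0` is set.

```bash
# Hypothetical values for illustration; replace with your own.
PROJECT="Example Project"
NAME="example-dataset"

# Helper (not part of clearml-data): print each command, and execute it
# only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Create the dataset; clearml-data remembers it for the following commands.
run clearml-data create --project "$PROJECT" --name "$NAME" --tags raw

# No --id needed: the previously created dataset is used by default.
run clearml-data add --files ./data
```

Run it as is to preview the commands, then set `DRY_RUN=0` to execute them against your ClearML setup.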
### Adding Files

```bash
clearml-data add [--id ID] --files FILES
```

It's possible to add individual files or complete folders.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Dataset's ID. Default: previously created / accessed dataset|Yes|
|`--files`|Files / folders to add. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|No|
|`--dataset-folder`|Dataset base folder to add the files to in the dataset. Default: dataset root|Yes|
|`--non-recursive`|Disable recursive scan of files|Yes|
|`--verbose`|Verbose reporting|Yes|

### Removing Files

```bash
clearml-data remove [--id ID] --files FILES
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Dataset's ID. Default: previously created / accessed dataset|Yes|
|`--files`|Files / folders to remove. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`. Note: the file path is the path within the dataset, not the local path|No|
|`--non-recursive`|Disable recursive scan of files|Yes|
|`--verbose`|Verbose reporting|Yes|

### Uploading Dataset Content

```bash
clearml-data upload [--id ID] [--storage STORAGE]
```

Uploads added files to [ClearML Server](../deploying_clearml/clearml_server.md) by default. It's possible to specify a different storage
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, or `/mnt/shared/`.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Dataset's ID. Default: previously created / accessed dataset|Yes|
|`--storage`|Remote storage to use for the dataset files. Default: files_server|Yes|
|`--verbose`|Verbose reporting|Yes|

### Finalizing a Dataset

```bash
clearml-data close [--id ID]
```

Finalizes the dataset and makes it ready to be consumed.
It automatically uploads all files that were not previously uploaded.
Once a dataset is finalized, it can no longer be modified.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Dataset's ID. Default: previously created / accessed dataset|Yes|
|`--storage`|Remote storage to use for the dataset files. Default: files_server|Yes|
|`--disable-upload`|Disable automatic upload when closing the dataset|Yes|
|`--verbose`|Verbose reporting|Yes|
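
The upload and finalize steps can be sketched together as follows. This is only an illustration: `s3://bucket` is the example destination mentioned earlier on this page, and the `run` helper with its `DRY_RUN` switch is a hypothetical wrapper (not part of `clearml-data`) that prints the commands unless `DRY_RUN=0` is set.

```bash
# Helper (not part of clearml-data): print each command, and execute it
# only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Upload the added files to a non-default storage destination.
run clearml-data upload --storage s3://bucket

# Finalize the dataset; after this it can no longer be modified.
run clearml-data close
```

Both commands omit `--id`, so they act on the previously created / accessed dataset.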

### Syncing Local Storage

```bash
clearml-data sync [--id ID] --folder FOLDER [--parents PARENTS]
```

This command syncs a folder's content with ClearML. It is useful when a user has a single point of truth (i.e. a folder) that
is updated from time to time.

When an update should be reflected in ClearML, users can call `clearml-data sync`, creating a new dataset and specifying the folder,
and the changes (file additions, modifications, and removals) will be reflected in ClearML.

This command also uploads the data and finalizes the dataset automatically.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Dataset's ID. Default: previously created / accessed dataset|Yes|
|`--folder`|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|No|
|`--storage`|Remote storage to use for the dataset files. Default: files_server|Yes|
|`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|Yes|
|`--project`|If creating a new dataset, specify the dataset's project name|Yes|
|`--name`|If creating a new dataset, specify the dataset's name|Yes|
|`--tags`|Dataset user tags|Yes|
|`--skip-close`|Do not auto close dataset after syncing folders|Yes|
|`--verbose`|Verbose reporting|Yes|
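
A possible invocation for the folder-sync scenario above is sketched below. The folder path, project name, and dataset name are hypothetical, and the `run` helper with its `DRY_RUN` switch is an illustrative wrapper (not part of `clearml-data`) that only prints the commands unless `DRY_RUN=0` is set.

```bash
# Helper (not part of clearml-data): print each command, and execute it
# only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Sync a local folder into a new dataset; upload and finalize happen automatically.
run clearml-data sync --folder ./data --project "Example Project" --name example-dataset

# Same sync, but leave the dataset open for further changes.
run clearml-data sync --folder ./data --project "Example Project" --name example-dataset --skip-close
```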

### Listing Dataset Content

```bash
clearml-data list [--id ID]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset|Yes|
|`--project`|Specify dataset project name (if used instead of ID, dataset name is also required)|Yes|
|`--name`|Specify dataset name (if used instead of ID, dataset project is also required)|Yes|
|`--filter`|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`|Yes|
|`--modified`|Only list file changes (add / remove / modify) introduced in this version|Yes|

### Deleting a Dataset

```bash
clearml-data delete [--id ID]
```

Deletes an entire dataset from ClearML. This can also be used to delete a newly created dataset.

This does not work on datasets with children unless `--force` is used.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|ID of dataset to be deleted. Default: previously created / accessed dataset that hasn't been finalized yet|Yes|
|`--force`|Force dataset deletion even if other dataset versions depend on it|Yes|

### Searching for a Dataset

```bash
clearml-data search [--ids IDS] [--name NAME] [--project PROJECT] [--tags TAGS]
```

Lists all datasets in the system that match the search request.

Datasets can be searched by project, name, ID, and tags.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--ids`|A list of dataset IDs|Yes|
|`--project`|The project name of the datasets|Yes|
|`--name`|A dataset name or a partial name to filter datasets by|Yes|
|`--tags`|A list of dataset user tags|Yes|

### Comparing Two Datasets

```bash
clearml-data compare [--source SOURCE] [--target TARGET]
```

Compares two datasets (target vs. source). The command returns a comparison summary that looks like this:

```
Comparison summary: 4 files removed, 3 files modified, 0 files added
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--source`|Source dataset ID (used as baseline)|No|
|`--target`|Target dataset ID (compared against the source baseline dataset)|No|
|`--verbose`|Verbose reporting of all file changes (instead of a summary)|Yes|

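
If the summary needs to be consumed by a script, its counts can be extracted with standard shell tools. The sketch below parses the sample summary line shown above; it assumes the output format matches that sample exactly, and the `count_of` helper is a hypothetical name introduced for illustration.

```bash
# Sample line copied from the comparison summary shown above.
summary='Comparison summary: 4 files removed, 3 files modified, 0 files added'

# Helper (hypothetical): print the number that precedes "files <label>".
count_of() {
  printf '%s\n' "$summary" | sed -n "s/.* \([0-9][0-9]*\) files $1.*/\1/p"
}

removed=$(count_of removed)
modified=$(count_of modified)
added=$(count_of added)

echo "removed=$removed modified=$modified added=$added"
# prints: removed=4 modified=3 added=0
```

In practice, `summary` would hold the matching line captured from the output of `clearml-data compare`.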
### Merging Datasets

```bash
clearml-data squash --name NAME --ids [IDS [IDS ...]]
```

Squashes (merges) multiple datasets into a single dataset version.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Name of the squashed dataset to create|No|
|`--ids`|Source dataset IDs to squash (merge down)|No|
|`--storage`|Remote storage to use for the dataset files. Default: files_server|Yes|
|`--verbose`|Verbose reporting of all file changes (instead of a summary)|Yes|

### Verifying a Dataset

```bash
clearml-data verify [--id ID] [--folder FOLDER]
```

Verifies that the dataset content matches the data from the local source.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Specify dataset ID. Default: previously created / accessed dataset|Yes|
|`--folder`|Specify dataset local copy (if not provided, the local cache folder will be verified)|Yes|
|`--filesize`|If True, only verify file size and skip hash checks (default: false)|Yes|
|`--verbose`|Verbose reporting of all file changes (instead of a summary)|Yes|

### Getting a Dataset

```bash
clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
```

Gets a local copy of a dataset. By default, you get a read-only cached folder, but you can get a mutable copy by using the
`--copy` flag.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|Specify dataset ID. Default: previously created / accessed dataset|Yes|
|`--copy`|Get a writable copy of the dataset to a specific output folder|Yes|
|`--link`|Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|Yes|
|`--overwrite`|If True, overwrite the target folder|Yes|
|`--verbose`|Verbose reporting of all file changes (instead of a summary)|Yes|

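
The difference between the default read-only cache and a writable copy can be sketched as below. The dataset ID and output folder are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is an illustrative wrapper (not part of `clearml-data`) that only prints the commands unless `DRY_RUN=0` is set.

```bash
# Hypothetical placeholder; replace with a real dataset ID.
DATASET_ID="DATASET_ID_PLACEHOLDER"

# Helper (not part of clearml-data): print each command, and execute it
# only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Default: a read-only cached folder.
run clearml-data get --id "$DATASET_ID"

# Writable copy in a specific output folder, replacing the folder if it exists.
run clearml-data get --id "$DATASET_ID" --copy ./dataset_copy --overwrite
```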
### Publishing a Dataset

```bash
clearml-data publish --id ID
```

Publishes the dataset for public use. The dataset must be [finalized](#finalizing-a-dataset) before it is published.

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--id`|The ID of the dataset task to be published|No|