| title |
|---|
| CLI |
:::important
This page covers clearml-data, ClearML's file-based data management solution.
See Hyper-Datasets for ClearML's advanced queryable dataset management solution.
:::
The clearml-data utility is a CLI tool for controlling and managing your data with ClearML.
This page provides a reference for clearml-data's CLI commands.
create
Creates a new dataset.
clearml-data create --project <project_name> --name <dataset_name> --parents <existing_dataset_id>
Parameters
:::tip Dataset ID
- To locate a dataset's ID, go to the dataset task's info panel in the WebApp. At the top of the panel, to the right of the dataset task name, click ID and the dataset ID appears.
- clearml-data works in a stateful mode, so once a new dataset is created, the following commands do not require the `--id` flag.
:::
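When datasets are created from a script, the `create` invocation above can be assembled as an argument list instead of a hand-written string. The following is a minimal sketch, not part of ClearML: `build_create_command` is a hypothetical helper, the project and dataset names are placeholders, and it assumes `--parents` may be omitted for a dataset with no parent. It only builds the command and does not run it.

```python
import shlex
from typing import List, Optional


def build_create_command(project: str, name: str,
                         parent_id: Optional[str] = None) -> List[str]:
    """Return the argv for `clearml-data create` (hypothetical helper)."""
    cmd = ["clearml-data", "create", "--project", project, "--name", name]
    if parent_id is not None:
        # Assumption: --parents is only needed when inheriting from a dataset.
        cmd += ["--parents", parent_id]
    return cmd


# Placeholder project and dataset names, for illustration only.
cmd = build_create_command("my_project", "my dataset")
print(shlex.join(cmd))  # shell-quoted form of the command
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting problems with names that contain spaces.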
add
Add individual files or complete folders to the dataset.
clearml-data add --id <dataset_id> --files <filenames/folders_to_add>
Parameters
remove
Remove files from the dataset.
clearml-data remove --id <dataset_id_to_remove_from> --files <filenames/folders_to_remove>
Parameters
upload
Upload the local dataset changes to the server. By default, the data is uploaded to the ClearML Server. To use a different storage
medium, specify an upload destination, such as `s3://bucket`, `gs://`, `azure://`, or `/mnt/shared/`.
clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
Parameters
| Name | Description | Optional |
|---|---|---|
| `--id` | Dataset's ID. Default: previously created / accessed dataset | |
| `--storage` | Remote storage to use for the dataset files. Default: files_server | |
| `--verbose` | Verbose reporting | |
close
Finalizes the dataset and makes it ready to be consumed. This automatically uploads all files that were not previously uploaded. Once a dataset is finalized, it can no longer be modified.
clearml-data close --id <dataset_id>
Parameters
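The `create`, `add`, `upload`, and `close` commands can be chained into one workflow. The sketch below is illustrative only: `build_workflow` is a hypothetical helper, the names and paths are placeholders, and it assumes `--files` accepts several paths. The later commands omit `--id` and rely on the stateful mode described in the tip above. Because `close` uploads anything not yet uploaded, the explicit `upload` step mainly matters when a `--storage` destination is needed.

```python
import shlex
from typing import List, Optional, Sequence


def build_workflow(project: str, name: str, files: Sequence[str],
                   storage: Optional[str] = None) -> List[List[str]]:
    """Return the create -> add -> upload -> close commands, in order."""
    tool = "clearml-data"
    upload = [tool, "upload"]
    if storage is not None:
        # For example s3://bucket or /mnt/shared/, as listed for `upload`.
        upload += ["--storage", storage]
    return [
        [tool, "create", "--project", project, "--name", name],
        [tool, "add", "--files", *files],  # no --id: stateful mode
        upload,
        [tool, "close"],
    ]


# Placeholder names and paths, for illustration only.
steps = build_workflow("my_project", "my_dataset",
                       ["data/train", "data/labels.csv"])
for step in steps:
    print(shlex.join(step))
```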
sync
Sync a folder's content with ClearML. This option is useful when a single source of truth (i.e. a folder) is updated from time to time.
When an update should be reflected in ClearML's system, call clearml-data sync and pass the folder path,
and the changes (file additions, modifications, and removals) will be reflected in ClearML.
This command also uploads the data and finalizes the dataset automatically.
clearml-data sync [--id <dataset_id>] --folder <folder_location> [--parents '<parent_id>']
Parameters
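Since `sync` uploads and finalizes the dataset automatically, it can help to check the folder path before the command is run. The following is a sketch under stated assumptions: `build_sync_command` is a hypothetical helper, and resolving the folder to an absolute path is a choice made here, not something the CLI requires.

```python
import shlex
import tempfile
from pathlib import Path
from typing import List, Optional


def build_sync_command(folder: str, dataset_id: Optional[str] = None,
                       parent_id: Optional[str] = None) -> List[str]:
    """Return the argv for `clearml-data sync` after checking the folder."""
    path = Path(folder).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"not a folder: {path}")
    cmd = ["clearml-data", "sync"]
    if dataset_id is not None:
        cmd += ["--id", dataset_id]
    cmd += ["--folder", str(path)]
    if parent_id is not None:
        cmd += ["--parents", parent_id]
    return cmd


# A temporary folder stands in for the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    print(shlex.join(build_sync_command(tmp)))
```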
list
List a dataset's contents.
clearml-data list [--id <dataset_id>]
Parameters
delete
Delete an entire dataset from ClearML. This can also be used to delete a newly created dataset.
This does not work on datasets with children.
clearml-data delete [--id <dataset_id_to_delete>]
Parameters
search
Search datasets in the system by project, name, ID, and/or tags.
Returns a list of all datasets in the system that match the search request, sorted by creation time.
clearml-data search [--name <name>] [--ids [IDS [IDS ...]]] [--project <project_name>] [--tags <tag>]
Parameters
| Name | Description | Optional |
|---|---|---|
| `--ids` | A list of dataset IDs | |
| `--project` | The project name of the datasets | |
| `--name` | A dataset name or a partial name to filter datasets by | |
| `--tags` | A list of dataset user tags | |
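The search filters can be combined freely. The sketch below builds a `search` invocation from whichever filters are supplied. `build_search_command` is a hypothetical helper, the filter values are placeholders, and it assumes that `--tags`, like `--ids`, accepts several space-separated values.

```python
import shlex
from typing import List, Optional, Sequence


def build_search_command(name: Optional[str] = None,
                         ids: Sequence[str] = (),
                         project: Optional[str] = None,
                         tags: Sequence[str] = ()) -> List[str]:
    """Return the argv for `clearml-data search` with the given filters."""
    cmd = ["clearml-data", "search"]
    if name:
        cmd += ["--name", name]      # full or partial dataset name
    if ids:
        cmd += ["--ids", *ids]       # one or more dataset IDs
    if project:
        cmd += ["--project", project]
    if tags:
        cmd += ["--tags", *tags]     # assumption: several tags per flag
    return cmd


# Placeholder filter values, for illustration only.
print(shlex.join(build_search_command(project="my_project", tags=["raw", "2023"])))
```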
compare
Compare two datasets (target vs. source). The command returns a comparison summary that looks like this:
Comparison summary: 4 files removed, 3 files modified, 0 files added
clearml-data compare [--source SOURCE] [--target TARGET]
Parameters
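When `compare` is called from a script, the summary line can be turned into numbers. The following is a minimal sketch that assumes the output contains a line in exactly the format shown above; the real output may differ between versions, and `parse_comparison_summary` is a hypothetical helper, not a ClearML function.

```python
import re
from typing import Dict

# Matches the summary format shown above; "files?" also tolerates "1 file".
SUMMARY_RE = re.compile(
    r"Comparison summary:\s*(\d+) files? removed,"
    r"\s*(\d+) files? modified,\s*(\d+) files? added"
)


def parse_comparison_summary(output: str) -> Dict[str, int]:
    """Extract the removed/modified/added counts from compare output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no comparison summary found in output")
    removed, modified, added = (int(group) for group in match.groups())
    return {"removed": removed, "modified": modified, "added": added}


line = "Comparison summary: 4 files removed, 3 files modified, 0 files added"
print(parse_comparison_summary(line))
# prints {'removed': 4, 'modified': 3, 'added': 0}
```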
squash
Squash multiple datasets into a single dataset version (merge down).
clearml-data squash --name NAME --ids [IDS [IDS ...]]
Parameters
verify
Verify that the dataset content matches the data from the local source.
clearml-data verify [--id ID] [--folder FOLDER]
Parameters
get
Get a local copy of a dataset. By default, you get a read-only cached folder, but you can get a mutable copy by using the
`--copy` flag.
clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
Parameters
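The difference between the default read-only copy and a mutable copy can be sketched as two invocations. `build_get_command` is a hypothetical helper. Based only on the usage line above, it assumes that `--copy` and `--link` each take a path value and that `--overwrite` is a plain switch; the dataset ID and target folder are placeholders.

```python
import shlex
from typing import List, Optional


def build_get_command(dataset_id: Optional[str] = None,
                      copy_to: Optional[str] = None,
                      link_to: Optional[str] = None,
                      overwrite: bool = False) -> List[str]:
    """Return the argv for `clearml-data get` (hypothetical helper)."""
    cmd = ["clearml-data", "get"]
    if dataset_id is not None:
        cmd += ["--id", dataset_id]
    if copy_to is not None:
        cmd += ["--copy", copy_to]   # assumption: target folder for a mutable copy
    if link_to is not None:
        cmd += ["--link", link_to]   # assumption: takes a path value
    if overwrite:
        cmd.append("--overwrite")
    return cmd


# Default: read-only cached folder.
print(shlex.join(build_get_command("<dataset_id>")))
# Mutable copy in a placeholder target folder.
print(shlex.join(build_get_command("<dataset_id>", copy_to="./my_copy", overwrite=True)))
```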
publish
Publish the dataset for public use. The dataset must be finalized before it is published.
clearml-data publish --id ID
Parameters