diff --git a/docs/clearml_data/clearml_data_cli.md b/docs/clearml_data/clearml_data_cli.md index 0576e781..63a07baf 100644 --- a/docs/clearml_data/clearml_data_cli.md +++ b/docs/clearml_data/clearml_data_cli.md @@ -11,16 +11,13 @@ The `clearml-data` utility is a CLI tool for controlling and managing your data The following page provides a reference to `clearml-data`'s CLI commands. -### Creating a Dataset +## create + +Create a new dataset. + ```bash clearml-data create --project --name --parents ``` -Creates a new dataset.
- -:::tip Locating Dataset ID -To locate a dataset's ID, go to the dataset task's info panel in the [WebApp](../webapp/webapp_overview.md). In the top of the panel, -to the right of the dataset task name, click `ID` and the dataset ID appears -::: **Parameters** @@ -36,18 +33,23 @@ to the right of the dataset task name, click `ID` and the dataset ID appears -:::info -clearml-data works in a stateful mode so once a new dataset is created, the following commands +:::tip Dataset ID +* To locate a dataset's ID, go to the dataset task's info panel in the [WebApp](../webapp/webapp_overview.md). At the top of the panel, +to the right of the dataset task name, click `ID` and the dataset ID appears. + +* `clearml-data` works in a stateful mode, so once a new dataset is created, the following commands do not require the `--id` flag. :::
-### Adding Files +## add + +Add individual files or complete folders to the dataset. + ```bash clearml-data add --id --files ``` -It's possible to add individual files or complete folders.
**Parameters** @@ -65,7 +67,10 @@ It's possible to add individual files or complete folders.

-### Removing Files +## remove + +Remove files from the dataset. + ```bash clearml-data remove --id --files ``` @@ -85,12 +90,14 @@ clearml-data remove --id --files -### Uploading Dataset Content +## upload + +Upload the local dataset changes to the server. By default, they are uploaded to the [ClearML Server](../deploying_clearml/clearml_server.md). It's possible to specify a different storage +medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, or `/mnt/shared/`. + ```bash clearml-data upload [--id ] [--storage ] ``` -Uploads added files to [ClearML Server](../deploying_clearml/clearml_server.md) by default. It's possible to specify a different storage -medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`. **Parameters** @@ -107,13 +114,14 @@ medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure
-### Finalizing a Dataset +## close + +Finalize the dataset and make it ready to be consumed. This automatically uploads all files that were not previously uploaded. +Once a dataset is finalized, it can no longer be modified. + ```bash clearml-data close --id ``` -Finalizes the dataset and makes it ready to be consumed. -It automatically uploads all files that were not previously uploaded. -Once a dataset is finalized, it can no longer be modified. **Parameters** @@ -130,19 +138,20 @@ Once a dataset is finalized, it can no longer be modified.
-### Syncing Local Storage -``` -clearml-data sync [--id [--parents ''] -``` -This option syncs a folder's content with ClearML. It is useful in case a user has a single point of truth (i.e. a folder) which +## sync + +Sync a folder's content with ClearML. This command is useful when a user has a single point of truth (i.e., a folder) which updates from time to time. - -Once an update should be reflected into ClearML's system, users can call `clearml-data sync`, create a new dataset, enter the folder, +When an update needs to be reflected in ClearML's system, call `clearml-data sync` and pass the folder path, and the changes (either file addition, modification and removal) will be reflected in ClearML. This command also uploads the data and finalizes the dataset automatically. +```bash +clearml-data sync [--id [--parents ''] +``` + **Parameters**
@@ -163,7 +172,10 @@ This command also uploads the data and finalizes the dataset automatically.
-### Listing Dataset Content +## list + +List a dataset's contents. + ```bash clearml-data list [--id ] ``` @@ -184,13 +196,16 @@ clearml-data list [--id ]
-### Deleting a Dataset +## delete + +Delete an entire dataset from ClearML. This can also be used to delete a newly created dataset. + +This does not work on datasets with children. + ``` clearml-data delete [--id ] ``` -Deletes an entire dataset from ClearML. This can also be used to delete a newly created dataset. -This does not work on datasets with children. **Parameters** @@ -205,13 +220,15 @@ This does not work on datasets with children.
-### Searching for a Dataset -``` -clearml-data search [--name ] [--project ] [--tags ] -``` -Lists all datasets in the system that match the search request. +## search -Datasets can be searched by project, name, ID, and tags. +Search datasets in the system by project, name, ID, and/or tags. + +Returns a list of all datasets in the system that match the search request, sorted by creation time. + +```bash +clearml-data search [--name ] [--ids [IDS [IDS ...]]] [--project ] [--tags ] +``` **Parameters** @@ -228,17 +245,16 @@ Datasets can be searched by project, name, ID, and tags.
-### Comparing Two Datasets +## compare -``` + +Compare two datasets (target vs. source). The command returns a comparison summary that looks like this: +`Comparison summary: 4 files removed, 3 files modified, 0 files added` + +```bash clearml-data compare [--source SOURCE] [--target TARGET] ``` -Compare two datasets (target vs. source). The command returns a comparison summary that looks like this: - -``` -Comparison summary: 4 files removed, 3 files modified, 0 files added -``` **Parameters** @@ -246,20 +262,20 @@ Comparison summary: 4 files removed, 3 files modified, 0 files added |Name|Description|Optional| |---|---|---| -|`--source`|Source dataset id (used as baseline)|No| -|`--target`|Target dataset id (compare against the source baseline dataset)|No| +|`--source`|Source dataset ID (used as baseline)|No| +|`--target`|Target dataset ID (compare against the source baseline dataset)|No| |`--verbose`|Verbose report all file changes (instead of summary)|Yes|
-### Merging Datasets +## squash -``` +Squash multiple datasets into a single dataset version (merge down). + +```bash clearml-data squash --name NAME --ids [IDS [IDS ...]] ``` -Squash (merge) multiple datasets into a single dataset version. - **Parameters**
@@ -273,14 +289,14 @@ Squash (merge) multiple datasets into a single dataset version.
-### Verifying a Dataset - -``` -clearml-data verify [--id ID] [--folder FOLDER] -``` +## verify Verify that the dataset content matches the data from the local source. +```bash +clearml-data verify [--id ID] [--folder FOLDER] +``` + **Parameters**
@@ -289,20 +305,20 @@ Verify that the dataset content matches the data from the local source. |---|---|---| |`--id`|Specify dataset ID. Default: previously created/accessed dataset|Yes| |`--folder`|Specify dataset local copy (if not provided the local cache folder will be verified)|Yes| -|`--filesize`| If True, only verify file size and skip hash checks (default: false)|Yes| +|`--filesize`| If `True`, only verify file size and skip hash checks (default: `False`)|Yes| |`--verbose`|Verbose report all file changes (instead of summary)|Yes|
-### Getting a Dataset - -``` -clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite] -``` +## get Get a local copy of a dataset. By default, you get a read only cached folder, but you can get a mutable copy by using the `--copy` flag. +```bash +clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite] +``` + **Parameters**
@@ -312,26 +328,25 @@ Get a local copy of a dataset. By default, you get a read only cached folder, bu |`--id`| Specify dataset ID. Default: previously created / accessed dataset|Yes| |`--copy`| Get a writable copy of the dataset to a specific output folder|Yes| |`--link`| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|Yes| -|`--overwrite`| If True, overwrite the target folder|Yes| +|`--overwrite`| If `True`, overwrite the target folder|Yes| |`--verbose`| Verbose report all file changes (instead of summary)|Yes|
-### Publishing a Dataset +## publish -``` +Publish the dataset for public use. The dataset must be [finalized](#close) before it is published. + +```bash clearml-data publish --id ID ``` -Publish the dataset for public use. The dataset must be [finalized](#finalizing-a-dataset) before it is published. - - **Parameters**
|Name|Description|Optional| |---|---|---| -|`--id`| The dataset task id to be published.|No| +|`--id`| The dataset task ID to be published.|No|