From 24f42c60266effa2867eb4c5d6cfcc9437dfc515 Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Tue, 10 May 2022 10:46:15 +0300
Subject: [PATCH] Edit ClearML Data CLI page (#248)

---
 docs/clearml_data/clearml_data_cli.md | 48 ++++++++++++++++++---------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/docs/clearml_data/clearml_data_cli.md b/docs/clearml_data/clearml_data_cli.md
index cdeedfc7..e549fe2a 100644
--- a/docs/clearml_data/clearml_data_cli.md
+++ b/docs/clearml_data/clearml_data_cli.md
@@ -16,7 +16,8 @@ The following page provides a reference to `clearml-data`'s CLI commands.
 Creates a new dataset.
 
 ```bash
-clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT] --name NAME [--tags [TAGS [TAGS ...]]]
+clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
+                    --name NAME [--tags [TAGS [TAGS ...]]]
 ```
 
 **Parameters**
@@ -75,7 +76,8 @@ clearml-data add [-h] [--id ID] [--dataset-folder DATASET_FOLDER]
 Remove files/links from the dataset.
 
 ```bash
-clearml-data remove [-h] [--id ID] [--files [FILES [FILES ...]]] [--non-recursive] [--verbose]
+clearml-data remove [-h] [--id ID] [--files [FILES [FILES ...]]]
+                    [--non-recursive] [--verbose]
 ```
 
 **Parameters**
@@ -99,7 +101,8 @@ Upload the local dataset changes to the server. By default, it's uploaded to the
 medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
 
 ```bash
-clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
+clearml-data upload [-h] [--id ID] [--storage STORAGE] [--chunk-size CHUNK_SIZE]
+                    [--verbose]
 ```
 
 **Parameters**
@@ -111,6 +114,7 @@ clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
 |---|---|---|
 |`--id`| Dataset's ID. Default: previously created / accessed dataset| Yes |
 |`--storage`| Remote storage to use for the dataset files. Default: files_server | Yes |
+|`--chunk-size`| Set the dataset artifact upload chunk size in MB. Default: 512 (pass -1 for a single chunk).
For example, with 512, the dataset is split and uploaded in 512 MB chunks. | Yes|
 |`--verbose` | Verbose reporting | Yes|
@@ -123,7 +127,8 @@ Finalize the dataset and makes it ready to be consumed. This automatically uploa
 Once a dataset is finalized, it can no longer be modified.
 
 ```bash
-clearml-data close --id <dataset_id>
+clearml-data close [-h] [--id ID] [--storage STORAGE] [--disable-upload]
+                   [--chunk-size CHUNK_SIZE] [--verbose]
 ```
 
 **Parameters**
@@ -135,6 +140,7 @@ clearml-data close --id <dataset_id>
 |`--id`| Dataset's ID. Default: previously created / accessed dataset| Yes |
 |`--storage`| Remote storage to use for the dataset files. Default: files_server | Yes |
 |`--disable-upload` | Disable automatic upload when closing the dataset | Yes |
+|`--chunk-size`| Set the dataset artifact upload chunk size in MB. Default: 512 (pass -1 for a single chunk). For example, with 512, the dataset is split and uploaded in 512 MB chunks. | Yes|
 |`--verbose` | Verbose reporting | Yes|
@@ -152,7 +158,10 @@ and the changes (either file addition, modification and removal) will be reflect
 This command also uploads the data and finalizes the dataset automatically.
 
 ```bash
-clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<parent_id>']
+clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLDER
+                  [--parents [PARENTS [PARENTS ...]]] [--project PROJECT] [--name NAME]
+                  [--tags [TAGS [TAGS ...]]] [--storage STORAGE] [--skip-close]
+                  [--chunk-size CHUNK_SIZE] [--verbose]
 ```
 
 **Parameters**
@@ -162,6 +171,7 @@ clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<par
 |`--id`| Dataset's ID. Default: previously created / accessed dataset| Yes |
+|`--dataset-folder`|Dataset base folder to add the files to (default: Dataset root)|Yes|
 |`--folder`|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|No|
 |`--storage`|Remote storage to use for the dataset files. Default: files_server |Yes|
 |`--parents`|IDs of the dataset's parents (i.e. merge all parents).
All modifications made to the folder since the parents were synced will be reflected in the dataset|Yes|
@@ -169,6 +179,7 @@ clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<par
 |`--tags`|Dataset user tags|Yes|
 |`--skip-close`|Do not auto close dataset after syncing folders|Yes|
+|`--chunk-size`| Set the dataset artifact upload chunk size in MB. Default: 512 (pass -1 for a single chunk). For example, with 512, the dataset is split and uploaded in 512 MB chunks. |Yes|
 |`--verbose` | Verbose reporting |Yes|
@@ -180,7 +191,8 @@ clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<par
 ```bash
-clearml-data list [--id <dataset_id>]
+clearml-data list [-h] [--id ID] [--project PROJECT] [--name NAME]
+                  [--filter [FILTER [FILTER ...]]] [--modified]
 ```
 
 **Parameters**
@@ -205,8 +217,8 @@ Delete an entire dataset from ClearML. This can also be used to delete a newly c
 This does not work on datasets with children.
 
-```
-clearml-data delete [--id <dataset_id>]
+```bash
+clearml-data delete [-h] [--id ID] [--force]
 ```
@@ -230,7 +242,8 @@ Search datasets in the system by project, name, ID, and/or tags. Returns list of
 all datasets in the system that match the search request, sorted by creation time.
 
 ```bash
-clearml-data search [--name <name>] [--ids [IDS [IDS ...]]] [--project <project_name>] [--tags <tag>]
+clearml-data search [-h] [--ids [IDS [IDS ...]]] [--project PROJECT]
+                    [--name NAME] [--tags [TAGS [TAGS ...]]]
 ```
 
 **Parameters**
@@ -255,7 +268,7 @@ Compare two datasets (target vs. source). The command returns a comparison summa
 `Comparison summary: 4 files removed, 3 files modified, 0 files added`
 
 ```bash
-clearml-data compare [--source SOURCE] [--target TARGET]
+clearml-data compare [-h] --source SOURCE --target TARGET [--verbose]
 ```
@@ -276,7 +289,7 @@ Squash multiple datasets into a single dataset version (merge down).
 ```bash
-clearml-data squash --name NAME --ids [IDS [IDS ...]]
+clearml-data squash [-h] --name NAME --ids [IDS [IDS ...]] [--storage STORAGE] [--verbose]
 ```
 
 **Parameters**
@@ -297,7 +310,7 @@ Verify that the dataset content matches the data from the local source.
 ```bash
-clearml-data verify [--id ID] [--folder FOLDER]
+clearml-data verify [-h] [--id ID] [--folder FOLDER] [--filesize] [--verbose]
 ```
 
 **Parameters**
@@ -319,7 +332,8 @@ Get a local copy of a dataset. By default, you get a read only cached folder, bu
 `--copy` flag.
 
 ```bash
-clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
+clearml-data get [-h] [--id ID] [--copy COPY] [--link LINK] [--part PART]
+                 [--num-parts NUM_PARTS] [--overwrite] [--verbose]
 ```
 
 **Parameters**
@@ -331,8 +345,10 @@ clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
 |`--id`| Specify dataset ID. Default: previously created / accessed dataset|Yes|
 |`--copy`| Get a writable copy of the dataset to a specific output folder|Yes|
 |`--link`| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|Yes|
+|`--part`|Retrieve a partial copy of the dataset: part number (0 to `--num-parts`-1) out of `--num-parts` total parts|Yes|
+|`--num-parts`|Total number of parts to divide the dataset into. Note that the smallest retrievable part is a single chunk in the dataset (or its parents). For example, dataset `gen4`, with 3 parents that each hold a single chunk, can be divided into 4 parts|Yes|
 |`--overwrite`| If `True`, overwrite the target folder|Yes|
-|`--verbose`| Verbose report all file changes (instead of summary)|Yes|
+|`--verbose`| Verbosely report all file changes (instead of a summary)| Yes|
@@ -341,7 +357,7 @@ clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
 Publish the dataset for public use. The dataset must be [finalized](#close) before it is published.
 
 ```bash
-clearml-data publish --id ID
+clearml-data publish [-h] --id ID
 ```
 
 **Parameters**
@@ -350,6 +366,6 @@ clearml-data publish --id ID
 |Name|Description|Optional|
 |---|---|---|
-|`--id`| The dataset task ID to be published.|No|
+|`--id`| The dataset task ID to be published.| No|
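
Read together, the commands documented above form a simple end-to-end flow: create a dataset version, stage changes against it, then close it (upload and finalize). A hypothetical session using only flags documented in this page; the project name, paths, parent ID, and storage destination are illustrative, and `clearml-data` must be installed and configured against a ClearML server:

```bash
# Create a new dataset version; the parent ID shown here is a placeholder
clearml-data create --project "Sample Project" --name "dataset-v2" --parents aabb1122ccdd3344

# Stage local changes; --id is omitted because it defaults to the
# previously created / accessed dataset
clearml-data add --files ./images
clearml-data remove --files images/corrupt.jpg

# Upload in 256 MB chunks and finalize; once closed, the dataset
# can no longer be modified
clearml-data close --storage s3://bucket/datasets --chunk-size 256 --verbose
```

Because `--id` defaults to the previously created or accessed dataset, only the `create` step needs to name the dataset explicitly; `add`, `remove`, and `close` then operate on it implicitly.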