Edit ClearML Data CLI page (#248)

pollfly 2022-05-10 10:46:15 +03:00 committed by GitHub
parent 88b53fa5c9
commit 24f42c6026

@ -16,7 +16,8 @@ The following page provides a reference to `clearml-data`'s CLI commands.
Creates a new dataset. Creates a new dataset.
```bash ```bash
clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT] --name NAME [--tags [TAGS [TAGS ...]]] clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
--name NAME [--tags [TAGS [TAGS ...]]]
``` ```
**Parameters** **Parameters**
@ -75,7 +76,8 @@ clearml-data add [-h] [--id ID] [--dataset-folder DATASET_FOLDER]
Remove files/links from the dataset. Remove files/links from the dataset.
```bash ```bash
clearml-data remove [-h] [--id ID] [--files [FILES [FILES ...]]] [--non-recursive] [--verbose] clearml-data remove [-h] [--id ID] [--files [FILES [FILES ...]]]
[--non-recursive] [--verbose]
``` ```
**Parameters** **Parameters**
@ -99,7 +101,8 @@ Upload the local dataset changes to the server. By default, it's uploaded to the
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`. medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
```bash ```bash
clearml-data upload [--id <dataset_id>] [--storage <upload_destination>] clearml-data upload [-h] [--id ID] [--storage STORAGE] [--chunk-size CHUNK_SIZE]
[--verbose]
``` ```
@ -111,6 +114,7 @@ clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
|---|---|---| |---|---|---|
|`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | |`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--storage`| Remote storage to use for the dataset files. Default: files_server | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | |`--storage`| Remote storage to use for the dataset files. Default: files_server | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--chunk-size`| Set dataset artifact upload chunk size in MB. Default 512, (pass -1 for a single chunk). Example: 512, dataset will be split and uploaded in 512 MB chunks. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--verbose` | Verbose reporting | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose` | Verbose reporting | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
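As a rough illustration of the `--chunk-size` description above, the sketch below estimates how many chunks an upload would produce for a given total size. The helper name `chunk_count` is hypothetical, and the ceiling-division estimate is an assumption: actual chunk boundaries may depend on individual file sizes, so treat the result as an approximation only.

```shell
# Hypothetical helper: estimate the number of upload chunks.
# Assumption: chunks are filled up to the --chunk-size value (in MB);
# real chunk boundaries may differ depending on file sizes.
chunk_count() {
  total_mb=$1
  chunk_mb=$2
  if [ "$chunk_mb" -le 0 ]; then
    # -1 requests a single chunk, per the parameter description
    echo 1
    return
  fi
  # Ceiling division using integer arithmetic
  echo $(( (total_mb + chunk_mb - 1) / chunk_mb ))
}

chunk_count 1300 512   # prints 3
chunk_count 1300 -1    # prints 1
```

With the default of 512, a dataset of about 1300 MB would be split into roughly three chunks under this assumption.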
@ -123,7 +127,8 @@ Finalize the dataset and makes it ready to be consumed. This automatically uploa
Once a dataset is finalized, it can no longer be modified. Once a dataset is finalized, it can no longer be modified.
```bash ```bash
clearml-data close --id <dataset_id> clearml-data close [-h] [--id ID] [--storage STORAGE] [--disable-upload]
[--chunk-size CHUNK_SIZE] [--verbose]
``` ```
**Parameters** **Parameters**
@ -135,6 +140,7 @@ clearml-data close --id <dataset_id>
|`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | |`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--storage`| Remote storage to use for the dataset files. Default: files_server | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | |`--storage`| Remote storage to use for the dataset files. Default: files_server | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--disable-upload` | Disable automatic upload when closing the dataset | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | |`--disable-upload` | Disable automatic upload when closing the dataset | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--chunk-size`| Set dataset artifact upload chunk size in MB. Default 512, (pass -1 for a single chunk). Example: 512, dataset will be split and uploaded in 512 MB chunks. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--verbose` | Verbose reporting | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose` | Verbose reporting | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
@ -152,7 +158,10 @@ and the changes (either file addition, modification and removal) will be reflect
This command also uploads the data and finalizes the dataset automatically. This command also uploads the data and finalizes the dataset automatically.
```bash ```bash
clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<parent_id>'] clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLDER
[--parents [PARENTS [PARENTS ...]]] [--project PROJECT] [--name NAME]
[--tags [TAGS [TAGS ...]]] [--storage STORAGE] [--skip-close]
[--chunk-size CHUNK_SIZE] [--verbose]
``` ```
**Parameters** **Parameters**
@ -162,6 +171,7 @@ clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<pa
|Name|Description|Optional| |Name|Description|Optional|
|---|---|---| |---|---|---|
|`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | |`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--dataset-folder`|Dataset base folder to add the files to (default: Dataset root)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--folder`|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />| |`--folder`|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--storage`|Remote storage to use for the dataset files. Default: files_server |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--storage`|Remote storage to use for the dataset files. Default: files_server |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
@ -169,6 +179,7 @@ clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<pa
|`--name`|If creating a new dataset, specify the dataset's name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--name`|If creating a new dataset, specify the dataset's name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--tags`|Dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--tags`|Dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--skip-close`|Do not auto close dataset after syncing folders|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--skip-close`|Do not auto close dataset after syncing folders|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--chunk-size`| Set dataset artifact upload chunk size in MB. Default 512, (pass -1 for a single chunk). Example: 512, dataset will be split and uploaded in 512 MB chunks. |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--verbose` | Verbose reporting |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose` | Verbose reporting |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
@ -180,7 +191,8 @@ clearml-data sync [--id <dataset_id] --folder <folder_location> [--parents '<pa
List a dataset's contents. List a dataset's contents.
```bash ```bash
clearml-data list [--id <dataset_id>] clearml-data list [-h] [--id ID] [--project PROJECT] [--name NAME]
[--filter [FILTER [FILTER ...]]] [--modified]
``` ```
**Parameters** **Parameters**
@ -205,8 +217,8 @@ Delete an entire dataset from ClearML. This can also be used to delete a newly c
This does not work on datasets with children. This does not work on datasets with children.
``` ```bash
clearml-data delete [--id <dataset_id_to_delete>] clearml-data delete [-h] [--id ID] [--force]
``` ```
@ -230,7 +242,8 @@ Search datasets in the system by project, name, ID, and/or tags.
Returns list of all datasets in the system that match the search request, sorted by creation time. Returns list of all datasets in the system that match the search request, sorted by creation time.
```bash ```bash
clearml-data search [--name <name>] [--ids [IDS [IDS ...]]] [--project <project_name>] [--tags <tag>] clearml-data search [-h] [--ids [IDS [IDS ...]]] [--project PROJECT]
[--name NAME] [--tags [TAGS [TAGS ...]]]
``` ```
**Parameters** **Parameters**
@ -255,7 +268,7 @@ Compare two datasets (target vs. source). The command returns a comparison summa
`Comparison summary: 4 files removed, 3 files modified, 0 files added` `Comparison summary: 4 files removed, 3 files modified, 0 files added`
```bash ```bash
clearml-data compare [--source SOURCE] [--target TARGET] clearml-data compare [-h] --source SOURCE --target TARGET [--verbose]
``` ```
@ -276,7 +289,7 @@ clearml-data compare [--source SOURCE] [--target TARGET]
Squash multiple datasets into a single dataset version (merge down). Squash multiple datasets into a single dataset version (merge down).
```bash ```bash
clearml-data squash --name NAME --ids [IDS [IDS ...]] clearml-data squash [-h] --name NAME --ids [IDS [IDS ...]] [--storage STORAGE] [--verbose]
``` ```
**Parameters** **Parameters**
@ -297,7 +310,7 @@ clearml-data squash --name NAME --ids [IDS [IDS ...]]
Verify that the dataset content matches the data from the local source. Verify that the dataset content matches the data from the local source.
```bash ```bash
clearml-data verify [--id ID] [--folder FOLDER] clearml-data verify [-h] [--id ID] [--folder FOLDER] [--filesize] [--verbose]
``` ```
**Parameters** **Parameters**
@ -319,7 +332,8 @@ Get a local copy of a dataset. By default, you get a read only cached folder, bu
`--copy` flag. `--copy` flag.
```bash ```bash
clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite] clearml-data get [-h] [--id ID] [--copy COPY] [--link LINK] [--part PART]
[--num-parts NUM_PARTS] [--overwrite] [--verbose]
``` ```
**Parameters** **Parameters**
@ -331,8 +345,10 @@ clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
|`--id`| Specify dataset ID. Default: previously created / accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--id`| Specify dataset ID. Default: previously created / accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--copy`| Get a writable copy of the dataset to a specific output folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--copy`| Get a writable copy of the dataset to a specific output folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--link`| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--link`| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--part`|Retrieve a partial copy of the dataset. Part number (0 to `--num-parts`-1) of total parts `--num-parts`.|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--num-parts`|Total number of parts to divide the dataset into. Notice, minimum retrieved part is a single chunk in a dataset (or its parents). Example: Dataset gen4, with 3 parents, each with a single chunk, can be divided into 4 parts |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--overwrite`| If `True`, overwrite the target folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--overwrite`| If `True`, overwrite the target folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--verbose`| Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose`| Verbose report all file changes (instead of summary)| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
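The `--part` and `--num-parts` descriptions above imply that parts are numbered from 0 to `--num-parts` minus 1. The sketch below only prints the corresponding `clearml-data get` commands rather than running them. The function name `print_part_commands` and the dataset ID are placeholders, not part of the CLI.

```shell
# Hypothetical helper: print (do not run) one `clearml-data get`
# command per part, numbering parts from 0 to num_parts - 1.
print_part_commands() {
  dataset_id=$1
  num_parts=$2
  part=0
  while [ "$part" -lt "$num_parts" ]; do
    echo "clearml-data get --id $dataset_id --part $part --num-parts $num_parts"
    part=$((part + 1))
  done
}

# "<dataset_id>" is a placeholder for a real dataset ID
print_part_commands "<dataset_id>" 4
```

Each printed command could be run on a different machine so that every worker retrieves only its own share of the dataset, subject to the minimum part size noted in the `--num-parts` description.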
@ -341,7 +357,7 @@ clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
Publish the dataset for public use. The dataset must be [finalized](#close) before it is published. Publish the dataset for public use. The dataset must be [finalized](#close) before it is published.
```bash ```bash
clearml-data publish --id ID clearml-data publish [-h] --id ID
``` ```
**Parameters** **Parameters**
@ -350,6 +366,6 @@ clearml-data publish --id ID
|Name|Description|Optional| |Name|Description|Optional|
|---|---|---| |---|---|---|
|`--id`| The dataset task ID to be published.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />| |`--id`| The dataset task ID to be published.| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
</div> </div>