Reformat clearml-data CLI page (#237)

pollfly 2022-04-24 13:06:07 +03:00 committed by GitHub
parent f2198857f7
commit efbaafb40c


@ -11,16 +11,13 @@ The `clearml-data` utility is a CLI tool for controlling and managing your data
The following page provides a reference to `clearml-data`'s CLI commands. The following page provides a reference to `clearml-data`'s CLI commands.
### Creating a Dataset ## create
Creates a new dataset.
```bash ```bash
clearml-data create --project <project_name> --name <dataset_name> --parents <existing_dataset_id> clearml-data create --project <project_name> --name <dataset_name> --parents <existing_dataset_id>
``` ```
Creates a new dataset. <br/>
:::tip Locating Dataset ID
To locate a dataset's ID, go to the dataset task's info panel in the [WebApp](../webapp/webapp_overview.md). In the top of the panel,
to the right of the dataset task name, click `ID` and the dataset ID appears
:::
**Parameters** **Parameters**
@ -36,18 +33,23 @@ to the right of the dataset task name, click `ID` and the dataset ID appears
</div> </div>
:::info :::tip Dataset ID
clearml-data works in a stateful mode so once a new dataset is created, the following commands * To locate a dataset's ID, go to the dataset task's info panel in the [WebApp](../webapp/webapp_overview.md). In the top of the panel,
to the right of the dataset task name, click `ID` and the dataset ID appears.
* clearml-data works in a stateful mode so once a new dataset is created, the following commands
do not require the `--id` flag. do not require the `--id` flag.
::: :::
<br/> <br/>
### Adding Files ## add
Add individual files or complete folders to the dataset.
```bash ```bash
clearml-data add --id <dataset_id> --files <filenames/folders_to_add> clearml-data add --id <dataset_id> --files <filenames/folders_to_add>
``` ```
It's possible to add individual files or complete folders.<br/>
**Parameters** **Parameters**
@ -65,7 +67,10 @@ It's possible to add individual files or complete folders.<br/>
<br/> <br/>
### Removing Files ## remove
Remove files from the dataset.
```bash ```bash
clearml-data remove --id <dataset_id_to_remove_from> --files <filenames/folders_to_remove> clearml-data remove --id <dataset_id_to_remove_from> --files <filenames/folders_to_remove>
``` ```
@ -85,12 +90,14 @@ clearml-data remove --id <dataset_id_to_remove_from> --files <filenames/folders_
<br/> <br/>
### Uploading Dataset Content ## upload
Upload the local dataset changes to the server. By default, it's uploaded to the [ClearML Server](../deploying_clearml/clearml_server.md). It's possible to specify a different storage
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
```bash ```bash
clearml-data upload [--id <dataset_id>] [--storage <upload_destination>] clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
``` ```
Uploads added files to [ClearML Server](../deploying_clearml/clearml_server.md) by default. It's possible to specify a different storage
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
**Parameters** **Parameters**
@ -107,13 +114,14 @@ medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure
<br/> <br/>
### Finalizing a Dataset ## close
Finalize the dataset and make it ready to be consumed. This automatically uploads all files that were not previously uploaded.
Once a dataset is finalized, it can no longer be modified.
```bash ```bash
clearml-data close --id <dataset_id> clearml-data close --id <dataset_id>
``` ```
Finalizes the dataset and makes it ready to be consumed.
It automatically uploads all files that were not previously uploaded.
Once a dataset is finalized, it can no longer be modified.
**Parameters** **Parameters**
@ -130,19 +138,20 @@ Once a dataset is finalized, it can no longer be modified.
<br/> <br/>
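The `create`, `add`, `upload`, and `close` commands described above can be chained into a single versioning step. The following is a minimal sketch, not an official recipe: the function name `create_dataset_version` and the `CLEARML_DATA_BIN` override are hypothetical helpers introduced for illustration, and the sketch assumes that, per the stateful-mode tip, `add`, `upload`, and `close` can omit `--id` right after `create`.

```bash
# Hypothetical override for dry runs; defaults to the real CLI.
CLEARML_DATA_BIN="${CLEARML_DATA_BIN:-clearml-data}"

# create_dataset_version PROJECT NAME FILES [STORAGE]
# Creates a dataset, adds files, uploads them, and finalizes the dataset.
create_dataset_version() {
  project="$1"
  name="$2"
  files="$3"
  storage="${4:-}"

  # create records the new dataset ID in local state, so later calls skip --id
  "$CLEARML_DATA_BIN" create --project "$project" --name "$name" || return 1
  "$CLEARML_DATA_BIN" add --files "$files" || return 1

  # Upload to the default ClearML Server unless a storage destination is given
  if [ -n "$storage" ]; then
    "$CLEARML_DATA_BIN" upload --storage "$storage" || return 1
  else
    "$CLEARML_DATA_BIN" upload || return 1
  fi

  # After close, the dataset can no longer be modified
  "$CLEARML_DATA_BIN" close
}
```

For example, `create_dataset_version my_project my_dataset ./data s3://bucket` would run the four commands in order and stop at the first failure. Setting `CLEARML_DATA_BIN=echo` prints the arguments instead of calling the CLI.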
### Syncing Local Storage ## sync
```
clearml-data sync [--id <dataset_id>] --folder <folder_location> [--parents '<parent_id>'] Sync a folder's content with ClearML. This option is useful in case a user has a single point of truth (i.e. a folder) which
```
This option syncs a folder's content with ClearML. It is useful in case a user has a single point of truth (i.e. a folder) which
updates from time to time. updates from time to time.
Once an update should be reflected in ClearML's system, call `clearml-data sync` and pass the folder path,
Once an update should be reflected into ClearML's system, users can call `clearml-data sync`, create a new dataset, enter the folder,
and the changes (file additions, modifications, and removals) will be reflected in ClearML. and the changes (file additions, modifications, and removals) will be reflected in ClearML.
This command also uploads the data and finalizes the dataset automatically. This command also uploads the data and finalizes the dataset automatically.
```bash
clearml-data sync [--id <dataset_id>] --folder <folder_location> [--parents '<parent_id>']
```
**Parameters** **Parameters**
<div className="tbl-cmd"> <div className="tbl-cmd">
@ -163,7 +172,10 @@ This command also uploads the data and finalizes the dataset automatically.
<br/> <br/>
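As a rough illustration of the `sync` usage above, the sketch below wraps the command so that `--parents` is passed only when a parent dataset ID is supplied. The function name `sync_folder` and the `CLEARML_DATA_BIN` override are hypothetical, and only flags shown in the usage line are used; any other options that `sync` may need in a given setup are not covered here.

```bash
# Hypothetical override for dry runs; defaults to the real CLI.
CLEARML_DATA_BIN="${CLEARML_DATA_BIN:-clearml-data}"

# sync_folder FOLDER [PARENT_ID]
# Syncs FOLDER with ClearML; sync also uploads and finalizes the dataset.
sync_folder() {
  folder="$1"
  parent="${2:-}"

  if [ -n "$parent" ]; then
    # Record the new version as a child of an existing dataset
    "$CLEARML_DATA_BIN" sync --folder "$folder" --parents "$parent"
  else
    "$CLEARML_DATA_BIN" sync --folder "$folder"
  fi
}
```

A call such as `sync_folder ./data <parent_id>` could be run each time the folder changes, since the command picks up added, modified, and removed files.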
### Listing Dataset Content ## list
List a dataset's contents.
```bash ```bash
clearml-data list [--id <dataset_id>] clearml-data list [--id <dataset_id>]
``` ```
@ -184,13 +196,16 @@ clearml-data list [--id <dataset_id>]
<br/> <br/>
### Deleting a Dataset ## delete
Delete an entire dataset from ClearML. This can also be used to delete a newly created dataset.
This does not work on datasets with children.
``` ```
clearml-data delete [--id <dataset_id_to_delete>] clearml-data delete [--id <dataset_id_to_delete>]
``` ```
Deletes an entire dataset from ClearML. This can also be used to delete a newly created dataset.
This does not work on datasets with children.
**Parameters** **Parameters**
@ -205,13 +220,15 @@ This does not work on datasets with children.
<br/> <br/>
### Searching for a Dataset ## search
```
clearml-data search [--name <name>] [--project <project_name>] [--tags <tag>]
```
Lists all datasets in the system that match the search request.
Datasets can be searched by project, name, ID, and tags. Search datasets in the system by project, name, ID, and/or tags.
Returns a list of all datasets in the system that match the search request, sorted by creation time.
```bash
clearml-data search [--name <name>] [--ids [IDS [IDS ...]]] [--project <project_name>] [--tags <tag>]
```
**Parameters** **Parameters**
@ -228,17 +245,16 @@ Datasets can be searched by project, name, ID, and tags.
<br/> <br/>
### Comparing Two Datasets ## compare
```
Compare two datasets (target vs. source). The command returns a comparison summary that looks like this:
`Comparison summary: 4 files removed, 3 files modified, 0 files added`
```bash
clearml-data compare [--source SOURCE] [--target TARGET] clearml-data compare [--source SOURCE] [--target TARGET]
``` ```
Compare two datasets (target vs. source). The command returns a comparison summary that looks like this:
```
Comparison summary: 4 files removed, 3 files modified, 0 files added
```
**Parameters** **Parameters**
@ -246,20 +262,20 @@ Comparison summary: 4 files removed, 3 files modified, 0 files added
|Name|Description|Optional| |Name|Description|Optional|
|---|---|---| |---|---|---|
|`--source`|Source dataset id (used as baseline)|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />| |`--source`|Source dataset ID (used as baseline)|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--target`|Target dataset id (compare against the source baseline dataset)|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />| |`--target`|Target dataset ID (compare against the source baseline dataset)|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--verbose`|Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose`|Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
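To make the roles of `--source` and `--target` concrete, here is a small sketch that compares a target dataset against a baseline and requests the per-file report with `--verbose`. The function name `compare_versions` and the `CLEARML_DATA_BIN` override are hypothetical and exist only for this illustration.

```bash
# Hypothetical override for dry runs; defaults to the real CLI.
CLEARML_DATA_BIN="${CLEARML_DATA_BIN:-clearml-data}"

# compare_versions SOURCE_ID TARGET_ID
# SOURCE_ID is the baseline; TARGET_ID is compared against it.
compare_versions() {
  source_id="$1"
  target_id="$2"

  # --verbose reports every changed file instead of only the summary line
  "$CLEARML_DATA_BIN" compare --source "$source_id" --target "$target_id" --verbose
}
```

Dropping `--verbose` from the call would leave only the one-line comparison summary.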
### Merging Datasets ## squash
``` Squash multiple datasets into a single dataset version (merge down).
```bash
clearml-data squash --name NAME --ids [IDS [IDS ...]] clearml-data squash --name NAME --ids [IDS [IDS ...]]
``` ```
Squash (merge) multiple datasets into a single dataset version.
**Parameters** **Parameters**
<div className="tbl-cmd"> <div className="tbl-cmd">
@ -273,14 +289,14 @@ Squash (merge) multiple datasets into a single dataset version.
</div> </div>
### Verifying a Dataset ## verify
```
clearml-data verify [--id ID] [--folder FOLDER]
```
Verify that the dataset content matches the data from the local source. Verify that the dataset content matches the data from the local source.
```bash
clearml-data verify [--id ID] [--folder FOLDER]
```
**Parameters** **Parameters**
<div className="tbl-cmd"> <div className="tbl-cmd">
@ -289,20 +305,20 @@ Verify that the dataset content matches the data from the local source.
|---|---|---| |---|---|---|
|`--id`|Specify dataset ID. Default: previously created/accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--id`|Specify dataset ID. Default: previously created/accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--folder`|Specify dataset local copy (if not provided the local cache folder will be verified)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--folder`|Specify dataset local copy (if not provided the local cache folder will be verified)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--filesize`| If True, only verify file size and skip hash checks (default: false)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--filesize`| If `True`, only verify file size and skip hash checks (default: `False`)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--verbose`|Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose`|Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
### Getting a Dataset ## get
```
clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
```
Get a local copy of a dataset. By default, you get a read-only cached folder, but you can get a mutable copy by using the Get a local copy of a dataset. By default, you get a read-only cached folder, but you can get a mutable copy by using the
`--copy` flag. `--copy` flag.
```bash
clearml-data get [--id ID] [--copy COPY] [--link LINK] [--overwrite]
```
**Parameters** **Parameters**
<div className="tbl-cmd"> <div className="tbl-cmd">
@ -312,26 +328,25 @@ Get a local copy of a dataset. By default, you get a read only cached folder, bu
|`--id`| Specify dataset ID. Default: previously created / accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--id`| Specify dataset ID. Default: previously created / accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--copy`| Get a writable copy of the dataset to a specific output folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--copy`| Get a writable copy of the dataset to a specific output folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--link`| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--link`| Create a soft link (not supported on Windows) to a read-only cached folder containing the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--overwrite`| If True, overwrite the target folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--overwrite`| If `True`, overwrite the target folder|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--verbose`| Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />| |`--verbose`| Verbose report all file changes (instead of summary)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div> </div>
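The `get` and `verify` commands can be combined to fetch a writable copy and then check it against the dataset's recorded content. The following is a sketch under stated assumptions: `fetch_and_verify` and `CLEARML_DATA_BIN` are hypothetical names, and it assumes that the folder written by `--copy` can be passed directly to `verify --folder`.

```bash
# Hypothetical override for dry runs; defaults to the real CLI.
CLEARML_DATA_BIN="${CLEARML_DATA_BIN:-clearml-data}"

# fetch_and_verify DATASET_ID TARGET_DIR
fetch_and_verify() {
  dataset_id="$1"
  target_dir="$2"

  # --copy writes a mutable copy into TARGET_DIR; --overwrite replaces what is there
  "$CLEARML_DATA_BIN" get --id "$dataset_id" --copy "$target_dir" --overwrite || return 1

  # Check that the local copy matches the dataset content
  "$CLEARML_DATA_BIN" verify --id "$dataset_id" --folder "$target_dir"
}
```

Adding `--filesize` to the `verify` call would skip hash checks and compare file sizes only, which may be preferable for very large datasets.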
### Publishing a Dataset ## publish
``` Publish the dataset for public use. The dataset must be [finalized](#close) before it is published.
```bash
clearml-data publish --id ID clearml-data publish --id ID
``` ```
Publish the dataset for public use. The dataset must be [finalized](#finalizing-a-dataset) before it is published.
**Parameters** **Parameters**
<div className="tbl-cmd"> <div className="tbl-cmd">
|Name|Description|Optional| |Name|Description|Optional|
|---|---|---| |---|---|---|
|`--id`| The dataset task id to be published.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />| |`--id`| The dataset task ID to be published.|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
</div> </div>