Small edits (#724)
@@ -9,7 +9,7 @@ See [Hyper-Datasets](../hyperdatasets/overview.md) for ClearML's advanced querya
 
 `clearml-data` is a data management CLI tool that comes as part of the `clearml` python package. Use `clearml-data` to
 create, modify, and manage your datasets. You can upload your dataset to any storage service of your choice (S3 / GS /
-Azure / Network Storage) by setting the dataset’s upload destination (see [`--storage`](#upload)). Once you have uploaded
+Azure / Network Storage) by setting the dataset's upload destination (see [`--storage`](#upload)). Once you have uploaded
 your dataset, you can access it from any machine.
 
 The following page provides a reference to `clearml-data`'s CLI commands.
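For context on this hunk: the `--storage` flag described here has an SDK counterpart in `Dataset.create`'s `output_uri` parameter. A minimal sketch, assuming placeholder project, dataset, and bucket names:

```python
from clearml import Dataset

# Create a dataset whose files will be uploaded to your own storage
# target instead of the default ClearML file server (the SDK analog
# of the CLI's --storage flag; the S3 path is a placeholder).
dataset = Dataset.create(
    dataset_project="Data",            # placeholder project name
    dataset_name="raw_images",         # placeholder dataset name
    output_uri="s3://my-bucket/datasets",  # or gs://, azure://, file://
)
```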
@@ -41,7 +41,7 @@ clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
 
 
 :::tip Dataset ID
-* For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset version’s info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
+* For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset version's info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
 For datasets created with earlier versions of `clearml`, or if using an earlier version of ClearML Server, find the ID in the task header of the [dataset task's info panel](../webapp/webapp_exp_track_visual.md).
 * clearml-data works in a stateful mode so once a new dataset is created, the following commands
 do not require the `--id` flag.
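For reference, the ID located via this tip can also be used from code. A one-liner sketch, with a placeholder ID:

```python
from clearml import Dataset

# Fetch an existing dataset by the ID shown in the UI (placeholder ID).
dataset = Dataset.get(dataset_id="e1c0a3b8c7d64a0f9b1a2c3d4e5f6a7b")
print(dataset.name, dataset.project)
```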
@@ -66,7 +66,7 @@ clearml-data add [-h] [--id ID] [--dataset-folder DATASET_FOLDER]
 |Name|Description|Optional|
 |---|---|---|
 |`--id` | Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
-|`--files`| Files / folders to add. Items will be uploaded to the dataset’s designated storage. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+|`--files`| Files / folders to add. Items will be uploaded to the dataset's designated storage. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
 |`--wildcard`| Add specific set of files, denoted by these wildcards. For example: `~/data/*.jpg ~/data/json`. Multiple wildcards can be passed. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
 |`--links`| Files / folders link to add. Supports S3, GS, Azure links. Example: `s3://bucket/data` `azure://bucket/folder`. Items remain in their original location. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
 |`--dataset-folder` | Dataset base folder to add the files to in the dataset. Default: dataset root| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
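For context: `--files`, `--wildcard`, `--dataset-folder`, and `--links` map onto `Dataset.add_files` and `Dataset.add_external_files` in the SDK. A sketch with placeholder paths and ID:

```python
from clearml import Dataset

dataset = Dataset.get(dataset_id="<dataset_id>")  # placeholder ID

# --files / --wildcard / --dataset-folder equivalents
dataset.add_files(
    path="./data",          # local files / folders to add
    wildcard="*.jpg",       # only matching files
    dataset_path="images",  # base folder inside the dataset
)

# --links equivalent: register remote objects without copying them
dataset.add_external_files(source_url="s3://bucket/data")
```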
@@ -183,7 +183,7 @@ clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLD
 |`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--project`|If creating a new dataset, specify the dataset's project name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--name`|If creating a new dataset, specify the dataset's name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
-|`--version`|Specify the dataset’s version using the [semantic versioning](https://semver.org) scheme. Default: `1.0.0`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
+|`--version`|Specify the dataset's version using the [semantic versioning](https://semver.org) scheme. Default: `1.0.0`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--tags`|Dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--skip-close`|Do not auto close dataset after syncing folders|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--chunk-size`| Set dataset artifact upload chunk size in MB. Default 512, (pass -1 for a single chunk). Example: 512, dataset will be split and uploaded in 512 MB chunks. |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
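For context: the SDK equivalent of `clearml-data sync` is roughly `Dataset.create` (covering `--parents` and `--version`) followed by `Dataset.sync_folder` and `Dataset.upload`. A sketch, with placeholder names and a placeholder parent ID:

```python
from clearml import Dataset

# Create a new version that merges the listed parents, then sync a
# local folder into it (names, the parent ID, and paths are placeholders).
dataset = Dataset.create(
    dataset_project="Data",
    dataset_name="raw_images",
    parent_datasets=["<parent_dataset_id>"],
    dataset_version="1.1.0",
)
dataset.sync_folder(local_path="./data")
dataset.upload(chunk_size=512)  # --chunk-size equivalent, in MB
dataset.finalize()
```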
@@ -233,7 +233,7 @@ clearml-data set-description [-h] [--id ID] [--description DESCRIPTION]
 
 |Name|Description|Optional|
 |---|---|---|
-|`--id`|Dataset’s ID|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
+|`--id`|Dataset's ID|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
 |`--description`|Description to be set|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
 
 
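For context: the SDK counterpart of this command is `Dataset.set_description`. A short sketch, with a placeholder ID and description:

```python
from clearml import Dataset

dataset = Dataset.get(dataset_id="<dataset_id>")  # placeholder ID
dataset.set_description("Raw image data, resized to 256x256")
```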
@@ -51,7 +51,7 @@ dataset = Dataset.create(
 ```
 
 :::tip Locating Dataset ID
-For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset version’s info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
+For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset version's info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
 For datasets created with earlier versions of `clearml`, or if using an earlier version of ClearML Server, find the ID in the task header of the [dataset task's info panel](../webapp/webapp_exp_track_visual.md).
 :::
 
@@ -64,7 +64,7 @@ and auto-increments the version number.
 Use the `output_uri` parameter to specify a network storage target to upload the dataset files, and associated information
 (such as previews) to (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, `file:///mnt/share/data`).
 By default, the dataset uploads to ClearML's file server. The `output_uri` parameter of the [`Dataset.upload`](#uploading-files)
-method overrides this parameter’s value.
+method overrides this parameter's value.
 
 The created dataset inherits the content of the `parent_datasets`. When multiple dataset parents are listed,
 they are merged in order of specification. Each parent overrides any overlapping files from a previous parent dataset.
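A sketch of the merge rule described in this hunk, assuming placeholder IDs and a placeholder bucket; because `parent_b` is listed last, its files win wherever the two parents overlap:

```python
from clearml import Dataset

# Parents are merged in order of specification: files from parent_b
# override same-path files from parent_a (IDs are placeholders).
dataset = Dataset.create(
    dataset_project="Data",
    dataset_name="merged",
    parent_datasets=["<parent_a_id>", "<parent_b_id>"],
    output_uri="gs://my-bucket/datasets",  # placeholder storage target
)
```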
@@ -99,7 +99,7 @@ In addition, the target storage location for the squashed dataset can be specifi
 Once a dataset has been created and uploaded to a server, the dataset can be accessed programmatically from anywhere.
 
 Use the [`Dataset.get`](../references/sdk/dataset.md#datasetget) class method to access a specific Dataset object, by
-providing any of the dataset’s following attributes: dataset ID, project, name, tags, and or version. If multiple
+providing any of the dataset's following attributes: dataset ID, project, name, tags, and or version. If multiple
 datasets match the query, the most recent one is returned.
 
 ```python
@@ -117,10 +117,10 @@ dataset = Dataset.get(
 Pass `auto_create=True`, and a dataset will be created on-the-fly with the input attributes (project name, dataset name,
 and tags) if no datasets match the query.
 
-In cases where you use a dataset in a task (e.g. consuming a dataset), you can have its ID stored in the task’s
-hyperparameters: pass `alias=<dataset_alias_string>`, and the task using the dataset will store the dataset’s ID in the
+In cases where you use a dataset in a task (e.g. consuming a dataset), you can have its ID stored in the task's
+hyperparameters: pass `alias=<dataset_alias_string>`, and the task using the dataset will store the dataset's ID in the
 `dataset_alias_string` parameter under the `Datasets` hyperparameters section. This way you can easily track which
-dataset the task is using. If you use `alias` with `overridable=True`, you can override the dataset ID from the UI’s
+dataset the task is using. If you use `alias` with `overridable=True`, you can override the dataset ID from the UI's
 **CONFIGURATION > HYPERPARAMETERS >** `Datasets` section, allowing you to change the dataset used when running a task
 remotely.
 
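A sketch of the alias mechanism this hunk edits, assuming placeholder project and dataset names:

```python
from clearml import Dataset

# The consuming task records this dataset's ID under
# CONFIGURATION > HYPERPARAMETERS > Datasets > "train_data";
# overridable=True lets you swap the ID from the UI for remote runs.
dataset = Dataset.get(
    dataset_project="Data",      # placeholder project name
    dataset_name="raw_images",   # placeholder dataset name
    alias="train_data",
    overridable=True,
    auto_create=True,  # create on the fly if nothing matches the query
)
```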
@@ -135,8 +135,8 @@ of an entire dataset. This method downloads the dataset to a specific folder (no
 the specified folder already has contents, specify whether to overwrite its contents with the dataset contents, using the `overwrite` parameter.
 
 ClearML supports parallel downloading of datasets. Use the `max_workers` parameter of the `Dataset.get_local_copy` or
-`Dataset.get_mutable_copy` methods to specify the number of threads to use when downloading the dataset. By default, it’s
-the number of your machine’s logical cores.
+`Dataset.get_mutable_copy` methods to specify the number of threads to use when downloading the dataset. By default, it's
+the number of your machine's logical cores.
 
 ## Modifying Datasets
 
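A sketch of both download paths with `max_workers`; the ID and target folder are placeholders, and note that current `clearml` releases expose the mutable-copy method as `Dataset.get_mutable_local_copy`:

```python
from clearml import Dataset

dataset = Dataset.get(dataset_id="<dataset_id>")  # placeholder ID

# Cached, read-only copy, downloaded with 8 threads
local_path = dataset.get_local_copy(max_workers=8)

# Writable copy in a folder of your choice
mutable_path = dataset.get_mutable_local_copy(
    target_folder="./working_copy",  # placeholder folder
    overwrite=True,
    max_workers=8,
)
```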
@@ -225,7 +225,7 @@ By default, the dataset uploads to ClearML's file server. This target storage ov
 [`Dataset.create`](#creating-datasets) method.
 
 ClearML supports parallel uploading of datasets. Use the `max_workers` parameter to specify the number of threads to use
-when uploading the dataset. By default, it’s the number of your machine’s logical cores.
+when uploading the dataset. By default, it's the number of your machine's logical cores.
 
 Dataset files must be uploaded before a dataset is [finalized](#finalizing-a-dataset).
 
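A sketch of upload plus finalize with an explicit storage target and thread count; the ID and bucket are placeholders:

```python
from clearml import Dataset

dataset = Dataset.get(dataset_id="<dataset_id>")  # placeholder ID

# Upload with 4 threads to an explicit target (overriding the
# output_uri given at creation time), then lock the version.
dataset.upload(output_url="s3://my-bucket/datasets", max_workers=4)
dataset.finalize()
```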
@@ -317,9 +317,9 @@ You can enable offline mode in one of the following ways:
 
 * Before creating a dataset, set `CLEARML_OFFLINE_MODE=1`
 
-All the dataset’s information is zipped and is saved locally.
+All the dataset's information is zipped and is saved locally.
 
-The dataset task's console output displays the task’s ID and a path to the local dataset folder:
+The dataset task's console output displays the task's ID and a path to the local dataset folder:
 
 ```
 ClearML Task: created new task id=offline-372657bb04444c25a31bc6af86552cc9
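A sketch of the offline round trip this hunk touches, assuming `Task.set_offline` as the programmatic alternative to the environment variable and a placeholder session zip path:

```python
from clearml import Dataset, Task

Task.set_offline(True)  # same effect as CLEARML_OFFLINE_MODE=1

# Build the dataset locally; everything is zipped and saved on disk.
dataset = Dataset.create(dataset_project="Data", dataset_name="offline_ds")
dataset.add_files(path="./data")
dataset.upload()
dataset.finalize()

# Later, on a connected machine, import the zipped session
# (the path below is a placeholder printed by the offline run).
Task.set_offline(False)
Dataset.import_offline_session(
    session_folder_zip="<path/to/offline-session.zip>"
)
```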
@@ -84,7 +84,7 @@ Now that a new dataset is registered, you can consume it!
 The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) script
 demonstrates data ingestion using the dataset created in the first script.
 
-The following script gets the dataset and uses [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
+The following script gets the dataset and uses [`Dataset.get_local_copy()`](../../references/sdk/dataset.md#get_local_copy)
 to return a path to the cached, read-only local dataset.
 
 ```python