Small edits (#724)

pollfly
2023-12-03 14:27:46 +02:00
committed by GitHub
parent 4b02af91f7
commit 680bca6644
44 changed files with 131 additions and 131 deletions


@@ -9,7 +9,7 @@ See [Hyper-Datasets](../hyperdatasets/overview.md) for ClearML's advanced querya
`clearml-data` is a data management CLI tool that comes as part of the `clearml` python package. Use `clearml-data` to
create, modify, and manage your datasets. You can upload your dataset to any storage service of your choice (S3 / GS /
-Azure / Network Storage) by setting the datasets upload destination (see [`--storage`](#upload)). Once you have uploaded
+Azure / Network Storage) by setting the dataset's upload destination (see [`--storage`](#upload)). Once you have uploaded
your dataset, you can access it from any machine.
The following page provides a reference to `clearml-data`'s CLI commands.
@@ -41,7 +41,7 @@ clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
:::tip Dataset ID
-* For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset versions info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
+* For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset version's info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
For datasets created with earlier versions of `clearml`, or if using an earlier version of ClearML Server, find the ID in the task header of the [dataset task's info panel](../webapp/webapp_exp_track_visual.md).
* clearml-data works in a stateful mode so once a new dataset is created, the following commands
do not require the `--id` flag.
@@ -66,7 +66,7 @@ clearml-data add [-h] [--id ID] [--dataset-folder DATASET_FOLDER]
|Name|Description|Optional|
|---|---|---|
|`--id` | Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
-|`--files`| Files / folders to add. Items will be uploaded to the datasets designated storage. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+|`--files`| Files / folders to add. Items will be uploaded to the dataset's designated storage. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--wildcard`| Add specific set of files, denoted by these wildcards. For example: `~/data/*.jpg ~/data/json`. Multiple wildcards can be passed. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--links`| Files / folders link to add. Supports S3, GS, Azure links. Example: `s3://bucket/data` `azure://bucket/folder`. Items remain in their original location. | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--dataset-folder` | Dataset base folder to add the files to in the dataset. Default: dataset root| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
@@ -183,7 +183,7 @@ clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLD
|`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--project`|If creating a new dataset, specify the dataset's project name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--name`|If creating a new dataset, specify the dataset's name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
-|`--version`|Specify the datasets version using the [semantic versioning](https://semver.org) scheme. Default: `1.0.0`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
+|`--version`|Specify the dataset's version using the [semantic versioning](https://semver.org) scheme. Default: `1.0.0`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--tags`|Dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--skip-close`|Do not auto close dataset after syncing folders|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--chunk-size`| Set dataset artifact upload chunk size in MB. Default: 512 (pass -1 for a single chunk). For example, with 512, the dataset is split and uploaded in 512 MB chunks. |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
@@ -233,7 +233,7 @@ clearml-data set-description [-h] [--id ID] [--description DESCRIPTION]
|Name|Description|Optional|
|---|---|---|
-|`--id`|Datasets ID|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
+|`--id`|Dataset's ID|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|`--description`|Description to be set|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|


@@ -51,7 +51,7 @@ dataset = Dataset.create(
```
:::tip Locating Dataset ID
-For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset versions info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
+For datasets created with `clearml` v1.6 or newer on ClearML Server v1.6 or newer, find the ID in the dataset version's info panel in the [Dataset UI](../webapp/datasets/webapp_dataset_viewing.md).
For datasets created with earlier versions of `clearml`, or if using an earlier version of ClearML Server, find the ID in the task header of the [dataset task's info panel](../webapp/webapp_exp_track_visual.md).
:::
@@ -64,7 +64,7 @@ and auto-increments the version number.
Use the `output_uri` parameter to specify a network storage target to upload the dataset files, and associated information
(such as previews) to (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, `file:///mnt/share/data`).
By default, the dataset uploads to ClearML's file server. The `output_uri` parameter of the [`Dataset.upload`](#uploading-files)
-method overrides this parameters value.
+method overrides this parameter's value.
The created dataset inherits the content of the `parent_datasets`. When multiple dataset parents are listed,
they are merged in order of specification. Each parent overrides any overlapping files from a previous parent dataset.
@@ -99,7 +99,7 @@ In addition, the target storage location for the squashed dataset can be specifi
Once a dataset has been created and uploaded to a server, the dataset can be accessed programmatically from anywhere.
Use the [`Dataset.get`](../references/sdk/dataset.md#datasetget) class method to access a specific Dataset object, by
-providing any of the datasets following attributes: dataset ID, project, name, tags, and or version. If multiple
+providing any of the dataset's following attributes: dataset ID, project, name, tags, and/or version. If multiple
datasets match the query, the most recent one is returned.
```python
@@ -117,10 +117,10 @@ dataset = Dataset.get(
Pass `auto_create=True`, and a dataset will be created on-the-fly with the input attributes (project name, dataset name,
and tags) if no datasets match the query.
-In cases where you use a dataset in a task (e.g. consuming a dataset), you can have its ID stored in the tasks
-hyperparameters: pass `alias=<dataset_alias_string>`, and the task using the dataset will store the datasets ID in the
+In cases where you use a dataset in a task (e.g. consuming a dataset), you can have its ID stored in the task's
+hyperparameters: pass `alias=<dataset_alias_string>`, and the task using the dataset will store the dataset's ID in the
`dataset_alias_string` parameter under the `Datasets` hyperparameters section. This way you can easily track which
-dataset the task is using. If you use `alias` with `overridable=True`, you can override the dataset ID from the UIs
+dataset the task is using. If you use `alias` with `overridable=True`, you can override the dataset ID from the UI's
**CONFIGURATION > HYPERPARAMETERS >** `Datasets` section, allowing you to change the dataset used when running a task
remotely.
@@ -135,8 +135,8 @@ of an entire dataset. This method downloads the dataset to a specific folder (no
the specified folder already has contents, specify whether to overwrite its contents with the dataset contents, using the `overwrite` parameter.
ClearML supports parallel downloading of datasets. Use the `max_workers` parameter of the `Dataset.get_local_copy` or
-`Dataset.get_mutable_copy` methods to specify the number of threads to use when downloading the dataset. By default, its
-the number of your machines logical cores.
+`Dataset.get_mutable_copy` methods to specify the number of threads to use when downloading the dataset. By default, it's
+the number of your machine's logical cores.
## Modifying Datasets
@@ -225,7 +225,7 @@ By default, the dataset uploads to ClearML's file server. This target storage ov
[`Dataset.create`](#creating-datasets) method.
ClearML supports parallel uploading of datasets. Use the `max_workers` parameter to specify the number of threads to use
-when uploading the dataset. By default, its the number of your machines logical cores.
+when uploading the dataset. By default, it's the number of your machine's logical cores.
Dataset files must be uploaded before a dataset is [finalized](#finalizing-a-dataset).
@@ -317,9 +317,9 @@ You can enable offline mode in one of the following ways:
* Before creating a dataset, set `CLEARML_OFFLINE_MODE=1`
-All the datasets information is zipped and is saved locally.
+All the dataset's information is zipped and is saved locally.
-The dataset task's console output displays the tasks ID and a path to the local dataset folder:
+The dataset task's console output displays the task's ID and a path to the local dataset folder:
```
ClearML Task: created new task id=offline-372657bb04444c25a31bc6af86552cc9
```


@@ -84,7 +84,7 @@ Now that a new dataset is registered, you can consume it!
The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) script
demonstrates data ingestion using the dataset created in the first script.
-The following script gets the dataset and uses [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
+The following script gets the dataset and uses [`Dataset.get_local_copy()`](../../references/sdk/dataset.md#get_local_copy)
to return a path to the cached, read-only local dataset.
```python