mirror of
https://github.com/clearml/clearml-docs
synced 2025-04-16 14:02:49 +00:00
Update Clearml Data (#277)
parent
48b70440a8
commit
110e7b5fe7
@ -17,7 +17,8 @@ Creates a new dataset.
|
|||||||
|
|
||||||
```bash
|
```bash
|
||||||
clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
|
clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
|
||||||
--name NAME [--tags [TAGS [TAGS ...]]]
|
--name NAME [--version VERSION] [--output-uri OUTPUT_URI]
|
||||||
|
[--tags [TAGS [TAGS ...]]]
|
||||||
```
|
```
|
||||||
|
|
||||||
**Parameters**
|
**Parameters**
|
||||||
@ -28,7 +29,9 @@ clearml-data create [-h] [--parents [PARENTS [PARENTS ...]]] [--project PROJECT]
|
|||||||
|---|---|---|
|
|---|---|---|
|
||||||
|`--name` |Dataset's name| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|
|`--name` |Dataset's name| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|
||||||
|`--project`|Dataset's project| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|
|`--project`|Dataset's project| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|
||||||
|
|`--version` |Dataset version. If not specified, a version is assigned automatically | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|
||||||
|`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|
|`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|
||||||
|
|`--output-uri`| Sets where the dataset and its previews are uploaded| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--tags` |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--tags` |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
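The new `--version` and `--output-uri` flags can be combined with the existing ones when scripting dataset creation. The following is a minimal sketch, not part of the commit: `build_create_command` is a hypothetical helper that only assembles the argument list from the flags listed in the table above, and the project, name, and bucket values are placeholders.

```python
import shlex


def build_create_command(project, name, version=None, output_uri=None,
                         parents=(), tags=()):
    """Assemble the argv for `clearml-data create` (hypothetical helper)."""
    argv = ["clearml-data", "create", "--project", project, "--name", name]
    if version is not None:
        argv += ["--version", version]
    if output_uri is not None:
        argv += ["--output-uri", output_uri]
    if parents:
        # Parents are merged in the order they are listed.
        argv += ["--parents", *parents]
    if tags:
        argv += ["--tags", *tags]
    return argv


command = build_create_command(
    project="dataset project",
    name="dataset name",
    version="1.0",
    output_uri="gs://bucket-name/folder",
)
# shlex.join quotes values containing spaces so the line can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `clearml-data` is installed and configured.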
@ -160,8 +163,8 @@ This command also uploads the data and finalizes the dataset automatically.
|
|||||||
```bash
|
```bash
|
||||||
clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLDER
|
clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLDER
|
||||||
[--parents [PARENTS [PARENTS ...]]] [--project PROJECT] [--name NAME]
|
[--parents [PARENTS [PARENTS ...]]] [--project PROJECT] [--name NAME]
|
||||||
[--tags [TAGS [TAGS ...]]] [--storage STORAGE] [--skip-close]
|
[--version VERSION] [--output-uri OUTPUT_URI] [--tags [TAGS [TAGS ...]]]
|
||||||
[--chunk-size CHUNK_SIZE] [--verbose]
|
[--storage STORAGE] [--skip-close] [--chunk-size CHUNK_SIZE] [--verbose]
|
||||||
```
|
```
|
||||||
|
|
||||||
**Parameters**
|
**Parameters**
|
||||||
@ -173,10 +176,11 @@ clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLD
|
|||||||
|`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|
|`--id`| Dataset's ID. Default: previously created / accessed dataset| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|
||||||
|`--dataset-folder`|Dataset base folder to add the files to (default: Dataset root)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--dataset-folder`|Dataset base folder to add the files to (default: Dataset root)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--folder`|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
|`--folder`|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|`--storage`|Remote storage to use for the dataset files. Default: files_server |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--storage`|Remote storage to use for the dataset files. Default: files server |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--parents`|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--project`|If creating a new dataset, specify the dataset's project name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--project`|If creating a new dataset, specify the dataset's project name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--name`|If creating a new dataset, specify the dataset's name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--name`|If creating a new dataset, specify the dataset's name|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--version`|Specify the dataset’s version. Default: `1.0.0`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--tags`|Dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--tags`|Dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--skip-close`|Do not auto close dataset after syncing folders|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--skip-close`|Do not auto close dataset after syncing folders|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--chunk-size`| Set dataset artifact upload chunk size in MB. Default 512, (pass -1 for a single chunk). Example: 512, dataset will be split and uploaded in 512 MB chunks. |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--chunk-size`| Set dataset artifact upload chunk size in MB. Default 512, (pass -1 for a single chunk). Example: 512, dataset will be split and uploaded in 512 MB chunks. |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
@ -191,7 +195,7 @@ clearml-data sync [-h] [--id ID] [--dataset-folder DATASET_FOLDER] --folder FOLD
|
|||||||
List a dataset's contents.
|
List a dataset's contents.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
clearml-data list [-h] [--id ID] [--project PROJECT] [--name NAME]
|
clearml-data list [-h] [--id ID] [--project PROJECT] [--name NAME] [--version VERSION]
|
||||||
[--filter [FILTER [FILTER ...]]] [--modified]
|
[--filter [FILTER [FILTER ...]]] [--modified]
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -204,6 +208,7 @@ clearml-data list [-h] [--id ID] [--project PROJECT] [--name NAME]
|
|||||||
|`--id`|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--id`|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--project`|Specify dataset project name (if used instead of ID, dataset name is also required)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--project`|Specify dataset project name (if used instead of ID, dataset name is also required)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--name`|Specify dataset name (if used instead of ID, dataset project is also required)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--name`|Specify dataset name (if used instead of ID, dataset project is also required)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--version`|Specify dataset version. Default: most recent version |<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--filter`|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--filter`|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--modified`|Only list file changes (add / remove / modify) introduced in this version|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--modified`|Only list file changes (add / remove / modify) introduced in this version|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|
||||||
@ -211,25 +216,103 @@ clearml-data list [-h] [--id ID] [--project PROJECT] [--name NAME]
|
|||||||
|
|
||||||
<br/>
|
<br/>
|
||||||
|
|
||||||
## delete
|
## set-description
|
||||||
|
|
||||||
Delete an entire dataset from ClearML. This can also be used to delete a newly created dataset.
|
Sets the description of an existing dataset.
|
||||||
|
|
||||||
This does not work on datasets with children.
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
clearml-data delete [-h] [--id ID] [--force]
|
clearml-data set-description [-h] [--id ID] [--description DESCRIPTION]
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
**Parameters**
|
**Parameters**
|
||||||
|
|
||||||
<div className="tbl-cmd">
|
<div className="tbl-cmd">
|
||||||
|
|
||||||
|Name|Description|Optional|
|
|Name|Description|Optional|
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
|`--id`|ID of dataset to be deleted. Default: previously created / accessed dataset that hasn't been finalized yet|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
|`--id`|Dataset’s ID|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|`--force`|Force dataset deletion even if other dataset versions depend on it|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />||
|
|`--description`|Description to be set|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<br/>
|
||||||
|
|
||||||
|
|
||||||
|
## delete
|
||||||
|
|
||||||
|
Deletes dataset(s). Pass any of the attributes of the dataset(s) you want to delete. Multiple datasets matching the
|
||||||
|
request will raise an exception, unless you pass `--entire-dataset` and `--force`. In this case, all matching datasets
|
||||||
|
will be deleted.
|
||||||
|
|
||||||
|
If a dataset is a parent of other datasets, you must pass `--force` in order to delete it.
|
||||||
|
|
||||||
|
:::warning
|
||||||
|
Deleting a parent dataset may cause child datasets to lose data!
|
||||||
|
:::
|
||||||
|
|
||||||
|
```bash
|
||||||
|
clearml-data delete [-h] [--id ID] [--project PROJECT] [--name NAME]
|
||||||
|
[--version VERSION] [--force] [--entire-dataset]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Parameters**
|
||||||
|
|
||||||
|
<div className="tbl-cmd">
|
||||||
|
|
||||||
|
|Name|Description|Optional|
|
||||||
|
|---|---|---|
|
||||||
|
|`--id`|ID of the dataset to delete (alternatively, use project / name combination).|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--project`|Specify dataset project name (if used instead of ID, dataset name is also required)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--name`|Specify dataset name (if used instead of ID, dataset project is also required)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--version`|Specify dataset version|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--force`|Force dataset deletion even if other dataset versions depend on it. Must also be used if `--entire-dataset` flag is used|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|`--entire-dataset`|Delete all found datasets|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<br/>
|
||||||
|
|
||||||
|
## rename
|
||||||
|
|
||||||
|
Rename a dataset (and all of its versions).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
clearml-data rename [-h] --new-name NEW_NAME --project PROJECT --name NAME
|
||||||
|
```
|
||||||
|
|
||||||
|
**Parameters**
|
||||||
|
|
||||||
|
<div className="tbl-cmd">
|
||||||
|
|
||||||
|
|Name|Description|Optional|
|
||||||
|
|---|---|---|
|
||||||
|
|`--new-name`|The new name of the dataset|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|`--project`|The project the dataset to be renamed belongs to|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|`--name`|The current name of the dataset(s) to be renamed|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<br/>
|
||||||
|
|
||||||
|
|
||||||
|
## move
|
||||||
|
|
||||||
|
Moves a dataset to another project.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
clearml-data move [-h] --new-project NEW_PROJECT --project PROJECT --name NAME
|
||||||
|
```
|
||||||
|
|
||||||
|
**Parameters**
|
||||||
|
|
||||||
|
<div className="tbl-cmd">
|
||||||
|
|
||||||
|
|Name|Description|Optional|
|
||||||
|
|---|---|---|
|
||||||
|
|`--new-project`|The new project of the dataset|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|`--project`|The current project the dataset to be moved belongs to|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|`--name`|The name of the dataset to be moved|<img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
@ -252,10 +335,10 @@ clearml-data search [-h] [--ids [IDS [IDS ...]]] [--project PROJECT]
|
|||||||
|
|
||||||
|Name|Description|Optional|
|
|Name|Description|Optional|
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
|`--ids`|A list of dataset IDs|<img src="/docs/latest/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|
|`--ids`|A list of dataset IDs|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--project`|The project name of the datasets|<img src="/docs/latest/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|
|`--project`|The project name of the datasets|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--name`|A dataset name or a partial name to filter datasets by|<img src="/docs/latest/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|
|`--name`|A dataset name or a partial name to filter datasets by|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|`--tags`|A list of dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|
|`--tags`|A list of dataset user tags|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
@ -36,7 +36,9 @@ preprocessing code and the resulting dataset are saved in a single task (see `us
|
|||||||
dataset = Dataset.create(
|
dataset = Dataset.create(
|
||||||
dataset_name='dataset name',
|
dataset_name='dataset name',
|
||||||
dataset_project='dataset project',
|
dataset_project='dataset project',
|
||||||
parent_datasets=[PARENT_DS_ID_1, PARENT_DS_ID_2]
|
parent_datasets=[PARENT_DS_ID_1, PARENT_DS_ID_2],
|
||||||
|
dataset_version="1.0",
|
||||||
|
output_uri="gs://bucket-name/folder"
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -45,6 +47,11 @@ To locate a dataset's ID, go to the dataset task's info panel in the [WebApp](..
|
|||||||
to the right of the dataset task name, click `ID` and the dataset ID appears
|
to the right of the dataset task name, click `ID` and the dataset ID appears
|
||||||
:::
|
:::
|
||||||
|
|
||||||
|
Use the `output_uri` parameter to specify the network storage target for the dataset files and associated information
|
||||||
|
(such as previews), for example `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, or `file:///mnt/share/data`.
|
||||||
|
By default, the dataset uploads to ClearML's file server. The `output_url` parameter of [`Dataset.upload`](#uploading-files)
|
||||||
|
and the storage parameter of [`Dataset.sync_folder`](../references/sdk/dataset.md#sync_folder) override this parameter’s value.
|
||||||
|
|
||||||
The created dataset inherits the content of the `parent_datasets`. When multiple dataset parents are listed,
|
The created dataset inherits the content of the `parent_datasets`. When multiple dataset parents are listed,
|
||||||
they are merged in order of specification. Each parent overrides any overlapping files from a previous parent dataset.
|
they are merged in order of specification. Each parent overrides any overlapping files from a previous parent dataset.
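The merge order described here can be pictured with a toy model. This sketch does not use ClearML and is not how the library stores datasets internally: each parent is represented as a plain dictionary mapping a file path to a made-up content hash, and `merge_parent_listings` is a hypothetical name.

```python
def merge_parent_listings(parent_listings):
    """Merge parent file listings in order; later parents win on overlaps."""
    merged = {}
    for listing in parent_listings:
        # dict.update overwrites entries for paths already seen in an earlier parent.
        merged.update(listing)
    return merged


# Two illustrative parents that both contain data/b.csv.
parent_1 = {"data/a.csv": "hash-a1", "data/b.csv": "hash-b1"}
parent_2 = {"data/b.csv": "hash-b2", "data/c.csv": "hash-c2"}

merged = merge_parent_listings([parent_1, parent_2])
print(sorted(merged.items()))
```

In this model, `data/b.csv` ends up with the entry from `parent_2` because it is listed last, which mirrors the documented behavior for `parent_datasets=[PARENT_DS_ID_1, PARENT_DS_ID_2]`.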
|
||||||
|
|
||||||
@ -71,15 +78,40 @@ squashed_dataset_2 = Dataset.squash(
|
|||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
In addition, the target storage location for the squashed dataset can be specified using the `output_url` parameter of the
|
In addition, the target storage location for the squashed dataset can be specified using the `output_uri` parameter of the
|
||||||
[`Dataset.squash`](../references/sdk/dataset.md#datasetsquash) method.
|
[`Dataset.squash`](../references/sdk/dataset.md#datasetsquash) method.
|
||||||
|
|
||||||
## Accessing Datasets
|
## Accessing Datasets
|
||||||
Once a dataset has been created and uploaded to a server, the dataset can be accessed programmatically from anywhere.
|
Once a dataset has been created and uploaded to a server, the dataset can be accessed programmatically from anywhere.
|
||||||
|
|
||||||
Use the [`Dataset.get`](../references/sdk/dataset.md#datasetget) class method to access a specific Dataset object, either
|
Use the [`Dataset.get`](../references/sdk/dataset.md#datasetget) class method to access a specific Dataset object, by
|
||||||
with the dataset's ID or with its project and name. If only a project name or tag is provided, the method returns the
|
providing any of the dataset’s following attributes: dataset ID, project, name, tags, and/or version. If multiple
|
||||||
most recent dataset in the specified project, or the most recent dataset with the specified tag.
|
datasets match the query, the most recent one is returned.
|
||||||
|
|
||||||
|
```python
|
||||||
|
dataset = Dataset.get(
|
||||||
|
dataset_id=None,
|
||||||
|
dataset_project="Example Project",
|
||||||
|
dataset_name="Example Dataset",
|
||||||
|
dataset_tags="my tag",
|
||||||
|
dataset_version="1.2",
|
||||||
|
only_completed=True,
|
||||||
|
only_published=False,
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
Pass `auto_create=True`, and a dataset will be created on-the-fly with the input attributes (project name, dataset name,
|
||||||
|
and tags) if no datasets match the query.
|
||||||
|
|
||||||
|
In cases where you use a dataset in a task (e.g. consuming a dataset), you can have its ID stored in the task’s hyper
|
||||||
|
parameters: pass `alias=<dataset_alias_string>`, and the task using the dataset will store the dataset’s ID in the
|
||||||
|
`dataset_alias_string` parameter under the `Datasets` hyper parameters section. This way you can easily track which
|
||||||
|
dataset the task is using. If you use `alias` with `overridable=True`, you can override the dataset ID from the UI’s
|
||||||
|
**CONFIGURATION > HYPER PARAMETERS >** `Datasets` section, allowing you to change the dataset used when running a task
|
||||||
|
remotely.
|
||||||
|
|
||||||
|
To get a modifiable dataset, pass `writable_copy=True`. This returns a newly created mutable dataset with the current
|
||||||
|
one as its parent.
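The options described above (`auto_create`, `alias`, `overridable`, `writable_copy`) can be combined in one call. The sketch below is illustrative only: `build_get_kwargs` and `get_dataset` are hypothetical helpers, the alias and project values are placeholders, and the `clearml` import is deferred so the keyword assembly can be inspected without a configured ClearML server.

```python
def build_get_kwargs(project, name, version=None, alias=None,
                     overridable=False, auto_create=False, writable_copy=False):
    """Collect keyword arguments for Dataset.get, omitting unset options."""
    kwargs = {"dataset_project": project, "dataset_name": name}
    if version is not None:
        kwargs["dataset_version"] = version
    if alias is not None:
        # The alias is the name under which the task records the dataset ID.
        kwargs["alias"] = alias
        kwargs["overridable"] = overridable
    if auto_create:
        kwargs["auto_create"] = True
    if writable_copy:
        kwargs["writable_copy"] = True
    return kwargs


def get_dataset(**options):
    """Call Dataset.get with the assembled arguments (needs clearml installed)."""
    from clearml import Dataset  # deferred import: requires a configured ClearML setup
    return Dataset.get(**build_get_kwargs(**options))


example_kwargs = build_get_kwargs(
    project="Example Project",
    name="Example Dataset",
    version="1.2",
    alias="training_data",
    overridable=True,
)
print(example_kwargs)
```

Calling `get_dataset(project=..., name=..., writable_copy=True)` would, under these assumptions, return a new mutable dataset whose parent is the matched one.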
|
||||||
|
|
||||||
Once a specific dataset object has been obtained, get a local copy of the dataset using one of the following options:
|
Once a specific dataset object has been obtained, get a local copy of the dataset using one of the following options:
|
||||||
* [`Dataset.get_local_copy()`](../references/sdk/dataset.md#get_local_copy) - get a read-only local copy of an entire dataset.
|
* [`Dataset.get_local_copy()`](../references/sdk/dataset.md#get_local_copy) - get a read-only local copy of an entire dataset.
|
||||||
@ -168,7 +200,8 @@ dataset.remove_files(dataset_path="*.csv", recursive=True)
|
|||||||
To upload the dataset files to network storage, use the [`Dataset.upload`](../references/sdk/dataset.md#upload) method.
|
To upload the dataset files to network storage, use the [`Dataset.upload`](../references/sdk/dataset.md#upload) method.
|
||||||
|
|
||||||
Use the `output_url` parameter to specify storage target, such as S3 / GS / Azure (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data` , `/mnt/share/data`).
|
Use the `output_url` parameter to specify storage target, such as S3 / GS / Azure (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data` , `/mnt/share/data`).
|
||||||
By default, the dataset uploads to ClearML's file server.
|
By default, the dataset uploads to ClearML's file server. This target storage overrides the `output_uri` value of the
|
||||||
|
[`Dataset.create`](#creating-datasets) method.
|
||||||
|
|
||||||
ClearML supports parallel uploading of datasets. Use the `max_workers` parameter to specify the number of threads to use
|
ClearML supports parallel uploading of datasets. Use the `max_workers` parameter to specify the number of threads to use
|
||||||
when uploading the dataset. By default, it’s the number of your machine’s logical cores.
|
when uploading the dataset. By default, it’s the number of your machine’s logical cores.
|
||||||
@ -192,3 +225,53 @@ to a specific folder's content changes. Specify the folder to sync with the `loc
|
|||||||
This method is useful in the case where there's a single point of truth, either a local or network folder, that gets updated periodically.
|
This method is useful in the case where there's a single point of truth, either a local or network folder, that gets updated periodically.
|
||||||
The folder changes will be reflected in a new dataset version. This method saves time since you don't have to manually
|
The folder changes will be reflected in a new dataset version. This method saves time since you don't have to manually
|
||||||
update (add / remove) files in a dataset.
|
update (add / remove) files in a dataset.
|
||||||
|
|
||||||
|
## Deleting Datasets
|
||||||
|
Delete a dataset using the [`Dataset.delete`](../references/sdk/dataset.md#datasetdelete) class method. Input any of the
|
||||||
|
attributes of the dataset(s) you want to delete, including ID, project name, version, and/or dataset name. Multiple
|
||||||
|
datasets matching the query will raise an exception, unless you pass `entire_dataset=True` and `force=True`. In this
|
||||||
|
case, all matching datasets will be deleted.
|
||||||
|
|
||||||
|
If a dataset is a parent of other datasets, you must pass `force=True` in order to delete it.
|
||||||
|
|
||||||
|
:::warning
|
||||||
|
Deleting a parent dataset may cause child datasets to lose data!
|
||||||
|
:::
|
||||||
|
|
||||||
|
|
||||||
|
```python
|
||||||
|
Dataset.delete(
|
||||||
|
dataset_id=None,
|
||||||
|
dataset_project="example project",
|
||||||
|
dataset_name="example dataset",
|
||||||
|
force=False,
|
||||||
|
dataset_version="3.0",
|
||||||
|
entire_dataset=False
|
||||||
|
)
|
||||||
|
```
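The interaction between `force` and `entire_dataset` can be summarized as a small decision rule. The following is a toy model of the rules stated in this section, not ClearML's implementation: `select_datasets_to_delete`, the `children_of` mapping, and the use of `ValueError` are all assumptions made for illustration.

```python
def select_datasets_to_delete(matches, children_of=None,
                              force=False, entire_dataset=False):
    """Return the dataset IDs that may be deleted, or raise if the rules forbid it."""
    children_of = children_of or {}
    # Several matches are only accepted when both flags are set.
    if len(matches) > 1 and not (entire_dataset and force):
        raise ValueError(
            "multiple datasets match; pass entire_dataset=True and force=True"
        )
    # A dataset that is a parent of other datasets needs force=True.
    for dataset_id in matches:
        if children_of.get(dataset_id) and not force:
            raise ValueError(f"{dataset_id} has child datasets; pass force=True")
    return list(matches)


# A single childless match is deletable without extra flags.
print(select_datasets_to_delete(["ds-1"]))
# Several matches require both flags.
print(select_datasets_to_delete(["ds-1", "ds-2"], force=True, entire_dataset=True))
```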
|
||||||
|
|
||||||
|
## Renaming Datasets
|
||||||
|
Rename a dataset using the [`Dataset.rename`](../references/sdk/dataset.md#datasetrename) class method. All the datasets
|
||||||
|
with the given `dataset_project` and `dataset_name` will be renamed.
|
||||||
|
|
||||||
|
```python
|
||||||
|
Dataset.rename(
|
||||||
|
new_dataset_name="New name",
|
||||||
|
dataset_project="Example project",
|
||||||
|
dataset_name="Example dataset",
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Moving Datasets to Another Project
|
||||||
|
Move a dataset to another project using the [`Dataset.move_to_project`](../references/sdk/dataset.md#datasetmove_to_projetc)
|
||||||
|
class method. All the datasets with the given `dataset_project` and `dataset_name` will be moved to the new dataset
|
||||||
|
project.
|
||||||
|
|
||||||
|
```python
|
||||||
|
Dataset.move_to_project(
|
||||||
|
new_dataset_project="New project",
|
||||||
|
dataset_project="Example project",
|
||||||
|
dataset_name="Example dataset",
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|