Small changes (#113)

pollfly 2021-11-09 15:58:40 +02:00 committed by GitHub
parent 794be47dc6
commit 540ae4d476
10 changed files with 55 additions and 29 deletions


@@ -18,8 +18,8 @@ Creates a new dataset. <br/>
|Name|Description|Optional|
|---|---|---|
|`--name` |Dataset's name| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|`--project`|Dataset's project| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--tags` |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
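The `--parents` merge order matters: when two parents contain a file at the same path, the later-entered parent wins. A plain-Python sketch of that ordering (hypothetical file-to-hash maps, not the actual ClearML implementation):

```python
# Each parent dataset is modeled as a mapping of file path -> content hash.
# Simplified sketch of order-dependent merging; not ClearML's code.
def merge_parents(parents):
    merged = {}
    for parent in parents:  # merged in the order they were entered
        merged.update(parent)  # later parents override earlier ones
    return merged

parent_a = {"data.csv": "hash_a1", "readme.txt": "hash_a2"}
parent_b = {"data.csv": "hash_b1"}  # same path, different content

# Entering parent_a then parent_b: parent_b's version of data.csv wins.
print(merge_parents([parent_a, parent_b])["data.csv"])  # hash_b1
```

Reversing the parent order would keep `parent_a`'s version of `data.csv` instead.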


@@ -74,14 +74,14 @@ most recent dataset in the specified project, or the most recent dataset with th
Once a specific dataset object has been obtained, get a local copy of the dataset using one of the following options:
* [`Dataset.get_local_copy()`](../references/sdk/dataset.md#get_local_copy) - get a read-only local copy of an entire dataset.
This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache).
* [`Dataset.get_mutable_local_copy()`](../references/sdk/dataset.md#get_mutable_local_copy) - get a writable local copy
of an entire dataset. This method downloads the dataset to a specific folder (non-cached), specified with the `target_folder` parameter. If
the specified folder already has contents, specify whether to overwrite its contents with the dataset contents, using the `overwrite` parameter.
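The `overwrite` behavior can be sketched in plain Python (a simplified stand-in, not the SDK's implementation): if the target folder already has contents and `overwrite` is not set, the copy is refused rather than silently merged.

```python
import os
import shutil
import tempfile

# Simplified stand-in for target_folder/overwrite handling in a mutable
# dataset copy; not the actual ClearML SDK code.
def copy_dataset(source_folder, target_folder, overwrite=False):
    if os.path.isdir(target_folder):
        if os.listdir(target_folder) and not overwrite:
            raise ValueError(f"Target folder {target_folder!r} is not empty")
        shutil.rmtree(target_folder)  # clear the way for a fresh copy
    shutil.copytree(source_folder, target_folder)
    return target_folder

# Demo: copying into a non-empty folder fails unless overwrite=True.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
open(os.path.join(src, "data.csv"), "w").close()
open(os.path.join(dst, "stale.txt"), "w").close()
try:
    copy_dataset(src, dst)
except ValueError:
    copy_dataset(src, dst, overwrite=True)
print(sorted(os.listdir(dst)))  # ['data.csv']
```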
## Modifying Datasets

Once a dataset has been created, its contents can be modified and replaced. When your data is changed, you can
add updated files or remove unnecessary files.

### add_files()


@@ -83,7 +83,10 @@ dataset_project = "dataset_examples"
from clearml import Dataset
dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project
).get_local_copy()
trainset = datasets.CIFAR10(
    root=dataset_path,


@@ -8,10 +8,8 @@ This example shows how to use the `clearml-data` folder sync function.
from time to time. When the point of truth is updated, users can call `clearml-data sync` and the
changes (file addition, modification, or removal) will be reflected in ClearML.
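Conceptually, a sync compares the folder's current state against the last synced state and classifies every difference as an addition, modification, or removal. A hash-based sketch of that classification (illustrative only, not the `clearml-data` implementation):

```python
# Classify changes between two folder snapshots, each modeled as a mapping
# of file path -> content hash. Illustrative only; not clearml-data's code.
def diff_snapshots(previous, current):
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    modified = sorted(
        path for path in set(previous) & set(current)
        if previous[path] != current[path]
    )
    return added, modified, removed

before = {"a.csv": "h1", "b.jpg": "h2", "c.txt": "h3"}
after = {"a.csv": "h1", "b.jpg": "h2-new", "d.txt": "h4"}

print(diff_snapshots(before, after))
# (['d.txt'], ['b.jpg'], ['c.txt'])
```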
## Prerequisites
1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
the needed files.
1. Open a terminal and change directory to the cloned repository's examples folder.


@@ -24,7 +24,9 @@ We first need to obtain a local copy of the CIFAR dataset.
from clearml import StorageManager
manager = StorageManager()
dataset_path = manager.get_local_copy(
    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
)
```
This script downloads the data, and `dataset_path` contains the path to the downloaded data.
@@ -83,12 +85,19 @@ demonstrates data ingestion using the dataset created in the first script.
dataset_name = "cifar_dataset"
dataset_project = "dataset_examples"
dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project
).get_local_copy()
```
The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
method to return a path to the cached, read-only local dataset.

If you need a modifiable copy of the dataset, use the following code:
```python
Dataset.get(dataset_name, dataset_project).get_mutable_local_copy("path/to/download")
```
The script then creates a neural network to train a model to classify images from the dataset that was
created above.


@@ -5,7 +5,7 @@ title: Data Management from CLI
In this example we'll create a simple dataset and demonstrate basic actions on it, using the `clearml-data` CLI.

## Prerequisites
1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
the needed files.
1. Open a terminal and change directory to the cloned repository's examples folder.
@@ -89,13 +89,13 @@ clearml-data - Dataset Management & Versioning CLI
List dataset content: 24d05040f3e14fbfbed8edb1bf08a88c
Listing dataset content
file name | size | hash
-----------------------------------------------------------------------------------------------------------------
dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
Total 5 files, 248771 bytes
```
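As a quick sanity check, the reported byte total is just the sum of the listed file sizes:

```python
# File sizes (bytes) taken from the listing above.
sizes = [40_484, 21_440, 114_573, 132, 72_142]
print(f"Total {len(sizes)} files, {sum(sizes)} bytes")  # Total 5 files, 248771 bytes
```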


@@ -4,9 +4,9 @@ title: Workflows
Take a look at the ClearML Data examples which demonstrate common workflows using the `clearml-data` CLI and the
`Dataset` class:
* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming a dataset with the CLI.
* [Folder Sync with CLI](data_man_folder_sync.md) - Tutorial for using the `clearml-data sync` CLI option to update a dataset according
to a local folder.
* [Dataset Management with CLI and SDK](data_man_cifar_classification.md) - Tutorial for creating a dataset with the CLI,
then programmatically ingesting the data with the SDK.
* [Data Management with Python](data_man_python.md) - Example scripts for creating and consuming a dataset with the SDK.


@@ -30,8 +30,8 @@ Once we have a Task in ClearML, we can clone and edit its definitions in the UI,
## Manage Your Data
Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
Make datasets machine agnostic (i.e. store the original dataset in a shared storage location, e.g. a shared folder, S3, GS, or Azure).
ClearML Data supports efficient dataset storage and caching, with differential and compressed storage.

## Scale Your Work
Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent on machines (remote or local) and manage


@@ -121,7 +121,12 @@ Log as many metrics from your processes! It improves visibility on their progres
Use the Logger class to report scalars and plots.
```python
from clearml import Logger
Logger.current_logger().report_scalar(
    title='metric',
    series='variant',
    value=13.37,
    iteration=counter
)
```
You can later analyze reported scalars
@@ -139,7 +144,11 @@ You can also search and query Tasks in the system.
Use the `Task.get_tasks` call to retrieve Task objects and filter based on the specific values of the Task - status, parameters, metrics and more!
```python
from clearml import Task
tasks = Task.get_tasks(
    project_name='examples',
    task_name='partial_name_match',
    task_filter={'status': 'in_progress'}
)
```

## Manage Your Data


@@ -83,12 +83,19 @@ demonstrates data ingestion using the dataset created in the first script.
dataset_name = "cifar_dataset"
dataset_project = "dataset_examples"
dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project
).get_local_copy()
```
The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
method to return a path to the cached, read-only local dataset.

If you need a modifiable copy of the dataset, use the following:
```python
Dataset.get(dataset_name, dataset_project).get_mutable_local_copy("path/to/download")
```
The script then creates a neural network to train a model to classify images from the dataset that was
created above.