Small changes (#113)

pollfly 2021-11-09 15:58:40 +02:00 committed by GitHub
parent 794be47dc6
commit 540ae4d476
GPG Key ID: 4AEE18F83AFDEB23
10 changed files with 55 additions and 29 deletions

@@ -18,8 +18,8 @@ Creates a new dataset. <br/>
|Name|Description|Optional|
|---|---|---|
-|`--name` |Dataset's name`| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
-|`--project`|Dataset's project`| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+|`--name` |Dataset's name| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+|`--project`|Dataset's project| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
|`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
|`--tags` |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
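
The `--parents` rule above says parents are merged in the order they were entered. That rule can be modeled in a few lines of plain Python. This is a sketch under stated assumptions, not ClearML's implementation. Each parent is a hypothetical mapping from file path to content hash. When two parents contain the same path, the parent entered later is assumed to win.

```python
from typing import Dict, List


def merge_parents(parents: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge parent listings (path -> content hash) in the order they were entered.

    Assumption: on overlapping paths, a later parent overrides an earlier one.
    """
    merged: Dict[str, str] = {}
    for parent in parents:
        merged.update(parent)  # later entries replace earlier ones for the same path
    return merged


# Hypothetical parent listings; the hashes are placeholders
parent_a = {"data.csv": "hash-a1", "sample.json": "hash-a2"}
parent_b = {"data.csv": "hash-b1", "picasso.jpg": "hash-b2"}
print(merge_parents([parent_a, parent_b]))
```

Reversing the list changes which `data.csv` survives, which is why the entry order matters.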

@@ -74,14 +74,14 @@ most recent dataset in the specified project, or the most recent dataset with th
Once a specific dataset object has been obtained, get a local copy of the dataset using one of the following options:
* [`Dataset.get_local_copy()`](../references/sdk/dataset.md#get_local_copy) - get a read-only local copy of an entire dataset.
-This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache)
+This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache).
* [`Dataset.get_mutable_local_copy()`](../references/sdk/dataset.md#get_mutable_local_copy) - get a writable local copy
of an entire dataset. This method downloads the dataset to a specific folder (non-cached), specified with the `target_folder` parameter. If
the specified folder already has contents, specify whether to overwrite its contents with the dataset contents, using the `overwrite` parameter.
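
The difference between the cached read-only copy and a writable copy in `target_folder` can be sketched with the standard library. The helper `make_mutable_copy` is hypothetical and only mimics the documented `target_folder`/`overwrite` behaviour. Raising `FileExistsError` on a non-empty folder is an assumption of this sketch, not confirmed ClearML behaviour.

```python
import shutil
import tempfile
from pathlib import Path


def make_mutable_copy(cached_dir: str, target_folder: str, overwrite: bool = False) -> str:
    """Hypothetical helper: copy a read-only cached dataset into a writable folder.

    Assumption: a non-empty target is refused unless overwrite=True.
    """
    target = Path(target_folder)
    if target.exists() and any(target.iterdir()):
        if not overwrite:
            raise FileExistsError(f"{target} is not empty; pass overwrite=True")
        shutil.rmtree(target)  # drop old contents so only dataset files remain
    # Copying (instead of editing in place) keeps the shared cache untouched
    shutil.copytree(cached_dir, target, dirs_exist_ok=True)
    return str(target)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp, "cache")
    cache.mkdir()
    (cache / "data.csv").write_text("a,b\n1,2\n")
    work = Path(tmp, "work")
    work.mkdir()
    (work / "old.txt").write_text("stale")

    try:
        make_mutable_copy(str(cache), str(work))
        refused = False
    except FileExistsError:
        refused = True
    out = make_mutable_copy(str(cache), str(work), overwrite=True)
    copied = sorted(p.name for p in Path(out).iterdir())

print(refused, copied)
```

Copying out of the cache is what makes the second option safe to modify: edits land in `target_folder`, never in the cache that `get_local_copy()` hands out.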
## Modifying Datasets
-Once a dataset has been created, its contents can be modified and replaced. When your data is changes, you can
+Once a dataset has been created, its contents can be modified and replaced. When your data is changed, you can
add updated files or remove unnecessary files.
### add_files()

@@ -83,7 +83,10 @@ dataset_project = "dataset_examples"
from clearml import Dataset
-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
trainset = datasets.CIFAR10(
root=dataset_path,

@@ -8,10 +8,8 @@ This example shows how to use the `clearml-data` folder sync function.
from time to time. When the point of truth is updated, users can call `clearml-data sync` and the
changes (file addition, modification, or removal) will be reflected in ClearML.
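
The sync step compares the point of truth with the previous dataset version and classifies each file as added, modified, or removed. Below is a minimal sketch of that comparison. It assumes files are identified by relative path and compared by SHA-256 content hash. The helpers `snapshot` and `diff_states` are hypothetical and not part of `clearml-data`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def snapshot(folder: str) -> Dict[str, str]:
    """Map each file's path (relative to folder) to its SHA-256 hash."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_states(previous: Dict[str, str], current: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, modified, removed) paths between two snapshots."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    modified = sorted(p for p in set(current) & set(previous) if current[p] != previous[p])
    return added, modified, removed


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("v1")
    Path(tmp, "b.txt").write_text("keep")
    before = snapshot(tmp)
    Path(tmp, "a.txt").write_text("v2")   # modified
    Path(tmp, "b.txt").unlink()           # removed
    Path(tmp, "c.txt").write_text("new")  # added
    changes = diff_states(before, snapshot(tmp))

print(changes)  # (['c.txt'], ['a.txt'], ['b.txt'])
```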
## Creating Initial Version
## Prerequisites
-1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
+1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
the needed files.
1. Open terminal and change directory to the cloned repository's examples folder

@@ -24,7 +24,9 @@ We first need to obtain a local copy of the CIFAR dataset.
from clearml import StorageManager
manager = StorageManager()
-dataset_path = manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz")
+dataset_path = manager.get_local_copy(
+    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
+)
```
This script downloads the data and `dataset_path` contains the path to the downloaded data.
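
`get_local_copy` downloads the archive once and then serves it from the local cache. The sketch below models that download-once behaviour. Everything in it is hypothetical: the function `get_cached_copy`, the cache layout, and the injected `fetch` callable, which stands in for the real network download. Archive extraction is not modeled.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def get_cached_copy(remote_url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> str:
    """Hypothetical download-once cache: call fetch() only when the URL is not cached yet."""
    # Key each entry on a hash of the URL so repeated calls map to the same path
    key = hashlib.sha256(remote_url.encode("utf-8")).hexdigest()[:16]
    file_name = remote_url.rstrip("/").rsplit("/", 1)[-1]
    local = Path(cache_dir) / key / file_name
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(fetch(remote_url))  # cache miss: download and store
    return str(local)


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP download; records each call."""
    calls.append(url)
    return b"archive bytes"


with tempfile.TemporaryDirectory() as cache:
    url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    first = get_cached_copy(url, cache, fake_fetch)
    second = get_cached_copy(url, cache, fake_fetch)  # served from cache
    same_path = first == second
    cached_name = Path(first).name

print(len(calls), same_path, cached_name)  # 1 True cifar-10-python.tar.gz
```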
@@ -83,12 +85,19 @@ demonstrates data ingestion using the dataset created in the first script.
dataset_name = "cifar_dataset"
dataset_project = "dataset_examples"
-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
```
The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
-method to return a path to the cached, read-only local dataset. If you need a modifiable copy of the dataset,
-use `Dataset.get(dataset_name, dataset_project).get_mutable_local_copy(path/to/download)`
+method to return a path to the cached, read-only local dataset.
+If you need a modifiable copy of the dataset, use the following code:
+```python
+Dataset.get(dataset_name, dataset_project).get_mutable_local_copy("path/to/download")
+```
The script then creates a neural network to train a model to classify images from the dataset that was
created above.

@@ -5,7 +5,7 @@ title: Data Management from CLI
In this example we'll create a simple dataset and demonstrate basic actions on it, using the `clearml-data` CLI.
## Prerequisites
-1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
+1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
the needed files.
1. Open terminal and change directory to the cloned repository's examples folder
@@ -89,13 +89,13 @@ clearml-data - Dataset Management & Versioning CLI
List dataset content: 24d05040f3e14fbfbed8edb1bf08a88c
Listing dataset content
-file name | size | hash
-------------------------------------------------------------------------------------------------------------------------------------------------
-dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
-data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
-picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
-sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
-sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
+file name | size | hash
+-----------------------------------------------------------------------------------------------------------------
+dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
+data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
+picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
+sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
+sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
Total 5 files, 248771 bytes
```
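
The listing reports each file's name, size, and hash, followed by a total. The sketch below produces the same layout for a local folder. It assumes SHA-256 as the hash; the 64-hex-digit values above are consistent with a 256-bit digest. The helpers `list_content` and `format_listing` are hypothetical. The usage builds placeholder files with the sizes shown above, filled with NUL bytes, so the sizes and total match but the hashes do not.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List, Tuple

Row = Tuple[str, int, str]  # (file name, size in bytes, SHA-256 hex digest)


def list_content(folder: str) -> Tuple[List[Row], int]:
    """Return one row per file under folder, plus the total size in bytes."""
    root = Path(folder)
    rows: List[Row] = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            rows.append((path.relative_to(root).as_posix(), len(data), hashlib.sha256(data).hexdigest()))
    total = sum(size for _, size, _ in rows)
    return rows, total


def format_listing(rows: List[Row], total: int) -> str:
    """Render rows in a 'name | size | hash' layout with a closing total line."""
    lines = ["file name | size | hash"]
    lines += [f"{name} | {size:,} | {digest}" for name, size, digest in rows]
    lines.append(f"Total {len(rows)} files, {total} bytes")
    return "\n".join(lines)


# Placeholder files using the sizes from the listing above (NUL-filled, so hashes differ)
sizes = {"dancing.jpg": 40484, "data.csv": 21440, "picasso.jpg": 114573,
         "sample.json": 132, "sample.mp3": 72142}
with tempfile.TemporaryDirectory() as tmp:
    for file_name, size in sizes.items():
        Path(tmp, file_name).write_bytes(b"\0" * size)
    rows, total = list_content(tmp)

listing = format_listing(rows, total)
print(listing.splitlines()[-1])  # Total 5 files, 248771 bytes
```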

@@ -4,9 +4,9 @@ title: Workflows
Take a look at the ClearML Data examples which demonstrate common workflows using the `clearml-data` CLI and the
`Dataset` class:
-* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming dataset with CLI
+* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming dataset with CLI.
* [Folder Sync with CLI](data_man_folder_sync.md) - Tutorial for using `clearml-data sync` CLI option to update a dataset according
to a local folder.
* [Dataset Management with CLI and SDK](data_man_cifar_classification.md) - Tutorial for creating a dataset with the CLI
-then programmatically ingesting the data with the SDK
+then programmatically ingesting the data with the SDK.
* [Data Management with Python](data_man_python.md) - Example scripts for creating and consuming a dataset with the SDK.

@@ -30,8 +30,8 @@ Once we have a Task in ClearML, we can clone and edit its definitions in the UI,
## Manage Your Data
Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
-Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder/S3/Gs/Azure)
-ClearML Data supports efficient Dataset storage and caching, differentiable & compressed
+Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder/S3/Gs/Azure).
+ClearML Data supports efficient Dataset storage and caching, differentiable & compressed.
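
Reading "differentiable" as differential (delta-based) storage, a new version only needs to store what changed relative to its parent. Below is a minimal sketch of that idea. It assumes the parent's contents are known as a path-to-SHA-256 mapping. `write_delta_archive` is hypothetical, and ClearML's actual storage format is not modeled.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, List


def write_delta_archive(folder: str, parent_hashes: Dict[str, str], archive_path: str) -> List[str]:
    """Compress only files that are new or changed relative to the parent version.

    archive_path must be outside folder so the archive does not include itself.
    """
    root = Path(folder)
    stored: List[str] = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if parent_hashes.get(rel) != digest:  # unchanged files are inherited, not stored
                zf.write(path, arcname=rel)
                stored.append(rel)
    return stored


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp, "data")
    data.mkdir()
    Path(data, "same.csv").write_bytes(b"unchanged")
    Path(data, "new.csv").write_bytes(b"x" * 1000)
    parent = {"same.csv": hashlib.sha256(b"unchanged").hexdigest()}
    archive = Path(tmp, "delta.zip")
    stored = write_delta_archive(str(data), parent, str(archive))
    with zipfile.ZipFile(archive) as zf:
        archived_names = zf.namelist()

print(stored, archived_names)  # ['new.csv'] ['new.csv']
```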
## Scale Your Work
Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent machines (Remote or local) and manage

@@ -121,7 +121,12 @@ Log as many metrics from your processes! It improves visibility on their progres
Use the Logger class from to report scalars and plots.
```python
from clearml import Logger
-Logger.current_logger().report_scalar(graph='metric', series='variant', value=13.37, iteration=counter)
+Logger.current_logger().report_scalar(
+    graph='metric',
+    series='variant',
+    value=13.37,
+    iteration=counter
+)
```
You can later analyze reported scalars
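
Reported scalars are grouped by graph and series, with one point per iteration. `ScalarStore` is a hypothetical in-memory stand-in, not ClearML's `Logger`. It mirrors the keyword arguments used above to show how such points can be organized for later analysis.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class ScalarStore:
    """Hypothetical in-memory stand-in for a scalar logger; not ClearML's Logger."""

    def __init__(self) -> None:
        # (graph, series) -> list of (iteration, value) points
        self._points: DefaultDict[Tuple[str, str], List[Tuple[int, float]]] = defaultdict(list)

    def report_scalar(self, graph: str, series: str, value: float, iteration: int) -> None:
        self._points[(graph, series)].append((iteration, value))

    def series(self, graph: str, series: str) -> List[Tuple[int, float]]:
        """Return one series' points ordered by iteration, ready for plotting."""
        return sorted(self._points.get((graph, series), []))


store = ScalarStore()
# Report out of order to show that series() sorts by iteration
for counter in (2, 0, 1):
    store.report_scalar(graph="metric", series="variant", value=counter * 0.5, iteration=counter)
print(store.series("metric", "variant"))  # [(0, 0.0), (1, 0.5), (2, 1.0)]
```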
@@ -139,7 +144,11 @@ You can also search and query Tasks in the system.
Use the `Task.get_tasks` call to retrieve Tasks objects and filter based on the specific values of the Task - status, parameters, metrics and more!
```python
from clearml import Task
-tasks = Task.get_tasks(project_name='examples', task_name='partial_name_match', task_filter={'status': 'in_progress'})
+tasks = Task.get_tasks(
+    project_name='examples',
+    task_name='partial_name_match',
+    task_filter={'status': 'in_progress'}
+)
```
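
The `Task.get_tasks` call above combines three filters: a project, a partial name match, and a field filter. The sketch below applies the same three filters to plain records. `TaskRecord` and `filter_tasks` are hypothetical. Treating the partial name match as a simple substring test is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    """Hypothetical, minimal task record; real ClearML tasks carry many more fields."""
    project: str
    name: str
    status: str


def filter_tasks(
    tasks: List[TaskRecord],
    project_name: str,
    task_name: Optional[str] = None,
    task_filter: Optional[Dict[str, str]] = None,
) -> List[TaskRecord]:
    """Keep tasks in project_name whose name contains task_name and whose fields match task_filter."""
    task_filter = task_filter or {}
    selected = []
    for task in tasks:
        if task.project != project_name:
            continue
        if task_name and task_name not in task.name:  # simplified partial-name match
            continue
        if any(getattr(task, key) != value for key, value in task_filter.items()):
            continue
        selected.append(task)
    return selected


tasks = [
    TaskRecord("examples", "train partial_name_match v1", "in_progress"),
    TaskRecord("examples", "train partial_name_match v2", "completed"),
    TaskRecord("other", "partial_name_match", "in_progress"),
]
running = filter_tasks(tasks, "examples", "partial_name_match", {"status": "in_progress"})
print([t.name for t in running])  # ['train partial_name_match v1']
```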
## Manage Your Data

@@ -83,12 +83,19 @@ demonstrates data ingestion using the dataset created in the first script.
dataset_name = "cifar_dataset"
dataset_project = "dataset_examples"
-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
```
The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
-method to return a path to the cached, read-only local dataset. If you need a modifiable copy of the dataset,
-use `Dataset.get(dataset_name, dataset_project).get_mutable_local_copy(path/to/download)`
+method to return a path to the cached, read-only local dataset.
+If you need a modifiable copy of the dataset, use the following:
+```python
+Dataset.get(dataset_name, dataset_project).get_mutable_local_copy("path/to/download")
+```
The script then creates a neural network to train a model to classify images from the dataset that was
created above.