mirror of
https://github.com/clearml/clearml-docs
synced 2025-04-01 08:14:42 +00:00
Small changes (#113)
This commit is contained in:
parent 794be47dc6 · commit 540ae4d476
@@ -18,8 +18,8 @@ Creates a new dataset. <br/>

 |Name|Description|Optional|
 |---|---|---|
-|`--name` |Dataset's name`| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
-|`--project`|Dataset's project`| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+|`--name` |Dataset's name| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+|`--project`|Dataset's project| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 |`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
 |`--tags` |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
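The `--parents` row above notes that a dataset inherits all of its parents' content and that multiple parents are merged in the order they were entered. A hypothetical sketch of that ordering rule, with invented file manifests (this is not ClearML's implementation):

```python
# Hypothetical sketch: parents merge in the order entered, so a later
# parent's version of a file wins when names conflict.
def merge_parents(parents):
    merged = {}
    for manifest in parents:  # order matters
        merged.update(manifest)
    return merged

parent_a = {"data.csv": "hash-a", "img.png": "hash-1"}
parent_b = {"data.csv": "hash-b"}
merged = merge_parents([parent_a, parent_b])
# data.csv resolves to parent_b's version; img.png survives from parent_a
```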
@@ -74,14 +74,14 @@ most recent dataset in the specified project, or the most recent dataset with th

 Once a specific dataset object has been obtained, get a local copy of the dataset using one of the following options:
 * [`Dataset.get_local_copy()`](../references/sdk/dataset.md#get_local_copy) - get a read-only local copy of an entire dataset.
-This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache)
+This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache).
 * [`Dataset.get_mutable_local_copy()`](../references/sdk/dataset.md#get_mutable_local_copy) - get a writable local copy
 of an entire dataset. This method downloads the dataset to a specific folder (non-cached), specified with the `target_folder` parameter. If
 the specified folder already has contents, specify whether to overwrite its contents with the dataset contents, using the `overwrite` parameter.

 ## Modifying Datasets

-Once a dataset has been created, its contents can be modified and replaced. When your data is changes, you can
+Once a dataset has been created, its contents can be modified and replaced. When your data is changed, you can
 add updated files or remove unnecessary files.

 ### add_files()
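The hunk above contrasts a cached, read-only copy with a writable copy in a specific `target_folder` that honors an `overwrite` parameter. A minimal sketch of that second behavior, using plain file copies instead of the real ClearML download logic (the function body is an invented stand-in):

```python
import os
import shutil
import tempfile

# Toy stand-in for get_mutable_local_copy(): place a writable copy in a
# specific, non-cached folder, refusing to clobber existing contents
# unless overwrite=True. Not the ClearML implementation.
def get_mutable_local_copy(source_dir, target_folder, overwrite=False):
    if os.path.isdir(target_folder) and os.listdir(target_folder):
        if not overwrite:
            raise ValueError("target_folder has contents; pass overwrite=True")
        shutil.rmtree(target_folder)
    shutil.copytree(source_dir, target_folder, dirs_exist_ok=True)
    return target_folder

# Usage with throwaway directories standing in for a dataset:
source = tempfile.mkdtemp()
with open(os.path.join(source, "data.csv"), "w") as f:
    f.write("a,b\n1,2\n")
target = os.path.join(tempfile.mkdtemp(), "mutable_copy")
path = get_mutable_local_copy(source, target)
```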
@@ -83,7 +83,10 @@ dataset_project = "dataset_examples"

 from clearml import Dataset

-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()

 trainset = datasets.CIFAR10(
     root=dataset_path,
@@ -8,10 +8,8 @@ This example shows how to use the `clearml-data` folder sync function.
 from time to time. When the point of truth is updated, users can call `clearml-data sync` and the
 changes (file addition, modification, or removal) will be reflected in ClearML.

-## Creating Initial Version
-
 ## Prerequisites
-1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
+1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
 the needed files.

 1. Open terminal and change directory to the cloned repository's examples folder
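The hunk above describes `clearml-data sync` reflecting file additions, modifications, and removals. That three-way classification between a previously synced state and the current folder can be sketched in plain Python (state maps and hashes are invented for illustration; this is not the CLI's code):

```python
# Hypothetical sketch: classify changes between two folder snapshots,
# each represented as {filename: content_hash}.
def diff_folder_state(previous, current):
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    modified = sorted(
        name for name in set(previous) & set(current)
        if previous[name] != current[name]
    )
    return added, removed, modified

added, removed, modified = diff_folder_state(
    {"a.txt": "h1", "b.txt": "h2"},
    {"b.txt": "h3", "c.txt": "h4"},
)
```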
@@ -24,7 +24,9 @@ We first need to obtain a local copy of the CIFAR dataset.
 from clearml import StorageManager

 manager = StorageManager()
-dataset_path = manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz")
+dataset_path = manager.get_local_copy(
+    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
+)
 ```

 This script downloads the data and `dataset_path` contains the path to the downloaded data.
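The `StorageManager.get_local_copy` call above downloads to a local cache, so repeated calls reuse the same files. A rough sketch of how a deterministic per-URL cache path could be derived (the hashing scheme and `cache_root` are invented, not ClearML's actual cache layout; no network access involved):

```python
import hashlib
import os

# Hypothetical sketch: derive a stable local path from a remote URL so
# repeated lookups land on the same cached download.
def cached_local_path(remote_url, cache_root="/tmp/storage_cache"):
    key = hashlib.md5(remote_url.encode("utf-8")).hexdigest()
    filename = os.path.basename(remote_url)
    return os.path.join(cache_root, key, filename)

url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
local = cached_local_path(url)
```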
@@ -83,12 +85,19 @@ demonstrates data ingestion using the dataset created in the first script.
 dataset_name = "cifar_dataset"
 dataset_project = "dataset_examples"

-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
 ```

 The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
-method to return a path to the cached, read-only local dataset. If you need a modifiable copy of the dataset,
-use `Dataset.get(dataset_name, dataset_project).get_mutable_local_copy(path/to/download)`
+method to return a path to the cached, read-only local dataset.
+
+If you need a modifiable copy of the dataset, use the following code:
+```python
+Dataset.get(dataset_name, dataset_project).get_mutable_local_copy("path/to/download")
+```

 The script then creates a neural network to train a model to classify images from the dataset that was
 created above.
@@ -5,7 +5,7 @@ title: Data Management from CLI
 In this example we'll create a simple dataset and demonstrate basic actions on it, using the `clearml-data` CLI.

 ## Prerequisites
-1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
+1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
 the needed files.
 1. Open terminal and change directory to the cloned repository's examples folder
@@ -89,13 +89,13 @@ clearml-data - Dataset Management & Versioning CLI

 List dataset content: 24d05040f3e14fbfbed8edb1bf08a88c
 Listing dataset content
-file name | size | hash
-------------------------------------------------------------------------------------------------------------------------------------------------
-dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
-data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
-picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
-sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
-sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
+file name | size | hash
+-----------------------------------------------------------------------------------------------------------------
+dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
+data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
+picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
+sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
+sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
 Total 5 files, 248771 bytes
 ```
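The 64-character hexadecimal hashes in the listing above are consistent with SHA-256 digests of each file's contents. A small sketch reproducing the three listing columns (`describe` is an invented helper; the SHA-256 assumption is mine, not stated by the CLI output):

```python
import hashlib

# Hypothetical sketch of the listing columns: file name | size | hash.
# Assumes the hash column is SHA-256 of the file contents.
def describe(name, payload):
    return name, len(payload), hashlib.sha256(payload).hexdigest()

name, size, digest = describe("sample.json", b'{"key": "value"}')
```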
@@ -4,9 +4,9 @@ title: Workflows

 Take a look at the ClearML Data examples which demonstrate common workflows using the `clearml-data` CLI and the
 `Dataset` class:
-* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming dataset with CLI
+* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming dataset with CLI.
 * [Folder Sync with CLI](data_man_folder_sync.md) - Tutorial for using `clearml-data sync` CLI option to update a dataset according
 to a local folder.
 * [Dataset Management with CLI and SDK](data_man_cifar_classification.md) - Tutorial for creating a dataset with the CLI
-then programmatically ingesting the data with the SDK
+then programmatically ingesting the data with the SDK.
 * [Data Management with Python](data_man_python.md) - Example scripts for creating and consuming a dataset with the SDK.
@@ -30,8 +30,8 @@ Once we have a Task in ClearML, we can clone and edit its definitions in the UI,

 ## Manage Your Data
 Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
-Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder/S3/Gs/Azure)
-ClearML Data supports efficient Dataset storage and caching, differentiable & compressed
+Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder/S3/Gs/Azure).
+ClearML Data supports efficient Dataset storage and caching, differentiable & compressed.

 ## Scale Your Work
 Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent machines (Remote or local) and manage
@@ -121,7 +121,12 @@ Log as many metrics from your processes! It improves visibility on their progres

 Use the `Logger` class to report scalars and plots.
 ```python
 from clearml import Logger
-Logger.current_logger().report_scalar(graph='metric', series='variant', value=13.37, iteration=counter)
+Logger.current_logger().report_scalar(
+    graph='metric',
+    series='variant',
+    value=13.37,
+    iteration=counter
+)
 ```

 You can later analyze reported scalars
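The `report_scalar` call above logs one point per iteration for a given graph and series. A toy in-memory stand-in showing how each `(graph, series)` pair accumulates an iteration-to-value curve, which is what the scalar plots are built from (a hypothetical sketch, not the SDK's reporting pipeline):

```python
from collections import defaultdict

# Hypothetical sketch: each (graph, series) pair collects an
# {iteration: value} curve, one point per report_scalar call.
scalars = defaultdict(dict)

def report_scalar(graph, series, value, iteration):
    scalars[(graph, series)][iteration] = value

# Reporting inside a loop, as in the snippet above:
for counter in range(3):
    report_scalar(graph='metric', series='variant',
                  value=counter * 2, iteration=counter)
```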
@@ -139,7 +144,11 @@ You can also search and query Tasks in the system.

 Use the `Task.get_tasks` call to retrieve Tasks objects and filter based on the specific values of the Task - status, parameters, metrics and more!
 ```python
 from clearml import Task
-tasks = Task.get_tasks(project_name='examples', task_name='partial_name_match', task_filter={'status': 'in_progress'})
+tasks = Task.get_tasks(
+    project_name='examples',
+    task_name='partial_name_match',
+    task_filter={'status': 'in_progress'}
+)
 ```

 ## Manage Your Data
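The `Task.get_tasks` call above combines a partial task-name match with a filter on task fields such as status. The filtering semantics can be illustrated with an invented in-memory `get_tasks` over plain dicts (a sketch of the idea only, not the SDK implementation):

```python
# Hypothetical sketch: partial name match plus exact-match filters on
# task fields, mirroring the shape of the Task.get_tasks call above.
def get_tasks(records, task_name, task_filter):
    return [
        task for task in records
        if task_name in task["name"]
        and all(task.get(key) == value for key, value in task_filter.items())
    ]

records = [
    {"name": "partial_name_match_1", "status": "in_progress"},
    {"name": "partial_name_match_2", "status": "completed"},
    {"name": "unrelated", "status": "in_progress"},
]
tasks = get_tasks(records, task_name="partial_name_match",
                  task_filter={"status": "in_progress"})
```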
@@ -83,12 +83,19 @@ demonstrates data ingestion using the dataset created in the first script.
 dataset_name = "cifar_dataset"
 dataset_project = "dataset_examples"

-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
 ```

 The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
-method to return a path to the cached, read-only local dataset. If you need a modifiable copy of the dataset,
-use `Dataset.get(dataset_name, dataset_project).get_mutable_local_copy(path/to/download)`
+method to return a path to the cached, read-only local dataset.
+
+If you need a modifiable copy of the dataset, use the following:
+```python
+Dataset.get(dataset_name, dataset_project).get_mutable_local_copy("path/to/download")
+```

 The script then creates a neural network to train a model to classify images from the dataset that was
 created above.