mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
Small edits (#476)
@@ -41,7 +41,7 @@ guide for more info!
 
 ## Using ClearML Data
 
-ClearML Data offers two interfaces:
+ClearML Data supports two interfaces:
 
 - `clearml-data` - A CLI utility for creating, uploading, and managing datasets. See [CLI](clearml_data_cli.md) for a reference of `clearml-data` commands.
 - `clearml.Dataset` - A python interface for creating, retrieving, managing, and using datasets. See [SDK](clearml_data_sdk.md) for an overview of the basic methods of the `Dataset` module.
@@ -2,13 +2,13 @@
 title: Dataset Management with CLI and SDK
 ---
 
-In this tutorial, we are going to manage the CIFAR dataset with `clearml-data` CLI, and then use ClearML's [`Dataset`](../../references/sdk/dataset.md)
+In this tutorial, you are going to manage the CIFAR dataset with `clearml-data` CLI, and then use ClearML's [`Dataset`](../../references/sdk/dataset.md)
 class to ingest the data.
 
 ## Creating the Dataset
 
 ### Downloading the Data
-Before we can register the CIFAR dataset with `clearml-data`, we need to obtain a local copy of it.
+Before registering the CIFAR dataset with `clearml-data`, you need to obtain a local copy of it.
 
 Execute this python script to download the data
 ```python
@@ -43,7 +43,7 @@ New dataset created id=ee1c35f60f384e65bc800f42f0aca5ec
 Where `ee1c35f60f384e65bc800f42f0aca5ec` is the dataset ID.
 
 ## Adding Files
-Add the files we just downloaded to the dataset:
+Add the files that were just downloaded to the dataset:
 
 ```
 clearml-data add --files <dataset_path>
@@ -72,7 +72,7 @@ In the panel's **CONTENT** tab, you can see a table summarizing version contents
 
 ## Using the Dataset
 
-Now that we have a new dataset registered, we can consume it.
+Now that you have a new dataset registered, you can consume it.
 
 The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) example
 script demonstrates using the dataset within Python code.
@@ -102,7 +102,7 @@ hyperparameters. Passing `alias=<dataset_alias_string>` stores the dataset’s I
 `dataset_alias_string` parameter in the experiment's **CONFIGURATION > HYPERPARAMETERS > Datasets** section. This way
 you can easily track which dataset the task is using.
 
-The Dataset's [`get_local_copy`](../../references/sdk/dataset.md#get_local_copy) method will return a path to the cached,
-downloaded dataset. Then we provide the path to PyTorch's dataset object.
+The Dataset's [`get_local_copy`](../../references/sdk/dataset.md#get_local_copy) method returns a path to the cached,
+downloaded dataset. Then the dataset path is input to PyTorch's `datasets` object.
 
 The script then trains a neural network to classify images using the dataset created above.
@@ -78,4 +78,4 @@ Upload completed (742 bytes)
 Dataset closed and finalized
 ```
 
-We can see that 2 files were added or modified, just as we expected!
+See that 2 files were added or modified, just as expected!
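Which files count as "added or modified" between two dataset versions can be pictured by comparing per-file content hashes. The sketch below is purely illustrative (plain Python dictionaries standing in for versions, MD5 chosen arbitrarily) and is not ClearML's internal logic:

```python
import hashlib

def changed_files(old_version, new_version):
    """Files present only in the new version, or whose content hash differs."""
    return sorted(
        name for name, content in new_version.items()
        if name not in old_version
        or hashlib.md5(old_version[name]).hexdigest() != hashlib.md5(content).hexdigest()
    )

# Hypothetical version contents: one file modified, one file added.
old = {"dancing.jpg": b"\xff\xd8", "data.csv": b"1,2,3"}
new = {"data.csv": b"1,2,3,4", "new_data.txt": b"hello"}
print(changed_files(old, new))  # ['data.csv', 'new_data.txt'] -> 2 files added or modified
```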
@@ -18,7 +18,7 @@ demonstrates how to do the following:
 
 ### Downloading the Data
 
-We first need to obtain a local copy of the CIFAR dataset.
+You first need to obtain a local copy of the CIFAR dataset.
 
 ```python
 from clearml import StorageManager
@@ -79,7 +79,7 @@ In the panel's **CONTENT** tab, you can see a table summarizing version contents
 
 ## Data Ingestion
 
-Now that we have a new dataset registered, we can consume it!
+Now that a new dataset is registered, you can consume it!
 
 The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) script
 demonstrates data ingestion using the dataset created in the first script.
@@ -48,7 +48,7 @@ to captures all files and sub-folders:
 
 :::note
-After creating a dataset, we don't have to specify its ID when running commands, such as *add*, *remove* or *list*
+After creating a dataset, its ID doesn't need to be specified when running commands, such as `add`, `remove`, or `list`
 :::
 
 3. Close the dataset - this command uploads the files. By default, the files are uploaded to the file server, but
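The "all files and sub-folders" behavior referenced in the hunk above can be pictured with a short pure-Python sketch. This is only an illustration of recursive capture, not ClearML's implementation:

```python
import tempfile
from pathlib import Path

def capture(folder):
    """Recursively collect every file under folder, relative to its root."""
    root = Path(folder)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

# Build a tiny hypothetical folder tree to demonstrate.
root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.txt").write_text("x")
(root / "sub" / "b.txt").write_text("y")

print(capture(root))  # files inside sub-folders are included
```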
@@ -151,7 +151,7 @@ You'll need to input the Dataset ID you received when created the dataset above
 clearml-data list --id 8b68686a4af040d081027ba3cf6bbca6
 ```
 
-And we see that our changes have been made! `new_data.txt` has been added, and `dancing.jpg` has been removed.
+And see that the changes have been made! `new_data.txt` has been added, and `dancing.jpg` has been removed.
 
 ```
 file name | size | hash
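The `file name | size | hash` columns in that listing can be reproduced for any local folder with a short sketch. Note this is hypothetical illustration code using SHA-256; the hashing scheme ClearML actually uses may differ:

```python
import hashlib
import tempfile
from pathlib import Path

def describe(folder):
    """Return (file name, size, hash) rows, in the spirit of the listing above."""
    rows = []
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            rows.append((p.name, p.stat().st_size, digest))
    return rows

# Demonstrate on a throwaway folder with one file.
folder = Path(tempfile.mkdtemp())
(folder / "new_data.txt").write_text("hello")
for name, size, digest in describe(folder):
    print(f"{name} | {size} | {digest[:12]}")
```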