Small edits (#476)

pollfly
2023-02-16 12:17:53 +02:00
committed by GitHub
parent 5458f8036b
commit 2cf096f7ec
27 changed files with 64 additions and 64 deletions

View File

@@ -41,7 +41,7 @@ guide for more info!
## Using ClearML Data
-ClearML Data offers two interfaces:
+ClearML Data supports two interfaces:
- `clearml-data` - A CLI utility for creating, uploading, and managing datasets. See [CLI](clearml_data_cli.md) for a reference of `clearml-data` commands.
- `clearml.Dataset` - A python interface for creating, retrieving, managing, and using datasets. See [SDK](clearml_data_sdk.md) for an overview of the basic methods of the `Dataset` module.
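
The SDK interface listed above can be sketched as follows. This is a minimal sketch, assuming `clearml` is installed and configured; the dataset name, project, and path are placeholder values, not part of the original docs:

```python
def register_dataset(path, name="cifar_dataset", project="datasets"):
    """Create a dataset version, add the files under `path`, upload, and finalize it."""
    from clearml import Dataset  # imported inside so the sketch stays self-contained

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=path)   # queue local files for upload
    dataset.upload()               # push the files to the configured file server
    dataset.finalize()             # close the version; it can no longer be modified
    return dataset.id
```

Calling `register_dataset("data/cifar-10")` would return the new dataset's ID, mirroring the `clearml-data create` / `add` / `close` CLI flow described in the tutorials.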

View File

@@ -2,13 +2,13 @@
title: Dataset Management with CLI and SDK
---
-In this tutorial, we are going to manage the CIFAR dataset with `clearml-data` CLI, and then use ClearML's [`Dataset`](../../references/sdk/dataset.md)
+In this tutorial, you are going to manage the CIFAR dataset with `clearml-data` CLI, and then use ClearML's [`Dataset`](../../references/sdk/dataset.md)
class to ingest the data.
## Creating the Dataset
### Downloading the Data
-Before we can register the CIFAR dataset with `clearml-data`, we need to obtain a local copy of it.
+Before registering the CIFAR dataset with `clearml-data`, you need to obtain a local copy of it.
Execute this Python script to download the data:
```python
@@ -43,7 +43,7 @@ New dataset created id=ee1c35f60f384e65bc800f42f0aca5ec
Where `ee1c35f60f384e65bc800f42f0aca5ec` is the dataset ID.
## Adding Files
-Add the files we just downloaded to the dataset:
+Add the files that were just downloaded to the dataset:
```
clearml-data add --files <dataset_path>
```
@@ -72,7 +72,7 @@ In the panel's **CONTENT** tab, you can see a table summarizing version contents
## Using the Dataset
-Now that we have a new dataset registered, we can consume it.
+Now that you have a new dataset registered, you can consume it.
The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) example
script demonstrates using the dataset within Python code.
@@ -102,7 +102,7 @@ hyperparameters. Passing `alias=<dataset_alias_string>` stores the dataset's ID in the
`dataset_alias_string` parameter in the experiment's **CONFIGURATION > HYPERPARAMETERS > Datasets** section. This way
you can easily track which dataset the task is using.
-The Dataset's [`get_local_copy`](../../references/sdk/dataset.md#get_local_copy) method will return a path to the cached,
-downloaded dataset. Then we provide the path to PyTorch's dataset object.
+The Dataset's [`get_local_copy`](../../references/sdk/dataset.md#get_local_copy) method returns a path to the cached,
+downloaded dataset. Then the dataset path is input to PyTorch's `datasets` object.
The script then trains a neural network to classify images using the dataset created above.
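
The ingestion step described above can be sketched as below. This is a minimal sketch, assuming `clearml` is installed with a configured server, and that `dataset_id` is replaced with a real ID such as the one printed when the dataset was created; the function and alias names are placeholders:

```python
def fetch_dataset_folder(dataset_id, alias="cifar_dataset"):
    """Return a local path to a cached copy of the dataset.

    Passing `alias` records the dataset ID in the experiment's
    CONFIGURATION > HYPERPARAMETERS > Datasets section.
    """
    from clearml import Dataset  # imported inside so the sketch stays self-contained

    dataset = Dataset.get(dataset_id=dataset_id, alias=alias)
    return dataset.get_local_copy()  # downloads the files, or reuses the local cache
```

The returned folder could then be passed, for example, as the `root` argument of a `torchvision.datasets.CIFAR10` instance, in the spirit of what data_ingestion.py does.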

View File

@@ -78,4 +78,4 @@ Upload completed (742 bytes)
Dataset closed and finalized
```
-We can see that 2 files were added or modified, just as we expected!
+See that 2 files were added or modified, just as expected!

View File

@@ -18,7 +18,7 @@ demonstrates how to do the following:
### Downloading the Data
-We first need to obtain a local copy of the CIFAR dataset.
+You first need to obtain a local copy of the CIFAR dataset.
```python
from clearml import StorageManager
@@ -79,7 +79,7 @@ In the panel's **CONTENT** tab, you can see a table summarizing version contents
## Data Ingestion
-Now that we have a new dataset registered, we can consume it!
+Now that a new dataset is registered, you can consume it!
The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) script
demonstrates data ingestion using the dataset created in the first script.

View File

@@ -48,7 +48,7 @@ to capture all files and sub-folders:
:::note
-After creating a dataset, we don't have to specify its ID when running commands, such as *add*, *remove* or *list*
+After creating a dataset, its ID doesn't need to be specified when running commands, such as `add`, `remove`, or `list`
:::
3. Close the dataset - this command uploads the files. By default, the files are uploaded to the file server, but
@@ -151,7 +151,7 @@ You'll need to input the Dataset ID you received when creating the dataset above
```
clearml-data list --id 8b68686a4af040d081027ba3cf6bbca6
```
-And we see that our changes have been made! `new_data.txt` has been added, and `dancing.jpg` has been removed.
+And see that the changes have been made! `new_data.txt` has been added, and `dancing.jpg` has been removed.
```
file name | size | hash