---
title: Dataset Management with CLI and SDK
---

In this tutorial, you are going to manage the CIFAR dataset with the `clearml-data` CLI, and then use ClearML's
[`Dataset`](../../references/sdk/dataset.md) class to ingest the data.

## Creating the Dataset

### Downloading the Data
Before registering the CIFAR dataset with `clearml-data`, you need to obtain a local copy of it.

Execute this Python script to download the data:
```python
from clearml import StorageManager

manager = StorageManager()
dataset_path = manager.get_local_copy(
    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
)
# make sure to copy the printed value
print("COPY THIS DATASET PATH: {}".format(dataset_path))
```

Expected response:
```bash
COPY THIS DATASET PATH: ~/.clearml/cache/storage_manager/global/f2751d3a22ccb78db0e07874912b5c43.cifar-10-python_artifacts_archive_None
```
The script prints the path to the downloaded data. It will be needed later on.

### Creating the Dataset
To create the dataset, execute the following command:
```
clearml-data create --project dataset_examples --name cifar_dataset
```

Expected response:
```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=ee1c35f60f384e65bc800f42f0aca5ec
```
Where `ee1c35f60f384e65bc800f42f0aca5ec` is the dataset ID.

## Adding Files
Add the files that were just downloaded to the dataset:

```
clearml-data add --files <dataset_path>
```

Where `dataset_path` is the path that was printed earlier, which denotes the location of the downloaded dataset.

:::note
There's no need to specify a `dataset_id`, since the `clearml-data` session stores it.
:::
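
If you are working in a new session where the dataset ID is no longer stored, you can pass the ID explicitly. A minimal sketch, assuming the CLI's `--id` flag and reusing the dataset ID created above:
```
clearml-data add --id ee1c35f60f384e65bc800f42f0aca5ec --files <dataset_path>
```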

## Finalizing the Dataset
Run the [`close`](../../references/sdk/dataset.md#close) command to upload the files (they are uploaded to the ClearML Server by default):

```
clearml-data close
```

This command sets the dataset task's status to *completed*, so it will no longer be modifiable. This ensures future
reproducibility.

Information about the dataset can be viewed in the WebApp, in the dataset's [details panel](../../webapp/datasets/webapp_dataset_viewing.md#version-details-panel).
In the panel's **CONTENT** tab, you can see a table summarizing version contents, including file names, file sizes, and hashes.

![Dataset content tab](../../img/examples_data_management_cifar_dataset.png)

## Using the Dataset

Now that a new dataset is registered, you can consume it.

The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) example
script demonstrates using the dataset within Python code.

```python
from clearml import Dataset
from torchvision import datasets, transforms

dataset_name = "cifar_dataset"
dataset_project = "dataset_examples"

# get the registered dataset and download a cached local copy
dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="Cifar dataset"
).get_local_copy()

# a minimal transform so the snippet runs standalone;
# the full example script defines its own
transform = transforms.Compose([transforms.ToTensor()])

trainset = datasets.CIFAR10(
    root=dataset_path,
    train=True,
    download=False,
    transform=transform
)
```

In cases like this, where you use a dataset in a task, you can have the dataset's ID stored in the task's
hyperparameters: passing `alias=<dataset_alias_string>` stores the dataset's ID in the
`dataset_alias_string` parameter in the experiment's **CONFIGURATION > HYPERPARAMETERS > Datasets** section. This way
you can easily track which dataset the task is using.

The Dataset's [`get_local_copy`](../../references/sdk/dataset.md#get_local_copy) method returns a path to the cached,
downloaded dataset. The dataset path is then passed to PyTorch's `datasets.CIFAR10` constructor.

The script then trains a neural network to classify images using the dataset created above.
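
For orientation, here is a minimal sketch of such a training loop over `trainset`. The model, loss, and optimizer below are illustrative placeholders, not the example script's exact choices:
```python
import torch
from torch import nn
from torch.utils.data import DataLoader

# `trainset` is the datasets.CIFAR10 instance created above
loader = DataLoader(trainset, batch_size=64, shuffle=True)

# an intentionally simple model: flatten each 3x32x32 image into one linear layer
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, labels in loader:  # one epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```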

---
title: Data Management with Python
---

The [dataset_creation.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/dataset_creation.py) and
[data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) scripts
together demonstrate how to use ClearML's [`Dataset`](../../references/sdk/dataset.md) class to create a dataset and
subsequently ingest the data.

## Dataset Creation

The [dataset_creation.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/dataset_creation.py) script
demonstrates how to do the following:
* Create a dataset and add files to it
* Upload the dataset to the ClearML Server
* Finalize the dataset

### Downloading the Data

You first need to obtain a local copy of the CIFAR dataset.

```python
from clearml import StorageManager

manager = StorageManager()
dataset_path = manager.get_local_copy(
    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
)
```

This script downloads the data; `dataset_path` contains the path to the downloaded copy.

### Creating the Dataset

```python
from clearml import Dataset

dataset = Dataset.create(
    dataset_name="cifar_dataset",
    dataset_project="dataset examples"
)
```

This creates a data processing task called `cifar_dataset` in the `dataset examples` project, which
can be viewed in the WebApp.

### Adding Files

```python
dataset.add_files(path=dataset_path)
```

This adds the downloaded files to the current dataset.

### Uploading the Files

```python
dataset.upload()
```
This uploads the dataset to the ClearML Server by default. The dataset's destination can be changed by specifying the
target storage with the `output_url` parameter of the [`upload`](../../references/sdk/dataset.md#upload) method.

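To illustrate, a minimal sketch of uploading to external storage instead (the bucket URL is a placeholder):
```python
# upload the dataset contents to S3 rather than the default ClearML file server
dataset.upload(output_url="s3://my-bucket/datasets")
```
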
### Finalizing the Dataset

Run the [`finalize`](../../references/sdk/dataset.md#finalize) command to close the dataset and set the dataset task's
status to *completed*. The dataset can only be finalized if it doesn't have any pending uploads.

```python
dataset.finalize()
```

After a dataset has been closed, it can no longer be modified. This ensures future reproducibility.

Information about the dataset can be viewed in the WebApp, in the dataset's [details panel](../../webapp/datasets/webapp_dataset_viewing.md#version-details-panel).
In the panel's **CONTENT** tab, you can see a table summarizing version contents, including file names, file sizes, and hashes.

![Dataset content tab](../../img/examples_data_management_cifar_dataset.png)

## Data Ingestion

Now that a new dataset is registered, you can consume it!

The [data_ingestion.py](https://github.com/allegroai/clearml/blob/master/examples/datasets/data_ingestion.py) script
demonstrates data ingestion using the dataset created in the first script.

```python
from clearml import Dataset

dataset_name = "cifar_dataset"
dataset_project = "dataset_examples"

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project
).get_local_copy()
```

The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
method to return a path to the cached, read-only local dataset.

If you need a modifiable copy of the dataset, use the following:
```python
Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project
).get_mutable_local_copy("path/to/download")
```
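
A minimal sketch assuming the method's `overwrite` parameter, which lets the copy replace any existing contents of the target folder:
```python
# overwrite=True deletes any existing contents of the target folder first
Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project
).get_mutable_local_copy("path/to/download", overwrite=True)
```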

The script then trains a neural network to classify images using the dataset created above.

```diff
@@ -128,7 +128,7 @@ module.exports = {
     {'Automation': ['guides/automation/manual_random_param_search_example', 'guides/automation/task_piping']},
     {'ClearML Task': ['guides/clearml-task/clearml_task_tutorial']},
     {'ClearML Agent': ['guides/clearml_agent/executable_exp_containers', 'guides/clearml_agent/exp_environment_containers']},
-    {'Datasets': ['guides/datasets/data_man_cifar_classification', 'guides/datasets/data_man_python']},
+    {'Datasets': ['clearml_data/data_management_examples/data_man_cifar_classification', 'clearml_data/data_management_examples/data_man_python']},
     {'Distributed': ['guides/distributed/distributed_pytorch_example', 'guides/distributed/subprocess_example']},
     {'Docker': ['guides/docker/extra_docker_shell_script']},
     {'Frameworks': [
```