diff --git a/docs/clearml_data/clearml_data_cli.md b/docs/clearml_data/clearml_data_cli.md
index 09d7bf70..e262e0c6 100644
--- a/docs/clearml_data/clearml_data_cli.md
+++ b/docs/clearml_data/clearml_data_cli.md
@@ -18,8 +18,8 @@ Creates a new dataset.
 
 |Name|Description|Optional|
 |---|---|---|
-|`--name` |Dataset's name`| No |
-|`--project`|Dataset's project`| No |
+|`--name` |Dataset's name| No |
+|`--project`|Dataset's project| No |
 |`--parents`|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| Yes |
 |`--tags` |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| Yes|
diff --git a/docs/clearml_data/clearml_data_sdk.md b/docs/clearml_data/clearml_data_sdk.md
index d947ad1c..32613c2d 100644
--- a/docs/clearml_data/clearml_data_sdk.md
+++ b/docs/clearml_data/clearml_data_sdk.md
@@ -74,14 +74,14 @@ most recent dataset in the specified project, or the most recent dataset with th
 
 Once a specific dataset object has been obtained, get a local copy of the dataset using one of the following options:
 * [`Dataset.get_local_copy()`](../references/sdk/dataset.md#get_local_copy) - get a read-only local copy of an entire dataset.
-  This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache)
+  This method returns a path to the dataset in local cache (downloading the dataset if it is not already in cache).
 * [`Dataset.get_mutable_local_copy()`](../references/sdk/dataset.md#get_mutable_local_copy) - get a writable local copy of an entire dataset.
   This method downloads the dataset to a specific folder (non-cached), specified with the `target_folder` parameter. If the specified folder already has contents, specify whether to overwrite its contents with the dataset contents, using the `overwrite` parameter.
 
 ## Modifying Datasets
 
-Once a dataset has been created, its contents can be modified and replaced. When your data is changes, you can
+Once a dataset has been created, its contents can be modified and replaced. When your data is changed, you can
 add updated files or remove unnecessary files.
 
 ### add_files()
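The two bullets in the `clearml_data_sdk.md` hunk above distinguish the cached read-only copy from the writable copy. A minimal sketch of both calls — the dataset name and project are borrowed from the CIFAR examples later in this diff, and the target folder is an arbitrary assumption:

```python
from clearml import Dataset

# Assumed dataset name and project; replace with your own.
dataset = Dataset.get(dataset_name="cifar_dataset", dataset_project="dataset_examples")

# Read-only copy, served from (and downloaded into) the local cache.
cached_path = dataset.get_local_copy()

# Writable copy, extracted into a specific non-cached folder.
# overwrite=True replaces any existing contents of that folder.
mutable_path = dataset.get_mutable_local_copy(target_folder="./cifar_mutable", overwrite=True)

print(cached_path)
print(mutable_path)
```

The cached copy is the right choice for plain consumption; the mutable copy is for workflows that need to modify or rearrange files on disk.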
diff --git a/docs/clearml_data/data_management_examples/data_man_cifar_classification.md b/docs/clearml_data/data_management_examples/data_man_cifar_classification.md
index 48bf65f4..77fda20f 100644
--- a/docs/clearml_data/data_management_examples/data_man_cifar_classification.md
+++ b/docs/clearml_data/data_management_examples/data_man_cifar_classification.md
@@ -83,7 +83,10 @@ dataset_project = "dataset_examples"
 
 from clearml import Dataset
 
-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
 
 trainset = datasets.CIFAR10(
     root=dataset_path,
diff --git a/docs/clearml_data/data_management_examples/data_man_folder_sync.md b/docs/clearml_data/data_management_examples/data_man_folder_sync.md
index ae2e80c7..825008fe 100644
--- a/docs/clearml_data/data_management_examples/data_man_folder_sync.md
+++ b/docs/clearml_data/data_management_examples/data_man_folder_sync.md
@@ -8,10 +8,8 @@ This example shows how to use the `clearml-data` folder sync function.
 from time to time.
 When the point of truth is updated, users can call `clearml-data sync` and the changes (file addition, modification, or removal) will be reflected in ClearML.
 
-## Creating Initial Version
-
 ## Prerequisites
-1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
+1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
    the needed files.
 1. Open terminal and change directory to the cloned repository's examples folder
diff --git a/docs/clearml_data/data_management_examples/data_man_python.md b/docs/clearml_data/data_management_examples/data_man_python.md
index 2766a10f..d017a1b1 100644
--- a/docs/clearml_data/data_management_examples/data_man_python.md
+++ b/docs/clearml_data/data_management_examples/data_man_python.md
@@ -24,7 +24,9 @@ We first need to obtain a local copy of the CIFAR dataset.
    from clearml import StorageManager
 
    manager = StorageManager()
-   dataset_path = manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz")
+   dataset_path = manager.get_local_copy(
+       remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
+   )
    ```
 
 This script downloads the data and `dataset_path` contains the path to the downloaded data.
@@ -83,12 +85,19 @@ demonstrates data ingestion using the dataset created in the first script.
 dataset_name = "cifar_dataset"
 dataset_project = "dataset_examples"
 
-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
 ```
 
 The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
-method to return a path to the cached, read-only local dataset. If you need a modifiable copy of the dataset,
-use `Dataset.get(dataset_name, dataset_project).get_mutable_local_copy(path/to/download)`
+method to return a path to the cached, read-only local dataset.
+
+If you need a modifiable copy of the dataset, use the following code:
+```python
+Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_mutable_local_copy("path/to/download")
+```
 
 The script then creates a neural network to train a model to classify images from the dataset that was created above.
\ No newline at end of file
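The `data_man_python.md` hunks above cover downloading the raw data and consuming an existing dataset. For orientation, a hedged sketch of the creation step performed by the example's first script, reusing the dataset name, project, and download URL from those snippets as assumptions:

```python
from clearml import Dataset, StorageManager

# Download the raw CIFAR archive (same URL as the StorageManager snippet above).
dataset_path = StorageManager().get_local_copy(
    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
)

# Create a new dataset version, register the downloaded files, then upload and close it.
dataset = Dataset.create(dataset_name="cifar_dataset", dataset_project="dataset_examples")
dataset.add_files(path=dataset_path)
dataset.upload()
dataset.finalize()
```

Once finalized, this is the version that `Dataset.get(dataset_name=..., dataset_project=...)` resolves in the ingestion script.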
diff --git a/docs/clearml_data/data_management_examples/data_man_simple.md b/docs/clearml_data/data_management_examples/data_man_simple.md
index cb6cd233..ca10168d 100644
--- a/docs/clearml_data/data_management_examples/data_man_simple.md
+++ b/docs/clearml_data/data_management_examples/data_man_simple.md
@@ -5,7 +5,7 @@ title: Data Management from CLI
 
 In this example we'll create a simple dataset and demonstrate basic actions on it, using the `clearml-data` CLI.
 
 ## Prerequisites
-1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
+1. First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
    the needed files.
 1. Open terminal and change directory to the cloned repository's examples folder
@@ -89,13 +89,13 @@ clearml-data - Dataset Management & Versioning CLI
 List dataset content: 24d05040f3e14fbfbed8edb1bf08a88c
 Listing dataset content
-file name                                                               | size      | hash
-------------------------------------------------------------------------------------------------------------------------------------------------
-dancing.jpg                                                             | 40,484    | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
-data.csv                                                                | 21,440    | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
-picasso.jpg                                                             | 114,573   | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
-sample.json                                                             | 132       | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
-sample.mp3                                                              | 72,142    | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
+file name   | size    | hash
+-----------------------------------------------------------------------------------------------------------------
+dancing.jpg | 40,484  | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
+data.csv    | 21,440  | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
+picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
+sample.json | 132     | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
+sample.mp3  | 72,142  | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
 
 Total 5 files, 248771 bytes
 ```
diff --git a/docs/clearml_data/data_management_examples/workflows.md b/docs/clearml_data/data_management_examples/workflows.md
index d862a287..a0f3fa3f 100644
--- a/docs/clearml_data/data_management_examples/workflows.md
+++ b/docs/clearml_data/data_management_examples/workflows.md
@@ -4,9 +4,9 @@ title: Workflows
 
 Take a look at the ClearML Data examples which demonstrate common workflows using the `clearml-data` CLI and the
 `Dataset` class:
 
-* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming dataset with CLI
+* [Dataset Management with CLI](data_man_simple.md) - Tutorial for creating, modifying, and consuming a dataset with the CLI.
 * [Folder Sync with CLI](data_man_folder_sync.md) - Tutorial for using `clearml-data sync` CLI option to update a dataset according to a local folder.
 * [Dataset Management with CLI and SDK](data_man_cifar_classification.md) - Tutorial for creating a dataset with the CLI
-  then programmatically ingesting the data with the SDK
+  then programmatically ingesting the data with the SDK.
 * [Data Management with Python](data_man_python.md) - Example scripts for creating and consuming a dataset with the SDK.
diff --git a/docs/getting_started/mlops/mlops_best_practices.md b/docs/getting_started/mlops/mlops_best_practices.md
index e1aea17a..badd92d7 100644
--- a/docs/getting_started/mlops/mlops_best_practices.md
+++ b/docs/getting_started/mlops/mlops_best_practices.md
@@ -30,8 +30,8 @@ Once we have a Task in ClearML, we can clone and edit its definitions in the UI,
 
 ## Manage Your Data
 Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
-Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder/S3/Gs/Azure)
-ClearML Data supports efficient Dataset storage and caching, differentiable & compressed
+Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder/S3/Gs/Azure).
+ClearML Data supports efficient Dataset storage and caching, differentiable & compressed.
 
 ## Scale Your Work
 Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent machines (Remote or local) and manage
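The `mlops_best_practices.md` hunk above recommends versioning your data and linking it to running experiments for easy reproduction. One possible way to record that link, sketched with hypothetical project, task, and dataset names (none of them come from the diff), assuming a configured ClearML server:

```python
from clearml import Dataset, Task

# Hypothetical experiment used only for illustration.
task = Task.init(project_name="examples", task_name="train_with_versioned_data")

# Resolve the dataset version this run will use.
dataset = Dataset.get(dataset_name="cifar_dataset", dataset_project="dataset_examples")

# Store the dataset ID on the task so the exact data version is recorded with the run.
task.connect({"dataset_id": dataset.id})

# Fetch a machine-agnostic local copy for training.
data_dir = dataset.get_local_copy()
```

Because the dataset ID travels with the experiment, the same data version can be looked up and fetched again when the task is cloned or reproduced.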
diff --git a/docs/getting_started/mlops/mlops_first_steps.md b/docs/getting_started/mlops/mlops_first_steps.md
index 623dbcb8..31ba8f81 100644
--- a/docs/getting_started/mlops/mlops_first_steps.md
+++ b/docs/getting_started/mlops/mlops_first_steps.md
@@ -121,7 +121,12 @@ Log as many metrics from your processes! It improves visibility on their progres
 Use the Logger class from to report scalars and plots.
 ```python
 from clearml import Logger
-Logger.current_logger().report_scalar(graph='metric', series='variant', value=13.37, iteration=counter)
+Logger.current_logger().report_scalar(
+    title='metric',
+    series='variant',
+    value=13.37,
+    iteration=counter
+)
 ```
 
 You can later analyze reported scalars
@@ -139,7 +144,11 @@ You can also search and query Tasks in the system.
 Use the `Task.get_tasks` call to retrieve Tasks objects and filter based on the specific values of the Task - status, parameters, metrics and more!
 ```python
 from clearml import Task
-tasks = Task.get_tasks(project_name='examples', task_name='partial_name_match', task_filter={'status': 'in_progress'})
+tasks = Task.get_tasks(
+    project_name='examples',
+    task_name='partial_name_match',
+    task_filter={'status': 'in_progress'}
+)
 ```
 
 ## Manage Your Data
diff --git a/docs/guides/datasets/data_man_python.md b/docs/guides/datasets/data_man_python.md
index 2766a10f..c7851236 100644
--- a/docs/guides/datasets/data_man_python.md
+++ b/docs/guides/datasets/data_man_python.md
@@ -83,12 +83,19 @@ demonstrates data ingestion using the dataset created in the first script.
 dataset_name = "cifar_dataset"
 dataset_project = "dataset_examples"
 
-dataset_path = Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_local_copy()
+dataset_path = Dataset.get(
+    dataset_name=dataset_name,
+    dataset_project=dataset_project
+).get_local_copy()
 ```
 
 The script above gets the dataset and uses the [`Dataset.get_local_copy`](../../references/sdk/dataset.md#get_local_copy)
-method to return a path to the cached, read-only local dataset. If you need a modifiable copy of the dataset,
-use `Dataset.get(dataset_name, dataset_project).get_mutable_local_copy(path/to/download)`
+method to return a path to the cached, read-only local dataset.
+
+If you need a modifiable copy of the dataset, use the following code:
+```python
+Dataset.get(dataset_name=dataset_name, dataset_project=dataset_project).get_mutable_local_copy("path/to/download")
+```
 
 The script then creates a neural network to train a model to classify images from the dataset that was created above.
\ No newline at end of file
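The `mlops_first_steps.md` hunks above present scalar reporting and task querying as separate snippets. A short sketch that combines them end to end — the project name, task name, metric title, and values are illustrative assumptions, and a configured ClearML server is required:

```python
from clearml import Logger, Task

# Hypothetical experiment that reports one scalar series over a few iterations.
task = Task.init(project_name="examples", task_name="scalar_demo")
logger = Logger.current_logger()
for counter in range(10):
    logger.report_scalar(title="metric", series="variant", value=13.37 - counter, iteration=counter)
task.close()

# Later: query matching tasks and inspect their last reported scalar values.
tasks = Task.get_tasks(project_name="examples", task_name="scalar_demo")
for t in tasks:
    print(t.id, t.get_last_scalar_metrics().get("metric", {}))
```

Filtering by status, as in the `task_filter` example above, narrows the query to, for instance, runs that are still in progress.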