mirror of https://github.com/clearml/clearml-docs
synced 2025-02-07 13:21:46 +00:00
Data storage clarification (#298)
parent 5476908523
commit 03b8862d70
@@ -17,14 +17,17 @@ ClearML Data Management solves two important challenges:
 **We believe Data is not code**. It should not be stored in a git tree, because progress on datasets is not always linear.
 Moreover, it can be difficult and inefficient to find on a git tree the commit associated with a certain version of a dataset.
 
-A `clearml-data` dataset is a collection of files, stored on a central storage location (S3 / GS / Azure / Network Storage).
-Datasets can be set up to inherit from other datasets, so data lineages can be created,
-and users can track when and how their data changes.
+Use ClearML Data to create, manage, and version your datasets. Store your files in any storage location of your choice
+(S3 / GS / Azure / Network Storage) by setting the dataset’s upload destination (see [`--storage`](clearml_data_cli.md#upload)
+CLI option or [`output_url`](clearml_data_sdk.md#uploading-files) parameter).
 
-Dataset changes are stored using differentiable storage, meaning a version will store the change-set from its previous dataset parents.
+Datasets can be set up to inherit from other datasets, so data lineages can be created, and users can track when and how
+their data changes. Dataset changes are stored using differentiable storage, meaning a version will store the change-set
+from its previous dataset parents.
 
-Local copies of datasets are always cached, so the same data never needs to be downloaded twice.
-When a dataset is pulled it will automatically pull all parent datasets and merge them into one output folder for you to work with.
+You can get a local copy of your dataset on any machine. Local copies of datasets are always cached, so the same data
+never needs to be downloaded twice. When a dataset is pulled it will automatically pull all parent datasets and merge
+them into one output folder for you to work with.
 
 The [Dataset Versions](../webapp/datasets/webapp_dataset_viewing.md) page in the web UI displays dataset versions'
 lineage and content information. See [dataset UI](../webapp/datasets/webapp_dataset_page.md) for more details.
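The change-set storage and parent merging described in this hunk can be pictured with a small conceptual sketch. It is not ClearML's implementation: the names `changeset` and `materialize` are hypothetical, and each dataset version is modeled, as an assumption, by a plain dict mapping file paths to content hashes.

```python
from typing import Dict, List, Optional

Manifest = Dict[str, str]              # relative path -> content hash
ChangeSet = Dict[str, Optional[str]]   # path -> new hash, or None if removed


def changeset(parent: Manifest, child: Manifest) -> ChangeSet:
    """Return only what differs between a parent version and its child."""
    delta: ChangeSet = {}
    for path, digest in child.items():
        if parent.get(path) != digest:
            delta[path] = digest       # file added or modified in the child
    for path in parent:
        if path not in child:
            delta[path] = None         # file removed in the child
    return delta


def materialize(chain: List[ChangeSet]) -> Manifest:
    """Merge change-sets, oldest ancestor first, into one complete view."""
    merged: Manifest = {}
    for delta in chain:
        for path, digest in delta.items():
            if digest is None:
                merged.pop(path, None)
            else:
                merged[path] = digest
    return merged


# Hypothetical versions: v2 modifies one file and adds another.
v1 = {"train.csv": "a1", "labels.csv": "b1"}
v2 = {"train.csv": "a2", "labels.csv": "b1", "extra.csv": "c1"}

delta_v2 = changeset(v1, v2)                           # what v2 would store
full_v2 = materialize([changeset({}, v1), delta_v2])   # what a pull would yield
```

Here the child version records only the modified and added files, and replaying the chain from the root reproduces the full contents of the newest version, which mirrors the "pull all parent datasets and merge them" behavior the new text describes.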
@@ -7,7 +7,10 @@ This page covers `clearml-data`, ClearML's file-based data management solution.
 See [Hyper-Datasets](../hyperdatasets/overview.md) for ClearML's advanced queryable dataset management solution.
 :::
 
-The `clearml-data` utility is a CLI tool for controlling and managing your data with ClearML.
+`clearml-data` is a data management CLI tool that comes as part of the `clearml` python package. Use `clearml-data` to
+create, modify, and manage your datasets. You can upload your dataset to any storage service of your choice (S3 / GS /
+Azure / Network Storage) by setting the dataset’s upload destination (see [`--storage`](#upload)). Once you have uploaded
+your dataset, you can access it from any machine.
 
 The following page provides a reference to `clearml-data`'s CLI commands.
 
@@ -7,8 +7,12 @@ This page covers `clearml-data`, ClearML's file-based data management solution.
 See [Hyper-Datasets](../hyperdatasets/overview.md) for ClearML's advanced queryable dataset management solution.
 :::
 
-Datasets can be created, modified, and managed with ClearML Data's python interface. The following page provides an overview
-for using the most basic methods of the `Dataset` class. See the [Dataset reference page](../references/sdk/dataset.md)
+Datasets can be created, modified, and managed with ClearML Data's python interface. You can upload your dataset to any
+storage service of your choice (S3 / GS / Azure / Network Storage) by setting the dataset’s upload destination (see
+[`output_url`](#uploading-files) parameter of `Dataset.upload` method). Once you have uploaded your dataset, you can access
+it from any machine.
+
+The following page provides an overview for using the most basic methods of the `Dataset` class. See the [Dataset reference page](../references/sdk/dataset.md)
 for a complete list of available methods.
 
 Import the `Dataset` class, and let's get started!
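For readers of this commit, the workflow the new SDK text describes (set an upload destination through `output_url`, then fetch the dataset from any machine) might look roughly like the sketch below. It assumes the `clearml` package is installed and configured against a ClearML server. The helper names `upload_new_version` and `fetch_local_copy`, the dataset and project names, and all argument values are hypothetical and are not part of this commit.

```python
def upload_new_version(parent_id: str, local_dir: str, storage_url: str) -> str:
    """Create a child dataset version and upload its files to storage_url."""
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import Dataset

    child = Dataset.create(
        dataset_name="example-dataset",      # hypothetical name
        dataset_project="example-project",   # hypothetical project
        parent_datasets=[parent_id],         # inherit from an existing version
    )
    child.add_files(path=local_dir)
    # output_url sets the upload destination, e.g. an S3 / GS / Azure URI
    # or a network storage path, as the documentation text describes.
    child.upload(output_url=storage_url)
    child.finalize()
    return child.id


def fetch_local_copy(dataset_id: str) -> str:
    """Return the path of a cached local copy, merged with parent versions."""
    from clearml import Dataset

    return Dataset.get(dataset_id=dataset_id).get_local_copy()
```

Calling `fetch_local_copy` on another machine with the returned ID would be one way to exercise the "access it from any machine" claim. The storage URL scheme and the required credentials depend on the local ClearML configuration.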