---
title: Hyper-Datasets
---
ClearML's Hyper-Datasets are an MLOps-oriented abstraction of your data that facilitates traceable, reproducible model
development through parameterized data access and metadata version control.

The basic premise is that a user-formed query is a full representation of the dataset used by the ML/DL process.

ClearML Enterprise's Hyper-Datasets support rapid prototyping, creating new opportunities such as:
* Hyperparameter optimization of the data itself
* QA/QC pipelining
* CD/CT (continuous training) during deployment
* Complex applications such as collaborative (federated) learning
## Hyper-Dataset Components
A Hyper-Dataset is composed of the following components:
* [Frames](frames.md)
* [SingleFrames](single_frames.md)
* [FrameGroups](frame_groups.md)
* [Datasets and Dataset Versions](dataset.md)
* [Dataviews](dataviews.md)

These components interact so that data can be revised while every version of it remains tracked and accessible.

Frames are the basic units of data in ClearML Enterprise. SingleFrames and FrameGroups make up a Dataset version.
Dataset versions can be created, modified, and removed. Each version is recorded and remains available,
so experiments and their data are reproducible and traceable.

Lastly, Dataviews manage views of the dataset with queries, so the input data to an experiment can be defined from a
subset of a Dataset or combinations of Datasets.
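
The relationship between these components can be illustrated with a small, self-contained Python model. This is only a conceptual sketch and does not use the ClearML Enterprise SDK: the class names (`ToyFrame`, `ToyDataset`, `ToyDataview`), their methods, and the `weather` metadata key are all hypothetical and exist purely to show how a query over named dataset versions can fully describe an experiment's input data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a frame: a data source plus its metadata."""
    source: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyDataset:
    """Hypothetical dataset that keeps every committed version by name."""
    name: str
    versions: Dict[str, Tuple[ToyFrame, ...]] = field(default_factory=dict)

    def commit(self, version: str, frames: List[ToyFrame]) -> None:
        # Existing versions are never overwritten, so old experiments
        # can always be re-run against the exact data they used.
        if version in self.versions:
            raise ValueError(f"version {version!r} already exists")
        self.versions[version] = tuple(frames)


@dataclass
class ToyDataview:
    """Hypothetical dataview: a list of (dataset, version, filter) queries."""
    queries: List[Tuple[ToyDataset, str, Callable[[ToyFrame], bool]]] = field(
        default_factory=list
    )

    def select(self, dataset: ToyDataset, version: str,
               where: Callable[[ToyFrame], bool] = lambda frame: True) -> None:
        self.queries.append((dataset, version, where))

    def frames(self) -> List[ToyFrame]:
        # Resolve every query against its pinned dataset version.
        result: List[ToyFrame] = []
        for dataset, version, where in self.queries:
            result.extend(f for f in dataset.versions[version] if where(f))
        return result


street = ToyDataset("street")
street.commit("v1", [ToyFrame("a.jpg", {"weather": "sunny"}),
                     ToyFrame("b.jpg", {"weather": "rain"})])
street.commit("v2", [ToyFrame("a.jpg", {"weather": "sunny"}),
                     ToyFrame("b.jpg", {"weather": "rain"}),
                     ToyFrame("c.jpg", {"weather": "rain"})])

indoor = ToyDataset("indoor")
indoor.commit("v1", [ToyFrame("d.jpg")])

# The view combines a filtered subset of one dataset with all of another.
view = ToyDataview()
view.select(street, "v2", where=lambda f: f.metadata.get("weather") == "rain")
view.select(indoor, "v1")

sources = [f.source for f in view.frames()]
print(sources)  # ['b.jpg', 'c.jpg', 'd.jpg']
```

Because the view stores only dataset names, version names, and filters, re-resolving it later yields the same frames even after `street` gains further versions, which is the sense in which the query itself represents the dataset.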