---
title: Hyper-Datasets
---

ClearML's Hyper-Datasets are an MLOps-oriented abstraction of your data, which facilitates traceable, reproducible model development
through parameterized data access and metadata version control.

The basic premise is that a user-formed query is a full representation of the dataset used by the ML/DL process.

ClearML Enterprise's Hyper-Datasets support rapid prototyping, creating new opportunities such as:
* Hyperparameter optimization of the data itself
* QA/QC pipelining
* CD/CT (continuous training) during deployment
* Enabling complex applications like collaborative (federated) learning

## Hyper-Dataset Components

A Hyper-Dataset is composed of the following components:

* [Frames](frames.md)
  * [SingleFrames](single_frames.md)
  * [FrameGroups](frame_groups.md)
* [Datasets and Dataset Versions](dataset.md)
* [Dataviews](dataviews.md)

Together, these components make it possible to revise data while tracking and accessing all of its versions.

Frames are the basic units of data in ClearML Enterprise. SingleFrames and FrameGroups make up a Dataset version.
Dataset versions can be created, modified, and removed. The different versions are recorded and available,
so experiments and their data are reproducible and traceable.

Lastly, Dataviews manage views of the dataset with queries, so the input data to an experiment can be defined from a
subset of a Dataset or combinations of Datasets.
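The relationship between frames, Dataset versions, and Dataviews can be illustrated with a small conceptual model. The sketch below is plain Python and does not use the ClearML Enterprise SDK: the `Frame`, `DatasetVersion`, and `Query` classes, the `resolve` function, and all dataset, version, and label names are hypothetical and exist only for illustration. It assumes a query selects frames from one Dataset version by a single label.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a frame: a data source plus its labels."""
    source: str
    labels: frozenset


@dataclass
class DatasetVersion:
    """Hypothetical stand-in for one recorded version of a Dataset."""
    dataset: str
    name: str
    frames: list = field(default_factory=list)


@dataclass(frozen=True)
class Query:
    """Selects the frames of one Dataset version that carry a given label."""
    dataset: str
    version: str
    label: str


def resolve(queries, versions):
    """Return the frames selected by the queries, in query order."""
    index = {(v.dataset, v.name): v for v in versions}
    selected = []
    for q in queries:
        version = index[(q.dataset, q.version)]
        selected.extend(f for f in version.frames if q.label in f.labels)
    return selected


# Two versions of one dataset; v2 adds a frame and v1 stays available unchanged.
v1 = DatasetVersion("street_scenes", "v1", [
    Frame("img_001.jpg", frozenset({"car"})),
    Frame("img_002.jpg", frozenset({"person"})),
])
v2 = DatasetVersion("street_scenes", "v2", v1.frames + [
    Frame("img_003.jpg", frozenset({"car", "person"})),
])
signs = DatasetVersion("road_signs", "v1", [
    Frame("sign_001.jpg", frozenset({"stop_sign"})),
])

# The list of queries plays the role of a Dataview: it fully describes the
# experiment's input as a subset of one dataset combined with another.
dataview = [
    Query("street_scenes", "v2", "car"),
    Query("road_signs", "v1", "stop_sign"),
]
frames = resolve(dataview, [v1, v2, signs])
print([f.source for f in frames])
```

In this model, changing the experiment's input data means changing the queries rather than copying files, which is why the query itself can be recorded and replayed alongside the experiment.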