clearml-docs/overview.md at 5a65a2654b1d6687278486efddf423f3e722409c

mirror of https://github.com/clearml/clearml-docs synced 2025-01-31 14:37:18 +00:00

2021-12-27 10:41:43 +02:00

title
Hyper-Datasets

ClearML's Hyper-Datasets are an MLOps-oriented abstraction of your data that facilitates traceable, reproducible model development through parametrized data access and metadata version control.

The basic premise is that a user-formed query is a full representation of the dataset used by the ML/DL process.
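To make this premise concrete, here is a minimal sketch in plain Python. It is not the ClearML Enterprise SDK: `FrameQuery`, `select`, the `FRAMES` list, and the metadata fields are all hypothetical names chosen for illustration. Under the assumption that the stored dataset version does not change, it shows that logging only the query parameters is enough to re-derive the same selection of data later.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical stand-in for a stored dataset version: each frame is a dict of metadata.
FRAMES = [
    {"id": "f1", "label": "cat", "source": "cam_a"},
    {"id": "f2", "label": "dog", "source": "cam_a"},
    {"id": "f3", "label": "cat", "source": "cam_b"},
]


@dataclass(frozen=True)
class FrameQuery:
    """Hypothetical query: its parameters alone describe the data an experiment sees."""
    dataset: str
    version: str
    label: str

    def select(self, frames):
        # Keep only the IDs of frames whose label matches the query parameter.
        return [f["id"] for f in frames if f["label"] == self.label]


query = FrameQuery(dataset="animals", version="v1", label="cat")

# Logging the query as JSON records which data was used, without copying the data.
logged = json.dumps(asdict(query), sort_keys=True)

# Rebuilding the query from the log and re-running it yields the same selection.
restored = FrameQuery(**json.loads(logged))
first_selection = query.select(FRAMES)
second_selection = restored.select(FRAMES)
```

Because the query is plain data, it can be stored next to an experiment's other parameters and varied in the same way.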

ClearML Enterprise's Hyper-Datasets support rapid prototyping, creating new opportunities such as:

Hyperparameter optimization of the data itself
QA/QC pipelining
CD/CT (continuous training) during deployment
Complex applications such as collaborative (federated) learning

Hyper-Dataset Components

A Hyper-Dataset is composed of the following components:

Frames (SingleFrames and FrameGroups)
Datasets and Dataset versions
Dataviews

These components interact in a way that enables revising data and tracking and accessing all of its versions.

Frames are the basic units of data in ClearML Enterprise. SingleFrames and FrameGroups make up a Dataset version. Dataset versions can be created, modified, and removed. The different versions are recorded and available, so experiments and their data are reproducible and traceable.

Lastly, Dataviews manage views of the dataset with queries, so the input data to an experiment can be defined from a subset of a Dataset or combinations of Datasets.
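The relationship between these components can be sketched in plain Python. This is an illustrative model only, not the ClearML Enterprise SDK: the classes `Frame`, `Dataset`, and `Dataview`, and the methods `commit`, `add_query`, and `frames`, are hypothetical and written for this example. FrameGroups are left out for brevity. The sketch assumes each version is stored as an unchanging snapshot and that a view is a list of (dataset, version, filter) queries.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical basic unit of data: an ID plus free-form metadata."""
    frame_id: str
    metadata: dict = field(default_factory=dict)


class Dataset:
    """Hypothetical dataset holding named version snapshots."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # version name -> tuple of frames

    def commit(self, version, frames):
        # Refuse to overwrite an existing version so earlier experiments stay reproducible.
        if version in self.versions:
            raise ValueError(f"version {version!r} already exists")
        self.versions[version] = tuple(frames)


class Dataview:
    """Hypothetical view: a list of queries over dataset versions."""

    def __init__(self):
        self.queries = []

    def add_query(self, dataset, version, predicate):
        self.queries.append((dataset, version, predicate))
        return self

    def frames(self):
        # Concatenate the frames matched by each query, in query order.
        selected = []
        for dataset, version, predicate in self.queries:
            selected.extend(f for f in dataset.versions[version] if predicate(f))
        return selected


street = Dataset("street")
street.commit("v1", [Frame("s1", {"label": "car"}), Frame("s2", {"label": "person"})])
# A new version extends the old one; "v1" itself is left untouched.
street.commit("v2", list(street.versions["v1"]) + [Frame("s3", {"label": "car"})])

indoor = Dataset("indoor")
indoor.commit("v1", [Frame("i1", {"label": "person"})])

# One view combines a subset of one dataset with a subset of another.
view = Dataview()
view.add_query(street, "v1", lambda f: f.metadata["label"] == "car")
view.add_query(indoor, "v1", lambda f: f.metadata["label"] == "person")
selected_ids = [f.frame_id for f in view.frames()]
```

Pinning each query to a named version is what keeps the experiment's input stable even after `street` gains a newer version.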
