Update docs (#153)

* add info for clear filter, float behavior, and tag exclusion

* add data examples to guides

* fix typo

* fix filtering wording, and float admonition title

* add abort all children action

* dataset metadata initial

* fix expand icon

* initial new exp comparison window

* initial scalar full screen

* fix typo

* edit dataset version metadata

* edit exp comparison and add missing alts to icons

* add info about dataset-level datasets

* add info about full screen mode

* HOCON >>> JSON

* HOCON >>> JSON

* dataset card edits

* edit fullscreen scalar mode

* edit fullscreen scalar plots

* edit pipeline example based on code fixes

* full screen scalar edit

* add context navigation

* Add experiment selection info

* experiment selection edit

* custom ui plugin

* datasets and versioning

* fix link
This commit is contained in:
pollfly
2022-01-10 11:45:29 +02:00
committed by GitHub
parent 8339e4902f
commit eb91aaa361
14 changed files with 416 additions and 140 deletions


@@ -9,25 +9,11 @@ and functionality for the following purposes:
* Integrating the powerful features of [Dataviews](dataviews.md) with an experiment
* [Annotating](webapp/webapp_datasets_frames.md#annotations) images and videos
Datasets consist of versions with SingleFrames and / or FrameGroups. Each Dataset can contain multiple versions, where
each version can have multiple children that inherit their parent's SingleFrames and / or FrameGroups. This inheritance
includes the frame metadata and data connecting the source data to the ClearML Enterprise platform, as well as the other
metadata and data.
Datasets consist of versions with SingleFrames and / or FrameGroups. Each Dataset can contain multiple versions, which
can have multiple children that inherit their parent's contents.
These parent-child version relationships can be represented as version trees with a root-level parent. A Dataset
can contain one or more trees.
Mask-labels can be defined globally, for a DatasetVersion, which will be applied to all masks in that version.
## Dataset Version State
Dataset versions can have either **Draft** or **Published** status.
A **Draft** version is editable, so frames can be added to and deleted and / or modified from the Dataset.
A **Published** version is read-only, which ensures reproducible experiments and preserves a version of a Dataset.
Child versions can only be created from *Published* versions. To create a child of a *Draft* Dataset version,
it must be published first.
Mask-labels can be defined globally, for a DatasetVersion. When defined this way, they will be applied to all masks in
that version.
## Example Datasets
@@ -123,19 +109,31 @@ Dataset.delete(dataset_name='MyDataset', delete_all_versions=True, force=True)
Dataset versioning refers to the group of ClearML Enterprise SDK and WebApp (UI) features for creating,
modifying, and deleting Dataset versions.
ClearML Enterprise supports simple and sophisticated Dataset versioning, including **simple version structures** and
**advanced version structures**.
ClearML Enterprise supports simple and advanced Dataset versioning paradigms. A **simple version structure** consists of
a single evolving version, with historic static snapshots. Continuously push your changes to your single dataset version,
and take a snapshot to record the content of your dataset at a specific point in time.
In a **simple version structure**, a parent can have one and only one child, and the last child in the Dataset versions
tree must be a *Draft*. This simple structure allows working with a single set of versions of a Dataset. Create children
and publish versions to preserve data history. Each version whose status is *Published* in a simple version structure is
referred to as a **snapshot**.
You can, alternatively, employ any **advanced structure**, where each version evolves in parallel, and you control which
versions are locked for further changes and which can be modified. See details [below](#dataset-version-structure).
In an **advanced version structure**, at least one parent has more than one child (this can include more than one parent
version at the root level), or the last child in the Dataset versions tree is *Published*.
## Dataset Version State
Dataset versions can have either *Draft* or *Published* state.
A *Draft* version is editable, so frames can be added, modified, and / or deleted.
A *Published* version is read-only, which ensures reproducible experiments and preserves the Dataset version contents.
Child versions can only be created from *Published* versions, as they inherit their predecessor version contents.
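The state rules above can be illustrated with a small, self-contained sketch. This is a toy model only: the `Version` class and its methods are hypothetical names invented for illustration and are not part of the ClearML Enterprise SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    """Toy stand-in for a Dataset version (hypothetical, not the ClearML SDK)."""
    name: str
    parent: Optional["Version"] = None
    published: bool = False
    frames: List[str] = field(default_factory=list)

    def add_frame(self, frame: str) -> None:
        # Only Draft versions are editable.
        if self.published:
            raise ValueError(f"{self.name} is Published and read-only")
        self.frames.append(frame)

    def publish(self) -> None:
        # Publishing locks the version contents.
        self.published = True

    def create_child(self, name: str) -> "Version":
        # Children may only be created from Published versions.
        if not self.published:
            raise ValueError(f"publish {self.name} before creating a child")
        # The child starts as a Draft with a copy of its parent's contents.
        return Version(name=name, parent=self, frames=list(self.frames))
```

In this sketch, calling `create_child()` on a Draft raises an error, and a child created after `publish()` starts with its parent's frames while the parent itself stays unchanged.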
## Dataset Version Structure
To implement a simple version structure, where the dataset is ever evolving, with a linear set of historic snapshots,
a parent version can have one and only one child, with the last child in the Dataset versions tree in *Draft* state.
Any other version structure, such as one where at least one parent has more than one child, or where the last child in
the Dataset versions tree is *Published*, is considered an advanced version structure.
For details about programmatically implementing simple and advanced version structures, see [Creating Snapshots](#creating-snapshots)
and [Creating Child Versions](#creating-child-versions) respectively below.
Creating a version in a simple version structure may convert it to an advanced structure. This happens when creating
a Dataset version that yields a parent with two children, or when publishing the last child version.
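As a rough illustration of the distinction described above, the following sketch classifies a version tree as simple or advanced. The table layout and the `classify_structure` function are assumptions made for this example; they do not correspond to any ClearML Enterprise SDK call.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: version name -> (parent name, or None for a root; state)
VersionTable = Dict[str, Tuple[Optional[str], str]]


def classify_structure(versions: VersionTable) -> str:
    """Return 'simple' or 'advanced' following the rules described above."""
    child_count: Dict[Optional[str], int] = {}
    for parent, _state in versions.values():
        child_count[parent] = child_count.get(parent, 0) + 1

    # More than one child under any parent (the None key counts root-level versions).
    if any(count > 1 for count in child_count.values()):
        return "advanced"

    # In a single chain, the last child is the one version that has no children.
    leaves = [name for name in versions if name not in child_count]
    if any(versions[name][1] == "Published" for name in leaves):
        return "advanced"
    return "simple"
```

For example, a chain of two *Published* versions ending in a *Draft* is classified as simple. Publishing that last version, or adding a second child under any parent, makes the same table advanced.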
## DatasetVersion Usage