Update Hyper-Dataset mask information (#532)
@ -5,15 +5,14 @@ title: Datasets and Dataset Versions
|
||||
ClearML Enterprise's **Datasets** and **Dataset versions** provide the internal data structure
|
||||
and functionality for the following purposes:
|
||||
* Connecting source data to the ClearML Enterprise platform
|
||||
* Using ClearML Enterprise's GIT-like [Dataset versioning](#dataset-versioning)
|
||||
* Using ClearML Enterprise's Git-like [Dataset versioning](#dataset-versioning)
|
||||
* Integrating the powerful features of [Dataviews](dataviews.md) with an experiment
|
||||
* [Annotating](webapp/webapp_datasets_frames.md#annotations) images and videos
|
||||
|
||||
Datasets consist of versions with SingleFrames and/or FrameGroups. Each Dataset can contain multiple versions, which
|
||||
can have multiple children that inherit their parent's contents.
|
||||
|
||||
Mask-labels can be defined globally, for a DatasetVersion. When defined this way, they will be applied to all masks in
|
||||
that version.
|
||||
Mask-labels are defined at the DatasetVersion level, and are applied to all masks in a DatasetVersion.
|
||||
|
||||
## Example Datasets
|
||||
|
||||
|
@ -2,241 +2,152 @@
|
||||
title: Masks
|
||||
---
|
||||
|
||||
When applicable, [`sources`](sources.md) contains `masks`, a list of dictionaries used to connect a special type of
|
||||
source data to the ClearML Enterprise platform. That source data is a **mask**.
|
||||
Masks are source data used in deep learning for image segmentation. Mask URIs are a property of a SingleFrame.
|
||||
|
||||
Masks are used in deep learning for semantic segmentation.
|
||||
ClearML applies the masks in one of two modes:
|
||||
* [Pixel segmentation](#pixel-segmentation-masks) - Pixel RGB values are each mapped to segmentation labels.
|
||||
* [Alpha channel](#alpha-channel-masks) - Pixel RGB values are interpreted as opacity levels.
|
||||
|
||||
Masks correspond to raw data where the objects to be detected are marked with colors in the masks. The colors
|
||||
are RGB values and represent the objects that are labeled for segmentation.
|
||||
In the WebApp's [frame viewer](webapp/webapp_datasets_frames.md#frame-viewer), you can select how to apply a mask over
|
||||
a frame.
|
||||
|
||||
In frames used for semantic segmentation, the metadata connecting the mask files / images to the ClearML Enterprise platform,
|
||||
and the RGB values and labels used for segmentation are separate. They are contained in two different dictionaries of
|
||||
a SingleFrame:
|
||||
## Pixel Segmentation Masks
|
||||
For pixel segmentation, mask RGB pixel values are mapped to labels.
|
||||
|
||||
* **`masks`** (plural) is in [`sources`](sources.md) and contains the mask files / images `URI` (in addition to other keys
|
||||
and values).
|
||||
The mask-label mapping is defined at the dataset version level, through the `mask_labels` property in the version's metadata.
|
||||
|
||||
* **`mask`** (singular) is in the `rois` array of a Frame.
|
||||
|
||||
Each `rois` dictionary contains:
|
||||
`mask_labels` is a list of dictionaries, where each dictionary includes the following keys:
|
||||
* `value` - Mask's RGB pixel value
|
||||
* `labels` - Labels associated with the value
|
||||
|
||||
* RGB values and labels of a **mask** (in addition to other keys and values)
|
||||
To manage a dataset version's mask labels programmatically, see [Managing Version Mask Labels](dataset.md#managing-version-mask-labels).
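
As a rough illustration of the structure described above, the following sketch models `mask_labels` as a plain Python list and looks up the labels for a pixel value. It does not use the ClearML SDK: the `labels_for_pixel` helper is hypothetical, and storing `value` as a three-element list is an assumption.

```python
# Illustrative mask-label mapping, shaped like the `mask_labels` list described above.
mask_labels = [
    {"value": [0, 0, 0], "labels": ["background"]},
    {"value": [1, 1, 1], "labels": ["person", "sitting"]},
    {"value": [2, 2, 2], "labels": ["cat"]},
]


def labels_for_pixel(rgb, mapping):
    """Return the labels mapped to an RGB pixel value, or an empty list if unmapped."""
    for entry in mapping:
        if list(entry["value"]) == list(rgb):
            return entry["labels"]
    return []


print(labels_for_pixel((1, 1, 1), mask_labels))
print(labels_for_pixel((9, 9, 9), mask_labels))
```

A linear scan keeps the sketch close to the list-of-dictionaries layout; for large mappings, a dictionary keyed by the RGB tuple would be the more natural choice.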
|
||||
|
||||
* Metadata and data for the labeled area of an image
|
||||
|
||||
|
||||
See [Example 1](#example-1), which shows `masks` in `sources`, `mask` in `rois`, and the key-value pairs used to relate
|
||||
a mask to its source in a frame.
|
||||
In the UI, you can view the mapping in a dataset version's [Metadata](webapp/webapp_datasets_versioning.md#metadata) tab.
|
||||
|
||||

|
||||
|
||||
## Masks Structure
|
||||
When viewing a frame with a mask that corresponds to the version’s mask-label mapping, the UI arbitrarily assigns a color
|
||||
to each label. The color assignment can be [customized](webapp/webapp_datasets_frames.md#labels).
|
||||
|
||||
The chart below explains the keys and values of the `masks` dictionary (in the [`sources`](sources.md)
|
||||
section of a Frame).
|
||||
For example:
|
||||
* Original frame image:
|
||||
|
||||
|Key|Value Description|
|
||||
|---|----|
|
||||
|`id`|**Type**: integer. <ul><li> The ID is used to relate this mask data source to the `mask` dictionary containing the label and RGB value for the mask.</li><li> See the `mask` key in `rois`.</li></ul>|
|
||||
|`content_type`| **Type**: string. <ul><li> Type of mask data. For example, image / png or video / mp4.</li></ul>|
|
||||
|`timestamp`|**Type**: integer. <ul><li>For images from a video, indicates the absolute position of the frame from the source (video) </li><li> For still images, set this to 0 (for example, video from a camera on a car, at 30 frames per second, would have a timestamp of 0 for the first frame, and 33 for the second frame).</li></ul>|
|
||||
|`uri`|**Type**: string. <ul><li> URI of the mask file / image.</li></ul>|
|
||||

|
||||
|
||||
* Frame image with the semantic segmentation mask enabled. Labels are applied according to the dataset version’s
|
||||
mask-label mapping:
|
||||
|
||||
## Examples
|
||||
### Example 1
|
||||

|
||||
|
||||
This example demonstrates an original image, its masks, and its frame containing
|
||||
the `sources` and ROI metadata.
|
||||
The frame's `sources` array contains a `masks` list of dictionaries, which looks something like this:
|
||||
|
||||
<Collapsible type="info" title="Example 1: View the frame">
|
||||
This frame contains the `masks` list of dictionaries in `sources`,
|
||||
and the `rois` array, as well as several top-level key-value pairs.
|
||||
|
||||
|
||||
```json
|
||||
```editorconfig
|
||||
{
|
||||
"timestamp": 1234567889,
|
||||
"context_id": "car_1",
|
||||
"meta": {
|
||||
"velocity": "60"
|
||||
},
|
||||
"sources": [
|
||||
{
|
||||
"id": "front",
|
||||
"content_type": "video/mp4",
|
||||
"width": 800,
|
||||
"height": 600,
|
||||
"uri": "https://s3.amazonaws.com/my_cars/car_1/front.mp4",
|
||||
"timestamp": 1234567889,
|
||||
"meta" :{
|
||||
"angle":45,
|
||||
"fov":129
|
||||
},
|
||||
"masks": [
|
||||
{
|
||||
"id": "seg",
|
||||
"content_type": "video/mp4",
|
||||
"uri": "https://s3.amazonaws.com/seg_masks/car_1/front_seg.mp4",
|
||||
"timestamp": 123456789
|
||||
},
|
||||
{
|
||||
"id": "seg_instance",
|
||||
"content_type": "video/mp4",
|
||||
"uri": "https://s3.amazonaws.com/seg_masks/car_1/front_instance_seg.mp4",
|
||||
"timestamp": 123456789
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"rois": [
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["seg"],
|
||||
"mask": {
|
||||
"id": "car",
|
||||
"value": [210,210,120]
|
||||
}
|
||||
},
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["seg"],
|
||||
"mask": {
|
||||
"id": "person",
|
||||
"value": [147,44,209]
|
||||
}
|
||||
},
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["seg"],
|
||||
"mask": {
|
||||
"id": "road",
|
||||
"value": [197,135,146]
|
||||
}
|
||||
},
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["seg"],
|
||||
"mask": {
|
||||
"id": "street",
|
||||
"value": [135,198,145]
|
||||
}
|
||||
},
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["seg"],
|
||||
"mask": {
|
||||
"id": "building",
|
||||
"value": [72,191,65]
|
||||
}
|
||||
}
|
||||
]
|
||||
"id": "<framegroup_id>",
|
||||
"timestamp": "<timestamp>" ,
|
||||
"context_id": "car_1",
|
||||
"sources": [
|
||||
{
|
||||
"id": "<source_id>",
|
||||
"content_type": "<type>",
|
||||
"uri": "<image_uri>",
|
||||
"timestamp": 1234567889,
|
||||
...
|
||||
"masks": [
|
||||
{
|
||||
"id": "<mask_id>",
|
||||
"content_type": "video/mp4",
|
||||
"uri": "<mask_uri>",
|
||||
"timestamp": 123456789
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
</Collapsible>
|
||||
The `masks` dictionary includes the URI and ID of each of the frame's masks.
|
||||
|
||||
## Alpha Channel Masks
|
||||
In alpha channel mode, mask RGB pixel values are interpreted as opacity values, so that when the mask is applied, only the
|
||||
desired sections of the source are visible.
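
A minimal sketch of the idea, using plain nested lists instead of image files: each mask pixel is reduced to an opacity between 0 and 1 and multiplied into the matching source pixel. The `apply_alpha_mask` helper, the averaging of the three channels, and the black background are assumptions made for illustration; the WebApp's actual rendering may differ.

```python
def apply_alpha_mask(image, mask):
    """Scale each source pixel by the opacity derived from the matching mask pixel.

    `image` and `mask` are equally sized 2D lists of (R, G, B) tuples with 0-255 channels.
    Assumption: opacity is the mean of the mask's RGB channels divided by 255.
    """
    result = []
    for image_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, mask_pixel in zip(image_row, mask_row):
            opacity = sum(mask_pixel) / (3 * 255)
            out_row.append(tuple(round(channel * opacity) for channel in pixel))
        result.append(out_row)
    return result


image = [[(200, 100, 50), (10, 20, 30)]]
mask = [[(255, 255, 255), (0, 0, 0)]]  # first pixel fully visible, second hidden
print(apply_alpha_mask(image, mask))
```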
|
||||
|
||||
* In `sources`:
|
||||
* The source ID is `front`.
|
||||
* In the `masks` dictionary, the source contains mask sources with IDs of `seg` and `seg_instance`.
|
||||
* In `rois`:
|
||||
* Each ROI source is `front`, relating the ROI to its original source image.
|
||||
* Each ROI has a label of `seg`, indicating segmentation.
|
||||
* Each `mask` has an `id` (`car`, `person`, `road`, `street`, and `building`) and a unique RGB `value`
|
||||
(color-coding).
|
||||
|
||||
For example:
|
||||
* Original frame:
|
||||
|
||||

|
||||
|
||||
<Collapsible type="screenshot" title="Example image and masks">
|
||||
Original Image
|
||||
* Same frame with an alpha channel mask, emphasizing the troll doll:
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
Mask image
|
||||
The frame's `sources` array contains a `masks` list of dictionaries, which looks something like this:
|
||||
|
||||

|
||||
|
||||
</Collapsible>
|
||||
|
||||
### Example 2
|
||||
|
||||
This example shows two masks for video from a camera. The masks label cars and the road.
|
||||
|
||||
<Collapsible type="info" title="Example 2: View the frame">
|
||||
|
||||
```json
|
||||
"sources": [
|
||||
{
|
||||
"id": "front",
|
||||
"content_type": "video/mp4",
|
||||
"width": 800,
|
||||
"height": 600,
|
||||
"uri": "https://s3.amazonaws.com/my_cars/car_1/front.mp4",
|
||||
"timestamp": 1234567889,
|
||||
"meta" :{
|
||||
"angle":45,
|
||||
"fov":129
|
||||
},
|
||||
"masks": [
|
||||
{
|
||||
"id": "car",
|
||||
"content_type": "video/mp4",
|
||||
"uri": "https://s3.amazonaws.com/seg_masks/car_1/front_seg.mp4",
|
||||
"timestamp": 123456789
|
||||
},
|
||||
{
|
||||
"id": "road",
|
||||
"content_type": "video/mp4",
|
||||
"uri": "https://s3.amazonaws.com/seg_masks/car_1/front_instance_seg.mp4",
|
||||
"timestamp": 123456789
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"rois": [
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["right_lane"],
|
||||
"mask": {
|
||||
"id": "car",
|
||||
"value": [210,210,120]
|
||||
}
|
||||
},
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["right_lane"],
|
||||
"mask": {
|
||||
"id": "road",
|
||||
"value": [197,135,146]
|
||||
}
|
||||
}
|
||||
```editorconfig
|
||||
{
|
||||
"sources" : [
|
||||
{
|
||||
"id" : "321",
|
||||
"uri" : "https://i.ibb.co/bs7R9k6/troll.png",
|
||||
"masks" : [
|
||||
{
|
||||
"id" : "troll",
|
||||
"uri" : "https://i.ibb.co/TmJ3mvT/troll-alpha.png"
|
||||
}
|
||||
],
|
||||
"timestamp" : 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
</Collapsible>
|
||||
Note that for alpha channel masks, no labels are used.
|
||||
|
||||
* In `sources`:
|
||||
* The source ID is `front`.
|
||||
* The source contains mask sources with IDs of `car` and `road`.
|
||||
* In `rois`:
|
||||
* Each ROI source is `front` relating the ROI to its original source image.
|
||||
* Each ROI has a label of `right_lane` indicating the ROI object.
|
||||
* Each `mask` has an `id` (`car`, `person`) and a unique RGB `value` (color-coding).
|
||||
|
||||
## Usage
|
||||
|
||||
### Adding Mask Annotations
|
||||
|
||||
To add a mask annotation to a frame, use the [`SingleFrame.add_annotation`](../references/hyperdataset/singleframe.md#add_annotation).
|
||||
This method is generally used to add ROI annotations, but it can also be used to add frame specific mask labels. Input the
|
||||
mask value as a list with the RGB values in the `mask_rgb` parameter, and a list of labels in the `labels` parameter.
|
||||
### Registering Frames with a Mask
|
||||
To register a frame with a mask, create the frame and specify the URI of its mask file.
|
||||
|
||||
```python
|
||||
frame = SingleFrame(
|
||||
source='/home/user/woof_meow.jpg',
|
||||
preview_uri='https://storage.googleapis.com/kaggle-competitions/kaggle/3362/media/woof_meow.jpg',
|
||||
# create dataset version
|
||||
version = DatasetVersion.create_version(
|
||||
dataset_name="Example",
|
||||
version_name="Registering frame with mask"
|
||||
)
|
||||
|
||||
frame.add_annotation(mask_rgb=[0, 0, 0], labels=['cat'])
|
||||
|
||||
# create frame with mask
|
||||
frame = SingleFrame(
|
||||
source='https://s3.amazonaws.com/allegro-datasets/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png',
|
||||
mask_source='https://s3.amazonaws.com/allegro-datasets/cityscapes/gtFine_trainvaltest/gtFine/val/frankfurt/frankfurt_000000_000294_gtFine_labelIds.png'
|
||||
)
|
||||
|
||||
# add frame to version
|
||||
version.add_frames([frame])
|
||||
```
|
||||
|
||||
To use the mask for pixel segmentation, define the pixel-label mapping for the DatasetVersion:
|
||||
|
||||
```python
|
||||
version.set_masks_labels(
|
||||
{(0,0,0): ["background"], (1,1,1): ["person", "sitting"], (2,2,2): ["cat"]}
|
||||
)
|
||||
```
|
||||
|
||||
The relevant label is applied to all masks in the version according to the version’s mask-label mapping dictionary.
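
To make the effect of such a mapping concrete, the sketch below applies a tuple-keyed mapping to a tiny in-memory mask and collects the labels of each pixel. It is plain Python rather than the ClearML SDK: the `label_mask` helper and the sample mask are hypothetical, and treating unmapped pixel values as unlabeled is an assumption.

```python
mapping = {
    (0, 0, 0): ["background"],
    (1, 1, 1): ["person", "sitting"],
    (2, 2, 2): ["cat"],
}


def label_mask(mask, mapping):
    """Replace every RGB pixel in a 2D mask with the list of labels mapped to it."""
    return [[mapping.get(tuple(pixel), []) for pixel in row] for row in mask]


mask = [
    [(0, 0, 0), (2, 2, 2)],
    [(1, 1, 1), (7, 7, 7)],  # (7, 7, 7) is not in the mapping
]
for row in label_mask(mask, mapping):
    print(row)
```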
|
||||
|
||||
### Registering Frames with Multiple Masks
|
||||
Frames can contain multiple masks. To add multiple masks, use the `SingleFrame`'s `masks_source` property. Input one of
|
||||
the following:
|
||||
* A dictionary with mask string ID keys and mask URI values
|
||||
* A list of mask URIs. Numeric IDs are automatically assigned to the masks (`"00"`, `"01"`, etc.).
|
||||
|
||||
```python
|
||||
frame = SingleFrame(source='https://s3.amazonaws.com/allegro-datasets/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png')
|
||||
|
||||
# add multiple masks
|
||||
# with dictionary
|
||||
frame.masks_source = {"ID 1": "<mask_URI_1>", "ID 2": "<mask_URI_2>"}
|
||||
# with list
|
||||
frame.masks_source = ["<mask_URI_1>", "<mask_URI_2>"]
|
||||
```
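
The ID assignment for the list form can be pictured with the following sketch. The `normalize_masks_source` helper is hypothetical and not part of the SDK; two-digit zero padding is inferred from the `"00"`, `"01"` examples above and is an assumption beyond them.

```python
def normalize_masks_source(masks_source):
    """Return a {mask_id: uri} dict from either a dict or a list of mask URIs."""
    if isinstance(masks_source, dict):
        return dict(masks_source)
    # Assumption: list entries get zero-padded, two-digit IDs in list order.
    return {f"{index:02d}": uri for index, uri in enumerate(masks_source)}


print(normalize_masks_source(["<mask_URI_1>", "<mask_URI_2>"]))
print(normalize_masks_source({"ID 1": "<mask_URI_1>"}))
```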
|
||||
|
||||
|
@ -69,14 +69,6 @@ The following is an example of preview metadata.
|
||||
}
|
||||
],
|
||||
"rois": [
|
||||
{
|
||||
"sources":["front"],
|
||||
"label": ["right_lane"],
|
||||
"mask": {
|
||||
"id": "seg",
|
||||
"value": [-1, 1, 255]
|
||||
}
|
||||
},
|
||||
{
|
||||
"sources": ["front"],
|
||||
"label": ["bike"],
|
||||
|
@ -35,8 +35,8 @@ For more information, see [Annotations](annotations.md).
|
||||
|
||||
|
||||
### Masks
|
||||
A `SingleFrame` includes a URI link to a mask file if applicable. Masks correspond to raw data where the objects to be
|
||||
detected in raw data are marked with colors in the masks.
|
||||
A `SingleFrame` can include a URI link to a mask file, if applicable. Masks correspond to raw data where the objects to be
|
||||
detected are marked with colors or different opacity levels in the masks.
|
||||
|
||||
For more information, see [Masks](masks.md).
|
||||
|
||||
@ -100,7 +100,12 @@ The panel below describes the details contained within a `frame`:
|
||||
|
||||
* `id` - ID of the mask dictionary in `sources`.
|
||||
* `value` - RGB value of the mask.
|
||||
|
||||
|
||||
:::info
|
||||
The `mask` dictionary is deprecated. Mask labels and their associated pixel values are now stored in the dataset
|
||||
version’s metadata. See [Masks](masks.md).
|
||||
:::
|
||||
|
||||
* `poly` (*[int]*) - Bounding area vertices.
|
||||
* `sources` (*[string]*) - The `id` in the `sources` dictionary which relates an annotation to its raw data source.
|
||||
|
||||
@ -112,11 +117,11 @@ The panel below describes the details contained within a `frame`:
|
||||
* `uri` - URI of the raw data.
|
||||
* `width` - Width of the image or video.
|
||||
* `height` - Height of the image or video.
|
||||
* `mask` - Sources of masks used in the `rois`.
|
||||
* `masks` - List of available masks.
|
||||
|
||||
* `id` - ID of the mask source. This relates a mask source to an ROI.
|
||||
* `content_type` - The type of mask source. For example, `image/jpeg`.
|
||||
* `uri` - URI of the mask source.
|
||||
* `id` - Mask ID
|
||||
* `content_type` - Mask type. For example, `image/jpeg`.
|
||||
* `uri` - Mask URI
|
||||
* `timestamp`
|
||||
|
||||
* `preview` - URI of the thumbnail preview image used in the ClearML Enterprise WebApp (UI)
|
||||
|
@ -7,12 +7,9 @@ Each frame contains `sources`, a list of dictionaries containing:
|
||||
* A `URI` pointing to the source data (image or video)
|
||||
* Sources for [masks](masks.md) used in semantic segmentation
|
||||
* Image [previews](previews.md), which are thumbnails used in the ClearML Enterprise WebApp (UI).
|
||||
|
||||
`sources` does not contain:
|
||||
* `rois` even though ROIs are directly associated with the images and `masks` in `sources`
|
||||
* ROI metadata, because ROIs can be used over multiple frames.
|
||||
|
||||
Instead, frames contain a top-level `rois` array, which is a list of ROI dictionaries, where each dictionary contains a
|
||||
|
||||
`sources` does not contain ROI metadata, because ROIs can be used over multiple frames. Instead, frames contain a
|
||||
top-level `rois` array, which is a list of ROI dictionaries, where each dictionary contains a
|
||||
list of source IDs. Those IDs connect `sources` to ROIs.
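
The relationship can be sketched with an abbreviated frame dictionary, where each ROI's `sources` list is resolved against the `id` of the entries in the frame's `sources`. The `uris_for_roi` helper is hypothetical, and the frame shown here keeps only the keys needed for the lookup.

```python
# Abbreviated frame: only the keys needed to relate ROIs to their sources.
frame = {
    "sources": [
        {"id": "front", "uri": "https://s3.amazonaws.com/my_cars/car_1/front.mp4"},
    ],
    "rois": [
        {"sources": ["front"], "label": ["bike"]},
    ],
}


def uris_for_roi(roi, frame):
    """Return the URIs of the sources an ROI refers to, in the order listed by the ROI."""
    uri_by_id = {source["id"]: source["uri"] for source in frame["sources"]}
    return [uri_by_id[source_id] for source_id in roi["sources"]]


print(uris_for_roi(frame["rois"][0], frame))
```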
|
||||
|
||||
## Examples
|
||||
|
Before Width: | Height: | Size: 137 KiB |
Before Width: | Height: | Size: 1.1 MiB |
BIN
docs/img/hyperdatasets/dataset_alpha_masks_1.png
Normal file
After Width: | Height: | Size: 773 KiB |
BIN
docs/img/hyperdatasets/dataset_alpha_masks_2.png
Normal file
After Width: | Height: | Size: 517 KiB |
BIN
docs/img/hyperdatasets/dataset_metadata.png
Normal file
After Width: | Height: | Size: 35 KiB |
BIN
docs/img/hyperdatasets/dataset_pixel_masks_1.png
Normal file
After Width: | Height: | Size: 961 KiB |
BIN
docs/img/hyperdatasets/dataset_pixel_masks_2.png
Normal file
After Width: | Height: | Size: 165 KiB |