Update Hyper-Dataset mask information (#532)

This commit is contained in:
pollfly 2023-04-16 10:10:30 +03:00 committed by GitHub
parent ec39ad38cb
commit 360a042e79
12 changed files with 131 additions and 227 deletions


@@ -5,15 +5,14 @@ title: Datasets and Dataset Versions
ClearML Enterprise's **Datasets** and **Dataset versions** provide the internal data structure
and functionality for the following purposes:
* Connecting source data to the ClearML Enterprise platform
* Using ClearML Enterprise's Git-like [Dataset versioning](#dataset-versioning)
* Integrating the powerful features of [Dataviews](dataviews.md) with an experiment
* [Annotating](webapp/webapp_datasets_frames.md#annotations) images and videos
Datasets consist of versions with SingleFrames and/or FrameGroups. Each Dataset can contain multiple versions, which
can have multiple children that inherit their parent's contents.
Mask-labels are defined at the DatasetVersion level, and are applied to all masks in a DatasetVersion.
## Example Datasets


@@ -2,241 +2,152 @@
title: Masks
---
Masks are source data used in deep learning for image segmentation. Mask URIs are a property of a SingleFrame.
ClearML applies the masks in one of two modes:
* [Pixel segmentation](#pixel-segmentation-masks) - Pixel RGB values are each mapped to segmentation labels.
* [Alpha channel](#alpha-channel-masks) - Pixel RGB values are interpreted as opacity levels.
In the WebApp's [frame viewer](webapp/webapp_datasets_frames.md#frame-viewer), you can select how to apply a mask over
a frame.
## Pixel Segmentation Masks
For pixel segmentation, mask RGB pixel values are mapped to labels.
Mask-label mapping is defined at the dataset level, through the `mask_labels` property in a version's metadata.
`mask_labels` is a list of dictionaries, where each dictionary includes the following keys:
* `value` - Mask's RGB pixel value.
* `labels` - Labels associated with the value.
See how to manage dataset version mask labels pythonically [here](dataset.md#managing-version-mask-labels).
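As a concrete sketch of this mapping (the list-of-dictionaries layout follows the `value`/`labels` keys described above; the lookup helper is illustrative, not a ClearML API):

```python
# Hypothetical mask_labels metadata: each entry maps an RGB pixel
# `value` to one or more `labels`, as described above.
mask_labels = [
    {"value": [0, 0, 0], "labels": ["background"]},
    {"value": [1, 1, 1], "labels": ["person", "sitting"]},
    {"value": [2, 2, 2], "labels": ["cat"]},
]

def labels_for_pixel(rgb, mapping):
    """Return the labels mapped to an RGB pixel value, or [] if unmapped."""
    for entry in mapping:
        if entry["value"] == list(rgb):
            return entry["labels"]
    return []

print(labels_for_pixel((2, 2, 2), mask_labels))  # -> ['cat']
```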
In the UI, you can view the mapping in a dataset version's [Metadata](webapp/webapp_datasets_versioning.md#metadata) tab.
![Dataset metadata panel](../img/hyperdatasets/dataset_metadata.png)
When viewing a frame with a mask corresponding with the version's mask-label mapping, the UI arbitrarily assigns a color
to each label. The color assignment can be [customized](webapp/webapp_datasets_frames.md#labels).
For example:
* Original frame image:
![Frame without mask](../img/hyperdatasets/dataset_pixel_masks_1.png)
* Frame image with the semantic segmentation mask enabled. Labels are applied according to the dataset version's
mask-label mapping:
![Frame with semantic seg mask](../img/hyperdatasets/dataset_pixel_masks_2.png)
The frame's `sources` array contains a `masks` list of dictionaries that looks something like this:
```json
{
"id": "<framegroup_id>",
  "timestamp": "<timestamp>",
"context_id": "car_1",
"sources": [
{
"id": "<source_id>",
"content_type": "<type>",
"uri": "<image_uri>",
"timestamp": 1234567889,
...
"masks": [
{
"id": "<mask_id>",
"content_type": "video/mp4",
"uri": "<mask_uri>",
"timestamp": 123456789
}
]
}
]
}
```
The `masks` list includes the frame's mask URIs and IDs.
## Alpha Channel Masks
For alpha channel, mask RGB pixel values are interpreted as opacity values so that when the mask is applied, only the
desired sections of the source are visible.
For example:
* Original frame:
![Maskless frame](../img/hyperdatasets/dataset_alpha_masks_1.png)
* Same frame with an alpha channel mask, emphasizing the troll doll:
![Alpha mask frame](../img/hyperdatasets/dataset_alpha_masks_2.png)
The frame's `sources` array contains a `masks` list of dictionaries that looks something like this:
```json
{
  "sources" : [
    {
      "id" : "321",
      "uri" : "https://i.ibb.co/bs7R9k6/troll.png",
      "masks" : [
        {
          "id" : "troll",
          "uri" : "https://i.ibb.co/TmJ3mvT/troll-alpha.png"
        }
      ],
      "timestamp" : 0
    }
  ]
}
```
Note that for alpha channel masks, no labels are used.
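As a minimal sketch of the opacity interpretation (assuming per-pixel blending against a black background; the WebApp's actual rendering may differ):

```python
def apply_alpha_mask(src_rgb, mask_value):
    """Blend a source pixel against black, using mask_value (0-255) as opacity.

    255 keeps the pixel fully visible, 0 hides it entirely.
    """
    a = mask_value / 255.0
    return tuple(round(c * a) for c in src_rgb)

print(apply_alpha_mask((200, 50, 50), 255))  # -> (200, 50, 50): fully visible
print(apply_alpha_mask((200, 50, 50), 0))    # -> (0, 0, 0): masked out
```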
## Usage
### Register Frames with a Mask
To register frames with a mask, create a frame and specify the URI of the frame's mask file.
```python
# create dataset version
version = DatasetVersion.create_version(
dataset_name="Example",
version_name="Registering frame with mask"
)
# create frame with mask
frame = SingleFrame(
source='https://s3.amazonaws.com/allegro-datasets/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png',
mask_source='https://s3.amazonaws.com/allegro-datasets/cityscapes/gtFine_trainvaltest/gtFine/val/frankfurt/frankfurt_000000_000294_gtFine_labelIds.png'
)
# add frame to version
version.add_frames([frame])
```
To use the mask for pixel segmentation, define the pixel-label mapping for the DatasetVersion:
```python
version.set_masks_labels(
{(0,0,0): ["background"], (1,1,1): ["person", "sitting"], (2,2,2): ["cat"]}
)
```
The relevant label is applied to all masks in the version according to the version's mask-label mapping dictionary.
### Registering Frames with Multiple Masks
Frames can contain multiple masks. To add multiple masks, use the SingleFrame's `masks_source` property. Input one of
the following:
* A dictionary with mask string ID keys and mask URI values
* A list of mask URIs. Number IDs are automatically assigned to the masks ("00", "01", etc.)
```python
frame = SingleFrame(
    source='https://s3.amazonaws.com/allegro-datasets/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png'
)

# add multiple masks with a dictionary of mask IDs and URIs
frame.masks_source = {"ID 1": "<mask_URI_1>", "ID 2": "<mask_URI_2>"}

# or with a list of URIs (IDs "00", "01", ... are assigned automatically)
frame.masks_source = ["<mask_URI_1>", "<mask_URI_2>"]
```


@@ -69,14 +69,6 @@ The following is an example of preview metadata.
}
],
"rois": [
{
"sources": ["front"],
"label": ["bike"],


@@ -35,8 +35,8 @@ For more information, see [Annotations](annotations.md).
### Masks
A `SingleFrame` can include a URI link to masks file if applicable. Masks correspond to raw data where the objects to be
detected are marked with colors or different opacity levels in the masks.
For more information, see [Masks](masks.md).
@@ -100,7 +100,12 @@ The panel below describes the details contained within a `frame`:
* `id` - ID of the mask dictionary in `sources`.
* `value` - RGB value of the mask.
:::info
The `mask` dictionary is deprecated. Mask labels and their associated pixel values are now stored in the dataset
versions metadata. See [Masks](masks.md).
:::
* `poly` (*[int]*) - Bounding area vertices.
* `sources` (*[string]*) - The `id` in the `sources` dictionary which relates an annotation to its raw data source.
@@ -112,11 +117,11 @@ The panel below describes the details contained within a `frame`:
* `uri` - URI of the raw data.
* `width` - Width of the image or video.
* `height` - Height of the image or video.
* `masks` - List of available masks.
* `id` - Mask ID.
* `content_type` - Mask type. For example, `image/jpeg`.
* `uri` - Mask URI.
* `timestamp`
* `preview` - URI of the thumbnail preview image used in the ClearML Enterprise WebApp (UI)


@@ -7,12 +7,9 @@ Each frame contains `sources`, a list of dictionaries containing:
* A `URI` pointing to the source data (image or video)
* Sources for [masks](masks.md) used in semantic segmentation
* Image [previews](previews.md), which are thumbnails used in the ClearML Enterprise WebApp (UI).
`sources` does not contain ROI metadata, because ROIs can be used over multiple frames. Instead, frames contain a
top-level `rois` array, which is a list of ROI dictionaries, where each dictionary contains a
list of source IDs. Those IDs connect `sources` to ROIs.
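A minimal sketch of that linkage (the URI and the lookup helper are hypothetical, not part of the schema):

```python
# Hypothetical minimal frame: a top-level `rois` entry points back
# at `sources` via a list of source IDs.
frame = {
    "sources": [
        {"id": "front", "uri": "https://example.com/front.png"},
    ],
    "rois": [
        {"sources": ["front"], "label": ["car"]},
    ],
}

def sources_for_roi(frame, roi):
    """Resolve an ROI's source IDs to the matching source dictionaries."""
    by_id = {s["id"]: s for s in frame["sources"]}
    return [by_id[sid] for sid in roi["sources"]]

print(sources_for_roi(frame, frame["rois"][0])[0]["id"])  # -> front
```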
## Examples
