Update Hyper-Dataset mask information (#532)

This commit is contained in:
pollfly 2023-04-16 10:10:30 +03:00 committed by GitHub
parent ec39ad38cb
commit 360a042e79
12 changed files with 131 additions and 227 deletions


@@ -5,15 +5,14 @@ title: Datasets and Dataset Versions
ClearML Enterprise's **Datasets** and **Dataset versions** provide the internal data structure
and functionality for the following purposes:
* Connecting source data to the ClearML Enterprise platform
* Using ClearML Enterprise's Git-like [Dataset versioning](#dataset-versioning)
* Integrating the powerful features of [Dataviews](dataviews.md) with an experiment
* [Annotating](webapp/webapp_datasets_frames.md#annotations) images and videos
Datasets consist of versions with SingleFrames and/or FrameGroups. Each Dataset can contain multiple versions, which
can have multiple children that inherit their parent's contents.
Mask-labels are defined at the DatasetVersion level, and are applied to all masks in a DatasetVersion.
## Example Datasets


@@ -2,241 +2,152 @@
title: Masks
---
Masks are source data used in deep learning for image segmentation. Mask URIs are a property of a SingleFrame.
ClearML applies the masks in one of two modes:
* [Pixel segmentation](#pixel-segmentation-masks) - Pixel RGB values are each mapped to segmentation labels.
* [Alpha channel](#alpha-channel-masks) - Pixel RGB values are interpreted as opacity levels.
In the WebApp's [frame viewer](webapp/webapp_datasets_frames.md#frame-viewer), you can select how to apply a mask over
a frame.
## Pixel Segmentation Masks
For pixel segmentation, mask RGB pixel values are mapped to labels.
Mask-label mapping is defined at the dataset level, through the `mask_labels` property in a version's metadata.
`mask_labels` is a list of dictionaries, where each dictionary includes the following keys:
* `value` - Mask's RGB pixel value.
* `labels` - Labels associated with the value.
See how to manage dataset version mask labels pythonically [here](dataset.md#managing-version-mask-labels).
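As a concrete sketch of this mapping (the list-of-dictionaries layout follows the `value`/`labels` keys described above; the lookup helper is illustrative, not a ClearML API):

```python
# Hypothetical mask_labels metadata: each entry maps an RGB pixel
# `value` to one or more `labels`, as described above.
mask_labels = [
    {"value": [0, 0, 0], "labels": ["background"]},
    {"value": [1, 1, 1], "labels": ["person", "sitting"]},
    {"value": [2, 2, 2], "labels": ["cat"]},
]

def labels_for_pixel(rgb, mapping):
    """Return the labels mapped to an RGB pixel value, or [] if unmapped."""
    for entry in mapping:
        if entry["value"] == list(rgb):
            return entry["labels"]
    return []

print(labels_for_pixel((2, 2, 2), mask_labels))  # -> ['cat']
```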
In the UI, you can view the mapping in a dataset version's [Metadata](webapp/webapp_datasets_versioning.md#metadata) tab.
![Dataset metadata panel](../img/hyperdatasets/dataset_metadata.png)
When viewing a frame with a mask corresponding with the version's mask-label mapping, the UI arbitrarily assigns a color
to each label. The color assignment can be [customized](webapp/webapp_datasets_frames.md#labels).
For example:
* Original frame image:
![Frame without mask](../img/hyperdatasets/dataset_pixel_masks_1.png)
* Frame image with the semantic segmentation mask enabled. Labels are applied according to the dataset version's
mask-label mapping:
![Frame with semantic seg mask](../img/hyperdatasets/dataset_pixel_masks_2.png)
The frame's `sources` array contains a `masks` list of dictionaries that looks something like this:
```json
{
"id": "<framegroup_id>",
  "timestamp": "<timestamp>",
"context_id": "car_1",
"sources": [
{
"id": "<source_id>",
"content_type": "<type>",
"uri": "<image_uri>",
"timestamp": 1234567889,
...
"masks": [
{
"id": "<mask_id>",
"content_type": "video/mp4",
"uri": "<mask_uri>",
"timestamp": 123456789
}
]
}
]
}
```
The `masks` list includes the frame's mask URIs and IDs.
## Alpha Channel Masks
For alpha channel, mask RGB pixel values are interpreted as opacity values so that when the mask is applied, only the
desired sections of the source are visible.
For example:
* Original frame:
![Maskless frame](../img/hyperdatasets/dataset_alpha_masks_1.png)
* Same frame with an alpha channel mask, emphasizing the troll doll:
![Alpha mask frame](../img/hyperdatasets/dataset_alpha_masks_2.png)
The frame's `sources` array contains a `masks` list of dictionaries that looks something like this:
```json
{
  "sources" : [
    {
      "id" : "321",
      "uri" : "https://i.ibb.co/bs7R9k6/troll.png",
      "masks" : [
        {
          "id" : "troll",
          "uri" : "https://i.ibb.co/TmJ3mvT/troll-alpha.png"
        }
      ],
      "timestamp" : 0
    }
  ]
}
```
Note that for alpha channel masks, no labels are used.
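As a minimal sketch of the opacity interpretation (assuming per-pixel blending against a black background; the WebApp's actual rendering may differ):

```python
def apply_alpha_mask(src_rgb, mask_value):
    """Blend a source pixel against black, using mask_value (0-255) as opacity.

    255 keeps the pixel fully visible, 0 hides it entirely.
    """
    a = mask_value / 255.0
    return tuple(round(c * a) for c in src_rgb)

print(apply_alpha_mask((200, 50, 50), 255))  # -> (200, 50, 50): fully visible
print(apply_alpha_mask((200, 50, 50), 0))    # -> (0, 0, 0): masked out
```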
## Usage
### Register Frames with a Mask
To register frames with a mask, create a frame and specify the URI of the frame's mask file.
```python
# create dataset version
version = DatasetVersion.create_version(
dataset_name="Example",
version_name="Registering frame with mask"
)
# create frame with mask
frame = SingleFrame(
source='https://s3.amazonaws.com/allegro-datasets/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png',
mask_source='https://s3.amazonaws.com/allegro-datasets/cityscapes/gtFine_trainvaltest/gtFine/val/frankfurt/frankfurt_000000_000294_gtFine_labelIds.png'
)
# add frame to version
version.add_frames([frame])
```
To use the mask for pixel segmentation, define the pixel-label mapping for the DatasetVersion:
```python
version.set_masks_labels(
{(0,0,0): ["background"], (1,1,1): ["person", "sitting"], (2,2,2): ["cat"]}
)
```
The relevant label is applied to all masks in the version according to the version's mask-label mapping dictionary.
### Registering Frames with Multiple Masks
Frames can contain multiple masks. To add multiple masks, use the SingleFrame's `masks_source` property. Input one of
the following:
* A dictionary with mask string ID keys and mask URI values
* A list of mask URIs. Number IDs are automatically assigned to the masks ("00", "01", etc.)
```python
frame = SingleFrame(
    source='https://s3.amazonaws.com/allegro-datasets/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val/frankfurt/frankfurt_000000_000294_leftImg8bit.png'
)

# add multiple masks with a dictionary of mask IDs and URIs
frame.masks_source = {"ID 1": "<mask_URI_1>", "ID 2": "<mask_URI_2>"}

# or with a list of URIs (IDs "00", "01", ... are assigned automatically)
frame.masks_source = ["<mask_URI_1>", "<mask_URI_2>"]
```


@@ -69,14 +69,6 @@ The following is an example of preview metadata.
}
],
"rois": [
{
"sources": ["front"],
"label": ["bike"],


@@ -35,8 +35,8 @@ For more information, see [Annotations](annotations.md).
### Masks
A `SingleFrame` can include a URI link to masks file if applicable. Masks correspond to raw data where the objects to be
detected are marked with colors or different opacity levels in the masks.
For more information, see [Masks](masks.md).
@@ -100,7 +100,12 @@ The panel below describes the details contained within a `frame`:
* `id` - ID of the mask dictionary in `sources`.
* `value` - RGB value of the mask.
:::info
The `mask` dictionary is deprecated. Mask labels and their associated pixel values are now stored in the dataset
versions metadata. See [Masks](masks.md).
:::
* `poly` (*[int]*) - Bounding area vertices.
* `sources` (*[string]*) - The `id` in the `sources` dictionary which relates an annotation to its raw data source.
@@ -112,11 +117,11 @@ The panel below describes the details contained within a `frame`:
* `uri` - URI of the raw data.
* `width` - Width of the image or video.
* `height` - Height of the image or video.
* `masks` - List of available masks.
* `id` - Mask ID.
* `content_type` - Mask type. For example, `image/jpeg`.
* `uri` - Mask URI.
* `timestamp`
* `preview` - URI of the thumbnail preview image used in the ClearML Enterprise WebApp (UI)


@@ -7,12 +7,9 @@ Each frame contains `sources`, a list of dictionaries containing:
* A `URI` pointing to the source data (image or video)
* Sources for [masks](masks.md) used in semantic segmentation
* Image [previews](previews.md), which are thumbnails used in the ClearML Enterprise WebApp (UI).
`sources` does not contain ROI metadata, because ROIs can be used over multiple frames. Instead, frames contain a
top-level `rois` array, which is a list of ROI dictionaries, where each dictionary contains a
list of source IDs. Those IDs connect `sources` to ROIs.
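A minimal sketch of that linkage (the URI and the lookup helper are hypothetical, not part of the schema):

```python
# Hypothetical minimal frame: a top-level `rois` entry points back
# at `sources` via a list of source IDs.
frame = {
    "sources": [
        {"id": "front", "uri": "https://example.com/front.png"},
    ],
    "rois": [
        {"sources": ["front"], "label": ["car"]},
    ],
}

def sources_for_roi(frame, roi):
    """Resolve an ROI's source IDs to the matching source dictionaries."""
    by_id = {s["id"]: s for s in frame["sources"]}
    return [by_id[sid] for sid in roi["sources"]]

print(sources_for_roi(frame, frame["rois"][0])[0]["id"])  # -> front
```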
## Examples
