# Segmentation Masks and Metadata

The Celantur container can generate two different segmentation masks, as well as metadata, per processed image:

* Binary Segmentation
* Instance Segmentation

Mask generation is activated with the `--save-mask {all, instance, binary}` parameter. Each segmentation mask is saved as a PNG file.

<figure><img src="https://1992407480-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fr2aH3h1rP4SjkhFlr7Mp%2Fuploads%2FUAJlWxHdEVBYQAnU1WAL%2Fimage.png?alt=media&#x26;token=79e8b759-cb12-429a-9483-a94b0564e806" alt=""><figcaption><p>Anonymization, binary segmentation and instance segmentation applied to an image with Celantur software.</p></figcaption></figure>

## Binary Segmentation

The binary segmentation mask consists of two colors:

* Background is black
* Anonymized segments are white

The file will be saved as `image-name_bin_mask.png`.
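A binary mask like this is straightforward to work with once loaded as an array. The following sketch uses a tiny synthetic mask instead of a real file; the anonymised-fraction computation is an illustrative use case, not part of the Celantur output itself.

```python
import numpy as np

# A binary mask contains only two values: 0 (black background) and 255 (white
# anonymised segments). We build a tiny synthetic 4x4 mask here instead of
# loading `image-name_bin_mask.png` (with Pillow you could use
# np.array(Image.open("image-name_bin_mask.png"))).
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255  # a 2x2 anonymised region

# Fraction of pixels that were anonymised.
anonymised_fraction = np.count_nonzero(mask) / mask.size
print(anonymised_fraction)  # 4 of 16 pixels -> 0.25
```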

## Instance Segmentation

#### v26.02.1 and later

In instance segmentation masks, the RGB color values are used to distinguish individual instances/objects.

* The R (red) channel is 0.
* The G (green) channel encodes individual instances / objects.
* The B (blue) channel encodes the object type:

| Object type   | Blue channel value | Blue channel value in binary |
| ------------- | ------------------ | ---------------------------- |
| Person        | `128`              | `1000 0000`                  |
| License plate | `64`               | `0100 0000`                  |
| Face / head   | `32`               | `0010 0000`                  |
| Vehicle       | `16`               | `0001 0000`                  |

When objects overlap, their type codes are combined with a bitwise OR, e.g. a license plate overlapping a vehicle yields `80` (`64 | 16`), while two overlapping persons remain `128`.

E.g. `[0, 85, 16]` is a vehicle: `16` (blue) identifies the object type, `85` (green) distinguishes this instance from others.

The file will be saved as `image-name_ins_mask.png`.
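The bit-flag encoding above can be decoded per pixel with plain bitwise tests. A minimal sketch; the `TYPE_FLAGS` names are illustrative, only the numeric values come from the table above:

```python
# Object-type bit flags from the table above (v26.02.1 and later).
TYPE_FLAGS = {"person": 128, "license_plate": 64, "face_head": 32, "vehicle": 16}

def decode_pixel(rgb):
    """Return (instance_id, object_types) for one instance-mask pixel."""
    _, g, b = rgb
    types = [name for name, flag in TYPE_FLAGS.items() if b & flag]
    return g, types

print(decode_pixel([0, 85, 16]))  # (85, ['vehicle'])
print(decode_pixel([0, 12, 80]))  # 80 = 64 | 16 -> (12, ['license_plate', 'vehicle'])
```

Because overlaps are combined with OR, a single blue value can name several object types at once, which is why the decoder returns a list.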

#### Pre v26.02.1

In instance segmentation masks, the RGB color values are used to distinguish individual instances/objects.

* The R (red) channel is 0.
* The G (green) channel encodes individual instances / objects.
* The B (blue) channel encodes the object type:
  * Person: `128`
  * License plate: `64`
  * Face: `192`
  * Vehicle: `255`

E.g. `[0, 85, 192]` is a face.

The file will be saved as `image-name_ins_mask.png`.
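In the pre-v26.02.1 scheme there is no bit-flag combination; each blue value maps directly to one type, so a plain lookup suffices. A sketch, with only the numeric values taken from the list above:

```python
# Pre-v26.02.1: each blue-channel value maps directly to a single object type.
LEGACY_TYPES = {128: "person", 64: "license_plate", 192: "face", 255: "vehicle"}

def decode_legacy_pixel(rgb):
    """Return (instance_id, object_type) for one legacy instance-mask pixel."""
    _, g, b = rgb
    return g, LEGACY_TYPES.get(b)  # None for background pixels

print(decode_legacy_pixel([0, 85, 192]))  # (85, 'face')
```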

## Scale Down Mask Files

By adding the optional `--mask-scale {0-100}` (CLI) or `/v1/file/1/instance-mask?mask-scale={0-100}` (Container API) parameter, mask files will be scaled down by the specified ratio.

## Metadata

### Image metadata

Metadata about detected instances/objects is stored in the corresponding `image-name.json` file.

### Image metadata example

```json
{
    "id": "image-name.jpg",
    "detections": [
    {
        "id": 0,
        "parent_image": "image-name.jpg",
        "offset": [
            1560,
            744
        ],
        "bbox": [
            1957,
            744,
            3003,
            1855
        ],
        "type": 103,
        "score": 0.9993754029273987,
        "is_anonymised": true,
        "type_label": "face",
        "color": null
    }
    ],
    "size": [
        3456,
        5184
    ],
    "duration": 1.8533296539999355,
    "filename": "image-name.jpg",
    "folder": null
}
```
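The metadata file is plain JSON and can be consumed with any JSON library. A sketch that extracts bounding-box sizes from a record trimmed down to the fields it uses (values taken from the example above):

```python
import json

# A trimmed-down record in the shape of the image-metadata example above.
metadata = json.loads("""
{
  "id": "image-name.jpg",
  "detections": [
    {"id": 0, "bbox": [1957, 744, 3003, 1855], "score": 0.9993754029273987,
     "is_anonymised": true, "type_label": "face"}
  ],
  "size": [3456, 5184]
}
""")

# bbox is (x1, y1, x2, y2), so width and height follow by subtraction.
for det in metadata["detections"]:
    x1, y1, x2, y2 = det["bbox"]
    print(f'{det["type_label"]}: {x2 - x1}x{y2 - y1} px, score {det["score"]:.3f}')
# -> face: 1046x1111 px, score 0.999
```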

### Video metadata

Metadata is generated for individual frames and provided in files covering a range of multiple frames.

* **Batch/Stream mode:** \
  The information is stored in `filename-[startframe]-[endframe].json` in the output directory. The maximum number of frames covered by a single file is 500.
* **REST API mode:** \
  The information can be retrieved via the [#download-video-metadata-json](https://doc.celantur.com/container/rest-api-v1-mode#download-video-metadata-json "mention") endpoint.

### Video metadata example

```json
[
  {
    "id": 0,
    "detections": [
      {
        "id": 0,
        "parent_image": 0,
        "offset": [
          3,
          4
        ],
        "bbox": [
          3,
          4,
          201,
          171
        ],
        "type": 103,
        "score": 0.8267934918403625,
        "is_anonymised": true,
        "type_label": "face",
        "color": null
      }
    ],
    "size": [
      178,
      320
    ],
    "duration": 0.35388135999892256
  },
  {
    "id": 1,
    "detections": [
      {
        "id": 0,
        "parent_image": 1,
        "offset": [
          0,
          6
        ],
        "bbox": [
          0,
          20,
          137,
          175
        ],
        "type": 103,
        "score": 0.8945223689079285,
        "is_anonymised": true,
        "type_label": "face",
        "color": null
      }
    ],
    "size": [
      178,
      320
    ],
    "duration": 0.2979645720006374
  },
  ...
]
```
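Video metadata is a JSON array of per-frame records, so frame-level statistics fall out of a simple pass over the list. A sketch over two frames trimmed to the fields it uses (values from the example above):

```python
import json

# Two frames in the shape of the video-metadata example above.
frames = json.loads("""
[
  {"id": 0, "detections": [{"id": 0, "bbox": [3, 4, 201, 171], "type_label": "face"}]},
  {"id": 1, "detections": [{"id": 0, "bbox": [0, 20, 137, 175], "type_label": "face"}]}
]
""")

# Detections per frame, e.g. to spot frames with unusually many objects.
counts = {frame["id"]: len(frame["detections"]) for frame in frames}
print(counts)  # {0: 1, 1: 1}
```

In batch/stream mode the same structure is split across `filename-[startframe]-[endframe].json` files of at most 500 frames each, so the lists from consecutive files can simply be concatenated.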

### Metadata attribute reference

#### Attributes of an image or video frame

<table><thead><tr><th width="171">Attribute</th><th>Description</th></tr></thead><tbody><tr><td>id</td><td>The id of the image (file name) or video frame (sequential number)</td></tr><tr><td>detections</td><td>List of detections, see <a data-mention href="#detected-instances-objects-provided-as-a-list-under-the-detections-attribute">#detected-instances-objects-provided-as-a-list-under-the-detections-attribute</a></td></tr><tr><td>size</td><td>Size of the image or frame as [width, height]</td></tr><tr><td>duration</td><td>Duration of processing (inference and anonymization) of an image or video frame. Does not include I/O, e.g. reads/writes from the hard drive.</td></tr><tr><td>filename</td><td>Name of the file</td></tr><tr><td>folder</td><td>Name of the folder (relative to the root input folder)</td></tr></tbody></table>

#### Detected instances/objects provided as a list under the `detections` attribute

<table><thead><tr><th width="171">Attribute</th><th>Description</th></tr></thead><tbody><tr><td>id</td><td>The id of the detection, a sequential number starting from 0.</td></tr><tr><td>parent_image</td><td>The name of the image or the id of the video frame the detected instance/object was found on.</td></tr><tr><td>offset</td><td>The offset of the detection's bounding box from the upper left corner of the image (x/y coordinates in pixels).</td></tr><tr><td>bbox</td><td>The coordinates of the detection's bounding box (x1, y1, x2, y2).</td></tr><tr><td>score</td><td>The detection's confidence score, a value between 0.0 and 1.0 stating how confident the model is that the detection matches the assigned label (see <code>type_label</code>).</td></tr><tr><td>is_anonymised</td><td>Specifies whether the detection was anonymized (or only detected, when run with <code>method = detect</code>).</td></tr><tr><td>type_label</td><td>The detection's label assigned by the model, e.g. face, license plate, etc.</td></tr><tr><td>type</td><td>Numerical representation of the <code>type_label</code>.</td></tr><tr><td>color</td><td>The detection's color (RGB) in the instance segmentation mask.</td></tr><tr><td>duration</td><td>Processing duration for a video frame (only for videos).</td></tr></tbody></table>
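A common downstream step is filtering detections by their confidence score. A minimal sketch; the sample records and the 0.5 threshold are illustrative, only the field names follow the schema above:

```python
# Sample detection records following the schema documented above.
detections = [
    {"id": 0, "type_label": "face", "score": 0.99, "is_anonymised": True},
    {"id": 1, "type_label": "vehicle", "score": 0.42, "is_anonymised": True},
]

# Keep only detections the model is reasonably confident about.
confident = [d for d in detections if d["score"] >= 0.5]
print([d["type_label"] for d in confident])  # ['face']
```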


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://doc.celantur.com/container/usage/segmentation-masks-and-metadata.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
