Panoptic segmentation in video

Author

Gunnarsson, Kristján Már

Term

4th term

Education

Vision, Graphics and Interactive Systems, Master

Publication year

2023

Submitted on

2023-06-02

Abstract

This thesis studies and applies panoptic segmentation in video to automate the creation of per-frame object masks in editing workflows. The work is part of Capture One’s AI Color Fashion initiative. It aims to extend the software’s per-object color adjustments to video, where editors traditionally mask skin, clothing, and backgrounds by hand. The core research question is: How can the masking of different objects in a scene be automated frame by frame? The approach uses Detectron2 as the base framework, with a ResNet-101 backbone and a Feature Pyramid Network (FPN) to handle objects at multiple scales. The report details the key components: the region proposal network, the RoI head, and the semantic segmentation head. It also covers the training and inference setup and the practical considerations at high resolutions (up to 8K), such as increased computational load and real-time constraints. The tests and results section outlines the learning behavior and shows model outputs. It also discusses evaluation with standard metrics such as Panoptic Quality (PQ), Recognition Quality (RQ), and Segmentation Quality (SQ). According to the project’s demonstrations, this combination performs robustly in complex scenes containing objects at multiple scales. Specific numbers and full results appear later in the report and are not included in this excerpt.
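
The abstract lists Panoptic Quality, Recognition Quality and Segmentation Quality as evaluation metrics. In the standard formulation, a predicted segment and a ground-truth segment of the same category are matched when their IoU exceeds 0.5. For each category:

- PQ = Σ_TP IoU / (|TP| + ½|FP| + ½|FN|)
- This factors as SQ × RQ, with SQ = Σ_TP IoU / |TP| and RQ = |TP| / (|TP| + ½|FP| + ½|FN|).

The reported scores are averages over categories. The Python code below is a minimal sketch under stated assumptions and is not the evaluation code used in the thesis. The function names are hypothetical, and segments are modeled as sets of pixel indices. The void and crowd handling of the reference COCO panoptic evaluation is omitted.

```python
from collections import defaultdict

# Hypothetical helpers, not the thesis's evaluation code. A segment is modeled
# as (category, frozenset of pixel indices); segments within one frame are
# assumed non-overlapping, as in a panoptic map. Void/crowd handling is omitted.


def segment_iou(a, b):
    """Intersection over union of two pixel-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds, gts):
    """Return category-averaged (PQ, SQ, RQ) for one frame."""
    iou_sum = defaultdict(float)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    matched_gt = set()

    # A prediction matches an unmatched ground-truth segment of the same
    # category when IoU > 0.5; with non-overlapping segments this is unique.
    for p_cat, p_pix in preds:
        for j, (g_cat, g_pix) in enumerate(gts):
            if j in matched_gt or g_cat != p_cat:
                continue
            iou = segment_iou(p_pix, g_pix)
            if iou > 0.5:
                matched_gt.add(j)
                tp[p_cat] += 1
                iou_sum[p_cat] += iou
                break
        else:
            fp[p_cat] += 1  # no matching ground truth: false positive

    # Ground-truth segments left unmatched are false negatives.
    for j, (g_cat, _) in enumerate(gts):
        if j not in matched_gt:
            fn[g_cat] += 1

    cats = set(tp) | set(fp) | set(fn)
    if not cats:
        return 0.0, 0.0, 0.0

    pq = sq = rq = 0.0
    for c in cats:
        denom = tp[c] + 0.5 * fp[c] + 0.5 * fn[c]
        pq += iou_sum[c] / denom
        sq += (iou_sum[c] / tp[c]) if tp[c] else 0.0
        rq += tp[c] / denom
    n = len(cats)
    return pq / n, sq / n, rq / n


# Toy frame: "person" and "shirt" segments over pixel indices 0..19.
gts = [("person", frozenset(range(0, 10))), ("shirt", frozenset(range(10, 20)))]
preds = [("person", frozenset(range(0, 9))), ("shirt", frozenset(range(10, 14)))]
print(panoptic_quality(preds, gts))
```

Running it prints `(0.45, 0.45, 0.5)`. The person mask matches with IoU 0.9. The shirt mask has IoU 0.4, which misses the 0.5 threshold, so it counts as both a false positive and a false negative.

To score a whole clip, the reference evaluation accumulates TP, FP, FN and IoU sums per category across all frames before averaging. It does not average per-frame scores.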

[This abstract has been generated with the help of AI directly from the project full text]

A master's thesis from Aalborg University