Segmentation of RGB-D Indoor Scenes by Stacking Random Forests and Conditional Random Fields

Author

Thøgersen, Mikkel

Term

4th term

Education

Vision, Graphics and Interactive Systems, Master

Publication year

2015

Submitted on

2015-06-03

Abstract

This thesis addresses indoor semantic segmentation: assigning a category, such as wall, floor, or furniture, to every part of an indoor image. We present a model based on the Multi-class Multi-scale Stacked Sequential Learning (MMSSL) framework, which combines simple components across multiple spatial scales. First, the image is partitioned into superpixels using SLIC, which groups neighboring pixels into small regions; this reduces the amount of data while preserving object boundaries. Color and depth cues are extracted from each superpixel and used in a Conditional Random Field (CRF), a probabilistic model that predicts categories while encouraging neighboring regions to take consistent labels. In parallel, a Random Forest classifier built on random offset features (descriptors sampled around each region to capture local context) provides an initial label estimate that is fed into the CRF. Finally, in a stacked stage, a second Random Forest refines the result by analyzing a confidence map (per-class scores) at multiple spatial scales and correcting likely mistakes. The model is trained and tested on the widely used NYU-v2 dataset of indoor scenes. The results show that, with simple features, the MMSSL framework can outperform similar methods.

[This abstract has been rewritten with the help of AI based on the project's original abstract]
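
The stacked stage described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `box_mean` and `multiscale_features`, the toy confidence map, and the use of a plain box average at each scale are all assumptions made for illustration, since the abstract does not specify how the multi-scale analysis is carried out. The sketch only shows how per-class scores can be summarized over neighborhoods of several sizes to form the input of a second classifier.

```python
def box_mean(conf, row, col, radius):
    """Mean per-class score in a (2*radius+1)^2 window, clipped at the border.

    Assumption: a box average stands in for whatever multi-scale
    decomposition the thesis actually uses.
    """
    height, width = len(conf), len(conf[0])
    n_classes = len(conf[0][0])
    totals = [0.0] * n_classes
    count = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            for k in range(n_classes):
                totals[k] += conf[r][c][k]
            count += 1
    return [t / count for t in totals]


def multiscale_features(conf, radii=(0, 1, 2)):
    """Concatenate the per-class window means over all radii for every location."""
    height, width = len(conf), len(conf[0])
    return [
        [
            [v for radius in radii for v in box_mean(conf, row, col, radius)]
            for col in range(width)
        ]
        for row in range(height)
    ]


# Hypothetical 3x3 confidence map with two classes: (wall, floor).
conf_map = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],  # centre disagrees with its neighbours
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
]
features = multiscale_features(conf_map, radii=(0, 1))
centre = features[1][1]  # scores at radius 0 followed by scores at radius 1
```

In this toy case the centre location votes "floor" at the finest scale but "wall" once its neighborhood is averaged in, which is the kind of disagreement a second-stage classifier could learn to correct.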

Keywords

RGBD ; Segmentation ; Semantics ; Depth ; Conditional Random Field ; Random Forest ; Stacked Learning

A master's thesis from Aalborg University
