Segmentation of RGB-D Indoor Scenes by Stacking Random Forests and Conditional Random Fields
Author
Thøgersen, Mikkel
Term
4th term
Publication year
2015
Submitted on
2015-06-03
Pages
89
Abstract
This thesis addresses indoor semantic segmentation: assigning a category such as wall, floor, or furniture to every part of an indoor image. We present a model based on the Multi-class Multi-scale Stacked Sequential Learning (MMSSL) framework, which combines simple components across multiple spatial scales. First, the image is partitioned into superpixels using SLIC, which groups neighboring pixels into small regions; this reduces the amount of data while preserving object boundaries. Color and depth cues are extracted from each superpixel and passed to a Conditional Random Field (CRF), a probabilistic model that encourages neighboring regions to take consistent labels, which predicts a category for each superpixel. In parallel, a Random Forest classifier built on random offset features (descriptors sampled around each region to capture local context) provides an initial label estimate that is fed into the CRF. Finally, in a stacked stage, a second Random Forest refines the result: it analyzes a confidence map (per-class scores) at multiple spatial scales and corrects likely mistakes. The model is trained and tested on the widely used NYU-v2 dataset of indoor scenes. The results show that, with simple features, the MMSSL framework can achieve better performance than similar methods.
[This abstract was generated with the help of AI]
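The stacked stage described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the confidence map lies on a regular grid rather than on superpixels, uses a plain box average as the multi-scale operator, and the names `box_average`, `stacked_features`, and `radii` are hypothetical. The idea shown is only that each location receives per-class confidences summarized at several neighborhood sizes, and that this extended feature vector is what a second-stage classifier such as a Random Forest would consume.

```python
from typing import List, Sequence

# Confidence map laid out as rows x cols x classes (an assumed layout).
Grid = List[List[List[float]]]


def box_average(conf: Grid, row: int, col: int, radius: int) -> List[float]:
    """Mean per-class confidence in a (2*radius+1)^2 window, clipped at borders."""
    n_rows, n_cols, n_classes = len(conf), len(conf[0]), len(conf[0][0])
    totals = [0.0] * n_classes
    count = 0
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            for k in range(n_classes):
                totals[k] += conf[r][c][k]
            count += 1
    return [t / count for t in totals]


def stacked_features(conf: Grid, row: int, col: int,
                     radii: Sequence[int] = (0, 1, 2)) -> List[float]:
    """Concatenate the per-class averages computed at each scale in `radii`."""
    feats: List[float] = []
    for radius in radii:
        feats.extend(box_average(conf, row, col, radius))
    return feats


# Toy 3x3 map with two classes: the centre weakly prefers class 1,
# while every neighbour strongly prefers class 0.
conf = [[[0.9, 0.1] for _ in range(3)] for _ in range(3)]
conf[1][1] = [0.4, 0.6]

# Scale 0 is the cell itself; scale 1 covers the full 3x3 neighbourhood.
features = stacked_features(conf, 1, 1, radii=(0, 1))
```

In this toy case the coarser scale shows that the surrounding context favors class 0 even though the cell's own score favors class 1, which is the kind of evidence a second-stage classifier could use to correct an isolated mistake.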
