A master's thesis from Aalborg University


From Heightmaps to Cameras: Teacher–Student Reinforcement Learning for Rover Navigation (Master's thesis)



Term

4th semester


Publication year

2025


Pages

69

Abstract


This thesis examines a teacher–student approach based on DAgger, an imitation learning method where a student iteratively learns from a teacher, to transfer navigation skills from a policy that uses privileged heightmap input to one that must work with noisy RGB‑D data (combined color and depth). The aim is to support learning under realistic sensor conditions, where reinforcement learning on RGB‑D remains difficult because observations are partial and very high‑dimensional. A simulation pipeline was extended on the RobuROC4 platform so the teacher and student could receive different sensor inputs. Although the student increasingly matched the teacher’s actions, it did not generalize or accomplish the task effectively. This is attributed to memory‑bound dataset handling and limited diversity in the collected rollouts. Overall, the findings suggest that teacher–student imitation learning is promising for transferring between sensor modalities, but it depends on scalable infrastructure that can handle large, diverse training datasets.
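The teacher–student scheme the abstract describes can be sketched in miniature. This is a hypothetical illustration of the core DAgger loop only, not the thesis's implementation: the "teacher" is a toy linear rule standing in for the privileged heightmap policy, the "student" is a least-squares linear model standing in for the RGB-D network, and the 1-D dynamics and noise model are invented for the example. What it does show is the defining DAgger step: the student drives the rollout, every visited state is labeled with the teacher's action, and the student is retrained on the aggregated dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_policy(state):
    # Privileged teacher: a simple stabilizing rule (illustrative stand-in
    # for the heightmap-based policy).
    return np.clip(-state, -1.0, 1.0)

def make_student():
    # Linear student fit by least squares (stand-in for the RGB-D network).
    return {"w": np.zeros(1)}

def student_act(student, obs):
    return float(student["w"][0]) * obs

def fit(student, observations, actions):
    # Refit the student on ALL data aggregated so far.
    X = np.asarray(observations).reshape(-1, 1)
    y = np.asarray(actions).reshape(-1)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    student["w"] = w

def dagger(iterations=5, horizon=50):
    # DAgger: roll out the *student*, but label every visited state with the
    # *teacher's* action, then retrain on the aggregated dataset.
    student = make_student()
    dataset_obs, dataset_act = [], []
    for _ in range(iterations):
        state = rng.normal()
        for _ in range(horizon):
            obs = state + rng.normal(scale=0.1)   # noisy observation, like RGB-D
            dataset_obs.append(obs)
            dataset_act.append(teacher_policy(state))  # teacher label on the student's state
            state = state + student_act(student, obs)  # student drives the rollout
        fit(student, dataset_obs, dataset_act)
    return student
```

Because the states in the dataset come from the student's own rollouts, the student is trained exactly on the distribution it will visit at test time — the property that distinguishes DAgger from plain behavior cloning, and the reason the abstract stresses that dataset size and diversity matter.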

[This summary has been rewritten with the help of AI based on the project's original abstract]