From Heightmaps to Cameras: Teacher-Student Reinforcement Learning for Rover Navigation: Master’s thesis
Author
Sørensen, Thomas Schou
Term
4. semester
Education
Publication year
2025
Submitted on
2025-06-04
Pages
69
Abstract
This thesis examines a teacher–student approach based on DAgger, an imitation learning method where a student iteratively learns from a teacher, to transfer navigation skills from a policy that uses privileged heightmap input to one that must work with noisy RGB‑D data (combined color and depth). The aim is to support learning under realistic sensor conditions, where reinforcement learning on RGB‑D remains difficult because observations are partial and very high‑dimensional. A simulation pipeline was extended on the RobuROC4 platform so the teacher and student could receive different sensor inputs. Although the student increasingly matched the teacher’s actions, it did not generalize or accomplish the task effectively. This is attributed to memory‑bound dataset handling and limited diversity in the collected rollouts. Overall, the findings suggest that teacher–student imitation learning is promising for transferring between sensor modalities, but it depends on scalable infrastructure that can handle large, diverse training datasets.
[This summary has been rewritten with the help of AI based on the project's original abstract]
Keywords
Reinforcement learning ; Imitation learning ; Teacher-student ; DAgger ; Isaac Sim ; Isaac Lab ; RLRoverLab ; RobuROC4 ; RB Summit ; Leo Rover ; Simulation ; Camera ; RGB-D ; Heightmap