A master's thesis from Aalborg University


Reinforcement Learning-Driven Drone Navigation in Simulated Environments Using Stochastic Model-Predictive Control and UPPAAL STRATEGO

Authors


Term

4th term

Education

Publication year

2024

Submitted on

Pages

38

Abstract

Drones are increasingly used in industry for service and repair in complex settings. Together with Grundfos, a Danish pump company, we explore whether reinforcement learning (an AI method that learns by trial and error from rewards) can help a drone navigate technical rooms at customer sites and carry out simple tasks. The drone’s main job is to find pumps in an unfamiliar room while building a map as it flies. We connect UPPAAL STRATEGO, a tool for synthesizing strategies under uncertainty, with the Robot Operating System (ROS), which handles drone control in a simulated environment. Through co-simulation, the controller and the virtual world run together so the system can use real-time data to update its strategy continuously. Drawing on the Stochastic Model-Predictive Control (STOMPC) framework used in other domains, we implement a proof of concept that shows the promise of this approach. In our experiments, the reinforcement learning setup outperforms a basic baseline method. In one ideal configuration, it locates a single pump in about 4 minutes on average, compared with 14 minutes for the baseline. We also compare two reward engineering designs to see which one achieves the fastest task completion.
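The receding-horizon loop described above (periodically re-synthesizing a strategy from the latest state, applying its first action, then observing the updated world) can be illustrated with a toy sketch. This is not the project's implementation: the 1-D corridor, the `GOAL` position, and `synthesize_strategy` are hypothetical stand-ins, with the latter mimicking a UPPAAL STRATEGO query by scoring sampled noisy rollouts instead of calling the real tool.

```python
import random

GOAL = 8  # hypothetical pump position along a 1-D corridor

def simulate(state, action, noise):
    """Toy plant model: the drone moves left/right with additive noise."""
    return state + action + noise

def synthesize_strategy(state, horizon, samples=20):
    """Stand-in for a UPPAAL STRATEGO synthesis query: score each
    candidate action by sampling noisy rollouts over the horizon and
    measuring the expected distance to the goal."""
    best_action, best_cost = None, float("inf")
    for action in (-1, 0, 1):
        cost = 0.0
        for _ in range(samples):
            s = state
            for _ in range(horizon):
                s = simulate(s, action, random.gauss(0, 0.2))
            cost += abs(GOAL - s)
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action

def stompc_loop(start, horizon=3, max_steps=30):
    """Receding-horizon (STOMPC-style) loop: re-synthesize a strategy
    from the current state, apply only its first action, observe the
    noisy new state, and repeat until the pump is found."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    state = start
    for step in range(max_steps):
        if abs(state - GOAL) < 0.5:
            return step  # pump located
        action = synthesize_strategy(state, horizon)
        state = simulate(state, action, random.gauss(0, 0.1))
    return max_steps
```

In the thesis, the "plant" is the ROS-driven simulator running in co-simulation, and the synthesis step is delegated to UPPAAL STRATEGO; the key structural point carried over here is that only the first action of each synthesized strategy is executed before re-planning on fresh data.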


[This abstract has been rewritten with the help of AI based on the project's original abstract]