Exploring Unknown Environments with UPPAAL STRATEGO: Reinforcement Learning for Drone Navigation and Pump Localization
Term
4th term
Education
Publication year
2024
Submitted on
2024-06-06
Pages
23
Abstract
We consider the problem of using autonomous Unmanned Aerial Vehicles (UAVs), also known as drones, to explore and map an unknown room and to find points of interest (POIs) in it. We propose using Reinforcement Learning (RL) to enable the drone to map and explore such rooms. The problem is modelled as a Markov Decision Process (MDP) in UPPAAL STRATEGO, which uses Q-learning to synthesize a near-optimal policy. This policy is used to generate a sequence of actions that are then activated on the drone. Additionally, we use the STOMPC framework to implement a stochastic model predictive controller that captures the uncertainties of the room and the dynamics of the drone. STOMPC does this by providing UPPAAL STRATEGO with the updated true state of the drone after all actions in the sequence have been activated. We also employ two shields, a learning shield and a runtime shield, which enforce safety constraints on the actions the drone can take. The drone used in this work is equipped with a LiDAR sensor and an IMU that provides odometry data. We use the Robot Operating System (ROS) to control the drone. ROS also provides a simultaneous localization and mapping (SLAM) framework, Slam Toolbox, which we use to update the map given to UPPAAL STRATEGO. To validate the approach, we use the Gazebo simulator to simulate an X500 drone and the room it must map and explore. We compare our approach to a Breadth-First Search (BFS) based approach and show that ours fully explores a room and examines all POIs while activating 33 fewer actions on average, at the cost of a marginally longer completion time. We also present options to further reduce the completion time of our approach. We demonstrate the generality of the approach by mapping and exploring rooms of different sizes and shapes. Lastly, we show the proposed method working on a physical TurtleBot3 robot.
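To make the combination of Q-learning, a shield, and an action sequence more concrete, here is a minimal sketch on a hypothetical toy grid room. It is not the authors' UPPAAL STRATEGO model. The room layout, reward values, hyperparameters, and the wall-avoidance `shield` function are all illustrative assumptions. The shield simply removes actions that would move into a wall, both during learning and when the greedy policy is unrolled into an action sequence. In a STOMPC-style loop, only such a sequence would be activated before re-planning from the observed true state.

```python
import random

# Hypothetical toy room: '#' = wall, 'S' = start, 'P' = point of interest.
ROOM = [
    "#######",
    "#S...##",
    "#.##..#",
    "#....P#",
    "#######",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(ch):
    """Return the (row, col) of the first occurrence of ch in the room."""
    for r, row in enumerate(ROOM):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(ch)


START, POI = find("S"), find("P")


def shield(state):
    """Illustrative shield: allow only actions that do not enter a wall."""
    r, c = state
    return [a for a, (dr, dc) in ACTIONS.items() if ROOM[r + dr][c + dc] != "#"]


def step(state, action):
    """Deterministic move; +10 for reaching the POI, -1 per move otherwise."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    return nxt, (10.0 if nxt == POI else -1.0), nxt == POI


def q_learning(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning where exploration is restricted by the shield."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        s = START
        for _ in range(100):
            allowed = shield(s)
            if rng.random() < epsilon:
                a = rng.choice(allowed)
            else:
                a = max(allowed, key=lambda x: q.get((s, x), 0.0))
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(q.get((s2, x), 0.0) for x in shield(s2))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
            if done:
                break
    return q


def greedy_plan(q, max_len=20):
    """Unroll the learned greedy policy into a sequence of shielded actions."""
    s, plan = START, []
    while s != POI and len(plan) < max_len:
        a = max(shield(s), key=lambda x: q.get((s, x), 0.0))
        plan.append(a)
        s, _, _ = step(s, a)
    return plan, s


if __name__ == "__main__":
    plan, final = greedy_plan(q_learning())
    print(plan, final)
```

Masking actions before selection, rather than penalizing unsafe moves, means the learner never experiences an unsafe transition. This mirrors the intent of a learning shield, although the shields in this work are defined over the real drone and map rather than over a fixed toy grid.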
Keywords
reinforcement learning ; uppaal ; uppaal stratego ; machine learning ; drone ; ros ; gazebo ; stompc ; slam