A master's thesis from Aalborg University


Solving Euclidean Markov Decision Processes with Neural Networks

Author

Term

4. term

Publication year

2021

Submitted on

Pages

26

Abstract

Applying machine learning to cyber-physical systems—systems that combine computation and physical processes—can be costly or risky if training relies on trial and error on the real system. To reduce this risk, we train on a formal mathematical model instead: Priced Timed Markov Decision Processes (PTMDPs) over a Euclidean (continuous) state space, which capture time, costs, and randomness. We use neural networks to learn strategies, meaning decision rules for how the system should act. We implement a Deep Q-Network (DQN) in the Uppaal Stratego tool. We run a sweep over DQN hyperparameters, select three promising settings, and compare them with the current state-of-the-art optimization algorithm in Uppaal Stratego. Our results show that, with suitable hyperparameters, DQN can find the optimal strategy for simple models in fewer runs than the current method, and it can find better strategies for some more complex models. However, within the hyperparameter configurations we tested, we did not obtain improvements for all models.

[This abstract has been rewritten with the help of AI, based on the project's original abstract.]