Priced Timed Automata and Monte Carlo Tree Search
Authors
Mortensen, Jeppe Høiriis ; Mijacika, Adriana
Term
4. term
Education
Publication year
2020
Submitted on
2020-06-12
Abstract
Planlægningsproblemer kan modelleres med Priced Timed Automata (PTA) og løses via omkostnings‑optimal reachability, men den enorme tilstandsrumsudforskning gør beregninger dyre. Denne afhandling præsenterer UCT‑PTA, en tilpasning af Monte Carlo Tree Search‑varianten UCT til PTA, implementeret i UPPAAL, med fokus på hvordan tid håndteres i søgetræet. Vi undersøger og afprøver tre varianter af tidsbehandling: Stochastic Delay UCT‑PTA (stokastiske tidsinkrementer), Delay Exploratory UCT‑PTA (udforskning af alle mulige tidsinkrementer) og Non‑Lazy UCT‑PTA (udforskning med non‑lazy tidsinkrementer). Derudover introducerer vi udvidelser, herunder træbeskæring ved at “tage et step” og partial order reduction, for at forbedre søgeeffektiviteten. Metoderne evalueres på job shop‑planlægningsproblemer modelleret i UPPAAL Cora, herunder avislæseproblemet med varierende antal personer, samt på nogle task‑graph‑eksempler. I vores eksperimenter giver Non‑Lazy UCT‑PTA de bedste resultater blandt de undersøgte varianter, omend løsningerne ikke altid er omkostningsoptimale, og de foreslåede udvidelser forbedrer den samlede ydeevne. Arbejdet viser potentialet i MCTS‑baseret udforskning til PTA i store tilstandsrum og peger på videre forbedringer af tidsbehandling og reduktionsteknikker.
Scheduling problems can be modeled with Priced Timed Automata (PTA) and solved via cost‑optimal reachability, but the state‑space explosion makes this computationally demanding. This thesis presents UCT‑PTA, an adaptation of the Monte Carlo Tree Search variant UCT to PTAs, implemented in UPPAAL, with a particular focus on how to handle time within the search. We propose and test three time‑handling variants: Stochastic Delay UCT‑PTA (stochastic time increments), Delay Exploratory UCT‑PTA (exploring all possible time increments), and Non‑Lazy UCT‑PTA (exploring non‑lazy time increments). We further add extensions—tree pruning by making a step and partial order reduction—to improve search performance. The approach is evaluated on job shop scheduling models in UPPAAL Cora, including the Newspaper problem with varying numbers of readers, and on some task‑graph cases. In our experiments, Non‑Lazy UCT‑PTA yields the best results among the studied variants, although solutions are not always cost‑optimal, and the proposed extensions improve overall performance. The results highlight the potential of MCTS‑based exploration for PTAs in large state spaces and motivate further work on time handling and reduction techniques.
[This summary has been generated with the help of AI directly from the project (PDF)]
Documents
