Learning Optimal Scheduling for Time Uncertain Settings

Studenteropgave: Speciale (inkl. HD afgangsprojekt)

  • Jakob Haahr Taankvist
  • Peter Gjøl Jensen
4. semester, Datalogi, Kandidat (Kandidatuddannelse)
We present a method which learns strategies on Markov decision processes with time and price. The method will synthesize strategies, optimized towards minimizing the expected cost. The method is based on reinforced learning and is able to learn near-optimal strategies for Duration Probabilistic Automatons. The method is in large also applicable to the larger class of Priced Time Markov Decision Processes.

We develop a number of methods for different steps of the main learning algorithm, and empirically investigate their effect on the synthesized strategies. We also show that the methods presented outperform previously known automated for tools synthesizing schedulers for Duration Probabilistic Automata with an order of magnitude improvement in running time, while still obtaining the same schedulers down to a difference of 0.5 in the decision boundaries. All of the methods presented have been implemented in the tool UPPAAL. This enables us to synthesize strategies for cost-bounded reachability-objectives for games while also providing a (near-) optimal expected cost.
Udgivelsesdato10 jun. 2014
Antal sider50
ID: 198674942