• Nichlas Ørts Lisby
  • Thomas Højriis Knudsen
4th semester, Software, Master's (Master's programme)
In this work we tackle domains with hybrid action spaces, i.e. action spaces that contain both discrete and continuous actions.
These environments have proven challenging for traditional Deep Reinforcement Learning (DRL) methods, and may be sub-optimally handled by using discretized continuous actions.
Continuous actions are especially well suited to modern games, which take input from devices such as a mouse or an analog stick.
Other relevant domains for continuous actions include robotics and other tasks where you cannot achieve sufficient precision with a limited number of predefined actions.
While discrete DRL agents can be modified to work in such environments, they typically struggle when high precision is required.
We introduce two different methods for combining discrete and continuous action spaces, the first being a naive combination of continuous and discrete networks and the other being an Actor-Critic based approach, with a central critic that can critique the various actors.
We show that the naive combination of networks results in sub-optimal and unstable learning, thereby confirming the need for a method in which continuous and discrete actions can be combined in a sensible and coherent way.
Our central critic approach outperforms our Double DQN (DDQN) baselines in the DOOM environment on the VizDOOM scenarios Deadly Corridor and Defend The Center.
It quickly reaches a score better than that of the DDQN baselines and then continues to improve.
We also show that our approach significantly outperforms DDQN when using large action spaces, for example when many discretized actions are introduced to gain precision, a setting in which DDQN does not scale properly.
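The trade-off behind discretized continuous actions can be illustrated with a small sketch. It is not taken from the project: the function names, the turn-angle range, and the assumption that a purely discrete agent scores every pairing of a button with a bin are hypothetical choices made only for illustration.

```python
def discretize(low, high, bins):
    """Return `bins` evenly spaced bin centres covering [low, high]."""
    width = (high - low) / bins
    return [low + width * (i + 0.5) for i in range(bins)]


def nearest(value, choices):
    """Pick the discrete choice closest to the desired continuous value."""
    return min(choices, key=lambda c: abs(c - value))


def joint_action_count(discrete_actions, bins_per_axis, axes):
    """Size of the flat action set, assuming every discrete action is
    paired with every combination of bins (an illustrative assumption)."""
    return discrete_actions * bins_per_axis ** axes


# Hypothetical example: a turn angle in [-10, 10] degrees split into 5 bins.
turn_choices = discretize(-10.0, 10.0, 5)
# The agent wants to turn 2.5 degrees but can only pick a bin centre.
chosen = nearest(2.5, turn_choices)
# Finer bins reduce the error but multiply the number of outputs to learn.
coarse = joint_action_count(discrete_actions=3, bins_per_axis=5, axes=1)
fine = joint_action_count(discrete_actions=3, bins_per_axis=21, axes=2)
```

Under these assumptions, the worst-case precision error is half a bin width, and shrinking it by adding bins makes the flat action set grow multiplicatively. A continuous output avoids that growth.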
Language: English
Publication date: 7 Jun. 2021
Number of pages: 11
ID: 414111691