Joint discrete and continuous action spaces in Deep Reinforcement Learning: DACAN
Authors
Lisby, Nichlas Ørts ; Knudsen, Thomas Højriis
Term
4th term
Education
Publication year
2021
Submitted on
2021-06-07
Pages
11
Abstract
This work addresses environments where an agent must make both simple choices (discrete actions) and fine-grained adjustments (continuous actions). Such hybrid action spaces are difficult for standard deep reinforcement learning (DRL), and forcing continuous actions into many discrete bins often reduces both precision and performance. The problem arises in modern games with mouse or analog-stick input, in robotics, and in other tasks that demand high precision. We present two ways to combine discrete and continuous actions: (1) a straightforward coupling of separate networks for each action type, and (2) an Actor-Critic approach in which a central critic evaluates multiple actors, each proposing part of the joint action. We find that the simple combination leads to unstable, suboptimal learning, underscoring the need for a more coherent method. Our central-critic approach outperforms our Double DQN (DDQN) baselines on the VizDoom scenarios Deadly Corridor and Defend The Center: it quickly surpasses DDQN and continues to improve. We also show that it significantly outperforms DDQN when the action space is large, for example when the number of discrete bins is increased to emulate continuous precision, a regime in which DDQN scales poorly.
[This summary has been rewritten with the help of AI based on the project's original abstract]
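The abstract gives no implementation details, so the following is only a minimal sketch of the central-critic idea it describes: separate actors propose the discrete and continuous parts of a hybrid action, and one shared critic scores the joint action so both actors train against a single coherent value estimate. All class names, layer sizes, and the DDPG-style deterministic continuous actor below are our assumptions for illustration, not the authors' DACAN code.

```python
# Hypothetical sketch of a central critic over hybrid (discrete + continuous)
# actions. Not the authors' implementation; names and sizes are illustrative.
import torch
import torch.nn as nn

class DiscreteActor(nn.Module):
    """Outputs a categorical distribution over the discrete choices."""
    def __init__(self, state_dim: int, n_discrete: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_discrete),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(state), dim=-1)  # action probabilities

class ContinuousActor(nn.Module):
    """Outputs a deterministic continuous action in [-1, 1], DDPG-style."""
    def __init__(self, state_dim: int, cont_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, cont_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class CentralCritic(nn.Module):
    """Scores the joint (state, discrete action, continuous action) tuple,
    so gradients from one value estimate reach both actors."""
    def __init__(self, state_dim: int, n_discrete: int, cont_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_discrete + cont_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, disc_probs, cont_action) -> torch.Tensor:
        return self.net(torch.cat([state, disc_probs, cont_action], dim=-1))

# Usage: one critic evaluation backs up into both actors at once.
state_dim, n_discrete, cont_dim = 8, 4, 2
disc_actor = DiscreteActor(state_dim, n_discrete)
cont_actor = ContinuousActor(state_dim, cont_dim)
critic = CentralCritic(state_dim, n_discrete, cont_dim)

state = torch.randn(1, state_dim)
q_value = critic(state, disc_actor(state), cont_actor(state))
actor_loss = -q_value.mean()  # both actors ascend the shared critic's value
actor_loss.backward()
```

Two remarks on the sketch. Feeding the discrete actor's probabilities into the critic (rather than a sampled one-hot action) keeps the graph differentiable end to end; relaxations of this kind are common in centralized-critic methods such as MADDPG, though whether DACAN does the same is our assumption. The sketch also makes the abstract's scaling point concrete: discretizing, say, two continuous axes into N bins each forces a DQN head to enumerate on the order of N^2 joint outputs, whereas the continuous actor's output size stays fixed at the action dimensionality.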
Keywords
Deep Reinforcement Learning ; DACAN ; NNC ; MADDPG ; DDPG ; DQN ; Vizdoom ; Actor Critic ; Continuous actions ; Discrete actions ; Hybrid actions
Documents
