A master's thesis from Aalborg University


Joint discrete and continuous action spaces in Deep Reinforcement Learning: DACAN

Authors


Term

4th term

Education

Publication year

2021

Submitted on

Pages

11

Abstract


This work addresses environments where an agent must make both simple choices (discrete actions) and fine-grained adjustments (continuous actions). Such hybrid action spaces are difficult for standard deep reinforcement learning (DRL), and forcing continuous actions into many discrete bins often reduces precision and performance. This arises in modern games with mouse or analog-stick input, in robotics, and in other tasks that demand high precision. We present two ways to combine discrete and continuous actions: (1) a straightforward coupling of separate networks for each action type, and (2) an Actor-Critic approach in which a central critic evaluates multiple actors that propose actions. We find that the simple combination leads to unstable, suboptimal learning, underscoring the need for a more coherent method. Our central-critic approach outperforms our Double DQN (DDQN) baselines in the DOOM environment on the VizDoom scenarios Deadly Corridor and Defend The Center: it quickly surpasses DDQN and continues to improve. We also show that the approach significantly outperforms DDQN when the action space is large, for example when continuous actions are discretized into many bins to mimic precision, a regime in which DDQN does not scale well.
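To make the central-critic idea concrete, the following is a minimal toy sketch (not the thesis's actual DACAN implementation): several actors each propose a hybrid action, consisting of one discrete choice plus continuous parameters, and a single critic scores the proposals so the best one can be selected. The actor and critic here are random stand-ins for learned networks, and all names and the scoring rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def actor(state, n_discrete=3, n_continuous=2):
    """Toy actor: proposes one discrete choice plus continuous parameters.

    A learned network would map `state` to these outputs; here they are
    random placeholders for illustration only.
    """
    logits = rng.normal(size=n_discrete)                  # scores for discrete options
    discrete = int(np.argmax(logits))                     # e.g. move / turn / shoot
    continuous = np.tanh(rng.normal(size=n_continuous))   # e.g. aim deltas in [-1, 1]
    return discrete, continuous

def critic(state, action):
    """Toy central critic: assigns a scalar value to a hybrid action.

    The scoring rule is arbitrary; a real critic would be a trained
    value network over (state, action) pairs.
    """
    discrete, continuous = action
    return float(discrete - np.sum(continuous ** 2))

state = np.zeros(4)                                       # dummy observation
proposals = [actor(state) for _ in range(4)]              # multiple actors propose actions
best = max(proposals, key=lambda a: critic(state, a))     # critic picks the best proposal
print(best)
```

The point of the sketch is only the control flow: discrete and continuous components travel together as one hybrid action, and a single critic ranks the actors' proposals instead of forcing everything into one discretized action catalogue.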

[This summary has been rewritten with the help of AI based on the project's original abstract]