SC2OPT - Applying Reinforcement Learning and the Options Framework to StarCraft II.
Authors
Opstad, Bjørn Espen; Mikkelsen, Mathias Corlin; Jensen, Alexander Kaleta
Term
4th term
Publication year
2018
Submitted on
2018-06-08
Pages
63
Abstract
This thesis investigates whether the options framework in reinforcement learning can address the complexity of StarCraft II via the SC2LE/PySC2 interface. The project decomposes the problem into specialized options trained in custom mini-environments, each restricted to a subset of the game’s action space and designed to learn subskills for a core task: building marines. A policy over options (a controller) selects among these trained options to interact with the environment. Although individual option agents showed learning in their respective training environments, they did not perform significantly better than random agents, and one performed worse. Nevertheless, when the controller using these options was applied to a more complex environment, it achieved scores significantly higher than DeepMind’s available reference results. Analysis of option-selection frequencies over many episodes indicated that the controller learned to prioritize and sequence options appropriately. Overall, the results support the options framework as a viable approach for complex, multitask problems that can be divided into subtasks—even when the constituent options are not individually optimal.
[This summary has been generated with the help of AI directly from the project (PDF)]
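The policy-over-options design described in the abstract — a controller that repeatedly picks one of several pre-trained options and learns which to prioritize from the rewards it observes — can be illustrated with a small sketch. This is not the thesis code: the option names, the toy reward model, and the state-free tabular Q-values are all invented here for illustration.

```python
import random

# Toy sketch of a policy over options: a tabular controller learns
# Q-values over a fixed set of pre-trained "options" and selects
# among them epsilon-greedily. Option names and the reward model
# below are hypothetical, not taken from the thesis.

class Controller:
    def __init__(self, options, epsilon=0.1, alpha=0.2, gamma=0.95):
        self.options = options              # names of available options
        self.epsilon = epsilon              # exploration rate
        self.alpha = alpha                  # learning rate
        self.gamma = gamma                  # discount factor
        self.q = {o: 0.0 for o in options}  # state-free Q-value per option

    def select(self):
        # Epsilon-greedy choice over options.
        if random.random() < self.epsilon:
            return random.choice(self.options)
        return max(self.options, key=lambda o: self.q[o])

    def update(self, option, reward):
        # Q-learning style update toward reward plus discounted best value.
        best_next = max(self.q.values())
        target = reward + self.gamma * best_next
        self.q[option] += self.alpha * (target - self.q[option])

# Toy environment: "build_marine" only pays off once the prerequisite
# options have each been executed at least once in the episode.
def run_episode(ctrl, steps=20):
    done = {"collect_minerals": False, "build_barracks": False}
    total = 0.0
    for _ in range(steps):
        opt = ctrl.select()
        if opt in done:
            reward = 0.1 if not done[opt] else 0.0
            done[opt] = True
        else:  # "build_marine"
            reward = 1.0 if all(done.values()) else -0.1
        ctrl.update(opt, reward)
        total += reward
    return total

random.seed(0)
ctrl = Controller(["collect_minerals", "build_barracks", "build_marine"])
for _ in range(200):
    run_episode(ctrl)
```

The selection-frequency analysis mentioned in the abstract corresponds, in this sketch, to counting how often `select()` returns each option over many episodes; a controller that sequences options appropriately would favor the prerequisite options early in an episode and the payoff option afterwards.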