A master's thesis from Aalborg University

SC2OPT - Applying Reinforcement Learning and the Options Framework to StarCraft II.

Authors

; ;

Term

4th term

Education

Publication year

2018

Submitted on

Pages

63

Abstract

This thesis investigates whether the options framework in reinforcement learning can handle the complexity of StarCraft II, accessed through the SC2LE/PySC2 interface. The project decomposes the problem into specialized options trained in custom mini-environments. Each option is restricted to a subset of the game's action space and is designed to learn one subskill of a core task: building marines. A policy over options (a controller) then selects among the trained options to interact with the environment. Although the individual option agents showed learning in their respective training environments, they did not perform significantly better than random agents, and one performed worse. Nevertheless, when the controller using these options was applied to a more complex environment, it achieved scores significantly higher than DeepMind's available reference results. An analysis of how often each option was selected over many episodes indicated that the controller learned to prioritize and sequence the options appropriately. Overall, the results support the options framework as a viable approach for complex multitask problems that can be divided into subtasks, even when the constituent options are not individually optimal.

[This summary has been generated with the help of AI directly from the project (PDF)]
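
The controller-over-options idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the three-stage toy task standing in for StarCraft II, the option names, the fixed option durations, and the use of tabular SMDP Q-learning for the controller. It does not use PySC2, and the options here are hard-coded rather than trained.

```python
import random
from collections import defaultdict

GAMMA = 0.99
# Hypothetical option names; the thesis's actual options may differ.
OPTIONS = ["build_depot", "build_barracks", "train_marine"]


def run_option(state, option):
    """Run one fixed option to termination in a toy task (not PySC2).

    State: 0 = nothing built, 1 = supply depot built, 2 = barracks built.
    Returns (next_state, discounted_reward, primitive_steps).
    """
    if option == "build_depot":
        return max(state, 1), 0.0, 3
    if option == "build_barracks":
        # A barracks can only be built once a depot exists.
        return (2 if state >= 1 else state), 0.0, 5
    if option == "train_marine":
        # Reward 1 arrives on the option's second (last) step, so it is
        # discounted once; no barracks means no marine and no reward.
        return state, (GAMMA if state == 2 else 0.0), 2
    raise ValueError(f"unknown option: {option}")


def train_controller(episodes=500, budget=40, alpha=0.1, epsilon=0.1, seed=0):
    """Learn Q(state, option) with epsilon-greedy SMDP Q-learning."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, steps_left = 0, budget
        while steps_left > 0:
            if rng.random() < epsilon:
                option = rng.choice(OPTIONS)
            else:
                option = max(OPTIONS, key=lambda o: q[(state, o)])
            next_state, reward, k = run_option(state, option)
            steps_left -= k
            # The step budget is treated as a time limit, so the update
            # still bootstraps; an option lasting k steps discounts by GAMMA**k.
            best_next = max(q[(next_state, o)] for o in OPTIONS)
            target = reward + (GAMMA ** k) * best_next
            q[(state, option)] += alpha * (target - q[(state, option)])
            state = next_state
    return q


def greedy_policy(q):
    """Return the controller's preferred option in each toy state."""
    return {s: max(OPTIONS, key=lambda o: q[(s, o)]) for s in range(3)}


if __name__ == "__main__":
    print(greedy_policy(train_controller()))
```

In this toy setting the controller only ever receives reward through the marine-training option, yet it has to learn the order in which the other two options must run first, which is a small-scale analogue of the sequencing behavior the abstract reports.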