A master's thesis from Aalborg University

Using reinforcement learning in the context of computer games

Author

Term

4th term

Publication year

2004

Abstract

This thesis examines how well an agent using reinforcement learning (machine learning in which actions are rewarded or punished) can learn and adapt when its environment changes. It compares two methods: basic Q-learning and a hierarchical variant, MaxQ, which decomposes the task into smaller subtasks. To test them, I built Flag Hunter, a simple turn-based game in which the agent must reach the opponent's base, pick up the flag, and bring it home. To measure learning, I first trained the agents without an opponent and then against opponents with varying degrees of random (unpredictable) behavior. The MaxQ agent reached the in-game goal significantly faster than the basic Q-learning agent, yet the two methods' learning converged at roughly the same rate; I identify and discuss the reasons for this. To assess adaptability, the agents were first trained without an opponent and then set to play against one. Neither method performed well here: the agents did not adapt effectively to the changed situation.

[This abstract was generated with the help of AI]
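As background to the comparison above, the following is a minimal sketch of tabular Q-learning, the simpler of the two methods. It illustrates the standard textbook algorithm only, not the thesis's actual implementation; the environment interface (reset, step, actions) is hypothetical.

import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Tabular Q-learning on a generic episodic environment (hypothetical interface).
    q = defaultdict(float)  # maps (state, action) -> estimated discounted return
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Core update: nudge Q(s, a) toward reward plus discounted best next value.
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

MaxQ, by contrast, learns a separate value function for each subtask (for example, reaching the base, picking up the flag, returning home) and composes them hierarchically, which can help in tasks with natural subgoals such as Flag Hunter.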