A master's thesis from Aalborg University


The Impact of Indirect Advantages on Artificial Intelligence Agents' Behaviour in Video Games

Author

Term

4th term

Education

Publication year

2022

Submitted on

Pages

61

Abstract

This thesis investigates whether AI game agents given indirect cooperative advantages and simple communication/coordination methods can develop new, less predictable behaviors compared with hand‑crafted, ad hoc designs. A prototype was built in Unity’s game engine with two opposing agent types: Monster Agents and Friendly NPC Agents. The Friendly NPCs were implemented in three variants: (1) a Finite State Machine (a simple rule‑based model), (2) a Reinforcement Learning agent with a perception bonus (access to extra sensory information), and (3) a Reinforcement Learning agent with pack tactics (encouraging coordinated movement). The RL agents were trained against the Monster Agents. Each Friendly NPC variant then ran for 12 hours to log positions and game scores. In score‑based tests, the ad hoc Finite State Machine significantly outperformed both Reinforcement Learning variants at completing objectives. Heatmaps of positions showed more chaotic, unpredictable behavior from the Reinforcement Learning agents. Because training time was limited, the specific impact of the indirect advantages could not be determined, and the hypothesis could not be accepted.
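To make the rule-based baseline concrete, below is a minimal sketch of the kind of Finite State Machine controller the abstract describes for the Friendly NPC. It is written in Python purely for illustration; the actual prototype was built in Unity, and all state names, thresholds, and the Observation fields here are assumptions, not the thesis code.

# Hypothetical sketch of a rule-based Friendly NPC controller as a small
# finite state machine. State names, thresholds, and Observation fields are
# illustrative assumptions; the thesis prototype was implemented in Unity.

from enum import Enum, auto
from dataclasses import dataclass


class State(Enum):
    PATROL = auto()   # wander until a Monster Agent is noticed
    ENGAGE = auto()   # move toward and attack the nearest Monster Agent
    FLEE = auto()     # retreat when health is low


@dataclass
class Observation:
    distance_to_monster: float  # distance to the nearest Monster Agent
    health: float               # current health in [0, 1]


class FriendlyNpcFsm:
    """Simple rule-based controller: one transition check per game tick."""

    def __init__(self, sight_range: float = 10.0, flee_health: float = 0.25):
        self.state = State.PATROL
        self.sight_range = sight_range
        self.flee_health = flee_health

    def step(self, obs: Observation) -> State:
        if obs.health < self.flee_health:
            self.state = State.FLEE
        elif obs.distance_to_monster < self.sight_range:
            self.state = State.ENGAGE
        else:
            self.state = State.PATROL
        return self.state


if __name__ == "__main__":
    npc = FriendlyNpcFsm()
    print(npc.step(Observation(distance_to_monster=4.0, health=0.9)))  # State.ENGAGE
    print(npc.step(Observation(distance_to_monster=4.0, health=0.1)))  # State.FLEE

A controller of this kind is fully predictable by construction, which is the property the thesis contrasts against the Reinforcement Learning variants whose logged positions produced the more chaotic heatmaps.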


[This abstract has been rewritten with the help of AI based on the project's original abstract]