Learning-Based Decision Making in a Competitive Card Game
Author
Loós, Tamás
Term
4th term
Education
Publication year
2026
Submitted on
2026-02-24
Pages
46
Abstract
This thesis studies behavioral cloning (imitation learning) as a way to teach an AI to play Scripts of Tribute, a competitive deck-building card game used as a testbed in the IEEE Conference on Games AI Competition. We trained a neural network to imitate an expert Monte Carlo Tree Search (MCTS) bot using data from 6,400 games (673,619 decisions). The model matched the expert's move about 59% of the time, roughly 15 percentage points above the strongest trivial strategy. This accuracy plateaued across five model sizes (84K–2.3M parameters) and two training methods (behavioral cloning and DAgger, a data aggregation approach that adds expert corrections). This suggests that the limit is not model capacity but the teacher's reliance on search information that is unavailable to the student at prediction time. To resolve a mismatch in which actions are indexed by card position while the input encoding ignores card order, we introduce a pointer-based architecture that scores each action by the content of the card it refers to rather than by its slot. The deployed bot (457K parameters) makes decisions in about 5 ms and achieves a 36.9% overall win rate against eleven competition bots. It beats heuristic opponents but wins only 12–15% of games against the strongest MCTS-based bots. This gap is consistent with the well-known compounding-error problem in behavioral cloning, where small mistakes accumulate over a game when no search is available to correct them.
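The difference between slot-indexed and pointer-based action scoring can be illustrated with a small sketch. It is a minimal illustration under assumed names (`pointer_scores`, `slot_scores`, two-feature card vectors, a fixed state query), not the thesis's implementation: a pointer-style head computes each action's logit from the features of the card that action refers to, whereas a fixed-slot head ties logit i to position i regardless of which card sits there. Because the state input ignores card order, the query vector is the same for every ordering of the hand.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pointer_scores(state_query, action_cards):
    """Pointer-style head (hypothetical): one logit per legal action, computed
    from the features of the card that action points to, not its slot index."""
    return softmax([dot(state_query, card) for card in action_cards])

def slot_scores(slot_logits, n_actions):
    """Fixed-slot head (hypothetical): logit i belongs to position i,
    whatever card happens to occupy that position."""
    return softmax(slot_logits[:n_actions])

def chosen_card(probs, order):
    """Return the name of the card with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return order[best]

# Hypothetical two-feature card encodings and an order-invariant state query.
HAND = {"A": [0.2, 0.1], "B": [1.0, 1.0], "C": [0.5, 0.0]}
QUERY = [1.0, 0.5]
SLOT_LOGITS = [0.0, 2.0, 0.0]  # a slot head that has learned "play slot 1"

if __name__ == "__main__":
    for order in (["A", "B", "C"], ["C", "A", "B"]):
        cards = [HAND[name] for name in order]
        print(order,
              "pointer ->", chosen_card(pointer_scores(QUERY, cards), order),
              "slot ->", chosen_card(slot_scores(SLOT_LOGITS, len(order)), order))
```

With the hand ordered A, B, C both heads pick card B. After reordering to C, A, B the pointer head still picks B, because its scores move with the cards, while the slot head's preference for position 1 now lands on card A. When the input carries no ordering information, the slot head has no way to tell which card occupies which slot; scoring by card content removes that dependency.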
[This abstract has been rewritten with the help of AI based on the project's original abstract]