Efficient Supervised Reinforcement Learning in Backgammon
Author
Jensen, Boris
Term
10th term
Publication year
2009
Abstract
Reinforcement learning is effective but, in practice, requires large amounts of experience and suffers from poor initial performance. This thesis investigates how supervised reinforcement learning can mitigate the cold-start problem in backgammon by employing a teacher to guide early decisions while preserving the agent’s ability to discover new strategies and potentially surpass the supervisor. The work adapts a supervised actor-critic model to backgammon, proposes three variants that differ only in how the teacher’s influence is gradually withdrawn, and addresses the large state space via function approximation (including neural networks and Kanerva coding). Because no standard metric exists to quantify improvement beyond the supervisor, an ad hoc performance measure and a testbed with a benchmark opponent are introduced to steer and evaluate experiments. Using this measure, the results indicate that supervised reinforcement learning can ensure adequate initial performance in backgammon while still enabling learning that can exceed the teacher’s level.
[This abstract has been generated with the help of AI directly from the project full text]
