AAU Student Projects
A master's thesis from Aalborg University


Curiosity-driven Planning with Reinforcement Learning

Author

Term

4th term

Publication year

2023

Submitted on

Pages

12

Abstract

Reinforcement learning (RL) often struggles when external rewards are rare. By contrast, humans and animals learn in such settings because curiosity provides an intrinsic drive to explore novelty. This thesis examines whether adding curiosity can help a model-based RL agent learn and explore in a visual environment with sparse extrinsic rewards. We present a curiosity-driven, model-based RL agent that learns a compact latent representation of the visual input and uses it as the basis for decision making. The agent employs Random Network Distillation (RND) to produce episodic intrinsic rewards—treating hard-to-predict states as more novel—and guides its choices with Monte Carlo Tree Search (MCTS), a planning method that looks ahead by simulating action sequences. We show that curiosity improves learning: our agent solves a sparse-reward visual task that neither a model-free agent nor a model-based agent without curiosity can solve. Finally, we explore building a world model to serve as the simulation environment for MCTS.
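The RND mechanism described above can be illustrated with a toy sketch. Here, plain linear maps and NumPy stand in for the neural networks an actual agent would use, and all names are illustrative: a frozen random "target" network defines an arbitrary function of the state, a "predictor" network is trained to imitate it, and the prediction error serves as the intrinsic reward, so states the predictor has not yet fit (novel states) are rewarded more.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random "target" network: a frozen linear map of the state.
W_target = rng.normal(size=(8, 4))

# Trainable "predictor" network, trained to match the target's output.
W_pred = np.zeros((8, 4))

def intrinsic_reward(state):
    """Prediction error of the predictor against the frozen target.

    Rarely seen (novel) states yield large errors, hence large rewards.
    """
    target = W_target @ state
    pred = W_pred @ state
    return float(np.mean((target - pred) ** 2))

def train_predictor(state, lr=0.01):
    """One gradient step shrinking the predictor's error on this state."""
    global W_pred
    err = (W_pred @ state) - (W_target @ state)  # shape (8,)
    W_pred -= lr * np.outer(err, state)          # gradient of squared error

# A repeatedly visited state becomes less "novel" over time.
s = rng.normal(size=4)
before = intrinsic_reward(s)
for _ in range(200):
    train_predictor(s)
after = intrinsic_reward(s)
```

After training on the same state many times, its intrinsic reward drops toward zero, which is exactly the exploration pressure the thesis relies on: the agent is pushed away from familiar states and toward ones its predictor cannot yet explain.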


[This abstract has been rewritten with the help of AI based on the project's original abstract]