A master's thesis from Aalborg University


Learning Utility Functions by Imputing

Authors


Term

4th term

Publication year

2005

Abstract


This project introduces two methods—Utility Iteration and Imputing by Comparison—to infer what an observed agent values, based on the choices it makes. The goal is to learn a utility function (a numerical description of the agent’s preferences) that can be represented in an influence diagram, a graphical model of decisions, uncertainties, and outcomes. From the agent’s decisions, the methods build constraints—requirements the utility must satisfy—so the observed behavior aligns with the learned preferences. The methods are designed to handle agents that change behavior over time and include policies to resolve conflicting observations. They are closely related to a prior approach, FLUF, but differ in how they treat partially observed strategies. FLUF relaxes constraints to avoid excluding the true utility function. In contrast, Utility Iteration and Imputing by Comparison impute missing observations to make the strategy fully observed, removing the need to relax constraints. We ran experiments on static behavior and on three defined kinds of changing behavior, comparing the new methods with FLUF. In all cases, the new methods learned more accurately and faster. Under the project’s domain assumptions, imputing observations yields higher accuracy than relaxing constraints.
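The constraint-building step described above can be sketched in a few lines: each observed choice implies that the chosen action's expected utility is at least that of any rejected alternative. This is an illustrative toy under assumed outcome distributions and names, not the thesis's actual implementation of Utility Iteration or Imputing by Comparison.

```python
# Toy sketch: turning an observed choice into a constraint on the
# agent's (unknown) utility function. All names and numbers here are
# hypothetical, chosen only to illustrate the idea.

def expected_utility(utility, dist):
    """Expected utility of an action, given its outcome distribution."""
    return sum(p * utility[outcome] for outcome, p in dist.items())

def constraint_satisfied(utility, chosen, rejected):
    """An observed choice implies EU(chosen) >= EU(rejected)."""
    return expected_utility(utility, chosen) >= expected_utility(utility, rejected)

# Outcome distributions for two available actions (assumed numbers).
attack = {"win": 0.5, "draw": 0.1, "lose": 0.4}
defend = {"win": 0.2, "draw": 0.6, "lose": 0.2}

# The agent was observed choosing `attack`; a candidate utility
# function is consistent with that observation only if it satisfies
# the induced inequality.
candidate = {"win": 1.0, "draw": 0.4, "lose": 0.0}
print(constraint_satisfied(candidate, attack, defend))  # True: 0.54 >= 0.44
```

Accumulating one such inequality per observed decision narrows the set of admissible utility functions; imputing missing observations, as the abstract describes, adds constraints for decisions that were never directly seen.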

[This abstract was generated with the help of AI]