Learning Utility Functions by Imputing

Student thesis: Master's thesis and HD graduation project

  • Anders Hansen
  • Nicolaj Lock
  • Peter Poulsen
4th semester, Computer Science, Master (Master's programme)
This project develops two methods, Utility Iteration and Imputing by Comparison, that learn the utilities of an observed agent so that its preferences can be modeled in an influence diagram. The utilities are learned from the agent's behavior by creating constraints from its observed decisions. The two methods are designed to handle agents that change behavior, using different policies to handle conflicting behavior. The methods have much in common with FLUF; the main difference is how partially observed strategies are handled. Whereas FLUF relaxes constraints to ensure that the true utility function is not excluded, Utility Iteration and Imputing by Comparison impute observations to make the strategy fully observed, which removes the need to relax constraints. Experiments are conducted with the two new methods and with FLUF. Three kinds of changing behavior are defined, and experiments are conducted for each kind. In the experiments with changing behavior as well as in the experiment with static behavior, the new methods achieved better results than FLUF in both accuracy and learning speed. It is concluded that, under the assumption made about the domain in this project, imputing observations yields higher accuracy than relaxing constraints.
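The idea of turning observed decisions into constraints on the utilities can be illustrated with a minimal sketch. It is not the thesis's algorithm, and it does not implement FLUF, Utility Iteration, or Imputing by Comparison. It assumes a toy single-decision model in which each action has a known outcome distribution and the agent is taken to choose an action of maximal expected utility. All names (`probs`, `constraints_from_observation`, `is_consistent`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy model: probs[action][outcome] = P(outcome | action).
Probabilities = Dict[str, Dict[str, float]]
Constraint = Dict[str, float]  # coefficients over outcomes; means sum(c * u) >= 0


def constraints_from_observation(chosen: str, probs: Probabilities) -> List[Constraint]:
    """Build one linear constraint per alternative action.

    Assumption: the observed agent maximizes expected utility, so
    EU(chosen) - EU(other) >= 0 must hold for every other action.
    """
    constraints = []
    for other in probs:
        if other == chosen:
            continue
        outcomes = set(probs[chosen]) | set(probs[other])
        coeffs = {
            o: probs[chosen].get(o, 0.0) - probs[other].get(o, 0.0)
            for o in outcomes
        }
        constraints.append(coeffs)
    return constraints


def is_consistent(utility: Dict[str, float], constraints: List[Constraint],
                  tol: float = 1e-9) -> bool:
    """Check whether a candidate utility assignment satisfies every constraint."""
    return all(
        sum(c * utility[o] for o, c in coeffs.items()) >= -tol
        for coeffs in constraints
    )


# Illustrative data only: the agent was observed choosing "safe".
probs = {
    "safe": {"small": 1.0},
    "risky": {"big": 0.5, "nothing": 0.5},
}
constraints = constraints_from_observation("safe", probs)

candidate_a = {"small": 0.6, "big": 1.0, "nothing": 0.0}
candidate_b = {"small": 0.4, "big": 1.0, "nothing": 0.0}
print(is_consistent(candidate_a, constraints))
print(is_consistent(candidate_b, constraints))
```

In this sketch every observation shrinks the set of utility assignments that remain consistent with the agent's behavior. When only part of a strategy is observed, such constraints cannot be formed directly, which is the situation the abstract says FLUF addresses by relaxing constraints and the two new methods address by imputing the missing observations.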
Publication date: Jun. 2005
ID: 61065150