Relational Data Mining Using Probabilistic Relational Models -- Design, Implementation and Test
Student thesis: Master thesis (including HD thesis)
- Morten Gade
- Michael Gade Nielsen
4. term, Computer Science, Master (Master Programme)
This thesis documents the design, implementation and test of Probabilistic
Relational Models (PRMs). PRMs are a graphical statistical approach to
modeling relational data using the Relational Language. PRMs consist of two
components; the dependency structure and the parameters.
Our design is based on simplicity, flexibility, and performance. We explain the search over possible structures, using a notation of potential parents. The potential parents are used in four different search algorithms; one greedy, two random, and a hybrid between greedy and random. Also, we explicitly explain learning the parameters using sufficient statistics, by considering internal and external dependencies and how to keep track of the context.
We perform five different tests, showing that the penalty term of the score function can be tweaked to control the trade off between maximum likelihood and complexity. Also, our limited scalability test shows that our implementation scales near linear, although our implementation could be further optimized. The search test of the four algorithms show that introducing randomness is beneficial if combined with a greedy approach. The results also show, that although greedy finds the best model, the hybrid approach comes close in less time.
In comparison with our prior work, it is clear that PRMs are a very good alternative to propositional data mining in terms of descriptiveness.
Our design is based on simplicity, flexibility, and performance. We explain the search over possible structures, using a notation of potential parents. The potential parents are used in four different search algorithms; one greedy, two random, and a hybrid between greedy and random. Also, we explicitly explain learning the parameters using sufficient statistics, by considering internal and external dependencies and how to keep track of the context.
We perform five different tests, showing that the penalty term of the score function can be tweaked to control the trade off between maximum likelihood and complexity. Also, our limited scalability test shows that our implementation scales near linear, although our implementation could be further optimized. The search test of the four algorithms show that introducing randomness is beneficial if combined with a greedy approach. The results also show, that although greedy finds the best model, the hybrid approach comes close in less time.
In comparison with our prior work, it is clear that PRMs are a very good alternative to propositional data mining in terms of descriptiveness.
Language | English |
---|---|
Publication date | Jun 2005 |