Linkage Analysis with PDGs
Student thesis: Master's thesis (incl. HD graduation project)
- Mads Thrane
- Mette Thøgersen
4th semester, Computer Science, Master's (Master's programme)
This is a linkage analysis project. Linkage analysis is a tool for locating genes on DNA strings. The motivation has been to investigate whether probabilistic graphical models can be used to optimize an existing linkage analysis algorithm, Fast Tree Traversal, developed by the Icelandic medical company DeCode Genetics. The current implementation of the Fast Tree Traversal algorithm uses MTBDDs (multi-terminal binary decision diagrams).
The project is divided into three parts. The first part is an introduction to the field of linkage analysis, written to create a better understanding of the subject. The second part is an investigation into some of the currently available linkage analysis algorithms, and the third part is the development of a new linkage analysis algorithm using PDGs.
We have implemented a single-point linkage analysis algorithm which gives an RFG as output. To use the output as input to a multi-point algorithm, the RFG must be normalized into a PDG. The implementation has been tested using data from the Superlink homepage. Superlink is another linkage analysis algorithm that uses probabilistic graphical models, in this case Bayesian networks. On this data, the resulting RFG contains 105 nodes, which is quite small compared to the 4^42 nodes that are possible.
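To put the node-count comparison in perspective, the following minimal sketch simply evaluates the arithmetic behind the two figures quoted above. The variable names are illustrative assumptions and are not taken from the thesis or its implementation.

```python
# Node counts quoted in the abstract; the names are illustrative only.
observed_nodes = 105
possible_nodes = 4 ** 42  # Python integers are arbitrary precision, so this is exact

# How many times smaller the resulting RFG is than the quoted upper bound.
ratio = possible_nodes / observed_nodes

print(f"possible nodes:   {possible_nodes:.3e}")
print(f"observed nodes:   {observed_nodes}")
print(f"reduction factor: {ratio:.3e}")
```

Since 4^42 = 2^84, the upper bound is on the order of 10^25, so the 105-node RFG is smaller by more than twenty orders of magnitude.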
Language | English |
---|---|
Publication date | Jun. 2004 |