Y-STR: Haplotype Frequency Estimation and Evidence Calculation
Student thesis: Master Thesis and HD Thesis
- Mikkel Meyer Andersen
4. term, Mathematics, Master (Master Programme)
Y-STR haplotype frequency estimation is important because it is required for evidence calculation. Unlike autosomal STR loci, the loci on the Y-chromosome cannot be assumed to be independent, so the joint probability of a haplotype does not factor into the product of the marginal probabilities. A statistical model that properly incorporates this dependence must therefore be constructed.
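The dependence between loci can be illustrated with a minimal sketch. The two-locus haplotype database below is a hypothetical toy example, not data from the thesis; it only shows how the observed joint frequency of a haplotype can differ from the product of the per-locus allele frequencies.

```python
from collections import Counter

# Hypothetical toy database (not thesis data): each haplotype is a tuple
# of allele repeat counts at two Y-STR loci.
haplotypes = [(14, 30), (14, 30), (14, 30), (15, 31), (15, 31), (14, 31)]
n = len(haplotypes)

# Empirical counts of whole haplotypes and of alleles at each locus.
joint = Counter(haplotypes)
locus1 = Counter(h[0] for h in haplotypes)
locus2 = Counter(h[1] for h in haplotypes)

target = (14, 30)

# Observed relative frequency of the full haplotype.
joint_freq = joint[target] / n

# Estimate obtained by (wrongly) assuming the loci are independent.
product_freq = (locus1[target[0]] / n) * (locus2[target[1]] / n)

print(f"joint frequency:      {joint_freq:.4f}")
print(f"product of marginals: {product_freq:.4f}")
```

In this toy database the two numbers disagree, which is the situation that rules out the product rule and calls for a model of the dependence.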
First, an existing method, the frequency surveying approach, is described; new models are then developed. These comprise a new method called ancestral awareness and models based on existing methods such as kernel smoothing and model-based clustering. A class of models, classification models, is also developed; examples are classification trees, support vector machines, and ordered logistic regression.
Methods to assess the performance of the models are developed and then used to compare them. Classification trees are found to be a good model, but they have the disadvantage of not using prior knowledge such as the single-step mutation model. Besides frequency estimation, evidence calculation is also considered in this thesis.
Language | English |
---|---|
Publication date | May 2010 |
Number of pages | 138 |
Publishing institution | Institut for Matematiske Fag, Aalborg Universitet |
Keywords | Y-STR, haplotype, frequency, estimation, evidence, calculation, svm, support, vector, machines, classification, trees, ordered, logistic, regression, ancestral, awareness, kernel, smoothing, model, based, clustering, surveying, unobserved, probability, mass |