Y-STR: Haplotype Frequency Estimation and Evidence Calculation

Student thesis: Master Thesis and HD Thesis

  • Mikkel Meyer Andersen
4th term, Mathematics, Master (Master Programme)
Y-STR haplotype frequency estimation is important because it is required for evidence calculation. Unlike autosomal STR loci, the loci on the Y-chromosome cannot be assumed to be independent, so the joint probability of a haplotype does not factor into the product of the marginal probabilities. A statistical model that properly incorporates this dependence must therefore be created.
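The effect of dependence between loci can be illustrated with a minimal sketch. The two-locus haplotype database below is invented purely for illustration and is not taken from the thesis; the function names `joint_frequency` and `product_of_marginals` are likewise hypothetical. The sketch compares the observed haplotype frequency with the value obtained by wrongly multiplying per-locus allele frequencies.

```python
from collections import Counter

# Hypothetical toy database (invented for illustration): each haplotype is a
# tuple of allele repeat counts at two Y-STR loci. The alleles are fully
# linked here, so the loci are strongly dependent.
haplotypes = [(14, 30), (14, 30), (14, 30), (15, 31), (15, 31), (15, 31)]
n = len(haplotypes)


def joint_frequency(h):
    """Relative frequency of the full haplotype h in the database."""
    return Counter(haplotypes)[h] / n


def product_of_marginals(h):
    """Product of per-locus allele frequencies (valid only under independence)."""
    prob = 1.0
    for locus, allele in enumerate(h):
        count = sum(1 for g in haplotypes if g[locus] == allele)
        prob *= count / n
    return prob


target = (14, 30)
print(joint_frequency(target))       # 0.5
print(product_of_marginals(target))  # 0.25
```

In this toy database the product of marginals understates the frequency of an observed haplotype (0.25 against 0.5) and assigns the same 0.25 to the haplotype (14, 31), which never occurs. Simple counting, as used here for the joint frequency, is only a baseline and is not one of the estimators developed in the thesis.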

First, an existing method, the frequency surveying approach, is described; afterwards, new models are developed. These comprise a new method called ancestral awareness and models based on existing methods such as kernel smoothing and model-based clustering. A class of classification models is also developed; examples are classification trees, support vector machines, and ordered logistic regression.

Methods for assessing model performance are developed and then used to compare the models. Classification trees are found to be a good model, but they have the disadvantage of not using prior knowledge such as the single-step mutation model. Besides frequency estimation, evidence calculation is also considered in this thesis.
Publication date: May 2010
Number of pages: 138
Publishing institution: Institut for Matematiske Fag, Aalborg Universitet
ID: 31928577