Amplification of DNA mixtures - Missing data approach

Student thesis: Master's thesis and HD graduation project

  • Torben Tvedebrink
4th semester, Mathematics, Master's (Master's programme)
This thesis presents a model for interpreting the results of STR typing of DNA mixtures, based on a multivariate normal distribution of peak areas. Drawing on previous analyses of controlled experiments with mixed DNA samples, we exploit the linear relationship between peak heights and peak areas, and the linear relations between the means and variances of the measurements. Furthermore, the contribution from one individual allele to the mean area of this allele is assumed proportional to the average of the height measurements on alleles where the individual is the only contributor.

For shared alleles in mixed DNA samples, only the cumulative peak heights and areas can be observed. To accommodate this latent structure, we use the EM algorithm to impute the missing variables based on a compound symmetry model. This allows intra- and intersystem correlations between the measurements and does not depend on the alleles of the DNA profiles. Due to the factorization of the likelihood and properties of the normal distribution, an ordinary implementation of the EM algorithm solves the missing data problem.
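As a rough illustration of the imputation idea, and not the implementation used in the thesis, the sketch below computes the conditional mean of the unobserved individual contributions to a shared allele given only their observed sum, under a multivariate normal model. The function names, the covariance values, and the peak areas are hypothetical and chosen only for the example. A full E-step would also need the conditional second moments, which are omitted here.

```python
def conditional_mean_given_sum(mu, cov, total):
    """Return E[X | sum(X) = total] for X ~ N(mu, cov).

    Uses the standard normal conditioning formula with S = sum(X):
    E[X_i | S] = mu_i + Cov(X_i, S) / Var(S) * (S - sum(mu)).
    """
    n = len(mu)
    # Cov(X_i, S) is the i-th row sum of cov; Var(S) is the sum of all entries.
    cov_with_sum = [sum(cov[i][j] for j in range(n)) for i in range(n)]
    var_sum = sum(cov_with_sum)
    residual = total - sum(mu)
    return [mu[i] + cov_with_sum[i] / var_sum * residual for i in range(n)]


def compound_symmetry(n, variance, rho):
    """Hypothetical helper: equal variances, one common correlation rho."""
    return [[variance if i == j else rho * variance for j in range(n)]
            for i in range(n)]


# Two contributors share one allele; only the summed peak area (1000) is seen.
# All numbers are made up for illustration.
imputed = conditional_mean_given_sum(
    [600.0, 300.0], [[400.0, 100.0], [100.0, 200.0]], 1000.0
)
print(imputed)  # [662.5, 337.5]
```

The imputed contributions always add up to the observed total. With a compound symmetry covariance of equal variances, the residual between the observed and the expected sum is split equally among the contributors, regardless of the correlation.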

We estimate the parameters of the model from a training data set. To assess the weight of evidence provided by the model, we apply the model with the estimated parameters to STR data from real crime cases with DNA mixtures.

The model works under certain limitations. In the estimation phase we exclude cases with drop-outs. These limitations are important and must be resolved before the model can be used for real crime casework; they are therefore subject to further investigation.
Publication date: Jun. 2007
ID: 61071054