Amplification of DNA mixtures - Missing data approach
Student thesis: Master Thesis and HD Thesis
- Torben Tvedebrink
4. term, Mathematics, Master (Master Programme)
This thesis presents a model for the interpretation of results of STR typing of DNA
mixtures based on a multivariate normal distribution of peak areas. From previous
analyses of controlled experiments with mixed DNA samples, we exploit the linear
relationship between peak heights and peak areas, and the linear relations of the means
and variances of the measurements. Furthermore, the contribution from one individual
allele to the mean area of this allele is assumed proportional to the average of height
measurements on alleles where the individual is the only contributor.
For shared alleles in mixed DNA samples, it is only possible to observe the cumulative peak heights and areas. Complying with this latent structure, we use the EM-algorithm to impute the missing variables based on a compound symmetry model. This allows intra- and intersystem correlations on the measurements and does not depend on the alleles of the DNA profiles. Due to factorization of the likelihood and properties of the normal distribution, an ordinary implementation of the EM-algorithm solves the missing data problem.
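The imputation step for a shared allele can be illustrated with a minimal sketch. It assumes two latent contributions that are jointly normal with a compound symmetry covariance, of which only the sum is observed. The function names, the parameter values, and the two-contributor setup are hypothetical and are not taken from the thesis; the sketch only uses the standard result for conditioning a multivariate normal on a linear combination.

```python
import numpy as np

def compound_symmetry(n, sigma2, rho):
    # Covariance with common variance sigma2 and common correlation rho.
    return sigma2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

def e_step_given_sum(mu, Sigma, s):
    # Conditional mean and covariance of X ~ N(mu, Sigma) given sum(X) = s.
    a = np.ones_like(mu)          # the observed quantity is a @ X
    Sa = Sigma @ a                # Cov(X, a @ X)
    var_s = a @ Sa                # Var(a @ X)
    cond_mean = mu + Sa * (s - a @ mu) / var_s
    cond_cov = Sigma - np.outer(Sa, Sa) / var_s
    return cond_mean, cond_cov

# Hypothetical expected contributions of two individuals to one shared peak.
mu = np.array([300.0, 100.0])
Sigma = compound_symmetry(2, sigma2=400.0, rho=0.3)

# Only the cumulative measurement is observed.
mean, cov = e_step_given_sum(mu, Sigma, s=440.0)
print(mean)  # imputed contributions; they add up to the observed sum
print(cov)   # remaining uncertainty; rows sum to zero because the sum is known
```

In a full EM iteration, these conditional moments would enter the expected complete-data log-likelihood, and the M-step would then update the mean and covariance parameters.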
We estimate the parameters in the model based on a training data set. In order to assess the weight of evidence provided by the model, we use the model with the estimated parameters on STR data from real crime cases with DNA mixtures.
The model works under certain limitations. In the estimation phase we exclude cases with drop-outs. These limitations are important and must be resolved before the model can be used for real crime case work; they are therefore subject to further investigation.
Language | English |
---|---|
Publication date | Jun 2007 |