Prediction Models for Classification

Student thesis: Master thesis (including HD thesis)

  • Nicolai Søndergaard Schjøtt
  • Simon Grøntved
4th term, Mathematics, Master (Master Programme)
The aim of this master thesis is to build the best possible prediction models for classifying which children and adolescents get ASD or ADHD, respectively. If a model predicts sufficiently well, it can be used to support the theory within the fields of ASD and ADHD. If a model predicts very well, it can be used by clinicians to substantiate a suspected diagnosis.

We started by writing a protocol, which was used to order the data set for this master thesis. Because a long time passes between ordering the data and receiving them, we ended up simulating a data set that we expected to have the same properties as the ordered one. This proved to be a great advantage: we learned how to simulate data, linked theory and practice, and were prepared for the ordered data set.

The master thesis focuses on logistic regression as the classification method, where we use splines for the continuous variables and LASSO to select the most important variables. We also use non-likelihood-based classification methods such as classification trees, which also contributed to our variable selection. When we fit a prediction model, it is important to determine whether it predicts well at all and whether it predicts better than other models. To determine this, we used various evaluation measures, but our main focus was the area under the ROC curve (AUC). All our evaluation measures were 10-fold cross-validated.
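As a rough illustration of the evaluation idea described above, the following sketch computes a 10-fold cross-validated AUC for a one-predictor logistic regression. It is not the thesis's implementation: the simulated data, the function names (`simulate`, `fit_logistic`, `cross_validated_auc`), and the plain gradient-ascent fit are assumptions made for this example, and splines and LASSO are deliberately left out. The AUC here is computed on pooled out-of-fold predictions, which is one of several possible ways to cross-validate it.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(xs, ys, lr=0.1, epochs=200):
    """Fit intercept and slope by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def cross_validated_auc(xs, ys, k=10, seed=0):
    """AUC of pooled out-of-fold predictions from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    oof = [0.0] * len(xs)  # out-of-fold linear predictor for each observation
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            oof[i] = b0 + b1 * xs[i]
    return auc(ys, oof)


def simulate(n, seed=1):
    """Hypothetical data: one standard-normal predictor, true slope 1.5."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [int(rng.random() < 1.0 / (1.0 + math.exp(-1.5 * x))) for x in xs]
    return xs, ys


xs, ys = simulate(400)
print(round(cross_validated_auc(xs, ys), 3))
```

Because every prediction comes from a model that never saw the observation in question, the resulting AUC estimates out-of-sample discrimination rather than fit to the training data.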

We do not recommend using the models we have arrived at so far; rather, we recommend building on our thoughts and ideas in further research. We experienced problems with logistic regression in the form of a time-dependent response as well as informative censoring of the predictors. Furthermore, we believe that one of the most valuable improvements would be to add more predictors so that the prediction models become sufficiently good.
Publication date: 2019
Number of pages: 152
External collaborator: Enheden for Psykiatrisk Forskning ved Psykiatrien i Region Nordjylland
María Rodrigo
ID: 305437130