Gaussian Graphical Models
Author
Kløjgård, Torben Anders
Term
4th term
Education
Publication year
2015
Submitted on
2015-08-03
Abstract
This thesis presents a new classification approach inspired by Random Forests, in which the usual tree-based learners are replaced by Gaussian graphical models. A key challenge is model selection: likelihood ratio tests are theoretically optimal but become computationally prohibitive as the number of variables grows. To address this, the Headlong Method (HLM) is introduced, based on regressing each variable on its graph neighbors and using t‑tests to assess individual parameters; by selecting edges at random, HLM delivers a substantially faster and practically effective alternative to likelihood-based selection, particularly in high-dimensional settings. The methodology rests on conditional probability, graph theory, the multivariate normal distribution, and general linear models. The proposed classifier is benchmarked against linear and quadratic discriminant analysis and the graphical lasso on two datasets—one with many observations and few variables, the other with few observations and many variables. Performance is evaluated using ROC curves and AUC, and the comparisons are used to identify the most suitable classifier for each dataset, while HLM demonstrates clear speed advantages and good applicability with large numbers of variables.
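The abstract describes the core of the Headlong Method: regress a variable on its graph neighbours and t-test individual coefficients to decide whether a randomly chosen edge should be kept. A minimal sketch of one such edge-test step is given below; the function name, signature, and toy data are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
from scipy import stats

def edge_t_test(X, i, j, neighbours, alpha=0.05):
    """Decide whether candidate edge (i, j) is supported by the data.

    Hypothetical HLM-style step: regress column i on its current
    neighbours plus candidate j (with intercept), then t-test the
    coefficient of j against zero.
    """
    n = X.shape[0]
    cols = list(neighbours) + [j]
    Z = np.column_stack([np.ones(n), X[:, cols]])   # design matrix
    y = X[:, i]
    beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    df = n - Z.shape[1]                             # residual degrees of freedom
    sigma2 = resid @ resid / df                     # residual variance estimate
    cov = sigma2 * np.linalg.inv(Z.T @ Z)           # covariance of beta-hat
    t = beta[-1] / np.sqrt(cov[-1, -1])             # t-statistic for edge (i, j)
    p = 2 * stats.t.sf(abs(t), df)                  # two-sided p-value
    return p < alpha, p

# Toy example: X0 and X1 are strongly related, X2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
X = np.column_stack([x0,
                     x0 + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])

keep_01, p_01 = edge_t_test(X, 0, 1, [])   # edge (0, 1): should be kept
```

In the full method described by the abstract, candidate edges are drawn at random and tested one at a time, which avoids the combinatorial cost of likelihood ratio tests over whole model families.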
[This abstract has been generated with the help of AI directly from the project full text]
