A master's thesis from Aalborg University


Describing mutational signatures using variational autoencoders

Authors


Term

4th term

Education

Publication year

2024

Submitted on

Pages

21

Abstract

This thesis examines whether variational autoencoders (VAEs) can identify mutational signatures in cancer genomics data. Mutational signatures are patterns of DNA changes that reveal the processes behind them, such as environmental exposure or defects in DNA repair. Traditionally, these signatures are extracted using Non-Negative Matrix Factorization (NMF), a mathematical technique that decomposes the data into non-negative building blocks. Here, we develop a β-VAE, a probabilistic autoencoder, to identify signatures, their exposures (how much each signature contributes in a given sample), and confidence intervals that express the uncertainty around the estimates. The confidence intervals are a contribution unique to this work and are derived by analyzing the model’s probabilistic latent space (its internal, compressed representation of the data). Experiments show that the β-VAE produces competitive results but still falls short of state-of-the-art methods in signature extraction. These findings indicate a need for further refinement and suggest exploring alternative probabilistic models to improve accuracy.
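
To make the NMF baseline named in the abstract concrete, here is a minimal sketch of signature extraction by factorizing a mutation-count matrix into signatures and exposures. It is not the thesis's implementation: the function name `nmf_multiplicative`, the squared-error multiplicative updates, the 96 mutation categories, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (mutation types x samples) as V ~ W @ H.

    W holds one candidate signature per column, H one exposure row per
    signature. Uses multiplicative updates for the squared-error objective,
    which is an illustrative choice and not necessarily the thesis's.
    """
    rng = np.random.default_rng(seed)
    n_types, n_samples = V.shape
    W = rng.random((n_types, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase the error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so every signature sums to 1; exposures absorb the scale,
    # leaving the product W @ H unchanged.
    scale = W.sum(axis=0, keepdims=True)
    return W / scale, H * scale.T


# Synthetic example (assumed sizes): 96 mutation categories, 3 signatures, 40 samples.
rng = np.random.default_rng(42)
true_W = rng.dirichlet(np.ones(96), size=3).T      # shape (96, 3)
true_H = rng.gamma(2.0, 500.0, size=(3, 40))       # shape (3, 40)
V = rng.poisson(true_W @ true_H).astype(float)     # observed mutation counts

W, H = nmf_multiplicative(V, rank=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.3f}")
```

This factorization returns point estimates only. The β-VAE described in the abstract instead learns a probabilistic latent space, which is what the thesis analyzes to obtain confidence intervals.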

[This abstract has been rewritten with the help of AI based on the project's original abstract]