Autoencoders for Signature Extraction: Systematically evaluating Pre- and Post-Processing
Authors
Hansen, Magni Jógvansson; Kure, Nikolai Eriksen
Term
4th term
Education
Publication year
2024
Submitted on
2024-06-10
Pages
20
Abstract
Cancer is fundamentally a genetic disease. When mutations (changes to the DNA) occur, the processes that caused them leave recognizable patterns in the genome. These patterns, called mutational signatures, can help identify their underlying causes. In this project, we test whether autoencoders, a type of machine learning model that compresses and reconstructs data in order to find patterns, can improve how well and how accurately such signatures are extracted from data. We explore different pre- and post-processing steps and compare them to a baseline. To evaluate the methods, we match the extracted signatures to known signatures from COSMIC and Signal using cosine similarity, a measure of how alike two patterns are, and we visualize the results to make the comparisons clear. Our findings show that the added pipeline steps had a positive effect, increasing both the performance and the accuracy of the extracted signatures.
[This summary has been rewritten with the help of AI based on the project's original abstract]
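To illustrate the evaluation step described in the abstract, here is a minimal sketch of cosine-similarity matching in Python. It assumes each signature is a vector of mutation-type frequencies (COSMIC single-base-substitution signatures typically use 96 trinucleotide contexts; the four-element vectors and the names `REF_A`, `REF_B`, `AE_1` and `AE_2` below are hypothetical toy data). The best-match rule, which assigns each extracted signature to its single most similar reference, is also an assumption; the project's actual matching procedure may differ, for example by enforcing one-to-one assignments or a similarity threshold.

```python
import math
from typing import Dict, List, Tuple

Signature = List[float]


def cosine_similarity(a: Signature, b: Signature) -> float:
    """Cosine of the angle between two signature vectors (1.0 = same shape)."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of mutation types")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def match_signatures(
    extracted: Dict[str, Signature], reference: Dict[str, Signature]
) -> Dict[str, Tuple[str, float]]:
    """Assign each extracted signature to its most similar reference signature.

    Assumption (hypothetical rule): simple best match; a reference may be
    reused by several extracted signatures and no similarity threshold is applied.
    """
    matches: Dict[str, Tuple[str, float]] = {}
    for name, sig in extracted.items():
        scored = [
            (ref_name, cosine_similarity(sig, ref_sig))
            for ref_name, ref_sig in reference.items()
        ]
        # Keep the (reference name, similarity) pair with the highest similarity.
        matches[name] = max(scored, key=lambda pair: pair[1])
    return matches


# Hypothetical toy data: 4 mutation types instead of the usual 96.
reference = {"REF_A": [0.7, 0.1, 0.1, 0.1], "REF_B": [0.1, 0.1, 0.1, 0.7]}
extracted = {"AE_1": [0.6, 0.2, 0.1, 0.1], "AE_2": [0.0, 0.1, 0.2, 0.7]}

matches = match_signatures(extracted, reference)
for name, (ref_name, sim) in matches.items():
    print(f"{name} -> {ref_name} (cosine similarity {sim:.3f})")
```

With these toy vectors, `AE_1` is matched to `REF_A` and `AE_2` to `REF_B`. Because cosine similarity ignores vector length, two signatures with the same relative mutation profile score 1.0 even if one is scaled differently.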
