A master's thesis from Aalborg University

Self-Supervised Contrastive Learning for Large-Scale ECG Similarity Search & Retrieval

Term

4th term

Publication year

2026

Abstract

This thesis investigates whether self-supervised contrastive deep learning can enable large-scale ECG archives to be searched based on signal similarity rather than predefined diagnostic labels. The work is motivated by the fact that 12‑lead electrocardiograms contain rich morphological, temporal, and spatial information, while current AI methods are dominated by supervised classification, which depends on expensive expert annotations and does not scale well to population-level datasets. In this study, a self-supervised contrastive ECG retrieval network is developed and trained on 12 million 10‑second ECGs from the Danish Nationwide ECG Cohort. Each recording is split into two 5‑second segments, with one segment augmented using amplitude scaling, Gaussian noise, and baseline wander to learn robust, clinically meaningful representations. The model is optimized using a tri-objective loss function combining two NT-Xent contrastive losses with a mean squared error reconstruction loss. The resulting 10‑second ECG embeddings are then indexed in a FAISS-based approximate nearest-neighbor system to enable efficient ECG-to-ECG similarity search at scale. Evaluation shows that higher cosine similarity between embeddings is associated with smaller differences in standardized ECG-derived metadata and that disease-specific retrieval yields substantially higher concentrations of the same diagnosis among nearest neighbors for both common (e.g., AFIB, RBBB) and rare (e.g., WPW, WPWB) conditions. These findings suggest that the model captures clinically relevant morphological, temporal, and spatial structures and can organize and query large ECG databases without supervised fine-tuning. The thesis thus opens new avenues for downstream phenotyping and risk stratification, while emphasizing that further clinical validation is required before widespread deployment.
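The segment pairing and contrastive objective described above can be illustrated with a minimal sketch. It is not the thesis implementation: it uses only the Python standard library, handles a single lead, and covers one NT-Xent term rather than the full tri-objective loss (the reconstruction term and the way the two contrastive terms are defined are not specified in the abstract). The function names `split_and_augment` and `nt_xent`, the 500 Hz sampling rate, the augmentation magnitudes, the temperature of 0.1, and the choice to augment the second half are all assumptions made for illustration.

```python
import math
import random


def split_and_augment(signal, rng, fs=500, scale_range=(0.8, 1.2),
                      noise_std=0.01, wander_amp=0.05, wander_hz=0.3):
    """Split one lead into two equal halves and augment only the second half.

    All parameter values are illustrative assumptions, not values from the thesis.
    """
    half = len(signal) // 2
    first, second = signal[:half], signal[half:2 * half]
    scale = rng.uniform(*scale_range)        # amplitude scaling factor
    phase = rng.uniform(0.0, 2.0 * math.pi)  # random phase of the baseline drift
    augmented = [
        scale * x
        + rng.gauss(0.0, noise_std)          # additive Gaussian noise
        + wander_amp * math.sin(2.0 * math.pi * wander_hz * i / fs + phase)  # baseline wander
        for i, x in enumerate(second)
    ]
    return first, augmented


def _normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def nt_xent(z_a, z_b, temperature=0.1):
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]).

    Every other embedding in the combined batch of 2N acts as a negative.
    """
    z = [_normalize(v) for v in z_a + z_b]
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)              # index of the positive partner
        sims = [sum(p * q for p, q in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        logits = [s for k, s in enumerate(sims) if k != i]  # exclude self-similarity
        m = max(logits)                      # log-sum-exp with max shift for stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - sims[pos]
    return total / (2 * n)
```

In a training loop, each half would first pass through an encoder, and the two resulting embedding batches would be given to `nt_xent`; the loss falls as embeddings of the two halves of the same recording move closer together relative to embeddings of other recordings in the batch.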

This master's thesis investigates whether self-supervised contrastive deep learning can be used to make large ECG archives searchable by signal similarity instead of predefined diagnoses. The starting point is that 12-lead ECGs contain rich morphological, temporal, and spatial information, but that current AI solutions are mainly built on supervised classification, which requires expensive and time-consuming annotations and therefore scales poorly to the population level. In the study, a self-supervised contrastive ECG retrieval network is developed and trained on 12 million 10-second ECGs from the Danish Nationwide ECG Cohort. Each signal is split into two 5-second segments, one of which is augmented with amplitude scaling, Gaussian noise, and baseline wander in order to learn robust, diagnosis-relevant representations. The model is trained with a tri-objective loss function that combines two NT-Xent contrastive losses with a reconstruction loss based on mean squared error. The resulting embeddings of the 10-second ECGs are then indexed in a FAISS-based approximate nearest-neighbor system for fast ECG-to-ECG similarity search at scale. The evaluation shows that higher cosine similarity between ECG embeddings is associated with smaller differences in ECG-derived metadata, measured on a standardized scale, and that disease-specific retrieval yields a clearly elevated share of the same diagnosis among the nearest neighbors for both common (e.g., AFIB, RBBB) and rare (e.g., WPW, WPWB) conditions. This indicates that the model, without supervised fine-tuning, captures clinically relevant morphological, temporal, and spatial structures and can be used to structure and search large ECG databases. The work thus points to new opportunities for downstream phenotyping and risk stratification, but the authors stress that further clinical validation is needed before the method can be applied widely in practice.

[This abstract has been generated with the help of AI directly from the project full text]