A master's thesis from Aalborg University

Discover Bacterial Interactions by Combining Clustering of Pairwise Univariate Time Series and Explainability

Authors

; ;

Term

4. term

Education

Publication year

2021

Submitted on

Pages

18

Abstract

Understanding how bacteria interact in wastewater treatment plants (WWTPs) can improve process control, prevent eutrophication, and reduce pollution in discharged water. This study proposes a data-driven way to identify potential interactions by clustering paired time series of bacterial abundances measured in activated sludge. We adapt a deep clustering method called DPSOM so it takes pairs of bacteria as input. Each pair consists of two single-bacterium time series. To capture local temporal patterns, we split each pair into shorter subsequences (“windows”), cluster the windows, and then use those window clusters to assign clusters to the original full-length pairs. To make the results interpretable, we use the LIME framework to visualize which parts of a pair most influenced its cluster assignment. Because the dataset has no ground-truth labels for interactions, we assess the approach with alternative criteria: the Pearson correlation coefficient, a cluster-based prediction task, and the LIME explanations.

Understanding how bacteria interact in wastewater treatment plants (WWTPs) can improve process control, prevent eutrophication, and reduce pollution in the effluent. This study proposes a data-driven way to find possible interactions by clustering pairwise time series of bacterial abundances measured in activated sludge. We adapt a deep clustering method called DPSOM so that it takes pairs of bacteria as input. Each pair consists of two time series, one for each bacterium. To capture local temporal patterns, we split each pair into shorter subsequences ("windows"), cluster these windows, and use the window clusters to assign clusters to the original full-length pairs. To make the results understandable, we use LIME to visualize which parts of a pair have the greatest influence on its cluster assignment. Because the dataset has no known ground truth for interactions, we evaluate the method with alternative measures: the Pearson correlation coefficient, a cluster-based prediction task, and LIME explanations.
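The windowing step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not say how window clusters are turned into a cluster for the full-length pair, so the majority vote in `assign_pair_cluster` is an assumption. The function `toy_window_clusterer` is a hypothetical stand-in for the trained DPSOM model, and the names `window_size` and `stride` are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# One window of a pair: aligned subsequences of the two bacteria.
Window = Tuple[Tuple[float, ...], Tuple[float, ...]]


def split_pair_into_windows(
    series_a: Sequence[float],
    series_b: Sequence[float],
    window_size: int,
    stride: int,
) -> List[Window]:
    """Cut a pair of equal-length series into aligned subsequences."""
    if len(series_a) != len(series_b):
        raise ValueError("both series in a pair must have the same length")
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    for start in range(0, len(series_a) - window_size + 1, stride):
        end = start + window_size
        windows.append((tuple(series_a[start:end]), tuple(series_b[start:end])))
    return windows


def toy_window_clusterer(window: Window) -> int:
    """Hypothetical stand-in for the trained clustering model.

    Returns 0 if both series move in the same direction over the
    window (or either is flat), and 1 otherwise.
    """
    a, b = window
    same_direction = (a[-1] - a[0]) * (b[-1] - b[0]) >= 0
    return 0 if same_direction else 1


def assign_pair_cluster(window_clusters: Sequence[int]) -> int:
    """Assumed aggregation rule: majority vote over the window labels.

    Ties go to the label that appears first in the sequence.
    """
    if not window_clusters:
        raise ValueError("need at least one window label")
    return Counter(window_clusters).most_common(1)[0][0]


# Usage: two made-up abundance series that rise together.
abundance_a = [1, 2, 3, 4, 5, 6]
abundance_b = [2, 4, 6, 8, 10, 12]
windows = split_pair_into_windows(abundance_a, abundance_b, window_size=3, stride=3)
window_labels = [toy_window_clusterer(w) for w in windows]
pair_cluster = assign_pair_cluster(window_labels)
```

With non-overlapping windows of length 3, the example pair yields two windows, and each window receives its own label before the vote. A stride smaller than the window size would give overlapping windows, which is another choice the abstract leaves open.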

[This abstract has been rewritten with the help of AI based on the project's original abstract]