Discover Bacterial Interactions by Combining Clustering of Pairwise Univariate Time Series and Explainability
Authors
Agerskov, Alexander de Linde ; Sørensen, Christian Bro ; Holmager, Trine Juhl
Term
4th term
Education
Publication year
2021
Submitted on
2021-06-11
Pages
18
Abstract
Understanding how bacteria interact in wastewater treatment plants (WWTPs) can improve process control, prevent eutrophication, and reduce pollution in discharged water. This study proposes a data-driven way to identify potential interactions by clustering paired time series of bacterial abundances measured in activated sludge. We adapt a deep clustering method called DPSOM so it takes pairs of bacteria as input. Each pair consists of two single-bacterium time series. To capture local temporal patterns, we split each pair into shorter subsequences (“windows”), cluster the windows, and then use those window clusters to assign clusters to the original full-length pairs. To make the results interpretable, we use the LIME framework to visualize which parts of a pair most influenced its cluster assignment. Because the dataset has no ground-truth labels for interactions, we assess the approach with alternative criteria: the Pearson correlation coefficient, a cluster-based prediction task, and the LIME explanations.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
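The windowing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means stands in for DPSOM, the majority vote used to turn window clusters into a cluster for each full-length pair is an assumption (the abstract does not say how that assignment is made), and all function names and default parameters are hypothetical.

```python
import numpy as np


def make_windows(pair, window_len, stride):
    """Split a (2, T) pair of time series into (2, window_len) windows."""
    _, n_steps = pair.shape
    starts = range(0, n_steps - window_len + 1, stride)
    return np.stack([pair[:, s:s + window_len] for s in starts])


def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means; a simple stand-in for DPSOM, not the method in the thesis."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every window to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def cluster_pairs(pairs, window_len=8, stride=4, k=2):
    """Cluster windows, then label each full pair by majority vote (assumed rule)."""
    windows, owner = [], []
    for i, pair in enumerate(pairs):
        w = make_windows(pair, window_len, stride)
        windows.append(w.reshape(len(w), -1))  # flatten each window to a vector
        owner.extend([i] * len(w))             # remember which pair it came from
    window_labels = kmeans(np.vstack(windows), k)
    owner = np.array(owner)
    pair_labels = np.array([
        np.bincount(window_labels[owner == i], minlength=k).argmax()
        for i in range(len(pairs))
    ])
    return pair_labels, window_labels, owner


def pearson(pair):
    """Pearson correlation coefficient between the two series of a pair."""
    return float(np.corrcoef(pair[0], pair[1])[0, 1])
```

Calling `cluster_pairs(pairs)` on a list of `(2, T)` arrays returns one cluster label per pair along with the per-window labels, and `pearson(pair)` gives the correlation that the abstract names as one of the evaluation criteria.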
