AutoBinner: A Metagenomic Binner combining Feature Learning and HDBSCAN
Authors
Hansen, Mathias Lundhede ; Vive, Lasse Erritsø ; Linnebjerg, Simon Nagel
Term
4th term
Education
Publication year
2020
Submitted on
2020-06-11
Pages
65
Abstract
Metagenomic binning sorts DNA fragments from mixed microbial samples into groups that represent individual organisms. This report introduces AutoBinner, a tool that combines feature learning and clustering to improve this sorting. AutoBinner uses a stacked, undercomplete autoencoder (a neural network that compresses data to learn a lower-dimensional representation, or embedding) based on abundance and composition features. It then applies HDBSCAN, a density-based clustering algorithm, to group these embeddings into bins. To provide context, the report also explains how neural networks, especially autoencoders, support feature learning for this task. AutoBinner is evaluated against the state-of-the-art binners MetaBAT2, CONCOCT, and VAMB on three datasets: CAMI Medium, CAMI High, and CAMI Airways. The results indicate that AutoBinner needs further refinement, but that combining autoencoders with HDBSCAN shows promise for metagenomic binning.
[This abstract was generated with the help of AI]
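As a rough illustration of the feature-learning step described in the abstract, the sketch below computes composition features as normalized k-mer frequencies, concatenates them with abundance values, and trains a small undercomplete autoencoder to produce an embedding. It is not the report's implementation. The function names, the k-mer size, the single hidden layer (the report describes a stacked network), the hyperparameters, the toy sequences, and the use of plain NumPy instead of TensorFlow/Keras are all assumptions made for illustration. The clustering step is left out; in a full pipeline the resulting embedding would be passed to an HDBSCAN implementation.

```python
import itertools
import numpy as np


def kmer_frequencies(seq, k=4):
    """Composition features: normalized k-mer counts for one contig."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers with N or other ambiguous symbols
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def train_autoencoder(X, latent_dim=2, epochs=500, lr=0.1, seed=0):
    """Fit a one-hidden-layer undercomplete autoencoder by gradient descent.

    Returns an encoder function and the per-epoch reconstruction loss (MSE).
    Hypothetical helper: a simplification of the stacked network in the report.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert latent_dim < d, "undercomplete: bottleneck must be narrower than input"
    W1 = rng.normal(0.0, 0.1, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, (latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # encoder activations
        err = H @ W2 + b2 - X                    # reconstruction minus input
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / (n * d)              # gradient of MSE w.r.t. output
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)

    def encode(X_new):
        return np.tanh(X_new @ W1 + b1)

    return encode, losses


# Made-up contigs and per-sample abundance values, for illustration only.
contigs = ["ACGTACGTGGCC", "GGGCCCGGGCCC", "ATATATATTTAA"]
abundance = np.array([[0.9, 0.1], [0.2, 0.8], [0.85, 0.15]])
composition = np.array([kmer_frequencies(c, k=2) for c in contigs])
X = np.hstack([composition, abundance])      # one feature row per contig
encode, losses = train_autoencoder(X, latent_dim=2)
embedding = encode(X)                        # input for a density-based clusterer
```

Each row of `embedding` is the compressed representation of one contig. A density-based clusterer such as HDBSCAN would then assign rows to bins, leaving low-density points unassigned.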
Keywords
Binner ; Autoencoder ; Binning ; Deep learning ; Feature Learning ; Tensorflow ; Keras ; HDBSCAN ; Clustering