AutoBinner: A Metagenomic Binner combining Feature Learning and HDBSCAN
Student thesis: Master thesis (including HD thesis)
- Mathias Lundhede Hansen
- Lasse Erritsø Vive
- Simon Nagel Linnebjerg
4. term, Software, Master (Master Programme)
This report documents the development of the
metagenomic binner AutoBinner. AutoBinner
combines feature learning and clustering by using
a stacked undercomplete autoencoder and
the clustering algorithm HDBSCAN. The autoencoder
learns a feature embedding from abundance
and composition features, and the embedding
is then clustered.
To give the full picture of AutoBinner, the report
also explains how neural networks work
in the context of autoencoders
and feature learning.
The performance of AutoBinner is evaluated
by comparison with the state-of-the-art binners
MetaBAT2 [Kang et al., 2019], CONCOCT [Alneberg et al., 2013],
and VAMB [Nissen et al., 2018] on three datasets:
CAMI Medium, CAMI High, and CAMI Airways. The results
indicate that further refinement of AutoBinner
is needed, yet we see potential in using autoencoders
and HDBSCAN for metagenomic binning.
Language | English |
---|---|
Publication date | 11 Jun 2020 |
Number of pages | 65 |