Deep Clustering for Metagenomic Binning
Translated title
Deep Clustering til Metagenomic Binning
Authors
Ha, Trong Dai ; Polásková, Jitka ; Fichte, Jan Niklas
Term
4. Term
Publication year
2022
Submitted on
2022-06-17
Pages
117
Abstract
Deep learning has so far been explored only sparsely for metagenomic binning. Existing deep learning-based approaches usually preprocess raw DNA sequences into input features such as composition and abundance, and then perform representation learning and clustering as two separate steps. Using unprocessed DNA sequences as input has shown promising results for gene prediction, and joint deep clustering outperforms basic approaches such as k-means clustering on image clustering tasks. In this report, we investigate the potential of joint end-to-end unsupervised learning and of using unprocessed contigs as inputs for metagenomic binning. We propose two new binners: the Deep Convolutional Metagenomic Binner (DCMB) and the Deep Stacked Metagenomic Binner (DSMB). Both binners use joint deep clustering based on the Kullback-Leibler (KL) divergence. DCMB takes unprocessed contigs as input, whereas DSMB uses abundance and composition features. Both binners are benchmarked on the CAMI Low dataset and compared with the metagenomic binners VAMB, MetaBAT 2, and SolidBin. The results show that metagenomic information requires preprocessing to obtain meaningful representations, and that joint end-to-end learning slightly increases the number of recovered bins.
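KL divergence-based joint deep clustering is commonly formulated as in Deep Embedded Clustering (DEC). Each contig embedding is softly assigned to the cluster centroids with a Student's t kernel. A sharpened target distribution is derived from those assignments, and the KL divergence between the two is minimised. The sketch below computes this objective under stated assumptions. It assumes the DEC formulation with alpha = 1, and it is not confirmed to be the exact loss used by DCMB or DSMB. The function names and toy embeddings are hypothetical, and the encoder that would produce the embeddings is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assign(z: Matrix, centroids: Matrix, alpha: float = 1.0) -> Matrix:
    """DEC-style soft assignment q_ij: Student's t similarity of embedding i to centroid j,
    normalised so that each row sums to 1 (assumed formulation, alpha = 1 by default)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij is the
    soft cluster frequency; each row is renormalised to sum to 1."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_loss(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over contigs and clusters; zero-probability targets contribute 0."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


# Hypothetical toy data: four 2-D contig embeddings and two cluster centroids.
z = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

q = soft_assign(z, centroids)
p = target_distribution(q)
bins = [max(range(len(row)), key=row.__getitem__) for row in q]
print(bins)  # [0, 0, 1, 1]
print(kl_loss(p, q))
```

In a joint (end-to-end) setup, P is usually held fixed for a number of iterations. The KL loss is then back-propagated into both the encoder parameters and the centroids, so that representation learning and clustering are optimised together rather than in two separate steps.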
Keywords
metagenomics ; binning ; deep learning ; joint clustering ; end-to-end learning ; KL divergence ; VAMB ; metagenomic binning ; DCMB ; DSMB ; DVMB