Deep Clustering for Metagenomic Binning

Translated title

Deep Clustering til Metagenomic Binning

Authors

Ha, Trong Dai ; Polásková, Jitka ; Fichte, Jan Niklas

Term

4. Term

Education

Computer Science (IT - International Track)

Publication year

2022

Submitted on

2022-06-17

Pages

117

Abstract

Deep learning is an area that is only sparsely explored for metagenomic binning. The existing deep learning-based approaches usually preprocess raw DNA sequences into input features such as composition and abundance and perform representation learning and clustering in two steps. The utilization of unprocessed DNA sequences as input shows promising results for gene prediction. Joint deep clustering leads to better results for image clustering than basic approaches like k-means clustering. In this report, we investigate the potential of joint end-to-end unsupervised learning and the utilization of unprocessed contigs as inputs for the task of metagenomic binning. We propose two new binners: Deep Convolutional Metagenomic Binner (DCMB) and Deep Stacked Metagenomic Binner (DSMB). Both binners utilize KL divergence-based joint deep clustering. The DCMB takes unprocessed contigs and the DSMB uses abundance and composition as inputs. The performance of both binners is benchmarked on the CAMI Low dataset and compared to metagenomic binners VAMB, MetaBat2, and SolidBin. The results show that metagenomic information requires preprocessing to obtain meaningful representations and that joint end-to-end learning slightly improves the number of recovered bins.
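
The abstract does not spell out the clustering objective, so the following is only a minimal sketch of one common form of KL divergence-based joint deep clustering, assuming a DEC-style formulation: soft assignments from a Student's t kernel, a sharpened target distribution, and a KL loss between the two. The function names, the toy embeddings, and the centroids are hypothetical illustrations and are not taken from the thesis or from DCMB/DSMB.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """Soft assignment q_ij of each embedding to each centroid.

    Assumption: a Student's t kernel as in DEC-style clustering;
    the thesis may use a different kernel.
    """
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalise over clusters
    return q


def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_loss(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0.0
    )


# Hypothetical 2-D embeddings (stand-ins for encoded contigs) and two centroids.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_loss(p, q)
bins = [row.index(max(row)) for row in q]  # hard bin = most likely cluster
```

In a joint setup of this kind, the loss would be backpropagated through both the centroids and the encoder that produces the embeddings, so representation learning and clustering are optimised together rather than in two separate steps.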

Keywords

metagenomics ; binning ; deep learning ; joint clustering ; end-to-end learning ; KL divergence ; VAMB ; metagenomic binning ; DCMB ; DSMB ; DVMB

A master's thesis from Aalborg University
