Deep Clustering for Metagenomic Binning

Student thesis: Master thesis (including HD thesis)

  • Trong Dai Ha
  • Jitka Polásková
  • Jan Niklas Fichte
Deep learning is an area that is only sparsely explored for metagenomic binning. The existing deep learning-based approaches usually preprocess raw DNA sequences into input features such as composition and abundance and perform representation learning and clustering in two steps. The utilization of unprocessed DNA sequences as input shows promising results for gene prediction. Joint deep clustering leads to better results for image clustering than basic approaches like k-means clustering. In this report, we investigate the potential of joint end-to-end unsupervised learning and the utilization of unprocessed contigs as inputs for the task of metagenomic binning.
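To make the two kinds of input concrete, the following is a minimal sketch of a composition feature (normalized k-mer frequencies) next to a one-hot encoding of an unprocessed contig. It is an illustration only, not the preprocessing used in the thesis: the function names `kmer_composition` and `one_hot`, the default `k = 4`, and the handling of ambiguous bases are all assumptions.

```python
from itertools import product

import numpy as np

BASES = "ACGT"


def kmer_composition(contig: str, k: int = 4) -> np.ndarray:
    """Return a normalized k-mer frequency vector with 4**k entries.

    Assumption: k-mers containing characters other than A, C, G, T are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = np.zeros(len(index), dtype=np.float64)
    seq = contig.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def one_hot(contig: str) -> np.ndarray:
    """Return a (length, 4) one-hot matrix for an unprocessed contig.

    Assumption: ambiguous bases such as N are encoded as all-zero rows.
    """
    lookup = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(contig), 4), dtype=np.float32)
    for pos, base in enumerate(contig.upper()):
        if base in lookup:
            encoded[pos, lookup[base]] = 1.0
    return encoded
```

The composition vector has a fixed size regardless of contig length, whereas the one-hot matrix grows with the contig, which is one reason a model working on unprocessed contigs needs an architecture that can handle variable-length or windowed input.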
We propose two new binners: Deep Convolutional Metagenomic Binner (DCMB) and Deep Stacked Metagenomic Binner (DSMB). Both binners utilize KL divergence-based joint deep clustering. The DCMB takes unprocessed contigs and the DSMB uses abundance and composition as inputs.
The performance of both binners is benchmarked on the CAMI Low dataset and compared to metagenomic binners VAMB, MetaBat2, and SolidBin. The results show that metagenomic information requires preprocessing to obtain meaningful representations and that joint end-to-end learning slightly improves the number of recovered bins.
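The abstract does not spell out the clustering objective, so the following sketch assumes a formulation in the style of Deep Embedded Clustering (DEC), a widely used KL divergence-based approach: embeddings are softly assigned to centroids with a Student's t kernel, a sharpened target distribution is derived from those assignments, and the loss is the KL divergence between the two. The function names and the NumPy-only setup are hypothetical; in a joint end-to-end setting the same loss would be backpropagated through the encoder in a deep learning framework.

```python
import numpy as np


def soft_assignments(z: np.ndarray, centroids: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t soft assignment q[i, j] of embedding i to centroid j."""
    # Squared Euclidean distances between all embeddings and all centroids.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened targets p[i, j]: square q and normalize by soft cluster size."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_clustering_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q) summed over all samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Usage with random stand-in embeddings (assumed shapes: 6 contigs, 2 dims, 2 bins).
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
centroids = rng.normal(size=(2, 2))
q = soft_assignments(z, centroids)
p = target_distribution(q)
loss = kl_clustering_loss(p, q)
```

Minimizing this loss pulls each embedding toward the centroid it is already most confident about, which is what allows representation learning and clustering to be optimized together rather than in two separate steps.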
Publication date: 17 Jun 2022
Number of pages: 117
ID: 473351125