Deep Clustering for Metagenomic Binning
Student thesis: Master's thesis and HD graduation project
- Trong Dai Ha
- Jitka Polásková
- Jan Niklas Fichte
4th semester, Computer Science (IT - International Track) (Master's programme)
Deep learning has been only sparsely explored for metagenomic binning. The existing deep learning-based approaches usually preprocess raw DNA sequences into input features such as composition and abundance, and perform representation learning and clustering in two separate steps. Using unprocessed DNA sequences as input has shown promising results for gene prediction. Joint deep clustering leads to better results for image clustering than basic approaches like k-means clustering. In this report, we investigate the potential of joint end-to-end unsupervised learning and the use of unprocessed contigs as inputs for the task of metagenomic binning.
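
To make the two kinds of input concrete, the following minimal sketch contrasts a one-hot encoding of an unprocessed contig with a normalized k-mer composition vector. The function names, the choice of k = 4, and the handling of ambiguous bases are illustrative assumptions, not details taken from the report. Abundance features, which depend on read-mapping coverage, are left out.

```python
from itertools import product

BASES = "ACGT"


def one_hot_encode(contig):
    """Encode a contig as one row per base; ambiguous bases (e.g. N) become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in contig.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in index:
            row[index[base]] = 1.0
        encoded.append(row)
    return encoded


def kmer_composition(contig, k=4):
    """Return normalized k-mer frequencies (4**k values) in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = contig.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # skip windows containing ambiguous bases
            counts[index[kmer]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


raw_input = one_hot_encode("ACGTNACG")         # length-dependent, unprocessed view
composition_input = kmer_composition("ACGTACGT")  # fixed-size, preprocessed view
```

The one-hot matrix grows with contig length, whereas the composition vector always has 4^k entries, which is one reason preprocessed features are convenient for fixed-size network inputs.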
We propose two new binners: Deep Convolutional Metagenomic Binner (DCMB) and Deep Stacked Metagenomic Binner (DSMB). Both binners use KL divergence-based joint deep clustering. DCMB takes unprocessed contigs as input, whereas DSMB uses abundance and composition.
The performance of both binners is benchmarked on the CAMI Low dataset and compared to the metagenomic binners VAMB, MetaBat2, and SolidBin. The results show that metagenomic information requires preprocessing to obtain meaningful representations and that joint end-to-end learning slightly improves the number of recovered bins.
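
The abstract does not say which KL divergence-based formulation the binners use. As a hedged illustration, the sketch below assumes the objective popularized by Deep Embedded Clustering (DEC): soft cluster assignments from a Student's t kernel, a sharpened target distribution, and the KL divergence between the two as the clustering loss. All function names and the toy data are hypothetical. In joint training, this loss would be backpropagated through the encoder and the centroids, which the sketch omits.

```python
import math


def soft_assignments(embeddings, centroids, alpha=1.0):
    """Student's t kernel similarity of each embedding to each centroid, normalized per row."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Square each assignment, divide by the soft cluster frequency, renormalize per row."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_clustering_loss(p, q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0.0
    )


# Toy latent vectors (assumed to come from an encoder) and two cluster centroids.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
loss = kl_clustering_loss(p, q)
```

Minimizing this loss pulls each embedding toward the centroid it is already most confident about, which is what lets representation learning and clustering be optimized together rather than in two steps.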
Language | English |
---|---|
Publication date | 17 Jun 2022 |
Number of pages | 117 |