A master's thesis from Aalborg University


Deep Clustering for Metagenomic Binning

Translated title

Deep Clustering til Metagenomic Binning

Term

4th term

Publication year

2022

Submitted on

Pages

117

Abstract

Deep learning has been explored only sparsely for metagenomic binning. Existing deep learning-based approaches usually preprocess raw DNA sequences into input features such as composition and abundance, and then perform representation learning and clustering as two separate steps. Using unprocessed DNA sequences as input has shown promising results for gene prediction, and joint deep clustering has produced better results for image clustering than basic approaches such as k-means clustering. In this report, we investigate the potential of joint end-to-end unsupervised learning, and of using unprocessed contigs as input, for the task of metagenomic binning.

We propose two new binners: the Deep Convolutional Metagenomic Binner (DCMB) and the Deep Stacked Metagenomic Binner (DSMB). Both binners use KL divergence-based joint deep clustering. The DCMB takes unprocessed contigs as input, whereas the DSMB uses abundance and composition.

We benchmark both binners on the CAMI Low dataset and compare them with the metagenomic binners VAMB, MetaBat2, and SolidBin. The results show that metagenomic information requires preprocessing to obtain meaningful representations, and that joint end-to-end learning slightly increases the number of recovered bins.
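The abstract does not spell out the clustering objective. The sketch below assumes a DEC-style formulation (Deep Embedded Clustering, Xie et al., 2016), which is a common form of KL divergence-based joint deep clustering. Its components are a Student's t soft assignment Q of embeddings to centroids, a sharpened target distribution P, and the loss KL(P || Q). The function names, the toy embeddings and `alpha = 1.0` are illustrative assumptions, and DCMB and DSMB may differ in their details. No encoder is included: `z` simply stands in for the encoder's output per contig.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assign(z: Matrix, centroids: Matrix, alpha: float = 1.0) -> Matrix:
    """Soft assignment q_ij of embedding z_i to centroid mu_j via a Student's t kernel."""
    q = []
    for zi in z:
        # Unnormalised kernel value (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha + 1) / 2) per centroid.
        row = [
            (1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu)) / alpha) ** (-(alpha + 1.0) / 2.0)
            for mu in centroids
        ]
        total = sum(row)
        q.append([v / total for v in row])  # normalise so each row sums to 1
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target p_ij = (q_ij^2 / f_j) / sum_k (q_ik^2 / f_k), with f_j = sum_i q_ij."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes f_j
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over all points and clusters; used as the clustering loss."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0.0
    )


# Toy usage (hypothetical data): four 2-D embeddings standing in for encoded contigs, two centroids.
z = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)  # held fixed as the target while the model is updated
loss = kl_divergence(p, q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]  # hard bin per embedding
print(labels, round(loss, 4))
```

In a joint, end-to-end setup, this loss would be backpropagated through the encoder and into the centroids. Representation learning and clustering would then be optimised together rather than in two separate steps. This pure-Python sketch only computes the loss and omits the encoder and the gradients.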