A master's thesis from Aalborg University

Improving Basecalling Accuracy With Transformers

Term

4th term

Publication year

2020

Submitted on

2020-06-10

Pages

15 pages

Abstract

DNA sequencing has recently improved rapidly thanks to the sequencing devices from Oxford Nanopore Technologies. These devices are fast and can read longer sequences than other sequencers, but have lower accuracy, which stems from the process of translating the measured electric signal into the corresponding DNA bases. This translation is performed by machine learning models called basecallers, which greatly affect the overall sequencing accuracy. Current basecallers process the electric signal sequentially, relying on recurrent layers and connectionist temporal classification for decoding. We propose an open-source transformer-based model, FishNChips, which eliminates the need for recurrence by relying solely on attention. We compare it to our own implementation of a recurrent model, Gravlax, and show that FishNChips outperforms both Gravlax and the current state-of-the-art basecallers.
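
To illustrate the connectionist temporal classification (CTC) decoding step mentioned in the abstract, the following is a minimal sketch of greedy CTC decoding. It is not the decoder used in the thesis or in any particular basecaller. The function name `ctc_greedy_decode`, the helper `one_hot`, and the alphabet `"-ACGT"` with `-` as the blank symbol are assumptions made for illustration only.

```python
from itertools import groupby

# Assumed alphabet for illustration: index 0 is the CTC blank symbol.
ALPHABET = "-ACGT"


def ctc_greedy_decode(frame_probs, alphabet=ALPHABET):
    """Greedy (best-path) CTC decoding.

    frame_probs: one list of probabilities per signal frame, ordered
    like `alphabet`. Returns the decoded base sequence as a string.
    """
    blank = alphabet[0]
    # 1. Pick the most probable symbol in every frame.
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    # 2. Collapse runs of identical symbols into a single symbol.
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # 3. Drop the blanks, which only separate genuine repeats.
    return "".join(symbol for symbol in collapsed if symbol != blank)


def one_hot(symbol, alphabet=ALPHABET):
    """Hypothetical helper: a frame that is certain about one symbol."""
    return [1.0 if s == symbol else 0.0 for s in alphabet]


# Frames "AA-ACC-GTT": the blank between the A runs keeps both A's.
frames = [one_hot(c) for c in "AA-ACC-GTT"]
decoded = ctc_greedy_decode(frames)
```

In this sketch the blank symbol is what allows the same base to appear twice in a row in the output, since repeated frames without a blank between them are merged into one base.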
