Improving Basecalling Accuracy With Transformers

Student thesis: Master thesis (including HD thesis)

  • Felix Gravila
  • Miroslav Pakanec
4. term, Software, Master (Master Programme)
DNA sequencing has recently undergone rapid improvements due to the Oxford Nanopore Technologies sequencing devices.
These devices are fast and can read longer sequences than other
sequencers, but have a lower accuracy due to the process of translating
the measured electric signal into the corresponding DNA bases. This
process is done using machine learning models called basecallers, which
greatly impact the overall sequencing accuracy.
Current basecallers process the electric signal sequentially, relying on
recurrent layers and connectionist temporal classification for decoding.
We propose an open-source transformer-based model, FishNChips, which
eliminates the need for recurrence by relying solely on attention. We
compare it to our own implementation of a recurrent model, Gravlax,
and show that FishNChips outperforms both Gravlax and the current
state-of-the-art basecallers.
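
As a rough illustration of the connectionist temporal classification (CTC) decoding mentioned above, the sketch below applies the standard CTC collapsing rule to a per-timestep label path: consecutive repeats are merged, then blank symbols are dropped. It is a generic example, not code from the thesis; the function name `ctc_greedy_collapse`, the blank symbol `-`, and the sample path are assumptions made for illustration only.

```python
from itertools import groupby

# Assumed blank symbol for this sketch; real basecallers typically use an integer class index.
BLANK = "-"


def ctc_greedy_collapse(path, blank=BLANK):
    """Collapse a per-timestep label path into a base sequence.

    Step 1: merge consecutive repeated labels into one.
    Step 2: drop the blank symbol.
    """
    merged = (label for label, _ in groupby(path))
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # Hypothetical best-path output of a model, one label per signal window.
    print(ctc_greedy_collapse("AA-A-CC--G-TT"))
```

The blank between the two runs of `A` is what lets the decoder keep a genuine repeated base instead of merging it away.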
Publication date: 10 Jun 2020
Number of pages: 15
External collaborator: Albertsen Lab
Mads Albertsen
Information group
ID: 333958505