Improving Basecalling Accuracy With Transformers

Student project: Master's thesis (incl. HD graduation project)

  • Felix Gravila
  • Miroslav Pakanec
4th semester, Software, Master's (Master's programme)
DNA sequencing has recently improved rapidly thanks to the sequencing devices from Oxford Nanopore Technologies.
These devices are fast and can read longer sequences than other
sequencers, but have a lower accuracy due to the process of translating
the measured electric signal into the corresponding DNA bases. This
process is done using machine learning models called basecallers, which
greatly impact the overall sequencing accuracy.
Current basecallers process the electric signal sequentially, relying on
recurrent layers and connectionist temporal classification for decoding.
We propose an open-source transformer-based model, FishNChips, which
eliminates the need for recurrence by relying solely on attention. We
compare it to our own implementation of a recurrent model, Gravlax,
and show that FishNChips outperforms both Gravlax and the current
state-of-the-art basecallers.
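
As an illustration of the connectionist temporal classification decoding step mentioned above, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from FishNChips, Gravlax, or any existing basecaller; the blank symbol, the alphabet order, and the function name `ctc_greedy_decode` are assumptions made for this example only.

```python
from itertools import groupby

# Assumed label set: index 0 is the CTC blank, followed by the four DNA bases.
BLANK = "-"
ALPHABET = [BLANK, "A", "C", "G", "T"]


def ctc_greedy_decode(frame_scores):
    """Decode per-frame scores (one list of len(ALPHABET) scores per frame)."""
    # Step 1: pick the highest-scoring label index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Step 2: collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best)]
    # Step 3: drop blanks; a blank between two equal bases keeps both of them.
    return "".join(ALPHABET[i] for i in collapsed if ALPHABET[i] != BLANK)
```

Under these assumptions, frames whose best labels are A, A, blank, A, C, C decode to the three bases A, A, C: the blank separates the two A bases, while the repeated C frames collapse into one.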
Language: English
Publication date: 10 Jun. 2020
Number of pages: 15
External collaborator: Albertsen Lab
Mads Albertsen ma@bio.aau.dk
Information group
ID: 333958505