Improving Basecalling Accuracy With Transformers
Authors
Term
4th term
Education
Publication year
2020
Submitted on
2020-06-10
Pages
15
Abstract
DNA sequencing has improved rapidly in recent years, driven by the sequencing devices from Oxford Nanopore Technologies. These devices are fast and can read longer sequences than other sequencers, but they have lower accuracy, owing to the process of translating the measured electrical signal into the corresponding DNA bases. This translation is performed by machine learning models called basecallers, which greatly affect overall sequencing accuracy. Current basecallers process the electrical signal sequentially, relying on recurrent layers and connectionist temporal classification (CTC) for decoding. We propose an open-source transformer-based model, FishNChips, which eliminates the need for recurrence by relying solely on attention. We compare it to our own implementation of a recurrent model, Gravlax, and show that FishNChips outperforms both Gravlax and the current state-of-the-art basecallers.
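To illustrate the decoding step mentioned in the abstract, here is a minimal sketch of greedy connectionist temporal classification (CTC) decoding in Python. It is not taken from the FishNChips or Gravlax code: the function name, the alphabet ordering, and the use of "-" as the blank symbol are assumptions made for illustration only. Given per-frame probabilities over a blank symbol and the four bases, it picks the most likely symbol in each frame, merges consecutive repeats, and drops the blanks.

```python
from itertools import groupby

# Assumption: "-" is the CTC blank and sits at index 0 of the alphabet.
BLANK = "-"
ALPHABET = (BLANK, "A", "C", "G", "T")


def ctc_greedy_decode(frame_probs, alphabet=ALPHABET):
    """Turn per-frame symbol probabilities into a base sequence.

    frame_probs: list of rows, one per time step, each holding one
    probability per entry of `alphabet`.
    """
    # Step 1: take the most likely symbol in every frame.
    best_path = [
        alphabet[max(range(len(row)), key=row.__getitem__)]
        for row in frame_probs
    ]
    # Step 2: merge runs of the same symbol into a single symbol.
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # Step 3: remove blanks, which only separate genuine repeats.
    return "".join(symbol for symbol in collapsed if symbol != BLANK)


# Hypothetical probabilities for eight frames (best path: A A - A C C - G).
example = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.05, 0.05],
    [0.80, 0.05, 0.05, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.05, 0.05],
    [0.90, 0.02, 0.03, 0.03, 0.02],
    [0.10, 0.00, 0.10, 0.70, 0.10],
]
decoded = ctc_greedy_decode(example)
```

The blank between the second and fourth frames is what lets the decoder emit two consecutive A bases instead of merging them into one. Practical basecallers often replace this greedy step with a beam search over the same probabilities.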
