Palamut - An Expansion of the Bonito basecaller using language models
Student project: Master's thesis (incl. HD graduation project)
- Andreas Christian Meyer Larsen
- Magnus Nørhave Hansen
- Christian Aae Knudsen
4th semester, Software, Master's (Master's programme)
In this paper we discuss methods used in modern basecallers and the end-to-end ASR architecture adopted by the Bonito basecaller to increase accuracy. We investigate the prospect of increasing accuracy by applying common ASR approaches to basecalling.
We expand the architecture of the Bonito nanopore basecaller by introducing a decoder algorithm that incorporates language-model probabilities, aiming to increase the accuracy of basecalls. We train and compare $n$-gram and RNN character-level language models.
Our results show that while the introduction of language models slightly increases the consensus accuracy of basecalls, our current language models decrease read accuracy by an equal margin. We conclude that the decrease in accuracy is caused by poorly optimized decoder hyperparameters, and we present potential solutions to the problem.
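The core idea of combining decoder and language-model probabilities can be sketched as shallow fusion: the CTC (acoustic) log-probability of a candidate sequence is added to a weighted LM log-probability plus a length bonus. This is a minimal illustrative sketch, not the authors' implementation; the function and weights `alpha`/`beta` are hypothetical stand-ins for the decoder hyperparameters the abstract refers to.

```python
def combined_score(log_p_ctc, log_p_lm, seq_len, alpha=0.5, beta=0.1):
    # Shallow-fusion scoring: basecaller (CTC) log-probability plus an
    # LM log-probability weighted by alpha, plus a length bonus weighted
    # by beta. alpha and beta are the decoder hyperparameters that would
    # need tuning; poor choices can hurt read accuracy.
    return log_p_ctc + alpha * log_p_lm + beta * seq_len

# Toy example: two candidate basecalls with equal CTC scores;
# the language model prefers the first, so it wins after fusion.
cands = {"ACGT": (-2.0, -1.0), "ACGG": (-2.0, -3.0)}
best = max(cands, key=lambda s: combined_score(*cands[s], len(s)))
```

In a real beam-search decoder this score would be evaluated incrementally as each base is appended to a beam hypothesis, rather than once per finished sequence.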
Language | English |
---|---|
Publication date | 11 Jun 2020 |
Number of pages | 17 |