Palamut - An Expansion of the Bonito basecaller using language models
Authors
Larsen, Andreas Christian Meyer ; Hansen, Magnus Nørhave ; Knudsen, Christian Aae
Term
4th term
Education
Publication year
2020
Submitted on
2020-06-11
Pages
17
Abstract
This thesis explores how techniques from Automatic Speech Recognition (ASR) can improve nanopore basecalling, the step that turns raw sensor signals into nucleotide letters. We focus on Bonito, a modern end-to-end basecaller, and extend its architecture with a decoder that uses language model probabilities to refine basecalls. We train and compare two character-level language models: an n-gram model, which captures short patterns, and a recurrent neural network (RNN) model, which can learn longer-range dependencies. Our results show a small increase in consensus accuracy (the accuracy after combining multiple reads), accompanied by a corresponding decrease in single-read accuracy. We attribute this drop to suboptimally tuned decoder hyperparameters rather than to the language models themselves, and we outline potential adjustments to address the issue.
[This abstract was generated with the help of AI]
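To make the idea of refining basecalls with language model probabilities concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes a character-level n-gram model with add-one smoothing over the alphabet ACGT, and it rescores a short list of candidate sequences instead of running a full beam-search decoder. The names `CharNGram`, `fused_score`, `rescore` and the weight `alpha` are hypothetical and do not come from Bonito or from the thesis; `alpha` merely illustrates the kind of decoder hyperparameter that may need tuning.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed nucleotide alphabet


class CharNGram:
    """Character-level n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, order=3):
        self.order = order
        # counts[context][base] = how often `base` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                # Context is up to (order - 1) preceding characters.
                context = seq[max(0, i - self.order + 1):i]
                self.counts[context][base] += 1

    def log_prob(self, context, base):
        # Truncate the history to the model's context length.
        context = context[-(self.order - 1):] if self.order > 1 else ""
        seen = self.counts.get(context, {})
        total = sum(seen.values())
        return math.log((seen.get(base, 0) + 1) / (total + len(ALPHABET)))


def fused_score(candidate, signal_log_prob, lm, alpha=0.5):
    """Combine the basecaller's log-probability with a weighted LM log-probability."""
    lm_log_prob = sum(
        lm.log_prob(candidate[:i], base) for i, base in enumerate(candidate)
    )
    return signal_log_prob + alpha * lm_log_prob


def rescore(candidates, lm, alpha=0.5):
    """Pick the best sequence from (sequence, signal_log_prob) pairs."""
    return max(candidates, key=lambda c: fused_score(c[0], c[1], lm, alpha))[0]


# Toy usage: the second candidate scores slightly better on signal alone,
# but contains a pattern the language model has never seen.
lm = CharNGram(order=3)
lm.train(["ACGTACGTACGT"] * 5)
candidates = [("ACGTACGT", -1.0), ("ACGTAGGT", -0.9)]
best_without_lm = rescore(candidates, lm, alpha=0.0)
best_with_lm = rescore(candidates, lm, alpha=0.5)
```

With `alpha=0.0` the language model is ignored and the choice rests on the signal score alone; raising `alpha` lets the language model override a small signal advantage. In this toy setup, a weight that is too high or too low would change which candidate wins, which is one way poorly tuned decoder settings could affect accuracy.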