Improving Basecalling Accuracy With Transformers
Authors
Gravila, Felix ; Pakanec, Miroslav
Term
4th term
Education
Publication year
2020
Submitted on
2020-06-10
Pages
15
Abstract
DNA sequencing has become faster thanks to Oxford Nanopore Technologies devices, which can read very long stretches of DNA. Accuracy is limited, however, because the device measures an electrical signal that must be translated into DNA letters (A, C, G, T). This translation, called basecalling, is performed by machine learning models and strongly affects overall accuracy. Most current basecallers process the signal step by step using recurrent neural networks and a specific decoding method. We present FishNChips, an open-source basecaller built on transformer models that rely only on attention. Instead of processing the signal sequentially, FishNChips can attend to many parts of the signal at once to find patterns, which can improve how well it assigns letters. We compare FishNChips with Gravlax, a recurrent model we implemented ourselves, and with leading existing basecallers. In our evaluations, FishNChips achieves higher accuracy than Gravlax and outperforms the current state-of-the-art basecallers.
[This abstract was generated with the help of AI]
