A master's thesis from Aalborg University


Refining Homopolymer Predictions in Nanopore Sequences

Authors

; ;

Term

4th term

Education

Publication year

2020

Submitted on

Abstract

This thesis addresses a persistent problem in basecalling Oxford Nanopore sequences: reliably reading homopolymers (long runs of the same base), where the electrical current changes minimally and the translocation speed can vary. The authors show that the signal lengths of homopolymers of different lengths overlap substantially, which explains why these regions have been difficult to call. They propose a software pipeline that supports an existing basecaller in two steps: a U-Net-based model first segments the nanopore signal into homopolymer and non-homopolymer regions, and a ResNet-based model then estimates the length of each homopolymer. Combined with ONT's Bonito model, this approach improves the identity rate by 2 percentage points (from 81% to 83%). The work is motivated by the fact that homopolymer inaccuracies can cause systematic errors in consensus sequences and affect downstream biological analyses, and it shows that targeted processing of homopolymers can yield a measurable improvement without changes to the nanopore itself.

[This summary has been generated with the help of AI directly from the project (PDF)]
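
The two-stage idea in the abstract (segment first, then estimate length per region) implies a hand-off step between the two models: turning a per-sample segmentation into discrete signal regions. The following is a minimal sketch of that hand-off only, not the thesis's implementation. It assumes the segmentation stage emits one 0/1 label per signal sample (1 meaning homopolymer); the function names `mask_to_regions` and `split_signal` and the `min_len` parameter are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def mask_to_regions(mask: Sequence[int], min_len: int = 1) -> List[Tuple[int, int]]:
    """Convert a per-sample 0/1 mask into half-open (start, end) index ranges.

    Assumption: 1 marks samples a segmentation model labeled as homopolymer.
    Runs shorter than `min_len` samples are discarded as likely noise.
    """
    regions: List[Tuple[int, int]] = []
    start = None  # index where the current run of 1s began, if any
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the end of the mask.
    if start is not None and len(mask) - start >= min_len:
        regions.append((start, len(mask)))
    return regions


def split_signal(signal: Sequence[float], mask: Sequence[int], min_len: int = 1) -> List[Sequence[float]]:
    """Return the signal slices for each labeled region.

    Each slice is what a separate length-estimation model would receive.
    """
    if len(signal) != len(mask):
        raise ValueError("signal and mask must have the same length")
    return [signal[s:e] for s, e in mask_to_regions(mask, min_len)]
```

For example, `mask_to_regions([0, 1, 1, 1, 0, 0, 1, 1])` yields two regions covering samples 1 to 3 and 6 to 7. Half-open ranges are used so that each region's length in samples is simply `end - start`, which matters here because the abstract identifies signal length as the ambiguous quantity.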