Effects of Hyperparameter Tuning and Knowledge Distillation on the State-of-the-Art Basecaller Bonito
Authors
Frausing, Jonatan Groth; Bargsteen, Kasper Dissing
Term
4th term
Education
Publication year
2020
Submitted on
2020-06-11
Pages
57
Abstract
Basecalling is the step in which machine learning models convert raw signals into sequences of bases. Like many such tasks, it faces a trade-off between how fast predictions can be made and how accurate they are. Bonito, a basecaller built on the QuartzNet architecture, achieves accuracy comparable to the state-of-the-art Guppy basecaller. Because Bonito uses a convolutional neural network (which processes inputs in parallel) rather than a recurrent one (which processes data step by step), it has the potential to run faster. This thesis examines how tuning Bonito's hyperparameters can increase prediction speed without harming accuracy. To counter the typical loss of accuracy seen in smaller networks, we also apply knowledge distillation, in which a smaller student model learns from a larger teacher. Our experiments indicate that using dilation together with smaller convolutional kernels improves both speed and accuracy in Bonito. We further find that knowledge distillation increases basecalling accuracy, with the largest improvements in larger basecallers. Overall, the results suggest that knowledge distillation is beneficial across model sizes and should be applied regardless of basecaller size.
[This abstract was generated with the help of AI]
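To make the two techniques named in the abstract concrete, the following is a minimal PyTorch sketch; it is not code from the thesis. The layer sizes, the temperature T, and the weighting alpha are assumed values chosen for illustration, and the hard-label term uses ordinary cross-entropy rather than the CTC objective a basecaller such as Bonito is trained with.

import torch.nn as nn
import torch.nn.functional as F

# Dilation with a smaller kernel: a dilated 1-D convolution covers the same
# receptive field as a larger plain kernel while using fewer weights.
# Receptive field = dilation * (kernel_size - 1) + 1, so kernel size 9 with
# dilation 4 spans 33 samples, like an undilated kernel of size 33.
dilated_conv = nn.Conv1d(in_channels=256, out_channels=256,
                         kernel_size=9, dilation=4, padding=16)

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Knowledge distillation: blend the usual hard-label loss with a soft
    # loss that pulls the student towards the teacher's temperature-softened
    # output distribution.
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the soft term's gradient scale matches the hard term
    return alpha * hard + (1.0 - alpha) * soft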
Keywords
Documents
