Bias-Free Convolutional Neural Networks for Speech Enhancement
Authors
Thomsen, Jakob Krarup ; Harris, James Peter
Term
4th term
Education
Publication year
2021
Submitted on
2021-06-03
Pages
51
Abstract
Removing background noise from audio is a core task in signal processing, especially for making speech easier to understand. In recent years, deep learning models have surpassed earlier speech enhancement methods in both objective tests and listening evaluations. A persistent challenge has been reduced performance when models encounter noise levels they were not trained on. In image processing, it has been suggested that bias-free models (networks without the additive bias parameter in their layers) may generalize better across noise levels. This thesis examines whether that idea helps in speech enhancement. Four types of convolutional neural networks (CNNs) were tested in both bias-free and conventional configurations and evaluated at known and unknown signal-to-noise ratios (SNR). Overall, bias-free networks did not show a significant generalization advantage over regular networks. An exception was the UNet architecture, which performed significantly better in a bias-free setup within known SNR ranges and slightly better outside them. A conventionally configured denoising CNN achieved the best overall performance.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
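To illustrate what "bias-free" means in the abstract, the following is a minimal sketch and not the thesis's actual architecture or code. It assumes a toy 1-D network of convolutions and ReLU activations; the names `conv1d` and `tiny_cnn`, the kernel sizes, and the bias values are hypothetical. Without additive bias terms, such a network is scale-homogeneous: multiplying the input by a non-negative factor multiplies the output by the same factor. That property is one plausible reason to expect better behavior across signal levels.

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """'Same'-length 1-D convolution plus an optional additive bias."""
    return np.convolve(x, kernel, mode="same") + bias

def tiny_cnn(x, kernels, biases=None):
    """Stack of conv + ReLU layers with a linear final layer.

    Passing biases=None gives the bias-free variant (no additive term).
    """
    if biases is None:
        biases = [0.0] * len(kernels)
    h = x
    for i, (k, b) in enumerate(zip(kernels, biases)):
        h = conv1d(h, k, b)
        if i < len(kernels) - 1:
            h = np.maximum(h, 0.0)  # ReLU on all but the last layer
    return h

# Hypothetical toy setup: random kernels, arbitrary biases, random input.
rng = np.random.default_rng(0)
kernels = [rng.standard_normal(5) for _ in range(3)]
biases = [0.5, -0.3, 0.2]
x = rng.standard_normal(64)
alpha = 10.0  # input scaling factor

# Compare f(alpha * x) with alpha * f(x) for both variants.
bias_free_gap = np.max(np.abs(
    tiny_cnn(alpha * x, kernels) - alpha * tiny_cnn(x, kernels)))
biased_gap = np.max(np.abs(
    tiny_cnn(alpha * x, kernels, biases) - alpha * tiny_cnn(x, kernels, biases)))

print("bias-free gap:", bias_free_gap)  # rounding error only
print("biased gap:   ", biased_gap)     # clearly nonzero
```

In the bias-free variant the gap is only floating-point rounding error, whereas the conventional variant does not scale with its input. The sketch demonstrates the property only; it says nothing about enhancement quality, which the thesis evaluates empirically.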
