A master's thesis from Aalborg University

Wind noise reduction in speech signals using non-negative matrix factorisation

Author

Term

4th term

Publication year

2018

Submitted on

Pages

105

Abstract

The goal was to implement non-negative matrix factorization (NMF) and assess whether it can extract speech from recordings contaminated by non-stationary noise, specifically wind noise, whose characteristics change over time. NMF was compared with an established method, non-negative sparse coding (NNSC), with the unprocessed signals, and with two noise-reduction methods designed for stationary noise (noise that stays roughly constant). Two dictionaries of sound patterns were trained: one for speech and one for wind. The methods were tested under varying conditions: the number of speech and wind components, the initial signal-to-noise ratio (SNR), and two values of the beta-divergence (the cost function that measures the mismatch between the signal and its factorization). Performance was evaluated with PESQ (speech quality) and STOI (intelligibility); output SNR (SNRout) was additionally measured for NMF and NNSC. The results show that NMF generally failed to separate speech and wind satisfactorily: overall it scored lower than both the unprocessed signals and the two stationary-noise methods, and it mostly performed on par with NNSC. Since NNSC has previously been reported to give good results, this may indicate that the number of training signals for the speech and wind dictionaries was insufficient to generalize to new, unseen signals. Considerable distortion was also observed in the processed audio, which may indicate that the dictionaries at times drew on elements from the wrong source.

[This abstract was generated with the help of AI]
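
The dictionary-based separation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names (`nmf`, `separate`), the multiplicative-update rule, the choice of beta = 1, the component counts, and the soft-mask reconstruction are all assumptions made for the example, and random matrices stand in for real magnitude spectrograms of speech and wind.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf(V, n_components, beta=1.0, n_iter=200, W_fixed=None, seed=0):
    """Approximate V ~= W @ H with multiplicative updates under a beta-divergence.

    If W_fixed is given, the dictionary is kept as-is and only the
    activations H are estimated (n_components is then ignored).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    if W_fixed is None:
        W = rng.random((n_freq, n_components)) + EPS
    else:
        W = W_fixed
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V * WH ** (beta - 2))) / (W.T @ (WH ** (beta - 1)) + EPS)
        if W_fixed is None:
            WH = W @ H + EPS
            W *= ((V * WH ** (beta - 2)) @ H.T) / ((WH ** (beta - 1)) @ H.T + EPS)
    return W, H


def separate(V_mix, W_speech, W_wind, beta=1.0, n_iter=200):
    """Estimate the speech magnitude spectrogram from a noisy mixture.

    The two pre-trained dictionaries are concatenated and held fixed;
    a soft mask (an assumption here) then splits the mixture.
    """
    W = np.hstack([W_speech, W_wind])
    _, H = nmf(V_mix, W.shape[1], beta=beta, n_iter=n_iter, W_fixed=W)
    k = W_speech.shape[1]
    speech_part = W_speech @ H[:k]
    wind_part = W_wind @ H[k:]
    mask = speech_part / (speech_part + wind_part + EPS)
    return mask * V_mix


# Synthetic stand-ins for magnitude spectrograms (frequency bins x frames).
rng = np.random.default_rng(1)
V_speech_train = rng.random((64, 200))
V_wind_train = rng.random((64, 200))

# Train one dictionary per source, then separate an unseen mixture.
W_speech, _ = nmf(V_speech_train, n_components=8)
W_wind, _ = nmf(V_wind_train, n_components=4)
V_mix = rng.random((64, 50))
V_speech_est = separate(V_mix, W_speech, W_wind)
print(V_speech_est.shape)  # (64, 50)
```

In a sketch like this, the risk noted in the abstract is visible in the structure: if the speech and wind dictionaries overlap, activations in `H` can assign wind energy to speech components (or the reverse), and the mask then passes or removes the wrong content.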