A master's thesis from Aalborg University


A Hybrid Approach for Speech Enhancement with DNN Supported Acoustic Beamforming

Author

Term

4th term

Publication year

2018

Submitted on

Pages

80

Abstract

Many modern hearing aids include multiple microphones. Using them together makes it possible to focus on a talker and suppress noise (beamforming), which can improve understanding in busy places such as cocktail parties or restaurants. However, when the signal-to-noise ratio (SNR) is low, classic model-based beamformers can struggle because they depend on parameters that are hard to estimate reliably.

This thesis investigates whether deep neural networks (DNNs) can support acoustic beamformers by estimating the spatial information they need. In particular, the DNN estimates the direction of arrival (DOA) of the target speech and the relative transfer function (RTF), which describes how sound from a given direction is received across the microphones. We propose three DNN-supported beamformers:

- An MPDR (minimum power distortionless response) beamformer aided by a DNN that estimates DOA.
- An MPDR beamformer aided by a DNN that estimates RTF vectors.
- A Bayesian beamformer in which the DNN estimates posterior probabilities.

In experiments with isotropic babble noise (noise arriving from all directions, as in a crowded room), the DNN-supported beamformers outperformed a model-based Bayesian beamformer on standard objective measures: ESTOI (intelligibility), PESQ (perceived quality), and segmental SNR.
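To make the role of the DOA and RTF estimates concrete, the following is a minimal sketch of how an MPDR beamformer could use them at a single frequency. It is not the thesis's implementation: the two-microphone geometry, the free-field far-field model that maps a DOA to an RTF vector, the sign convention, the example covariance matrix, and all function names are assumptions made for illustration. In the thesis the DOA or RTF would come from a DNN; here it is simply given. The weights follow the standard MPDR form w = R^{-1} d / (d^H R^{-1} d), where R is the covariance matrix of the noisy microphone signals and d is the RTF vector.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def rtf_from_doa(doa_deg, mic_positions_m, freq_hz):
    """RTF vector relative to the first microphone for a plane wave.

    Assumes a free-field, far-field model with microphones on one axis and
    the DOA measured from that axis (hypothetical geometry).
    """
    theta = math.radians(doa_deg)
    ref = mic_positions_m[0]
    rtf = []
    for x in mic_positions_m:
        # A microphone nearer the source hears the wavefront earlier,
        # i.e. it has a negative delay relative to the reference.
        delay = -(x - ref) * math.cos(theta) / SPEED_OF_SOUND
        rtf.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return rtf


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def mpdr_weights(R, d):
    """MPDR weights w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = solve(R, d)
    denom = sum(di.conjugate() * yi for di, yi in zip(d, r_inv_d))
    return [yi / denom for yi in r_inv_d]


def beamformer_response(w, d):
    """Response w^H d of the beamformer to a source with RTF vector d."""
    return sum(wi.conjugate() * di for wi, di in zip(w, d))


# Illustrative scenario (all values assumed): two microphones 1 cm apart,
# target at 0 degrees, a stronger interferer at 90 degrees, 1 kHz.
mics = [0.0, 0.01]
d_target = rtf_from_doa(0.0, mics, 1000.0)
d_interf = rtf_from_doa(90.0, mics, 1000.0)

# Noisy-signal covariance: target + interferer + small sensor noise.
n = len(mics)
R = [[1.0 * d_target[i] * d_target[j].conjugate()
      + 10.0 * d_interf[i] * d_interf[j].conjugate()
      + (0.01 if i == j else 0.0)
      for j in range(n)] for i in range(n)]

w = mpdr_weights(R, d_target)
print(abs(beamformer_response(w, d_target)))  # close to 1 (distortionless)
print(abs(beamformer_response(w, d_interf)))  # attenuated interferer
```

The sketch also hints at why estimation quality matters: the distortionless constraint only protects the target if d matches the true RTF. When d is wrong, MPDR can treat the target itself as noise to be minimized, which is one reason reliable DOA or RTF estimates are important at low SNR.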

[This abstract was generated with the help of AI]