Author(s)
Term
4. term
Education
Publication year
2013
Submitted on
2013-06-05
Pages
69 pages
Abstract
This project investigates how a microphone array can be used to suppress reverberation and noise so that automatic speech recognition systems achieve a lower recognition error rate when the distance between speaker and microphone is relatively large. The fundamental theory of array signal processing is presented briefly, along with a derivation of the classical Generalised Sidelobe-Canceller beamforming algorithm, which uses the MSE as its optimisation criterion. This algorithm is extended so that the adaptive filter is updated block-wise instead of sample-wise and is estimated using a kurtosis criterion, which seeks to maximise the kurtosis of the output. Histograms of clean speech and reverberant speech are plotted, confirming that clean speech is more super-Gaussian and has a higher kurtosis than reverberant speech. A simple cosine-modulated filter bank and Zelinski postfiltering are implemented and verified through tests to further extend the system. The fundamental theory of HMM speech recognition is presented together with two popular adaptation methods, VTLN and MLLR, which adapt the existing model to the speaker and the acoustic environment. The beamforming algorithm is benchmarked against the classical and well-known delay-sum beamformer, both with and without Zelinski postfiltering. The benchmarks use two types of data set, each consisting of 610 phonemes: one with reverberation generated synthetically in MATLAB, and one recorded from a real speaker in a classroom and an auditorium. The speech recognition software Kaldi is used to generate the recognition error rates. The results show that the delay-sum beamformer without postfiltering performs better than the maximum kurtosis Generalised Sidelobe-Canceller in all cases. The reasons for this are discussed at the end.
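The abstract's observation that clean speech has a higher kurtosis than reverberant speech can be illustrated with a small sketch. This is not the project's code or data: it assumes Laplacian noise as a stand-in for a super-Gaussian speech signal and a hypothetical exponentially decaying 50-tap filter as a stand-in for a room impulse response. The function names `excess_kurtosis` and `fir_filter` are chosen here for illustration only.

```python
import random


def excess_kurtosis(x):
    """Sample excess kurtosis: E[(x - mean)^4] / var^2 - 3 (0 for a Gaussian)."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    var = sum(v * v for v in centred) / n
    fourth = sum(v ** 4 for v in centred) / n
    return fourth / (var * var) - 3.0


def fir_filter(x, h):
    """Causal FIR convolution of x with taps h, truncated to len(x)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


rng = random.Random(0)

# Assumption: the difference of two exponential variates is Laplacian,
# used here as a crude super-Gaussian proxy for clean speech.
clean = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]

# Assumption: a toy "room" with an exponentially decaying impulse response.
room = [0.9 ** k for k in range(50)]
reverberant = fir_filter(clean, room)

k_clean = excess_kurtosis(clean)
k_reverb = excess_kurtosis(reverberant)
print("excess kurtosis, clean proxy:      ", round(k_clean, 2))
print("excess kurtosis, reverberant proxy:", round(k_reverb, 2))
```

Because the filter sums many independent samples, the filtered signal moves closer to a Gaussian and its kurtosis drops, which is the intuition behind adapting a beamformer to maximise output kurtosis.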
Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University. Here, you can find and download publicly available bachelor's theses and master's projects from across the university dating from 2008 onwards. Student projects from before 2008 are available in printed form at Aalborg University Library.