Distant Speech Recognition

Student thesis: Master thesis (including HD thesis)

  • Nicolai Bæk Thomsen
4. term, Signal Processing and Computing, Master (Master Programme)
This project investigates the use of a microphone array to suppress reverberation and noise, so that the recognition error rate of speech recognition systems is reduced when the distance between speaker and microphone is relatively large. The general theory of array processing is presented along with the classical Generalised Sidelobe Canceller beamforming algorithm, which uses the MSE as its optimization criterion. This algorithm is extended to adapt the filters block-wise instead of sample-wise, and further to adapt them using a kurtosis criterion, where the aim is to maximise the kurtosis of the output. Histograms of reverberant speech and clean speech are plotted to confirm that clean speech has a higher kurtosis and is more super-Gaussian than reverberant speech. A simple cosine-modulated filter bank and Zelinski postfiltering are implemented and verified to further extend the system. The fundamental theory of HMM speech recognition is stated along with two popular adaptation methods, VTLN and MLLR. The beamforming algorithm is benchmarked against the classical and well-known delay-sum beamformer, both with and without Zelinski postfiltering. The benchmarks were done using two data sets, each consisting of 610 phonemes; one has synthetically generated reverberation, and the other was collected from a real speaker recorded in a classroom and an auditorium. The speech recognition software Kaldi is used to generate recognition error rates. The results show that the delay-sum beamformer without postfiltering performs better than the maximum kurtosis Generalised Sidelobe Canceller in all cases. The reasons for this are discussed at the end.
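The kurtosis argument in the abstract can be illustrated with a small numerical sketch. This is not the thesis code or its data: it assumes a Laplacian-distributed signal as a stand-in for clean speech and a hypothetical room impulse response built from exponentially decaying noise. The names `excess_kurtosis`, `room_ir`, and all parameter values are illustrative assumptions.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a Gaussian, positive for super-Gaussian data."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0

rng = np.random.default_rng(0)

# Assumption: Laplacian samples as a crude super-Gaussian stand-in for clean speech.
clean = rng.laplace(size=50_000)

# Assumption: a synthetic room impulse response (exponentially decaying noise).
n_taps = 2000
decay = np.exp(-np.arange(n_taps) / 400.0)
room_ir = rng.standard_normal(n_taps) * decay

# Reverberation modelled as convolution of the source with the impulse response.
reverberant = np.convolve(clean, room_ir, mode="full")[: clean.size]

k_clean = excess_kurtosis(clean)
k_reverb = excess_kurtosis(reverberant)
print(f"excess kurtosis, clean:       {k_clean:.2f}")
print(f"excess kurtosis, reverberant: {k_reverb:.2f}")
```

Because each reverberant sample is a weighted sum of many independent source samples, its distribution moves toward a Gaussian and its kurtosis drops, which is the property a maximum-kurtosis criterion tries to exploit.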

Publication date: 6 Jun 2013
Number of pages: 69
ID: 77272921