Speech Enhancement and Noise-Robust Automatic Speech Recognition: Harvesting the Best of Two Worlds
Student thesis: Master Thesis and HD Thesis
- Carina Enevold Andersen
- Dennis Alexander Lehmann Thomsen
4. term, Signal Processing and Computing, Master (Master Programme)
This project investigates whether the performance of noise reduction algorithms in speech recognition is related to their performance in speech enhancement.
General theory related to speech production and hearing is presented together with the basics of Mel-frequency cepstral coefficients (MFCCs) as a speech feature.
The fundamental theory of hidden Markov model speech recognition is stated along with the standard feature-extraction method, the European Telecommunications Standards Institute (ETSI) advanced front-end (AFE).
The performance of the ETSI AFE algorithm and of state-of-the-art speech enhancement algorithms is investigated in both fields using speech data from the Aurora-2 database.
The aggressiveness of the noise reduction applied has been identified as a major difference between the algorithms from the two fields, and has been adjusted to increase performance in the rivalling field.
Using a logistic model, estimators of the recognition performance of the ETSI AFE are created from distortion measures of speech quality and intelligibility.
The most accurate estimator of the recognition performance of the ETSI AFE proved to be the one designed for the short-time objective intelligibility measure, using a recogniser trained with clean and noisy speech data.
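The idea of estimating recognition performance from a distortion measure with a logistic model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-parameter logistic form, the least-squares fit on the logit, the function names `logistic` and `fit_logistic`, and the synthetic data points. None of it is taken from the thesis, whose exact model and fitting procedure are not given in this abstract.

```python
import math


def logistic(d, slope, offset):
    """Map a distortion-measure score d to a predicted accuracy in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(slope * d + offset)))


def fit_logistic(scores, accuracies, eps=1e-6):
    """Fit slope and offset by ordinary least squares on the logit of the accuracies.

    This is a simple closed-form fit chosen for illustration; it is not
    necessarily the procedure used in the thesis.
    """
    # Clip accuracies away from 0 and 1 so the logit stays finite.
    logits = []
    for p in accuracies:
        p = min(max(p, eps), 1.0 - eps)
        logits.append(math.log(p / (1.0 - p)))

    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logits))

    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


# Synthetic, noise-free data for illustration only (not results from the thesis).
# The scores stand in for an intelligibility measure taking values between 0 and 1.
TRUE_SLOPE, TRUE_OFFSET = 8.0, -4.0
scores = [0.1 * k for k in range(1, 10)]
accuracies = [logistic(d, TRUE_SLOPE, TRUE_OFFSET) for d in scores]

slope, offset = fit_logistic(scores, accuracies)
predicted = logistic(0.75, slope, offset)  # estimated accuracy at a new score
```

Because the synthetic data lie exactly on a logistic curve, the fit recovers the generating parameters; with real recognition results the accuracies would scatter around the curve, and the quality of the estimator would be judged by how small that scatter is.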
Language | English |
---|---|
Publication date | 3 Jun 2015 |
Number of pages | 144 |
External collaborator | Oticon Danmark AS Jesper Jensen jsj@oticon.dk Other |