Single-Channel BLSTM Enhancement for Language Identification

Student thesis: Master thesis (including HD thesis)

  • Peter Sibbern Frederiksen
4th semester, Mathematical Engineering, Master (Master Programme)
This project applies deep neural network (DNN)-based single-channel speech enhancement (SE) to language identification.
The 2017 language recognition evaluation (LRE17) introduced noisy audio from videos, in addition to the telephone conversations from past challenges.
As a result, models trained on telephone speech had to be adapted to noisy speech from the video domain to obtain optimal performance.
Such adaptation requires knowledge of the audio domain.
Instead, we propose to use speech enhancement preprocessing to clean up the noisy audio.
We used a BLSTM DNN model to predict a spectral mask.
Multiplying the noisy spectrogram by the mask yields the enhanced spectrogram, which is transformed back into the time domain using the phase of the noisy speech.
The experiments show a significant improvement in language identification of noisy speech, for systems with and without domain adaptation, while preserving performance in the telephone audio domain. In the best adapted state-of-the-art bottleneck i-vector system, the relative improvement is 11.3% for noisy speech.
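The masking and resynthesis step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes non-overlapping rectangular frames and a naive DFT in place of a windowed STFT, and the mask is supplied by the caller as a stand-in for the BLSTM prediction. The names dft, idft and enhance are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (list of floats)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; only the real part of each sample is returned."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def enhance(noisy, mask, frame_len):
    """Scale each magnitude by the mask and resynthesize with the noisy phase.

    mask[i][k] is the gain for frame i and frequency bin k. Here it is a
    placeholder for the output of a trained mask-prediction model.
    """
    enhanced_signal = []
    starts = range(0, len(noisy) - frame_len + 1, frame_len)
    for i, start in enumerate(starts):
        spectrum = dft(noisy[start:start + frame_len])
        enhanced_spectrum = []
        for k, coeff in enumerate(spectrum):
            magnitude = abs(coeff) * mask[i][k]  # masked magnitude
            phase = cmath.phase(coeff)           # phase of the noisy speech
            enhanced_spectrum.append(cmath.rect(magnitude, phase))
        enhanced_signal.extend(idft(enhanced_spectrum))
    return enhanced_signal
```

With an all-ones mask the signal is reconstructed unchanged, and with an all-zeros mask the output is silence. A real mask would fall between these extremes, attenuating the time-frequency bins dominated by noise.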
Publication date: 7 Jun 2018
Number of pages: 54
ID: 280550788