A master thesis from Aalborg University

Single-Channel BLSTM Enhancement for Language Identification

Term: 4th semester
Publication year: 2018
Submitted on: 2018-06-07
Pages: 54

Abstract

This project applies deep neural network (DNN)-based single-channel speech enhancement (SE) to language identification. The 2017 Language Recognition Evaluation (LRE17) introduced noisy audio from videos in addition to the telephone conversations used in past evaluations. Consequently, models had to be adapted from telephone speech to noisy speech from the video domain to obtain optimal performance. Such adaptation, however, requires knowledge of the target audio domain. Instead, we propose speech enhancement as a preprocessing step that cleans up the noisy audio. A bidirectional long short-term memory (BLSTM) DNN predicts a spectral mask; the noisy spectrogram is enhanced by multiplying it with the mask and is then transformed back into the time domain using the unaltered phase of the noisy speech. The experiments show a significant improvement in language identification on noisy speech, for systems both with and without domain adaptation, while performance in the telephone audio domain is preserved. For the best adapted state-of-the-art bottleneck i-vector system, the relative improvement on noisy speech is 11.3%.
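The masking and resynthesis step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the BLSTM is not reproduced, and an oracle ideal ratio mask computed from separately known clean and noise signals stands in for its output (only possible in simulation). The frame length (512), hop (256), square-root Hann window, 16 kHz sample rate, and the helper names `stft`, `istft` and `enhance` are all illustrative assumptions.

```python
import numpy as np

N_FFT, HOP = 512, 256  # assumed frame length and 50% hop (not from the thesis)


def _window(n_fft):
    # Square root of a periodic Hann window: applied at analysis and synthesis,
    # the product is a Hann window, which sums to 1 at 50% overlap.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])


def stft(x, n_fft=N_FFT, hop=HOP):
    """Return the complex spectrogram, shape (frames, n_fft // 2 + 1)."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, n_fft=N_FFT, hop=HOP, length=None):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(X, n=n_fft, axis=1) * _window(n_fft)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    if length is not None:
        # Zero-pad or trim so the output matches the input length.
        out = np.pad(out, (0, max(0, length - len(out))))[:length]
    return out


def enhance(noisy, mask, n_fft=N_FFT, hop=HOP):
    """Scale the noisy magnitude by the mask and resynthesize with the noisy phase."""
    Y = stft(noisy, n_fft, hop)
    enhanced_mag = mask * np.abs(Y)
    S_hat = enhanced_mag * np.exp(1j * np.angle(Y))  # phase left unaltered
    return istft(S_hat, n_fft, hop, length=len(noisy))


# Demo: a 440 Hz tone in white noise, 1 s at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(fs)
noisy = clean + noise

# Stand-in for the BLSTM output: an oracle ideal ratio mask in [0, 1].
S_mag, N_mag = np.abs(stft(clean)), np.abs(stft(noise))
mask = np.sqrt(S_mag**2 / (S_mag**2 + N_mag**2 + 1e-12))
enhanced = enhance(noisy, mask)
```

With an all-ones mask, `enhance` returns the noisy input unchanged away from the signal edges, so any change in the output comes from the mask alone. Keeping the noisy phase avoids having to estimate phase, at the cost of some residual distortion in the enhanced signal.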

