Speech Enhancement and Deep Learning Speaker Separation: Separation, Identification, and Enhancement of a Conversational Partner in a Cocktail Party Environment
Authors
Nielsen, Rasmus ; Andersen, Morten ; Busk, Jonas Kronborg
Term
4th term
Education
Publication year
2023
Submitted on
2023-06-01
Pages
79
Abstract
Many people find it hard to follow a single talker in places full of chatter and music, a challenge known as the cocktail party effect. This report explores a speech enhancement system that aims to isolate the user's conversational partner. The proposed pipeline has three steps: (1) speech separation, which splits a noisy mixture into individual voices; (2) speaker ranking, which decides which voice is most relevant; and (3) speech enhancement, which makes that voice clearer. The work builds on the newly proposed Minimum Overlap-Gap algorithm, which handles the ranking and enhancement steps by selecting and improving the target speaker. Because the separation step offers many design choices, we focus on a single-microphone approach based on deep learning. After surveying current methods, we investigate two leading architectures, Convolutional TasNet and the Dual-Path Recurrent Neural Network, as examples of deep learning models for audio separation. We train and test these models on mixtures of two, three, and four speakers and explore techniques that might improve separation performance. Several of the resulting models show promise for making the target speaker's voice more intelligible.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
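To make the three-step pipeline concrete, here is a minimal, hypothetical Python sketch of its structure. The function names (`separate`, `rank_speakers`, `enhance`) and their bodies are placeholders of our own naming, not the report's implementation: a real system would use a trained Conv-TasNet or DPRNN model for the separation step and the Minimum Overlap-Gap algorithm for ranking and enhancement, where this sketch substitutes an identity "model", an energy-based ranking, and a fixed gain so the example runs end to end.

```python
# Minimal sketch of the pipeline: separation -> speaker ranking -> enhancement.
# All three steps below are placeholders; see the lead-in text for what a
# real system would use instead.
import numpy as np

def separate(mixture: np.ndarray, n_speakers: int) -> list[np.ndarray]:
    """Placeholder for a trained separation model (e.g. Conv-TasNet or DPRNN).

    A real model maps one single-microphone mixture waveform to n_speakers
    estimated source waveforms; here we return copies so the sketch runs.
    """
    return [mixture.copy() for _ in range(n_speakers)]

def rank_speakers(sources: list[np.ndarray]) -> int:
    """Placeholder ranking: pick the loudest source by RMS energy.

    The report instead ranks candidates with the Minimum Overlap-Gap
    algorithm to identify the user's conversational partner.
    """
    energies = [np.sqrt(np.mean(s ** 2)) for s in sources]
    return int(np.argmax(energies))

def enhance(source: np.ndarray, gain_db: float = 6.0) -> np.ndarray:
    """Placeholder enhancement: apply a fixed gain to the selected source."""
    return source * 10 ** (gain_db / 20)

# Toy usage: a 1-second, 8 kHz synthetic "mixture" of two sinusoids.
sr = 8000
t = np.arange(sr) / sr
mixture = 0.6 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)

sources = separate(mixture, n_speakers=2)   # step 1: speech separation
target_idx = rank_speakers(sources)         # step 2: speaker ranking
enhanced = enhance(sources[target_idx])     # step 3: speech enhancement
print(f"selected source {target_idx}, peak amplitude {np.abs(enhanced).max():.2f}")
```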
Keywords
Neural Networks ; Speech Separation ; Speech Enhancement ; Speaker Identification ; RNN ; NN ; TasNet ; DPRNN ; AI ; Deep Learning ; Machine Learning
