Speech Enhancement and Deep Learning Speaker Separation: Separation, Identification, and Enhancement of a Conversational Partner in a Cocktail Party Environment
Student thesis: Master Thesis and HD Thesis
- Rasmus Nielsen
- Morten Andersen
- Jonas Kronborg Busk
4th term, Signal Processing and Computing, Master (Master Programme)
This report looks into a possible solution to the cocktail party effect, which depicts an environment with multiple different conversations, background music, and other sources of noise. The goal is to explore possible solutions for a speech enhancement system consisting of a speech separation, speaker ranking, and speech enhancement stage. This system would ideally be capable of isolating the user’s conversational partner.

The foundation of the solution is based on the newly proposed Minimum Overlap-Gap algorithm for speaker ranking and enhancement. However, potential speech separation stages remain largely unexplored. This report investigates a single-microphone setup using deep learning.

Different state-of-the-art network architectures are explored, and two are chosen for further investigation. These are Convolutional TasNet and Dual-Path Recurrent Neural Network. The networks are trained and tested in 2-, 3- and 4-speaker scenarios. Possible improvement techniques are also explored. Several models showed potential for enhancing the target speaker’s voice.
Language | English |
---|---|
Publication date | 1 Jun 2023 |
Number of pages | 79 |
External collaborator | Oticon Danmark AS, Poul Hoang, phoa@demant.com, Information group |