• Rasmus Nielsen
• Morten Andersen
• Jonas Kronborg Busk
This report investigates a possible solution to the cocktail party effect, which arises in an environment with multiple simultaneous conversations, background music, and other sources of noise. The goal is to explore possible solutions for a speech enhancement system consisting of a speech separation stage, a speaker ranking stage, and a speech enhancement stage. Such a system would ideally be capable of isolating the user's conversational partner.

The solution builds on the newly proposed Minimum Overlap-Gap algorithm for speaker ranking and enhancement. However, potential speech separation stages remain largely unexplored. This report investigates a single-microphone setup using deep learning.

Different state-of-the-art network architectures are explored, and two are chosen for further investigation: Convolutional TasNet and the Dual-Path Recurrent Neural Network. The networks are trained and tested in 2-, 3-, and 4-speaker scenarios. Possible improvement techniques are also explored. Several models showed potential for enhancing the target speaker's voice.
Publication date: 1 June 2023
Number of pages: 79
External collaborator: Oticon Danmark AS
Poul Hoang phoa@demant.com
ID: 532606936