Author(s)
Term
4th term
Education
Publication year
2023
Submitted on
2023-06-01
Pages
79 pages
Abstract
This report looks into a possible solution to the cocktail party effect, which depicts an environment with multiple different conversations, background music, and other sources of noise. The goal is to explore possible solutions for a speech enhancement system consisting of a speech separation, speaker ranking, and speech enhancement stage. This system would ideally be capable of isolating the user’s conversational partner.
The foundation of the solution is based on the newly proposed Minimum Overlap-Gap algorithm for speaker ranking and enhancement. However, potential speech separation stages remain largely unexplored. This report investigates a single-microphone setup using deep learning.
Different state-of-the-art network architectures are explored, and two are chosen for further investigation. These are Convolutional TasNet and Dual-Path Recurrent Neural Network. The networks are trained and tested in 2-, 3- and 4-speaker scenarios. Possible improvement techniques are also explored. Several models showed potential for enhancing the target speaker’s voice.
Keywords
Neural Networks ; Speech Separation ; Speech Enhancement ; Speaker Identification ; RNN ; NN ; TasNet ; DPRNN ; AI ; Deep Learning ; Machine Learning
Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University.