A master thesis from Aalborg University

The effect of different time-embeddings on an end-to-end diffusion speech enhancement model

Term

4th semester

Publication year

2024

Submitted on

2024-05-30

Abstract

This research proposes an end-to-end diffusion model for speech enhancement that is trained directly on raw audio waveforms. The goal is performance comparable to existing methods that operate on Short-Time Fourier Transform (STFT) representations. The model uses a U-Net architecture with a time-step embedding. This embedding builds on an existing technique but applies it in a novel way to speech enhancement within a diffusion model framework. It makes the model aware of its position within the diffusion process, which can improve performance. The results show that incorporating the time-step embedding is a key factor and significantly enhances the model’s capabilities. However, the model’s performance remains below current state-of-the-art speech enhancement methods such as SGMSE and Facebook’s Demucs.
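The abstract does not name the embedding technique it builds on. A common choice in diffusion U-Nets is the sinusoidal embedding borrowed from Transformer positional encodings, and the minimal sketch below assumes that choice. It maps a scalar diffusion step t to a vector and adds it as a per-channel bias to a feature map. The function names `sinusoidal_timestep_embedding` and `add_time_bias`, and the `max_period` parameter, are hypothetical illustrations, not the thesis implementation.

```python
import math
from typing import List


def sinusoidal_timestep_embedding(t: float, dim: int, max_period: float = 10000.0) -> List[float]:
    """Map a scalar diffusion step t to a dim-dimensional vector (assumed sinusoidal scheme).

    The first half holds sin(t * w_k) and the second half cos(t * w_k),
    with frequencies w_k = max_period ** (-k / half) for k = 0 .. half - 1.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies from 1 down towards 1 / max_period.
    freqs = [max_period ** (-k / half) for k in range(half)]
    return [math.sin(t * w) for w in freqs] + [math.cos(t * w) for w in freqs]


def add_time_bias(features: List[List[float]], emb: List[float]) -> List[List[float]]:
    """Hypothetical conditioning: add one embedding value per channel to every sample.

    A real U-Net would usually pass the embedding through a small learned
    projection first; here we assume the embedding size equals the channel count.
    """
    if len(features) != len(emb):
        raise ValueError("need one embedding value per channel")
    return [[x + e for x in channel] for channel, e in zip(features, emb)]


# Usage: embed step 0 for a 4-channel feature map and condition it.
emb0 = sinusoidal_timestep_embedding(0, 4)
features = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
conditioned = add_time_bias(features, emb0)
```

Because each step t produces a distinct vector, the network can tell how noisy its input is and adjust the amount of denoising accordingly, which is the "awareness of its position" the abstract refers to.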

