A master's thesis from Aalborg University

The effect of different time-embeddings on an end-to-end diffusion speech enhancement model

Term

4th semester

Publication year

2024

Abstract

This research proposes an end-to-end diffusion model for speech enhancement that is trained directly on raw audio waveforms, with the aim of matching the performance of existing methods that rely on Short-Time Fourier Transform (STFT) representations. The model uses a U-Net structure with a time step embedding. The embedding is an existing technique, applied here in a novel way for speech enhancement within a diffusion model framework. It makes the model aware of its position within the diffusion process, which can improve performance. The results show that incorporating the time step embedding is a key factor that significantly improves the model’s capabilities. However, the model’s performance remains below that of current state-of-the-art speech enhancement methods such as SGMSE and Facebook Demucs.
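
To illustrate what a time step embedding can look like, the sketch below implements a sinusoidal embedding, a common choice in diffusion models. The abstract does not say which embedding the thesis uses, so the technique, the function name `sinusoidal_time_embedding`, and the parameters `dim` and `max_period` are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def sinusoidal_time_embedding(t, dim, max_period=10000.0):
    """Map diffusion time steps to fixed sinusoidal vectors.

    Hypothetical helper: the thesis abstract does not name its embedding,
    so this is only one plausible variant.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    # Flatten the input time steps into a 1-D float array.
    t = np.asarray(t, dtype=np.float64).reshape(-1)
    half = dim // 2
    # Geometrically spaced frequencies, from 1 down towards 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    # One row per time step, one column per frequency.
    angles = t[:, None] * freqs[None, :]
    # First half of each vector holds sines, second half holds cosines.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Example: embed three time steps into 8-dimensional vectors.
emb = sinusoidal_time_embedding([0, 10, 500], dim=8)
```

In a U-Net, a vector like this would typically be passed through a small learned projection and added to the feature maps of each block, so that every layer can condition on the current diffusion step. That wiring is likewise an assumption here, not a description of the thesis model.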