Low Complexity Neural Networks for Speech Enhancement on Consumer Products - Low Latency and Full-Band Content

Author

Ascari, Giacomo

Term

4th term

Education

Sound and Music Computing

Publication year

2025

Submitted on

2025-05-26

Abstract

In this work, we demonstrate the feasibility of low-latency speech enhancement using Deep Neural Networks (DNNs), aimed at integration into consumer products such as loudspeakers, soundbars, and portable speakers. Such integration often requires full-band audio processing on resource-limited devices that are already computationally loaded. By combining state-of-the-art technologies, namely low-complexity Deep Noise Suppression (DNS) networks, an asymmetric STFT-iSTFT windowing scheme, and a dataset for Cinematic Audio Source Separation (CASS), we achieve real-time execution on various platforms and a low algorithmic latency of 11 ms. The presented models were designed through a process guided by objective evaluation, followed by a perceptual subjective evaluation to validate their performance. While promising and sufficient for the demonstrative nature of the work, the perceptual performance is not satisfactory for a customer-ready implementation. However, the results support the potential of our approach, narrowing the gap between research and real-world application in consumer electronics.

An executive master's programme thesis from Aalborg University
