Low Complexity Neural Networks for Speech Enhancement on Consumer Products - Low Latency and Full-Band Content
Author
Term
4. Term
Education
Publication year
2025
Submitted on
2025-05-26
Pages
9
Abstract
In this work, we demonstrate the feasibility of low-latency speech enhancement using Deep Neural Networks (DNNs), aimed at integration into consumer products such as loudspeakers, soundbars, and portable speakers. Such products often require full-band audio processing on devices that are already computationally loaded and have limited resources. By combining state-of-the-art techniques, namely low-complexity Deep Noise Suppression (DNS) networks, an asymmetric STFT-iSTFT windowing scheme, and a dataset for Cinematic Audio Source Separation (CASS), we achieve real-time execution on various platforms and a low algorithmic latency of 11 ms. The presented models were designed through a process guided by objective evaluation, followed by a perceptual subjective evaluation to validate their performance. While promising and sufficient for the demonstrative nature of the work, the perceptual performance is not yet satisfactory for a customer-ready implementation. However, the results support the potential of our approach, narrowing the gap between research and real-world application in consumer electronics.
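As a rough illustration of the two constraints named in the abstract (algorithmic latency and real-time execution), the following minimal sketch computes both for an STFT-based enhancer. It assumes, as is common for overlap-add schemes with asymmetric windows, that the algorithmic latency is set by the synthesis window length, and that real-time operation requires each frame to be processed within one hop. The function names and all numeric values (48 kHz sample rate, 512-sample synthesis window, 256-sample hop, 2 ms processing time) are hypothetical and are not taken from the thesis.

```python
def algorithmic_latency_ms(synthesis_window_samples: int, sample_rate_hz: int) -> float:
    """Latency in ms, assuming it equals the synthesis window duration."""
    return 1000.0 * synthesis_window_samples / sample_rate_hz


def real_time_factor(processing_ms_per_frame: float, hop_samples: int, sample_rate_hz: int) -> float:
    """Ratio of per-frame compute time to hop duration; below 1.0 means real-time capable."""
    hop_ms = 1000.0 * hop_samples / sample_rate_hz
    return processing_ms_per_frame / hop_ms


if __name__ == "__main__":
    # Hypothetical full-band configuration, for illustration only.
    sample_rate_hz = 48_000
    synthesis_window = 512   # samples (assumed)
    hop = 256                # samples (assumed)

    latency = algorithmic_latency_ms(synthesis_window, sample_rate_hz)
    rtf = real_time_factor(2.0, hop, sample_rate_hz)  # 2.0 ms per frame is an assumed figure

    print(f"algorithmic latency: {latency:.2f} ms")
    print(f"real-time factor: {rtf:.3f}")
```

Under these assumed values the latency comes out near the 11 ms reported above, but the actual window and hop sizes used in the work may differ.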