A master's thesis from Aalborg University


Vector quantization (VQ)-based generative DNN models for low delay speech and audio coding: A model-based approach to encoding packets for packet-loss robustness.

Author

Term

4th semester

Publication year

2024

Submitted on

Pages

57

Abstract

Audio streaming relies on compressing sound so it uses fewer bits per second (bit rate). When audio is sent over networks, the data is split into small packets that can be lost (packet loss), causing dropouts. This project explored model-based methods to make compressed audio more robust to packet loss. We built three models with different bit rates—768 kbps, 192 kbps, and 6 kbps—and trained them on the LibriTTS Corpus, where audio samples had a bit rate of 384 kbps. In tests, the largest (highest-bitrate) model showed the strongest robustness: it maintained good reconstruction across a wide range of packet loss probabilities from 20% to 80%. The main limitation appeared to be the underlying autoencoders (neural networks that compress and reconstruct audio). Future work could apply the same approach with improved autoencoder frameworks to achieve better results at lower bit rates.
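The two quantities at the heart of the abstract, the bit rate of the raw training audio and the packet loss probability, can be made concrete in a short sketch. This is a minimal illustration, not the thesis implementation: it assumes LibriTTS audio at a 24 kHz sample rate with 16-bit samples (which yields the stated 384 kbps), and models packet loss as independent Bernoulli drops, the simplest channel assumption.

```python
import numpy as np

# Uncompressed LibriTTS bit rate (assumed 24 kHz, 16-bit mono):
# 24_000 samples/s * 16 bits/sample = 384_000 b/s = 384 kbps.
SAMPLE_RATE_HZ = 24_000
BITS_PER_SAMPLE = 16
bit_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

def simulate_packet_loss(packets, loss_prob, seed=None):
    """Drop each packet independently with probability `loss_prob`.

    Lost packets are replaced by None; a loss-robust decoder must
    reconstruct the audio despite these gaps.
    """
    rng = np.random.default_rng(seed)
    return [None if rng.random() < loss_prob else p for p in packets]
```

Sweeping `loss_prob` from 0.2 to 0.8 over a stream of encoded packets reproduces the evaluation range mentioned above, with reconstruction quality measured on the decoder output at each loss level.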


[This abstract has been rewritten with the help of AI based on the project's original abstract]