Vector quantization (VQ)-based generative DNN models for low delay speech and audio coding: A model-based approach to encoding packets for packet-loss robustness.
Author
Term
4. semester
Publication year
2024
Submitted on
2024-05-31
Pages
57
Abstract
Recent developments in the state-of-the-art of audio compression have led to models achieving low bit rates while maintaining a good reconstruction of the compressed embeddings. The findings make it interesting to explore model-based techniques for making audio packages robust to packet loss, which led to the development of three models with varying bit rates in this project. The three models had bit rates of 768kbps, 192kbps and 6kbps and were trained on the LibriTTS Corpus dataset, where data samples had a bit rate of 384kbps. The largest model showed the best potential for package loss, where it had a good reconstruction ranging from 20\% to 80\% packet loss probability. The main limitation of the results seemed to be the underlying autoencoders, which opens up for future work applying the same technique for more improved frameworks at lower bitrates.
Documents
