A master's thesis from Aalborg University


Vector quantization (VQ)-based generative DNN models for low delay speech and audio coding: A model-based approach to encoding packets for packet-loss robustness.

Author

Term

4th semester

Publication year

2024

Submitted on

Pages

57

Abstract

Audio streaming relies on compressing sound so it uses fewer bits per second (bit rate). When audio is sent over networks, the data is split into small packets that can be lost (packet loss), causing dropouts. This project explored model-based methods to make compressed audio more robust to packet loss. We built three models with different bit rates—768 kbps, 192 kbps, and 6 kbps—and trained them on the LibriTTS Corpus, where audio samples had a bit rate of 384 kbps. In tests, the largest (highest-bitrate) model showed the strongest robustness: it maintained good reconstruction across a wide range of packet loss probabilities from 20% to 80%. The main limitation appeared to be the underlying autoencoders (neural networks that compress and reconstruct audio). Future work could apply the same approach with improved autoencoder frameworks to achieve better results at lower bit rates.
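The two quantities at the heart of the abstract, the bit rate of the raw training audio and the packet loss probability, can be made concrete in a short sketch. This is a minimal illustration, not the thesis implementation: it assumes LibriTTS audio at a 24 kHz sample rate with 16-bit samples (which yields the stated 384 kbps), and models packet loss as independent Bernoulli drops, the simplest channel assumption.

```python
import numpy as np

# Uncompressed LibriTTS bit rate (assumed 24 kHz, 16-bit mono):
# 24_000 samples/s * 16 bits/sample = 384_000 b/s = 384 kbps.
SAMPLE_RATE_HZ = 24_000
BITS_PER_SAMPLE = 16
bit_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

def simulate_packet_loss(packets, loss_prob, seed=None):
    """Drop each packet independently with probability `loss_prob`.

    Lost packets are replaced by None; a loss-robust decoder must
    reconstruct the audio despite these gaps.
    """
    rng = np.random.default_rng(seed)
    return [None if rng.random() < loss_prob else p for p in packets]
```

Sweeping `loss_prob` from 0.2 to 0.8 over a stream of encoded packets reproduces the evaluation range mentioned above, with reconstruction quality measured on the decoder output at each loss level.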


[This abstract has been rewritten with the help of AI based on the project's original abstract]