A master's thesis from Aalborg University

Concatenated convolution framelets in audio compression

Term

4th semester

Publication year

2018

Pages

70

Abstract

This project investigates convolution framelets, a recently introduced way to represent signals using redundant tight frames (Yin et al., 2017), for audio compression. To adapt the method to compression, we propose controlling the redundancy by concatenating several less-redundant framelets. This allows the use of multiple patch sizes (window lengths) with fewer patches of each size. Our compression pipeline, inspired by Ravelli et al. (2008), has two stages: (1) represent the audio with a sparse set of frame coefficients and (2) encode these coefficients efficiently. We obtain sparsity with Orthogonal Matching Pursuit (OMP), a greedy algorithm that selects a few important components (Foucart and Rauhut, 2013). For coding, we use bitplane run-length coding with interleaving, as in Ravelli et al. We test on a set of music excerpts and assess quality using Perceptual Evaluation of Audio Quality (PEAQ). We compare our results to an MP3 encoder and to the non-psychoacoustic results reported by Ravelli et al. At low bit rates, our method performs worse than these baselines.

[This abstract was generated with the help of AI]
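
The sparsity stage mentioned in the abstract can be illustrated with a minimal sketch of Orthogonal Matching Pursuit. This is not the thesis implementation: the function names (`dot`, `solve`, `omp`), the toy dictionary, and the stopping tolerance are assumptions made for illustration, and the atoms are assumed to be unit-norm and linearly independent on the selected support.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def omp(atoms, signal, n_nonzero, tol=1e-12):
    """Hypothetical helper: greedy sparse approximation of `signal`.

    atoms: list of unit-norm vectors (assumed); returns {atom_index: coefficient}.
    """
    residual = list(signal)
    support, coeffs = [], []
    for _ in range(n_nonzero):
        # Greedy step: pick the atom most correlated with the current residual.
        scores = [abs(dot(a, residual)) for a in atoms]
        best = max(range(len(atoms)), key=lambda j: scores[j])
        if scores[best] <= tol or best in support:
            break
        support.append(best)
        # Orthogonal step: least-squares refit of all selected coefficients
        # via the normal equations on the current support.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coeffs = solve(gram, rhs)
        approx = [sum(c * atoms[j][k] for c, j in zip(coeffs, support))
                  for k in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coeffs))


# Toy redundant dictionary in R^3: the standard basis plus one extra atom.
s = 3 ** -0.5
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, s]]
signal = [2.0, 0.0, -1.0]  # built from atoms 0 and 2
result = omp(atoms, signal, n_nonzero=2)
```

In this toy case the two selected atoms are the ones the signal was built from. In a compression setting, the number of retained coefficients (`n_nonzero` here) would be one of the parameters that trades bit rate against quality; how the thesis sets it is not stated in the abstract.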