Estimation of Source Parameters and Segmentation of Stereophonic Music Mixtures

Author

Hansen, Jacob Møller Hjerrild

Term

4th term

Education

Sound and Music Computing

Publication year

2017

Submitted on

2017-05-22

Abstract

Many stereo recordings contain several overlapping sound sources. Knowing where each source sits left-to-right (its panning) helps analysis and mixing. This thesis presents a new estimator that recovers panning and other source parameters in multi-channel audio even when source pitches and harmonic amplitudes are unknown, and without prior knowledge of how many sources are present. The method uses an unsupervised, Bayesian framework to segment the signal over time and to estimate parameters via maximum a posteriori (MAP) modeling. Specifically, we represent the distribution of panning values with a Gaussian mixture model (GMM) and estimate its parameters with a MAP variant of the iterative expectation–maximization (EM) algorithm. To prevent a single true cluster of panning values from being described by several Gaussian components, we place a sparse Dirichlet prior on the mixture weights and prune redundant components. For time segmentation, we adopt a scheme that guarantees global optimality with respect to the MAP cost function. We evaluate the estimator through simulations on synthetic signals and real audio. Across these tests, the method performs well at estimating source parameters and the number of sources in stereophonic mixtures.
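The weight-pruning idea can be illustrated with a small sketch. It assumes a symmetric Dirichlet prior with parameter α on the mixture weights and no priors on the means and variances (both assumptions; the abstract does not give the exact priors). Under that prior the MAP-EM weight update is w_k ∝ max(0, N_k + α - 1), where N_k is the expected number of panning values assigned to component k, so with α < 1 weak components reach exactly zero weight and are dropped. The function names, quantile initialisation, and defaults are hypothetical, and the input is a plain list of panning values whose computation is not specified here.

```python
import math
import random


def log_gauss(x, mu, var):
    """Log-density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def map_em_gmm(x, k_init=8, alpha=0.5, n_iter=100, var_floor=1e-4):
    """Hypothetical MAP-EM sketch for a 1-D GMM over panning values.

    Weights get a symmetric Dirichlet-type prior with parameter alpha, giving
    the update w_k proportional to max(0, N_k + alpha - 1). For alpha < 1,
    components whose expected count N_k falls to 1 - alpha or below get
    weight zero and are pruned. Means and variances use plain ML updates.
    """
    n = len(x)
    xs = sorted(x)
    # Deterministic start: means at evenly spaced quantiles, one broad shared variance.
    mus = [xs[int((j + 0.5) * n / k_init)] for j in range(k_init)]
    mean_x = sum(x) / n
    var0 = max(sum((v - mean_x) ** 2 for v in x) / n, var_floor)
    vars_ = [var0] * k_init
    ws = [1.0 / k_init] * k_init

    for _ in range(n_iter):
        k = len(ws)
        # E-step: responsibilities, normalised in the log domain for stability.
        resp = []
        for v in x:
            lp = [math.log(ws[j]) + log_gauss(v, mus[j], vars_[j]) for j in range(k)]
            top = max(lp)
            e = [math.exp(lpj - top) for lpj in lp]
            s = sum(e)
            resp.append([ej / s for ej in e])
        nk = [sum(r[j] for r in resp) for j in range(k)]

        # M-step for the weights: MAP update clipped at zero (this is the pruning).
        raw = [max(0.0, nk[j] + alpha - 1.0) for j in range(k)]
        keep = [j for j in range(k) if raw[j] > 0.0 and nk[j] > 0.0]
        if not keep:  # prior too strong for this little data: keep the largest one
            best = max(range(k), key=lambda j: nk[j])
            keep = [best]
            raw[best] = 1.0
        total = sum(raw[j] for j in keep)
        ws = [raw[j] / total for j in keep]

        # M-step for means and variances of the surviving components only.
        new_mus, new_vars = [], []
        for j in keep:
            mu = sum(r[j] * v for r, v in zip(resp, x)) / nk[j]
            var = sum(r[j] * (v - mu) ** 2 for r, v in zip(resp, x)) / nk[j]
            new_mus.append(mu)
            new_vars.append(max(var, var_floor))
        mus, vars_ = new_mus, new_vars
    return ws, mus, vars_


if __name__ == "__main__":
    # Synthetic panning values from three sources (centres are made up).
    rng = random.Random(1)
    pan = [rng.gauss(c, 0.05) for c in (-0.6, 0.0, 0.7) for _ in range(100)]
    ws, mus, _ = map_em_gmm(pan, alpha=-10.0)
    for w, m in sorted(zip(ws, mus), key=lambda t: t[1]):
        print(f"weight={w:.2f} mean={m:+.2f}")
```

With α ≤ 0 the prior is no longer a proper Dirichlet, but the same clipped update still applies and prunes more aggressively: subtracting a larger constant from each N_k amplifies differences between components that share one source, which tends to let one of them absorb the other. How many components survive depends on the data and on α.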

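The abstract states only that the time segmentation is globally optimal with respect to the MAP cost. A standard way to obtain such a guarantee, assuming the cost is a sum of independent per-segment terms, is dynamic programming over segment boundaries. The sketch below is illustrative: `gaussian_mean_cost`, `optimal_segmentation`, the per-segment `penalty`, and `min_len` are hypothetical, and the Gaussian mean-change cost is only a placeholder for whatever MAP segment cost the thesis uses.

```python
import math


def gaussian_mean_cost(y, var=1.0):
    """Hypothetical stand-in segment cost: Gaussian negative log-likelihood
    (up to a constant) of frames y[i:j] around their own mean.
    Prefix sums make each evaluation O(1)."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        m = j - i
        d = s1[j] - s1[i]
        return (s2[j] - s2[i] - d * d / m) / (2.0 * var)

    return cost


def optimal_segmentation(n, cost, penalty, min_len=1):
    """Globally optimal split of frames 0..n-1 into contiguous segments,
    minimising sum(cost(i, j)) + penalty * (number of segments).
    O(n^2) dynamic programme over the start of the last segment."""
    best = [0.0] + [math.inf] * n  # best[t]: optimal cost of frames 0..t-1
    start = [0] * (n + 1)          # start[t]: where the last segment begins
    for t in range(min_len, n + 1):
        for s in range(0, t - min_len + 1):
            if best[s] == math.inf:
                continue
            c = best[s] + cost(s, t) + penalty
            if c < best[t]:
                best[t], start[t] = c, s
    # Walk back through the stored segment starts to recover the boundaries.
    segments, t = [], n
    while t > 0:
        segments.append((start[t], t))
        t = start[t]
    return segments[::-1]


if __name__ == "__main__":
    # Made-up frame-level feature with two abrupt changes.
    y = [0.0] * 20 + [3.0] * 20 + [-1.0] * 15
    print(optimal_segmentation(len(y), gaussian_mean_cost(y), penalty=5.0))
    # prints [(0, 20), (20, 40), (40, 55)]
```

Because the recursion considers every possible start of the last segment, the returned partition is optimal for the given cost and penalty rather than a local search result; the price is O(n^2) cost evaluations, which is why the placeholder cost uses prefix sums.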

[This abstract has been rewritten with the help of AI based on the project's original abstract]

Keywords

Machine learning, Gaussian mixture model, unsupervised learning, stereophonic mixtures, signal segmentation

A master's thesis from Aalborg University
