Estimation of Source Parameters and Segmentation of Stereophonic Music Mixtures
Author
Hansen, Jacob Møller Hjerrild
Term
4th term
Education
Publication year
2017
Submitted on
2017-05-22
Pages
95
Abstract
Many stereo recordings contain several simultaneous sound sources. Knowing where each source sits in the left-right stereo field (its panning) is important for analysis and mixing. This thesis presents a new estimator that recovers panning and other source parameters in multi-channel audio even when the source pitches and harmonic amplitudes are unknown, and without prior knowledge of the number of sources. The method uses unsupervised learning with Bayesian statistics to segment the signal over time and to estimate parameters via maximum a posteriori (MAP) modeling. Specifically, we describe the distribution of panning values with a Gaussian mixture model (GMM) and estimate its parameters by MAP using the iterative expectation-maximization (EM) algorithm. To prevent one real cluster from being split across several Gaussian components, we place a sparse Dirichlet prior on the mixture weights and prune redundant components. For time segmentation, we adopt a scheme that guarantees global optimality with respect to the MAP cost function. We evaluate the estimator through simulations on synthetic signals and on real audio recordings. Across these tests, the method shows good accuracy in estimating both the source parameters and the number of sources in stereophonic mixtures.
[This abstract was generated with the help of AI]
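The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes one-dimensional panning values, a symmetric Dirichlet prior with parameter alpha < 1 on the mixture weights, and the commonly used clipped weight update max(0, N_k + alpha - 1). The function name map_em_gmm_1d and all parameter names and defaults are hypothetical.

```python
import numpy as np

def map_em_gmm_1d(x, n_components=8, alpha=0.1, n_iter=200, var_floor=1e-6):
    """MAP-EM sketch for a 1-D Gaussian mixture (illustrative assumptions only).

    A symmetric Dirichlet(alpha) prior with alpha < 1 on the weights favours
    sparse mixtures; components whose clipped weight reaches zero are pruned.
    Returns (weights, means, variances) of the surviving components.
    """
    x = np.asarray(x, dtype=float)
    # Deliberately over-complete start: means on data quantiles,
    # a shared variance, and uniform weights.
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, np.var(x) / n_components + var_floor)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        nk = resp.sum(axis=0)  # effective number of points per component

        # M-step for the weights under the sparse prior (clipped update).
        w = np.maximum(nk + alpha - 1.0, 0.0)
        keep = w > 0.0
        if not keep.any():
            raise ValueError("all components were pruned; too few data points")

        # Prune components with zero weight, then renormalise the rest.
        resp, nk = resp[:, keep], nk[keep]
        weights = w[keep] / w[keep].sum()

        # M-step for means and variances (maximum-likelihood form).
        means = resp.T @ x / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor

    return weights, means, variances
```

A possible call is weights, means, variances = map_em_gmm_1d(panning_values), where panning_values is a hypothetical array of per-frame panning estimates. How many components survive depends on alpha, the initialisation, and the amount of data, so the count should not be read as the thesis's source-number estimate.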