AudioConv: A new source separation metric
Translated title
AudioConv: A new source separation metric
Author
Woodward, Daniel Michael
Term
4. Term
Education
Publication year
2022
Submitted on
2022-12-21
Pages
35
Abstract
Blind source separation in music aims to split a mixed recording into its individual instruments without prior knowledge. Today, such models are often judged by a single number, the Signal-to-Distortion Ratio (SDR), but studies report that SDR does not align with how listeners rate the results. This thesis introduces audioConv, a deep-learning, perceptually inspired evaluation metric designed to better reflect human listening. audioConv is implemented as a convolutional neural network and incorporates additional features from well-established audio models. We assess audioConv by examining how strongly its scores correlate with listener ratings. The findings show potential but also highlight the need for further improvements and high-quality data to make the metric more robust.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
Keywords
