An executive master's programme thesis from Aalborg University


Sound2Serum - Neural Sound Matching System for Serum 2

Author

Term

4th term

Publication year

2026

Submitted on

Pages

61

Abstract

Sound2Serum is a machine-learning system for matching a reference sound on the Serum 2 synthesizer. Given an audio example, it predicts the synthesizer parameter settings that should recreate it, automating the early trial-and-error stage of sound design and giving producers a solid starting point. The model adapts the Sound2Synth approach to Serum 2: it takes several complementary audio representations as input, processes them in separate branches, fuses them, and then predicts 40 continuous controls and 5 categorical choices. We built a dataset of 40,000 paired sounds and settings across four common categories (sustained bass, pad, lead, pluck) using a custom data generation app. Training ran for three epochs (full passes over the data) on AAU’s CLAAUDIA cluster; both training and validation loss decreased steadily, with no signs of overfitting. We evaluated the system with objective audio similarity measures (MFCCD, multi-scale spectral loss) and a MUSHRA listening test with 19 participants. Both show that, at this stage, outputs still sound noticeably different from the targets; in the listening test, outputs were rated below a 300 Hz low-pass anchor in 82% of trials. The key takeaway is that, at this quality level, objective metrics and human ratings agree on the ranking of results. Whether that agreement holds when outputs get closer to the target remains unclear, and answering that question likely requires training the model more thoroughly.
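
The abstract names multi-scale spectral loss as one of the objective similarity measures but does not give its formula. As a rough illustration only, the sketch below follows one common formulation: the mean absolute difference between magnitude spectrograms, plus the same difference on log magnitudes, summed over several FFT sizes. The function names, the FFT sizes, the hop length, the log term, and the `eps` constant are all assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def stft_magnitude(signal, fft_size, hop):
    """Magnitude spectrogram from a Hann-windowed short-time FFT.

    Assumes len(signal) >= fft_size.
    """
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop
    frames = np.stack([
        signal[i * hop:i * hop + fft_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_scale_spectral_loss(target, estimate,
                              fft_sizes=(2048, 1024, 512, 256), eps=1e-7):
    """Sum of linear and log magnitude L1 distances over several FFT sizes.

    The FFT sizes and eps are illustrative assumptions, not thesis values.
    """
    total = 0.0
    for n in fft_sizes:
        hop = n // 4  # 75% overlap between frames (assumed)
        mag_t = stft_magnitude(target, n, hop)
        mag_e = stft_magnitude(estimate, n, hop)
        linear = np.mean(np.abs(mag_t - mag_e))
        log = np.mean(np.abs(np.log(mag_t + eps) - np.log(mag_e + eps)))
        total += linear + log
    return total
```

Under these assumptions, identical signals give a loss of zero and the value grows as the spectra diverge, so a lower score means the predicted patch is spectrally closer to the reference. Using several FFT sizes trades off time and frequency resolution, which is the usual motivation for the multi-scale variant.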

[This abstract has been rewritten with the help of AI based on the project's original abstract]