Wavize: A Deep Learning Application for Enhanced Categorization of Diverse Bass Sound Designs in Electronic Dance Music
Translated title
Wavize: En Dyb Læring Applikation til Forbedret Kategorisering af Baslyddesign i Elektronisk Dansemusik
Author
Andersen, Simon
Term
4th term
Publication year
2024
Submitted on
2024-05-24
Pages
69
Abstract
This thesis investigates whether deep learning can improve the automatic categorization of contemporary bass sound designs in electronic dance music, helping musicians find and use their samples more efficiently. The motivation is that current sample managers often assign only broad or inaccurate labels to synthetic bass sounds. A balanced dataset of 210 bass samples, spread evenly (30 per category) across seven categories (808, acid, brass, growl, reese, slap, and sub), was curated and expanded 162-fold to 34,020 examples through data augmentation. An audio classification model was built by fine-tuning DistilHuBERT and integrated into a user-friendly Electron desktop application. The system was evaluated with standard ML metrics, latency measurements, and usability testing. The model achieved 91.3% accuracy and an AUC of 0.81, with high precision and recall for several categories (including 808, brass, growl, reese, and slap), while acid samples were occasionally misclassified. Across 100 predictions, the longest measured latency was 6.17 seconds, and user testing indicated that the tool was effective and easy to use. The work shows that an AI-driven sample manager tailored to a bass sound design taxonomy can improve sample organization and retrieval, and it provides a foundation for future extensions.
[This summary has been generated with the help of AI directly from the project (PDF)]
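The abstract reports accuracy, per-category precision and recall, and a worst-case latency over 100 predictions. The following is a minimal sketch of how such figures can be computed from hard predictions, assuming string labels for the seven categories and a list of measured latencies in seconds. The helper name `evaluate` and its inputs are hypothetical and not taken from the thesis.

```python
from collections import Counter

# The seven bass categories named in the abstract.
CATEGORIES = ["808", "acid", "brass", "growl", "reese", "slap", "sub"]


def evaluate(y_true, y_pred, latencies_s):
    """Return accuracy, per-class (precision, recall), and worst-case latency.

    Hypothetical helper for illustration only; not the thesis's own code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Accuracy: fraction of samples whose predicted label matches the true one.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    predicted = Counter(y_pred)  # TP + FP per class
    actual = Counter(y_true)     # TP + FN per class

    per_class = {}
    for c in CATEGORIES:
        # Classes that never occur get 0.0 instead of a division by zero.
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        recall = tp[c] / actual[c] if actual[c] else 0.0
        per_class[c] = (precision, recall)

    # Worst-case latency, analogous to the "longest measured latency" figure.
    return accuracy, per_class, max(latencies_s)
```

For example, `evaluate(["808", "acid", "acid", "sub"], ["808", "acid", "growl", "sub"], [0.5, 0.7, 1.1, 0.6])` gives an accuracy of 0.75, a precision of 1.0 and a recall of 0.5 for acid, and a worst-case latency of 1.1 seconds. AUC is left out because it requires per-class probability scores rather than hard labels.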
