Classification of Percussive Sounds using Machine Learning
Author
Gajhede, Nicolai
Term
4th term
Education
Publication year
2016
Abstract
This thesis investigates multi-class classification of percussive sounds using convolutional neural networks (CNN). The aim is to design and evaluate network architectures and preprocessing strategies that reliably distinguish different percussive instruments. The work includes a review of relevant CNN approaches and audio-related research, construction of a dataset with 3,713 digital and acoustic recordings, and preprocessing in MATLAB (including quantile-based representations) to generate training examples. A fundamental CNN and several variants were implemented in Python and tested with different activation functions (tanh, ReLU, sigmoid), learning rates, batch sizes, and regularization techniques such as batch normalization and dropout. The proposed fundamental network architecture showed promising results across experiments and remained robust when batch normalization and 50% dropout were applied. Although exact performance metrics are not reported here, the study indicates that a relatively simple, well-prepared CNN can effectively classify percussive sounds, and that common regularization techniques do not degrade its performance.
This thesis investigates multi-class classification of percussion sounds using convolutional neural networks (CNN). The aim is to develop and evaluate network architectures and preprocessing strategies that can reliably distinguish between different percussive instruments. The work comprises a review of relevant CNN approaches and audio-related research, construction of a dataset of 3,713 digital and acoustic recordings, and preprocessing in MATLAB (including quantile-based representations) together with generation of training examples. A fundamental CNN and several variants were implemented in Python and tested with different activation functions (tanh, ReLU, sigmoid), learning rates, batch sizes, and regularization techniques such as batch normalization and dropout. The proposed fundamental network architecture showed promising results across experiments and remained robust even when batch normalization and 50% dropout were applied. Although specific key figures do not appear here, the study indicates that a relatively simple CNN, properly prepared and tuned, can effectively classify percussive sounds, and that common regularization techniques do not degrade performance.
[This abstract has been generated with the help of AI directly from the project full text]
