A master's thesis from Aalborg University


Classification of Percussive Sounds using Machine Learning

Author

Term

4th term

Publication year

2016

Abstract

This thesis investigates multi-class classification of percussive sounds using convolutional neural networks (CNNs). The aim is to design and evaluate network architectures and preprocessing strategies that reliably distinguish between percussive instruments. The work comprises a review of relevant CNN approaches and audio-related research, the construction of a dataset of 3,713 digital and acoustic recordings, and preprocessing in MATLAB (including quantile-based representations) to generate training examples. A fundamental CNN and several variants were implemented in Python and tested with different activation functions (tanh, ReLU, sigmoid), learning rates, batch sizes, and regularization techniques such as batch normalization and dropout. The proposed fundamental network architecture showed promising results across experiments and remained robust when batch normalization and 50% dropout were applied. Exact performance metrics are not reported here, but the study indicates that a relatively simple, well-prepared CNN can classify percussive sounds effectively, and that common regularization techniques do not degrade its performance.

This thesis examines multi-class classification of percussion sounds by means of convolutional neural networks (CNNs). The purpose is to develop and evaluate network architectures and preprocessing strategies that can reliably tell different percussive instruments apart. The work covers a review of relevant CNN approaches and audio-related research, the building of a dataset of 3,713 digital and acoustic recordings, preprocessing in MATLAB (among other things, quantile-based representations), and the generation of training examples. A fundamental CNN and several variants were implemented in Python and tried out with different activation functions (tanh, ReLU, sigmoid), learning rates, batch sizes, and regularization techniques such as batch normalization and dropout. The proposed fundamental network architecture showed promising results across experiments and remained robust even when batch normalization and 50% dropout were applied. Although concrete key figures do not appear here, the study indicates that a relatively simple CNN, correctly prepared and tuned, can classify percussive sounds effectively, and that common regularization techniques do not degrade performance.
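
The abstract names three activation functions (tanh, ReLU, sigmoid) and 50% dropout but does not show how they were implemented. The following is only a minimal sketch of those building blocks in plain Python, not the thesis code. The names `ACTIVATIONS` and `dropout` are hypothetical, and the "inverted" dropout scaling used here is one common formulation that the thesis may or may not have used.

```python
import math
import random


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical lookup table for the three activations named in the abstract.
ACTIVATIONS = {"tanh": math.tanh, "relu": relu, "sigmoid": sigmoid}


def dropout(values, rate=0.5, training=True, rng=None):
    """Inverted dropout (an assumed variant).

    During training each value is zeroed with probability `rate`, and the
    survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    Outside training the input is returned unmodified.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

With `rate=0.5`, every surviving activation is doubled during training, so a layer's expected output matches what it produces at evaluation time, when dropout is switched off.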

[This abstract has been generated with the help of AI directly from the project full text]