Audio Event Classification Using Deep Learning in an End-to-End Approach
Author
Diez Antich, Jose Luis
Term
4. Term
Publication year
2017
Submitted on
2017-06-16
Pages
38
Abstract
This thesis examines the task of sound event classification using deep neural networks in an end-to-end approach. The goal is to automatically determine which sound sources from everyday environments are present in a recording. Because several sounds often occur at the same time, this is a multi-label problem in which the system may assign more than one label to a single audio clip. An effective system could, for example, support users of hearing devices in understanding their surroundings and enhance robot navigation by interpreting acoustic signals in the environment. In an end-to-end approach, models learn directly from data rather than relying on hand-engineered features. This idea has recently been applied to audio with notable results. Although the results here do not improve on standard approaches, the thesis contributes an exploration of deep learning architectures that provides insight into how networks process audio for this task.
[This abstract was generated with the help of AI]
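To make the multi-label aspect described in the abstract concrete, here is a minimal sketch of how a classifier's raw per-class outputs can be turned into several simultaneous labels. The class names, logit values, and the 0.5 threshold are illustrative assumptions, not taken from the thesis; the key point is that each class gets an independent sigmoid probability, so one clip can receive more than one label.

```python
import math

# Hypothetical everyday-sound classes (illustrative only; the thesis
# uses its own dataset's label set).
CLASSES = ["speech", "car", "dog_bark", "rain"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits, threshold=0.5):
    """Map one logit per sound class to a set of active labels.

    Unlike single-label (softmax) classification, each class is scored
    with an independent sigmoid, so several labels can fire at once,
    or none at all if no class clears the threshold.
    """
    return [name for name, z in zip(CLASSES, logits)
            if sigmoid(z) >= threshold]

# Example: a clip containing both speech and rain.
print(multilabel_predict([2.1, -1.3, -0.4, 1.7]))  # → ['speech', 'rain']
```

In an end-to-end system as described above, the logits would come from a network trained directly on the audio signal rather than on hand-engineered features; only the thresholding step shown here is independent of how the features are learned.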