Author(s)
Term
4. term
Education
Publication year
2022
Submitted on
2022-05-31
Pages
100 pages
Abstract
In this thesis, carried out in collaboration with RTX, we consider a deep-learning-based method for Voice Activity Detection (VAD) and investigate its potential for use in a real-time application on an embedded device. To this end, we address three research questions: how to increase VAD performance, how to lower the algorithmic delay, and how to make the method more suitable for implementation on a resource-constrained device. As part of this work, a paper was submitted to Interspeech 2022; it proposes a method for increasing VAD performance and reducing the algorithmic delay. Performance is increased by introducing adversarial multi-task learning during training, and the algorithmic delay is lowered by reducing the filter sizes of the network, at the cost of a small performance degradation. We then consider pruning and quantization for the use case of this project. Finally, based on these optimisations, we discuss which hardware architectures are best suited for implementing the algorithm.
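To illustrate why smaller filters lower the algorithmic delay, the following is a minimal sketch that computes the look-ahead of a stack of centered (non-causal) 1-D convolutions. It does not describe the thesis network: the layer configurations, the 16 kHz sample rate, and the function names `lookahead_samples` and `delay_ms` are hypothetical, and the sketch assumes odd kernel sizes with symmetric padding.

```python
from typing import List, Tuple

# Each layer is (kernel_size, stride, dilation) of a centered 1-D convolution.
Layer = Tuple[int, int, int]


def lookahead_samples(layers: List[Layer]) -> int:
    """Number of future input samples needed to produce one output frame."""
    lookahead = 0
    jump = 1  # spacing, in input samples, between neighbouring inputs of the current layer
    for kernel_size, stride, dilation in layers:
        # A centered kernel reaches (kernel_size - 1) // 2 taps into the future.
        lookahead += ((kernel_size - 1) // 2) * dilation * jump
        jump *= stride
    return lookahead


def delay_ms(layers: List[Layer], sample_rate_hz: int) -> float:
    """Look-ahead converted to milliseconds."""
    return 1000.0 * lookahead_samples(layers) / sample_rate_hz


# Hypothetical configurations, chosen only to show the effect of filter size.
wide_filters = [(101, 4, 1), (9, 1, 1)]
narrow_filters = [(21, 4, 1), (5, 1, 1)]

print(delay_ms(wide_filters, 16000))    # 4.125
print(delay_ms(narrow_filters, 16000))  # 1.125
```

In this sketch the look-ahead grows linearly with each kernel size, so shrinking the filters shortens the delay directly. It also narrows the temporal context available to the network, which is consistent with the small performance degradation reported in the abstract.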
Keywords