A master's thesis from Aalborg University

Real-time implementation considerations of a deep learning based Voice Activity Detection

Author:
Term: 4th term
Publication year: 2022
Submitted on:
Pages: 100

Abstract

This thesis, conducted in collaboration with RTX, examines a deep learning based method for Voice Activity Detection (VAD), the technology that decides when speech is present in an audio signal. The goal is to run the method in real time on embedded devices with limited computing resources. We address three questions: how to improve the VAD's accuracy, how to reduce its algorithmic delay (the time from audio input to decision), and how to adapt the method to resource-constrained hardware. As part of the work, a paper proposing a way to improve accuracy and reduce delay was submitted to Interspeech 2022. Accuracy is improved by applying adversarial multi-task learning during training, and delay is reduced by using smaller filters in the neural network, at the cost of a small drop in accuracy. We then evaluate pruning and quantization, techniques that shrink a model by removing less important parameters and by storing numbers at lower precision, for this use case. Finally, we discuss which hardware architectures are best suited for implementing the optimized algorithm.
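To make the deployment step concrete, below is a minimal sketch of a pruning-plus-quantization pipeline in PyTorch. It is not the thesis's implementation: the TinyVAD model, its layer sizes, the 40-dimensional input features, the 50% sparsity target, and the choice of dynamic int8 quantization are all illustrative assumptions.

```python
# Hypothetical sketch: magnitude pruning followed by dynamic int8
# quantization of a toy frame-level VAD classifier. Illustrative only;
# the thesis's actual model and settings may differ.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyVAD(nn.Module):
    """Toy per-frame speech/non-speech classifier (assumed architecture)."""
    def __init__(self, n_features=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 2),  # logits: [non-speech, speech]
        )

    def forward(self, x):
        return self.net(x)

model = TinyVAD()

# 1) Pruning: zero out the 50% smallest-magnitude weights in each
#    linear layer (unstructured L1 pruning), then bake the mask in.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the zeros permanent

# 2) Quantization: store linear-layer weights as int8 and dequantize
#    on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

frame = torch.randn(1, 40)  # one feature frame (e.g. log-mel energies)
print(quantized(frame))     # speech/non-speech logits
```

Note that unstructured pruning as sketched here only sets weights to zero in a dense tensor; on an embedded target the memory and compute savings are realized only with sparse storage or structured pruning, whereas int8 quantization shrinks weight storage roughly fourfold immediately.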

[This summary has been rewritten with the help of AI based on the project's original abstract]