Real-time implementation considerations of a deep learning based Voice Activity Detection
Student project: Master's thesis and HD graduation project
- Claus Meyer Larsen
4th semester, Signal Processing and Computing (cand.polyt.), Master (Master's programme)
In this thesis, carried out in collaboration with RTX, we consider a deep learning based method for Voice Activity Detection and investigate its potential for use in a real-time application on an embedded device.
To this end, we work with three research questions: how to increase the performance of the Voice Activity Detection, how to lower its algorithmic delay, and which methods make it more suitable for implementation on a resource-constrained device.
As part of this work, a paper was submitted to Interspeech 2022 proposing a method that increases the Voice Activity Detection performance and reduces the algorithmic delay. Performance is increased by introducing adversarial multi-task learning during training, and the algorithmic delay is lowered by reducing the filter sizes of the network. Reducing the algorithmic delay leads to a small performance degradation.
Afterwards, pruning and quantization are considered for the use case of this project. Finally, based on these optimisations, we discuss which hardware architectures are best suited for implementing the algorithm.
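The link between filter size and algorithmic delay mentioned above can be illustrated with a small calculation. The following is a minimal sketch, not the network from the thesis: it assumes a stack of centred (non-causal) 1-D convolutions with odd kernel sizes, and the layer configurations and the 16 kHz sample rate are hypothetical values chosen only for illustration. It counts only the future input samples the filters need and ignores any framing or buffering delay.

```python
def lookahead_samples(layers):
    """Future input samples needed by a stack of centred 1-D convolutions.

    `layers` is a list of (kernel_size, stride, dilation) tuples in
    input-to-output order. Odd kernel sizes with symmetric padding are
    assumed, so each layer looks (kernel_size - 1) // 2 taps into the future.
    """
    lookahead = 0
    jump = 1  # distance in input samples between neighbouring taps at this layer
    for kernel_size, stride, dilation in layers:
        lookahead += ((kernel_size - 1) // 2) * dilation * jump
        jump *= stride
    return lookahead


def delay_ms(layers, sample_rate_hz):
    """Convert the lookahead of `layers` to milliseconds."""
    return 1000.0 * lookahead_samples(layers) / sample_rate_hz


# Hypothetical configurations (not taken from the thesis).
large_filters = [(101, 1, 1), (5, 2, 1), (5, 2, 1)]
small_filters = [(51, 1, 1), (3, 2, 1), (3, 2, 1)]

if __name__ == "__main__":
    for name, cfg in [("large", large_filters), ("small", small_filters)]:
        print(name, lookahead_samples(cfg), "samples,", delay_ms(cfg, 16000), "ms")
```

In this toy setting, halving the filter lengths halves the lookahead, which is the kind of trade-off against detection performance that the abstract describes.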
Language | English |
---|---|
Publication date | 15 Jun. 2022 |
Number of pages | 100 |
External collaborator | RTX A/S, Project Engineer Sebastian Schiøler, sbs@rtx.dk, Other |