Self-supervised Keyword Spotting using Data2Vec Pretraining

Author

Bovbjerg, Holger Severin

Term

4th term

Education

Signal Processing and Computing, Master

Publication year

2022

Submitted on

2022-06-02

Abstract

In recent years, the development of accurate deep keyword spotting (KWS) models has led to KWS being embedded in a range of products such as voice assistants. Many of these models rely on large amounts of labelled data to achieve good performance. As a result, most models are restricted to applications for which a large labelled speech dataset can be obtained. Self-supervised learning is a promising area of research which seeks to remove the need for large labelled datasets by leveraging unlabelled data, which is easier to obtain in large amounts. In this thesis, a recently proposed general self-supervised learning framework called Data2Vec is investigated as a means of improving the performance of KWS models when only a small amount of labelled data is available. A transformer-based KWS system is implemented, and experiments are carried out on a reduced-label setup of the Google Speech Commands dataset to test the benefit of Data2Vec pretraining. Models pretrained using Data2Vec are found to greatly outperform models trained without Data2Vec pretraining. The results show that Data2Vec pretraining can increase the performance of KWS models when only a small amount of labelled data is available for training.

Keywords

deep learning; self-supervised; keyword spotting; transformer

A master's thesis from Aalborg University
