Self-supervised Keyword Spotting using Data2Vec Pretraining

Student project: Master's thesis and HD graduation project

  • Holger Severin Bovbjerg
In recent years, the development of accurate deep keyword spotting (KWS) models has led to KWS being embedded in a number of products, such as voice assistants.
Many of these models rely on large amounts of labelled data to achieve good performance. As a result, most models are restricted to applications for which a large speech dataset can be obtained.
Self-supervised learning is a promising area of research which seeks to remove the need for large labelled datasets by leveraging unlabelled data, which is easier to obtain in large amounts.
This thesis investigates whether Data2Vec, a recently proposed general self-supervised learning framework, can increase the performance of KWS models when only a small amount of labelled data is available.
A transformer-based KWS system is implemented, and experiments are carried out on a reduced labelled setup of the Google Speech Commands dataset to test the benefit of Data2Vec pretraining. Models pretrained using Data2Vec are found to greatly outperform models without Data2Vec pretraining, showing that Data2Vec pretraining can increase the performance of KWS models when only a small amount of labelled data is available for training.
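A reduced labelled setup of this kind can be illustrated with a short sketch. The code below is not taken from the thesis: the function name `split_reduced_labelled`, the 20% labelled fraction, the per-keyword stratification and the toy file list are all assumptions made for illustration. It keeps labels for a small part of the data and treats the remaining clips as an unlabelled pool for pretraining.

```python
import random
from collections import defaultdict


def split_reduced_labelled(samples, labelled_fraction=0.2, seed=0):
    """Split (path, keyword) pairs into a small labelled set and an unlabelled pool.

    Hypothetical helper: the split is stratified per keyword so that every
    class keeps at least one labelled clip. The fraction is illustrative only.
    """
    rng = random.Random(seed)
    by_keyword = defaultdict(list)
    for path, keyword in samples:
        by_keyword[keyword].append(path)

    labelled, unlabelled = [], []
    for keyword in sorted(by_keyword):
        paths = sorted(by_keyword[keyword])
        rng.shuffle(paths)
        n_labelled = max(1, round(len(paths) * labelled_fraction))
        # These clips keep their keyword label and are used for fine-tuning.
        labelled += [(p, keyword) for p in paths[:n_labelled]]
        # Labels are dropped here: these clips are used for pretraining only.
        unlabelled += paths[n_labelled:]
    return labelled, unlabelled


# Toy stand-in for a speech commands dataset: 3 keywords, 10 clips each.
toy = [(f"{kw}/{i}.wav", kw) for kw in ("yes", "no", "stop") for i in range(10)]
labelled, unlabelled = split_reduced_labelled(toy, labelled_fraction=0.2)
print(len(labelled), len(unlabelled))
```

In such a setup, a self-supervised method would be pretrained on the unlabelled pool and the KWS model then fine-tuned on the labelled subset, with a baseline trained on the labelled subset alone for comparison.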
Publication date: 2 Jun 2022
Number of pages: 49
ID: 472010183