Term
4th term
Publication year
2022
Submitted on
2022-06-02
Pages
49 pages
Abstract
In recent years, the development of accurate deep keyword spotting (KWS) models has led to keyword spotting technology being embedded in products such as voice assistants. Many of these models rely on large amounts of labelled data to achieve good performance, so most are restricted to applications for which a large labelled speech dataset can be obtained. Self-supervised learning is a promising research direction that seeks to remove the need for large labelled datasets by leveraging unlabelled data, which is easier to collect at scale. In this thesis, a recently proposed general self-supervised learning framework, Data2Vec, is investigated as a means of improving the performance of KWS models when only a small amount of labelled data is available. A transformer-based KWS system is implemented, and experiments are carried out on a reduced-label setup of the Google Speech Commands dataset to measure the benefit of Data2Vec pretraining. Models pretrained with Data2Vec are found to greatly outperform models trained without such pretraining, showing that Data2Vec pretraining can increase the performance of KWS models when labelled training data is scarce.
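The core of Data2Vec-style pretraining is a student network that regresses, at masked time steps, the averaged top-k layer representations produced by an exponential-moving-average (EMA) teacher. The following is a minimal illustrative sketch of those three ingredients only; the function names, the hyperparameters (`tau`, `top_k`, the mask ratio), and the toy dimensions are assumptions for illustration, not values taken from this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_update(teacher, student, tau=0.999):
    """EMA teacher update: teacher weights track the student slowly."""
    return {k: tau * teacher[k] + (1 - tau) * student[k] for k in teacher}

def data2vec_target(layer_outputs, top_k=3):
    """Regression target: average of the teacher's top-k layer outputs."""
    return np.mean(layer_outputs[-top_k:], axis=0)

def masked_regression_loss(student_out, target, mask, beta=1.0):
    """Smooth-L1 regression loss computed only over masked time steps."""
    diff = student_out[mask] - target[mask]
    absd = np.abs(diff)
    loss = np.where(absd < beta, 0.5 * diff ** 2 / beta, absd - 0.5 * beta)
    return loss.mean()

# Toy example: 12 teacher layers, 50 time steps, 64-dim features.
teacher_layers = rng.normal(size=(12, 50, 64))
target = data2vec_target(teacher_layers)
student_out = rng.normal(size=(50, 64))
mask = rng.random(50) < 0.5  # illustrative mask ratio
print(masked_regression_loss(student_out, target, mask))
```

In the full framework both networks are transformers over speech features; this sketch only shows the shape of the objective that makes the approach label-free, which is what allows pretraining on unlabelled audio before fine-tuning on the small labelled KWS set.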