A master's thesis from Aalborg University

SignPredict: A machine learning approach to gesture recognition

Term

4th term

Education

Publication year

2024

Submitted on

Pages

18

Abstract

Around 5% of the world’s current population has some form of hearing loss, a share predicted to double over the next 30 years. Being deaf in a hearing society creates communication barriers in daily life and affects deaf people in several ways. This research aims to explore the possibilities within sign language prediction using machine intelligence (MI) models, through three different approaches, all bundled into a single system called SignPredict. The solution utilizes a Sequential model and predicts from a series of coordinates representing 21 landmarks located on a hand. The broader aim of this study is to provide a baseline solution, to be further developed in the future, capable of accurately predicting Danish sign language gestures. To establish a baseline for determining the success of the system, Oracle’s Quality of Service criteria have been used throughout development, as fulfilment of these ensures quality within the system. Furthermore, an agile project management approach has been used, splitting the development process into four phases. For the MI models, the Sequential model from Keras has been used, utilizing an LSTM, a type of RNN, to produce a prediction given a series of data. For data extraction, computer vision and landmark detection have been used. Every model developed has undergone benchmarking, evaluating its accuracy, loss, precision, recall, and F1-score, as well as its correctness in predicting gestures on a data set. The models trained solely on coordinate data yielded positive results and adhered to the industry standard. The other models, trained on either linear graphs or a combination of graph and coordinate data, yielded poor results, with both bad metrics and an inability to predict more than one of three labels. Throughout the research and development of this project, it was discovered that, by using coordinates as the primary data input to a Sequential model, it was possible to predict single gestures and differentiate between similar ones. However, the final bundled API, which makes all functionality accessible through public endpoints, proved unacceptable in terms of overall response time under higher loads. A major refactor of this API, focused on optimization, is needed.
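
The abstract does not include implementation details, but a minimal sketch of the kind of model it describes could look as follows. This is an assumption-based illustration, not the thesis’ actual code: the sequence length, the use of three coordinates per landmark, and the layer size are assumptions; only the 21 hand landmarks, the Keras Sequential model with an LSTM, the three gesture labels, and the evaluation metrics come from the abstract.

```python
# Minimal sketch (assumptions marked in comments) of a gesture classifier
# of the kind the abstract describes: a Keras Sequential model with an
# LSTM that predicts a gesture label from a sequence of 21 hand-landmark
# coordinates.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.metrics import Precision, Recall

SEQ_LEN = 30       # assumed frames per gesture (not stated in the abstract)
N_LANDMARKS = 21   # hand landmarks, as stated in the abstract
N_COORDS = 3       # assumed x, y, z per landmark
N_LABELS = 3       # the abstract mentions three gesture labels

model = Sequential([
    Input(shape=(SEQ_LEN, N_LANDMARKS * N_COORDS)),
    LSTM(64),                               # assumed layer size
    Dense(N_LABELS, activation="softmax"),  # one probability per gesture
])

# The abstract lists accuracy, loss, precision, and recall among the
# benchmarked metrics; the F1-score is derived from precision and recall.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy", Precision(), Recall()])
```

Training data for such a model would be arrays of shape (n_samples, SEQ_LEN, 63), built from per-frame landmark coordinates produced by a hand-landmark detector; the abstract mentions computer vision and landmark detection but does not name a specific library.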