A master's thesis from Aalborg University


SignPredict: A machine learning approach to gesture recognition

Author:
Term: 4. term
Education:
Publication year: 2024
Submitted on:
Pages: 18

Abstract

Around 5% of the world’s population lives with hearing loss, a share expected to double within 30 years. Deaf people often face daily communication barriers. This thesis examines whether machine learning models can predict sign language gestures. Three approaches are tested and combined in a prototype called SignPredict. The core idea is to use computer vision to detect 21 key points on the hand and feed their coordinates over time into a Keras Sequential model with an LSTM (a type of recurrent neural network for sequence data) to predict gestures in Danish Sign Language. The goal is to provide a baseline that future work can build on. Development used Oracle’s Quality of Service criteria to define success and followed an agile process in four phases. All models were benchmarked on a dataset using standard metrics such as accuracy, loss, precision, recall, and F1-score, as well as whether the correct gesture label was predicted. Results show that models trained only on coordinate data performed well and met industry standards. Models trained on linear graphs, or on a combination of graphs and coordinates, performed poorly: they produced weak metrics and could not reliably predict more than one of three labels. Using coordinates as the primary input to a Sequential LSTM model made it possible to recognize single gestures and distinguish between similar ones. However, the bundled API that exposes all functionality via public endpoints had unacceptable response times under higher load and requires a major refactor focused on optimization.
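The core pipeline described above — sequences of 21 hand keypoint coordinates fed into a Keras Sequential model with an LSTM — can be sketched as follows. This is a minimal illustration, not the thesis's actual architecture: the number of coordinates per landmark (3), the clip length (30 frames), the layer sizes, and the three-class output are all assumptions made for the example.

```python
# Sketch: classify gesture clips from hand-keypoint coordinate sequences
# using a Keras Sequential model with an LSTM, as outlined in the abstract.
# All hyperparameters here are illustrative assumptions.
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

NUM_KEYPOINTS = 21   # hand landmarks detected per frame
COORDS = 3           # x, y, z per landmark (assumption)
SEQ_LEN = 30         # frames per gesture clip (assumption)
NUM_CLASSES = 3      # the abstract mentions three gesture classes

model = Sequential([
    # Each time step is one frame: 21 landmarks flattened to 63 features.
    Input(shape=(SEQ_LEN, NUM_KEYPOINTS * COORDS)),
    LSTM(64),                                  # summarize the sequence
    Dense(32, activation="relu"),
    Dense(NUM_CLASSES, activation="softmax"),  # probability per gesture
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# One dummy clip: 30 frames of 63 flattened coordinates.
clip = np.zeros((1, SEQ_LEN, NUM_KEYPOINTS * COORDS), dtype=np.float32)
probs = model.predict(clip, verbose=0)
print(probs.shape)  # one row of NUM_CLASSES probabilities
```

In practice the coordinate sequences would come from a computer-vision hand tracker producing 21 landmarks per frame, and the model would be trained on labeled gesture clips before its predictions are meaningful.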

[This summary has been rewritten with the help of AI based on the project's original abstract]