Author(s)
Term
4th semester
Education
Publication year
2025
Submitted on
2025-06-04
Pages
73 pages
Abstract
This study addresses the task of recognizing pedestrian-to-driver navigation gestures in a zero-shot setting, with the aim of enabling safe decision-making even in conflicting scenarios. Navigation gestures are a routine part of everyday driving and help keep traffic safe for everyone. Conflicting gestures are more of an edge case, but such situations can be critical, which makes both gesture recognition and decision-making essential. Recognizing pedestrians' gestures is therefore a central part of the study. To this end, the study developed the enhancement methods Supplementary Body Description with VLM and Pose Projection, as well as the methods Classification, Natural-language, and Reconstruction for evaluating vision-language models (VLMs) in this domain. In addition, three annotated datasets were created: Acted Traffic Gesture (ATG), Instructive Traffic Gesture In-The-Wild (ITGI), and Acted Conflicting Authorities & Navigation Gestures (Act-CANG). Initial results for the three VLMs tested were poor across all three evaluation domains. VideoLLaMA3, with and without enhancements, achieved F1-scores between 0.02 and 0.06 in classification. These results highlight the current limitations of VLMs in accurately recognizing pedestrian navigation gestures and underscore the need for further research, either through fine-tuning or alternative approaches.
Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University. Here, you can find and download publicly available bachelor's theses and master's projects from across the university dating from 2008 onwards. Student projects from before 2008 are available in printed form at Aalborg University Library.