Author(s)
Term
4th semester
Publication year
2024
Submitted on
2024-05-30
Pages
89 pages
Abstract
This work is a master's thesis in Computer Engineering: AI, Vision & Sound. It describes work conducted during a semester abroad at UC San Diego. Because it is research-oriented, it is structured in separate parts that follow the flow of the research rather than a linear development process. The primary scope has been the exploration of synthetic generation of missing thermal video frames, generalizable driver activity classification, and driver drowsiness detection, all within the context of a car cabin. The work includes a proposal for generating missing thermal frames from RGB, achieving well-performing results through the use of conditional Generative Adversarial Networks (cGANs). Additionally, an experiment using multiple camera angles in Vision-Language models for in-cabin activities has been conducted, showing promising results for generalizable Vision-Language models. Furthermore, a study using Video Transformers for drowsiness classification has been conducted; it details the level of video detail needed for the task and includes a custom face-cropped version of the UTA-RLDD dataset. This work has resulted in the acceptance of a paper at the 35th IEEE Intelligent Vehicles Symposium (IV) as well as an acceptance at the Computer Vision and Pattern Recognition (CVPR) Vision and Language for Autonomous Driving and Robotics Workshop.