Active VisLED: Active Vision-Language Embedded Diversity Querying for 3D Object Detection

Author

Antoniussen, Bjørk

Term

4th semester

Education

Artificial Intelligence, Vision and Sound, MSc.

Publication year

2024

Submitted on

2024-05-31

Abstract

This thesis presents the development and evaluation of the VisLED-Querying method for 3D object detection, with a particular focus on applications in autonomous driving. By leveraging Vision-Language Embedding Diversity Querying (VisLED-Querying), the study aims to match the detection performance obtained with the full training set while using at most 50% of the dataset, reducing the need for extensive labeling. The VisLED-Querying method integrates active learning strategies to select diverse and informative data samples from an unlabeled pool, thereby improving the model's ability to detect underrepresented or novel objects. The approach is evaluated in two scenarios: Open-World Exploring (OWE) and Closed-World Mining (CWM). On the nuScenes dataset, VisLED-Querying reaches performance close to that of training on the full dataset while using only 50% of the data pool. This demonstrates VisLED-Querying's potential to reduce labeling costs and improve model efficiency, making it valuable for real-world autonomous driving systems. The findings indicate that diversity-based active learning methods such as VisLED-Querying can lead to more accurate and cost-effective 3D object detection models, advancing autonomous vehicle technologies and other domains that require robust object detection.
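The abstract does not spell out the selection rule, so the following is only a generic illustration of diversity-based querying, not the thesis's algorithm. It assumes each unlabeled sample already has an embedding vector (how those embeddings are produced is left open) and uses greedy farthest-point selection with cosine distance as a stand-in. The names `cosine_distance`, `select_diverse`, `budget`, and `seed_index` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse(embeddings, budget, seed_index=0):
    """Greedy farthest-point selection (an assumed stand-in, not the thesis's rule).

    embeddings: list of vectors, one per unlabeled sample (hypothetical input).
    budget: number of samples to send for labeling.
    Returns the indices of the selected samples, in selection order.
    """
    if not 0 < budget <= len(embeddings):
        raise ValueError("budget must be between 1 and the pool size")
    selected = [seed_index]
    # Distance from every sample to its nearest already-selected sample.
    nearest = [cosine_distance(e, embeddings[seed_index]) for e in embeddings]
    nearest[seed_index] = float("-inf")  # never pick the same sample twice
    while len(selected) < budget:
        # Pick the sample farthest from everything chosen so far.
        candidate = max(range(len(embeddings)), key=lambda i: nearest[i])
        selected.append(candidate)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], cosine_distance(e, embeddings[candidate]))
        nearest[candidate] = float("-inf")
    return selected


# Toy pool: samples 0 and 1 are near-duplicates, 2 and 3 point elsewhere.
pool = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(select_diverse(pool, budget=3))  # → [0, 3, 2]
```

With a budget of three, the near-duplicate sample 1 is skipped in favor of the two samples that differ most from what has already been chosen, which is the intuition behind selecting diverse rather than redundant data for labeling.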

Keywords

Active Learning; Multi-Modal Models; nuScenes Dataset; Diversity-Based Algorithm Creation

A master's thesis from Aalborg University
