A master's thesis from Aalborg University


Improving camera motion classification for undersea coral videos

Term

4th term

Publication year

2024

Submitted on

Pages

14

Abstract

The health of the planet's oceans is in rapid decline; in particular, the world's coral reefs have shrunk significantly since 2009. 3D reconstructions of coral structures are a vital method for quantifying and monitoring the health of coral reefs, but such methods often require professionally obtained footage to be viable. However, large amounts of amateur footage are available online that might be usable for 3D reconstruction; identifying it is a time-consuming task, as coral structures must be viewed from more than one angle. We therefore propose a model that could bridge the gap between public footage and scientific research by identifying sections of public videos that might be relevant for 3D reconstruction. In this work we present a model that identifies and isolates the desired camera motion by extracting motion vectors from video footage and converting them to HSI color images, which are fed to a Swin transformer model. To train and validate this model, we expanded upon a benchmark dataset containing amateur footage for coral 3D reconstruction. To validate our model, it is tested against two other approaches: a Convolutional Neural Network (CNN) model, also trained and validated on HSI color images derived from motion vectors, and a heuristic model applied directly to motion vectors. The CNN model and heuristic model both performed poorly, with F1 scores of 0.11 and 0.16, respectively. In contrast, the Swin transformer outperformed these approaches with a score of 0.19. However, simply applying the Swin transformer without data augmentation performed best, with a score of 0.26. The HSI Swin transformer performed significantly better on the validation set, suggesting the approach may be prone to over-fitting or may cause information loss for the model.
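The motion-vector-to-HSI conversion described above can be sketched as follows. This is a minimal illustration assuming the common optical-flow visualization convention (direction mapped to hue, magnitude to intensity, saturation fixed); the function name, normalization, and channel ordering are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def flow_to_hsi(flow):
    """Map a dense motion-vector field of shape (H, W, 2) to an
    HSI-style image of shape (H, W, 3).

    Hue encodes motion direction, intensity encodes normalized
    magnitude, and saturation is held at 1. This mirrors a standard
    optical-flow visualization; the exact mapping is an assumption.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.sqrt(dx ** 2 + dy ** 2)          # motion magnitude
    ang = np.arctan2(dy, dx)                  # direction in [-pi, pi]
    hue = (ang + np.pi) / (2 * np.pi)         # normalize to [0, 1]
    sat = np.ones_like(hue)                   # full saturation
    inten = mag / (mag.max() + 1e-8)          # magnitude scaled to [0, 1]
    return np.stack([hue, sat, inten], axis=-1)

# Toy example: uniform rightward motion maps to a constant-hue image.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                            # dx = 1, dy = 0 everywhere
img = flow_to_hsi(flow)
```

Encoding direction as hue makes the image rotation-of-motion sensitive in a way a raw magnitude map is not, which is presumably why a color representation is fed to the image-based Swin transformer rather than the vectors themselves.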