Evaluating synthetic digital twin data quality: A no-reference approach using a pretrained vision model and a customizable data generator

Authors

Whitehead, Sebastian Mikael Løwe ; Hansen, Rebecca Ryø

Term

4. term

Education

Medialogy, Master

Publication year

2025

Submitted on

2025-05-26

Pages

Abstract

The use of synthetic data is becoming more common as companies realise the workflows it facilitates. This, however, poses a difficult and still largely unanswered question: how can someone evaluate the quality of synthetic data for a given use case without resorting to trial-and-error model training? In particular, how can developers identify where their development efforts are best spent to achieve the highest performance for those efforts? This paper seeks to answer these questions by exploring existing quality evaluation methodologies. Based on those, it is posited that the performance of a model trained on natural data reflects the quality of a digital twin synthetic data implementation. To evaluate this, a digital twin environment of an open-source vision dataset was created, equipped with a number of degradable parameters, namely lighting quality, texture resolution and polygon count. 37 test datasets were generated, totalling ~7400 images, and tested with a model trained on the natural data. By comparing the performance on the degraded datasets with that on the highest-quality twin dataset, this paper shows a statistically significant decline in performance, indicating that the performance of a model trained on natural data does reflect the quality of the synthetic data. This is supported by further testing: models fine-tuned on the datasets showed the greatest difference, with performance declines that were orders of magnitude greater. While there is a significant performance impact, this work fails to show a significant breakpoint in performance. It is hypothesised that this lack of a significant breakpoint was caused by visual class diversity and differences. This is reflected in the fact that, when evaluating the performance of individual classes, breakpoints were present across all three degradation types.
Due to these findings, this paper concludes that the proposed no-reference methodology shows merit as a means of digital twin quality assessment, potentially enabling developers to direct their efforts so as to achieve the best performance for the invested resources.
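
The comparison described in the abstract, scoring a degraded twin dataset against the highest-quality twin dataset with the same natural-data model, can be illustrated with a small sketch. The abstract does not state which statistical test the thesis used, so the one-sided permutation test below is an assumption chosen for illustration, and the function names `accuracy` and `permutation_p_value` are hypothetical. Each input is a list of per-image flags (1 for a correct prediction, 0 for an incorrect one).

```python
import random


def accuracy(outcomes):
    """Fraction of correct predictions; `outcomes` is a list of 0/1 flags."""
    return sum(outcomes) / len(outcomes)


def permutation_p_value(baseline, degraded, n_resamples=10_000, seed=0):
    """One-sided permutation test (an illustrative choice, not the thesis's
    stated method): is accuracy on `baseline` higher than on `degraded`?"""
    rng = random.Random(seed)
    observed = accuracy(baseline) - accuracy(degraded)
    pooled = list(baseline) + list(degraded)
    n_base = len(baseline)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the pooled flags and split them into two pseudo-datasets.
        rng.shuffle(pooled)
        diff = accuracy(pooled[:n_base]) - accuracy(pooled[n_base:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Made-up placeholder flags, NOT results from the thesis.
    baseline_flags = [1] * 180 + [0] * 20   # hypothetical highest-quality twin
    degraded_flags = [1] * 150 + [0] * 50   # hypothetical degraded twin
    print(permutation_p_value(baseline_flags, degraded_flags))
```

A small p-value would suggest that the drop in accuracy on the degraded dataset is unlikely to be due to chance alone. Repeating the comparison for each degradation level of a parameter, and per class, is one way the per-class breakpoints mentioned above could be looked for.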

A master's thesis from Aalborg University
