Presentation + Paper
7 June 2024
Toward quantifying the real-versus-synthetic imagery training data reality gap: analysis and practical applications
Colin N. Reinhardt, Sarah Brockman, Rusty Blue, Brian Clipp, Anthony Hoogs
Abstract
Synthetically generated imagery holds the promise of being a panacea for the challenges of real-world datasets. Yet it is still frequently observed that deep learning models trained on synthetic data perform worse than models trained on real measured imagery. In this study, we analyze and illustrate the use of several statistical metrics, measures, and visualization tools based on the distance and similarity between the empirical distributions of real and synthetic data in the latent feature embedding space. These tools provide a quantitative understanding of the image-domain distribution discrepancies that hamper the generation of performant simulated datasets. We also demonstrate the practical application of these tools and techniques in a novel study comparing the latent space embedding vector distributions of real imagery, pristine synthetic imagery, and synthetic imagery modified by physics-based degradation models. The results may assist deep learning practitioners and synthetic imagery modelers in evaluating latent space embedding distributional dissimilarity and in improving model performance when using simulation tools to generate synthetic imagery training data.
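The abstract does not name the specific distance and similarity metrics used, so the following is only an illustrative sketch of one widely used distance between two empirical distributions of embedding vectors: the maximum mean discrepancy (MMD) with a Gaussian (RBF) kernel. It is not presented as the authors' method. It assumes that embedding vectors have already been extracted from real and synthetic images by some feature extractor (not shown). The function names `rbf_kernel`, `mmd2_unbiased`, and `toy_embeddings`, the bandwidth parameter `gamma`, and the Gaussian toy data are all hypothetical stand-ins chosen for illustration.

```python
import math
import random
from typing import Sequence

# An embedding vector is any sequence of floats, e.g. a feature vector
# extracted from one image (the extraction step is assumed, not shown).
Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    """Unbiased estimate of the squared MMD between two samples.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]. The within-sample
    averages exclude the diagonal (i == j) terms, which makes the estimate
    unbiased. Values near zero suggest similar distributions; the estimate
    can be slightly negative by chance.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two vectors")
    k_xx = sum(
        rbf_kernel(xs[i], xs[j], gamma) for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_yy = sum(
        rbf_kernel(ys[i], ys[j], gamma) for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    k_xy = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def toy_embeddings(rng: random.Random, count: int, dim: int, mean: float) -> list:
    """Hypothetical stand-in for image embeddings: i.i.d. Gaussian vectors."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    gamma = 1.0 / dim  # illustrative bandwidth choice, not tuned
    real = toy_embeddings(rng, 150, dim, mean=0.0)
    synthetic_matched = toy_embeddings(rng, 150, dim, mean=0.0)
    synthetic_shifted = toy_embeddings(rng, 150, dim, mean=1.0)
    print("real vs matched synthetic:", mmd2_unbiased(real, synthetic_matched, gamma))
    print("real vs shifted synthetic:", mmd2_unbiased(real, synthetic_shifted, gamma))
```

With these toy samples, the shifted "synthetic" set should yield a clearly larger MMD estimate than the matched one, mirroring the idea of a measurable gap between real and synthetic embedding distributions. In practice the result depends strongly on the kernel and on `gamma`; bandwidth heuristics such as the median pairwise distance are common, and the quadratic pairwise cost means large embedding sets usually call for vectorized or subsampled implementations.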
Conference Presentation
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Colin N. Reinhardt, Sarah Brockman, Rusty Blue, Brian Clipp, and Anthony Hoogs "Toward quantifying the real-versus-synthetic imagery training data reality gap: analysis and practical applications", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350W (7 June 2024); https://doi.org/10.1117/12.3026764
KEYWORDS
Data modeling
Education and training
Field emission displays
Deep learning
RGB color model
Visualization
Performance modeling
