Synthesizing Images to Recognize Natural Images with Transfer Learning in Convolutional Neural Networks
Author
Ylla, Ernest Bofill
Term
4th term
Education
Publication year
2017
Submitted on
2017-01-26
Pages
51
Abstract
Convolutional neural networks (CNNs) now lead image recognition thanks to advances in model design, including the Inception family. Training these models typically requires very large collections of labeled images, which are often hard to obtain. One proposed workaround is to render synthetic images from 3D models, producing large datasets quickly and automatically. This thesis examines whether adding a large set of rendered images to a small set of natural photos can improve performance when an Inception-V3 CNN is retrained with transfer learning. To test this, two datasets of LEGO bricks were created: a large synthetic dataset rendered from a 3D LEGO model and a small dataset of real photographs. The model’s classification accuracy was compared with and without the synthetic images. In this setup, supplementing the photos with synthetic images reduced accuracy: 82% without synthetic augmentation versus 68% with it. This result should not be over-generalized, as clear differences between the synthetic and natural images may have made recognition harder. Even so, synthetic datasets remain promising when real images are difficult to collect. Future work should investigate how improvements in the rendering process affect image recognition.
[This abstract was generated with the help of AI]
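The abstract does not say how the Inception-V3 retraining was implemented. One common form of transfer learning keeps the pretrained convolutional layers frozen and trains only a new softmax layer on their output ("bottleneck") features. The sketch below assumes that setup and illustrates the with/without-synthetic comparison in dependency-free Python. The feature vectors are hypothetical toy values standing in for Inception-V3 outputs, and the names `train_head`, `accuracy` and `compare` are illustrative, not taken from the thesis.

```python
import math
import random


def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    # One linear score per class: W[k] . x + b[k]
    return [sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]


def train_head(features, labels, num_classes, epochs=200, lr=0.1, seed=0):
    """Fit a new softmax layer by SGD on frozen (precomputed) features.

    Hypothetical helper: the pretrained network is never updated here;
    only W and b, the replacement final layer, are learned.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            p = softmax(logits(W, b, x))
            for k in range(num_classes):
                # Gradient of cross-entropy with respect to the class-k logit.
                g = p[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b


def accuracy(head, features, labels):
    # Fraction of samples whose highest-scoring class matches the label.
    W, b = head
    hits = 0
    for x, y in zip(features, labels):
        scores = logits(W, b, x)
        pred = max(range(len(scores)), key=scores.__getitem__)
        hits += pred == y
    return hits / len(labels)


def compare(natural_train, synthetic_train, natural_test, num_classes):
    """Return (accuracy without synthetic, accuracy with synthetic).

    Both heads are scored on the same held-out natural images.
    """
    Xn, yn = natural_train
    Xs, ys = synthetic_train
    Xt, yt = natural_test
    natural_only = train_head(Xn, yn, num_classes)
    mixed = train_head(Xn + Xs, yn + ys, num_classes)
    return accuracy(natural_only, Xt, yt), accuracy(mixed, Xt, yt)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for Inception-V3 bottleneck features;
    # these are NOT data from the thesis.
    natural_train = ([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]], [0, 0, 1, 1])
    synthetic_train = ([[1.5, 0.8], [0.8, 1.5]], [0, 1])
    natural_test = ([[1.1, 0.0], [0.0, 1.1]], [0, 1])
    print(compare(natural_train, synthetic_train, natural_test, num_classes=2))
```

Scoring both variants on the same natural test photos, as assumed here, is what lets the two accuracy figures be compared directly. Producing real bottleneck features would require an actual Inception-V3 forward pass in a deep-learning framework, which this sketch deliberately leaves out.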
