Pointer-CNN for Visual Question Answering
Student thesis: Master's thesis and HD graduation project
- Jakob Svidt
- Jens Søholm Jepsen
4th semester, Software, Master's (Master's programme)
Visual Question Answering (VQA) is an interesting problem from a research perspective, as it lies at the intersection of the Computer Vision and Natural Language Processing (NLP) domains.
Many recent methods focus on improving features, attention mechanisms and hyper-parameter tuning. Most approaches model the problem with a fixed-size classifier over the answers.
We propose a Pointer-CNN classifier for multiple-choice VQA, which achieves state-of-the-art performance on the VQA v1.0 data set and reasonable performance on the Visual7W data set. We also analyse and discuss the model's performance on the different question categories of VQA v1.0 to identify the shortcomings of our architecture.
Language | English |
---|---|
Publication date | 15 Jun. 2018 |
Number of pages | 28 |
Keywords | Visual Question Answering |