Pointer-CNN for Visual Question Answering

Authors

Svidt, Jakob ; Jepsen, Jens Søholm

Term

4. term

Education

Software, Master

Publication year

2018

Submitted on

2018-06-15

Abstract

Visual Question Answering (VQA) is an interesting problem from a research perspective, as it lies at the intersection of the Computer Vision and Natural Language Processing (NLP) domains. Many recent methods focus on improving features, attention mechanisms and hyper-parameter tuning. Most approaches model the problem with a fixed-size classifier over the answers. We propose a Pointer-CNN classifier for multiple-choice VQA, which achieves state-of-the-art performance on the VQA v1.0 data set and reasonable performance on the Visual7W data set. We provide an analysis and discussion of the model's performance on the different question categories of VQA v1.0 to identify the shortcomings of our architecture.

Keywords

Visual Question Answering

A master's thesis from Aalborg University
