10 April 2018
Visual question answering using hierarchical dynamic memory networks
Proceedings Volume 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017); 106153V (2018) https://doi.org/10.1117/12.2302484
Event: Ninth International Conference on Graphic and Image Processing, 2017, Qingdao, China
Abstract
Visual Question Answering (VQA) is one of the most popular research fields in machine learning; its goal is to teach a computer to answer natural language questions about images. In this paper, we propose a new method called hierarchical dynamic memory networks (HDMN), which takes both question attention and visual attention into consideration, inspired by the Co-Attention method, one of the strongest approaches to date. Additionally, we replace the original recurrent unit with bi-directional LSTMs, which retain more information from the question and image by capturing context from both past and future tokens. We then rebuild the hierarchical architecture to cover not only question attention but also visual attention. Furthermore, we accelerate training with Batch Normalization, which helps the network converge more quickly. Experimental results show that our model improves on the state of the art on the large COCO-QA dataset.
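The Batch Normalization step mentioned in the abstract normalizes each feature across a mini-batch before a learnable scale and shift. The paper does not give implementation details, so the following is a minimal, dependency-free sketch of the standard (inference-free, training-mode) batch-norm computation; the function name and the fixed `gamma`/`beta` parameters are illustrative assumptions, not the authors' code.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard training-mode batch normalization (a sketch, not the paper's code).

    batch: list of feature vectors (lists of floats), one per example.
    Each feature column is normalized to zero mean and unit variance
    over the batch, then scaled by gamma and shifted by beta.
    """
    n = len(batch)
    dim = len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        # Per-feature statistics over the mini-batch.
        mean = sum(row[j] for row in batch) / n
        var = sum((row[j] - mean) ** 2 for row in batch) / n
        for i in range(n):
            out[i][j] = gamma * (batch[i][j] - mean) / math.sqrt(var + eps) + beta
    return out
```

Because every feature is rescaled to a comparable range, gradients stay well-conditioned across layers, which is the mechanism behind the faster convergence the abstract reports.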
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiayu Shang, Shiren Li, Zhikui Duan, Junwei Huang, "Visual question answering using hierarchical dynamic memory networks", Proc. SPIE 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017), 106153V (10 April 2018); https://doi.org/10.1117/12.2302484
PROCEEDINGS
9 PAGES