6 May 2019 When visual object-context features meet generic and specific semantic priors in image captioning
Heng Liu, Chunna Tian, Mengmeng Jiang
Proceedings Volume 11069, Tenth International Conference on Graphics and Image Processing (ICGIP 2018); 110691E (2019) https://doi.org/10.1117/12.2524235
Event: Tenth International Conference on Graphics and Image Processing (ICGIP 2018), 2018, Chengdu, China
Abstract
In this work, we propose a novel encoder-decoder image captioning framework that improves performance by jointly exploiting visual object-context features and both generic and specific semantic priors. In the encoding stage, we first extract semantic attributes, object-related image features, and scene-related image features, and then feed them sequentially to the RNN encoder, so that the encoder captures rich generic semantic and visual object-context representations of the image. To incorporate test-specific semantic priors in the decoding stage, we apply cross-modal retrieval in the VSE++ visual-semantic embedding space to find the captions most similar to the test image. BLEU-4 is then used to measure the similarity between the generated sentence and the retrieved captions, which serve as test-specific references and thereby introduce sentence-construction priors into decoding. Evaluation on the Microsoft COCO benchmark shows that our algorithm outperforms state-of-the-art approaches on standard evaluation metrics.
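The decoding-side idea in the abstract (retrieve captions close to the test image, then score generated sentences against them with BLEU-4) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D embeddings stand in for VSE++ vectors, the helper names `retrieve`, `bleu4`, and `rerank` are hypothetical, the add-one smoothing of higher-order precisions is an assumption, and choosing the best-scoring candidate is only one plausible way the BLEU-4 similarity might be used during decoding.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(image_vec, caption_bank, k=2):
    """Return the k captions whose embeddings are closest to the image.

    caption_bank is a list of (caption, embedding) pairs; in the paper the
    embeddings would come from VSE++, here they are toy values.
    """
    ranked = sorted(caption_bank, key=lambda item: cosine(image_vec, item[1]),
                    reverse=True)
    return [caption for caption, _ in ranked[:k]]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 against several references.

    Assumption: add-one smoothing for n >= 2 so short sentences do not
    collapse to zero; unigram precision is left unsmoothed.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if n == 1:
            if clipped == 0:
                return 0.0
            precision = clipped / total
        else:
            precision = (clipped + 1) / (total + 1)
        log_precision += 0.25 * math.log(precision)
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(log_precision)


def rerank(candidates, retrieved_captions):
    """Pick the candidate caption most similar to the retrieved captions."""
    return max(candidates, key=lambda c: bleu4(c, retrieved_captions))


# Toy data: (caption, hypothetical 2-D embedding).
bank = [
    ("a dog runs on the grass", [1.0, 0.0]),
    ("a dog plays in the park", [0.9, 0.1]),
    ("a plate of food on a table", [0.0, 1.0]),
]
retrieved = retrieve([1.0, 0.05], bank, k=2)
best = rerank(["a cat sits on a sofa", "a dog runs on the grass"], retrieved)
```

In this toy run the two dog captions are retrieved for the image vector, and the candidate that matches them is preferred over the unrelated one. A real system would score beam-search hypotheses rather than a fixed candidate list.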
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Heng Liu, Chunna Tian, and Mengmeng Jiang "When visual object-context features meet generic and specific semantic priors in image captioning", Proc. SPIE 11069, Tenth International Conference on Graphics and Image Processing (ICGIP 2018), 110691E (6 May 2019); https://doi.org/10.1117/12.2524235
KEYWORDS
Visualization, Computer programming, Feature extraction, Image retrieval, Information visualization, Performance modeling, Sensors