Transformer-based image captioning by leveraging sentence information
Vahid Chahkandi, Mohammad Javad Fadaeieslam, Farzin Yaghmaee
Abstract

Although autoregressive image captioning methods produce good-quality image descriptions, their sequential structure slows down sentence generation. To overcome this shortcoming, several nonautoregressive models have been proposed, but the sentences they produce are of lower quality than those obtained with autoregressive methods. We design a new nonautoregressive structure that not only finds better relations between sentence words and salient image objects but also combines this information with positional information extracted from the sentence to generate a higher-quality target sentence. Experimental results on the standard benchmark show that our proposed model outperforms general nonautoregressive captioning models.
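The abstract does not give implementation details, but the core contrast it draws (parallel, nonautoregressive decoding over image objects and sentence positional information versus slower sequential generation) can be illustrated with a minimal PyTorch sketch. All names, dimensions, and the use of learned positional queries to stand in for the "sentence positional information" are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch of non-autoregressive caption decoding (illustrative, not the paper's model).
import torch
import torch.nn as nn


class NonAutoregressiveCaptionDecoder(nn.Module):
    """Predicts all caption tokens in parallel by cross-attending
    from length-L positional queries to image region features."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3, max_len=20):
        super().__init__()
        self.max_len = max_len
        # Learned positional queries: a stand-in for positional information of the sentence.
        self.pos_queries = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, region_feats):
        # region_feats: (batch, num_regions, d_model) features of salient image objects.
        B = region_feats.size(0)
        positions = torch.arange(self.max_len, device=region_feats.device)
        queries = self.pos_queries(positions).unsqueeze(0).expand(B, -1, -1)
        # No causal mask: every position attends to the image regions (and to the
        # other positions) simultaneously, so the whole caption is emitted in one pass
        # instead of one word at a time as in autoregressive decoding.
        hidden = self.decoder(tgt=queries, memory=region_feats)
        return self.out(hidden)  # (batch, max_len, vocab_size)


if __name__ == "__main__":
    decoder = NonAutoregressiveCaptionDecoder(vocab_size=10000)
    fake_regions = torch.randn(2, 36, 512)  # e.g. 36 detected object regions per image
    logits = decoder(fake_regions)
    tokens = logits.argmax(-1)  # all words decoded in parallel
    print(tokens.shape)  # torch.Size([2, 20])
```

The single forward pass over all positions is what gives nonautoregressive models their speed advantage; the quality gap mentioned in the abstract arises because positions are predicted without conditioning on previously generated words.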

© 2022 SPIE and IS&T 1017-9909/2022/$28.00
Vahid Chahkandi, Mohammad Javad Fadaeieslam, and Farzin Yaghmaee "Transformer-based image captioning by leveraging sentence information," Journal of Electronic Imaging 31(4), 043005 (6 July 2022). https://doi.org/10.1117/1.JEI.31.4.043005
Received: 4 February 2022; Accepted: 20 June 2022; Published: 6 July 2022
KEYWORDS: Computer programming, Autoregressive models, Liquid crystals, Visualization, Image processing, Transformers, Performance modeling