Spatial-temporal attention in Bi-LSTM networks based on multiple features for video captioning
29 October 2018
Chu-yi Li, Wei-yu Yu
Proceedings Volume 10836, 2018 International Conference on Image and Video Processing, and Artificial Intelligence; 1083616 (2018) https://doi.org/10.1117/12.2514651
Event: 2018 International Conference on Image, Video Processing and Artificial Intelligence, 2018, Shanghai, China
Abstract
Automatically generating rich natural language descriptions for open-domain videos is among the most challenging tasks in computer vision, natural language processing and machine learning. Building on the general encoder-decoder framework, we propose a bidirectional long short-term memory (Bi-LSTM) network with spatial-temporal attention based on multiple features of objects, activities and scenes. The network learns valuable and complementary high-level visual representations and dynamically focuses on the most important context information of diverse frames within different subsets of videos. Experimental results show that the proposed methods achieve performance competitive with or better than the state of the art on the MSVD video dataset.
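As an illustration of the idea of dynamically focusing on the most relevant frames, the following is a minimal sketch of soft temporal attention over per-frame feature vectors. It is not the authors' formulation: the abstract does not give the scoring function, so plain dot-product scoring is assumed here, and the names `temporal_attention`, `frame_features` and `query` are hypothetical. In an encoder-decoder captioner, the per-frame vectors would typically come from the encoder and the query from the decoder state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frame_features, query):
    """Soft attention over T frames (assumed dot-product scoring).

    frame_features: list of T vectors, each a list of D floats
                    (stand-in for per-frame encoder outputs).
    query:          list of D floats
                    (stand-in for the current decoder state).
    Returns (weights, context): T attention weights that sum to 1,
    and the D-dimensional weighted average of the frame vectors.
    """
    # Relevance score of each frame with respect to the query.
    scores = [sum(f * q for f, q in zip(frame, query))
              for frame in frame_features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frame vectors.
    dim = len(query)
    context = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context


# Toy input: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = temporal_attention(frames, query)
```

In this toy input the first and third frames align equally well with the query, so they receive equal weight, and the second frame receives less. A learned scoring function and separate attention over spatial regions or feature types would replace the dot product in a full model.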
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chu-yi Li and Wei-yu Yu "Spatial-temporal attention in Bi-LSTM networks based on multiple features for video captioning", Proc. SPIE 10836, 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083616 (29 October 2018); https://doi.org/10.1117/12.2514651
CITATIONS
Cited by 1 scholarly publication and 1 patent.
KEYWORDS
Video
Data modeling
RGB color model
Video processing
Performance modeling
Neural networks
Computer programming
