Deep visual-semantic for crowded video understanding
Published: 8 March 2018
Proceedings Volume 10609, MIPPR 2017: Pattern Recognition and Computer Vision; 106091E (2018) https://doi.org/10.1117/12.2285848
Event: Tenth International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2017), 2017, Xiangyang, China
Abstract
Visual-semantic features play a vital role in crowded video understanding. Convolutional Neural Networks (CNNs) have achieved a significant breakthrough in learning representations from images. However, how visual-semantic features can be learned and effectively extracted for video analysis remains a challenging task. In this study, we propose a novel visual-semantic method to capture both appearance and dynamic representations. In particular, we propose a spatial context method based on fractional Fisher vector (FV) encoding of CNN features, which can be regarded as our main contribution. In addition, to capture temporal context information, we also apply the fractional encoding method to dynamic images. Experimental results on the WWW crowded video dataset demonstrate that the proposed method outperforms the state of the art.
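The paper itself does not ship code; the sketch below only illustrates the two ingredients the abstract names: a dynamic image that summarizes temporal context, and Fisher vector encoding of local CNN features. All function names are hypothetical, the linear rank-pooling weights are a common simple approximation (not necessarily the authors' construction), and the "fractional" aspect is assumed here to mean a tunable power-normalization exponent, exposed as the `power` parameter.

import numpy as np
from sklearn.mixture import GaussianMixture

def dynamic_image(frames):
    # Collapse a clip (T, H, W, C) into a single image summarizing motion.
    # Assumption: a simple linear approximation of rank pooling; a stand-in
    # for the dynamic-image construction referenced in the abstract.
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0  # linear rank-pooling weights
    return np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))

def fisher_vector(descriptors, gmm, power=0.5):
    # Encode local CNN descriptors (N, D) as a Fisher vector w.r.t. a
    # diagonal-covariance GMM; `power` is the (fractional) power-normalization
    # exponent -- an assumption about what "fractional FV" refers to.
    X = np.atleast_2d(descriptors)
    N, _ = X.shape
    gamma = gmm.predict_proba(X)  # (N, K) soft assignments
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_
    sigma = np.sqrt(var)  # (K, D) per-component std deviations

    parts = []
    for k in range(gmm.n_components):
        diff = (X - mu[k]) / sigma[k]  # whitened residuals
        g = gamma[:, k:k + 1]
        u = (g * diff).sum(axis=0) / (N * np.sqrt(w[k]))                # gradient w.r.t. means
        v = (g * (diff ** 2 - 1.0)).sum(axis=0) / (N * np.sqrt(2 * w[k]))  # gradient w.r.t. variances
    	parts.extend([u, v])
    fv = np.concatenate(parts)

    fv = np.sign(fv) * np.abs(fv) ** power  # fractional power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalization

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((16, 8, 8, 3))  # toy clip
    dyn = dynamic_image(frames)  # temporal-context image, shape (8, 8, 3)

    feats = rng.random((200, 64))  # stand-in local CNN descriptors
    gmm = GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(feats)
    print(fisher_vector(feats, gmm).shape)  # (2 * K * D,) = (512,)

In this reading, the same fractional FV encoder serves both branches: spatially over CNN feature maps of individual frames, and temporally over features extracted from dynamic images.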
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chunhua Deng and Junwen Zhang, "Deep visual-semantic for crowded video understanding", Proc. SPIE 10609, MIPPR 2017: Pattern Recognition and Computer Vision, 106091E (8 March 2018); https://doi.org/10.1117/12.2285848