Multi-dimension feature fusion for action recognition
10 April 2018
Proceedings Volume 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017); 106151C (2018) https://doi.org/10.1117/12.2302485
Event: Ninth International Conference on Graphic and Image Processing, 2017, Qingdao, China
Abstract
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. The challenge for action recognition is to capture and fuse the multi-dimensional information in video data. To take these characteristics into account simultaneously, we present a novel method that fuses features from multiple dimensions, such as chromatic images, depth, and optical flow fields. We build our model on multi-stream deep convolutional networks with the help of temporal segment networks, and extract discriminative spatial and temporal features by fusing the ConvNet towers across dimensions, assigning different weights to the features in order to take full advantage of this multi-dimensional information. Our architecture is trained and evaluated on NTU RGB-D, currently the largest and most challenging benchmark dataset. The experiments demonstrate that our method outperforms state-of-the-art methods.
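The weighted fusion of stream outputs described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes late fusion of per-stream class scores, an averaging consensus over temporal segments in the style of temporal segment networks, and hand-picked stream weights. The stream names (`rgb`, `depth`, `flow`), the weight values, the helper names, and the random scores standing in for ConvNet outputs are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def segment_consensus(segment_scores):
    """Average class scores over temporal segments.

    segment_scores has shape (num_segments, num_classes).
    Averaging is an assumed consensus function, not one confirmed by the abstract.
    """
    return np.mean(segment_scores, axis=0)


def fuse_streams(stream_scores, stream_weights):
    """Weighted late fusion of per-stream class probabilities.

    stream_scores maps a stream name to a (num_segments, num_classes) array.
    stream_weights maps the same names to non-negative weights (hypothetical values).
    The weights are normalised so the fused output is still a probability vector.
    """
    total_weight = sum(stream_weights[name] for name in stream_scores)
    fused = None
    for name, scores in stream_scores.items():
        probs = softmax(segment_consensus(scores))
        contribution = (stream_weights[name] / total_weight) * probs
        fused = contribution if fused is None else fused + contribution
    return fused


# Random scores stand in for the outputs of the three ConvNet towers.
rng = np.random.default_rng(0)
num_segments, num_classes = 3, 60  # assumed segment count and class count
stream_scores = {
    name: rng.normal(size=(num_segments, num_classes))
    for name in ("rgb", "depth", "flow")
}
stream_weights = {"rgb": 1.0, "depth": 1.0, "flow": 1.5}  # illustrative only

fused = fuse_streams(stream_scores, stream_weights)
predicted_class = int(np.argmax(fused))
```

Normalising the weights keeps the fused vector on the probability simplex, so streams can be reweighted without rescaling the final scores. In practice the weights would be tuned on validation data rather than fixed by hand.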
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Pei Dong, Jie Li, Junyu Dong, Lin Qi, "Multi-dimension feature fusion for action recognition", Proc. SPIE 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017), 106151C (10 April 2018); doi: 10.1117/12.2302485; https://doi.org/10.1117/12.2302485
PROCEEDINGS
8 PAGES

