31 December 2018 Trajectory-aware three-stream CNN for video action recognition
Zhengkui Weng, Yepeng Guan
Abstract
Video-based human action recognition is a challenging task in computer vision. In recent years, the convolutional neural network (CNN) and its extensions have shown promising results for video action recognition. However, most existing methods cannot deal effectively with global motion information, especially the long-term motion that is crucial for representing complex nonperiodic actions. To address this issue, a stacked trajectory energy image (STEI) is proposed, built by extracting trajectories from motion saliency regions and stacking them onto one grayscale image. The resulting STEI has discriminative texture features that effectively characterize the global motion across multiple consecutive frames. Then, a three-stream CNN framework is proposed to simultaneously capture spatial, temporal, and global motion information of the action from RGB frames, optical flow, and the STEI. Moreover, a trajectory-aware convolution strategy is introduced that incorporates local and long-term motion information so as to learn motion features directly and effectively from three complementary action-related regions. Finally, the learned features are aggregated and classified by a linear support vector machine. The experimental results on two challenging datasets (i.e., HMDB51 and UCF101) demonstrate that our approach statistically outperforms a number of state-of-the-art methods.
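The stacking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how trajectories are extracted or weighted, so the function name `stack_trajectories`, the hit-count accumulation, and the normalization to the 0 to 255 range are all assumptions made here for illustration. Trajectories are assumed to be given already as sequences of (x, y) pixel coordinates.

```python
import numpy as np

def stack_trajectories(trajectories, height, width):
    """Accumulate trajectory points into one grayscale image.

    Hypothetical sketch: each trajectory is a sequence of (x, y) points,
    and pixel intensity reflects how often trajectories pass through it.
    """
    energy = np.zeros((height, width), dtype=np.float64)
    for traj in trajectories:
        # Round coordinates to the nearest pixel; reshape handles empty input.
        pts = np.rint(np.asarray(traj, dtype=np.float64)).astype(int).reshape(-1, 2)
        xs, ys = pts[:, 0], pts[:, 1]
        # Discard points that fall outside the image bounds.
        inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        # np.add.at accumulates correctly even when a pixel repeats.
        np.add.at(energy, (ys[inside], xs[inside]), 1.0)
    peak = energy.max()
    if peak > 0:
        # Assumed normalization: scale the busiest pixel to 255.
        energy = energy / peak * 255.0
    return energy.astype(np.uint8)
```

In this sketch, pixels crossed by many trajectories come out brighter, which is one plausible way to obtain a texture that summarizes motion over several consecutive frames in a single image.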
© 2018 SPIE and IS&T 1017-9909/2018/$25.00
Zhengkui Weng and Yepeng Guan "Trajectory-aware three-stream CNN for video action recognition," Journal of Electronic Imaging 28(2), 021004 (31 December 2018). https://doi.org/10.1117/1.JEI.28.2.021004
Received: 28 May 2018; Accepted: 15 November 2018; Published: 31 December 2018
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Video
Convolution
Optical flow
Feature extraction
RGB color model
Cameras
Motion models