Spatiotemporal visual-semantic embedding network for zero-shot action recognition
Rongqiao An, Zhenjiang Miao, Qingyu Li, Wanru Xu, Qiang Zhang
Abstract
Zero-shot learning (ZSL) has recently attracted increasing attention in visual tasks such as action recognition. We propose a spatiotemporal visual-semantic embedding network (STVSEM) for zero-shot action recognition. First, motivated by the strong results of two-stream action recognition architectures in recent years, we incorporate a two-stream module into our network, using both spatial features (e.g., RGB appearance) and temporal optical-flow features as visual representations to significantly improve visual expressiveness. Second, to alleviate the semantic loss that typically occurs in embedding-based ZSL methods, an autoencoder is introduced to learn a better semantic representation and to transfer semantic relationship information from seen classes to unseen classes. Finally, a joint embedding mechanism that explores and exploits the relationships between visual data and semantic information in an intermediate space is employed to bridge the gap between vision and semantics. Experimental results on the Charades and UCF101 datasets show that the proposed method outperforms state-of-the-art methods in accuracy, further demonstrating its effectiveness.
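The pipeline described above can be sketched in a minimal, illustrative form. This is not the authors' implementation; all dimensions, the tied-weight autoencoder, the linear projections, and the cosine-similarity classifier are assumptions chosen to make the three stages (two-stream visual features, semantic autoencoder, joint embedding) concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 2048-d RGB and 2048-d flow features,
# 300-d class word vectors, 128-d joint embedding space.
d_rgb, d_flow, d_sem, d_joint = 2048, 2048, 300, 128

# Stage 1 -- two-stream visual feature: concatenate the spatial (RGB
# appearance) stream with the temporal (optical flow) stream.
v = np.concatenate([rng.standard_normal(d_rgb), rng.standard_normal(d_flow)])

# Stage 2 -- semantic autoencoder: encode class word vectors and decode
# them back, so the code retains semantic relations that transfer from
# seen to unseen classes (tied weights, as in SAE-style autoencoders).
W_enc = rng.standard_normal((d_sem, d_joint)) * 0.01
W_dec = W_enc.T
class_sem = rng.standard_normal((5, d_sem))   # 5 hypothetical class vectors
class_codes = class_sem @ W_enc               # encoded semantics
recon = class_codes @ W_dec                   # reconstruction term in the loss

# Stage 3 -- joint embedding: project the visual feature into the same
# intermediate space and classify by nearest class code (cosine similarity).
W_vis = rng.standard_normal((d_rgb + d_flow, d_joint)) * 0.01
z = v @ W_vis
sims = (class_codes @ z) / (
    np.linalg.norm(class_codes, axis=1) * np.linalg.norm(z) + 1e-8
)
pred = int(np.argmax(sims))
```

In the actual method, the projections would be learned networks trained jointly with a reconstruction loss on the semantics; the sketch only shows how an unseen-class label can be predicted once visual and semantic representations share one space.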
© 2019 SPIE and IS&T 1017-9909/2019/$25.00
Rongqiao An, Zhenjiang Miao, Qingyu Li, Wanru Xu, and Qiang Zhang "Spatiotemporal visual-semantic embedding network for zero-shot action recognition," Journal of Electronic Imaging 28(2), 023007 (8 March 2019). https://doi.org/10.1117/1.JEI.28.2.023007
Received: 10 November 2018; Accepted: 15 February 2019; Published: 8 March 2019
Cited by 3 scholarly publications.
KEYWORDS: Visualization, RGB color model, Visual process modeling, Data modeling, Digital signal processing, Classification systems, Video

