18 July 2016 Multiview fusion for activity recognition using deep neural networks
J. of Electronic Imaging, 25(4), 043010 (2016). doi:10.1117/1.JEI.25.4.043010
Convolutional neural networks (ConvNets) coupled with long short-term memory (LSTM) networks have recently been shown to be effective for video classification, as they combine the automatic feature extraction capabilities of a neural network with additional memory in the temporal domain. This paper shows how multiview fusion can be applied to such a ConvNet LSTM architecture. Two different fusion techniques are presented. The system is first evaluated in the context of driver activity recognition, using data collected in a multicamera driving simulator. The results show a significant improvement in accuracy with multiview fusion and also show that deep learning outperforms a traditional approach based on spatiotemporal features, even without any background subtraction. The system is also validated on a publicly available multiview action recognition dataset that has 12 action classes and 8 camera views.
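The abstract does not specify the two fusion techniques. As a generic illustration only, and not the authors' method, the following minimal sketch shows one common form of multiview fusion: averaging per-class scores produced independently for each camera view, then picking the top class. The function names `fuse_scores` and `predict` and the example scores are hypothetical.

```python
from typing import List, Sequence


def fuse_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores across camera views (score-level fusion).

    Assumption: each inner sequence holds one view's class probabilities,
    e.g. the softmax output of a per-view classifier.
    """
    if not view_scores:
        raise ValueError("need at least one view")
    n_classes = len(view_scores[0])
    if any(len(s) != n_classes for s in view_scores):
        raise ValueError("all views must score the same set of classes")
    n_views = len(view_scores)
    return [sum(s[c] for s in view_scores) / n_views for c in range(n_classes)]


def predict(view_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(view_scores)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores from three camera views over three activity classes.
scores = [
    [0.6, 0.3, 0.1],  # view 1 alone favors class 0
    [0.2, 0.5, 0.3],  # view 2 favors class 1
    [0.1, 0.6, 0.3],  # view 3 favors class 1
]
label = predict(scores)  # fused decision across all views
```

In this toy input, the first view on its own would pick a different class than the fused decision, which is the kind of disagreement between views that fusion is meant to resolve.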
© 2016 SPIE and IS&T
Rahul Kavi, Vinod Kulathumani, Fnu Rohit, Vlad Kecojevic, "Multiview fusion for activity recognition using deep neural networks," Journal of Electronic Imaging 25(4), 043010 (18 July 2016). https://doi.org/10.1117/1.JEI.25.4.043010

