In this paper, we propose a novel method to recognize two-person interactions through a two-phase sparse coding approach. In the first phase, we apply non-negative sparse coding to the spatio-temporal interest points (STIPs) extracted from videos, and then construct the feature vector for each video by sum-pooling and l<sub>2</sub>-normalization. In the second phase, we apply the label-consistent K-SVD (LC-KSVD) algorithm to the video feature vectors to train a new dictionary. The algorithm is validated on the TV Human Interaction dataset, and the experimental results show that classification performance improves considerably over both the standard bag-of-words approach and single-layer non-negative sparse coding.
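The first phase described above (non-negative sparse coding of local descriptors, followed by sum-pooling and l<sub>2</sub>-normalization) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract: the dictionary is already given, the codes are computed by a simple projected-gradient solver for an l<sub>1</sub>-penalized least-squares objective, and the names `nn_sparse_code`, `video_feature`, `atoms`, and `lam` are hypothetical. The LC-KSVD phase is not shown.

```python
import math

def nn_sparse_code(x, atoms, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||x - sum_j a_j*atoms[j]||^2 + lam*sum_j a_j
    subject to a_j >= 0, using projected gradient descent.
    The solver choice is an assumption, not the paper's stated method."""
    k = len(atoms)
    # The sum of squared dictionary entries upper-bounds the gradient's
    # Lipschitz constant, so its inverse is a safe step size.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    a = [0.0] * k
    for _ in range(n_iter):
        recon = [sum(a[j] * atoms[j][i] for j in range(k)) for i in range(len(x))]
        resid = [r - xi for r, xi in zip(recon, x)]
        for j in range(k):
            grad = sum(d * r for d, r in zip(atoms[j], resid)) + lam
            a[j] = max(0.0, a[j] - step * grad)  # project onto a_j >= 0
    return a

def video_feature(descriptors, atoms, lam=0.1):
    """Sum-pool the codes of all descriptors in a video, then l2-normalize."""
    codes = [nn_sparse_code(x, atoms, lam) for x in descriptors]
    pooled = [sum(col) for col in zip(*codes)]
    norm = math.sqrt(sum(p * p for p in pooled))
    return [p / norm for p in pooled] if norm > 0 else pooled

# Toy usage: three 2-D "descriptors" and a two-atom dictionary.
atoms = [[1.0, 0.0], [0.0, 1.0]]
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
feature = video_feature(descriptors, atoms)
```

In this toy setting the resulting `feature` has one entry per dictionary atom, is non-negative, and has unit l<sub>2</sub> norm; in practice the descriptors would be STIP descriptors and the dictionary would be learned from training data.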
In this paper, we propose a novel method to recognize different types of two-person interactions using multi-view surveillance
cameras. From the bird's-eye view, proxemics cues are exploited to segment the duration of the interaction, while from
the lateral view the corresponding interaction intervals are extracted. Classification is performed with a <i>visual
bag-of-words</i> approach, which is used to train a linear multi-class SVM classifier. We test our method on the UNITN social
interaction dataset. Experimental results show that using the temporal segmentation can improve the classification