We propose a model to effectively detect salient objects in various videos; the proposed framework [spatiotemporal saliency and coherency (STSC)] consists of two modules that capture spatiotemporal saliency and temporal coherency information in the superpixel domain, respectively. We first extract straightforward gradient contrasts (such as the color gradient and motion gradient) as low-level features from which high-level spatiotemporal gradient features are computed, and the spatiotemporal saliency is obtained by computing the average weighted geodesic distance among these features. The temporal coherency, measured by motion entropy, is then used to eliminate false foreground superpixels that result from inaccurate optical flow and confusable appearance. Finally, the two discriminative video saliency indicators are combined to identify the salient regions. Extensive quantitative and qualitative experiments on four public datasets (FBMS, DAVIS, SegTrack v2, and ViSal) demonstrate the superiority of the proposed method over current state-of-the-art methods.
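A minimal sketch of the two cues described above, assuming a precomputed superpixel graph. Geodesic saliency is approximated here as the shortest-path cost (with gradient-feature edge weights) from each superpixel to frame-boundary superpixels, a common geodesic-saliency formulation standing in for the paper's average weighted geodesic distance; temporal coherency is the entropy of the flow-direction histogram. The function names and the boundary-seed formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def spatiotemporal_saliency(features, edges, boundary_ids):
    """Geodesic saliency over a superpixel graph.

    features     : (n, d) spatiotemporal gradient feature per superpixel
    edges        : iterable of (i, j) index pairs for adjacent superpixels
    boundary_ids : indices of frame-boundary superpixels (background seeds)
    """
    n = features.shape[0]
    rows, cols, w = [], [], []
    for i, j in edges:
        d = np.linalg.norm(features[i] - features[j])  # gradient-feature contrast
        rows += [i, j]; cols += [j, i]; w += [d, d]
    graph = csr_matrix((w, (rows, cols)), shape=(n, n))
    dist = dijkstra(graph, directed=False, indices=boundary_ids)
    return dist.min(axis=0)  # high cost to reach the boundary => salient

def motion_entropy(flow_angles, bins=8):
    """Temporal coherency cue: entropy of the optical-flow direction
    histogram within one superpixel (high entropy => incoherent motion,
    likely a false foreground superpixel)."""
    hist, _ = np.histogram(flow_angles, bins=bins, range=(-np.pi, np.pi))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```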
In cross-view action recognition, a persistent challenge is that action representations lose their transferability when the feature space changes between views. To address this problem, we propose a cross-view action recognition approach based on a bilayer discriminative model. We first extract key poses to capture the essence of each action sequence and represent each key pose by a bag of visual words (BoVW) in a single view. We then construct a bipartite graph between the heterogeneous poses and apply multipartitioning to co-cluster the view-dependent visual words, yielding a cross-view bag-of-visual-words feature that is more discriminative in the presence of view changes. The key novelty is a bilayer classifier consisting of an SVM at the frame level and an HMM at the sequence level, which compensates for the temporal information lost when a BoVW represents the whole action sequence. Finally, dynamic time warping (DTW) is used as a pruning algorithm to reduce the number of nodes searched along the Viterbi path. Extensive experiments on two well-known multiview action datasets, IXMAS and N-UCLA, and a detailed comparison with existing view-invariant action recognition techniques indicate that the proposed method performs well in both accuracy and efficiency.
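A toy sketch of the bilayer idea under stated assumptions: an SVM produces per-frame class posteriors from (here synthetic, randomly generated) BoVW histograms, and a sticky HMM-style Viterbi decoder smooths them over the sequence. The transition matrix, the 0.9 self-transition probability, and the majority-vote sequence decision are illustrative choices; the paper's DTW pruning of the Viterbi search is omitted.

```python
import numpy as np
from sklearn.svm import SVC

def viterbi(log_em, log_trans, log_prior):
    """Max-product decoding of per-frame action labels."""
    T, K = log_em.shape
    delta = log_prior + log_em[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # scores[i, j]: state i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_em[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack best predecessors
        path.append(back[t, path[-1]])
    return path[::-1]

rng = np.random.default_rng(0)
K, D = 3, 20                                     # action classes, codebook size
X = rng.random((300, D)); y = rng.integers(0, K, 300)  # stand-in BoVW frames

# Frame layer: SVM posteriors over action classes for each frame.
svm = SVC(probability=True).fit(X, y)
seq = rng.random((40, D))                        # one test sequence of frames
log_em = np.log(svm.predict_proba(seq) + 1e-12)

# Sequence layer: sticky transitions smooth the frame-level decisions.
self_p = 0.9                                     # illustrative choice
off = (1.0 - self_p) / (K - 1)
log_trans = np.log(np.full((K, K), off) + np.eye(K) * (self_p - off))
log_prior = np.full(K, -np.log(K))
path = viterbi(log_em, log_trans, log_prior)
print("sequence label:", np.bincount(path).argmax())  # majority vote
```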
As most countries face a growing population of seniors, automatic detection of abnormal behavior has become a promising goal for vision systems operating in supportive home environments. In this paper, we investigate a novel approach to fall detection, one of the events most frequently observed in the motion of elderly people, using a panoramic camera mounted on the ceiling. We employ and modify a combination of two features that represent fall events, optical flow and human shape variation, which allows fall detection to proceed from coarse to fine. In the preprocessing step, we analyze the raw video data to extract the meaningful motion region. We then design an energy function representing the phase and magnitude of the optical flow vectors for coarse detection in the temporal domain, where information entropy is adopted as an abnormality coefficient to estimate the consistency of motion directions. Once the optical flow changes abnormally, a shape context descriptor is introduced to perform template matching for fine detection; here we propose a novel shape matching descriptor that improves rotation invariance over the traditional shape context while retaining its tolerance to most shape distortions. Our method is evaluated on a panorama-view fall detection database that includes both fall events and confounding events. The results demonstrate more effective performance and lower computational cost on fall detection under challenging conditions and encourage the potential use of a vision-based system to provide safety and security in the homes of the elderly.
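A coarse-stage sketch, assuming OpenCV's Farneback optical flow as a stand-in for whichever flow estimator the paper uses. The mean flow magnitude as the "energy" term, the magnitude-weighted direction histogram, and the product combining energy with entropy are all illustrative assumptions; the fine stage (shape-context template matching) is only stubbed as a trigger.

```python
import numpy as np
import cv2

def coarse_fall_score(prev_gray, curr_gray, bins=12):
    """Coarse stage: an energy/entropy score from dense optical flow
    between two consecutive grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # magnitude, phase
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    p = hist / max(hist.sum(), 1e-9)
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))   # abnormality coefficient: inconsistent
                                        # motion directions => high entropy
    return mag.mean() * entropy         # assumed combination of energy/entropy

def is_fall_candidate(score, threshold=2.0):
    """Trigger the fine stage (shape-context template matching, not
    sketched here) only when the coarse score is abnormal."""
    return score > threshold            # threshold is purely illustrative
```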