With the development of depth sensors such as the Kinect, it is now possible to predict human body poses from a depth map without any manual labeling. The predicted poses can serve as meaningful features for many applications, such as human action recognition. However, existing pose estimation algorithms are imperfect, and their errors can seriously degrade the performance of downstream applications. In this paper, we propose a novel method to detect erroneous poses. Human poses are captured by the Kinect SDK, which predicts body joints and connects them with straight lines to represent a pose. We observe that the depth gradient of the pixels located on a body part is consistent when the body part is predicted correctly. Based on this observation, our algorithm examines the depth gradients of the pixels on each body part. During gradient processing, the algorithm also accounts for occlusions: once a sudden change in depth values is detected on a body part, we check whether the gradient remains consistent after excluding the region of the sudden change. We tested our algorithm on many human activities, and the experimental results show that it detects erroneous poses in real time with acceptable accuracy.
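The consistency test described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the depth samples are assumed to be taken at evenly spaced pixels along a predicted limb segment, and the thresholds `grad_tol` and `jump_tol` are hypothetical values chosen for the example.

```python
import numpy as np

def is_part_consistent(depths, grad_tol=0.02, jump_tol=0.15):
    """Check depth-gradient consistency along one body part.

    depths: 1-D array of depth samples (meters) taken at evenly
    spaced pixels along a predicted limb segment. Thresholds are
    illustrative assumptions, not values from the paper.
    """
    grad = np.diff(depths)
    # Sudden changes (e.g. occlusions) appear as large jumps.
    jumps = np.abs(grad) > jump_tol
    if not jumps.any():
        # No occlusion: the part is plausible if the gradient is
        # nearly constant along the segment.
        return bool(np.std(grad) < grad_tol)
    # Exclude the sudden-change region and re-test the remainder.
    kept = grad[~jumps]
    return bool(kept.size > 0 and np.std(kept) < grad_tol)
```

A smoothly sloped limb passes the test; a limb partially hidden behind another object can still pass once the jump region is excluded, while a segment whose depth oscillates (e.g. a joint predicted onto the background) fails.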
This paper presents a novel approach to representing human actions in a video. Our approach addresses a limitation of local representations such as space-time interest points, which cannot adequately represent actions in a video because they lack global information about the geometric relationships among the interest points. It adds these geometric relationships by clustering the interest points using squared Euclidean distance and then representing each cluster with a minimum hexahedron. For each video, we build a multi-dimensional histogram based on the characteristics of the hexahedrons in the video and use it for recognition. The experimental results show that the proposed representation effectively incorporates global information on top of the local interest points and successfully improves the accuracy of action recognition.
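The clustering and hexahedron construction can be sketched as below. This is an assumption-laden illustration, not the paper's exact pipeline: a plain Lloyd's k-means with squared Euclidean distance stands in for the clustering step, the minimum hexahedron is taken as the axis-aligned bounding box of each cluster in (x, y, t), and a single volume histogram stands in for the multi-dimensional descriptor; `k`, `bins`, and `max_vol` are hypothetical parameters.

```python
import numpy as np

def cluster_hexahedra(points, k=3, iters=20, rng=None):
    """Cluster space-time interest points (x, y, t) with k-means
    (squared Euclidean distance) and return each cluster's
    axis-aligned minimum hexahedron as (min corner, max corner)."""
    rng = np.random.default_rng(rng)
    pts = np.asarray(points, dtype=float)
    centers = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = pts[labels == j].mean(0)
    boxes = [(pts[labels == j].min(0), pts[labels == j].max(0))
             for j in range(k) if (labels == j).any()]
    return labels, boxes

def hexahedron_histogram(boxes, bins=4, max_vol=1000.0):
    """Normalized histogram of hexahedron volumes as a toy
    per-video descriptor (the paper uses richer characteristics)."""
    vols = [np.prod(hi - lo) for lo, hi in boxes]
    hist, _ = np.histogram(vols, bins=bins, range=(0.0, max_vol))
    return hist / max(hist.sum(), 1)
```

Each video would then be matched by comparing its histogram against those of labeled training videos, e.g. with a nearest-neighbor classifier.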