Human action recognition is one of the main motivations for human–computer interaction research, as it is vital to meeting the demands of modern society, such as automatic video surveillance for security, patient monitoring during recovery, and content-based video retrieval. Deep learning systems are fast becoming the de facto standard for object recognition, video understanding, and pattern recognition because of their powerful ability to learn features from vast amounts of data. It makes sense to capitalize on their success and to improve them further for the complex task of action recognition. One contribution of this paper is a simple yet effective method for encoding the spatiotemporal information in skeleton sequences into what we call temporal kinematic images. In this input encoding scheme, we embed various geometric relational features derived from the skeleton sequence in the form of our proposed skeletal optical flows (SOFs). SOFs collectively represent the variations of kinetic energy, angles between limbs, and pair-wise displacements between joints over consecutive frames of skeleton data as color variations in the temporal kinematic images. Another contribution is our convolutional neural network with a correctness-vigilant regularizer, which exploits the discriminative features in the temporal kinematic images for human action recognition. Lastly, we investigate an adaptive label smoothing technique applied toward the end of the training iterations.
Empirical results show that the proposed method outperforms existing works in terms of the generalizability of the trained model, training convergence speed, and classification accuracy on nine popular benchmark datasets: MHAD, MSR Activity 3D, HDM05, and MSR Daily Activity 3D, as well as the more recent and challenging UTKinect-Action, NTU RGB+D, Northwestern-UCLA, UWA3DII, and SBU Kinect Interaction datasets.
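To make the three geometric quantities named in this abstract concrete (kinetic energy, angles between limbs, and pair-wise joint displacements), the following is a minimal NumPy sketch of how such per-frame features might be computed from a skeleton sequence. It is an illustration under stated assumptions, not the paper's SOF encoding: the array layout `(T, J, 3)`, the unit-mass kinetic-energy proxy, and all function names are hypothetical, and the mapping of these features to colors in a temporal kinematic image is not shown.

```python
import numpy as np

def pairwise_displacements(seq):
    """Offset vectors between every pair of joints in each frame.

    seq: array of shape (T, J, 3) with T frames and J joints (assumed layout).
    Returns an array of shape (T, J, J, 3) where out[t, i, j] = seq[t, i] - seq[t, j].
    """
    return seq[:, :, None, :] - seq[:, None, :, :]

def limb_angle(seq, a, b, c):
    """Angle in radians at joint b between the limbs b->a and b->c, per frame."""
    u = seq[:, a] - seq[:, b]
    v = seq[:, c] - seq[:, b]
    denom = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1) + 1e-12
    cos = np.sum(u * v, axis=1) / denom
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return np.arccos(np.clip(cos, -1.0, 1.0))

def kinetic_energy_proxy(seq, dt=1.0):
    """0.5 * |velocity|^2 per joint, assuming unit mass (an assumption).

    Velocity is a finite difference between consecutive frames,
    so the result has shape (T - 1, J).
    """
    vel = np.diff(seq, axis=0) / dt
    return 0.5 * np.sum(vel ** 2, axis=2)
```

Frame-to-frame variation of any of these features can then be obtained with `np.diff(feature, axis=0)`, which is one plausible reading of "variations over consecutive frames".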
This paper proposes a simple and accurate method for estimating the level of additive white Gaussian noise in noise-contaminated digital images. If an image is degraded by additive white Gaussian noise, the noise level can be estimated by applying singular value decomposition (SVD) to the noisy image. As described in the paper, the sum of certain singular values has a linear relationship with the standard deviation of the noise. Because independent noises are uncorrelated, we add noises of known level to the noisy image. The noise level is then estimated by solving a nonlinear over-determined matrix equation. The proposed algorithm was tested experimentally on benchmark images and outperforms an estimation method that selects weakly textured patches using principal component analysis (PCA). The proposed method depends less on the content of the original image and achieves higher accuracy and stronger robustness over a range of noise levels in various images.
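The idea in this abstract can be sketched as follows. This is a rough illustration under assumptions, not the paper's algorithm. It assumes the tail singular-value statistic P is linear in the total noise standard deviation, and that adding independent noise of known level s_k to noise of unknown level sigma gives a total of sqrt(sigma^2 + s_k^2), so that P_k ≈ alpha * sqrt(sigma^2 + s_k^2) + beta. Several known levels give more equations than unknowns. In place of the paper's solver, the sketch scans candidate values of sigma and solves the remaining linear problem in (alpha, beta) by least squares. The function names, the tail fraction `frac`, the added noise levels, and the search grid are all hypothetical choices, and the mean of the tail singular values is used in place of their sum (the two differ only by a constant factor).

```python
import numpy as np

def tail_singular_mean(img, frac=0.75):
    """Mean of the smallest `frac` fraction of singular values of a 2-D array."""
    s = np.linalg.svd(np.asarray(img, dtype=float), compute_uv=False)  # descending order
    m = max(1, int(round(frac * s.size)))
    return s[-m:].mean()

def fit_noise_level(p, added_sigmas, sigma_grid):
    """Fit p_k ~ alpha * sqrt(sigma^2 + s_k^2) + beta by scanning sigma.

    For a fixed candidate sigma the model is linear in (alpha, beta), so those
    are found by ordinary least squares; the candidate with the smallest
    squared residual is returned.
    """
    p = np.asarray(p, dtype=float)
    s = np.asarray(added_sigmas, dtype=float)
    best_sigma, best_err = None, np.inf
    for sigma in sigma_grid:
        x = np.sqrt(sigma ** 2 + s ** 2)
        A = np.column_stack([x, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(A, p, rcond=None)
        err = np.sum((A @ coef - p) ** 2)
        if err < best_err:
            best_sigma, best_err = float(sigma), err
    return best_sigma

def estimate_noise_sigma(noisy, added_sigmas=(0.0, 10.0, 20.0, 30.0, 40.0), seed=0):
    """Estimate the noise standard deviation of a 2-D grayscale image (sketch)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(noisy, dtype=float)
    # One statistic per known added noise level (0.0 means the input as given).
    p = [tail_singular_mean(noisy + s * rng.standard_normal(noisy.shape))
         for s in added_sigmas]
    return fit_noise_level(p, added_sigmas, np.linspace(0.0, 100.0, 1001))
```

A call such as `estimate_noise_sigma(noisy_image)` returns a value on the search grid. How close it lands to the true noise level depends on the image content, the tail fraction, and the added levels, none of which are tuned here.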