With the development and widespread adoption of digital cameras, especially in education, a great number of digital video recordings are produced in classrooms. At Beijing Normal University (BNU), for example, 3.4 TB of video is recorded every day in more than 200 classrooms. Such a large volume of data gives computer vision researchers the opportunity to automatically recognize students' classroom actions and even to evaluate the quality of classroom teaching. To focus action recognition on students, we propose the Beijing Normal University Large-scale Classroom Student Action Database version 1.0 (BNU-LCSAD), the first large-scale classroom student action database for student action recognition, which consists of 10 classroom student action classes drawn from digital camera recordings at BNU. We describe the construction and labeling process of this database in detail. In addition, we provide baseline student action recognition results on the new database using the C3D network.
Understanding human facial expressions is one of the key steps toward achieving human-computer interaction. However, a facial expression is a combination of an expressive component, called facial behavior, and a neutral component of the person. The most commonly used taxonomy for describing facial behaviors is the Facial Action Coding System (FACS), which segments the visible effects of facial muscle activation into more than 30 action units (AUs). We therefore introduce a method that recognizes AUs by extracting the information of the expressive component through a de-expression learning procedure called De-expression Residue Learning (DeRL). First, we train a Generative Adversarial Network (cGAN) to filter out the expressive information and generate the corresponding neutral face image. Then, we use the intermediate layers, which contain the action unit information, to recognize AUs. Our work alleviates the problems of AU recognition based on pixel-level differences, which are unreliable because of variation between images (e.g., rotation, translation, and lighting changes), or on feature-level differences, which are also unstable because the expression information may vary with the identity information. In our experiments, we use data augmentation to avoid overfitting and train a deep network to recognize AUs on the CK+ dataset. The results show that our method achieves more competitive performance than several other popular approaches.
Because deep learning demands large amounts of data, semi-supervised learning (SSL) has very promising applications owing to its successful use of unlabeled data. Existing SSL algorithms have achieved high accuracy on the MNIST, CIFAR-10, and SVHN datasets, and even outperform fully supervised algorithms. However, these three datasets have balanced class distributions and simple recognition tasks, properties that cannot be ignored in classification problems, so the effectiveness of SSL algorithms on unbalanced datasets and specific recognition tasks remains uncertain. We analyze the datasets and find that “disgust” has fewer samples than the other categories in the expression dataset, as does “discussion” in the classroom action recognition dataset. We therefore apply a novel SSL model, Deep Co-Training (DCT), to the facial expression recognition database (FER2013) and to our own classroom student action database (BNU-LCSAD), and analyze the effectiveness of the algorithm in these specific application scenarios. Moreover, we use the TSA training strategy when training our model to address overfitting, which is more likely to occur when data categories are unbalanced. The experimental results demonstrate the effectiveness of the SSL algorithm in practical applications and the significance of using TSA.
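
The abstract does not expand the acronym TSA. Assuming it refers to Training Signal Annealing, the following is a minimal sketch of how such a strategy is commonly formulated, not the authors' implementation. A confidence threshold rises from 1/K to 1 over training (K is the number of classes), and labeled examples that the model already predicts with true-class probability at or above the threshold are left out of the supervised loss. The function names, the schedule names, and the scale factor 5 are illustrative assumptions.

```python
import math


def tsa_threshold(step, total_steps, num_classes, schedule="linear"):
    """Return the TSA confidence threshold at a given training step.

    The threshold grows from 1/num_classes (step 0) towards 1 (last step).
    Schedule shapes and the scale factor 5.0 are assumptions of this sketch.
    """
    progress = step / total_steps
    if schedule == "linear":
        alpha = progress
    elif schedule == "log":
        # Rises quickly early in training.
        alpha = 1.0 - math.exp(-progress * 5.0)
    elif schedule == "exp":
        # Rises slowly early in training; suits small labeled sets.
        alpha = math.exp((progress - 1.0) * 5.0)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    base = 1.0 / num_classes
    return alpha * (1.0 - base) + base


def tsa_cross_entropy(probs, labels, threshold):
    """Mean cross-entropy over labeled examples still below the threshold.

    probs:  list of per-class probability lists, one per example.
    labels: list of integer class indices.
    Examples whose true-class probability is >= threshold are masked out,
    so the model is not pushed further on examples it already fits.
    """
    kept = [-math.log(p[y]) for p, y in zip(probs, labels) if p[y] < threshold]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical usage with 7 classes (FER2013 has 7 expression categories).
eta = tsa_threshold(step=500, total_steps=1000, num_classes=7)
batch_probs = [[0.9, 0.1], [0.4, 0.6]]  # toy two-class batch for illustration
batch_labels = [0, 1]
loss = tsa_cross_entropy(batch_probs, batch_labels, threshold=0.7)
```

In the toy batch only the second example contributes to the loss, because the first is already predicted with probability 0.9, above the threshold of 0.7. With unbalanced classes, this kind of masking limits how quickly the model can memorize the few labeled samples of a rare class such as “disgust”.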