In recent years, research on facial expression recognition has attracted increasing attention. As an important branch of artificial intelligence, facial expression recognition plays a significant role in human-computer interaction and related fields. However, because different expressions are highly similar and interclass differences are small, obtaining a robust and accurate expression recognition model remains a daunting task. Considering the characteristics of human expressions, we propose a new convolution mechanism, which we call the attention convolution module. We apply this module to expression recognition and propose a new expression recognition network (ATB-NET). Experimental results show that our method focuses on the relatively important regions of the face and, compared with previous work, achieves a significant improvement on multiple public datasets (such as FER2013 and CK+).
KEYWORDS: Data modeling, Machine learning, Education and training, Performance modeling, Transformers, Industry, Semantics, Deep learning, Systems modeling, Human computer interaction
Artificial intelligence is in an era of change, transforming not only the technology itself but also human society. It has become increasingly common to use human-computer interaction technology with artificial intelligence at its core to replace manual labor. Intent recognition is an important part of human-machine dialogue systems, and deep learning is gradually being applied to the intent recognition task. However, intent recognition based on deep learning often suffers from low recognition accuracy and slow recognition speed. In response to these problems, this paper designs a BERT fine-tuning approach that improves the network structure built on the pre-trained model and proposes new continued pre-training objectives to improve the accuracy of intent recognition. In addition, a method based on multi-teacher model compression is proposed to compress the pre-trained model, which reduces model inference time.
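The abstract does not specify how the multiple teachers are combined, so the following is only a minimal sketch of one common form of multi-teacher distillation: the student is trained to match the average of the teachers' temperature-softened output distributions. The function names, the uniform averaging, and the temperature value are all assumptions made for illustration, not the paper's method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def multi_teacher_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """KL(mean teacher distribution || student distribution), scaled by T^2.

    Assumption: teachers are weighted uniformly; a real system might
    weight them by confidence or validation accuracy instead.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n = len(teacher_probs)
    # Average the softened teacher distributions class by class.
    target = [sum(p[i] for p in teacher_probs) / n for i in range(len(student_logits))]
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    return kl * temperature ** 2

# Hypothetical logits over three intent classes.
loss = multi_teacher_distillation_loss(
    [0.0, 0.0, 3.0],
    [[3.0, 0.0, 0.0], [2.0, 1.0, 0.0]],
)
print(loss)
```

In practice this term is usually added to the ordinary cross-entropy loss on the ground-truth intent labels, so the student learns from both the labels and the teachers.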
In human communication, facial expressions are, alongside speech, a main way to convey inner thoughts and emotions. By analyzing facial expressions, it is possible to obtain information about a person's inner emotions. Applying expression analysis algorithms in social robots can help the robots accurately understand users' emotions and intentions, which in turn leads to better human-computer interaction. Therefore, this paper designs an effective and efficient expression recognition method. The network uses the Ghost Module as its core module, and a lightweight attention module is used to increase the accuracy of expression recognition. In addition, the network is trained with a distributed label loss to address the insufficient data and long-tail distribution of facial expression datasets. Experiments show that the network performs well on the RAF-DB dataset, reaching 86.8% accuracy without pretraining, and has a fast processing speed. We also designed a real-time expression analysis system with this network as the backbone to simulate the actual working scenario of the robot, and found that it satisfies the requirements of real-time human-computer interaction and efficient expression recognition.
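To illustrate why a Ghost Module is considered lightweight, the sketch below compares the weight count of an ordinary convolution with that of a Ghost Module as generally described for GhostNet: a primary convolution produces a fraction of the output channels, and cheap depthwise operations generate the rest. The channel sizes, the ratio of 2, and the 3x3 cheap kernel are illustrative assumptions; the abstract does not give the network's actual configuration. Biases are ignored.

```python
import math

def standard_conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (no bias)."""
    return c_in * c_out * k * k

def ghost_module_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a Ghost Module (no bias).

    A primary convolution produces ceil(c_out / ratio) intrinsic feature
    maps; each remaining map comes from a cheap depthwise cheap_k x cheap_k
    filter applied to an intrinsic map.
    """
    intrinsic = math.ceil(c_out / ratio)
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap

# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
print(standard_conv_params(64, 128, 3))  # 73728
print(ghost_module_params(64, 128, 3))   # 37440
```

With a ratio of 2, the Ghost Module needs roughly half the weights of the standard layer in this example, which is consistent with the abstract's emphasis on processing speed.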
Identifying the properties of sandstone slices provides important guidance for the exploration of oil and gas resources. The traditional sandstone slice identification method is based on manual observation, which is time-consuming and labor-intensive. In recent years, Convolutional Neural Networks (CNNs) have achieved excellent results in image recognition, and CNN-based recognition algorithms have practical value in various applications. However, using a classic CNN for sandstone slice image recognition poses several challenges: industrial recognition demands efficiency, which means recognition must be both accurate and fast; sandstone image recognition requires expert annotation, resulting in a lack of data; and some mineral particles in sandstone are uncommon, which gives the sandstone image dataset an unbalanced data distribution. We invited several geologists to annotate sandstone slice images and then carried out experiments. To ensure both the speed and the accuracy of sandstone slice image recognition, we built the Res2DwNet model, which is optimized for the characteristics of sandstone slice images. In this model, we propose the Res2Dw module, which combines the ideas of depthwise separable convolution, dense connection, and residual learning. Drawing on geological expertise, the model processes sandstone slice images as two types, single-polarized and cross-polarized, which significantly improves recognition accuracy. To address the small amount of data, a dedicated data augmentation method is designed on the basis of experience in sandstone slice identification. To handle the uneven data distribution, the class-balanced softmax loss function is used during training. Compared with other classic convolutional neural network models, the Res2DwNet model has a deeper network structure and achieves higher recognition accuracy than models of the same magnitude while keeping the parameter count lightweight, making it suitable for sandstone slice image recognition tasks.
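The abstract names a class-balanced softmax loss without giving its form. As a hedged sketch, the code below assumes the widely used "effective number of samples" weighting, where a class with n samples receives a weight proportional to (1 - beta) / (1 - beta^n). The value of beta, the class counts, and the function names are illustrative assumptions.

```python
import math

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples.

    Weights are normalized to sum to the number of classes, so a
    perfectly balanced dataset yields a weight of 1.0 for every class.
    """
    effective = [(1 - beta ** n) / (1 - beta) for n in samples_per_class]
    raw = [1.0 / e for e in effective]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def class_balanced_softmax_loss(logits, label, weights):
    """Softmax cross-entropy for one sample, scaled by its class weight."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[label] * (logits[label] - log_sum_exp)

# Hypothetical counts: a common mineral class and a rare one.
weights = class_balanced_weights([1000, 10])
print(weights)
print(class_balanced_softmax_loss([2.0, 0.5], 1, weights))
```

The rare class receives the larger weight, so errors on uncommon mineral particles contribute more to the training loss than errors on common ones.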
Object detection is one of the basic problems in computer vision. Currently, detection models based on fully supervised learning, which demand finely labeled data such as images annotated with bounding boxes, are the mainstream of this research field. However, obtaining such precisely annotated images usually costs a great deal of time and human labor. To relax the training data requirements of detection models, we propose an attention-based weakly supervised object detection model that can be trained using only images with image-level annotations. The model consists of two stages. In the first stage, an attention-based convolutional neural network (CNN) is designed to enhance the localization ability of the CNN and generate coarse detection results. In the second stage, a neural network for edge detection is used to refine the coarse results from the first stage. Tested on PascalVOC 2007 and 2012, the proposed weakly supervised detection model achieves 53.4 mAP and 48.9 mAP on the two datasets, respectively, which is competitive with state-of-the-art weakly supervised detection models and narrows the gap with fully supervised detection models.
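The abstract does not say how the coarse detections of the first stage are derived from the attention-based CNN. One simple possibility, sketched below purely as an assumption, is to threshold a non-negative 2D attention map relative to its peak and take the tightest box around the surviving cells. The function name, the threshold ratio, and the toy map are hypothetical, and the edge-detection refinement of the second stage is not covered.

```python
def coarse_box_from_attention(attention, threshold_ratio=0.5):
    """Return (x_min, y_min, x_max, y_max) around cells with high attention.

    `attention` is a list of rows of non-negative floats. A cell is kept
    if its value is at least threshold_ratio times the map's peak value.
    """
    peak = max(max(row) for row in attention)
    cutoff = peak * threshold_ratio
    rows = [r for r, row in enumerate(attention) if any(v >= cutoff for v in row)]
    cols = [c for row in attention for c, v in enumerate(row) if v >= cutoff]
    return (min(cols), min(rows), max(cols), max(rows))

# Toy 4x4 attention map with one salient region in the middle.
attention_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.9, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.2, 0.0],
]
print(coarse_box_from_attention(attention_map))  # (1, 1, 2, 2)
```

A box obtained this way tends to cover only the most discriminative part of an object, which is one reason a refinement stage such as the edge-based one described above is useful.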