Facial expression recognition (FER) is crucial for understanding and assessing human emotional states. In practical applications, however, the complexity and diversity of facial expressions make it difficult for traditional self-supervised contrastive learning methods to extract fine-grained expression features. To address this problem, we propose an attention-guided self-supervised distilled contrastive learning method for FER, which transfers the expression differential information learned by the teacher network to the student network by introducing attention-guided knowledge distillation into self-supervised contrastive learning. Specifically, we propose attention-guided joint feature distillation, which strengthens the feature representation capability of the student network by guiding its feature learning jointly with attention-weighted features and contrastive query vectors. In addition, to further exploit the key information in the teacher's features, we also propose facial key feature guidance, which makes the student focus on learning the key features extracted by the teacher network. These advances lead to significant performance improvements, showcasing the robustness of our method. Our method achieves accuracies of 76.83% on the Real-world Affective Face Database and 62.04% on the FER-2013 dataset, demonstrating its effectiveness in capturing subtle emotional expressions and advancing the field of self-supervised FER.
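To make the distillation objective concrete, the following is a minimal sketch assuming generic teacher/student feature maps and an InfoNCE-style contrastive term; the attention form, tensor shapes, and loss weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_guided_distill_loss(f_t, f_s, query, tau=0.07, alpha=0.5):
    """Toy attention-guided distillation loss (illustrative only).

    f_t, f_s : (B, C, H, W) teacher / student feature maps.
    query    : (B, C) contrastive query vectors, e.g. from a momentum encoder.
    """
    b, c, h, w = f_t.shape
    # Spatial attention derived from the teacher: channel-pooled activation
    # energy, softmax-normalized over spatial positions.
    attn = F.softmax(f_t.pow(2).mean(dim=1).flatten(1), dim=1).view(b, 1, h, w)

    # Attention-weighted feature alignment between student and teacher.
    feat_loss = F.mse_loss(f_s * attn, f_t * attn)

    # Contrastive term: each student's pooled feature should match its own
    # query against the other queries in the batch (InfoNCE-style).
    z_s = F.normalize(f_s.flatten(2).mean(-1), dim=1)     # (B, C)
    q = F.normalize(query, dim=1)
    logits = z_s @ q.t() / tau                            # (B, B)
    labels = torch.arange(b, device=z_s.device)
    contrast_loss = F.cross_entropy(logits, labels)

    return alpha * feat_loss + (1 - alpha) * contrast_loss
```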
Weakly supervised semantic segmentation (WSSS) using only image-level labels is a challenging task. Most existing methods utilize the class activation map (CAM) to generate pixel-level pseudo labels for supervised training. However, the gap between classification and segmentation hinders the network from obtaining more comprehensive semantic information and generating more accurate pseudo masks for segmentation. To address this issue, we propose TSD-CAM, a transformer-based self-distillation (SD) method that exploits CAM similarity. TSD-CAM uses the similarity between CAMs generated from different views as a distillation target, providing additional supervision for the network and narrowing the gap between classification and segmentation. SD supervision allows the network to acquire more semantic information and refine CAMs to generate higher-precision pseudo labels. In addition, we propose an adaptive pixel refinement module, which adaptively refines and adjusts images based on pixel variations, further improving the precision of the pseudo labels. Our method is a fully end-to-end single-stage approach that achieves state-of-the-art results of 71.3% mIoU on PASCAL VOC 2012 and 42.9% mIoU on MS COCO 2014; TSD-CAM significantly outperforms other single-stage competitors and achieves performance comparable to state-of-the-art multi-stage methods. Extensive ablation experiments further demonstrate the effectiveness of our method, and we offer a new perspective on solving WSSS. Our code is available at: https://github.com/pipizhum/TSD-CAM.
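As an illustration of the cross-view CAM-consistency idea, here is a minimal sketch assuming two augmented views of the same images; the real TSD-CAM loss, its transformer backbone, and the normalization choice are not specified in the abstract and are assumptions.

```python
import torch
import torch.nn.functional as F

def cam_consistency_loss(cam_a, cam_b, flipped=False):
    """Illustrative CAM-similarity distillation target.

    cam_a, cam_b : (B, K, H, W) class activation maps from two views of the
    same images. If view B was horizontally flipped, undo the flip so the
    maps are spatially aligned before comparison.
    """
    if flipped:
        cam_b = torch.flip(cam_b, dims=[-1])

    def minmax(c):
        # Normalize each map to [0, 1] per class so the loss compares
        # activation shapes rather than raw magnitudes.
        c = c - c.amin(dim=(-2, -1), keepdim=True)
        return c / (c.amax(dim=(-2, -1), keepdim=True) + 1e-5)

    return F.l1_loss(minmax(cam_a), minmax(cam_b))
```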
A self-supervised hybrid contrastive recommendation method based on meta-learning (MSHCL) is proposed to address the poor recommendation accuracy of social recommendation algorithms. Collaborative filtering based on graph neural networks can model the interactions between user and item nodes and make effective use of higher-order neighbor information. However, its representations are highly susceptible to interaction noise, so the great potential of node-level information is not well utilized. To address these issues, we first learn node embeddings through a network view and a meta-path view to fully capture the heterogeneous network structure. In addition, self-supervised learning, as a learning paradigm that uses unlabeled data, effectively mitigates the data sparsity problem; we therefore fuse self-supervised learning into hybrid contrastive learning for model training. We conduct empirical and ablation studies on a real-world dataset to demonstrate that the MSHCL model outperforms current mainstream methods.
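The cross-view contrastive component can be illustrated with a minimal sketch: an InfoNCE loss between node embeddings from the two views. The encoders and the exact MSHCL objective are not given in the abstract, so the shapes and temperature below are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(z_net, z_meta, tau=0.2):
    """Illustrative cross-view contrastive loss.

    z_net, z_meta : (N, D) node embeddings from the network view and the
    meta-path view. Each node's network-view embedding is pulled toward its
    own meta-path embedding and pushed away from all other nodes'.
    """
    z1 = F.normalize(z_net, dim=1)
    z2 = F.normalize(z_meta, dim=1)
    logits = z1 @ z2.t() / tau                     # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetrize so each view serves as both anchor and positive.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```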
Current research on content-based image retrieval mainly focuses on robust feature extraction. However, due to the exponential growth of online images, retrieval must scale to very large image collections, and exhaustive search among them is time-consuming and unscalable. Hence, we need to pay close attention to the efficiency of image retrieval. In this paper, we propose a feature hashing method for image retrieval that not only generates a compact fingerprint for image representation but also prevents large semantic loss during hashing. To generate the fingerprint, an objective function of semantic loss is constructed and minimized; it combines the influence of both the neighborhood structure of the feature data and the mapping error. Since the machine-learning-based hashing effectively preserves the neighborhood structure of the data, it yields visual words with strong discriminability. Furthermore, the generated binary codes keep the construction of the image representation low in complexity, making the approach efficient and scalable to large-scale databases. Experimental results show the good performance of our approach.
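A minimal sketch of the kind of two-term objective described here, combining a neighborhood-preserving term with a quantization (mapping) error; the projection form, adjacency definition, and weighting are assumptions rather than the paper's exact formulation.

```python
import torch

def hashing_objective(X, W_proj, A, lam=0.1):
    """Illustrative neighborhood-preserving hashing objective.

    X      : (N, D) feature vectors.
    W_proj : (D, K) projection onto K-bit codes.
    A      : (N, N) 0/1 adjacency of feature-space nearest neighbors.
    """
    Y = X @ W_proj                        # (N, K) relaxed, real-valued codes
    B = torch.sign(Y).detach()            # (N, K) binary codes (held fixed)

    # Neighborhood term: neighbors should receive similar codes; this trace
    # is proportional to sum_ij A_ij * ||y_i - y_j||^2 (graph-Laplacian form).
    L = torch.diag(A.sum(dim=1)) - A
    neighbor_loss = torch.trace(Y.t() @ L @ Y) / A.sum().clamp(min=1)

    # Mapping error: distortion introduced by quantizing Y to binary B.
    quant_loss = torch.mean((Y - B) ** 2)

    return neighbor_loss + lam * quant_loss
```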