Traditional person re-identification methods mostly rely on supervised learning over manually annotated datasets. Although their recognition accuracy is high, they depend entirely on annotation information, which limits their practicality. To address this problem, an unsupervised person re-identification method is proposed in which refined features guide multi-label distribution ranking learning. First, a single-label and multi-label classification network is constructed to improve label matching accuracy by learning the features of each person image together with its k nearest neighbors, which are likely to share the same identity. Second, a refined feature extraction module is designed and embedded into the ResNet50 backbone to adaptively locate potential key regions in person images; while suppressing appearance variation, it mines multiple cues to obtain refined person features that help the classification network improve performance. Third, reliable multi-class labels are selected by penalizing the ranking between positive and negative classes within the multi-label predictions. Finally, the network is supervised by a combination of multi-label distribution ranking loss and multi-label classification loss. Experiments show that, without any annotation, the method achieves recognition accuracies of 58.4% on Market-1501 and 42.3% on DukeMTMC-reID, and improves mAP on Market-1501 by 12.9%.
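As a rough illustration of the neighbor-based multi-label step, the following is a minimal sketch assuming image features are held in a memory bank with one entry per unlabeled image; the function name, k = 10, and the binary-label formulation are hypothetical, not the authors' released code.

```python
# Minimal sketch of k-nearest-neighbor multi-label assignment (assumed
# formulation, not the paper's released code).
import torch
import torch.nn.functional as F

def knn_multilabels(features: torch.Tensor, k: int = 10) -> torch.Tensor:
    """features: (N, D) embeddings, one per unlabeled image, where each
    image initially carries its own single label (its index). Returns a
    binary (N, N) multi-label matrix: entry (i, j) = 1 marks image j as
    one of image i's k nearest neighbors, i.e. a likely same-identity
    positive class."""
    features = F.normalize(features, dim=1)   # work in cosine-similarity space
    sim = features @ features.t()             # (N, N) pairwise similarity
    nbrs = sim.topk(k + 1, dim=1).indices     # +1: each image is its own neighbor
    labels = torch.zeros_like(sim)
    labels.scatter_(1, nbrs, 1.0)             # flip neighbor entries to positive
    return labels

# Usage: refine per-image single labels into multi-labels from, e.g.,
# ResNet50 pooled features.
feats = torch.randn(512, 2048)
multilabels = knn_multilabels(feats, k=10)
```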
For cross-modal person re-identification, the main challenges are intra-class variation among person images and the cross-modal discrepancy between visible and infrared images; reducing the cross-modal discrepancy is therefore the key problem. In this paper, we propose a hybrid learning strategy that uses cross-entropy loss and a weighted squared triplet loss as the identity (ID) loss to solve the intra-modal and inter-modal identity classification problem, while supervising the network to extract more effective modality-shared features that form discriminative feature descriptors. In addition, to suit the attributes of cross-modal person images, a channel-swap random erasing data augmentation is used to improve the model's robustness to color changes; it simulates different degrees of image occlusion, reduces the risk of overfitting, and further enriches image diversity. Experimental results on the public SYSU-MM01 dataset demonstrate the effectiveness of the proposed method, which reaches a mean average precision (mAP) of 60.08% even in the most difficult all-search single-shot mode.
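The channel-swap random erasing augmentation can be sketched as follows; the probabilities, erase-region bounds, and noise fill here are assumptions for illustration and may differ from the paper's exact settings.

```python
# Minimal sketch of channel-swap + random-erasure augmentation (assumed
# parameters; the paper's exact settings may differ).
import random
import torch

def channel_swap_random_erase(img: torch.Tensor,
                              p_swap: float = 0.5,
                              p_erase: float = 0.5,
                              area_frac=(0.02, 0.2)) -> torch.Tensor:
    """img: (3, H, W) RGB tensor with values in [0, 1]."""
    img = img.clone()                         # do not mutate the caller's tensor
    c, h, w = img.shape
    # Randomly permute the RGB channels so the model cannot rely on color
    # cues that are absent in the infrared modality.
    if random.random() < p_swap:
        img = img[torch.randperm(c)]
    # Erase a random rectangle to simulate partial occlusion.
    if random.random() < p_erase:
        area = random.uniform(*area_frac) * h * w
        eh = min(h, max(1, int(round(area ** 0.5))))
        ew = min(w, max(1, int(area / eh)))
        y = random.randint(0, h - eh)
        x = random.randint(0, w - ew)
        img[:, y:y + eh, x:x + ew] = torch.rand(c, eh, ew)
    return img

# Usage on a dummy visible-light image:
aug = channel_swap_random_erase(torch.rand(3, 256, 128))
```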