Face recognition has received substantial attention for a long time, and many representative methods have been proposed.1–4 Since Wright et al. presented the sparse representation-based classification (SRC) method,5 it has been widely studied in many pattern recognition applications due to its promising results, such as face recognition,6,7 along with gender,8 digit,9,10 biological data,11,12 and medical image13,14 classification. Although many improved SRC-based methods have been proposed for robust face recognition,15–19 most of them require rigid image alignment, in which all images of the objects of interest are aligned to a fixed canonical template. Much work has been devoted to the alignment problem.20,21 However, such alignment remains difficult to achieve in real scenarios, such as recognition of partial faces or faces with scale or pose variations. To address the alignment problem in SRC, some methods22–24 introduced the scale-invariant feature transform (SIFT)22 or the speeded-up robust features25 descriptor into the recognition method. However, most of these methods pay little attention to the correlation among the query descriptors, which is useful for classification. Thus, it is necessary to study a method that exploits the correlation of the query descriptors for robust alignment-free face recognition, which is the focus of this paper.
Supposing that an image is represented with a set of SIFT descriptors,22 which are robust to scale variations and rotation, SIFT-based methods can address the alignment problem. The simple matching method22 obtains an identity for each query descriptor separately according to its best match; the final identity is then determined by voting over these separate results. Rather than matching, Liao and Jain presented a multikeypoint descriptors-based SRC (MKD-SRC) method,23 which solves the sparse representation (SR) problem for each query descriptor separately and determines the image identity from all of the reconstruction residuals. By exploiting the discrimination of the atoms in a SIFT dictionary, Sun et al. proposed a clustering-weighted SIFT-based classification method via SR (CWS-SRC)24 and obtained better robustness for alignment-free face recognition when sufficient samples are available.
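As an illustration, the matching-and-voting baseline described above can be sketched in NumPy. This is a simplified sketch under our reading of the scheme in Ref. 22: the function name is ours, descriptors are taken as plain arrays, and Lowe's ratio test is omitted for brevity.

```python
import numpy as np

def match_and_vote(query_desc, dict_desc, dict_labels):
    """Identify a query image by matching each of its descriptors to its
    nearest dictionary descriptor and voting over the matched labels.

    query_desc : (m, d) array of query descriptors
    dict_desc  : (N, d) array of dictionary descriptors
    dict_labels: (N,) class label of each dictionary descriptor
    """
    # Squared Euclidean distance between every query/dictionary pair
    d2 = ((query_desc[:, None, :] - dict_desc[None, :, :]) ** 2).sum(-1)
    votes = dict_labels[d2.argmin(axis=1)]   # best-matching label per descriptor
    classes, counts = np.unique(votes, return_counts=True)
    return classes[counts.argmax()]          # majority vote decides the identity
```

Because each descriptor votes independently, a few mismatched descriptors can be outvoted, but the scheme ignores any correlation among the query descriptors.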
Analyzing these methods, we find that they treat each query descriptor independently and equally. For a given query descriptor, similar atoms may be distributed across different classes in the dictionary, which degrades classification performance.24 Therefore, if we solve the SR problem for each descriptor extracted from a query image separately, some false identities may be obtained, resulting in errors in the final image classification. Consequently, it is beneficial to solve the SR problem for all query descriptors simultaneously by exploiting their correlation. To this end, the concept of joint sparse representation (JSR) is introduced.
In this paper, we propose a weighted joint sparse representation-based classification (WJSRC) method. Our work makes three contributions: (1) to exploit the correlation among query descriptors, the concept of JSR is introduced; (2) to account for the reliability of the query descriptors, a modified JSR model with a weighted sparsity constraint is formulated; and (3) a WJSRC algorithm is proposed to solve the modified model. Because the proposed method exploits both the correlation among the query descriptors and their individual reliability, the performance of alignment-free recognition is improved.
The remainder of this paper is organized as follows. In Sec. 2, we review the JSR algorithm. Section 3 proposes the WJSRC method. The experimental results using the proposed method on the Yale database,26 the Olivetti Research Laboratory database,27 and the AR database (a public face database created by Aleix Martinez and Robert Benavente)28 are described in Sec. 4. The conclusions are presented in Sec. 5.
Joint Sparse Representation
The original SRC method5 solves the SR problem for query descriptors separately. To explore the correlation among the query descriptors, the JSR is introduced.
As far as we know, there are two types of JSR methods. (1) The first group of JSR methods utilizes multiple types of keypoint features and dictionaries.29–31 If we extract shape, color, and texture features from a face image, which differ from each other, it is necessary to construct a separate dictionary for each type of feature; thus, three dictionaries are obtained. For a test image, multiple types of query descriptors are extracted, each of which can only be sparsely represented by its corresponding dictionary. However, the SR for all query descriptors is performed under the JSR constraint. The workflow is shown in Fig. 1(a). (2) The second group of JSR methods utilizes multiple keypoint features and a single dictionary,32–34 whose workflow is shown in Fig. 1(b). This approach supposes that multiview sample images of an object are available and that the queries of the object are also multiview images. With the same type of features, a single dictionary is constructed. Because the query images are similar, just one atom is selected from the dictionary to represent them at each iteration step of the atom selection process. After sufficient iterations, one set of atoms from the dictionary is selected to represent all query images simultaneously under the joint representation constraint.34 Thus, the sparse coefficients share the same sparsity pattern at the atom level,34 but the coefficient values differ, as illustrated in Fig. 2(b). This approach differs from the original SRC,5 which solves the SR problem for each view image separately; in SRC, the sparse coefficients differ from each other in both sparsity pattern and value, as depicted in Fig. 2(a). In real scenarios, the multiview images of an object may not be well represented by the same features. In Refs. 32 and 33, Zhang et al. improved this scheme by proposing a joint dynamic sparse representation (JDSR) method, which chooses different atoms from the same class to represent each view image at each iteration step of the atom selection process. The sparse coefficients share the same sparsity pattern at the class level, but not at the atom level, as depicted in Fig. 2(c).
Although the problem of face recognition with SIFT descriptors belongs to the second type, the existing methods cannot solve it perfectly for three reasons. (1) The query descriptors are quite different from each other; for example, the descriptors extracted from an eye differ from those of a mouth, which is clearly different from the characteristics of the query features in Refs. 32 to 34. (2) The number of query descriptors is often large, typically in the hundreds; as a result, joining the query descriptors is challenging. (3) In practice, not all of the query descriptors carry correct identity information. For example, descriptors extracted from an occluded region cannot be treated equally to those from a clear region. Unreliable descriptors will mislead the JSR, which has been verified by experiments in Ref. 32. As a result, a more robust method is required to solve our problem.
Given samples collected from T classes, the SIFT descriptors extracted from the samples of the t'th class construct the t'th sub-dictionary D_t = [d_{t,1}, d_{t,2}, ..., d_{t,N_t}], where N_t denotes the quantity of descriptors of the t'th class and each d_{t,j} is a SIFT descriptor. All of the sub-dictionaries are pooled together to construct the dictionary D = [D_1, D_2, ..., D_T], where N = N_1 + N_2 + ... + N_T denotes the quantity of descriptors over all classes. Given a query image, it can be represented by a set of SIFT descriptors, i.e., Y = [y_1, y_2, ..., y_m], where y_i is called a query descriptor.
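The dictionary construction above can be sketched in NumPy. This is an illustrative sketch only; the function name is ours, and the per-class descriptor sets are assumed to be given as arrays with one descriptor per column.

```python
import numpy as np

def build_dictionary(class_descriptors):
    """Pool per-class SIFT descriptor sets into a single dictionary.

    class_descriptors: list of (d, N_t) arrays, one per class t.
    Returns D of shape (d, N) with l2-normalized columns (atoms), and
    labels of shape (N,) giving the class index of each atom.
    """
    D = np.hstack(class_descriptors).astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
    labels = np.concatenate(
        [np.full(c.shape[1], t) for t, c in enumerate(class_descriptors)])
    return D, labels
```

Keeping a parallel label vector makes it easy to recover each atom's class during the class-level atom selection described later.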
If a query image belongs to one of the given classes, then the query descriptors extracted from it can be well represented by those of the corresponding class. Because the SIFT descriptors are discriminative (for example, the descriptors of an eye differ from those of a mouth), they should be represented by different atoms of the same class, i.e., the sparse coefficients share the same sparsity pattern at the class level32,33 but not at the atom level.34 As mentioned above, the query descriptors should not be treated equally. Thus, we propose a weighted joint sparse representation (WJSR) model:

min_X  sum_{i=1}^{m} w_i ||y_i - D x_i||_2^2,  subject to the nonzero coefficients of X = [x_1, ..., x_m] being jointly restricted to atoms of at most K classes,  (1)

where the weight w_i is not only a measurement of the reliability of the i'th query descriptor but also a balance factor for the residuals. To minimize the overall residual, the residuals of the query descriptors with larger weights must be made smaller, i.e., the representation of the more reliable descriptors must be a more accurate approximation because they carry the correct classification information. Thus, in Algorithm 1, the reliable query descriptors lead the atom selection.
Algorithm 1 The weighted joint sparse representation-based classification (WJSRC).

1. Input: multi-query-descriptor matrix Y, dictionary D, weight vector W, sparsity level K, the number of query vectors m and classes T, and the residual threshold r0.
2. Initialize: R ← Y, I ← ∅, k ← 0
   While ∑_{i=1}^{m} w_i‖r_i‖_2 / ∑_{i=1}^{m} w_i > r0 and k < K do
     P ← |D^T R|  % products between all atoms and the residuals of all query vectors
     For each class t = 1, ..., T:
       P_t ← find(P, t)  % get the product values in the t'th class for all query vectors
       [I_new(t,i), P_max(t,i)] ← max P_t(i,:)  % find the max value and its index for the i'th query vector in the t'th class
       V(t) ← ∑_i [P_max(t,i) w_i]^2  % incorporate the weights into the atom selection
     index ← arg max_t V(t)  % find the best cluster of atoms belonging to the same class across all classes
     I ← [I; I_new(index,:)]  % update the index matrix
     X(I_i, i) ← [D(:,I_i)^T D(:,I_i)]^{-1} D(:,I_i)^T Y(:,i)  % calculate the sparse coefficients for the i'th query vector
     R ← Y − DX  % update the residual
     k ← k + 1
   End while
3. Output: the sparse coefficient matrix X
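The greedy solver can be sketched compactly in NumPy. This is our reading of Algorithm 1, not the authors' implementation: the function and variable names are ours, atoms are assumed column-normalized, and the per-query least-squares refit is done with `numpy.linalg.lstsq` rather than the explicit normal-equation inverse.

```python
import numpy as np

def wjsrc_solve(Y, D, labels, w, K=7, r0=1e-3):
    """Greedy weighted joint sparse coding (a sketch of Algorithm 1).

    Y : (d, m) query descriptors, D : (d, N) dictionary,
    labels : (N,) class index of each atom, w : (m,) descriptor weights.
    Returns the sparse coefficient matrix X of shape (N, m).
    """
    N, m = D.shape[1], Y.shape[1]
    X = np.zeros((N, m))
    supports = [[] for _ in range(m)]        # atoms selected per query
    R = Y.copy()
    for _ in range(K):                       # at most K class selections
        P = np.abs(D.T @ R)                  # atom/residual products, (N, m)
        T = labels.max() + 1
        V = np.zeros(T)
        best = {}
        for t in range(T):
            idx = np.flatnonzero(labels == t)
            Pt = P[idx]                      # products within class t
            j = Pt.argmax(axis=0)            # best atom per query in class t
            best[t] = idx[j]
            V[t] = ((Pt[j, np.arange(m)] * w) ** 2).sum()  # weighted score
        t_star = int(V.argmax())             # class-level joint selection
        for i in range(m):
            if best[t_star][i] not in supports[i]:
                supports[i].append(best[t_star][i])
            S = supports[i]
            X[:, i] = 0.0                    # least-squares refit on support
            X[S, i] = np.linalg.lstsq(D[:, S], Y[:, i], rcond=None)[0]
        R = Y - D @ X
        if (w * np.linalg.norm(R, axis=0)).sum() / w.sum() < r0:
            break                            # weighted mean residual small enough
    return X
```

Note that all m queries are forced to draw their new atom from the same class t_star at each iteration, which is exactly the class-level joint sparsity pattern the model requires, while each query keeps its own atoms and coefficient values.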
WJSR-Based Classification Algorithm
Calculating the weight for each query descriptor
For classification, not all query descriptors contribute equally. In this paper, we measure the importance of each query descriptor y_i by its similarity to the dictionary D, i.e., s_i = max_{1≤j≤N} y_i^T d_j, and then the weight of y_i can be defined as

w_i = s_i,  i = 1, ..., m,  (2)

which lies in (0, 1] for l2-normalized descriptors.
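One plausible reading of this weighting, assuming both query descriptors and dictionary atoms are l2-normalized so that inner products are cosine similarities, can be sketched as follows (the function name is ours):

```python
import numpy as np

def descriptor_weights(Y, D):
    """Weight each query descriptor by its similarity to the dictionary.

    Y : (d, m) query descriptors, D : (d, N) dictionary, both with
    l2-normalized columns. The weight of query i is its maximum inner
    product with any atom, so well-matched descriptors weigh more.
    """
    S = Y.T @ D                    # (m, N) cosine similarities
    return S.max(axis=1)           # w_i = max_j <y_i, d_j>
```

A descriptor that closely matches some atom (e.g., one from a clear facial region) receives a weight near 1, whereas a descriptor that matches nothing well (e.g., one from an occluded region) receives a small weight and therefore has less influence on the joint atom selection.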
Solving the WJSR problem
Solving Eq. (1) is an NP-hard problem due to the mixed-norm minimization with a weighted joint sparsity constraint. In this paper, we propose a greedy algorithm, the WJSRC algorithm, to solve this problem; it is described in detail in Algorithm 1. The algorithm is similar to the orthogonal matching pursuit algorithm,35 with a major difference in the atom selection criterion. In the WJSRC algorithm, the most relevant set of atoms belonging to the same class is selected at each iteration step of the atom selection process. To minimize the overall residual, we propose a weighted atom selection criterion, in which the selection is automatically led by the descriptors with larger weights.
The proposed WJSRC method is summarized as follows:
i) Extract the SIFT descriptors from the sample and query images to construct the dictionary D and the multi-query-descriptor matrix Y, respectively.
ii) Calculate the weight for each query descriptor using Eq. (2) to form the weight vector W.
iii) Solve the WJSR problem for all of the query descriptors using Algorithm 1 to obtain the sparse coefficient matrix X.
iv) Determine the identity of the query image using Eq. (3).
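The final identification step can be sketched as a weighted per-class residual comparison. This is our reading of the decision rule (the exact form of Eq. (3) is not reproduced here, and the function name is ours): keep only the coefficients of one class at a time and pick the class whose atoms reconstruct the query descriptors with the smallest weighted residual.

```python
import numpy as np

def classify(Y, D, labels, X, w):
    """Assign the query image to the class whose atoms yield the
    smallest weighted reconstruction residual over all descriptors."""
    T = labels.max() + 1
    residuals = np.empty(T)
    for t in range(T):
        Xt = np.where((labels == t)[:, None], X, 0.0)  # keep class-t coefficients
        Rt = Y - D @ Xt                                # per-descriptor residuals
        residuals[t] = (w * np.linalg.norm(Rt, axis=0)).sum()
    return int(residuals.argmin())
```

Weighting the residuals by w again ensures that the reliable descriptors dominate the final decision, consistent with their role in the atom selection.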
In this section, we present the performance of the proposed method on three public databases: (1) the Yale database,26 (2) the Olivetti Research Laboratory (ORL) database,27 and (3) the AR database.28 We focus on three scenarios of alignment-free face recognition: (1) arbitrary patches of holistic faces; (2) faces with arbitrary pose and expression variations; and (3) faces with occlusions. A performance comparison among the related methods, namely the SIFT matching,22 MKD-SRC,23 CWS-SRC,24 JDSRC,33 and original SRC5 methods, is conducted. All three experiments are performed on gray images, and the SIFT descriptors extracted from the images are of dimension 128.
Determination of the Parameters
In the experiments, one parameter should be set manually, i.e., the sparsity level K, which is the number of iterations in Algorithm 1. At each iteration step in the atom selection process, one set of atoms is selected to represent the query descriptors. Therefore, as K increases, the representation of most of the query descriptors becomes more accurate. To ascertain the relationship between the recognition performance and the sparsity level K, we examined different values of K on the ORL database and evaluated the resulting performance in terms of accuracy. The curve is depicted in Fig. 3, which shows that when K is greater than 7, the recognition accuracy is stable, i.e., the approximation is adequate. Therefore, K is set to 7, which has proven to also be suitable for the other databases.
Partial Face Recognition with an Arbitrary Patch
This experiment is conducted on the Yale database, which consists of 165 frontal face images of 15 subjects. Two, three, and four images per subject are randomly selected as samples. From each of the remaining images in the three settings, one patch of random size h × w at a random position is cropped as a query, where h and w are randomly selected from (100, 160) and (80, 110), respectively. The queries are all partial faces. Some examples of the sample and query images are shown in Fig. 4.
For each experimental setting, we use 10 random splits of the data. The average results are presented in Table 1. Because the original SRC method is not applicable in this scenario, the other five algorithms are compared. The descriptors extracted from a partial query face are relatively few, so the classification information is limited; thus, it is necessary to join all of the query descriptors through their correlation. The results in Table 1 show that the WJSRC method achieves the highest recognition rate in all three settings, which indicates the validity and advantage of the proposed method in scenarios with incomplete classification information.
The average recognition performance of the partial faces.
Face Recognition with Pose and Expression Variation
This experiment is conducted on the ORL database, which contains 400 images of 40 subjects with different expressions, frontal poses, and slight scale variations. We randomly selected two, three, four, and five images from each subject as samples and the remaining as queries. Some examples of the sample and query images are shown in Fig. 5. For each experimental setting, we use 10 random splits of the data in the experiment. The average results are presented in Table 2.
The average recognition performance on the ORL database.
In this experiment, the recognition rate of the proposed WJSRC method is outstanding. The original SRC method does not work well due to the alignment problem. Because the database exhibits large changes in pose and expression and the dictionary does not have sufficient samples, there are many unreliable query descriptors. Because the proposed method considers the query descriptors holistically and jointly represents all of the reliable ones, it achieves better performance than the other methods.
Holistic Face Recognition with Occlusion
This experiment is conducted on the AR database, which contains 120 subjects, including 65 males and 55 females. For each subject, 26 images were taken in two sessions, of which 14 images are nonoccluded and the remainder are occluded by various objects, such as scarves and sunglasses. Experiments are performed on the images of the two sessions separately. For each subject, we selected the nonoccluded face images in one session as the samples and the occluded face images in that session as the queries. Therefore, there are 840 samples and 720 queries in each experimental setting. All images were cropped to the same size, and no alignment was performed between the queries and the samples. Some examples of the samples and the queries are shown in Fig. 6.
The recognition performance is presented in Table 3. The performance of the proposed method is outstanding, and the MKD-SRC and CWS-SRC methods also work well. As is known, most of the SIFT descriptors extracted from the occluded region of a face image are not reliable. Comparison with the results of the JDSRC method shows that accounting for the reliability of the query descriptors is practical; thus, the calculation of the weights of the query descriptors is important. Analyzing the images misrecognized by WJSRC, we find that our method works poorly for face images containing too many descriptors, especially when most of those descriptors are unreliable. Our future work will focus on this issue.
The recognition performance on the AR database.
Experiment on session 1: 78%, 43%, 94.5%, 95%, 85.2%, 96.7%
Experiment on session 2: 81.3%, 46%, 97.3%, 97.3%, 87.7%, 98%
Conclusion and Future Work
In this work, a novel framework for robust alignment-free face recognition was proposed. The approach studies the reliability of the query descriptors holistically and utilizes the correlation among them. We demonstrated promising experimental results on images of partial faces, occluded faces, and faces with variations due to different poses and expressions. Comparison of the proposed algorithm with the related algorithms indicated that the proposed method is more robust for alignment-free scenarios. Meanwhile, some methods may be used to improve the robustness, such as optimizing the approach of weight calculation, which will be studied in the future.
This work was supported by the Fundamental Research Funds for the Central Universities (2014KJJCA15), the State Key Laboratory of Acoustics, Chinese Academy of Sciences (SKLA201304), the National Natural Science Foundation of China (61431004), and the Fundamental Research Funds for the Central Universities (2013NT55). We thank Prof. Xiaoming Zhu for helping us revise the organizational structure and grammar issues of the paper.
X. D. Jiang, "Asymmetric principal component and discriminant analyses for pattern classification," IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 931–937 (2009). http://dx.doi.org/10.1109/TPAMI.2008.258
B. Heisele, P. Ho, and T. Poggio, "Face recognition with support vector machines: global versus component-based approach," in Proc. Eighth IEEE Int. Conf. on Computer Vision (ICCV 2001), Vol. 2, pp. 688–694 (2001). http://dx.doi.org/10.1109/ICCV.2001.937693
I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 2106–2112 (2010). http://dx.doi.org/10.1109/TPAMI.2010.128
J. Wright et al., "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009). http://dx.doi.org/10.1109/TPAMI.2008.79
R. He et al., "Two-stage nonnegative sparse representation for large-scale face recognition," IEEE Trans. Neural Netw. Learn. Syst. 24(1), 35–46 (2013). http://dx.doi.org/10.1109/TNNLS.2012.2226471
J. Huang, X. Huang, and D. Metaxas, "Simultaneous image transformation and sparse representation recovery," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1–8 (2008). http://dx.doi.org/10.1109/CVPR.2008.4587640
R. Khorsandi and M. Abdel-Mottaleb, "Gender classification using 2-D ear images and sparse representation," in IEEE Workshop on Applications of Computer Vision (WACV 2013), pp. 461–466 (2013). http://dx.doi.org/10.1109/WACV.2013.6475055
I. Ramirez, P. Sprechmann, and G. Sapiro, "Classification and clustering via dictionary learning with structured incoherence and shared features," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2010), pp. 3501–3508 (2010). http://dx.doi.org/10.1109/CVPR.2010.5539964
J. Yang, K. Yu, and T. Huang, "Supervised translation-invariant sparse coding," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2010), pp. 3517–3524 (2010). http://dx.doi.org/10.1109/CVPR.2010.5539958
H. Cao et al., "Classification of multicolor fluorescence in situ hybridization (M-FISH) images with sparse representation," IEEE Trans. Nanobiosci. 11(2), 111–118 (2012). http://dx.doi.org/10.1109/TNB.2012.2189414
Y. Li and A. Ngom, "Fast sparse representation approaches for the classification of high-dimensional biological data," in IEEE Int. Conf. on Bioinformatics and Biomedicine, pp. 1–6 (2012). http://dx.doi.org/10.1109/BIBM.2012.6392688
A. Julazadeh, J. Alirezaie, and P. Babyn, "A novel automated approach for segmenting lateral ventricle in MR images of the brain using sparse representation classification and dictionary learning," in 11th Int. Conf. on Information Science, Signal Processing and their Applications, pp. 888–893 (2012). http://dx.doi.org/10.1109/ISSPA.2012.6310680
M. Xu et al., "Tumor classification via sparse representation based on metasample," in 2nd Int. Symp. on Knowledge Acquisition and Modeling, pp. 31–34 (2009). http://dx.doi.org/10.1109/KAM.2009.310
J. Lai and X. Jiang, "Modular weighted global sparse representation for robust face recognition," IEEE Signal Process. Lett. 19(9), 571–574 (2012). http://dx.doi.org/10.1109/LSP.2012.2207112
Y. Chen, T. Do, and T. Tran, "Robust face recognition using locally adaptive sparse representation," in 17th IEEE Int. Conf. on Image Processing (ICIP), pp. 1657–1660 (2010). http://dx.doi.org/10.1109/ICIP.2010.5652203
A. Wagner et al., "Toward a practical face recognition system: robust registration and illumination by sparse representation," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2009), pp. 597–604 (2009). http://dx.doi.org/10.1109/CVPR.2009.5206654
M. Cox et al., "Least squares congealing for unsupervised alignment of images," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1–8 (2008). http://dx.doi.org/10.1109/CVPR.2008.4587573
E. Learned-Miller, "Data driven image models through continuous joint alignment," IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 236–250 (2006). http://dx.doi.org/10.1109/TPAMI.2006.34
D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision 60, 91–110 (2004). http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94
S. Liao and A. K. Jain, "Partial face recognition: an alignment free approach," in Int. Joint Conf. on Biometrics (IJCB 2011), pp. 1–8 (2011). http://dx.doi.org/10.1109/IJCB.2011.6117573
B. Sun, F. Xu, and J. He, "Clustering-weighted SIFT-based classification method via sparse representation," J. Electron. Imaging 23(4), 043007 (2014). http://dx.doi.org/10.1117/1.JEI.23.4.043007
P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997). http://dx.doi.org/10.1109/34.598228
F. S. Samaria and A. C. Harter, "Parameterisation of a stochastic model for human face identification," in Proc. Second IEEE Workshop on Applications of Computer Vision, pp. 138–142 (1994). http://dx.doi.org/10.1109/ACV.1994.341300
A. Martinez and R. Benavente, "The AR face database," CVC Technical Report (1998).
X.-T. Yuan, X. Liu, and S. Yan, "Visual classification with multitask joint sparse representation," IEEE Trans. Image Process. 21(10), 4349–4360 (2012). http://dx.doi.org/10.1109/TIP.2012.2205006
H. Nam et al., "Robust multi-sensor classification via joint sparse representation," in Proc. 14th Int. Conf. on Information Fusion (FUSION 2011), pp. 1–8 (2011).
S. Shekhar et al., "Joint sparse representation for robust multimodal biometrics recognition," IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 113–126 (2014). http://dx.doi.org/10.1109/TPAMI.2013.109
H. Zhang et al., "Multi-observation visual recognition via joint dynamic sparse representation," in IEEE Int. Conf. on Computer Vision (ICCV 2011), pp. 595–602 (2011). http://dx.doi.org/10.1109/ICCV.2011.6126293
H. Zhang et al., "Multi-view face recognition via joint dynamic sparse representation," in 18th IEEE Int. Conf. on Image Processing (ICIP 2011), pp. 3025–3028 (2011). http://dx.doi.org/10.1109/ICIP.2011.6116301
H. Zhang et al., "Multi-view automatic target recognition using joint sparse representation," IEEE Trans. Aerosp. Electron. Syst. 48(3), 2481–2497 (2012). http://dx.doi.org/10.1109/TAES.2012.6237604
Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Conf. Record of the 27th Asilomar Conf. on Signals, Systems and Computers, Vol. 1, pp. 40–44 (1993). http://dx.doi.org/10.1109/ACSSC.1993.342465
Bo Sun received a BSc in computer science from Beihang University, China, and MSc and PhD degrees from Beijing Normal University, China. He is currently a professor in the Department of Computer Science and Technology at Beijing Normal University. His research interests include pattern recognition, natural language processing, and information systems. He is a member of ACM and a senior member of the China Society of Image and Graphics.
Feng Xu received a BSc in electronic science and technology from Beijing Normal University in 2009. He is currently working toward the MSc degree in computer application technology at Beijing Normal University. His research interests include pattern recognition and signal processing.
Guoyan Zhou received a BSc in computer science and technology from Beijing Normal University in 2009. She is currently working toward the MSc degree in computer application technology at Beijing Normal University. Her research interests include signal processing.
Jun He received a BSc in optical engineering and a PhD in physical electronics from Beijing Institute of Technology, China, in 1998 and 2003, respectively. Since 2003, she has been with the College of Information Science and Technology of Beijing Normal University, China. She was elected as a lecturer and an assistant professor in 2003 and 2010, respectively. Her research interests include image processing application and pattern recognition.
Fengxiang Ge received the PhD degree in communication and information systems from Tsinghua University in 2003. From 2003 to 2005, he was a postdoctoral research associate at the University of Hong Kong. In 2005, he joined Intel Corporation as a senior researcher and an architect. In November 2011, he joined the College of Information Science and Technology, Beijing Normal University, China. His research interests include signal processing and its applications.