Weighted joint sparse representation-based classification method for robust alignment-free face recognition

Abstract. This work proposes a weighted joint sparse representation (WJSR)-based classification method for robust alignment-free face recognition, in which an image is represented by a set of scale-invariant feature transform descriptors. The proposed method considers both the correlation and the reliability of the query descriptors. The reliability is measured by the similarity between the query descriptors and the atoms in the dictionary, and it is incorporated into the $l_0 \backslash l_2$-norm minimization to seek the optimal WJSR. Experiments on benchmark face databases show that the proposed method outperforms the related state-of-the-art methods.


Introduction
Face recognition has received substantial attention for a long time. [2][3][4] Since Wright et al. presented the sparse representation-based classification (SRC) method, 5 it has been widely studied in many pattern recognition applications due to its promising results, such as face recognition, 6,7 along with gender, 8 digit, 9,10 biology data, 11,12 and medical image 13,14 classification. Although many improved SRC-based methods have been proposed for robust face recognition, [15][16][17][18][19] most of them require rigid image alignment, where all images of an object or objects of interest are aligned to a fixed canonical template. Until now, much work has been performed to address the alignment problem. 20,21 However, such alignment is still difficult to achieve in real scenarios, such as face recognition with partial faces, scale variation, or pose variation. To address the alignment problem in SRC, some methods 22,23,24 introduced the scale-invariant feature transform (SIFT) 22 or the speeded-up robust features (SURF) 25 descriptor to the recognition method. However, most of these methods pay little attention to the correlation among the query descriptors, which is found to be useful for classification. Thus, it is necessary to study a method that exploits the correlation of the query descriptors for robust alignment-free face recognition, which is the focus of this paper.
Supposing that one image is represented with a set of SIFT descriptors, 22 which are robust to scale variation and rotation, the SIFT-based method can solve the problem of alignment. The simple matching method 22 obtains the identification for each query descriptor separately according to the best match. By voting with the separate results, the final identification is determined. Rather than matching, Liao and Jain presented a multikeypoint descriptors-based SRC (MKD-SRC) method. 23 The method solves the sparse representation (SR) problem for each query descriptor separately and determines the image identification using all of the reconstruction residuals. By exploring the discrimination of the atoms in a SIFT dictionary, Sun et al. proposed a clustering-weighted SIFT-based classification method via SR 24 and obtained better robustness for alignment-free face recognition with sufficient samples.
After analyzing these methods, we find that the above-described methods treat each query descriptor independently and equally. For each query descriptor, there may be some similar atoms distributed in different classes in the dictionary, which will influence the classification performance. 24 Therefore, if we solve the SR problem for each descriptor extracted from a query image separately, some false identities may be obtained, resulting in errors in the final image classification. As a result, it is beneficial to solve the SR problem simultaneously for all query descriptors by exploiting their correlation. To handle this problem, the concept of the joint sparse representation (JSR) is introduced.
In this paper, we propose a weighted joint sparse representation-based classification (WJSRC) method. There are three contributions in our work. (1) To explore the correlation among query descriptors, the concept of JSR is introduced. (2) Considering the reliability of the query descriptors, a modified JSR model with a weighted sparsity constraint is introduced. (3) A WJSRC algorithm is proposed to solve the modified model. Because the proposed method studies the correlation among the query descriptors and their own reliability, the performance of the alignment-free recognition is improved.
The remainder of this paper is organized as follows. In Sec. 2, we review the JSR algorithm. Section 3 proposes the WJSRC method. The experimental results using the proposed method on the Yale database, 26 the Olivetti Research Laboratory database, 27 and the AR database 28 (a public face database created by Aleix Martinez and Robert Benavente) are described in Sec. 4. The conclusions are presented in Sec. 5.
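The matching-and-voting baseline described above can be illustrated with a minimal sketch. It assumes cosine similarity as the matching score and a plain majority vote; the function name `vote_identity` and the array layout (descriptors as columns) are hypothetical choices for illustration, and the ratio test commonly used with SIFT matching is omitted.

```python
import numpy as np

def vote_identity(Y, D, labels):
    """Nearest-atom matching followed by majority voting (illustrative only).

    Y: (128, m) query descriptors as columns.
    D: (128, N) dictionary atoms as columns.
    labels: (N,) integer class id of each atom.
    """
    # Normalize columns so that the inner product is a cosine similarity (assumption).
    Yn = Y / np.linalg.norm(Y, axis=0, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    sim = Dn.T @ Yn                    # (N, m) similarity of every atom to every query
    best = np.argmax(sim, axis=0)      # best-matching atom for each query descriptor
    votes = np.bincount(labels[best], minlength=int(labels.max()) + 1)
    return int(np.argmax(votes))       # class that collects the most votes
```

Each query descriptor votes independently here, which is exactly the behavior that the joint representation discussed later is meant to improve on.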

Joint Sparse Representation
The original SRC method 5 solves the SR problem for query descriptors separately. To explore the correlation among the query descriptors, the JSR is introduced.
As far as we know, there are two types of JSR methods.
(1) The first group of JSR methods utilizes multiple types of features and multiple dictionaries. [30][31] If we extract the shape, color, and texture features from a face image, which are different from each other, it is necessary to construct a single dictionary for each type of feature. Thus, three dictionaries are obtained. For a test image, multiple types of query descriptors should be extracted, each of which can be sparsely represented only by its corresponding dictionary. However, the SR for all query descriptors should be performed under the JSR constraint. The workflow is shown in Fig. 1(a).
(2) The second group of JSR methods utilizes multiple keypoint features and a single dictionary, [32][33][34] whose workflow is shown in Fig. 1(b). This method supposes that multiview sample images for an object are obtained, and the queries of the object are also multiview images. With the same type of features, a single dictionary is constructed. Because the query images are similar, just one atom is selected from the dictionary to represent them at each iteration step of the atom selection process. After adequate iteration, one set of features from the dictionary can be selected to represent all query images simultaneously under the joint representation constraint. 34 Thus, the sparse coefficients share the same sparsity pattern at the atom level, 34 but the coefficient values are different, which is illustrated in Fig. 2(b). This method is different from the original SRC, 5 which solves the SR problem for each view image separately. In the SRC method, both the sparsity patterns and the values of the sparse coefficients differ from each other, as depicted in Fig. 2(a). In real scenarios, the multiview images of an object may not be well represented by the same features. In Refs. 32 and 33, Zhang et al. optimized it by proposing a joint dynamic sparse representation (JDSR) method, which chooses different features from the same class to represent each view image at each iteration step of the atom selection process. The sparse coefficients share the same sparsity pattern at the class level, but not at the atom level, which is depicted in Fig. 2(c).
Although the problem of face recognition with SIFT descriptors belongs to the second type, the existing methods cannot solve it perfectly for three reasons. (1) The query descriptors are quite different from each other; for example, the descriptors extracted from an eye are different from those of a mouth, which clearly differs from the characteristic of the query features in Refs. 32 to 34. (2) The number of query descriptors is often large, typically in the hundreds; as a result, the query descriptors are challenging to join. (3) Not all of the query descriptors contain correct identity information in practice. For example, the descriptors extracted from an occluded region cannot be treated equally to those from a clear region. The unreliable descriptors will mislead the JSR, which has been verified by experiments in Ref. 32. As a result, a more robust method is required to solve our problem.

Proposed Method
Given samples collected from $c$ classes, the SIFT descriptors extracted from the samples of the k'th class construct the k'th sub-dictionary $D_k = [d_1, d_2, \ldots, d_{N_k}]$, where $N_k$ denotes the number of descriptors of the k'th class, and $d \in \mathbb{R}^{128 \times 1}$ is a 128-dimensional SIFT descriptor. All of the sub-dictionaries are pooled together to construct the dictionary $D = [D_1, D_2, \ldots, D_c] = [d_1, d_2, \ldots, d_N]$, where $N = \sum_{k=1}^{c} N_k$ denotes the number of descriptors in all classes. Given a query image $Y$, it can be represented by a set of SIFT descriptors, i.e., $Y = [y_1, y_2, \ldots, y_m]$, where $y_i$ is called a query descriptor.
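As a minimal sketch of this construction, the sub-dictionaries can be pooled by horizontal concatenation while a label vector records the class of every atom. The function name `build_dictionary`, the label vector, and the unit-norm scaling of the columns are assumptions made for illustration; the text above does not specify them.

```python
import numpy as np

def build_dictionary(sub_dicts):
    """Pool per-class sub-dictionaries D_k (each 128 x N_k) into D (128 x N).

    Returns the pooled dictionary and an (N,) vector giving the class of each atom.
    """
    D = np.hstack(sub_dicts)
    labels = np.concatenate(
        [np.full(Dk.shape[1], k) for k, Dk in enumerate(sub_dicts)]
    )
    # Scale every atom to unit l2-norm (an assumption, common in SR pipelines).
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    return D, labels
```

Keeping the label vector alongside $D$ makes the later class-level operations (selecting atoms of one class, or zeroing coefficients of all other classes) a simple boolean mask.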

WJSR Model
If a query image belongs to one of the given classes, then the query descriptors extracted from it can be well represented by the descriptors of the corresponding class. Because the SIFT descriptors are discriminative (for example, the descriptors of an eye are different from those of a mouth), they should be represented by different atoms of the same class, i.e., the sparse coefficients share the same sparsity pattern at the class level 32,33 but not at the atom level. 34 As mentioned above, the query descriptors should not be treated equally. Thus, we propose a WJSR model, given by Eq. (1), where $\|\cdot\|_0$ and $\|\cdot\|_2$ denote the $l_0$-norm and $l_2$-norm, respectively, $K$ is the sparsity, and $w_i$ is the weight for the classification reliability of the i'th query descriptor. Let $X$ denote the coefficient matrix. The vector $x_{gk} = [x'_{1k}, x'_{2k}, \ldots, x'_{mk}]$ collects the nonzero coefficients of the k'th selected set of atoms, all of which belong to the same class, and $X_g = [\|x_{g1}\|_2, \|x_{g2}\|_2, \ldots]^T$ is a constraint term that makes the sparse coefficients conform to the weighted joint sparsity pattern. In Eq. (1), $w_i$ is not only a measurement of the reliability of the i'th query descriptor but also a balance factor for the residuals. To minimize the overall residual, the residuals of the query descriptors with larger weights must be made smaller, i.e., the more reliable descriptors must be approximated more accurately because they contain the correct classification information. Thus, in Algorithm 1, the reliable query descriptors lead the atom selection.
Algorithm 1 The weighted joint sparse representation-based classification (WJSRC).
1. Input: multi query-descriptor matrix $Y$, dictionary $D$, weight vector $W$, sparsity level $K$, the number of query vectors $m$ and classes $T$, and the residual threshold $r_0$.
2. Initialize: $R \leftarrow Y$, $I \leftarrow \emptyset$, $k \leftarrow 0$.

For classification, not all query descriptors contribute equally. In this paper, we measure the importance of each query descriptor by the similarity $c_i$ between the query descriptor $y_i$ and the dictionary $D$, i.e., $c_i = \max(y_i^T \cdot D)$, and then the weight of $y_i$ is defined by Eq. (2), where $\bar{c} = (\sum_{i=1}^{m} c_i)/m$ is the mean similarity, and $c_0 = \min\{c_i\}_{i=1,\ldots,m}$ is the least similarity value. Then we can construct the weight vector as $W = [w_1, w_2, \ldots, w_m]$.
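Read together with the definitions above, Eq. (1) can be sketched in the following form. This is an assumption rather than a quotation: the use of $x_i$ for the i'th column of $X$, the squared residual norm, and the inequality form of the sparsity constraint are all guesses consistent with the surrounding text (a weighted sum of per-descriptor residuals, with an $l_0$ constraint on the class-level term $X_g$).

```latex
\hat{X} = \arg\min_{X} \sum_{i=1}^{m} w_i \,\| y_i - D x_i \|_2^2
\quad \text{s.t.} \quad \| X_g \|_0 \le K
\tag{1}
```

Under this reading, a larger $w_i$ penalizes the residual of $y_i$ more heavily, which matches the statement that reliable descriptors must be approximated more accurately.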
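The similarity score $c_i$ can be computed directly from its definition, as the sketch below shows. The mapping from $c_i$ to the weight $w_i$ is an assumption: the normalization $(c_i - c_0)/(\bar{c} - c_0)$ is only one plausible formula built from the quantities $\bar{c}$ and $c_0$ named in the text, and the names `similarity_scores`, `weights_from_similarity`, and `eps` are hypothetical.

```python
import numpy as np

def similarity_scores(Y, D):
    """c_i = max_j (y_i^T d_j): best similarity of each query descriptor to any atom.

    Y: (n, m) query descriptors as columns; D: (n, N) dictionary atoms as columns.
    """
    return np.max(Y.T @ D, axis=1)

def weights_from_similarity(c, eps=1e-12):
    """Hypothetical weight: (c_i - c_0) / (c_bar - c_0).

    c_bar is the mean similarity and c_0 the least similarity, as in the text;
    the way they are combined here is an assumption, not the paper's Eq. (2).
    """
    c_bar = c.mean()
    c_0 = c.min()
    return (c - c_0) / (c_bar - c_0 + eps)  # eps guards against c_bar == c_0
```

With this choice the least similar descriptor receives zero weight and descriptors above the mean similarity receive weights above one, so more reliable descriptors dominate the weighted residual.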

Solving the WJSR problem
Solving Eq. (1) is an NP-hard problem due to the $l_0 \backslash l_2$ mixed-norm minimization with a weighted joint sparsity constraint. In this paper, we propose a greedy algorithm, i.e., the WJSRC algorithm, to solve this problem, which is described in detail in Algorithm 1. The algorithm is similar to the orthogonal matching pursuit algorithm, 35 with a major difference in the atom selection criterion. In the WJSRC algorithm, the most relevant set of atoms belonging to the same class is selected at each iteration step in the atom selection process. To minimize the overall residual, we propose a weighted atom selection criterion, so that the selection is automatically led by the descriptors with larger weights.
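The following is a simplified sketch of a weighted, class-level greedy pursuit in the spirit described above; it is not a transcription of Algorithm 1. It assumes that, at each iteration, every query descriptor picks its most correlated atom within each class, that a class is scored by the weighted sum of those correlations, and that coefficients are refit by least squares as in orthogonal matching pursuit. The function name `wjsr_greedy` and the stopping rule on the weighted residual are hypothetical.

```python
import numpy as np

def wjsr_greedy(Y, D, labels, w, K, r0=1e-6):
    """Illustrative weighted class-level greedy pursuit.

    Y: (n, m) queries, D: (n, N) atoms, labels: (N,) class ids,
    w: (m,) descriptor weights, K: number of iterations, r0: residual threshold.
    Returns X: (N, m) sparse coefficient matrix.
    """
    m = Y.shape[1]
    N = D.shape[1]
    classes = np.unique(labels)
    R = Y.astype(float)                      # residuals, initialized to the queries
    supports = [[] for _ in range(m)]        # selected atom indices per descriptor
    X = np.zeros((N, m))
    cols = np.arange(m)
    for _ in range(K):
        corr = np.abs(D.T @ R)               # (N, m) atom-residual correlations
        best_score, best_atoms = -np.inf, None
        for c in classes:
            idx = np.flatnonzero(labels == c)
            atoms = idx[np.argmax(corr[idx], axis=0)]   # best atom of class c per query
            score = np.sum(w * corr[atoms, cols])       # weighted class score
            if score > best_score:
                best_score, best_atoms = score, atoms
        for i in range(m):
            a = int(best_atoms[i])
            if a not in supports[i]:
                supports[i].append(a)
            S = supports[i]
            # Least-squares refit on the current support, as in OMP.
            coef, *_ = np.linalg.lstsq(D[:, S], Y[:, i], rcond=None)
            X[:, i] = 0.0
            X[S, i] = coef
            R[:, i] = Y[:, i] - D[:, S] @ coef
        if np.sum(w * np.linalg.norm(R, axis=0)) < r0:
            break
    return X
```

Because the class score is weighted by `w`, descriptors with large weights dominate which class is selected, while each descriptor is still free to use a different atom inside that class.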

Determining the identity of the query image
The identity of the image $Y$ is determined by combining the residuals of all of the query descriptors, as given by Eq. (3), where $\delta_c(\cdot)$ is a function that selects the coefficients belonging to the c'th class.
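A residual-based decision rule of this kind can be sketched as follows, assuming the standard SRC form in which $\delta_c$ zeroes all coefficients outside class $c$ and the class with the smallest summed $l_2$ residual is chosen. Whether the residuals are weighted in Eq. (3) is not stated in the text, so the optional `w` argument is an assumption, as is the function name `classify_by_residual`.

```python
import numpy as np

def classify_by_residual(Y, D, X, labels, w=None):
    """Pick the class whose atoms best reconstruct all query descriptors.

    Y: (n, m) queries, D: (n, N) atoms, X: (N, m) coefficients,
    labels: (N,) class ids, w: optional (m,) weights (assumption).
    """
    classes = np.unique(labels)
    if w is None:
        w = np.ones(Y.shape[1])
    totals = []
    for c in classes:
        # delta_c: keep coefficients of class c, zero out all others.
        Xc = np.where((labels == c)[:, None], X, 0.0)
        residuals = np.linalg.norm(Y - D @ Xc, axis=0)   # one residual per descriptor
        totals.append(np.sum(w * residuals))
    return int(classes[int(np.argmin(totals))])
```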

Summary
The proposed WJSRC method is summarized as follows:
i) Extract the SIFT descriptors from the sample and query images to construct the dictionary $D$ and the multi query-descriptor matrix $Y$, respectively.
ii) Calculate the weight for each query descriptor using Eq. (2) to form the weight vector $W$.
iii) Solve the WJSR for all of the query descriptors using Algorithm 1 to obtain the sparse coefficient matrix $X$.
iv) Determine the identity of the query image using Eq. (3).

Experiments
In this section, we present the performance of the proposed method on three public databases: (1) the Yale database, 26 (2) the Olivetti Research Laboratory (ORL) database, 27 and (3) the AR database. 28 We focus on three scenarios of alignment-free face recognition: (1) arbitrary patches of holistic faces; (2) faces with arbitrary pose and expression variations; and (3) faces with occlusions. A performance comparison among the related methods, namely the SIFT matching, 22 MKD-SRC, 23 CWS-SRC, 24 JDSRC, 33 and the original SRC methods, 5 is conducted. The three experiments are performed on gray images. The SIFT descriptors extracted from images are of dimension 128.

Determination of the Parameters
In the experiments, one parameter should be set manually, i.e., the sparsity $K$, which is the number of iterations in Algorithm 1. At each iteration step in the atom selection process, one set of descriptors is selected to represent the query descriptors. Therefore, as $K$ increases, the representation of most of the query descriptors becomes more accurate. To ascertain the relationship between the recognition performance and the sparsity $K$, we examined different values of $K$ on the ORL database and evaluated the resulting performance in terms of accuracy. The curve is depicted in Fig. 3, which shows that when $K$ is greater than 7, the recognition accuracy is stable, i.e., the approximation is adequate. Therefore, $K$ is set to 7, which was also found to be suitable for the other databases.

In the partial face experiment, we use 10 random splits of the data for each experimental setting. The average results are presented in Table 1. Because the original SRC method is not suitable in this scenario, the other five algorithms are compared. The descriptors extracted from a partial query face are relatively few, and the classification information is limited. Thus, it is necessary to join all of the query descriptors by their correlation. The results in Table 1 show that the WJSRC method achieves the highest recognition rate in the three settings, which indicates the validity and advantage of the proposed method in the scenario of incomplete classification information.
Fig. 3 The relationship between the recognition accuracy and the sparsity K.

Face Recognition with Pose and Expression Variation
This experiment is conducted on the ORL database, which contains 400 images of 40 subjects with different expressions, frontal poses, and slight scale variations. We randomly selected two, three, four, and five images from each subject as samples and the remaining images as queries. Some examples of the sample and query images are shown in Fig. 5. For each experimental setting, we use 10 random splits of the data in the experiment. The average results are presented in Table 2.
In this experiment, the recognition rate of the proposed WJSRC method is found to be outstanding. The original SRC method does not work well due to the alignment problem. As the database exhibits great changes in pose and expression and the dictionary does not have sufficient samples, there are many unreliable query descriptors. Because the proposed method considers the query descriptors holistically and joins all of the reliable ones, it achieves a better performance than the others.
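The random sample/query split used in this protocol can be sketched as below. The function name `split_per_subject` and the use of a seeded NumPy generator are assumptions for illustration; the text only states that a fixed number of images per subject is drawn at random as samples and that 10 random splits are averaged.

```python
import numpy as np

def split_per_subject(subject_ids, n_samples, rng):
    """Randomly pick n_samples images per subject as samples; the rest are queries.

    subject_ids: (num_images,) subject id of each image.
    Returns two index arrays: sample indices and query indices.
    """
    subject_ids = np.asarray(subject_ids)
    sample_idx, query_idx = [], []
    for s in np.unique(subject_ids):
        idx = rng.permutation(np.flatnonzero(subject_ids == s))
        sample_idx.extend(idx[:n_samples].tolist())
        query_idx.extend(idx[n_samples:].tolist())
    return np.array(sample_idx), np.array(query_idx)

# Example: 40 subjects with 10 images each (400 images), 3 samples per subject,
# repeated over 10 random splits as in the experimental protocol.
ids = np.repeat(np.arange(40), 10)
splits = [split_per_subject(ids, 3, np.random.default_rng(seed)) for seed in range(10)]
```

A recognition rate would then be computed on each split and the 10 values averaged.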

Holistic Face Recognition with Occlusion
This experiment is conducted on the AR database. The AR database contains 120 subjects, including 65 males and 55 females. For each subject, 26 images were taken in two sessions, of which 14 images are nonoccluded and the remainder are occluded by various objects, such as scarves and sunglasses. Experiments are performed on the images of two separate sessions. We selected the nonoccluded face images in one session as the samples for each subject. The remaining occluded face images in that session are selected as the queries. Therefore, with 7 nonoccluded and 6 occluded images per subject in each session, there are 840 samples and 720 queries in each experimental setting. All images were cropped to 128 × 170 pixels. No alignment was performed between the queries and the samples. Some examples of the samples and the queries are shown in Fig. 6.
The recognition performance is presented in Table 3. The performance of the proposed method is found to be outstanding. The MKD-SRC and CWS-SRC methods also work well. As is known, most of the SIFT descriptors extracted from the occluded region of a face image are not reliable. Compared with the results of the JDSRC method, we can see that considering the reliability of the query descriptors is beneficial in practice. Thus, the calculation of the weight of the query descriptors is important. Analyzing the images misrecognized by WJSRC, we find that our method works poorly for face images containing too many descriptors, especially when most of these descriptors are unreliable. Our future work will focus on this issue of face images with too many descriptors.

Conclusion and Future Work
In this work, a novel framework for robust alignment-free face recognition was proposed. The approach studies the reliability of the query descriptors holistically and utilizes the correlation among them. We demonstrated promising experimental results on images of partial faces, occluded faces, and faces with variations due to different poses and expressions. Comparison of the proposed algorithm with the related algorithms indicated that the proposed method is more robust for alignment-free scenarios. Meanwhile, some methods may be used to improve the robustness, such as optimizing the approach of weight calculation, which will be studied in the future.

Fig. 1 The workflows of the two types of joint sparse representation methods. (a) The workflow of the multiple types of features and dictionaries-based JSR method. (b) The workflow of the multiple features and single dictionary-based JSR method.
Fig. 2 The sparsity pattern of multiple task sparse representation, 23 joint sparse representation, 34 and joint dynamic sparse representation. 33 Each column vector denotes a coefficient vector and each block denotes a coefficient value. The white block denotes a zero value and others denote different nonzero values. (a) Multiple task sparse representation. It solves the SR problem for each query feature separately. The coefficient sparsity and value of each query feature may be different. (b) Joint sparse representation. Sparse coefficients share the same sparsity pattern at the atom level, i.e., selecting the same atoms for all query vectors simultaneously, but with different coefficient values. (c) Joint dynamic sparse representation. The atoms on the same arrow line represent one set of features selected at each iteration step of the atom selection process. From one iteration to the next iteration, the algorithm keeps the existing atoms in the set and tries to find the next best atoms to add to the set. Sparse coefficients share the same sparsity pattern at the class level.

Partial Face Recognition with an Arbitrary Patch
This experiment is conducted on the Yale database, which consists of 165 frontal face images of 15 subjects with an image size of 170 × 230. Two, three, and four images (per subject) are randomly selected as samples. From each of the remaining images in the three settings, one patch of random size h × w at a random position is cropped as a query, where h and w are randomly selected from (100,160) and (80,110), respectively. The queries are all partial faces. Some examples of the sample and query images are shown in Fig. 4.
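The random patch cropping used to generate partial-face queries can be sketched as follows. The sketch assumes that the 170 × 230 image is stored as an array with 230 rows and 170 columns, that the ranges for h and w include both endpoints, and that the patch position is drawn uniformly; the function name `random_patch` is hypothetical.

```python
import numpy as np

def random_patch(image, rng, h_range=(100, 160), w_range=(80, 110)):
    """Crop one patch of random size h x w at a random position.

    image: 2-D array (rows x columns); rng: NumPy random generator.
    The ranges are treated as inclusive at both ends (assumption).
    """
    H, W = image.shape[:2]
    h = int(rng.integers(h_range[0], h_range[1] + 1))
    w = int(rng.integers(w_range[0], w_range[1] + 1))
    top = int(rng.integers(0, H - h + 1))     # any position that keeps the patch inside
    left = int(rng.integers(0, W - w + 1))
    return image[top:top + h, left:left + w]
```

Because the patch size and position are both random, no alignment between the query patch and the sample images can be assumed, which is the point of this experiment.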

Fig. 4 Some examples of the sample and partial query images. (a) The examples of partial query images. (b) The examples of the sample images.

Fig. 5 Some examples of the sample and query images. (a) The examples of the sample images. (b) The examples of the query images.

Fig. 6 Some examples of the sample and query images. (a) The examples of the sample images. (b) The examples of the query images.

Table 1 The average recognition performance of the partial faces.

Table 2 The average recognition performance on the ORL database.