Sparse representation (SR)1,2 has become a hot topic in recent years. SR considers a query signal as a linear representation of the columns in , i.e., , where is the dictionary (each column in is typically referred to as an atom), is a sparse representation coefficient vector over the dictionary , and denotes the noise. In Ref. 3, Wright et al. presented a new method sparse representation-based classification (SRC), which achieved high recognition accuracy on face recognition. Due to this approach’s promising performance in image classification, SRC has been widely used in many pattern recognition applications, such as face recognition,4,5 gender,6 digit,7,8 biology data,9,10 and medical image11,12 classification.
For robustness, many methods have been improved and presented. For handling contiguously occluded face recognition, such as disguise or expression variation, a modular weighted global sparse representation method was proposed in Ref. 13, which divided the image into modules and determined the reliability of each module based on its sparsity and residual. Next, a reconstructed image from the modules weighted by their reliability is formed for robust recognition. To obtain rotation and scale invariance, in Ref. 14, the authors constructed a dictionary based on a large number of vehicle images captured at different angles and distances, which made the dictionary large scale and the method time consuming. In Ref. 15, a practical face recognition system was presented, which gained robustness for registration and illumination by minimizing the sparsity of the registration error and capturing a sufficient set of training illuminations for linearly interpolating practical lighting conditions, respectively. In Ref. 16, the authors presented a block-based face-recognition algorithm, which is based on a sparse linear-regression subspace model via a locally adaptive dictionary constructed from the past observable data (i.e., training samples). Though it obtained a high recognition rate, prealignment and a certain scale were always required, i.e., those methods are more suitable for applications in constrained environments. To handle the problem of alignment, in Ref. 17 the authors introduced SIFT descriptors18 to the SRC framework, and proposed multikeypoint descriptors SRC (MKD-SRC) method, which has achieved preliminary success on both holistic and partial face recognition. Additionally, modified MKD-SRC has been proposed based on the Gabor Ternary pattern (GTP) descriptors in Ref. 19. Those two methods may be affiliated to a feature-based SRC method, which has shown good robustness for alignment and affine transform and thus may extend the application of SRC. Obviously, a feature-based dictionary is the core, and it may contain considerable useful information for recognition, which may be omitted with present methods.
Although several researchers who focus on SRC have paid attention to the similarity of atoms,2021.–22 they only use it to optimize the dictionary rather than to improve the recognition rate. For example, in Ref. 22, the authors presented an efficient face recognition algorithm based on the SRC using an adaptive K-means method, which clustered similar atoms of the same class and merged them into one atom while preserving the accuracy. Obviously, the method has not considered the similarity of the atoms belonging to different classes, which will affect the recognition performance.
In this paper, focusing on the scenario of disguises or partial targets and scale and illumination or expression variation without alignment, we propose a clustering-weighted SIFT descriptor-based SRC (CWS-SRC) method.
The remainder of this paper is organized as follows. Motivation for the proposed method is given in Sec. 2. Section 3 proposes the CWS-SRC method. The experimental results of the AR database,23 the Yale face database24 and a self-built car-models database are shown in Sec. 4. The conclusions and future research areas are presented in Sec. 5.
In this section, we first describe the principle of the MKD-SRC method.17 Given a set of sample images collected from different subjects, subdictionary can be constructed by pooling all of the descriptors extracted from the samples of each subject, and a gallery dictionary can be obtained . A probe image can be denoted with a set of SIFT descriptors, i.e., , where () is the ’th probe descriptor. Thus, the problem of recognition of is converted to the problem of solving a multitask -minimization problem:
With a SIFT descriptor-based dictionary, MKD-SRC17 has not only successfully resolved the problem of alignment, but also handled the affine transformation to some extent. Although several images or even one as samples per subject are sufficient for face recognition with the MKD-SRC method,17 this approach may not always work well for a general three-dimensional (3-D) target, which may be due to different application requirements. For frontal face recognition, a few (even one) samples are sufficient. For a general 3-D target, more sample images are necessary for recognizing an image in an arbitrary view. For example, for vehicle recognition, rotation invariance is important and many more vehicle images taken from different angles are crucial.14 Those are often similar. In such scenarios, there will be more similar SIFT descriptors. For convenience, similar descriptors in the dictionary are called similar subsets. They will influence the sparse representation result of the orthogonal matching pursuit (OMP) algorithm.25 The reason for that will be deduced next.
It is known that with OMP, the sparsest linear combination of is obtained by calculating the correlation and projecting orthogonally, alternately, and iteratively. OMP selects the atom with the highest correlation to the current residual at each step. Once the atom is selected, the signal is orthogonally projected to the space spanned by the selected atoms. The residual is subsequently recomputed, and the process is repeated. Though the most correlated atom is selected in each iteration, the final linear combination of the atoms may not be the best representation for . It seems that such a SIFT descriptor-based dictionary is far from the requirement of the restricted isometry property (RIP),26,27 which is discussed in Ref. 28. However, the distribution of similar descriptors in classes can characterize their discrimination.29 Therefore, studying and utilizing the distribution of similar descriptors to improve recognition performance are beneficial.
As mentioned in Sec. 2, considerable discriminative information may be included in similar SIFT descriptors, which will affect the recognition rate. To tackle this problem, we propose a clustering-weighted SIFT descriptor-based SRC method in this paper.
Gallery Dictionary Construction
Extracting the SIFT descriptors
Given a set of sample images of different subjects, we extract the SIFT descriptors 18 from them and subsequently construct the following dictionary:
Clustering for each atom in A according to similarity
In this paper, the similarity is measured by the inner product . If it is greater than a threshold , atoms and are treated as similar. For each atom in the dictionary, we clustered atoms similar to it and pooled them together as a subset . Then, clustering subsets denoted as are obtained, where is the number of descriptors in the ’th subset.
Determining the Weight of the Atoms in Dictionary A
To resolve the multitask problem, we introduce a weighted-voting classifier in this paper. The primary challenge is how to assign the appropriate weight to each atom in the dictionary.
Relationship between the distribution of the similar atoms and their weight
After clustering, we obtain clustering subsets. Similar atoms in each subset may belong to either the same or different classes. The distribution of atoms will determine how discriminative the corresponding atom is in dictionary . Consider the extreme case. If the atoms in subset all belong to the ’th class, atom is the most representative and discriminative for the ’th class. In this instance, if a probe descriptor only matches this atom via the sparse representation, we can deduce that reliably it belongs to the ’th class. Otherwise, if similar atoms of a subset are distributed in many classes, a misjudgment is likely to occur.
Therefore, considering the distribution of similar atoms in a subset, we can infer (1) for sufficient samples, if the atoms of subset concentrate on the same class as , can be observed as common and representative for that class. The larger the quantity of the similar atoms in that belong to the same class as , the more important is. We call it intraclass similarity; (2) if a large percentage of similar atoms belong to a certain class, i.e., the distribution is more intensive, the corresponding atom can characterize the class more effectively, and the atom will have greater discrimination ability. On the contrary, if the distribution is dispersed, the discrimination ability of the corresponding atom is smaller. We refer to it as interclass discrimination.
The purpose of the weighted method is to find the common and representative atoms for each subject and attach a weight to them. The weight of one atom is determined by both its intraclass similarity and interclass discrimination, which will be presented next.
Given a clustering subset and the corresponding atom , according to , we will determine a quantity vector: , , where denotes the quantity of the atoms of the ’th class in the ’th subset . If there is no descriptor of the ’th class, does not include . We determine the weight of the atom by two factors: the intraclass similarity and the interclass discrimination.
Calculating the intraclass similarity
For the atom in , suppose it belongs to the ’th class, then its intraclass similarity is proportional to the quantity of the similar atoms belonging to the ’th class in , which is denoted as
Calculating the interclass discrimination
The interclass discrimination of the atoms is determined by the distribution of all similar atoms in the corresponding clustering subset. We adopt the following method to measure the interclass discrimination of atoms.
How does it stand for discrimination? We will examine this question briefly. For simplicity, in the following equations, the superscript or subscript for the ’th clustering subset is omitted; for example, is replaced with , replaces , etc. Thus, according to the definition of the norm, Eq. (5) can be written as
The average and variance of the elements in are defined as8) shows that is positively correlated to the variance of and negatively correlated to the average and the norm of , and its meaning can be highlighted with two extreme cases: (1) if similar atoms in all belong to the ’th class, i.e., , , , the corresponding atom is the most discriminative for the class; (2) if the atoms in are equally distributed among all classes, i.e., , , , is the least discriminative, and the discriminative power decreases as the number of classes increases. Thus, in a clustering subset, Eq. (5) shows the relationship between the distribution of the atoms over all classes and the interclass discrimination.
If there are SIFT descriptors detected for a probe image, we have
For , we have the following sparse representation by the gallery dictionary .
If belongs to some class, the nonzero coefficient in vector will be concentrated on that class, i.e., the value of that class in is larger.3 In Ref. 17, the authors demonstrated that the concentration of the sparse representation coefficient can determine the best matching class. Thus, we have the following weighted-voting function to determine the identity of the probe image
The proposed CWS-SRC method can be summarized as follows:
1. Extract the SIFT descriptors from the sample images and construct the dictionary denoted as Eq. (3).
2. Cluster by similarity and obtain clustering subsets.
4. Have the sparse representation of each SIFT descriptor detected in a probe image, and then obtain the identity of the probe image by taking the SRC result of each descriptor to the weighted-voting classifier using Eq. (13).
In this paper, three databases, i.e., the AR database,23 the Yale face database,24 and a self-obtained car-model database, are used for evaluation. A performance comparison among the proposed methods, the SIFT matching approach,18 the MKD-SRC method,17 and the original SRC algorithm3 (just on the occluded image experiment), is conducted. Three different scenarios are considered: (1) occluded face (AR), (2) enlarged arbitrary patch extracted from the holistic face (Yale face database), and (3) different scales and pitch angles of car-model recognition. Because the interclass discrimination and intraclass similarity are of primary importance for the proposed method, sufficient samples for the sparse representation dictionary are required. All experiments were performed on gray images. The SIFT descriptors extracted from images are of dimension 128. The weight of the atoms in CWS-SRC method is calculated offline. Therefore, the speed of the proposed algorithm is up to the scale of the dictionary.
Holistic Face Recognition with Occlusion
This experiment was conducted on the AR database. The AR database contains 120 subjects, including 65 males and 55 females. The images were captured in two different sessions, with different expressions and occlusions, such as sunglasses, scarf, and so on. For each subject, 26 images were taken, of which 14 images are nonoccluded. We randomly selected three images from the nonoccluded ones as samples and all occluded ones as probes. Thus, there were 360 face images in the sample set and 1440 images in the probe set. All images were cropped to . No alignment has been performed between the probes and the samples. Some examples of the sample and the probe are shown in Fig. 1.
To ascertain the relationship between the recognition performance and the similarity threshold , we examined different values of and evaluated the resulting performance in terms of accuracy. The curve is shown as Fig. 2. Therefore, we set the value of as 0.97, which has been proven to also be suitable for other databases, and may be set as an empirical value.
For recognition rate, we compared the proposed CWS-SRC method to the other three algorithms. Following the experimental settings, we use 10 random splits of the data for the experiment. The average and deviation results of the algorithms are listed in Table 1. It has been shown that the CWS-SRC achieves the highest recognition rate of up to (), which is slightly higher than that of MKD-SRC and much higher than those of the others. Because no alignment has been performed between the sample and the probe sets, the recognition rate of SRC is considerably lower. Therefore, for occluded holistic face recognition without the alignment process, the CWS-SRC method can achieve a better performance.
The results of holistic face recognition with occlusion through the method of SIFT matching, sparse representation-based classification (SRC), multikeypoint descriptors-SRC (MKD-SRC), and clustering-weighted SIFT-based SRC (CWS-SRC).
|Recognition rate (%)||53.47±0.83||12.01±1.04||88.82±0.79||93.89±0.84|
Partial Face Recognition with Arbitrary Patch
The cropped Yale database consists of 165 frontal face images of 15 subjects with an image size of . We randomly selected two images per subject as samples and the remaining as the probes. For each probe image, one patch of random size at a random position was cropped as a partial face, where and were randomly selected from (120,180) and (90,130), respectively. Thus, there were 135 partial images (nine images per subject) in the probe set and 30 images in the sample set (two images per subject). Examples are shown in Fig. 3.
The threshold value of the similarity is still 0.97. Because the original SRC algorithm is unsuited to partial or scale variation scenarios, only three methods are compared in this part. Following the experiment settings, we use 10 random splits of the data for the experiment. The performance of the remaining three methods is shown in Table 2. The proposed CWS-SRC method achieves the highest recognition rate of . The recognition rate for SIFT matching and the MKD-SRC method are and , respectively.
The results of partial face recognition through the method of SIFT matching, multikeypoint descriptors-SRC (MKD-SRC), and the proposed clustering-weighted SIFT-based SRC (CWS-SRC).
|Recognition rate (%)||65.93±0.78||79.52±0.92||85.93±0.89|
Car Model Image Recognition with Different Scales and Pitch Angles
The car-model database is self-built and is captured using the equipment shown in Fig. 4. By adjusting the photography parameters, e.g., distance, pitch angle, illumination, we can capture car images of different scales and postures. The database consists of 10 vehicles (e.g., Touran, Tiguan, Polo, Passat, etc.), which are shown in Fig. 5(a). Examples of the sample and probe set are shown in Figs. 5(b) and 5(c), whose photography parameters are listed in Table 3.
Photography parameters of the two groups of car-model images.
|Distance (cm)||Pitch angle (deg)||Illumination (lx)||Rotating angle (deg)||Quantity||Image size|
In this experiment, we took different quantities of the samples to evaluate the performance of the CWS-SRC method. The quantity of the sample set per subject was increased from 20 to 60 with a step of 10, and the newly added sample images were randomly selected. Simultaneously, the number of similar descriptors grew rapidly. The experimental results are shown in Fig. 6 (where ). It is shown that the CWS-SRC and the MKD-SRC methods are superior to the SIFT matching. With the quantity of sample images increasing, the result shows that the CWS-SRC method is more suitable for a target recognition task when many more samples are available.
The results of the three experiments demonstrate that the weighted-voting classifier based on the similarity of features has contributed to improving the recognition rate, and the proposed CWS-SRC method can obtain a better performance in alignment-free scenarios and also exhibits good robustness for scale variation and affine transformation. Comparing the experimental results, we find that the result of the holistic face with an occlusion is the best, possibly due to its relatively simple experimental condition. The result shows that sufficient information is necessary to improve the performance of the SRC-based method; therefore, it makes sense to explore optimization based on the similarity of the features.
Conclusions and Future Work
In this work, a novel framework for robust target recognition with sufficient sample images is proposed, the CWS-SRC method. With this method, each image is represented by a set of SIFT descriptors. First, we obtain subsets by clustering based on the similarity. Next, based on the subsets, we calculate each atom’s weight, and a weighted-voting classifier is created. Finally, each descriptor detected in a probe image can be sparsely represented by the dictionary, and the identity of the probe image can be inferred via the classifier.
We evaluated the proposed approach on three conditions, i.e., the holistic face with occlusion (AR database), the partial face (Yale database), and the car-model with affine transformation and scale variation. Compared to the SIFT matching, the MKD-SRC and the original SRC methods, the experimental results clearly and consistently indicate that the proposed method is more robust with an increase in the number of sample images for alignment-free image recognition. Meanwhile, there are still methods that may improve the robustness, such as dictionary optimization, which will be studied in the future.
Bo Sun received his BSc degree in computer science from Beihang University, China, and his MSc and PhD degrees from Beijing Normal University, China. He is currently a professor in the Department of Computer Science and Technology at Beijing Normal University. His research interests include pattern recognition, natural language processing, and information systems. He is a member of ACM and a senior member of the China Society of Image and Graphics.
Feng Xu received his BSc degree in electronic science and technology from Beijing Normal University, 2009. He is currently working toward his MSc degree in computer application technology at Beijing Normal University. His research interests include pattern recognition and signal processing.
Jun He received her BSc degree in optical engineering and her PhD degree in physical electronics from Beijing Institute of Technology, China in 1998 and 2003, respectively. Since 2003, she has been with the College of Information Science and Technology of Beijing Normal University, China. She was elected as a lecturer and an assistant professor in 2003 and 2010, respectively. Her research interests include image processing application and pattern recognition.