Open Access
21 September 2015 Progressive sparse representation-based classification using local discrete cosine transform evaluation for image recognition
Xiaoning Song, Zhen-Hua Feng, Guosheng Hu, Xibei Yang, Jing-yu Yang, Yunsong Qi
Author Affiliations +
Abstract
This paper proposes a progressive sparse representation-based classification algorithm using local discrete cosine transform (DCT) evaluation to perform face recognition. Specifically, the sum of the contributions of all training samples of each subject is first taken as the contribution of this subject, then the redundant subject with the smallest contribution to the test sample is iteratively eliminated. Second, the progressive method aims at representing the test sample as a linear combination of all the remaining training samples, by which the representation capability of each training sample is exploited to determine the optimal “nearest neighbors” for the test sample. Third, the transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample using cosine distance metrics in the DCT domain. The final goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlative degree evaluation, which is approximately equal to the test sample, and we can use this weighted linear combination to perform robust classification. Experimental results conducted on the ORL database of faces (created by the Olivetti Research Laboratory in Cambridge), the FERET face database (managed by the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology), AR face database (created by Aleix Martinez and Robert Benavente in the Computer Vision Center at U.A.B), and USPS handwritten digit database (gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo) demonstrate the effectiveness of the proposed method.

1.

Introduction

The great success of sparse representation in image processing triggers a great deal of research and practices on representation-based pattern classification. Recently, a sparse representation-based classification (SRC) method1,2 has been successfully proposed by Wright et al., which led to promising results in face recognition. SRC aims to represent a test sample using an untraditional dictionary that consists of all training samples across all subjects. Then, it evaluates the contribution to represent the test sample from each class and exploits the optimal evaluation result to classify the test sample. The role of the SRC is that the performance is measured in terms of sparsity of the representation and fidelity to the original signals. In fact, the traditional dictionary learning rules were not assumed to have any particular semantic meaning because they are typically chosen from standard bases such as Fourier, Wavelet, Curvelet, and Gabor, or even random matrices.3,4 Fortunately, we can reach a conclusion that the robust representation has naturally discriminative properties on the basis of the literatures:1,2 among all the subsets of base vectors, it selects the subset that well reconstructs the input signal and rejects all the other possible but inferior representations. For example, some alternative methods have been proposed to reconstruct the dimensional structure of data, and can robustly extract better discriminant features. Specifically, Zhang et al.5 proposed the tensor linear Laplacian discrimination (TLLD) method for nonlinear feature extraction from tensor data, which could be viewed as an extension of both LDA and LLD6 in directions of nonlinearity and tensor representation. Meanwhile, to handle the high dimensional face-shape regression problem, a hierarchical pose regression approach has been proposed by Zhang et al.7 to hierarchically estimate the head rotation, face components, and facial landmarks. Through empirical studies, Sun et al.8 also discovered three properties of deep neural activations critical for the high performance: sparsity, selectiveness, and robustness. Therefore, the capability of robust representation to uncover discriminative information depends on the effective learning model, which exploits important structures of training samples.

Meanwhile, the local untraditional dictionary learning methods were also developed to enhance representation-based pattern classification. Previous studies have shown that the local learning methods have been focused on by lots of researchers in recent years.916 For example, Yang et al. in Ref. 17 introduced the similarity and distinctiveness of features into collaborative representation-based classification (CRC),18 and present a more general model. Xu et al.19,20 have proposed selection strategies to seek best valued training samples for a new SRC model, which had clear rationales and achieved high face-recognition accuracies. Especially, the method proposed in Ref. 19 is somewhat associated with the idea of CRC, but the former usually leads to higher accuracy due to the use of the reasonable selection strategy. Representation-based classification has also been extended for bimodal biometrics.21,22 Yang et al.23,24 proposed Kernel CRC and CRC projections method. Liu et al.25 evaluated the reconstruction error of the test sample to improve the accuracy of CRC. For the abovementioned studies, we refer to all of the algorithms that exploit only a subset of the training samples rather than all of them to classify each test sample as “local” methods. It can also be viewed as an evaluation method that introduces “local” training samples to represent and classify the query samples. Specifically, some heuristic strategies are generally presented to determine the optimized “local” training samples for the test sample. We then use these remaining “local” training samples to perform classification. Therefore, we can conclude that the “local” methods practically convert the original classification problem into a simpler one that contains scale-reduced subjects. In fact, the rationale of these “local” methods is described as follows: it has been empirically proven and widely admitted that if the test sample is provided with high correlation to the training samples from a subject, it should be great reasonable to classify the test sample into this subject.

In addition, discrete trigonometric transform such as DCT has been widely used in image processing for transform-based coding. The main reason for employing DCT to transform image-feature space is that features can be extracted from images in a compressed format. Here, compression means that the signal enclosed by a limited number of DCT coefficients can be restored. Meanwhile, the cosine similarity measure based on Bhattacharya’s distance is defined as the inner product of these two vectors divided by the product of their lengths, which is a classical measure method. The related work is proposed in Refs. 2627.28.29. Furthermore, local image descriptors based on interesting regions have proven to be very successful in pattern-recognition applications. Song and Li30 proposed local polar DCT features (LPDFs), which extract and rearrange the selected two-dimensional (2-D) DCT features in a designed polar geometric structure.

The present study proposes a progressive sparse representation-based classification algorithm (P-SRC) using a local correlative degree evaluation to perform face recognition. The contributions of our work are threefold.

  • The contribution from each subject for representing a test sample is calculated by the sum of the contributions of all the training samples of this subject, then the redundant subject with the smallest score could be eliminated from the training set. This procedure is iteratively implemented for the remaining subjects till the predefined termination conditions have been met.

  • The transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample by using cosine distance metrics in the DCT domain. Then the representation capability of each local training sample is exploited to determine the optimal “nearest neighbors” for representing the test sample.

  • The final goal of the proposed method is to generate an optimal weighted sum of L nearest neighbors that is obtained under the local correlative degree evaluation, which is approximately equal to the test sample and we can use this weighted linear combination to perform classification. It is noted that the mechanism of the progressive dictionary learning in this method not only achieves a high accuracy but also can be clearly interpreted.

The rest of the paper is organized as follows: Sec. 2 describes a typical global training samples representation-based algorithm. The proposed method is presented in Sec. 3. Experimental results are reported in Sec. 4 and Sec. 5 concludes the paper.

2.

Outline of Global Training Samples Representation

As the typical sparse representation-based classification algorithm, a different classification rule for the test sample can be developed, in which the linear combination (i.e., coefficient) of training samples is sequentially handled. This leads to the global training samples representation method.19 This section introduces the global method that exploits all the training samples to represent and classify the test sample.

Suppose that there are n training samples, respectively, denoted by n column vectors x1,,xn. The global method assumes that test sample y can be approximately represented by a linear combination of all the training samples. In other words, the following equation is approximately satisfied:

Eq. (1)

y=i=1nαixi.

Equation (1) can be rewritten as

Eq. (2)

y=Xα,
where α=(α1,,αn), X=(x1,,xn). If XTX is nonsingular, the least-squares solution of Eq. (2) is:

Eq. (3)

α=(XTX)1XTy,
if XTX is (or is nearly) singular, we solve α using

Eq. (4)

α=(XTX+λI)1XTy,
where λ is a small positive constant and I is the identity matrix. Once we obtain α using Eqs. (3) or (4), we refer to yXα as the deviation of the linear combination Xα from test sample y.

According to Eq. (1), we know that the contribution of the i’th training sample is αixi; thus, we can calculate the sum of the contribution of the training samples from one class. For instance, if all the training samples from the k’th class are xs,,xt, then the contribution of the k’th class is gk=αsxs++αtxt. The deviation of gk from y can be calculated by Dk=ygk2, and the smallest deviation Dk means that gk has the combination of the most competitive within-class training samples to represent the test sample.

3.

Progressive Sparse Representation-Based Classification

The proposed framework can be divided into three stages, namely, redundant subject elimination stage, local correlative degree evaluation stage, and the classification stage. Figure 1 shows the schematic diagram of the proposed P-SRC framework.

Fig. 1

Schematic diagram of the proposed progressive sparse representation-based classification (P-SRC) algorithm.

JEI_24_5_053010_f001.png

3.1.

Motivation of the Present Work

There exist two previous works that we have undertaken on how to design traditional or untraditional dictionaries to better fit the sparse model by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals.

Specifically, in our previous work,31 we develop an iterative class elimination algorithm to represent a test sample as a linear combination of the relatively competitive training samples; here, this training set can be viewed as an untraditional dictionary. The contribution from each subject in presenting the test sample is calculated by the sum of contribution of all the training samples of this subject, then the redundant subject with the smallest score to this test sample would be eliminated from the training sets. This procedure is iteratively implemented for the remaining subjects after the predefined termination conditions have been met, and the final remaining training samples are used to generate a representation of the test sample and to classify it. Meanwhile, a different update rule for the dictionary can be proposed, in which the atoms (i.e., columns) in the dictionary are sequentially handled. This leads to the K-means singular value decomposition (K-SVD) algorithm, as developed by Aharon et al.32 In another previous study,33 considering that the existing K-SVD algorithm is employed to dwell on the concept of a binary class assignment, which means that the multiclass samples are definitely assigned to the given classes. Thus, the method proposed in our study provides a parameterized fuzzy adaptive way to adapting the traditional dictionaries, which is called parameterized fuzzy K-SVD. Actually, it is worth stressing that the proposed fuzzy K-SVD algorithm has been one of the effective algorithms due to its capacity for optimization update rule for the traditional dictionary.

In contrast, the key idea of the present work is to accomplish face recognition by interpreting a P-SRC algorithm under a local correlative degree evaluation. The proposed method has the following characteristics: the remaining samples obtained by the iterative elimination stage of redundant subjects still have different representation abilities in representing the test sample. This motivates us to consider whether we can construct an evaluation model to find the more important training samples that hold high correlations to the test sample. Now, we highlight the favorable properties of P-SRC and main contributions of this work as follows. First, when the initial updated training set is obtained by iterative elimination stage of redundant subjects, we construct the transformed DCT evaluation to measure the similarity between the test sample and each remaining training sample by using cosine distance metrics in the DCT domain. Second, we collect the former L training samples that have large similarity values in the DCT domain; the remaining training samples should be discarded. Third, we determine an optimal weighted sum of L nearest neighbors, which is approximately equal to the test sample, and we can use this weighted linear combination to perform classification.

3.2.

Stage of Redundant Subject Elimination

This section details the first stage of the proposed P-SRC. This stage aims at representing a test sample as a linear combination of the most competitive training samples, and the update of the coefficient columns is implemented by integrating an elimination mechanism for redundant subjects. Thus, one redundant subject with the smallest score to this test sample would be iteratively eliminated from the training sets. In fact, if there are many indistinctive subjects for test sample’s classification, this step only strengthens the most competitive subjects.

Suppose that there are C classes and n training samples x1,,xn. Motivated by collaborative representation-based classification in Ref. 18, we exploit a linear combination of k’th subject to represent the test sample. If y belongs to the k’th class, it should be represented as below:

Eq. (5)

y=X¯kβk,k=1,2,,C,
where X¯k=[w(1)k,w(2)k,,w(M)k], w(m)k is the m’th training sample in k’th class. When X¯k is a nonsingular square matrix, βk can be solved by using βk=X¯k1y; otherwise, it can be solved by means of βk=(X¯kTX¯k+μI)1X¯kTy, where μ is a small positive constant and I is the identity matrix. The recursive subject elimination strategy proposed in this stage aims to remove the redundant subjects that receive a small error in the linear representation equation. The subset of training samples from a redundant subject will be entirely removed after each iteration; meanwhile, the linear representation equation is recomputed. Therefore, the theoretical description of the first step of P-SRC is described as follows.

This step essentially builds an empirical risk minimizer (ERM) ϕ, on the basis of training data (X¯k,y), k=1,2,,C, for which both the signal Xk and the true label y are known. Evidently, we expect ϕ to have good generalization performance, meaning that we want the true error rate of ϕ as low as possible. In fact, ϕ can be well estimated under the condition that the empirical risk is minimized.34,35 According to the ERM theory, we conclude that the set of training samples from the k’th class (k=1,2,,C) makes a respective contribution to representing the test sample, and this contribution can be evaluated by reconstruction error between Xkβk and y, i.e., rk=yX¯kβk22. The rk can also be viewed as a discrimination measurement between the test sample and the k’th class. It shows that the redundant subject corresponding to the smallest score can be eliminated from the training set, and the procedure is iteratively implemented for the remaining subjects after the predefined termination conditions have been met. Therefore, the first step of P-SRC is treated as an iterative method that alternates between sparse representation and a process of updating the training classes to better fit the test sample.

3.3.

Stage of Local Discrete Cosine Transform Evaluation

The second stage of P-SRC aims at designing a correlation evaluation for the remaining samples that are obtained from the first stage of our method. Actually, the samples obtained in the first stage still have different representation abilities in representing the test sample. We intend to introduce an evaluation model to find the important training samples that hold high degrees of correlation to the test sample. Hereafter, we can use these competitive training samples to accurately represent the test sample and perform the final classification. In this study, the correlation degree evaluation in DCT domain is introduced first.

According to the cosine distance metrics in DCT domain, it is the cosine value of the angle between two vectors, e.g., gi denotes the training samples of the remaining subjects, y denotes the test sample, as follows:

Eq. (6)

cos(gi,y)=gi·ygiy,
where · indicates the dot-product of the vectors, and · indicates the length of the vector. For vectors with non-negative elements, the resulting similarity ranges from 1 meaning exactly opposite, to 1 meaning exactly the same, with 0 usually indicating independence, and in-between values indicating intermediate similarity or dissimilarity.

Equation (6) can be viewed as a measurement of the cosine similarity in the DCT domain between gi and y. A large value of this measurement means that gi is similar to the test sample y, thus we collect the former L training samples that have large similarity values and the remaining training ones should be discarded. Assume that H={h1,,hL} is a set of some numbers, standing for the set of labels of L competitive training samples. The label is defined as follows: if a useful training sample is derived from the k’th class (k=1,,C), the category of this training sample is labeled as k. H should be one subset of the set {1,,C}, i.e., H{1,,C}. On the contrary, if a discarded training sample is from the k’th class (k=1,,C) then the category k must not be an element of H. Consequently, the test sample could not be finally classified into the k’th class.

3.4.

Classification Stage

The third stage of the P-SRC is to represent the test sample as a linear combination of the determined L nearest neighbors and uses the representation result to classify the test sample. This phase assumes that the following equation is approximately satisfied:

Eq. (7)

y=γ1g1++γLgL,
where gi(i=1,,L) are the identified L nearest neighbors and γi(i=1,,L) are the coefficients. Here, Eq. (7) is rewritten as

Eq. (8)

y=Gγ,
where γ=[γ1,,γL]T, G=[g1,,gL]. If G is a nonsingular square matrix, we can solve γ by using γ=G1y; otherwise, we can solve it by using γ=(GTG+μI)1GTy, where μ is a small positive constant and I is the identity matrix.

Since the nearest neighbors might be from different classes, we calculate the sum of the contribution to represent the test sample of the neighbors from each class and exploit the sum to classify the test sample. More specifically, if all the neighbors from the k’th (k=1,,C) class are gs,,gt, then the sum of the contribution to representing the test sample of the k’th class is described as follows:

Eq. (9)

yk=γsgs++γtgt.

Thus, the residual formula of yk from y is presented as below:

Eq. (10)

Dk=yyk22.

Obviously, the above formula allows the residual between the test sample and each yk to be evaluated in a fair way by simultaneously exploiting yyk22. Finally, a smaller deviation Dk means a greater contribution to representing the test sample, and we classify y into the class that produces the smallest deviation.

3.5.

Detailed Progressive Sparse Representation-Based Classification Algorithm

The pipeline of the proposed P-SRC algorithm is detailed as follows:

P-SRC Task:

The update of the training dictionary should be implemented to better represent the test data y, by approximating the solution to the classification problem.

Step 1. Initialization:

Suppose that we have C subjects, and let X¯=[X¯1,,X¯C].

Main procedure:

Step 2. Stage of redundant subject elimination:

  • 1. Code y over X by β=Py, obtain coefficient β, where, P=(X¯TX¯+μI)1X¯T, μ is a small positive constant and I is the identify matrix.

  • 2. Design the following procedure to update the columns of the training dictionary matrix X and obtain Xj. Repeat for j=1,2,,η, compute the regularized residuals

    rk=yX¯kβk22/βk22,k=1,,C.

  • 3. Discard the redundant subject corresponding to the smallest score from the training set as X¯j={X¯X¯k}, that is, the set of training samples from k’th subject is entirely eliminated.

  • 4. If the predefined termination condition η has been met, go to Step3. Otherwise, go to Step2 for another iteration. Here, η stands for the number of iterations of redundant subject elimination.

Step 3. Stage of local DCT evaluation:

  • 1. Employ DCT to transform image feature space, the features can be extracted from images in the compressed format.

  • 2. Use the transformed DCT evaluation to measure the similarity between the test sample and each remaining training sample by using Eq. (6).

  • 3. Exploit cos(gi,y) to identify L training samples that have greatest contributions, representing the test sample by linear combination of [g1,,gL].

Step 4. Stage of classification:

  • 1. If all the nearest neighbors from the r’th (rC) subject are gs,,gt, where {gs,,gt}{g1,,gL}, then calculate the sum of the contributions to reconstruct test sample y

    yr=γsgs++γtgt.

  • 2. The residual of yr from y is calculated by using

    Dr=yyr22.

  • 3. Output the identity of y as

    Identify(y)=argminr{Dr}.

3.6.

Discussions

This section provides an extensive discussion regarding potential advantages of the proposed P-SRC method. The superiority of the proposed method stems from two aspects as follows.

First, considering that the training samples from different subjects might have inherent mutual relationship easily leads to ignoring the relationship between the different subjects when we design a strategy that the redundant subjects are entirely eliminated only once. As a result, the problem of multicollinearity that causes the weights to become extremely large or small will be raised because of numerical ill-conditioning. In contrast, in the present study, the redundant subject corresponding to the smallest score could be eliminated from the training sets with each iteration. In the experiment, the range of elimination percentage is designed as [10%, 90%] with 10% intervals; then the optimal value of η can be empirically assigned. Therefore, this step of P-SRC is viewed as an iterative method that alternates between a sparse representation and a process of updating the training classes to better fit the test sample. It is able to greatly reduce the inverse influence on the classification when a part of training samples from different subjects are very dissimilar to the test sample.

Second, we measure the similarity between the test sample and each local training sample by using cosine distance metrics in the DCT domain. Then the representation capability of each local training sample is exploited to determine the optimal “nearest neighbors” for representing the test sample by a linear formulation. The goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlation evaluation; we then use this weighted linear combination to perform robust classification. Therefore, the proposed method can also be treated as an evaluation method that introduces “local” training samples to represent and classify the query samples, i.e., it belongs to the local untraditional dictionary learning methods. It is worth noting that the local polar DCT features proposed by Song and Li30 are extracted and used for feature matching between two images; Song and Li first quantize the preprocessed local patch in the gray-level domain. Then the 2-D DCT features in the frequency domain are extracted and a subset of the reordered coefficients selected as compact descriptor. In contrast, we design a criterion of correlation degree in DCT domain to evaluate the training samples. The evaluation aims to find the useful training samples that hold high correlation to represent the test sample. Therefore, this is the key difference between this previous study and the proposed method.

4.

Experimental Results

This section conducts the experiments of P-SRC on various image classification tasks, including face recognition and handwritten digit recognition.

4.1.

Face Recognition

We conducted a number of experiments on the ORL,36 FERET,37 and AR38 databases. The face images of these three databases were obtained under the conditions of varying pose, facial expression, or lighting. Occluded face images are also included in the AR face database.

Meanwhile, a biometric system can be regarded as a pattern-recognition system, where a feature set is first extracted from the original samples and then is compared with the stored template set to make a decision on the identity of an individual. In biometric verification mode,39 the decision is whether a person is “who he/she claims to be.” In biometric identification mode, the decision is “whose biometric data is this?” More specifically, face verification aims to determine whether two given faces refer to the same person. The identification task is inherently more difficult than verification, since the input face image must be compared with, and matched to, each face in the enrolment database. The test face is then identified as belonging to the face class that shows the highest similarity.

There are two main respective evaluation plots for face verification and identification: the receiver operating characteristic (ROC) curve40 and the recognition rate (PR) curve. The ROC curve examines the relation between the true-positive rate and the false-positive rate, while the PR curve extracts the relation between the number of samples (or classes) and identification precision. Concretely, the ROC curve describes the performance of a verification or diagnostic rule. This curve is generated by plotting the fraction of true positives out of the positives (true-positive rate) versus the fraction of false positives out of the negatives (false-positive rate), at various threshold settings. In the two-class verification case (for example, face and nonface), the true positive means the portion of face images to be detected by the system, while the false positive means the portion of nonface images to be detected as faces. Thus, the ROC curve must be used in the experiments of face verification. In turn, the PR curve is usually employed in the experiments of face identification. Therefore, the present study focuses on a progressive SRC for face recognition (identification), which is the reason why we utilize PR curve to evaluate the performance of the experiments.

The ORL contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. It contains 40 distinct persons with 10 images per subject. The images were taken at different time instances, with varying lighting conditions, facial expressions, and facial details. All persons are in the up-right, frontal position, with tolerance for some side movement. Each image was normalized and presented by a 46×56 pixel array, whose gray levels ranged between 0 and 255. Some sample images from the ORL database are shown in Fig. 2.

Fig. 2

Part of images from ORL face image database.

JEI_24_5_053010_f002.png

The FERET face image database is a result of the FERET program, which was sponsored by the U.S. Department of Defense through the DARPA program. It has become a standard database for the evaluation of state-of-the-art face recognition techniques. The proposed algorithm was evaluated on a subset of FERET database, which includes 1400 images of 200 individuals with seven different images of each individual. Some sample images from the FERET database are shown in Fig. 3. For the AR face database, we used 3120 gray images from 120 subjects with each subject providing 26 images. Some sample images from the AR database are shown in Fig. 4.

Fig. 3

Part of images from FERET face database.

JEI_24_5_053010_f003.png

Fig. 4

Part of images from AR face database.

JEI_24_5_053010_f004.png

We resized each face image of the AR database to a 40×50. The face images of the FERET databases were also resized using the same algorithm. In the experiments on the AR database, which is randomly partitioned into a training set and a test set with no overlap between the two. The partition of the database into training and testing sets, which call for four images per individual, randomly chosen for training, and the remaining images for test. Thus, a training set of 480 images and a test set with 2640 images are created. To make full use of the available data and to more accurately evaluate the generalization power of algorithms, the figures of merit are success rates averaged over 10 runs, with each run being performed on such random partitions in the AR database. Moreover, the error margin for both methods (mean and standard deviations) is provided in the following experiment. The proposed method exploits 200 finally remaining training samples to represent and classify the test sample on the AR database.

In the experiments on the ORL database, five samples per class were used as training samples and the others were used as test samples. The 40 finally remaining training samples are used to represent and classify the test sample. For the ORL database, four sets of training samples and test samples were generated. The first set of training samples consists of the first, second, third, fourth, and fifth samples of each subject. The second, third, and fourth training sets are composed of the first, second, third, fourth, and sixth samples of each subject; the first, second, third, fourth, and seventh samples of each subject; the first, second, third, fourth, and eighth samples of each subject, respectively. For each set of training samples, the set of test samples consists of the samples that were not used as training samples. In the experiments on the FERET database, we chose the former four images per individual for training and the remaining images for testing. The proposed method exploited 200 finally remaining training samples to represent and classify the test sample.

Table 1 indicates the classification error rates of principle component analysis (PCA),41 linear discriminant analysis (LDA),42 two-phase test sample sparse representation (TPTSR),19 coarse-to-fine face recognition (CFFR),21 sparse representation-based classification (SRC),1 extended sparse representation-based classification (ESRC),43 collaborative representation-based classification (CRC),18 sparse discriminant analysis with l2,1-minimizaiton (l2,1-SDA)44 and the proposed method on the ORL face image database. As shown in Table 1, it is therefore reasonable to believe that the proposed method is the most effective one in the presence of different number of training samples per individual.

Table 1

The classification error rate (%) of each method varies with number of training samples per individual on the ORL face image database.

MethodNumber of training samples
56
PCA(50)7.25.2
PCA(100)7.96.0
PCA(150)7.86.1
LDA(39)4.83.7
TPTSR4.43.3
CFFR3.72.8
SRC4.53.6
CRC4.23.3
ESRC4.33.3
l2,1-SDA4.83.8
Proposed method2.91.4
Note: The bold values denote the best experimental results compared with other methods.
Note: PCA(50), PCA(100), and PCA(150) indicate that PCA used 50, 100, and 150 transform axes for feature extraction, respectively. LDA(39) means that the LDA used 39 transform axes for feature extraction.

Table 2 presents the recognition accuracies of PCA,41 LDA,42 SRC,1 TPTSR,19 CFFR,21 and the proposed method, and the error margin for both methods (mean and standard deviations) is given in Table 2. For all methods, the average CPU time consumed for training and testing is also given in Table 2. Likewise, the experimental results indicate that the proposed method is the most effective one among the facial feature extraction approaches.

Table 2

Comparison results (%) between the different algorithms on the AR face image database.

Results/methodsPCA(100)LDA(119)SRCTPTSRCCFRProposed method
Accuracy51.27±3.6355.12±3.2964.29±2.8866.48±2.6465.78±2.5268.25±2.40
CPU time (s)38.740.892.695.797.1122.8
Note: The bold value denote the best experimental results compared with other methods.
Note: PCA(100) indicates that PCA used 100 transform axes for feature extraction. LDA(119) means that the LDA used 119 transform axes for feature extraction.

The comparison of classification error rates between the proposed algorithm and PCA,41 LDA,42 SRC,1 TPTSR,19 CFFR,21 and the proposed methods used on the FERET database are also summarized in Table 3. As shown in Table 3, the proposed algorithm performs better than the others.

Table 3

Comparison results (%) between the different algorithms on FERET face image database.

PCA(100)LDA(119)SRCTPTSRCFFRProposed method
56.861.354.664.767.169.2
Note: The bold value denote the best experimental results compared with other methods.
Note: PCA(100) indicates that PCA used 100 transform axes for feature extraction. LDA(119) means that the LDA used 119 transform axes for feature extraction.

As described in the above experiments, Tables 1 to 3 show the respective classification results of different methods. These figures clearly demonstrate that the proposed method is the most effective one among the traditional facial feature extraction approaches. However, it is worth stressing that the proposed method needs more CPU time for the whole process (re-estimation processes) because it costs more computation by producing the sparseness in a supervised way, which alternates between a sparse representation and a process of updating the untraditional training dictionary to better fit the test data.

Moreover, a sensitivity (robust) face-recognition system should deal well with the case where the samples of the same class (subject) are very different due to some factors such as illumination, variable poses, variable expressions, and disguises, especially performed in cases with insufficient samples. Therefore, if we observe the problem from the performance perspective of facial recognition, the high recognition rate of the proposed method could also be regarded as a sensitivity (robust) capability. Therefore, we design an experiment to verify the sensitivity of our proposed method as follows. The AR database is divided into three subsets, which include ordinary subset (14 samples per subject), sunglasses disguise subset (6 samples per subject), and scarf disguise subset (6 samples per subject). We choose all the images from sunglasses disguise subset as the training samples, and the test images belong to the ordinary subset without sunglasses and scarf. Thus, in this experiment, the difference between the test and training samples is very large due to the disguise. Actually, these two types of the samples are derived from the different signal domains. Table 4 indicates the recognition rates of the proposed method versus elimination percentage of redundant samples in different signal domains.

Table 4

Recognition rates of progressive sparse representation-based classification (P-SRC) versus elimination percentage of redundant samples in different signal domains.

Elimination rate90%80%70%60%50%40%30%20%10%
Recognition rate0.9350.9230.9130.9070.9110.9100.9140.9150.914
Note: The bold value denote the best experimental results compared with other methods.

Meanwhile, Fig. 5 shows an example of the first 36 remaining images (all images from sunglasses disguise subset and 720 in total) ordered by their contribution degrees. It shows that the first few remaining samples with higher contribution degrees are all from the same class of the test sample, which leads to correct classification result.

Fig. 5

An example of the first 36 remaining images (720 in total) ordered by their contribution degrees. The test image is the first sample of first subject. In this experiment, we selected all images from the sunglasses disguise subset (six images per person) as the training samples. It shows that the first few remaining samples with higher contribution degrees are all from the same class (within-class) of the test sample, which leads to correct classification result.

JEI_24_5_053010_f005.png

Similarly, we made another experiment on the FERET to show the selection of the best features for all postures of a person by P-SRC. In this experiment, the former two images per individual were chosen for training, and the rest were used for test. Thus, a training set of 400 images and a test set with 1000 images are generated. Figure 6 illustrates an example of the first 20 remaining images (400 in total) ordered by their contribution degrees. It shows that the first remaining samples with highest contribution degree are from the same class (within-class) of the test sample, which also leads to correct classification result.

Fig. 6

An example of the first 20 remaining images (400 in total) ordered by their contribution degrees. The test image is third sample from the first subject. In this experiment, the former two images per individual were chosen for training. It shows that the first remaining samples with highest contribution degree are from the same class (within-class) of the test sample, which also leads to correct classification result.

JEI_24_5_053010_f006.png

As described in Sec. 3.5, the contribution can be evaluated by the residual between yr and y, i.e., Dr=yyr22. The Dr can also be viewed as a discrimination measurement between the test sample and the i’th class. Here, we made an experiment on the ORL to show the discrimination of images of different persons by P-SRC. In this experiment, the former four images per individual were chosen for training and the rest were used for test. Thus, a training set of 160 images and a test set with 240 images are generated. For instance, we choose the fifth image of the first subject as the test sample. Table 5 indicates the discrimination measurement (range from 0 to 1) between this test sample and each subject. From Table 5, we can find that the smallest measurement result 0.42 occurs in the first subject (with same class to the test sample), and it should be noted that the value 1 that appears in Table 5 demonstrates the corresponding subjects (classes) have been completely discarded from the training sets.

Table 5

The discrimination measurement (range from 0 to 1) between the test sample and each subject on ORL.

Subject12345678910
Measurement0.420.931.001.001.001.001.001.001.001.00
Subject11121314151617181920
Measurement1.001.000.961.001.000.831.001.011.001.00
Subject21222324252627282930
Measurement1.001.001.001.001.001.001.001.001.001.00
Subject31323334353637383940
Measurement1.000.941.001.001.001.001.001.001.000.95
Note: The bold value denote the best experimental results compared with other methods.

4.2.

Handwritten Digit Recognition

In this experiment, the handwritten digit recognition on the widely used USPS database,45,46 which has 7291 training and 2007 test images, is performed. We artificially augmented the training set by shifting the digit images by 1 pixel in every direction. The proposed method is compared with COPAR, JDL, and the handwritten digit recognition methods reported in Refs. 4748.49. These methods include the state-of-the-art reconstructive DL methods with linear and bilinear classifier models, which is denoted by REC-L and REC-BL in Ref. 48, the state-of-the-art supervised DL methods with generative training and discriminative training, which is denoted by SDL-G and SDL-D in Ref. 48, the state-of-the-art methods of sparse representation for signal classification, which is denoted by SRSC in Ref. 47 and DLSI in Ref. 49. In addition, the results of some problem-specific methods (i.e., the standard Euclidean KNN and SVM with a Gaussian kernel) reported in Ref. 49 are also listed. Here, the number of atoms in each subdictionary of P-SRC is set to 200. Figure 7 illustrates the part of learned dictionary atoms of digits 5 and 6 and Table 6 indicates the recognition error rates of the proposed P-SRC and the other methods. We can find that the P-SRC outperforms all the competing methods. Meanwhile, it should be noted that the SVM classifier performs classification with a one-versus-all strategy. In comparison, P-SRC interprets a progressive dictionary update rule under a local DCT evaluation, and its classifier in design mode can also be clearly interpreted.

Table 6

Handwritten digit error rates (%) of competing methods on the USPS dataset.

MethodError rate
SRC6.05
REC-L(BL)6.83 (4.38)
SDL-G(D)6.67 (3.54)
DLSI3.98
KNN5.20
SVM4.20
COPAR3.61
JDL6.08
Proposed method2.85
Note: The bold value denote the best experimental results compared with other methods.

Fig. 7

The part of learned dictionary atoms of digits 5 and 6 by P-SRC.

JEI_24_5_053010_f007.png

5.

Conclusion

This paper developed a progressive sparse representation-based classification (P-SRC) algorithm. The P-SRC aims to exploit an optimal representation of training samples from the classes with major relevant contributions, by which the representation ability of each training sample is exploited to determine some optimal “nearest neighbors” for the test sample. It is noted that the transformed DCT evaluation measures the similarity between the test sample and each local training sample by using cosine distance metrics in the DCT domain. Future work is required to enable such a trend; among the many possible untraditional dictionary research directions, we should notice two: (1) exploration of the connection between the chosen untraditional dictionary update rule in the SRC and the method used later in the application, and (2) a study of the effect of introducing weights to the untraditional training dictionary, allowing them to get varying fuzzy degrees of popularity.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive advice. This work was supported by the National Science Foundation of China (Grant Nos. 61373055 and 61471182), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK2012700 and BK20130473), the Foundation of Artificial Intelligence Key Laboratory of Sichuan Province (Grant No. 2012RZY02), the Open Project Program of the State Key Laboratory of CAD&CG of Zhejiang University (Grant No. A1418), the Foundation of Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University.

References

1. 

J. Wright et al., “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., 31 (2), 210 –227 (2009). http://dx.doi.org/10.1109/TPAMI.2008.79 ITPIDJ 0162-8828 Google Scholar

2. 

J. Wright et al., “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, 98 ((6)), 1031 –1044 (2010). http://dx.doi.org/10.1109/JPROC.2010.2044470 Google Scholar

3. 

E. Cande`s and T. Tao, “Near-optimal signal recovery from random projections: universal encoding strategies?,” IEEE Trans. Inf. Theory, 52 (12), 5406 –5425 (2006). http://dx.doi.org/10.1109/TIT.2006.885507 IETTAW 0018-9448 Google Scholar

4. 

E. Cande`s, “Compressive sampling,” in Proc. Int. Congress of Mathematicians, 1433 –1452 (2006). Google Scholar

5. 

W. Zhang, Z. C. Lin and X.O. Tang, “Tensor linear Laplacian discrimination (TLLD) for feature extraction,” Pattern Recognit., 42 1941 –1948 (2009). http://dx.doi.org/10.1016/j.patcog.2009.01.010 PTNRA8 0031-3203 Google Scholar

6. 

D. Zhao et al., “Linear Laplacian discrimination for feature extraction,” in Proc. Int. Conf. on Computer Vision and Pattern Recognition, 1 –7 (2007). Google Scholar

7. 

Z. P. Zhang et al., “Hierarchical facial landmark localization via cascaded random binary patterns,” Pattern Recognit., 48 1277 –1288 (2015). http://dx.doi.org/10.1016/j.patcog.2014.09.007 PTNRA8 0031-3203 Google Scholar

8. 

Y. Sun, X. G. Wang and X. O. Tang, “Deeply learned face representations are sparse, selective, and robust,” (2014). Google Scholar

9. 

X. Gao et al., “Local face sketch synthesis learning,” Neurocomputing, 71 (10–12), 1921 –1930 (2008). http://dx.doi.org/10.1016/j.neucom.2007.10.025 NRCGEO 0925-2312 Google Scholar

10. 

C. Deng et al., “Invariant image watermarking based on local feature regions,” in Proc. Intl. Conf. on Cyberworlds 2008 (CW2008), 6 –10 (2008). Google Scholar

11. 

T. Zhang, D. Tao and J. Yang, “Discriminative locality alignment,” in Proc. 10th European Conf. on Computer Vision (ECCV), 725 –738 (2008). Google Scholar

12. 

W. Bian and D. Tao, “Biased discriminant Euclidean embedding for content-based Image retrieval,” IEEE Trans. Image Process., 19 (2), 545 –554 (2010). http://dx.doi.org/10.1109/TIP.2009.2035223 IIPRE4 1057-7149 Google Scholar

13. 

T. Zhang et al., “Local coordinates alignment (LCA): a novel manifold learning approach,” Int. J. Pattern Recognit. Artif. Intell., 22 (4), 667 –690 (2008). http://dx.doi.org/10.1142/S0218001408006478 Google Scholar

14. 

V. Vural et al., “Using local dependencies within batches to improve large margin classifiers,” J. Mach. Learn. Res., 10 183 –206 (2009). Google Scholar

15. 

Y. Xu, G. Feng and Y. Zhao, “One improvement to two-dimensional locality preserving projection method for use with face recognition,” Neurocomputing, 73 245 –249 (2009). http://dx.doi.org/10.1016/j.neucom.2009.09.010 NRCGEO 0925-2312 Google Scholar

16. 

Z. Fan, Y. Xu and D. Zhang, “Local linear discriminant analysis framework using sample neighbors,” IEEE Trans. Neural Networks, 22 (7), 1119 –1132 (2011). http://dx.doi.org/10.1109/TNN.2011.2152852 ITNNEP 1045-9227 Google Scholar

17. 

M. Yang et al., “Relaxed collaborative representation for pattern classification,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2224 –2231 (2012). Google Scholar

18. 

L. Zhang et al., “Collaborative representation based classification for face recognition,” Google Scholar

19. 

Y. Xu et al., “A two-phase test sample sparse representation method for use with face recognition,” IEEE Trans. Circuits Syst. Video Technol., 21 (9), 1255 –1262 (2011). http://dx.doi.org/10.1109/TCSVT.2011.2138790 Google Scholar

20. 

Y. Xu, W. M. Zuo and Z. Z. Fan, “Supervised sparse representation method with a heuristic strategy and face recognition experiments,” Neurocomputing, 79 125 –131 (2012). http://dx.doi.org/10.1016/j.neucom.2011.10.013 NRCGEO 0925-2312 Google Scholar

21. 

Y. Xu et al., “Using the idea of the sparse representation to perform coarse-to-fine face recognition,” Inf. Sci., 238 138 –148 (2013). http://dx.doi.org/10.1016/j.ins.2013.02.051 Google Scholar

22. 

Y. Xu et al., “A sparse representation method of bimodal biometrics and palmprint recognition experiments,” Neurocomputing, 103 164 –171 (2013). http://dx.doi.org/10.1016/j.neucom.2012.08.038 NRCGEO 0925-2312 Google Scholar

23. 

W. K. Yang et al., “Image classification using kernel collaborative representation with regularized least square,” Appl. Math. Comput., 222 13 –28 (2013). http://dx.doi.org/10.1016/j.amc.2013.07.024 AMHCBQ 0096-3003 Google Scholar

24. 

W. Yang, Z. Wang and C. Sun, “A collaborative representation based projections method for feature extraction,” Pattern Recognit., 48 (1), 20 –27 (2015). http://dx.doi.org/10.1016/j.patcog.2014.07.009 PTNRA8 0031-3203 Google Scholar

25. 

Z. H. Liu et al., “A novel classification method for palmprint recognition based on reconstruction error and normalized distance,” Appl. Intell., 39 307 –314 (2013). http://dx.doi.org/10.1007/s10489-012-0414-4 APITE4 0924-669X Google Scholar

26. 

H. Khorrami and M. Moavenian, “A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification,” Expert Syst. Appl., 37 5751 –5757 (2010). http://dx.doi.org/10.1016/j.eswa.2010.02.033 ESAPEH 0957-4174 Google Scholar

27. 

J. J. Wu et al., “Cosine interesting pattern discovery,” Inf. Sci., 184 176 –195 (2012). http://dx.doi.org/10.1016/j.ins.2011.09.006 Google Scholar

28. 

J. Ye, “Cosine similarity measures for intuitionistic fuzzy sets and their applications,” Math. Comput. Modell., 53 91 –97 (2011). http://dx.doi.org/10.1016/j.mcm.2010.07.022 MCMOEG 0895-7177 Google Scholar

29. 

G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company, New York (1983). Google Scholar

30. 

T. C Song and H. L. Li, “Local polar DCT features for image description,” IEEE Signal Process. Lett., 20 (1), 59 –62 (2013). http://dx.doi.org/10.1109/LSP.2012.2229273 Google Scholar

31. 

X. N. Song et al., “A new sparse representation-based classification algorithm using iterative class elimination,” Neural Comput. Appl., 24 1627 –1637 (2014). http://dx.doi.org/10.1007/s00521-013-1399-6 Google Scholar

32. 

M. Aharon, M. Elad and A. Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., 54 (11), 4311 –4322 (2006). http://dx.doi.org/10.1109/TSP.2006.881199 ITPRED 1053-587X Google Scholar

33. 

X. N. Song et al., “A parameterized fuzzy adaptive K-SVD approach for the multi-classes study of pursuit algorithms,” Neurocomputing, 123 131 –139 (2014). http://dx.doi.org/10.1016/j.neucom.2013.06.017 NRCGEO 0925-2312 Google Scholar

34. 

V. Vapnik, Statistical Learning Theory, Wiley, New York (1998). Google Scholar

35. 

T. Mary-Huard, S. Robin and J.J. Daudin, “A penalized criterion for variable selection in classification,” J. Multivar. Anal., 98 695 –705 (2007). http://dx.doi.org/10.1016/j.jmva.2006.06.003 JMVAAI 0047-259X Google Scholar

36. 

T. Heap and F. Samaria, “Real-time hand tracking and gesture recognition using smart snakes,” in Proc. Interface to Human and Virtual Worlds, 50 (1995). Google Scholar

37. 

P. J. Phillips et al., “The FERET evaluation methodology for face-recognition algorithms,” IEEE Trans. Pattern Anal. Mach. Intell., 22 (10), 1090 –1104 (2000). http://dx.doi.org/10.1109/34.879790 Google Scholar

38. 

A. M. Martinez, “The AR face database,” CVC Tech. Rep., 24 (1998). Google Scholar

39. 

L. L. Shen, L. Bai and M. Fairhurst, “Gabor wavelets and general discriminant analysis for face identification and verification,” Image Vis. Comput., 25 (5), 553 –563 (2007). http://dx.doi.org/10.1016/j.imavis.2006.05.002 IVCODK 0262-8856 Google Scholar

40. 

C. Gigliarano, S. Figini and P. Muliere, “Making classifier performance comparisons when ROC curves intersect,” Comput. Stat. Data Anal., 77 300 –312 (2014). http://dx.doi.org/10.1016/j.csda.2014.03.008 CSDADW 0167-9473 Google Scholar

41. 

J. Yang et al., “Two-dimensional PCA: a new approach to appearance-based face representation and recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 26 (1), 131 –137 (2004). http://dx.doi.org/10.1109/TPAMI.2004.1261097 ITPIDJ 0162-8828 Google Scholar

42. 

X. N. Song et al., “A complete fuzzy discriminant analysis approach for face recognition,” Appl. Soft Comput., 10 208 –214 (2010). http://dx.doi.org/10.1016/j.asoc.2009.07.002 Google Scholar

43. 

W. H. Deng, J. N. Hu and J. Guo, “Extended SRC: undersampled face recognition via intraclass variant dictionary,” IEEE Trans. Pattern Anal. Mach. Intell., 34 (9), 1864 –1870 (2012). http://dx.doi.org/10.1109/TPAMI.2012.30 ITPIDJ 0162-8828 Google Scholar

44. 

X. Shi et al., “Face recognition by sparse discriminant analysis via joint L2, 1-norm minimization,” Pattern Recognit., 47 (7), 2447 –2453 (2014). http://dx.doi.org/10.1016/j.patcog.2014.01.007 PTNRA8 0031-3203 Google Scholar

45. 

J. J. Hull, “A database for handwritten text recognition research,” IEEE Trans. Pattern Anal. Mach. Intell., 16 (5), 550 –554 (1994). http://dx.doi.org/10.1109/34.291440 ITPIDJ 0162-8828 Google Scholar

46. 

M. Yang et al., “Sparse representation based Fisher discrimination dictionary learning for image classification,” Int. J. Comput. Vis., 109 (3), 209 –232 (2014). http://dx.doi.org/10.1007/s11263-014-0722-8 IJCVEQ 0920-5691 Google Scholar

47. 

K. Huang and S. Aviyente, “Sparse representation for signal classification,” in Proc. Neural Information and Processing Systems, 609 –616 (2006). Google Scholar

48. 

J. Mairal et al., “Supervised dictionary learning,” in Proc. Neural Information and Processing Systems, 1033 –1040 (2009). Google Scholar

49. 

I. Ramirez, P. Sprechmann and G. Sapiro, “Classification and clustering via dictionary learning with structured incoherence and shared features,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 3501 –3508 (2010). Google Scholar

Biography

Xiaoning Song received his PhD in pattern recognition and intelligence system at Nanjing University of Science and Technology, China, in 2010. He was a visiting researcher in the Centre for Vision, Speech, and Signal Processing (CVSSP), University of Surrey, United Kingdom from 2014 to 2015. Presently, he is an associate professor in Jiangnan University, Wuxi, China. His current research interests include pattern recognition, sparse representation, image recognition, and fuzzy systems.

Zhen-Hua Feng is currently a research fellow at the Centre for Vision, Speech and Signal Processing (CVSSP), faculty of engineering and physical sciences, University of Surrey, United Kingdom. He has published more than 10 papers in his fields of research. His current research interests include pattern recognition, sparse representation, automatic face alignment algorithms, including active shape models, active appearance models, and their extensions.

Guosheng Hu received his PhD in pattern recognition and intelligence system at Centre for Vision, Speech and Signal Processing (CVSSP), faculty of engineering and physical sciences, University of Surrey, United Kingdom, in 2015. His current research interests include pattern recognition, sparse representation, and three-dimensional morphable model.

Xibei Yang received his PhD in pattern recognition and intelligence system at Nanjing University of Science and Technology, China, in 2010. Presently, he is an associate professor in the School of Computer Science and Engineering, Jiangsu University of Science and Technology, China. His research interests include knowledge discovery, granular computing, and rough set theory.

Jingyu Yang is currently a distinguished professor in the Department of Computer Science at Nanjing University of Science and Technology. He is the author of more than 400 scientific papers in computer vision, pattern recognition, and artificial intelligence. He has won more than 20 provincial and national awards. His current research interests include pattern recognition, robot vision, image processing, data fusion, and artificial intelligence.

Yunsong Qi received his PhD in pattern recognition and intelligence system at Nanjing University of Science and Technology, China, in 2011. Currently, he is a professor in the School of Computer Science and Engineering, Jiangsu University of Science and Technology, China. His research interests include data mining and machine learning.

© 2015 SPIE and IS&T 1017-9909/2015/$25.00 © 2015 SPIE and IS&T
Xiaoning Song, Zhen-Hua Feng, Guosheng Hu, Xibei Yang, Jing-yu Yang, and Yunsong Qi "Progressive sparse representation-based classification using local discrete cosine transform evaluation for image recognition," Journal of Electronic Imaging 24(5), 053010 (21 September 2015). https://doi.org/10.1117/1.JEI.24.5.053010
Published: 21 September 2015
Lens.org Logo
CITATIONS
Cited by 5 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Databases

Associative arrays

Image classification

Facial recognition systems

Feature extraction

Error analysis

Algorithm development


CHORUS Article. This article was made freely available starting 20 September 2016

RELATED CONTENT


Back to Top