1 December 2007 Kernel uncorrelated neighborhood discriminative embedding for feature extraction
Author Affiliations +
Optical Engineering, 46(12), 120502 (2007). doi:10.1117/1.2821866
Feature extraction is a crucial step for pattern recognition. Recently, some manifold learning algorithms have drawn much attention. Although their properties of locality preserving are fairly significant, most manifold-based algorithms have limits to solve classification problems. First, they do not have good discriminant ability. Second, they fail to remove the redundancy among the extracted features. We present a new feature extraction method, called kernel uncorrelated neighborhood discriminative embedding (KUNDE), which integrates two abilities of manifold learning and pattern classification. The purpose of KUNDE is to preserve the within-class neighboring geometry while maximizing the between-class scatter. Optimizing an objective function in a kernel feature space, nonlinear features are extracted. Moreover, by putting a simple uncorrelated constraint on the computation of the basis vectors, the extracted features via KUNDE are statistically uncorrelated and thus contain minimum redundancy. Experimental results on radar target recognition indicate the promising performance of the proposed method.
Yu and Wang: Kernel uncorrelated neighborhood discriminative embedding for feature extraction



In the past few years, some nonlinear manifold learning algorithms have been proposed to discover the nonlinear structure of the manifold by investigating the local geometry of samples, such as isomap,1 locally linear embedding (LLE),2, 3 and Laplacian eigenmap.4 These methods are defined only on training data, and the issue of how to map new test data remains difficult. Therefore, they cannot be applied directly to recognition problems.

Recently, some manifold-based linear algorithms, like neighborhood preserving embedding (NPE),5 resolve the difficulty by finding a mapping on the whole data space, not just on training data. However, they have a common inherent limitation: they deemphasize discriminant information that is very important for the recognition task. In addition, the basis vectors of these methods are statistically correlated, and then the extracted features contain much redundancy. As a result, the overlapped information can distort the distribution of the features and even degrade the recognition performance.

In this work, a new feature extraction algorithm, called kernel uncorrelated neighborhood discriminative embedding (KUNDE) is proposed to address the problems mentioned earlier. On the one hand, the method explicitly considers both the within-class neighboring information and the between-class scatter information and emphasizes the discriminant information. On the other hand, the method obtains statistically uncorrelated features with minimum redundancy by putting a simple uncorrelated constraint on the computation of the basis vectors. Mapping the input data to some high-dimensional feature space using the kernel technique, nonlinear features are extracted.


Kernel Uncorrelated Neighborhood Discriminative Embedding

Given a data set X=[x1,,xN] in RD , suppose that each data point xi belongs to one of the C classes ω1,,ωC , and each class contains nc (c=1,,C) samples. The data is then mapped into an implicit high-dimensional feature space F by a nonlinear mapping function ϕ:xRDϕ(x)F . The problem is to find a transformation matrix V that maps these points to be new points Y=[y1,,yN]=VTϕ(X) in Rd(dD) , where yi=VTϕ(xi) .

The algorithm procedure of KUNDE is stated here:

  • 1. Construct the kernel matrix K=ϕ(X)Tϕ(X) whose elements are Kij=k(xi,xj) , where k is a kernel function that satisfies k(xi,xj)=[ϕ(xi)ϕ(xj)] .

  • 2. Compute the weight matrix W by minimizing the reconstruction error E(W)=iϕ(xi)jWijϕ(xj)2 , where jWij=1 , and Wij0 if ϕ(xj) is one of the n identical-label nearest neighbors of ϕ(xi) ; otherwise, Wij=0 . An efficient way to minimize the error can refer to Ref. 3.

  • 3. Compute matrices M , L , and G as follows: M=(IW)(IW)T , L=IE , and G=I(1N)eeT , where I is an identity matrix, Eij=1nc if xi and xj belong to the c th class; otherwise, Eij=0 , and e=(1,,1)T .

  • 4. Solve the generalized eigenvalue problem K(M+L)Kai=λiKGKai , with λ1<<λd , and constitute the matrix A=[a1,,ad] .

  • 5. For any data point x in RD , the embedded feature in Rd is given by y=VTϕ(x)=AT[k(x1,x),,k(xN,x)]T .


Theoretical Justification

In this section, we provide the theoretical analysis of the KUNDE algorithm. As mentioned earlier, the objective of KUNDE is to preserve the within-class neighboring geometry while maximizing the between-class scatter in the low-dimensional space.

First, we characterize the within-class geometry of each data point in the feature space F by linear coefficients that reconstruct the data point from its n identical-label nearest neighbors. To preserve the within-class neighboring relations, the basic idea is that the same weights that reconstruct the point ϕ(xi) in F should also reconstruct its embedded counterpart yi in Rd . Therefore, we should minimize the following cost function:


where M=(IW)(IW)T .

Second, since the purpose of KUNDE is to solve classification problems, we should make the embedded vectors from different classes far from each other. Here, we propose to maximize the between-class scatter:


where Sbϕ is the between-class scatter matrix in F . According to Ref. 6, the total scatter matrix Stϕ , within-class scatter matrix Swϕ , and between-class scatter matrix Sbϕ can be represented as follows:


where G=I(1N)eeT , L=IE , and B=GL . Thus, Eq. 2 can be rewritten as:


Combining Eqs. 1, 4, we should minimizing the following objective function:



Now, we turn to the statistically uncorrelated constraint. Assuming that any two different components yi and yj (ji) of the extracted feature y=VTx are uncorrelated, this means that:


where vi and vj are two different columns of the matrix V . Besides, vi should be normalized. Without loss of generalization, let vi satisfy:


Then, from Eqs. 6, 7, we get:



As a result, KUNDE can be formulated as the following constrained minimization problem:


Further, Eq. 9 is equivalent to:


Since each column of V should lie in the span of all training samples in F , there exist coefficients αj(j=1,,N) such that v=j=1Nαjϕ(xj)=ϕ(X)a , where a=[α1,,αN]T . Therefore, Eq. 10 becomes:


where K is the kernel matrix, which is defined in Sec. 2.

Last, the constrained minimization problem is reduced to a generalized eigenvalue problem, as follows:


The matrix A is determined by the d eigenvectors corresponding to the first d smallest eigenvalues of Eq. 12. Once A is obtained, for any data point x in the input space, the nonlinear feature is given as y=VTϕ(x)=AT[k(x1,x),,k(xN,x)]T .


Experimental Results

In this section, experiments are performed on radar target recognition with measured and simulated range profiles, respectively. The performance of the KUNDE algorithm is compared with those of kernel NPE (KNPE),5 KPCA,7 and KFDA.8 The Gaussian kernel k(xi,xj)=exp(xixj2σ2) is adopted, and the parameter σ is simply set to 1. Since we focus only on feature extraction, as for classification, the nearest-neighbor classifier using a Euclidean metric is employed for the sake of simplicity.


Experiments on Measured Data

The measured data are from three flying airplanes, including An-26, Yark-42, and Cessna Citation S/II. Each airplane has 260 range profiles. In the experiments, each profile is preprocessed by energy normalization. Then, for each airplane, one third of all profiles are used for training and the rest for testing. Figure 1 shows the plots of recognition rates obtained by each method versus the reduced dimensionality. Note that the dimensionality of KFDA is at most c1 , where c is the number of classes.

Fig. 1

Recognition rate versus reduced dimensionality on measured range profiles.


From Fig. 1, it can be seen that KUNDE achieves the best recognition results at each feature dimensionality and KFDA performs second best, while KNPE and KPCA are relatively poor since neither of them considers discrimination information. This indicates that KUNDE has more discriminative power than the other three methods by incorporating the within-class neighboring information and the between-class scatter information and that the statistically uncorrelated features are very helpful for improving the recognition performance.


Experiments on Simulated Data

The simulated profiles are from six airplanes: Mirage, IDF, F16, J8II, SU27, and E2C, and each airplane has 60 profiles. Similarly, each profile is normalized to unit energy, and 20 profiles per target are used for training and the rest for testing. The recognition results are shown in Fig. 2. The results suggest that KUNDE outperforms again all the other methods. In addition, it can be seen that as the feature dimensionality increases, KUNDE has a higher recognition rate and KPCA and KNPE can also give satisfactory results.

Fig. 2

Recognition rate versus reduced dimensionality on simulated range profiles.




A novel algorithm called kernel uncorrelated neighborhood discriminative embedding (KUNDE) is presented for pattern recognition. KUNDE has two prominent characteristics. First, it integrates the within-class neighboring information and the between-class scatter information, which enables its powerful discriminative ability. Second, it extracts statistically uncorrelated features with minimum redundancy by introducing an uncorrelated constraint, which is helpful to improve recognition performance. Experimental results on radar target recognition show that KUNDE achieves better recognition performance than all the other involved methods.


The authors would like to thank the anonymous reviewer and editors for their helpful comments and suggestions. This work is partially supported by the Key Project of the Chinese Ministry of Education (Grant No. 105150) and the foundation of the ATR Key Lab (Grant No. 51483010305DZ0207).


1.  J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science0036-8075 10.1126/science.290.5500.2319 290(5500), 2319–2323 (2000). Google Scholar

2.  S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science0036-8075 10.1126/science.290.5500.2323 290(5500), 2323–2326 (2000). Google Scholar

3.  L. K. Saul and S. T. Roweis, “Think globally, fit locally: unsupervised learning of low dimensional manifolds,” J. Mach. Learn. Res.1532-4435 4, 119–155 (2003). Google Scholar

4.  M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput.0899-7667 5(6), 1373–1396 (2003). Google Scholar

5.  X. F. He, D. Cai, S. C. Yan, and H. J. Zhang, “Neighborhood preserving embedding,” in Proc. 10th Int. Conf. Computer Vision (2005). Google Scholar

6.  X. F. He, S. C. Yan, Y. X. Hu, P. Niyogi, and H. J. Zhang, “Face recognition using Laplacian faces,” IEEE Trans. Pattern Anal. Mach. Intell.0162-8828 10.1109/TPAMI.2005.55 27(3), 328–340 (2005). Google Scholar

7.  B. Scholkopf, A. Smola, and K. R. Muller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Comput.0899-7667 10.1162/089976698300017467 10, 1299–1319 (1998). Google Scholar

8.  Q. S. Liu, R. Huang, H. Q. Lu, and S. D. Ma, “Face recognition using kernel-based fisher discriminant analysis,” in Proc. Int. Conf. Automatic Face and Gesture Recognition, Washington (2002). Google Scholar

Xuelian Yu, Xuegang Wang, "Kernel uncorrelated neighborhood discriminative embedding for feature extraction," Optical Engineering 46(12), 120502 (1 December 2007). https://doi.org/10.1117/1.2821866

Back to Top