## 1.

## Introduction

In the past few years, some nonlinear manifold learning algorithms have been proposed to discover the nonlinear structure of the manifold by investigating the local geometry of samples, such as isomap,^{1} locally linear embedding (LLE),^{2, 3} and Laplacian eigenmap.^{4} These methods are defined only on training data, and the issue of how to map new test data remains difficult. Therefore, they cannot be applied directly to recognition problems.

Recently, some manifold-based linear algorithms, like neighborhood preserving embedding (NPE),^{5} resolve the difficulty by finding a mapping on the whole data space, not just on training data. However, they have a common inherent limitation: they deemphasize discriminant information that is very important for the recognition task. In addition, the basis vectors of these methods are statistically correlated, and then the extracted features contain much redundancy. As a result, the overlapped information can distort the distribution of the features and even degrade the recognition performance.

In this work, a new feature extraction algorithm, called kernel uncorrelated neighborhood discriminative embedding (KUNDE) is proposed to address the problems mentioned earlier. On the one hand, the method explicitly considers both the within-class neighboring information and the between-class scatter information and emphasizes the discriminant information. On the other hand, the method obtains statistically uncorrelated features with minimum redundancy by putting a simple uncorrelated constraint on the computation of the basis vectors. Mapping the input data to some high-dimensional feature space using the kernel technique, nonlinear features are extracted.

## 2.

## Kernel Uncorrelated Neighborhood Discriminative Embedding

Given a data set $\mathit{X}=[{\mathit{x}}_{1},\dots ,{\mathit{x}}_{\mathit{N}}]$ in ${\mathit{R}}^{D}$ , suppose that each data point ${\mathit{x}}_{i}$ belongs to one of the $C$ classes ${\omega}_{1},\dots ,{\omega}_{C}$ , and each class contains ${n}_{c}$ $(c=1,\dots ,C)$ samples. The data is then mapped into an implicit high-dimensional feature space $\mathit{F}$ by a nonlinear mapping function $\varphi :x\u220a{\mathit{R}}^{D}\to \varphi \left(\mathit{x}\right)\u220a\mathit{F}$ . The problem is to find a transformation matrix $\mathit{V}$ that maps these points to be new points $\mathit{Y}=[{\mathit{y}}_{1},\dots ,{\mathit{y}}_{\mathit{N}}]={\mathit{V}}^{T}\varphi \left(\mathit{X}\right)$ in ${\mathit{R}}^{d}(d\u2aa1D)$ , where ${\mathit{y}}_{i}={\mathit{V}}^{T}\varphi \left({\mathit{x}}_{i}\right)$ .

The algorithm procedure of KUNDE is stated here:

1. Construct the kernel matrix $\mathit{K}=\varphi \left(\mathit{X}{)}^{T}\varphi \right(\mathit{X})$ whose elements are ${K}_{ij}=k({\mathit{x}}_{i},{\mathit{x}}_{j})$ , where $k$ is a kernel function that satisfies $k({\mathit{x}}_{i},{\mathit{x}}_{j})=\left[\varphi \right({\mathit{x}}_{i})\cdot \varphi ({\mathit{x}}_{j}\left)\right]$ .

2. Compute the weight matrix $\mathit{W}$ by minimizing the reconstruction error $E\left(\mathit{W}\right)={\sum}_{i}\Vert \varphi \left({\mathit{x}}_{i}\right)-{\sum}_{j}{W}_{ij}\varphi \left({\mathit{x}}_{j}\right){\Vert}^{2}$ , where ${\sum}_{j}{W}_{ij}=1$ , and ${W}_{ij}\ne 0$ if $\varphi \left({\mathit{x}}_{j}\right)$ is one of the $n$ identical-label nearest neighbors of $\varphi \left({\mathit{x}}_{i}\right)$ ; otherwise, ${W}_{ij}=0$ . An efficient way to minimize the error can refer to Ref. 3.

3. Compute matrices $\mathit{M}$ , $\mathit{L}$ , and $\mathit{G}$ as follows: $\mathit{M}=(\mathit{I}-\mathit{W})(\mathit{I}-\mathit{W}{)}^{T}$ , $\mathit{L}=\mathit{I}-\mathit{E}$ , and $\mathit{G}=\mathit{I}-(1\u2215N){\mathit{ee}}^{T}$ , where $\mathit{I}$ is an identity matrix, ${E}_{ij}=1\u2215{n}_{c}$ if ${\mathit{x}}_{i}$ and ${\mathit{x}}_{j}$ belong to the $c$ th class; otherwise, ${E}_{ij}=0$ , and $\mathit{e}=(1,\dots ,1{)}^{T}$ .

4. Solve the generalized eigenvalue problem $\mathit{K}(\mathit{M}+\mathit{L}){\mathit{Ka}}_{i}={\lambda}_{i}{\mathit{KGKa}}_{i}$ , with ${\lambda}_{1}<\cdots <{\lambda}_{d}$ , and constitute the matrix $\mathit{A}=[{\mathit{a}}_{1},\dots ,{\mathit{a}}_{d}]$ .

5. For any data point $\mathit{x}$ in ${\mathit{R}}^{D}$ , the embedded feature in ${\mathit{R}}^{d}$ is given by $\mathit{y}={\mathit{V}}^{T}\varphi \left(\mathit{x}\right)={\mathit{A}}^{T}\left[k\right({\mathit{x}}_{1},\mathit{x}),\dots ,k({\mathit{x}}_{N},\mathit{x}){]}^{T}$ .

## 3.

## Theoretical Justification

In this section, we provide the theoretical analysis of the KUNDE algorithm. As mentioned earlier, the objective of KUNDE is to preserve the within-class neighboring geometry while maximizing the between-class scatter in the low-dimensional space.

First, we characterize the within-class geometry of each data point in the feature space $\mathit{F}$ by linear coefficients that reconstruct the data point from its $n$ identical-label nearest neighbors. To preserve the within-class neighboring relations, the basic idea is that the same weights that reconstruct the point $\varphi \left({\mathit{x}}_{i}\right)$ in $\mathit{F}$ should also reconstruct its embedded counterpart ${\mathit{y}}_{i}$ in ${\mathit{R}}^{d}$ . Therefore, we should minimize the following cost function:

## 1

$${J}_{1}\left(\mathit{V}\right)={\sum}_{i}{\Vert {\mathit{y}}_{i}-{\sum}_{j}{\mathit{W}}_{ij}{\mathit{y}}_{j}\Vert}^{2}={\Vert \mathit{Y}(\mathit{I}-\mathit{W})\Vert}^{2}\text{,}=trace\left[\mathit{Y}(\mathit{I}-\mathit{W}){(\mathit{I}-\mathit{W})}^{T}{\mathit{Y}}^{T}\right]\text{,}=trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{M}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right],$$Second, since the purpose of KUNDE is to solve classification problems, we should make the embedded vectors from different classes far from each other. Here, we propose to maximize the between-class scatter:

## 2

$${J}_{2}\left(\mathit{V}\right)=trace\left({\mathit{V}}^{T}{\mathit{S}}_{b}^{\varphi}\mathit{V}\right),$$## 3.

## 4

$${J}_{2}\left(\mathit{V}\right)=trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{B}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right].$$## 5

$$J\left(\mathit{V}\right)=\frac{trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{M}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right]}{trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{B}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right]}.$$Now, we turn to the statistically uncorrelated constraint. Assuming that any two different components ${y}_{i}$ and ${y}_{j}$ $(j\ne i)$ of the extracted feature $\mathit{y}={\mathit{V}}^{T}\mathit{x}$ are uncorrelated, this means that:

## 6

$$E\left\{[{y}_{i}-E\left({y}_{i}\right)][{y}_{j}-E\left({y}_{j}\right)]\right\}={\mathit{v}}_{i}^{T}{\mathit{S}}_{t}^{\varphi}{\mathit{v}}_{j}=0,$$As a result, KUNDE can be formulated as the following constrained minimization problem:

## 9

$$\underset{{\mathit{V}}^{T}{\mathit{S}}_{t}^{\varphi}\mathit{V}=\mathit{I}}{\mathrm{min}}\frac{trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{M}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right]}{trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{B}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right]}=\underset{{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{G}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}=\mathit{I}}{\mathrm{min}}\frac{trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{M}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right]}{trace[\mathit{I}-{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{L}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}]}.$$## 10

$$\underset{{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)\mathit{G}\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}=\mathit{I}}{\mathrm{min}}trace\left[{\mathit{V}}^{T}\varphi \left(\mathit{X}\right)(\mathit{M}+\mathit{L})\varphi {\left(\mathit{X}\right)}^{T}\mathit{V}\right].$$## 11

$$\underset{{\mathit{A}}^{T}\mathit{KGKA}=\mathit{I}}{\mathrm{min}}trace\left[{\mathit{A}}^{T}\mathit{K}(\mathit{M}+\mathit{L})\mathit{KA}\right],$$Last, the constrained minimization problem is reduced to a generalized eigenvalue problem, as follows:

The matrix $\mathit{A}$ is determined by the $d$ eigenvectors corresponding to the first $d$ smallest eigenvalues of Eq. 12. Once $\mathit{A}$ is obtained, for any data point $\mathit{x}$ in the input space, the nonlinear feature is given as $\mathit{y}={\mathit{V}}^{T}\varphi \left(\mathit{x}\right)={\mathit{A}}^{T}\left[k\right({\mathit{x}}_{1},\mathit{x}),\dots ,k({\mathit{x}}_{N},\mathit{x}){]}^{T}$ .## 4.

## Experimental Results

In this section, experiments are performed on radar target recognition with measured and simulated range profiles, respectively. The performance of the KUNDE algorithm is compared with those of kernel NPE (KNPE),^{5} KPCA,^{7} and KFDA.^{8} The Gaussian kernel
$k({\mathit{x}}_{i},{\mathit{x}}_{j})=\mathrm{exp}(-\Vert {\mathit{x}}_{i}-{\mathit{x}}_{j}{\Vert}^{2}\u2215{\sigma}^{2})$
is adopted, and the parameter
$\sigma $
is simply set to 1. Since we focus only on feature extraction, as for classification, the nearest-neighbor classifier using a Euclidean metric is employed for the sake of simplicity.

## 4.1.

### Experiments on Measured Data

The measured data are from three flying airplanes, including An-26, Yark-42, and Cessna Citation S/II. Each airplane has 260 range profiles. In the experiments, each profile is preprocessed by energy normalization. Then, for each airplane, one third of all profiles are used for training and the rest for testing. Figure 1 shows the plots of recognition rates obtained by each method versus the reduced dimensionality. Note that the dimensionality of KFDA is at most $c-1$ , where $c$ is the number of classes.

From Fig. 1, it can be seen that KUNDE achieves the best recognition results at each feature dimensionality and KFDA performs second best, while KNPE and KPCA are relatively poor since neither of them considers discrimination information. This indicates that KUNDE has more discriminative power than the other three methods by incorporating the within-class neighboring information and the between-class scatter information and that the statistically uncorrelated features are very helpful for improving the recognition performance.

## 4.2.

### Experiments on Simulated Data

The simulated profiles are from six airplanes: Mirage, IDF, F16, J8II, SU27, and E2C, and each airplane has 60 profiles. Similarly, each profile is normalized to unit energy, and 20 profiles per target are used for training and the rest for testing. The recognition results are shown in Fig. 2. The results suggest that KUNDE outperforms again all the other methods. In addition, it can be seen that as the feature dimensionality increases, KUNDE has a higher recognition rate and KPCA and KNPE can also give satisfactory results.

## 5.

## Conclusions

A novel algorithm called kernel uncorrelated neighborhood discriminative embedding (KUNDE) is presented for pattern recognition. KUNDE has two prominent characteristics. First, it integrates the within-class neighboring information and the between-class scatter information, which enables its powerful discriminative ability. Second, it extracts statistically uncorrelated features with minimum redundancy by introducing an uncorrelated constraint, which is helpful to improve recognition performance. Experimental results on radar target recognition show that KUNDE achieves better recognition performance than all the other involved methods.

## Acknowledgments

The authors would like to thank the anonymous reviewer and editors for their helpful comments and suggestions. This work is partially supported by the Key Project of the Chinese Ministry of Education (Grant No. 105150) and the foundation of the ATR Key Lab (Grant No. 51483010305DZ0207).