## 1. Introduction

Feature extraction is one of the key steps of synthetic aperture radar automatic target recognition (SAR ATR): it reduces the dimensionality of SAR images and extracts effective discriminating features.

Generally, feature extraction methods fall into two categories: linear and nonlinear. Classical linear methods, such as principal component analysis (PCA)^{1} and linear discriminant analysis (LDA),^{2} are based on the global linear structure of data. Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) database show that the best recognition rates employing PCA and LDA are higher than 80%.^{3}

With the development of the support vector machine,^{4} nonlinear feature extraction methods based on kernel tricks, such as kernel principal component analysis^{5} (KPCA) and kernel linear discriminant analysis (KLDA),^{6} have been widely applied in SAR ATR. By introducing the kernel function, those methods can solve the linearly inseparable problem of sample data to some extent. However, a main shortcoming of the kernel tricks is that the recognition performance depends on the selection of kernel settings.

Another novel nonlinear method, manifold learning,^{7} has been proposed on the premise that high-dimensional images lie on or near a low-dimensional manifold embedded in the high-dimensional space. For the purpose of seeking a low-dimensional manifold embedded in the high-dimensional data space, various manifold learning algorithms have been proposed, such as isometric feature mapping,^{8} locally linear embedding^{9} (LLE), Laplacian eigenmaps (LE),^{10} locality preserving projections (LPP),^{11} neighborhood preserving embedding (NPE),^{12} and orthogonal neighborhood preserving projections (ONPP).^{13}

In Cai et al.,^{14} the LPP algorithm was introduced into inverse synthetic aperture radar target recognition, and the classification results were better than those obtained from PCA and LDA. However, LPP ignored the class information of the samples and discarded the target information from SAR images.^{15}

The main goals of the manifold learning algorithms above are to preserve localities or similar rankings, which makes them more appropriate for retrieval or clustering than for classification. By integrating the neighborhood information and class relations of samples, some supervised manifold learning methods have been proposed, including local discriminant embedding^{16} (LDE). Bryant^{17} demonstrated the application of signature manifold methods to SAR images, achieving considerable detection and classification results. In Venkataraman et al.,^{18} a coupled view and identity manifold for shape representation, capturing the inter-class and intra-class variability of target shapes, was applied to target tracking and recognition and produced effective classification performance. However, manifold learning algorithms such as LDE construct the adjacency graphs using only samples in their neighborhoods; they ignore the spatial relationships between neighborhoods, which restricts the classification performance.

To solve the aforementioned problems, a new feature extraction method, neighborhood virtual points discriminant embedding (NVPDE), is proposed. By introducing a virtual point into every sample’s neighborhood, the relations between the samples in the neighborhood are taken into account, and the spatial relationships of the neighborhoods are established. When embedded into the low-dimensional feature space, the neighborhood virtual points with the same class label, as well as every sample and its neighborhood virtual point, get together, whereas the neighborhood virtual points from different classes separate from each other. In this way, the recognition performance can be improved.

This paper is organized as follows. Section 2 details the proposed algorithm framework: samples gathered in the neighborhood, neighborhood virtual point discriminant, and objective function are detailed in Secs. 2.1 to 2.3, respectively. Section 3 shows experimental results, and Sec. 4 concludes this paper.

## 2. Neighborhood Virtual Points Discriminant Embedding

Let $M$ be a manifold embedded in ${R}^{n}$. The training dataset is $\{{\mathbf{x}}_{i}\in {R}^{n},i=1,2,\cdots ,N\}\in M$, and the corresponding class labels are $\{{y}_{i}\in [1,2,\cdots ,c],i=1,2,\cdots ,N\}$, where $N$ denotes the number of training samples and $c$ denotes the number of classes. Any subset of data points that belong to the same class is assumed to lie on a submanifold of $M$. In NVPDE, virtual points are introduced into the samples’ within-class neighborhoods, and then an embedding based on linear projection is constructed: ${\mathbf{x}}_{i}\in {R}^{n}\mapsto {\mathbf{z}}_{i}={\mathbf{V}}^{T}{\mathbf{x}}_{i}\in {R}^{l}$, $(l\ll n)$. Via this embedding, each sample ${\mathbf{x}}_{i}$ in the low-dimensional space moves toward its neighborhood virtual point ${\mathbf{p}}_{i}$, while the virtual points with the same class label get together, and the virtual points from different classes separate from each other.

### 2.1. Samples Gathered in Neighborhood

We calculate the within-class neighborhood ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ for each sample ${\mathbf{x}}_{i}$ and select the neighborhood virtual point $\{{\mathbf{p}}_{i}\in {R}^{n},i=1,2,\cdots ,N\}$ from each neighborhood ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$. Here, ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ denotes the set of the ${k}_{1}$ nearest neighbors of the sample ${\mathbf{x}}_{i}$ in the same class, ${\mathbf{p}}_{i}=\mathrm{\Phi}({\mathbf{x}}_{i},{\mathbf{x}}_{{a}_{1}},{\mathbf{x}}_{{a}_{2}},\cdots ,{\mathbf{x}}_{{a}_{{k}_{1}}})$, where ${\mathbf{x}}_{{a}_{1}},{\mathbf{x}}_{{a}_{2}},\cdots ,{\mathbf{x}}_{{a}_{{k}_{1}}}\in {N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ and $\mathrm{\Phi}(\cdot)$ is the virtual point selecting function. We select the geometric center of the neighborhood ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ as the neighborhood virtual point: ${\mathbf{p}}_{i}=\frac{1}{{k}_{1}+1}({\mathbf{x}}_{i}+\sum _{j=1}^{{k}_{1}}{\mathbf{x}}_{{a}_{j}})$, $i=1,2,\cdots ,N$, ${\mathbf{x}}_{{a}_{j}}\in {N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$. The sample’s within-neighborhood objective function is defined as

## (1)

$${J}_{n}(\mathbf{V})=\sum _{i,j}{\Vert {\mathbf{z}}_{i}-{\mathbf{c}}_{j}\Vert}^{2}{w}_{ij}^{(n)}=\sum _{i,j}{\Vert {\mathbf{V}}^{T}{\mathbf{x}}_{i}-{\mathbf{V}}^{T}{\mathbf{p}}_{j}\Vert}^{2}{w}_{ij}^{(n)},$$

## (2)

$${w}_{ij}^{(n)}=\begin{cases}\mathrm{exp}\{-{\Vert {\mathbf{x}}_{i}-{\mathbf{p}}_{j}\Vert}^{2}\}, & \text{if}\ {\mathbf{x}}_{i}\in {N}_{{k}_{1}}^{+}({\mathbf{x}}_{j})\ \text{or}\ i=j,\\ 0, & \text{otherwise,}\end{cases}$$

## (3)

$${\mathbf{c}}_{j}={\mathbf{V}}^{T}{\mathbf{p}}_{j}=\frac{1}{{k}_{1}+1}\left({\mathbf{V}}^{T}{\mathbf{x}}_{j}+\sum _{i=1}^{{k}_{1}}{\mathbf{V}}^{T}{\mathbf{x}}_{{a}_{i}}\right).$$

Referring to Eq. (1), we can infer that

## (4)

$$\begin{aligned}{J}_{n}(\mathbf{V})&=\sum _{i,j}{\Vert {\mathbf{z}}_{i}-{\mathbf{c}}_{j}\Vert}^{2}{w}_{ij}^{(n)}=\sum _{i,j}{\Vert {\mathbf{V}}^{T}{\mathbf{x}}_{i}-{\mathbf{V}}^{T}{\mathbf{p}}_{j}\Vert}^{2}{w}_{ij}^{(n)}\\&=\text{trace}\Big[\sum _{i,j}{\mathbf{V}}^{T}({\mathbf{x}}_{i}-{\mathbf{p}}_{j}){w}_{ij}^{(n)}{({\mathbf{x}}_{i}-{\mathbf{p}}_{j})}^{T}\mathbf{V}\Big]\\&=\text{trace}\Big\{{\mathbf{V}}^{T}\Big[\sum _{i,j}({\mathbf{x}}_{i}{w}_{ij}^{(n)}{\mathbf{x}}_{i}^{T}-{\mathbf{x}}_{i}{w}_{ij}^{(n)}{\mathbf{p}}_{j}^{T}-{\mathbf{p}}_{j}{w}_{ij}^{(n)}{\mathbf{x}}_{i}^{T}+{\mathbf{p}}_{j}{w}_{ij}^{(n)}{\mathbf{p}}_{j}^{T})\Big]\mathbf{V}\Big\}\\&=\text{trace}[{\mathbf{V}}^{T}(\mathbf{X}{\mathbf{W}}^{(n)}{\mathbf{X}}^{T}-\mathbf{X}{\mathbf{W}}^{(n)}{\mathbf{P}}^{T}-\mathbf{P}{\mathbf{W}}^{(n)}{\mathbf{X}}^{T}+\mathbf{P}{\mathbf{W}}^{(n)}{\mathbf{P}}^{T})\mathbf{V}],\end{aligned}$$

where $\mathbf{X}=[{\mathbf{x}}_{1},{\mathbf{x}}_{2},\cdots ,{\mathbf{x}}_{N}]$ and $\mathbf{P}=[{\mathbf{p}}_{1},{\mathbf{p}}_{2},\cdots ,{\mathbf{p}}_{N}]$. Since each virtual point is a weighted sum of training samples, we can write $\mathbf{P}=\mathbf{X}\mathbf{H}$, where $\mathbf{H}\in {R}^{N\times N}$ collects the averaging coefficients of Eq. (3). Substituting this into Eq. (4) gives

## (5)

$$\begin{aligned}{J}_{n}(\mathbf{V})&=\text{trace}[{\mathbf{V}}^{T}(\mathbf{X}{\mathbf{W}}^{(n)}{\mathbf{X}}^{T}-\mathbf{X}{\mathbf{W}}^{(n)}{\mathbf{H}}^{T}{\mathbf{X}}^{T}-\mathbf{X}\mathbf{H}{\mathbf{W}}^{(n)}{\mathbf{X}}^{T}+\mathbf{X}\mathbf{H}{\mathbf{W}}^{(n)}{\mathbf{H}}^{T}{\mathbf{X}}^{T})\mathbf{V}]\\&=\text{trace}[{\mathbf{V}}^{T}\mathbf{X}({\mathbf{W}}^{(n)}-{\mathbf{W}}^{(n)}{\mathbf{H}}^{T}-\mathbf{H}{\mathbf{W}}^{(n)}+\mathbf{H}{\mathbf{W}}^{(n)}{\mathbf{H}}^{T}){\mathbf{X}}^{T}\mathbf{V}].\end{aligned}$$

Writing ${\mathbf{L}}^{(n)}={\mathbf{W}}^{(n)}-{\mathbf{W}}^{(n)}{\mathbf{H}}^{T}-\mathbf{H}{\mathbf{W}}^{(n)}+\mathbf{H}{\mathbf{W}}^{(n)}{\mathbf{H}}^{T}$, this becomes

## (6)

$${J}_{n}(\mathbf{V})=\text{trace}({\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(n)}{\mathbf{X}}^{T}\mathbf{V}).$$

### 2.2. Neighborhood Virtual Point Discriminant

Because each neighborhood virtual point ${\mathbf{p}}_{i}$ is a linear combination of the samples in the neighborhood ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ and ${\mathbf{x}}_{i}$, the neighborhood virtual points are essentially high-dimensional image data, as well. Therefore, they should lie on or near the low-dimensional manifold $M$ embedded in the high-dimensional space. According to the samples’ class labels, the class labels corresponding to the neighborhood virtual points $\{{\mathbf{p}}_{i}\in {R}^{n},i=1,2,\cdots ,N\}$ are $\{{y}_{i}\in \phantom{\rule{0ex}{0ex}}[1,2,\cdots ,c],\hspace{0.17em}i=1,2,\cdots ,N\}$. Any subset of neighborhood virtual points that belong to the same class is assumed to lie on a submanifold of $M$. The within-class neighborhood virtual point objective function is defined as

## (7)

$${J}_{w}(\mathbf{V})=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{c}}_{i}-{\mathbf{c}}_{j}\Vert}^{2}{w}_{ij}^{(w)}=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{V}}^{T}{\mathbf{p}}_{i}-{\mathbf{V}}^{T}{\mathbf{p}}_{j}\Vert}^{2}{w}_{ij}^{(w)},$$

## (8)

$${w}_{ij}^{(w)}=\begin{cases}\mathrm{exp}\{-{\Vert {\mathbf{p}}_{i}-{\mathbf{p}}_{j}\Vert}^{2}\}, & \text{if}\ {\mathbf{p}}_{i}\in {N}_{{k}_{2}}^{+}({\mathbf{p}}_{j}),\\ 0, & \text{otherwise,}\end{cases}$$

where ${N}_{{k}_{2}}^{+}({\mathbf{p}}_{j})$ denotes the set of the ${k}_{2}$ nearest neighbors of ${\mathbf{p}}_{j}$ among the virtual points of the same class. ${J}_{w}(\mathbf{V})$ captures the spatial relationships between the neighborhood virtual points in the same class: the smaller the value of ${J}_{w}(\mathbf{V})$, the closer these virtual points are to each other. The process of within-class neighborhood virtual point discriminant is illustrated in Fig. 2, which shows that the virtual points in the same class get together.
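The weight matrices of Eqs. (2), (8), and (10) all share the same heat-kernel form, differing only in which pairs count as neighbors. A minimal NumPy sketch (the function and argument names are ours, not from the paper):

```python
import numpy as np

def heat_kernel_weights(points, neighbor_mask):
    """w_ij = exp(-||p_i - p_j||^2) on neighbor pairs, 0 elsewhere.

    points:        (N, d) array of samples or virtual points (one per row).
    neighbor_mask: (N, N) boolean array; entry (i, j) is True when point i
                   satisfies the neighborhood condition of Eq. (2), (8), or (10)
                   with respect to point j.
    """
    # All pairwise squared Euclidean distances between the rows of `points`.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.where(neighbor_mask, np.exp(-sq), 0.0)
```

The within-class matrix ${\mathbf{W}}^{(w)}$, the between-class matrix ${\mathbf{W}}^{(b)}$, and the sample-to-virtual-point matrix ${\mathbf{W}}^{(n)}$ then differ only in the mask passed in.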

The between-class neighborhood virtual point objective function is defined as

## (9)

$${J}_{b}(\mathbf{V})=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{c}}_{i}-{\mathbf{c}}_{j}\Vert}^{2}{w}_{ij}^{(b)}=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{V}}^{T}{\mathbf{p}}_{i}-{\mathbf{V}}^{T}{\mathbf{p}}_{j}\Vert}^{2}{w}_{ij}^{(b)},$$

## (10)

$${w}_{ij}^{(b)}=\begin{cases}\mathrm{exp}\{-{\Vert {\mathbf{p}}_{i}-{\mathbf{p}}_{j}\Vert}^{2}\}, & \text{if}\ {\mathbf{p}}_{i}\in {N}_{{k}_{3}}^{-}({\mathbf{p}}_{j}),\\ 0, & \text{otherwise,}\end{cases}$$

where ${N}_{{k}_{3}}^{-}({\mathbf{p}}_{j})$ denotes the set of the ${k}_{3}$ nearest neighbors of ${\mathbf{p}}_{j}$ among the virtual points from the other classes. ${J}_{b}(\mathbf{V})$ captures the spatial relationships between the neighborhood virtual points from different classes: the larger the value of ${J}_{b}(\mathbf{V})$, the farther these virtual points are from each other. The process of between-class neighborhood virtual point discriminant is illustrated in Fig. 3, which shows that the virtual points from different classes separate from each other.

Referring to Eq. (7), we can infer that

## (11)

$$\begin{aligned}{J}_{w}(\mathbf{V})&=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{c}}_{i}-{\mathbf{c}}_{j}\Vert}^{2}{w}_{ij}^{(w)}=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{V}}^{T}{\mathbf{p}}_{i}-{\mathbf{V}}^{T}{\mathbf{p}}_{j}\Vert}^{2}{w}_{ij}^{(w)}\\&=\frac{1}{2}\text{trace}\Big[\sum _{i,j}{\mathbf{V}}^{T}({\mathbf{p}}_{i}-{\mathbf{p}}_{j}){w}_{ij}^{(w)}{({\mathbf{p}}_{i}-{\mathbf{p}}_{j})}^{T}\mathbf{V}\Big]\\&=\text{trace}\Big\{{\mathbf{V}}^{T}\Big[\sum _{i,j}({\mathbf{p}}_{i}{w}_{ij}^{(w)}{\mathbf{p}}_{i}^{T}-{\mathbf{p}}_{i}{w}_{ij}^{(w)}{\mathbf{p}}_{j}^{T})\Big]\mathbf{V}\Big\}\\&=\text{trace}[{\mathbf{V}}^{T}\mathbf{P}({\mathbf{D}}^{(w)}-{\mathbf{W}}^{(w)}){\mathbf{P}}^{T}\mathbf{V}],\end{aligned}$$

where ${\mathbf{D}}^{(w)}$ is the diagonal matrix with entries ${D}_{ii}^{(w)}=\sum _{j}{w}_{ij}^{(w)}$. Substituting $\mathbf{P}=\mathbf{X}\mathbf{H}$ yields

## (12)

$${J}_{w}(\mathbf{V})=\text{trace}[{\mathbf{V}}^{T}\mathbf{X}\mathbf{H}({\mathbf{D}}^{(w)}-{\mathbf{W}}^{(w)}){\mathbf{H}}^{T}{\mathbf{X}}^{T}\mathbf{V}]=\text{trace}({\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(w)}{\mathbf{X}}^{T}\mathbf{V}),$$

where ${\mathbf{L}}^{(w)}=\mathbf{H}({\mathbf{D}}^{(w)}-{\mathbf{W}}^{(w)}){\mathbf{H}}^{T}$. Referring to Eq. (9), we can infer that

## (13)

$$\begin{aligned}{J}_{b}(\mathbf{V})&=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{c}}_{i}-{\mathbf{c}}_{j}\Vert}^{2}{w}_{ij}^{(b)}=\frac{1}{2}\sum _{i,j}{\Vert {\mathbf{V}}^{T}{\mathbf{p}}_{i}-{\mathbf{V}}^{T}{\mathbf{p}}_{j}\Vert}^{2}{w}_{ij}^{(b)}\\&=\frac{1}{2}\text{trace}\Big[\sum _{i,j}{\mathbf{V}}^{T}({\mathbf{p}}_{i}-{\mathbf{p}}_{j}){w}_{ij}^{(b)}{({\mathbf{p}}_{i}-{\mathbf{p}}_{j})}^{T}\mathbf{V}\Big]\\&=\text{trace}\Big\{{\mathbf{V}}^{T}\Big[\sum _{i,j}({\mathbf{p}}_{i}{w}_{ij}^{(b)}{\mathbf{p}}_{i}^{T}-{\mathbf{p}}_{i}{w}_{ij}^{(b)}{\mathbf{p}}_{j}^{T})\Big]\mathbf{V}\Big\}\\&=\text{trace}[{\mathbf{V}}^{T}\mathbf{P}({\mathbf{D}}^{(b)}-{\mathbf{W}}^{(b)}){\mathbf{P}}^{T}\mathbf{V}],\end{aligned}$$

where ${\mathbf{D}}^{(b)}$ is the diagonal matrix with entries ${D}_{ii}^{(b)}=\sum _{j}{w}_{ij}^{(b)}$. Substituting $\mathbf{P}=\mathbf{X}\mathbf{H}$ yields

## (14)

$${J}_{b}(\mathbf{V})=\text{trace}[{\mathbf{V}}^{T}\mathbf{X}\mathbf{H}({\mathbf{D}}^{(b)}-{\mathbf{W}}^{(b)}){\mathbf{H}}^{T}{\mathbf{X}}^{T}\mathbf{V}]=\text{trace}({\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(b)}{\mathbf{X}}^{T}\mathbf{V}),$$

where ${\mathbf{L}}^{(b)}=\mathbf{H}({\mathbf{D}}^{(b)}-{\mathbf{W}}^{(b)}){\mathbf{H}}^{T}$.

### 2.3. Objective Function

For the purpose of classification, we expect that ${J}_{n}(\mathbf{V})$ and ${J}_{w}(\mathbf{V})$ have small values, while ${J}_{b}(\mathbf{V})$ has a large value, so that each sample in the low-dimensional space will move toward its neighborhood virtual point, while the virtual points in the same class get together, and the virtual points from different classes separate from each other. Therefore, the samples in the same class will get close, whereas the samples of different classes will separate from each other in low-dimensional space.

Let ${J}_{w}^{\prime}(\mathbf{V})={J}_{n}(\mathbf{V})+{J}_{w}(\mathbf{V})$. Referring to Eqs. (6) and (12), we can infer that

## (15)

$$\begin{aligned}{J}_{w}^{\prime}(\mathbf{V})&=\text{trace}({\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(n)}{\mathbf{X}}^{T}\mathbf{V})+\text{trace}({\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(w)}{\mathbf{X}}^{T}\mathbf{V})\\&=\text{trace}[{\mathbf{V}}^{T}\mathbf{X}({\mathbf{L}}^{(n)}+{\mathbf{L}}^{(w)}){\mathbf{X}}^{T}\mathbf{V}]=\text{trace}({\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(w)\prime}{\mathbf{X}}^{T}\mathbf{V}).\end{aligned}$$

Consequently, according to Eqs. (14) and (15) and Fisher’s criterion,^{19} the objective function of NVPDE can be formulated as

## (16)

$${\mathbf{V}}^{*}=\underset{\mathbf{V}}{\mathrm{argmax}}\ \text{trace}\left(\frac{{\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(b)}{\mathbf{X}}^{T}\mathbf{V}}{{\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}^{(w)\prime}{\mathbf{X}}^{T}\mathbf{V}}\right).$$

The columns of the optimal $\mathbf{V}$ are the generalized eigenvectors corresponding to the $l$ largest eigenvalues of

## (17)

$$\mathbf{X}{\mathbf{L}}^{(b)}{\mathbf{X}}^{T}\mathbf{v}=\lambda \mathbf{X}{\mathbf{L}}^{(w)\prime}{\mathbf{X}}^{T}\mathbf{v}.$$

The NVPDE algorithm is formally stated as follows:

1. Compute the weight matrices ${\mathbf{W}}^{(n)}$, ${\mathbf{W}}^{(w)}$, and ${\mathbf{W}}^{(b)}$ according to Eqs. (2), (8), and (10).

2. According to Eqs. (14) and (15), compute the matrices ${\mathbf{L}}^{(b)}$ and ${\mathbf{L}}^{(w)\prime}$, and then solve for the optimal embedding $\mathbf{V}$ according to Eq. (17).

3. Feature extraction: Given a testing sample ${\mathbf{x}}_{t}$, the extracted feature based on NVPDE is ${\mathbf{z}}_{t}={\mathbf{V}}^{T}{\mathbf{x}}_{t}$, where ${\mathbf{z}}_{t}\in {R}^{l}$.
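The procedure above can be sketched end to end in NumPy/SciPy. This is our reading of Secs. 2.1 to 2.3; all names are our own, and the final eigenvalue clipping plus ridge is a numerical safeguard we add (the paper does not specify how to handle a possibly indefinite or singular denominator matrix when $N<n$):

```python
import numpy as np
from scipy.linalg import eigh

def _sqdist(A, B):
    """Squared Euclidean distances between the columns of A and the columns of B."""
    d = np.sum(A * A, 0)[:, None] + np.sum(B * B, 0)[None, :] - 2.0 * (A.T @ B)
    return np.maximum(d, 0.0)

def nvpde_fit(X, y, k1, k2, k3, l, reg=1e-6):
    """Sketch of NVPDE training.  X: (n, N) samples as columns; y: (N,) labels.
    Returns the projection matrix V of shape (n, l)."""
    n, N = X.shape
    sq = _sqdist(X, X)

    # Sec. 2.1: within-class k1-NN and geometric-center virtual points P = X H.
    H = np.zeros((N, N))
    for j in range(N):
        same = np.flatnonzero((y == y[j]) & (np.arange(N) != j))
        nbrs = same[np.argsort(sq[same, j])[:k1]]
        H[np.append(nbrs, j), j] = 1.0 / (k1 + 1)
    P = X @ H

    # Step 1: heat-kernel weight matrices of Eqs. (2), (8), and (10).
    dXP, dPP = _sqdist(X, P), _sqdist(P, P)
    Wn, Ww, Wb = np.zeros((N, N)), np.zeros((N, N)), np.zeros((N, N))
    for j in range(N):
        same = np.flatnonzero((y == y[j]) & (np.arange(N) != j))
        diff = np.flatnonzero(y != y[j])
        idx = np.append(same[np.argsort(sq[same, j])[:k1]], j)   # "or i = j"
        Wn[idx, j] = np.exp(-dXP[idx, j])
        idx = same[np.argsort(dPP[same, j])[:k2]]
        Ww[idx, j] = np.exp(-dPP[idx, j])
        idx = diff[np.argsort(dPP[diff, j])[:k3]]
        Wb[idx, j] = np.exp(-dPP[idx, j])

    # Step 2: Laplacian-style matrices of Eqs. (5)/(6), (12), and (14)/(15).
    Ln = Wn - Wn @ H.T - H @ Wn + H @ Wn @ H.T
    Lw = H @ (np.diag(Ww.sum(1)) - Ww) @ H.T
    Lb = H @ (np.diag(Wb.sum(1)) - Wb) @ H.T
    A = X @ Lb @ X.T                  # numerator matrix of Eq. (16)
    B = X @ (Ln + Lw) @ X.T           # denominator matrix, X L^(w)' X^T

    # Eq. (17): generalized eigenproblem.  Symmetrize both sides, then clip the
    # denominator to PSD and add a ridge so eigh is applicable (our addition).
    A, B = (A + A.T) / 2.0, (B + B.T) / 2.0
    w, U = np.linalg.eigh(B)
    B = (U * np.clip(w, 0.0, None)) @ U.T + reg * np.eye(n)
    vals, vecs = eigh(A, B)
    return vecs[:, np.argsort(vals)[::-1][:l]]
```

A test sample is then projected as `z = V.T @ x_t`, exactly as in step 3.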

Concerning the computational complexity of the proposed algorithm, we note that the complexity of searching the $k$ nearest neighbors for all the samples and neighborhood virtual points is $\mathrm{O}(n{N}^{2})$. The complexity of calculating the elements of the weight matrices is $\mathrm{O}(nN)$. The complexity of computing the matrices ${\mathbf{L}}^{(b)}$ and ${\mathbf{L}}^{(w)\prime}$ is $\mathrm{O}({N}^{3})$, and the complexity of solving the generalized eigenvalue problem is $\mathrm{O}({n}^{3})$. In most cases, the number of training samples is smaller than the sample dimension $(N<n)$. Therefore, as with most other feature extraction methods, the computational bottleneck of NVPDE is solving the generalized eigenvalue problem, whose computational complexity is $\mathrm{O}({n}^{3})$.

## 3. Experimental Results

In this section, the MSTAR^{20} and AT&T face databases are utilized to evaluate the proposed algorithm.

The MSTAR dataset consists of X-band original SAR images ($128\times 128$ pixels) with a resolution of one foot by one foot. The target images are three types of military vehicles. Each object includes images covering the full aspect range of 0 deg to 360 deg. In this work, the training dataset contains SAR images at a depression angle of 17 deg, and the testing dataset contains images at a depression angle of 15 deg. Table 1 lists the type and number of each object.

## Table 1

The training and testing samples in experiments.

| Training set | Size | Testing set | Size |
|---|---|---|---|
| BMP2sn_c21 | 233 | BMP2sn_9563 | 195 |
| | | BMP2sn_9566 | 196 |
| | | BMP2sn_c21 | 196 |
| BTR70sn_c71 | 233 | BTR70sn_c71 | 196 |
| T72sn_132 | 232 | T72sn_132 | 196 |
| | | T72sn_812 | 195 |
| | | T72sn_s7 | 191 |
| Total | 698 | Total | 1365 |

We mainly make use of the targets in the MSTAR SAR images to evaluate the performance of the proposed algorithm. The original SAR image dataset has been preprocessed^{21} to extract the target areas of SAR images before feature extraction. The steps of SAR image preprocessing are as follows:

1. Two-parameter CFAR^{22} detection and geometric clustering are conducted for target segmentation. The binary mask matrices of the images are then obtained.

2. The targets of the SAR images are extracted by applying the binary masks to the corresponding original SAR images. The location of the target is recentered on the centroid through centering image registration.^{23}

3. Energy normalization preprocessing is used to normalize the energy of the images to the same range. Gray enhancement based on a power function^{24} is executed to enhance the information of the SAR images.

The optical images and the corresponding SAR images of the three targets in the MSTAR dataset are shown in Figs. 4 and 5. Figure 6 shows the binary mask matrices of the targets obtained after two-parameter CFAR and geometric clustering are conducted on the SAR images. Figure 7 shows the extracted target areas of the SAR images after image registration centered on the centroid. The gray enhancement and energy normalization preprocessing are executed as shown in Fig. 8.

The AT&T face database^{25} contains images from 40 individuals, each providing 10 different images. For some subjects, the images were taken at different times, varying the lighting, facial expressions, and facial details. All images are grayscale. For each individual, four images are randomly selected for training, and the rest are used for testing. Thus, we get 160 training samples and 240 testing samples for this experiment.
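The per-subject random split described above (4 training and 6 testing images per individual) can be sketched as follows; the helper name and index-based interface are ours:

```python
import numpy as np

def att_split(labels, n_train=4, seed=0):
    """Per-subject random split: n_train images of each subject go to training,
    the rest to testing.  `labels` holds the subject id of every image.
    Returns boolean masks (train, test) over the image indices."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = np.zeros(len(labels), dtype=bool)
    for subject in np.unique(labels):
        idx = np.flatnonzero(labels == subject)
        # Randomly pick n_train of this subject's images for training.
        train[rng.permutation(idx)[:n_train]] = True
    return train, ~train
```

With 40 subjects of 10 images each, this yields the 160 training and 240 testing samples used in the experiment.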

The experiment includes four parts. The theoretical approach of the proposed algorithm is validated on the SAR image dataset in part 1. In part 2, we compare our algorithm with five other methods (PCA, LDA, KPCA, KLDA, and LDE) to evaluate the recognition performance for SAR images, and we also illustrate the classification results with a two-dimensional data visualization. In part 3, we evaluate and discuss the influence of variations in the neighbor parameters on the proposed algorithm in SAR ATR. The recognition results for the face image database are presented in part 4.

### 3.1. Part 1

#### 3.1.1. Experimental steps

Because a manifold is locally Euclidean,^{9} samples and virtual points have a nearly linear distribution within their neighborhoods. Therefore, we utilize the scatter^{26} to statistically measure the spatial relationships among the data points in their neighborhoods.

1. We employ the SAR image training dataset as the experiment samples in part 1. We calculate the neighborhood virtual points $\{{\mathbf{p}}_{i}\in {R}^{n},i=1,2,\cdots ,N\}$ from each neighborhood ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ and compute the neighborhood scatters $\xi =[{\xi}_{1},{\xi}_{2},\cdots ,{\xi}_{N}]\in {R}^{1\times N}$ in each ${N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})$ to statistically measure the spatial relationships between the samples and their respective neighborhood virtual points, where ${\xi}_{i}=\frac{1}{{k}_{1}+1}\sum _{{\mathbf{x}}_{j}\in {N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})\ \text{or}\ j=i}{({\mathbf{x}}_{j}-{\mathbf{p}}_{i})}^{T}({\mathbf{x}}_{j}-{\mathbf{p}}_{i})$, $i=1,2,\cdots ,N$, and $N=698$.

2. We find the embedding $\mathbf{V}\in {R}^{n\times n}$ by the NVPDE algorithm. We compute the neighborhood scatters ${\xi}^{\prime}=[{\xi}_{1}^{\prime},{\xi}_{2}^{\prime},\cdots ,{\xi}_{N}^{\prime}]\in {R}^{1\times N}$ to measure the spatial relationships between the embedded samples and their respective embedded neighborhood virtual points, where ${\xi}_{i}^{\prime}=\frac{1}{{k}_{1}+1}\sum _{{\mathbf{x}}_{j}\in {N}_{{k}_{1}}^{+}({\mathbf{x}}_{i})\ \text{or}\ j=i}{({\mathbf{V}}^{T}{\mathbf{x}}_{j}-{\mathbf{V}}^{T}{\mathbf{p}}_{i})}^{T}({\mathbf{V}}^{T}{\mathbf{x}}_{j}-{\mathbf{V}}^{T}{\mathbf{p}}_{i})$, $i=1,2,\cdots ,N$. We then calculate $\mathrm{\Delta}\xi =\xi -{\xi}^{\prime}=[\mathrm{\Delta}{\xi}_{1},\mathrm{\Delta}{\xi}_{2},\cdots ,\mathrm{\Delta}{\xi}_{N}]\in {R}^{1\times N}$ to measure the neighborhood scatter variations within each sample’s neighborhood. According to the vector $\mathrm{\Delta}\xi $ and the class information, the neighborhood scatter variation stem plot of the three target classes is shown in Fig. 9.

3. Following the same routine, we calculate the scatter variations $\mathrm{\Delta}\psi =\psi -{\psi}^{\prime}=[\mathrm{\Delta}{\psi}_{1},\mathrm{\Delta}{\psi}_{2},\cdots ,\mathrm{\Delta}{\psi}_{N}]\in {R}^{1\times N}$ for each virtual point’s within-class neighborhood, where $\mathrm{\Delta}{\psi}_{i}={\psi}_{i}-{\psi}_{i}^{\prime}$, the within-class neighborhood scatter before embedding is ${\psi}_{i}=\frac{1}{{k}_{2}}\sum _{{\mathbf{p}}_{j}\in {N}_{{k}_{2}}^{+}({\mathbf{p}}_{i})}{({\mathbf{p}}_{j}-{\mathbf{p}}_{i})}^{T}({\mathbf{p}}_{j}-{\mathbf{p}}_{i})$, and the within-class neighborhood scatter after embedding is ${\psi}_{i}^{\prime}=\frac{1}{{k}_{2}}\sum _{{\mathbf{p}}_{j}\in {N}_{{k}_{2}}^{+}({\mathbf{p}}_{i})}{({\mathbf{V}}^{T}{\mathbf{p}}_{j}-{\mathbf{V}}^{T}{\mathbf{p}}_{i})}^{T}({\mathbf{V}}^{T}{\mathbf{p}}_{j}-{\mathbf{V}}^{T}{\mathbf{p}}_{i})$, $i=1,2,\cdots ,N$. According to the vector $\mathrm{\Delta}\psi $ and the class information, the virtual points’ within-class neighborhood scatter variation stem plot of the three target classes is shown in Fig. 10.

4. We calculate the scatter variations $\mathrm{\Delta}\eta =\eta -{\eta}^{\prime}=[\mathrm{\Delta}{\eta}_{1},\mathrm{\Delta}{\eta}_{2},\cdots ,\mathrm{\Delta}{\eta}_{N}]\in {R}^{1\times N}$ for each virtual point’s between-class neighborhood, where $\mathrm{\Delta}{\eta}_{i}={\eta}_{i}-{\eta}_{i}^{\prime}$, the between-class neighborhood scatter before embedding is ${\eta}_{i}=\frac{1}{{k}_{3}}\sum _{{\mathbf{p}}_{j}\in {N}_{{k}_{3}}^{-}({\mathbf{p}}_{i})}{({\mathbf{p}}_{j}-{\mathbf{p}}_{i})}^{T}({\mathbf{p}}_{j}-{\mathbf{p}}_{i})$, and the between-class neighborhood scatter after embedding is ${\eta}_{i}^{\prime}=\frac{1}{{k}_{3}}\sum _{{\mathbf{p}}_{j}\in {N}_{{k}_{3}}^{-}({\mathbf{p}}_{i})}{({\mathbf{V}}^{T}{\mathbf{p}}_{j}-{\mathbf{V}}^{T}{\mathbf{p}}_{i})}^{T}({\mathbf{V}}^{T}{\mathbf{p}}_{j}-{\mathbf{V}}^{T}{\mathbf{p}}_{i})$, $i=1,2,\cdots ,N$. According to the vector $\mathrm{\Delta}\eta $ and the class information, the virtual points’ between-class neighborhood scatter variation stem plot of the three target classes is shown in Fig. 11.
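All the scatters above ($\xi$, $\psi$, $\eta$, before and after embedding) share one mean-squared-distance form; a small sketch, with names of our choosing:

```python
import numpy as np

def neighborhood_scatter(Z, centers, nbr_lists):
    """Mean squared distance of the points of each neighborhood to its center.

    Z:         (d, N) data as columns (original or embedded).
    centers:   (d, N) reference points as columns, e.g. the virtual points.
    nbr_lists: nbr_lists[i] lists the column indices belonging to
               neighborhood i (for xi, the sample itself is included).
    """
    return np.array([np.mean(np.sum((Z[:, idx] - centers[:, [i]]) ** 2, axis=0))
                     for i, idx in enumerate(nbr_lists)])
```

The variation vectors are then differences of two such calls, e.g. `d_xi = neighborhood_scatter(X, P, nbrs) - neighborhood_scatter(V.T @ X, V.T @ P, nbrs)`, and a positive entry means that neighborhood contracted under the embedding.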

#### 3.1.2. Experimental results and discussions

We compute statistics over the elements of $\mathrm{\Delta}\xi $, $\mathrm{\Delta}\psi $, and $\mathrm{\Delta}\eta $. The proportion of entries satisfying $\mathrm{\Delta}{\xi}_{i}>0$ is 71.06%; the proportion satisfying $\mathrm{\Delta}{\psi}_{i}>0$ is 61.32%; and the proportion satisfying $\mathrm{\Delta}{\eta}_{i}<0$ is 55.59%.

According to Figs. 9 and 10 and the statistical results, it can be seen that most of the samples’ within-neighborhood scatter variations and the virtual points’ within-class neighborhood scatter variations are positive. This indicates that the samples move toward their neighborhood virtual points, and the virtual points in the same class get together by the embedding of the proposed method.

From Fig. 11 and the statistical results, it can be seen that more than half of the virtual points’ between-class neighborhood scatter variations are negative. This demonstrates that the NVPDE algorithm can keep the virtual points from different classes far away from each other in the low-dimensional space effectively.

### 3.2. Part 2

#### 3.2.1. Experimental steps

In this experiment, PCA, LDA, KPCA, KLDA, LDE, and NVPDE are utilized to extract features from the experimental SAR image dataset. The values of the neighbor parameters are ${k}_{1}=10$ and ${k}_{2}=20$ in LDE and ${k}_{1}=18$, ${k}_{2}=20$, and ${k}_{3}=85$ in NVPDE. The nearest neighbor classifier^{27} (NNC) is utilized for the final classification.
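The NNC used for the final classification can be sketched as follows (function name ours):

```python
import numpy as np

def nnc_predict(Z_train, y_train, Z_test):
    """Nearest neighbor classifier: each test column receives the label of its
    closest training column under Euclidean distance."""
    sq = (np.sum(Z_test * Z_test, 0)[:, None]
          + np.sum(Z_train * Z_train, 0)[None, :]
          - 2.0 * (Z_test.T @ Z_train))          # pairwise squared distances
    return y_train[np.argmin(sq, axis=1)]
```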

The kernel function of KPCA and KLDA is the Gaussian kernel $k({\mathbf{x}}_{i},{\mathbf{x}}_{j})=\mathrm{exp}(-\Vert {\mathbf{x}}_{i}-{\mathbf{x}}_{j}{\Vert}^{2}/\sigma )$. As mentioned above, the recognition performance of the kernel method depends on the kernel settings, and the selection of the kernel settings is empirical in practice.
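The Gaussian kernel above, with the paper's $\Vert \cdot \Vert^{2}/\sigma $ scaling (not the more common $2{\sigma}^{2}$ denominator), can be computed for a whole dataset at once; a minimal sketch, names ours:

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gram matrix k(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma) for the
    columns of X, matching the kernel definition used here."""
    s = np.sum(X * X, 0)
    sq = s[:, None] + s[None, :] - 2.0 * (X.T @ X)   # squared distances
    return np.exp(-np.maximum(sq, 0.0) / sigma)
```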

We change the kernel parameter gradually and record the corresponding top recognition rates, from which we select the best kernel parameter.

Figure 12 shows plots of the top recognition rate versus different values of the kernel parameter for KPCA and KLDA. From this, we can see that $\sigma =9$ for KPCA and $\sigma =6$ for KLDA are the best selections.

#### 3.2.2. Experimental results and discussions

Figure 13 shows plots of recognition rate versus the dimension of the feature vectors for PCA, LDA, KPCA, KLDA, LDE, and NVPDE. The maximal feature dimension of LDA is less than the number of classes $c$.^{28} Therefore, the recognition rate of LDA is reported at two feature dimensions.

From Fig. 13, we can see that PCA and LDA have relatively low recognition rates. For the high-dimensional SAR image dataset, the spatial distribution of the data is better characterized by its manifold structure. However, classical linear feature extraction methods such as PCA and LDA are based on the global linear structure of a dataset, which limits their recognition performance.

Figure 13 demonstrates that the recognition rates of KPCA and KLDA are similar but are significantly improved over PCA and LDA. LDE performs better than KPCA and KLDA. The NVPDE algorithm performs far better than the other methods.

The linearly inseparable problem can be transformed into a linearly separable one in a higher-dimensional space by kernel tricks, so that the linearly inseparable problem can be solved by KPCA and KLDA to some extent. However, the recognition performance depends on the selection of kernel functions, which is the main drawback of kernel tricks.

Based on manifold learning theory, LDE incorporates the class relations of samples, which can discover the low-dimensional essential structure from a high-dimensional SAR image dataset. However, this method is based only on establishing relations between samples; it ignores the spatial relationships between neighborhoods, which will restrict the recognition performance.

By introducing the neighborhood virtual point into every sample’s neighborhood, the NVPDE algorithm takes the relations between the samples and their neighborhood virtual points into account and establishes the spatial relationships of the neighborhood virtual points, by which the relations between neighborhoods are formed indirectly. Therefore, the algorithm is able to find more discriminating information in the neighborhoods, and its recognition performance is far superior to that of LDE.

In order to evaluate the classification performance of the proposed feature extraction method systematically, we investigate the receiver operating characteristic (ROC) of the proposed method,^{29} with two typical feature extraction methods (PCA and LDE) included for comparison. Figure 14 shows the ROC curves of the three feature extraction methods using NNC; the false alarm probability axis is logarithmic.

From Fig. 14, it can be seen that:

1. LDE and the NVPDE algorithm have a relatively high correct classification probability $({P}_{\mathrm{cc}})$ and a relatively low false alarm probability $({P}_{\mathrm{fa}})$; both therefore achieve considerable classification results.

2. NVPDE can achieve a higher correct classification probability and a lower false alarm probability than the other two methods.

3. The area under the ROC curve of the NVPDE algorithm is larger than that of PCA and LDE. This shows that the proposed algorithm has a better recognition performance than the other methods.

The top recognition rate and the corresponding dimensions by various algorithms are shown in Table 2, and we can see that our proposed algorithm outperforms the other methods.

## Table 2

Best recognition performance by various algorithms.

| Method | Top recognition (%) | Feature dimension |
|---|---|---|
| PCA | 92.23 | 140 |
| LDA | 85.79 | 2 |
| KPCA | 93.70 | 170 |
| KLDA | 93.92 | 80 |
| LDE | 95.10 | 90 |
| NVPDE | 97.88 | 80 |

The training samples of the SAR images are embedded into two-dimensional Euclidean space by NVPDE, LDE, and PCA to illustrate the classification results with a 2-D data visualization example.

Figure 15 shows the distributions of the three classes of samples after being embedded into two-dimensional Euclidean space by NVPDE, LDE, and PCA. The plus symbol represents the first-class samples in the embedding space, the circle represents the second-class samples, and the asterisk represents the third-class samples.

The experimental results show that samples in the same class do not become noticeably closer in the PCA embedding space. After being embedded by LDE, the samples with the same class label get close to some extent, but most samples from different classes overlap with each other, which restricts the recognition rate. By introducing the neighborhood virtual point, NVPDE establishes the relationships between neighborhoods indirectly, and more discriminating information can be found. Hence the samples with the same class label get close, and samples from different classes separate from each other in the embedding space, as shown in Fig. 15.

## 3.3.

### Part 3

## 3.3.1.

#### Experimental steps

In this part, LDE and NVPDE will be utilized to extract features of the experimental dataset with various neighbor parameter values. The aim is to evaluate the stability of the proposed algorithm. We set ${k}_{1}=10$ and ${k}_{2}=20$ in LDE and ${k}_{1}=18$, ${k}_{2}=20$, and ${k}_{3}=85$ in NVPDE as the benchmark parameter settings. We then change one of the neighbor parameters gradually while keeping the other parameters constant, and we record the corresponding top recognition rates of the two feature extraction methods.
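The one-parameter-at-a-time sweep described above can be sketched as a simple loop. The `top_rate` function below is a hypothetical stand-in for the real pipeline (extract NVPDE features with the given parameters, classify, and return the top recognition rate); the parameter ranges are illustrative choices around the benchmark setting from the text.

```python
import numpy as np

def top_rate(k1, k2, k3):
    """Hypothetical stand-in for: extract NVPDE features with (k1, k2, k3),
    run the classifier, and return the top recognition rate."""
    rng = np.random.default_rng(k1 * 10000 + k2 * 100 + k3)
    return 0.95 + 0.02 * rng.random()

# Benchmark setting from the text; vary one parameter, hold the others fixed.
base = {"k1": 18, "k2": 20, "k3": 85}
results = {}
for name, values in [("k1", range(10, 27, 2)),
                     ("k2", range(12, 29, 2)),
                     ("k3", range(65, 106, 5))]:
    rates = []
    for v in values:
        params = dict(base, **{name: v})  # override only one parameter
        rates.append(top_rate(**params))
    results[name] = rates

print({k: len(v) for k, v in results.items()})
```

Plotting each list in `results` against its parameter values gives curves of the kind shown in Figs. 16 and 17.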

## 3.3.2.

#### Experimental results and discussions

Figures 16 and 17 show the plots of top recognition rates versus the different values of neighbor parameters using LDE and NVPDE. From these figures, we can see that:

1. The variation of the within-class neighboring parameter ${k}_{1}$ has only a slight effect on the recognition performance in LDE, whereas the between-class neighboring parameter ${k}_{2}$ affects the recognition performance significantly.

2. The selections of ${k}_{1}$ and ${k}_{2}$ influence the recognition results of NVPDE only slightly; ${k}_{3}$ affects the recognition performance to some extent, but its influence is much smaller than that of ${k}_{2}$ in LDE.

Because the curvature and density may vary over the manifold,^{30} the choice of neighbor parameter values remains an open problem,^{31} and those values are likely to influence the recognition result, as observed in LDE.

In the NVPDE algorithm, the neighborhood virtual point of each sample is computed as the mean of that sample's neighborhood. This smooths the neighborhood of each sample and weakens the influence of the neighbor parameters on recognition performance. Therefore, the selection of neighbor parameters has only a small effect on the classification results of our proposed method.
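The smoothing idea explained above — each sample's neighborhood virtual point is the mean of its nearest neighbors — can be sketched in a few lines of numpy. This is a minimal illustration of the concept, not the paper's full NVPDE construction.

```python
import numpy as np

def neighborhood_virtual_points(X, k=3):
    """For each sample, return the mean of its k nearest neighbors
    (Euclidean distance, the sample itself excluded) -- a sketch of the
    neighborhood virtual point described in the text."""
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)         # exclude the sample itself
    idx = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per sample
    return X[idx].mean(axis=1)           # mean of each neighborhood

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
V = neighborhood_virtual_points(X, k=2)
print(V[0])  # mean of the two points nearest to the origin
```

Because each virtual point averages over a whole neighborhood, adding or removing one neighbor changes it only mildly, which is why the neighbor parameters matter less here than in LDE.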

## 3.4.

### Part 4

## 3.4.1.

#### Experimental steps

In this part, we take advantage of the AT&T face database to examine the applicability of the proposed method in optical image recognition. LDE and NVPDE are utilized to extract features of the experimental AT&T dataset. The values of the neighbor parameters are ${k}_{1}=3$ and ${k}_{2}=30$ in LDE and ${k}_{1}=3$, ${k}_{2}=3$, and ${k}_{3}=30$ in NVPDE. NNC is used for the final classification.

## 3.4.2.

#### Experimental results and discussions

Figure 18 shows plots of recognition rates versus the dimensions of the feature vectors by LDE and NVPDE. In Fig. 18, the best recognition rate is 90.42% with the corresponding feature dimension 70 for LDE, while the top recognition rate reaches 93.75% with the corresponding feature dimension 60 for NVPDE. Thus the proposed method outperforms LDE on the AT&T face database. The experimental results therefore indicate that the NVPDE method can achieve a satisfactory recognition performance in optical image recognition as well.
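The evaluation above — NNC recognition rate as a function of feature dimension — can be sketched with a numpy-only 1-nearest-neighbor classifier. The data here are synthetic two-class Gaussians (not the AT&T faces), and "feature dimension $d$" is modeled simply as keeping the first $d$ feature coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy features: two classes in 10-D; class information lives in the
# first three dimensions (illustrative data only).
def make_split(n=40):
    X0 = rng.normal(0.0, 1.0, size=(n, 10)); X0[:, :3] += 3.0
    X1 = rng.normal(0.0, 1.0, size=(n, 10))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

Xtr, ytr = make_split()
Xte, yte = make_split()

def nnc_accuracy(Xtr, ytr, Xte, yte, d):
    """1-nearest-neighbor classification using only the first d features."""
    A, B = Xtr[:, :d], Xte[:, :d]
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    pred = ytr[np.argmin(d2, axis=1)]  # label of the nearest training sample
    return (pred == yte).mean()

rates = [nnc_accuracy(Xtr, ytr, Xte, yte, d) for d in range(1, 11)]
print(max(rates))
```

Plotting `rates` against the dimension range gives a curve of the kind shown in Fig. 18, with the top recognition rate read off at its peak.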

## 4.

## Conclusion

For feature extraction from high-dimensional SAR images, it is important to establish relationships between samples' neighborhoods, which uncovers much more discriminating information. In this paper, a new approach to feature extraction was proposed in which neighborhood virtual points are employed and the relationships between neighborhoods are fully taken into account. With this method, classification is conducted more effectively in the feature space, and the recognition performance is improved. The experimental results based on the MSTAR dataset demonstrate the effectiveness of our method.

## Acknowledgments

This research was supported by the National Natural Science Foundation of China (No. 61201272).

## References

## Biography

**Jifang Pei** received a BS from the College of Information Engineering at Xiangtan University, Hunan, China, in 2010. He is an IEEE student member and is working toward an MSc degree at the University of Electronic Science and Technology of China (UESTC), Chengdu. His research interests include SAR automatic target recognition and digital image processing.

**Yulin Huang** received his BS and PhD degrees in electronic engineering from the University of Electronic Science and Technology of China, Chengdu, in 2002 and 2008, respectively. He is an IEEE member and an associate professor at UESTC. His fields of interest include radar signal processing and SAR automatic target recognition.

**Xian Liu** received a BS degree from the Institute of Information Science and Engineering at Hebei University of Science and Technology, China, in 2009. She is an IEEE student member and is working toward a PhD degree at UESTC. Her fields of interest include SAR automatic target recognition.

**Jianyu Yang** received a BS degree from the National University of Defense Technology, Changsha, China, in 1984, and MS and PhD degrees from UESTC in 1987 and 1991, respectively. All his degrees are in electronic engineering. He is a professor at UESTC and a senior editor for the *Chinese Journal of Radio Science*. He is a member of IEEE and the Institution of Engineering and Technology and a senior member of the Chinese Institute of Electronics.