Kernel linear representation: application to target recognition in synthetic aperture radar images

Abstract A method for target classification in synthetic aperture radar (SAR) images is proposed. The samples are first mapped into a high-dimensional feature space in which samples from the same class are assumed to span a linear subspace. Then, any new sample can be uniquely represented by the training samples within given constraint. The conventional methods suggest searching the sparest representations with ℓ 1 -norm (or ℓ 0 ) minimization constraint. However, these methods are computationally expensive due to optimizing nondifferential objective function. To improve the performance while reducing the computational consumption, a simple yet effective classification scheme called kernel linear representation (KLR) is presented. Different from the previous works, KLR limits the feasible set of representations with a much weaker constraint, ℓ 2 -norm minimization. Since, KLR can be solved in closed form there is no need to perform the ℓ 1 -minimization, and hence the calculation burden has been lessened. Meanwhile, the classification accuracy has been improved due to the relaxation of the constraint. Extensive experiments on a real SAR dataset demonstrate that the proposed method outperforms the kernel sparse models as well as the previous works performed on SAR target recognition.


Introduction
Synthetic aperture radar (SAR) has been widely used in many fields, such as environmental monitoring, surveillance, and reconnaissance, due to its ability to work 24-hour a day and its robustness with inclement weather conditions. Automatic target recognition (ATR) is a fundamental topic of SAR image interpretation. It has been studied extensively in the past two decades, yet it is still an open problem. Particularly, it is challenging to perform target recognition in the extended operating conditions, [1][2][3] in which a single operational parameter is significantly different between the images used for training and those used for testing. A typical SAR ATR system identifies the unknown through three sequential stages: 4-6 detection, 7 discrimination, 8 and classification. 9 First, targets as well as various clutter false alarms (e.g., buildings, trees, streetlights, etc.) are detected. Then, natural and manmade clutter false alarms are rejected in the discrimination stage, followed by a classifier to give the identity. Only the final procedure, classification, is studied in this article.
Sparse and redundant signal representations have recently drawn much interest in computer vision, signal and image processing, 10 due to the fact that signals and images of interest can be sparse or compressible with respect to a given dictionary. In Ref. 11, Huang and Aviyente propose the sparse representation (SR) over a redundant basis set for signal classification. The sparse coefficients are obtained by optimizing an objective function that includes the measurement of the reconstruction error and sparsity level. In Ref. 12, Wright et al. generalize the SR technique for robust face recognition. By assuming that a linear subspace can be spanned using the samples belonging to the same class, any new sample can be recovered by a linear combination of the training samples from all classes. To search for the most parsimonious representation, the constraint, l 1 -norm minimization, is imposed on the encoding coefficients. Similarly, an SR technique has been introduced to the ATR framework. In Ref. 13, Chen et al. propose a sparsity-based algorithm for automatic target detection in hyperspectral imagery. The test pixel is linearly reconstructed by the training pixels with a sparsity constraint, and the characteristics of coefficients on reconstruction are used to make a decision. To circumvent pose estimation and the multiple preprocessing procedures, Ref. 14 performs target classification in SAR images with the SR method. They explain the advantage of SR from the perspective of manifolds' learning. In Ref. 9, Patel et al. present a block sparsity-based classification method, in which the inherent block structure of encoding coefficients has been exploited with a group sparsity constraint. Although great performances have been reported in these works, they may be not effective if the dataset is not linearly separable in the original space.
To cover the shortage of the linear SR techniques, some works transform the samples into a new feature space induced by a nonlinear mapping. However, it is infeasible to solve the presented problem directly due to the implicit nonlinear mapping. In Ref. 15 computation in feature space with a dimension reduction strategy. However, these methods are computationally expensive due to optimizing a nondifferential objective function. Moreover, in the feature space, there is actually no need to impose the strong sparsity constraint on the representations. A much weaker constraint, l 2 -minimization, can play the same role but with less computational consumption. 18,19 To improve the performance while alleviating the computational cost, a new classification method named the kernel linear representation (KLR) is presented in this article. All the samples are first mapped into an implicit feature space by a nonlinear mapping. The classification is then implemented with respect to the data in the feature space. In the feature space, it is assumed that the samples belonging to the same class approximately span a linear subspace, thus any new sample can be represented by a linear combination of all the training samples. To uniquely recover the test sample, the conventional methods impose the strong sparsity constraint on the encoding coefficients. The idea in this article is to limit the feasible set of the representation with a much weaker constraint, l 2 -norm minimization. Due to the convexity and differentiability, the presented problem can be solved in closed form. Thus, the complicated procedure to optimize the strong sparsity constraint problem has been circumvented. The unknown is identified according to the characteristics of representations on reconstruction. Compared with the forerunners' works, [15][16][17] the proposed method performs much faster due to the analytic solution, and the accuracy has been improved because of the relaxation of the constraints.
Although the proposed method is of broad interest to object recognition in general, the studies and experimental results in this article are confined to high-resolution SAR target recognition (i.e., MSTAR dataset). The proposed method does not rely on any preprocessing procedure, such as pose estimation, noise reduction, and binary-value. It is robust to small variations in configuration, pose, and depression angles.

Sparse Representation
SR aims to succinctly recover the signal over a given dictionary. It is based on the assumption that a linear subspace can be spanned by samples belonging to the same class. Given sufficient training samples of the i'th class, X i ¼ ½x i;1 ; x i;2 ; : : : ; x i;n i ∈ R m×n i , where m is the data dimension stacked as a column. Any new sample y ∈ R m from the same class will approximately lie in the linear span of the training samples associated with the i'th class, y ¼ x i;1 α i;1 þ x i;2 α i;2 þ : : : þ x i;n i α i;n i , where α i ¼ ½α i;1 ; α i;2 ; : : : ; α i;n i T ∈ R n i is the coefficients. Since the membership of the new sample is initially unknown, a dictionary has been defined by concatenating the n training samples of all k distinct classes, X ¼ ½X 1 ; X 2 ; : : : ; X k ∈ R m×n , where n ¼ P k i¼1 n i . Then, the linear representation of y can be rewritten in terms of all training samples where α ¼ ½α 1 ; α 2 ; : : : ; α k T ∈ R n is the representation vector whose entries are zeros except those associated with the i'th class in theory. Usually, we consider that the underdetermined systems (m < n), and its solution is not unique. The popular methods are to search the sparsest solution by l 0 -norm minimization 12 min α kαk 0 s:t: ky − Xαk 2 ≤ ε; where k · k 0 ∶R n ↦ R counts the nonzero entries and ε is the error tolerance. In Eq. (2), the objective function is to measure the sparsity level, whereas the constrained term is to control the reconstruction error (i.e., fidelity). Due to the nonconvex and nondifferential properties, solving Eq. (2) is nondeterministic polynomial (NP)-hard. Thanks to the development of compressed sensing theories, the solution of the l 0 -norm minimization problem is equal to the one of the l 1 -norm minimization if the solution is sparse enough 20 At present, many algorithms have been presented to solve Eq. (3), as comprehensively reviewed in Ref. 21. Then, the test sample is classified as the class whose training samples can generate the minimum residual min i¼1;: : : k fky − X iαi k 2 2 g:

Kernel Sparse Representation
In the previous works, 9,12,14 although great performance has been reported, they may be not effective if the dataset is not linearly separable originally. To improve the performance, it is natural to cast the data into a new feature space in which the data separability between different classes has been enhanced. Suppose the feature space (denote by I) is induced by the nonlinear mapping, ϕ∶R m ↦ I, and the kernel function, κ∶R m × R m ↦ R, is defined as the inner product in the feature space where h·; ·i is the inner product. Commonly used kernel functions include a polynomial kernel, a sigmoidal kernel, and a Gaussian radial basis function (RBF).

Kernel Linear Representation
The previous works [15][16][17] compute the sparsest representation with an l 1 -norm minimization constraint. The idea here is to limit the feasible set of the representations with a much weaker constraint, l 2 -norm minimization min α kαk 2 s:t: kϕðyÞ − ϕðXÞαk 2 ≤ ε: (8) Which is also equal to the unconstrained problem min α ffðαÞ ¼ kϕðyÞ − ϕðXÞαk 2 þ λkαk 2 g: Obviously, Eq. (9) can be solved in closed form due to the convex and differential objective functions. By unfolding the fidelity term, it can be rewritten as ; where Φ y ¼ ½κðy; x 1 Þ; κðy; x 2 Þ; : : : ; κðy; x n Þ T , and is the kernel Gram matrix. It is apparent that the objective function of Eq. (10) is tractable since it only refers to the matrix operation of finite dimension, Φ y ∈ R n and Φ ∈ R n×n , rather than dealing with a possibly infinitely dimensional dictionary, ϕðXÞ. An important hint of this formulation is that the computation of Φ y and Φ only requires the dot products. It is, therefore, feasible to solve Eq. (10) with Mercer kernel tricks 22 regardless of the nonlinear mapping ϕ. With the kernel trick [Eq. (5)], searching the coefficients over the dictionary, ϕðXÞ, is then converted to compute the representations in terms of the kernel Gram matrix Φ. So, the explicit computation in the feature space has been circumvented. Following the mathematic rules, we conduct the partial differential of the representation By setting the derivative to be zero, ½∂fðαÞ∕∂α ¼ 0, it is easy to obtain the analytic solution where I is the identity matrix whose dimension is the same as kernel Gram matrix. Similarly, the decision is made by the characteristics of the representations on reconstruction min i¼1;: : : ;k where δ i ð·Þ∶R n ↦ R n is the mapping to pick out the coefficients associated with the i'th class.

Validation
The ability to determine whether the input sample is valid or not is crucial for the classifier to work in real-world situations. A typical target recognition system, for example, should reject the civil vehicles, buildings, and trees. In the proposed framework, the validation judgment is in terms of the reconstruction error. That is, the algorithm accepts or rejects a test sample based on how small the minimum residual is. Given a solutionα found by Eq. (12), the residual vector can be built as e ¼ ½e 1 ; e 2 ; : : : ; e k ∈ R k , where e i ¼ kΦ y − Φδ i ðαÞk 2 2 , i ¼ 1; : : : ; k is the residual associated with the i'th class. To measure the quality of the test sample, an index named the normalized minimum residual (NMR) has been defined as For any coefficient vectorα, the smaller the NMRðαÞ is, the better the quality of the test sample is, and hence belief that the test sample is a valid one is higher, vice versa. Given a threshold τ ∈ ð0; 1Þ, a test sample is accepted as valid if NMRðαÞ ≤ τ, and otherwise it is rejected as the outlier. Only the test sample that passes the criterion can be assigned a class label.
The framework of the proposed method has been pictorially summarized in Fig. 1, where it has been divided into three sequential stages. The first stage is devoted to data preparation. All the samples are mapped into the feature space with a nonlinear mapping. In the second stage, the training samples are used to formulate the redundant dictionary to encode an input test sample as a linear combination of them via the l 2 -norm minimization. The last stage contributes to the decision. The identity is assigned to the test sample if it passes the rejection criterion [Eq. (14)], otherwise it is determined to be an outlier.

Experimental Results and Discussions
This section demonstrates the performance of the proposed method on MSTAR database, a collection done using a 10-GHz spotlight SAR sensor with a one-foot resolution. For each target, images are captured at different depression angles over a full 0 to 359 deg range of aspect view. To ensure the accuracy and efficiency, a wide variety of experiments are performed, including configuration variations, pose and depression angle variations, outlier rejection, etc. In all the experiments, the center 80 × 80 pixels patch is used as the input. The cropped images are first subsampled by a factor of ρ, and the subsampled images are then mapped into the feature space. The subsampling factor is chosen from a given interval ρ ¼ f1∕400; 1∕256; 1∕100; 1∕ 64g, which corresponds to sizes 4 × 4, 5 × 5, 8 × 8, and 10 × 10 pixels. Gaussian RBF,

Parameter Setting
In the proposed method, there is a free parameter to be set, the regularizer λ. It is used to balance the fidelity and "sparsity." First, it circumvents the difficulty when the kernel Gram matrix, Φ, is not invertible, so the solution is more stable. Second, it introduces a particular "sparsity" to the representations, and the sparsity is much weaker than the l 1 -norm minimization. To determine λ, five targets, BMP2, BTR70, T72, T62, and BTR60, are employed for several groups of experiments. The number of aspect view images available for different targets is shown in Table 1, and the recognition rates with different parameters are drawn in Fig. 2.
As can be seen from Fig. 2, when λ decreased from 0.08 to 0.006, the accuracy first improves and the subsequently degrades. The peak value happens at λ ¼ 0.02 or 0.03. Though different rates have been obtained, the accuracy varies very slightly. Thus, we can come to the conclusion that the proposed method is insensitive to the regularizer. It is, however, necessary for stable matrix inversion. Hereafter, the parameter will be fixed as λ ¼ 0.02.

Configuration Invariance
This subsection examines the invariance of different algorithms under different configurations. Three basic targets, T72, BMP2, and BTR70, have been utilized. For T72 and BMP2, there are three variants with different serial numbers and small structural modifications, SN_132, SN_812, SN_s7, SN_9563, SN_9566, and SN_c21. Only the standard configurations, SN_132 (T72), SN_9563 (BMP2), and SN_c71 (BTR70), at a 17-deg depression angle are used for training, whereas all configurations at a 15-deg depression angle are used for testing, as detailed in Table 2. The performance obtained by different algorithms is listed in Table 3, where the former item gives the recognition rates, and the latter item shows the run-times in seconds.
As can be seen from Fig. 3, kernel-based methods, KSR, KSRP, KSRDR, and KLR, significantly outperform the linear one, SR. The average improvements of 12.18%, 14.65%, 13.86%, and 19.65% have been obtained by KSR, KSRP, KSRDR, and KLR against SR. This is because the dataset is not linearly separable in the original space, and it can be remedied by mapping the data into a new feature space. The proposed method achieves the best recognition rates, i.e., 0.7728, 0.8593, 0.9406, and 0.9575, and the average improvements of 7.48%, 5.0%, 5.79%, 8.04%, and 4.71% have been obtained against KSR, KSRP, KSRDR, SVM, and KSVM. The good performance can be attributed to two characteristics. First, the high-order structure of data has been exploited with the nonlinear mapping. Second, the discriminative ability of the encoding coefficients has been enhanced due to the relaxation of constraints.  Fig. 2 The recognition rates of kernel linear representation with different parameters.
The computational cost, measured by run-times, is pictorially shown in Fig. 4. It can be observed that the proposed method performs much faster than the linear SR and kernel SR methods, and slower than SVM and KSVM. For the low-dimensional classification (16-D), KLR achieves the highest recognition rate, 0.7728, with a runtime of 4.5 s, and it is 4.4, 7.7, 12.8, and 6.2 times faster than SR, KSR, KSRP, and DSRDR. For the high-dimensional classification (100-D), KLR again obtains the highest recognition rate, 0.9575, with a run-time of 4.5 s, and it is 8, 34, 36, and 6.2 times faster than SR, KSR, KSRP, and DSRDR. This is because the complicated procedure to l 1 -norm minimization has been circumvented.

Comparison with Conventional Approaches to Synthetic Aperture Radar Target Recognition
The past two decades have witnessed great developments in SAR target recognition, and a wide variety of techniques are presented, such as template matching methods, 1 subspace [principal    26 etc. This subsection compares the proposed method with the previous works performed on SAR target recognition. Following the prototype of the above experiment, BMP2, BTR70, and T72 have been utilized. The classification accuracies obtained using different algorithms are shown in Table 4, where TM denotes the template matching method with the templates at 10-deg increments and a mask individualizing the targets.
As can be observed from Table 4, the template matching method achieves a similar performance to subspace (PCA) and transform (DCT) based methods (0.9040 versus 0.9076, 0.9055). All of them outperform another supervised classification technique, LDA. This is because there are only three different classes in the training dataset, which results in very lower dimensional projected data (i.e., 2-D) with LDA. The Fourier domain algorithm outperforms the TM-, PCA-, DCT-, and LDA-based methods. The proposed method, KLR achieves the highest recognition rate, 0.9575, and the improvements of 5.35%, 4.99%, 7.5%, 5.2%, and 4.03% have been obtained compared with TM, PCA, LDA, DCT, and FT. The experimental results are mainly due to the advantage of nonlinear mapping, which maps the data into a feature space where the data separability between different classes has been enhanced. Furthermore, the classification with collaborative representation of all training samples also contributes to the good performance, i.e., KLR recovers the test sample with training samples of all the classes, while the others reconstruct the test sample using several relevant ones.

Outlier Rejection
In the "open world" classification issues, the classifier should be capable of rejecting the unknown objects which do not belong to the training set classes (i.e., outlier). Thus, this subsection considers the rejection of confusers and clutters. Three military vehicles, BMP2 (SN_9563), BTR70 (SN_c21), and T72 (SN_132), are used as the standard targets, whereas D7 (a bulldozer) is specified as the confuser to be rejected. Samples of the standard target and the confuser are illustrated in   Fig. 6. The total number of chip images available for standard targets and the outliers are listed in Table 5. By tuning the threshold τ through a range of values in ð0; 1Þ, a receiver operating characteristic (ROC) curve, which plots the false reject rate against false accept rate for all possible thresholds, can be created. The ROC curve visually reflects the classifier's ability to determine whether a given test sample is in the training dataset. The results of confuser rejection and clutter rejection experiments have been drawn in Figs. 7 and 8, where the performance of four different algorithms, KLR, KSR, KSRP, and KSRDR, have been evaluated. As can be seen from Figs. 7 and 8, it is more difficult to determine the confuser (D7) than the clutter. This is because the confuser D7 is a manmade object. It produces similar scattering centers to three standard targets, as easily observed from Fig. 5. The clutters, however, are mainly composed of buildings, trees, roads, streetlights, etc. Their scattering centers are much more random and dispersive than typical tactical ground targets, as demonstrated in Fig. 6. Moreover, whether in the confuser rejection experiment or in the clutter rejection experiment, the proposed method significantly outperforms all the reference algorithms, KSR, KSRP, and KSRDR.    Fig. 7 Receiver operating characteristic (ROC) curves for confuser rejection.   detailed in Table 6. The recognition rates of different algorithms are listed in Table 7, with the corresponding diagram shown in Fig. 9.

10-Object Recognition
The results conform to the above experiments. Higher accuracies are obtained in highdimensional space (100-D and 64-D) than in low-dimension space (25-D and 16-D). Compared to the linear models (i.e., SR and SVM), better performance has been obtained using kernel models (KSVM, KSR, KSRP, KSRDR, and KLR). This is due to the nonlinear mapping which improves the data separability. As for KLR, the recognition rates of 0.7499, 0.8476, 0.9191, and 0.9428 have been obtained, which outperforms all the baseline methods. Two sides contribute to the experimental results. The improvement in accuracy against KSVM results from the advantage of collaborative representation, whereas the elevation in accuracy than KSR, KSRP, and KSRDR is due to the relaxation of constraints.

Conclusion
Recent years have witnessed a considerable resurgence of interest in sparse signal representation. This popularity comes from the fact that signals in most problems can be well recovered over a small set of basis vectors. However, these methods are computationally expensive due to optimizing the nondifferential objective function. To improve the performance while boosting the speed, a new classification algorithm named KLR is presented in this article, and it is applied to target recognition in high-resolution SAR images. The classification is implemented with respect to the data in a feature space induced by a nonlinear mapping. Since the mapping is of implicit form, it is infeasible to solve the presented problem directly. To produce the unique solution, the conventional methods impose strong sparsity constraint on the representation. Our idea in this article is to limit the feasible set of the representation with a much weaker constraint, l 2 -norm minimization. Thus, the complicated procedure to optimizing sparsity constraint problem has been converted to a simple least-square fitting. Due to the convexity and differentiability, the new problem can be solved in closed form. Therefore, the computational cost has been significantly lessened, and the classification accuracy has been simultaneously improved. Extensive experiments, including configuration invariance, depression angle invariance, outlier rejection, and 10-object recognition, have been carried out on the MSTAR dataset. The experimental results demonstrate that the proposed method performs much faster than various reference algorithms. Moreover, it achieves much higher recognition rates than the kernel sparse models, as well as the conventional approaches to SAR target recognition. The improvement in accuracy against the previous works performed on SAR target recognition is due to the advantage of collaborative representation, whereas the elevation in speed over kernel sparse models results from the relaxation of constraints.
The proposed method refers to matrix inverse for computing the encoding coefficients. Thus, it is difficult to deal with a large-scale classification task due to the bottleneck of realizing a highdimensional matrix inverse. Future attention will be paid to covering the shortage with dictionary learning skills. Another intriguing question for future work is whether the presented framework can be useful for other stages of ATR, for example, automatic target detection and discrimination.