Nearest neighbor classifiers are among the most common techniques for
classification and automatic target recognition (ATR) applications.
Hastie and Tibshirani propose a
discriminant adaptive nearest neighbor (DANN) rule for computing a
distance metric locally so that posterior probabilities tend to be
homogeneous in the modified neighborhoods. The idea is to elongate the
neighborhood along the direction parallel to the decision boundary
between two classes and to constrict it along the direction
perpendicular to that boundary. DANN
morphs a neighborhood in a linear fashion. In this paper, we extend
it to the nonlinear case using the kernel trick. We demonstrate the
efficacy of our kernel DANN in the context of ATR applications using a
number of data sets.
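As a rough illustration of the linear DANN step that the kernel version builds on, the sketch below computes a local metric at a query point from its nearest neighbors. It assumes the commonly cited form Sigma = W^{-1/2} (W^{-1/2} B W^{-1/2} + eps I) W^{-1/2}, where W and B are the local within-class and between-class scatter matrices. The function name `dann_metric`, the uniform neighbor weights, and the defaults for `n_neighbors` and `eps` are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def dann_metric(X, y, x0, n_neighbors=50, eps=1.0):
    """Local DANN-style metric at query x0 (sketch, uniform neighbor weights)."""
    # Pick the n_neighbors training points closest to x0 in Euclidean distance.
    d2 = np.sum((X - x0) ** 2, axis=1)
    idx = np.argsort(d2)[:n_neighbors]
    Xn, yn = X[idx], y[idx]
    n, p = Xn.shape
    overall_mean = Xn.mean(axis=0)

    W = np.zeros((p, p))  # pooled within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    for c in np.unique(yn):
        Xc = Xn[yn == c]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        W += centered.T @ centered / n
        dm = (class_mean - overall_mean)[:, None]
        B += (len(Xc) / n) * (dm @ dm.T)

    # Small ridge (an assumption) keeps W invertible in degenerate neighborhoods.
    W += 1e-8 * np.eye(p)
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

    B_star = W_inv_sqrt @ B @ W_inv_sqrt
    return W_inv_sqrt @ (B_star + eps * np.eye(p)) @ W_inv_sqrt

def dann_distance(x, x0, Sigma):
    """Squared distance between x and x0 under the local metric Sigma."""
    diff = x - x0
    return float(diff @ Sigma @ diff)
```

With two classes separated along the first coordinate, the resulting metric weights that coordinate more heavily, so the neighborhood shrinks across the boundary and stretches along it. The kernel extension described above would carry out an analogous computation in a feature space, which this sketch does not attempt.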
Many classifiers have been proposed for ATR applications. A classifier is built from labeled training data and then applied to predict the label of a new test point. If there is enough training data, and the test points are drawn from the same distribution as the training data (the i.i.d. assumption), then many classifiers perform quite well. In reality, however, there is never enough training data, or limited computational resources allow us to use only part of it. Likewise, the distribution of new test points might differ from that of the training data, so that the training data is not representative of the test data. In this paper, we empirically compare several classifiers on IR imagery: support vector machines, regularized least squares classifiers, C4.4, C4.5, random decision trees, bagged C4.4, and bagged C4.5. We repeatedly reduce the training data by half, making it less representative of the test data each time, and evaluate the resulting classifiers on the test data. This allows us to assess the robustness of classifiers against a varying knowledge base. A robust classifier is one whose accuracy is the least sensitive to changes in the training data. Our results show that ensemble methods (random decision trees, bagged C4.4, and bagged C4.5) outlast single classifiers as the training data size decreases.
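The halving protocol can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for the classifiers compared in the paper, and the names `halving_curve`, `fit_centroids`, and `predict_centroids`, as well as the number of rounds, are hypothetical choices for the example.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in classifier (an assumption): one mean vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroids(model, X):
    """Assign each row of X to the class with the nearest centroid."""
    classes, centroids = model
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

def halving_curve(X_train, y_train, X_test, y_test, fit, predict,
                  n_rounds=4, seed=0):
    """Train on nested subsets, halving the size each round; test set is fixed."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_train))  # fixed order gives nested subsets
    n = len(X_train)
    results = []
    for _ in range(n_rounds):
        if n < 2:
            break
        idx = order[:n]
        model = fit(X_train[idx], y_train[idx])
        accuracy = float(np.mean(predict(model, X_test) == y_test))
        results.append((n, accuracy))
        n //= 2
    return results
```

Running `halving_curve` once per classifier and comparing how quickly accuracy falls off across rounds gives one simple view of robustness in the sense used above: the flatter the curve, the less sensitive the classifier is to the shrinking training set.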
In ATR applications, each feature is a convolution of an image with a filter. It is important to use the most discriminant features to produce compact representations. We propose two novel subspace methods for dimension reduction to address limitations associated with the Fukunaga-Koontz Transform (FKT). The first method, Scatter-FKT, assumes that the target is more homogeneous, while clutter can be anything other than the target and can appear anywhere. Thus, instead of estimating a clutter covariance matrix, Scatter-FKT computes a clutter scatter matrix that measures the spread of clutter from the target mean. We choose dimensions along which the difference in variation between target and clutter is most pronounced. When the target follows a Gaussian distribution, Scatter-FKT can be viewed as a generalization of FKT. The second method, Optimal Bayesian Subspace (OBS), is derived from the optimal Bayesian classifier. It selects dimensions such that the minimum Bayes error rate can be achieved. When both target and clutter follow Gaussian distributions, OBS computes optimal subspace representations. We compare our methods against FKT using character image data as well as IR data.
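To make the subspace idea concrete, the sketch below implements the standard FKT construction: whiten the sum of the two matrices, then take the eigenvectors of the whitened target matrix, whose eigenvalues lam pair with 1 - lam for the other class. Feeding a clutter scatter matrix about the target mean into the same construction is our reading of the Scatter-FKT description above and should be treated as an assumption, as should the helper names `fkt_basis`, `clutter_scatter`, and `select_dimensions`.

```python
import numpy as np

def fkt_basis(R1, R2, tol=1e-10):
    """Shared basis V with V.T @ R1 @ V = diag(lam), V.T @ R2 @ V = diag(1 - lam)."""
    vals, vecs = np.linalg.eigh(R1 + R2)
    keep = vals > tol * vals.max()            # drop numerically null directions
    P = vecs[:, keep] / np.sqrt(vals[keep])   # whitening: P.T @ (R1 + R2) @ P = I
    lam, U = np.linalg.eigh(P.T @ R1 @ P)
    return lam, P @ U

def clutter_scatter(clutter, target_mean):
    """Spread of clutter samples measured from the target mean."""
    diff = clutter - target_mean
    return diff.T @ diff / len(clutter)

def select_dimensions(lam, k):
    """Indices of the k directions where the two classes differ most in variation."""
    # lam near 1 favors the first class, lam near 0 the second; 0.5 is uninformative.
    return np.argsort(-np.abs(lam - 0.5))[:k]
```

A typical use would estimate `R1` as the target covariance, build `R2` with `clutter_scatter`, and project data onto the columns of the basis picked by `select_dimensions`. OBS is not sketched here because the abstract does not give enough detail to reconstruct its selection criterion.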