The target classification algorithm community is making a special effort to explicitly treat operating conditions
(OCs) in classifier assessments and performance modeling. This is necessary because humans do not intuitively appreciate what makes classification difficult for computers; it simply seems easy to us. In analyzing OCs, some are more direct or primitive while others are more abstract or integrating. These more abstract or "Derived
OCs" provide an intermediate step between direct OCs and classifier performance. Similar to the target, sensor,
environment partition of OCs, the AFRL COMPASE Center introduces the "Mossing 3" partition of derived OCs
into "Clarity," "Uniqueness," and "Conformity." Clarity is primarily concerned with the relevant information
content available in the sensor data. Uniqueness is about the inherent separability between the types of objects
to be classified (i.e., the library) and between all those types and objects not known to the classifier. Conformity
is about the relationship between the OCs of the test instances and the OCs represented in the library types
or training data. Furthermore, by analyzing derived OCs from multiple perspectives, informative subpartitions
of the Mossing 3 are created. Clarity measures are well developed, particularly as image quality metrics. The
other partitions are less well developed, but relevant work exists and is brought into context. While derived OCs
and the Mossing 3 partition are not a complete solution to performance modeling, they help bring in powerful
existing technologies and should enrich and facilitate dialogue on classifier performance theory and modeling.
The Defense Advanced Research Projects Agency (DARPA) Video Verification of Identity (VIVID) program has
as its goal the development of an unprecedented video tracking capability. This goal is pursued through a philosophy of on-the-fly
target modeling and the use of three distinct modules: a multiple-target tracker, a confirmatory identification
module, and a collateral damage avoidance/moving target detection module. Over the two years of VIVID
Phase I, progress appraisal of the ATR-like confirmatory identification module was provided to DARPA by the
Air Force Research Laboratory Comprehensive Performance Assessment of Sensor Exploitation (COMPASE)
Center through regular evaluations. This document begins with an overview of the VIVID system and its
approach to solving the multiple-target tracking problem. The data collected under VIVID auspices and their use in the evaluation are then described, along with the operating conditions relevant to confirmatory
identification. Finally, the evaluation structure is presented in detail, including metrics, experiment design,
experiment construction techniques, and support tools.
Reliance on Automated Target Recognition (ATR) technology is essential to the future success of Intelligence, Surveillance, and Reconnaissance (ISR) missions. Although benefits may be realized through ATR processing of a single data source, fusion of information across multiple images and multiple sensors promises significant performance gains. A major challenge, as ATR fusion technologies mature, is the establishment of sound methods for evaluating ATR performance in the context of data fusion. The Deputy Under Secretary of Defense for Science and Technology (DUSD/S&T), as part of its ongoing ATR Program, has sponsored an effort to develop and demonstrate methods for evaluating ATR algorithms that utilize multiple data sources, i.e., fusion-based ATR. This paper presents results from this program, focusing on the target detection and cueing aspect of the problem. The first step in assessing target detection performance is to relate the ground truth to the ATR decisions. Once the ATR decisions have been mapped to ground truth, the second step in the evaluation is to characterize ATR performance. A common approach is to vary the confidence threshold of the ATR and compute the Probability of Detection (PD) and the False Alarm Rate (FAR) associated with each threshold. Varying the threshold, therefore, produces an empirical performance curve relating detection performance to false alarms. Various statistical methods have been developed, largely in the medical imaging literature, to model this curve so that statistical inferences are possible. One approach, based on signal detection theory, generalizes the Receiver Operating Characteristic (ROC) curve. Under this approach, the Free-Response Operating Characteristic (FROC) curve models performance for search problems. The FROC model is appropriate when multiple detections are possible and the number of false alarms is unconstrained. The parameterization of the FROC model provides a natural method for characterizing both the operational environment and the ability of the ATR algorithm to detect targets. One parameter of the FROC model indicates the complexity of the clutter by characterizing the propensity for false alarms. The second parameter quantifies the separability between clutter and targets. Thus, the FROC model provides a framework for modeling and predicting ATR performance in multiple environments. This paper presents the FROC model for single sensor data and generalizes the model to handle the fusion case.
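As a concrete illustration of the threshold-sweep step described in this abstract, the following minimal Python sketch tabulates an empirical PD/FAR curve from scored ATR declarations. It assumes the declarations have already been associated with ground truth; the Declaration structure, the per-square-kilometer FAR normalization, and all names are illustrative assumptions rather than the program's actual tooling. Fitting a two-parameter FROC model to such an empirical curve would be a separate step and is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Declaration:
    confidence: float  # ATR confidence score for this declaration
    is_target: bool    # True if the declaration was mapped to a ground-truth target

def empirical_pd_far(decls: List[Declaration], n_truth_targets: int,
                     searched_area_km2: float) -> List[Tuple[float, float, float]]:
    """Sweep the confidence threshold and return (threshold, PD, FAR per km^2)."""
    curve = []
    for thresh in sorted({d.confidence for d in decls}, reverse=True):
        kept = [d for d in decls if d.confidence >= thresh]
        hits = sum(d.is_target for d in kept)
        false_alarms = len(kept) - hits
        pd = hits / n_truth_targets if n_truth_targets else 0.0
        curve.append((thresh, pd, false_alarms / searched_area_km2))
    return curve

# Toy usage: three declarations scored against two ground-truth targets.
decls = [Declaration(0.9, True), Declaration(0.7, False), Declaration(0.4, True)]
print(empirical_pd_far(decls, n_truth_targets=2, searched_area_km2=1.0))
```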
The Deputy Under Secretary of Defense for Science and Technology (DUSD/S&T), as part of its ongoing ATR Program, has sponsored an effort to develop and demonstrate methods for evaluating ATR algorithms that utilize multiple data sources, i.e., fusion-based ATR. The AFRL COMPASE Center has formed a strong ATR evaluation team, and this paper presents results from this program, focusing on the human-in-the-loop case, i.e., assisted image exploitation. Reliance on Automated Target Recognition (ATR) technology is essential to the future success of Intelligence, Surveillance, and Reconnaissance (ISR) missions. Often, ATR technology is designed to aid the analyst, but the final decision rests with the human. Traditionally, evaluation of ATR systems has focused mainly on the performance of the algorithm. Assessing the benefits of ATR assistance for the user raises interesting methodological challenges. We will review the critical issues associated with evaluations of human-in-the-loop ATR systems and present a methodology for conducting these evaluations. Experimental design issues addressed in this discussion include training, learning effects, and human factors issues. The evaluation process becomes increasingly complex when data fusion is introduced. Even in the absence of ATR assistance, the simultaneous exploitation of multiple frames of co-registered imagery is not well understood. We will explore how the methodology developed for exploitation of a single source of data can be extended to the fusion setting.
Commercial availability of very high-resolution synthetic aperture radar (SAR) imagery will enable development of automatic target recognition (ATR) algorithms to exploit its rich information content. This availability also permits exploration of both empirical and first-principles approaches for predicting ATR performance. This paper describes a recent collection of high-resolution SAR imagery. It details the operating conditions represented by the data and provides recommended experiments designed to challenge ATR algorithms and performance prediction. This set of information, along with the imagery, is contained in a Problem Set that will be made available to the community. The imagery is from a Deputy Under Secretary of Defense (DUSD) for Science and Technology (S&T) sponsored collection using the Sandia National Laboratory and General Atomics Lynx sensor. The Lynx is now available as a commercial off-the-shelf (COTS) sensor. It was designed for use in medium-altitude UAVs and manned platforms. It operates at Ku-band frequency in stripmap, spotlight, and ground moving target indicator modes. Imagery in this collection was collected at 4-inch resolution and was then also reprocessed to 1-foot resolution. The collection included several military vehicles with significant variation in target, sensor, and background conditions. Defined experiments in the Problem Set present ATR algorithm development challenges by defining development (training) sets with limited representation of operating conditions and test sets that explore the algorithm's ability to extend to more complex operating conditions. These challenges are critical to military employment of ATR because the real world contains much more variability than it will be possible to explicitly address in an algorithm. For example, neither the storage nor the search through an exhaustive library of templates is achievable for any realistic application. Thus, advanced developments that allow robust performance in denied conditions will accelerate the transition of ATR to the field. Additional experiments in the Problem Set present challenges in ATR performance prediction. Here, the development imagery provides empirical data to support development of prediction approaches. Test imagery provides an opportunity to validate the prediction technique's ability to, for example, interpolate or extrapolate performance.
The AFRL COMPASE Center has developed and applied a disciplined methodology for the evaluation of recognition systems. This paper explores an element of that methodology related to the confusion matrix as a tabulation of experiment outcomes and its corresponding summary performance measures. To this end, the paper introduces terminology and the confusion matrix structure for experiment results. It provides several examples, drawn from current Air Force programs, of summary performance measures and their relationship to the confusion matrix. Finally, it considers the advantages and disadvantages of these summary performance measures and points to effective strategies for selecting such measures.
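To make the tabulation concrete, here is a minimal, hypothetical Python sketch of a confusion matrix built from (true type, declared type) experiment outcomes, together with one common summary measure. The function names and the "unknown" declaration label are assumptions for illustration, not the COMPASE methodology's actual definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def confusion_matrix(outcomes: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Tabulate (true_label, declared_label) experiment outcomes into a nested table."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for true_label, declared_label in outcomes:
        table[true_label][declared_label] += 1
    return table

def probability_correct_classification(table) -> float:
    """One common summary measure: correct declarations over all declarations."""
    correct = sum(table[t].get(t, 0) for t in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total if total else 0.0

# Example: two target types plus an 'unknown' (rejection) declaration option.
outcomes = [("T72", "T72"), ("T72", "unknown"), ("BMP2", "T72"), ("BMP2", "BMP2")]
cm = confusion_matrix(outcomes)
print(probability_correct_classification(cm))  # 0.5
```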
Reliance on Automated Target Recognition (ATR) technology is essential to the future success of Intelligence, Surveillance, and Reconnaissance (ISR) missions. Although benefits may be realized through ATR processing of a single data source, fusion of information across multiple images and multiple sensors promises significant performance gains. A major challenge, as ATR fusion technologies mature, will be the establishment of sound methods for evaluating ATR performance in the context of data fusion. This paper explores the issues associated with evaluations of ATR algorithms that exploit data fusion. Three major areas of concern are examined as we develop approaches for addressing the fusion-based evaluation problem. (1) Characterization of the testing problem: the concept of operating conditions, which characterize the test problem, requires some generalization in the fusion setting. For example, conditions such as articulation or model variant, which are of concern for synthetic aperture radar (SAR) data, may be of minor importance for hyperspectral imaging (HSI) methods. Conversely, solar illumination conditions, which have no effect on the SAR signature, will be critical for spectral-based target recognition. In addition, the fusion process may introduce new operating conditions, such as registration accuracy. (2) Developing image truth and scoring rules: the introduction of multiple data sources raises questions about what constitutes successful target detection, and ground truth must be associated with multiple data sources to score performance. (3) Performance metrics: new performance metrics that go beyond simple detection, identification, and false alarm rates are needed to characterize performance in the context of image fusion. In particular, algorithm developers would benefit from an understanding of the salient features from each data source and how these features interact to produce the observed system performance.
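As one hedged illustration of the scoring-rule issue raised above, the sketch below associates ATR detections with ground-truth targets in a shared, co-registered ground coordinate frame using a simple nearest-neighbor rule within a scoring radius. The 10 m radius, coordinate convention, and function names are assumptions; real fusion scoring rules would also need to handle per-sensor geometry and ambiguous matches.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (easting_m, northing_m) in a shared ground frame

def associate_detections(detections: List[Point], truth: List[Point],
                         max_dist_m: float = 10.0) -> List[Optional[int]]:
    """For each detection, return the index of the matched truth target or None.

    Each truth target may be claimed at most once (greedy match in detection
    order), so extra detections on the same target score as false alarms.
    """
    claimed: set = set()
    matches: List[Optional[int]] = []
    for det in detections:
        best, best_d = None, max_dist_m
        for idx, t in enumerate(truth):
            if idx in claimed:
                continue
            d = hypot(det[0] - t[0], det[1] - t[1])
            if d <= best_d:
                best, best_d = idx, d
        if best is not None:
            claimed.add(best)
        matches.append(best)
    return matches
```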
In November of 2000, the Deputy Under Secretary of Defense for Science and Technology Sensor Systems (DUSD (S&T/SS)) chartered the ATR Working Group (ATRWG) to develop guidelines for sanctioned Problem Sets. Such Problem Sets are intended for development and test of ATR algorithms and contain comprehensive documentation of the data in them. A Problem Set provides a consistent basis to examine ATR performance and growth. Problem Sets will, in general, serve multiple purposes. First, they will enable informed decisions by government agencies sponsoring ATR development and transition. Problem Sets standardize the testing and evaluation process, resulting in consistent assessment of ATR performance. Second, they will measure and guide ATR development progress within this standardized framework. Finally, they quantify the state of the art for the community. Problem Sets provide clearly defined operating condition coverage, which encourages ATR developers to consider these critical challenges and allows evaluators to assess over them. Thus, the widely distributed development and self-test portions, along with a disciplined methodology documented within the Problem Set, permit ATR developers to address critical issues and describe their accomplishments, while the sequestered portion permits government assessment of the state of the art and of transition readiness. This paper discusses the elements of an ATR Problem Set as a package of data and information that presents a standardized ATR challenge relevant to one or more scenarios. The package includes training and test data containing targets and clutter, truth information, required experiments, and a standardized analytical methodology to assess performance.
Early in almost every engineering project, a decision must be made about tools: should I buy off-the-shelf tools, or should I develop my own? Either choice can involve significant cost and risk. Off-the-shelf tools may be readily available, but purchasing them and maintaining licenses can be expensive, and they may not be flexible enough to satisfy all project requirements. On the other hand, developing new tools permits great flexibility, but it can be time- (and budget-) consuming, and the end product still may not work as intended. Open source software has the advantages of both approaches without many of the pitfalls. This paper examines the concept of open source software, including its history, unique culture, and informal yet closely followed conventions. These characteristics influence the quality and quantity of software available, and ultimately its suitability for serious ATR development work. We give an example where Python, an open source scripting language, and OpenEV, a viewing and analysis tool for geospatial data, have been incorporated into ATR performance evaluation projects. While this case highlights the successful use of open source tools, we also offer important insight into risks associated with this approach.
MSTAR is a SAR ATR exploratory development effort and has devoted significant resources to regular independent evaluations. This paper will review the current state of the MSTAR evaluation methodology. The MSTAR evaluations have helped bring into focus a number of issues related to SAR ATR evaluation (and often ATR evaluation in general). The principles from MSTAR's three years of evaluations are explained and evaluation specifics, from the selection of test conditions and figures-of-merit to the development of evaluation tools, are reported. MSTAR now has a more mature understanding of the critical aspects of independence in evaluation and of the general relationship between evaluation and the program's goals and the systems engineering necessary to meet those goals. MSTAR has helped to develop general concepts, such as assessing ATR extensibility and scalability. Other specific contributions to evaluation methods, such as nuances in figure-of-merit definitions, are also detailed. In summary, this paper describes the MSTAR framework for the design, execution, and interpretation of SAR ATR evaluations.
The recent public release of high resolution Synthetic Aperture Radar (SAR) data collected by the DARPA/AFRL Moving and Stationary Target Acquisition and Recognition (MSTAR) program has provided a unique opportunity to promote and assess progress in SAR ATR algorithm development. This paper will suggest general principles to follow and report on a specific ATR performance experiment using these principles and this data. The principles and experiments are motivated by AFRL experience with the evaluation of the MSTAR ATR.
Testing a SAR Automatic Target Recognition (ATR) algorithm at or very near its training conditions often yields near-perfect results, as we commonly see in the literature. This paper describes a series of experiments near and not so near to ATR algorithm training conditions. Experiments are set up to isolate individual Extended Operating Conditions (EOCs), and performance is reported at these points. Additional experiments are set up to isolate specific combinations of EOCs, and the SAR ATR algorithm's performance is measured here also. The experiments presented here are a by-product of a DARPA/AFRL Moving and Stationary Target Acquisition and Recognition (MSTAR) program evaluation conducted in November of 1997. Although the tests conducted here are in the domain of EOCs, these tests do not encompass the 'real world' (i.e., what you might see on the battlefield) problem. In addition to performance results, this paper describes an evaluation methodology, including the Extended Operating Condition concept as well as the data, algorithm, and figures of merit. In summary, this paper highlights the sensitivity that a baseline Mean Squared Error ATR algorithm has to various operating conditions both near and varying degrees away from the training conditions.
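For readers unfamiliar with the baseline, the following minimal Python sketch shows the general form of a mean-squared-error (MSE) template classifier of the kind referred to above: a test chip is compared against reference templates for each library type and assigned to the type with the lowest MSE. Array sizes, normalization, and names are illustrative assumptions, not the MSTAR baseline's actual implementation.

```python
import numpy as np

def mse(chip: np.ndarray, template: np.ndarray) -> float:
    """Mean squared error between a test chip and a reference template."""
    return float(np.mean((chip.astype(float) - template.astype(float)) ** 2))

def classify_mse(chip: np.ndarray, library: dict) -> tuple:
    """library maps target type -> list of templates; returns (best_type, best_score)."""
    best_type, best_score = None, np.inf
    for target_type, templates in library.items():
        for template in templates:
            score = mse(chip, template)
            if score < best_score:
                best_type, best_score = target_type, score
    return best_type, best_score

# Toy usage with random 64x64 arrays standing in for SAR image chips.
rng = np.random.default_rng(0)
library = {"T72": [rng.random((64, 64))], "BMP2": [rng.random((64, 64))]}
chip = library["T72"][0] + 0.05 * rng.random((64, 64))
print(classify_mse(chip, library)[0])  # expected: "T72"
```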
Acoustic sensors can be used to detect, track, and identify non-line-of-sight targets passively. Attempts to alter acoustic emissions often result in an undesirable performance degradation. This research project investigates the use of neural networks for differentiating between features extracted from the acoustic signatures of sources. Acoustic data were filtered and digitized using a commercially available analog-to-digital converter. The digital data were transformed to the frequency domain for additional processing using the FFT. Narrowband peak detection algorithms were incorporated to select peaks above a user-defined SNR. These peaks were then used to generate a set of robust features which relate specifically to target components in varying background conditions. The features were then used as input to a backpropagation neural network. A K-means unsupervised clustering algorithm was used to determine the natural clustering of the observations. Comparisons were made between a feature set consisting of the normalized amplitudes of the first 250 frequency bins of the power spectrum and a set of 11 harmonically related features. Initial results indicate that even though some different target types had a tendency to group in the same clusters, the neural network was able to differentiate the targets. Successful identification of acoustic sources under varying operational conditions with high confidence levels was achieved.
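A hedged Python sketch of the front-end processing chain described above (FFT of a windowed frame, narrowband peak detection above a user-defined SNR, and a small harmonically related feature vector) is given below. The frame length, SNR threshold, and the exact harmonic feature definition are assumptions rather than the study's parameters, and the neural network and K-means stages are omitted.

```python
import numpy as np

def power_spectrum(frame: np.ndarray, fs: float):
    """Return (frequencies_hz, power) for one windowed frame of acoustic data."""
    windowed = frame * np.hanning(len(frame))
    spec = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return freqs, np.abs(spec) ** 2

def narrowband_peaks(freqs, power, snr_db: float = 10.0):
    """Pick local maxima whose level exceeds the median noise floor by snr_db."""
    noise_floor = np.median(power)
    threshold = noise_floor * 10 ** (snr_db / 10.0)
    peak_idx = [i for i in range(1, len(power) - 1)
                if power[i] > threshold
                and power[i] >= power[i - 1] and power[i] >= power[i + 1]]
    return [(freqs[i], power[i]) for i in peak_idx]

def harmonic_features(peaks, fundamental_hz: float, n_harmonics: int = 11):
    """Normalized peak amplitudes sampled at the first n_harmonics of a fundamental."""
    feats = np.zeros(n_harmonics)
    for k in range(1, n_harmonics + 1):
        target = k * fundamental_hz
        candidates = [p for p in peaks if abs(p[0] - target) < 0.5 * fundamental_hz]
        if candidates:
            feats[k - 1] = max(c[1] for c in candidates)
    total = feats.sum()
    return feats / total if total > 0 else feats
```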