The interpretability of an image indicates the potential intelligence value of the data. Historically, the National Imagery Interpretability Rating Scale (NIIRS) has been the standard for quantifying the intelligence potential based on image analysis by human observers. Empirical studies have demonstrated that spatial resolution is the dominant predictor of the NIIRS level of an image. Today, the value of imagery is no longer simply determined by spatial resolution, since additional factors such as spectral diversity and temporal sampling are significant. Furthermore, analyses are performed by machines as well as humans. Consequently, NIIRS no longer accurately quantifies potential intelligence value for an image or set of images. We are exploring new measures of information potential based on mutual information. Our research suggests that new measures of image “quality” based on information theory can provide meaningful standards that go beyond NIIRS. In our approach, mutual information provides an objective method for quantifying divergence across objects and activities in an image. This paper presents the rationale for our approach, the technical description, and the results of early experimentation to explore the feasibility of establishing an information-theoretic standard for quantifying the intelligence potential of an image.
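For readers unfamiliar with the measure, the sketch below estimates mutual information, in bits, from paired discrete samples, such as quantized pixel values taken from two co-registered images. It is a simple plug-in (histogram) estimate. The quantization, the choice of paired variables, and the function name `mutual_information` are illustrative assumptions, since the abstract does not say how the authors compute the quantity. Mutual information equals the Kullback-Leibler divergence between the joint distribution and the product of the marginals, which is one way to read its use here as a measure of divergence.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    x, y: equal-length sequences of discrete values (for example, quantized
    intensities from two co-registered images, flattened).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # joint histogram
    px = Counter(x)             # marginal histogram of X
    py = Counter(y)             # marginal histogram of Y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Identical binary sequences with balanced counts share one bit; statistically independent ones share none.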
Maritime situational awareness depends on accurate knowledge of the locations, types, and activities of ocean-bound vessels. Such data can be gathered by analyzing the motion patterns of vessel tracks collected using coastal radar, visual identification, and Automatic Identification System (AIS) reports. We have developed a technique for predicting the types of vessels from abstract representations of their motion patterns. Our approach involves constructing multiple state sequences which represent activities syntactically. From these sequences, we generate multi-state transition matrices, which are the central feature used to train a support-vector machine classifier. Applying this technique to historical AIS data, our model successfully predicts vessel type even in cases where vessels do not follow known routes. Using only location information as the base feature for our model, we circumvent classification issues that arise from vessels' non-compliance with AIS regulations as well as the inability to visually identify vessels.
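The following sketch shows one way a track could be reduced to state sequences and first-order transition matrices whose flattened entries form a fixed-length feature vector. The speed thresholds, the 15-degree turn threshold, and all function names are hypothetical choices for illustration, not the authors' state definitions. In practice the resulting vectors would be passed to a support-vector classifier, for example scikit-learn's `SVC`.

```python
def transition_matrix(states, n_states):
    """Row-normalized first-order transition matrix from a state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left keeps an all-zero row instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def speed_states(speeds_knots, edges=(1.0, 8.0, 15.0)):
    """Hypothetical speed states: 0 stopped, 1 slow, 2 cruising, 3 fast."""
    return [sum(s >= e for e in edges) for s in speeds_knots]


def turn_states(headings_deg, threshold_deg=15.0):
    """Hypothetical turn states from successive headings: 0 straight, 1 turning."""
    changes = [abs((h2 - h1 + 180) % 360 - 180)  # wrap-aware heading change
               for h1, h2 in zip(headings_deg, headings_deg[1:])]
    return [1 if c > threshold_deg else 0 for c in changes]


def track_features(speeds_knots, headings_deg):
    """Concatenate the flattened speed (4x4) and turn (2x2) transition matrices."""
    features = []
    for matrix in (transition_matrix(speed_states(speeds_knots), 4),
                   transition_matrix(turn_states(headings_deg), 2)):
        features.extend(value for row in matrix for value in row)
    return features
```

Because only positions (and quantities derived from them) are used, the same features can be computed whether the track came from radar, visual sightings, or AIS.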
The Video National Imagery Interpretability Rating Scale (VNIIRS) is a useful standard for quantifying the interpretability of motion imagery. Accurate automated assessment of VNIIRS would benefit operators by characterizing the potential utility of a video stream. For still, visible-light imagery, the general image quality equation (GIQE) provides a standard model to automatically estimate the NIIRS of an image from sensor parameters, namely the ground sample distance (GSD), the relative edge response (RER), and the signal-to-noise ratio (SNR). Typically, these parameters are associated with a specific sensor, and the metadata correspond to a specific image acquisition. For many tactical video sensors, however, these sensor metadata are not available, and the parameters must be estimated from information available in the imagery. We present methods for estimating the RER and SNR through analysis of the scene, i.e., the raw pixel data. By estimating the RER and SNR directly from the video data, we can compute accurate VNIIRS estimates for the video. We demonstrate the method on a set of video data.
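The abstract does not restate the GIQE itself. As a reference point, the sketch below codes the published GIQE version 4 form (GSD in inches), together with two deliberately simple scene-based estimators: SNR as mean over standard deviation in a visually uniform patch, and RER as the difference of a normalized edge spread function (ESF) half a pixel either side of the edge. These estimators, the default H = G = 1.0 (no sharpening assumed), and the function names are illustrative assumptions, not the methods the paper presents, and the sketch does not model any video-specific terms a VNIIRS predictor might add.

```python
import math
import statistics


def giqe4_niirs(gsd_inches, rer, snr, h=1.0, g=1.0):
    """GIQE version 4 NIIRS prediction.

    gsd_inches: geometric-mean ground sample distance, in inches.
    rer: geometric-mean relative edge response.
    snr: signal-to-noise ratio.
    h, g: edge overshoot and noise gain; the 1.0 defaults are an assumption
          corresponding to imagery without MTF compensation.
    """
    if rer >= 0.9:
        a, b = 3.32, 1.559
    else:
        a, b = 3.16, 2.817
    return (10.251 - a * math.log10(gsd_inches) + b * math.log10(rer)
            - 0.656 * h - 0.344 * g / snr)


def snr_from_flat_patch(pixels):
    """Crude scene-based SNR: mean over sample std of a visually uniform region."""
    return statistics.fmean(pixels) / statistics.stdev(pixels)


def rer_from_esf(esf, center):
    """RER as ESF(center + 0.5) - ESF(center - 0.5) on an ESF normalized to [0, 1].

    esf: edge profile sampled at integer pixel positions.
    center: (fractional) edge position; it must lie at least 0.5 px plus one
    sample inside the profile so the linear interpolation stays in range.
    """
    lo, hi = min(esf), max(esf)
    norm = [(v - lo) / (hi - lo) for v in esf]

    def interp(x):
        i = int(math.floor(x))
        frac = x - i
        return norm[i] * (1 - frac) + norm[i + 1] * frac

    return interp(center + 0.5) - interp(center - 0.5)
```

In use, the two estimates would be fed to `giqe4_niirs` together with a GSD value, which for tactical video without metadata would itself have to come from some other source.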
KEYWORDS: Signal to noise ratio, Statistical analysis, Detection and tracking algorithms, Sensors, Image analysis, Image quality, Spatial resolution, Statistical modeling, Performance modeling, RGB color model
Analysis and measurement of perceived image quality has been an active area of research for decades. Although physical measurements of image parameters often correlate with human perception, user-centric approaches have focused on the observer’s ability to perform specific tasks with the imagery. This task-based orientation has led to the development of the Johnson Criteria and the National Imagery Interpretability Ratings Scale as standards for quantifying the interpretability of an image. A substantial literature points to three primary factors affecting human perception of image interpretability: spatial resolution, image sharpness as measured by the relative edge response, and perceived noise as measured by the signal-to-noise ratio. For maritime and ocean surveillance applications, however, these factors do not fully represent the characteristics of the imagery. Images of the ocean surface can span a wide range of spatial resolutions, and fog, sun glint, and color distortion can degrade image interpretability. In this paper, we explore both the general factors and the domain-specific concerns for quantifying image interpretability. In particular, we propose new metrics to assess the dynamic range and color balance of maritime surveillance imagery, and we illustrate their performance on relevant image data.
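The abstract does not define the proposed dynamic-range and color-balance metrics. The sketch below shows two simple stand-ins of the same kind, offered only as an illustration: the fraction of the intensity scale spanned by the 1st to 99th percentiles of luminance (fog would be expected to compress this span), and a gray-world color-imbalance score. The percentile limits, the Rec. 601 luma weights, and the function names are assumptions.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[min(max(rank - 1, 0), len(ordered) - 1)]


def luminance(rgb_pixels):
    """Rec. 601 luma from (R, G, B) tuples."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]


def dynamic_range(rgb_pixels, full_scale=255.0, low=1, high=99):
    """Fraction of the available scale spanned by the low..high percentile band."""
    luma = luminance(rgb_pixels)
    return (percentile(luma, high) - percentile(luma, low)) / full_scale


def color_imbalance(rgb_pixels):
    """Gray-world score: largest relative departure of a channel mean from the
    mean of the three channel means (0.0 means perfectly balanced)."""
    n = len(rgb_pixels)
    means = [sum(p[c] for p in rgb_pixels) / n for c in range(3)]
    gray = sum(means) / 3
    return max(abs(m - gray) for m in means) / gray if gray else 0.0
```

Using percentiles rather than the raw minimum and maximum keeps a few saturated glint pixels from inflating the dynamic-range score.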
The use of LIDAR (Light Imaging, Detection and Ranging) data for detailed terrain mapping and object recognition is becoming increasingly common. While LIDAR renderings are expressive, there is a need for a comprehensive performance metric that captures the quality of the LIDAR image. A metric or scale for quantifying the interpretability of LIDAR point clouds would be extremely valuable to support image chain optimization, sensor design, tasking and collection management, and other operational needs. For many imaging modalities, including visible electro-optical (EO) imagery, thermal infrared, and synthetic aperture radar, the National Imagery Interpretability Ratings Scale (NIIRS) has been a useful standard. In this paper, we explore methods for developing a comparable metric for LIDAR. The approach leverages the general image quality equation (IQE) and constructs a LIDAR quality metric based on the empirical properties of the point cloud data. We present the rationale for and construction of the metric, illustrating its properties with both measured and synthetic data.
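One empirical property a LIDAR quality metric could draw on is sampling density, which plays a role loosely analogous to GSD in the image-based equations. The sketch below computes points per unit area over the horizontal bounding box and the mean horizontal nearest-neighbor spacing. The abstract does not say whether the paper uses these particular quantities; the functions are illustrative, and the brute-force neighbor search suits only small clouds.

```python
import math


def point_density(points):
    """Points per unit area over the horizontal (x, y) bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if area == 0:
        raise ValueError("points are collinear or coincident in x-y")
    return len(points) / area


def mean_nn_spacing(points):
    """Mean horizontal nearest-neighbor distance; brute force, O(n^2)."""
    total = 0.0
    for i, (x1, y1, _z1) in enumerate(points):
        total += min(math.hypot(x1 - x2, y1 - y2)
                     for j, (x2, y2, _z2) in enumerate(points) if j != i)
    return total / len(points)
```

For large clouds, a spatial index (for example a k-d tree) would replace the quadratic neighbor search.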
Transmission and analysis of imagery for law enforcement and military missions are often constrained by the capacity of available communications channels. Nevertheless, success in operational missions requires acquiring and analyzing imagery that satisfies specific interpretability requirements. By expressing these requirements in terms of the National Imagery Interpretability Ratings Scale (NIIRS), we have developed a method for predicting the NIIRS loss associated with various methods and levels of image compression. Our method, the Compression Degradation Image Function Index (CoDIFI) framework, automatically predicts the NIIRS degradation associated with a specific compression method and compression level. In this paper, we first review NIIRS and methods for predicting it, then present the CoDIFI framework, with emphasis on the results of the empirical validation experiments. In operational settings, our goal is to use CoDIFI to ensure that imagery delivered to users meets the required NIIRS level while making efficient use of scarce data transmission capacity.
Target tracking derived from motion imagery enables automated activity analysis. In this paper, we develop methods for automatically exploiting the track data to detect and recognize activities, develop models of normal behavior, and detect departure from normalcy. We have developed methods for representing activities through syntactic analysis of the track data, by “tokenizing” the track, i.e. converting the kinematic information into strings of symbols amenable to further analysis. The syntactic analysis of target tracks is the foundation for constructing an expandable “dictionary of activities.” Through unsupervised learning on the syntactic representations, we discover the canonical activities in a corpus of motion imagery data. The probability distribution of the learned activities is the “dictionary”. Newly acquired track data is compared to the dictionary to flag atypical behaviors as departures from normalcy. We demonstrate the methods with relevant data.
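The sketch below illustrates the tokenizing idea on a 2-D track: each step becomes a symbol (H nearly stationary, S straight, L left turn, R right turn), a "dictionary" is estimated as the probability distribution of token trigrams in training tracks, and a new track is scored by the mean negative log-probability of its trigrams. The symbol set, the thresholds, and the use of trigram counts in place of the paper's unsupervised learning step are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(track, turn_deg=20.0, still=1.0):
    """Map successive (x, y) positions to symbols; thresholds are hypothetical."""
    tokens = []
    prev_heading = None
    for (x1, y1), (x2, y2) in zip(track, track[1:]):
        dx, dy = x2 - x1, y2 - y1
        if math.hypot(dx, dy) < still:
            tokens.append("H")  # nearly stationary; heading is not updated
            continue
        heading = math.degrees(math.atan2(dy, dx))
        if prev_heading is None:
            tokens.append("S")
        else:
            turn = (heading - prev_heading + 180) % 360 - 180  # positive = left
            tokens.append("L" if turn > turn_deg else "R" if turn < -turn_deg else "S")
        prev_heading = heading
    return "".join(tokens)


def build_dictionary(token_strings, n=3):
    """Probability distribution over token n-grams observed in training tracks."""
    counts = Counter(s[i:i + n] for s in token_strings for i in range(len(s) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def atypicality(token_string, dictionary, n=3, floor=1e-6):
    """Mean negative log-probability of a string's n-grams; unseen n-grams get a
    small floor probability. Higher scores suggest departure from normalcy."""
    grams = [token_string[i:i + n] for i in range(len(token_string) - n + 1)]
    if not grams:
        return 0.0
    return sum(-math.log(dictionary.get(g, floor)) for g in grams) / len(grams)
```

A flagging rule would then compare `atypicality` against a threshold chosen from the scores of known-normal tracks.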
Image compression is an important component of modern imaging systems as the volume of raw data collected continues to grow. Reducing data volume while still delivering imagery useful for analysis requires choosing an appropriate compression method. Lossless compression preserves all the information but offers only a limited reduction in data volume. Lossy compression, by contrast, can achieve very high compression ratios at the cost of information loss. We model the compression-induced information loss in terms of the National Imagery Interpretability Rating Scale (NIIRS). NIIRS is a user-based quantification of image interpretability widely adopted by the Geographic Information System community. Specifically, we present the Compression Degradation Image Function Index (CoDIFI) framework, which predicts the NIIRS degradation (i.e., a decrease in NIIRS level) for a given compression setting. The CoDIFI-NIIRS framework enables a user to broker the maximum compression setting while maintaining a specified NIIRS rating.
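The abstract does not give the CoDIFI degradation model, so the sketch below substitutes a made-up one (no NIIRS loss up to a "knee" compression ratio, then a loss growing with the logarithm of the ratio) purely to show the brokering step: choosing the largest candidate compression setting whose predicted NIIRS still meets the requirement. `predicted_niirs_loss`, its knee and slope, and `broker` are all hypothetical.

```python
import math


def predicted_niirs_loss(compression_ratio, knee=8.0, slope=0.5):
    """Hypothetical stand-in for a CoDIFI-style degradation model."""
    if compression_ratio <= knee:
        return 0.0
    return slope * math.log2(compression_ratio / knee)


def broker(original_niirs, required_niirs, candidate_ratios,
           loss_model=predicted_niirs_loss):
    """Largest candidate compression ratio whose predicted NIIRS still meets the
    requirement, or None if no candidate qualifies."""
    feasible = [r for r in candidate_ratios
                if original_niirs - loss_model(r) >= required_niirs]
    return max(feasible) if feasible else None
```

Passing the degradation model as a parameter keeps the brokering logic independent of whichever compression method and predictor are in use.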
Quantitative biomarkers for assessing the presence, severity, and progression of age-related macular degeneration (AMD) would benefit research, diagnosis, and treatment. This paper explores the development of quantitative biomarkers derived from optical coherence tomography (OCT) imagery of the retina. OCT images for approximately 75 patients with Wet AMD, Dry AMD, and no AMD (healthy eyes) were analyzed to identify image features indicative of the patients’ conditions. OCT image features provide a statistical characterization of the retina: healthy eyes exhibit a layered structure, whereas chaotic patterns indicate the deterioration associated with AMD. Our approach uses wavelet and Frangi filtering, combined with statistical features that do not rely on image segmentation, to assess patient conditions. Classification analysis indicates clear separability of Wet AMD from other conditions, including Dry AMD and healthy retinas. The probability of correct classification was 95.7%, as determined by cross-validation. Similar classification analysis predicts the response of Wet AMD patients to treatment, as measured by Best Corrected Visual Acuity (BCVA). A statistical model predicts BCVA from the imagery features with R² = 0.846. Initial analysis indicates that imagery-derived features can provide useful biomarkers for characterizing and quantifying AMD: they support accurate discrimination of Wet AMD from other conditions, image-based prediction of Wet AMD treatment outcome, and accurate prediction of BCVA. Unlike many methods in the literature, our techniques do not rely on segmentation of the OCT image. Next steps include larger-scale testing and validation.
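As an illustration of a segmentation-free statistical feature in the spirit the abstract describes, the sketch below applies a one-level 2-D Haar wavelet transform to a grayscale B-scan and reports the share of detail energy in the band that responds to intensity changes from row to row. A horizontally layered retina would be expected to concentrate energy there, while a chaotic pattern spreads it across bands. This particular feature, its name, and the list-of-rows image format are assumptions; the abstract does not specify the paper's wavelet and Frangi features.

```python
def haar_detail_energies(image):
    """One-level 2-D Haar transform of a grayscale image (list of rows).

    Returns the mean energies of the three detail bands: (row-to-row change,
    column-to-column change, diagonal). A trailing odd row or column is ignored.
    """
    rows, cols = len(image), len(image[0])
    e_row = e_col = e_diag = 0.0
    for i in range(0, rows - 1, 2):
        for j in range(0, cols - 1, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            row_change = (a + b - c - d) / 2   # horizontal edges, e.g. layers
            col_change = (a - b + c - d) / 2   # vertical edges
            diag = (a - b - c + d) / 2
            e_row += row_change ** 2
            e_col += col_change ** 2
            e_diag += diag ** 2
    n = (rows // 2) * (cols // 2)
    return e_row / n, e_col / n, e_diag / n


def layering_index(image):
    """Share of detail energy in the row-to-row band (1.0 = purely layered)."""
    e_row, e_col, e_diag = haar_detail_energies(image)
    total = e_row + e_col + e_diag
    return e_row / total if total else 0.0
```

A feature of this kind needs no layer boundaries to be traced, which is the property the abstract emphasizes.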
Automated video quality assessment methods have generally been based on measurements of engineering parameters such as ground sampling distance, level of blur, and noise. However, humans rate video quality using specific criteria that measure interpretability, namely the kinds of objects and activities that can be detected in the video. Given the improvements in tracking, automatic target detection, and activity characterization that have occurred in video science, it is worth considering whether new automated video assessment methods might be developed by imitating the logical steps humans take in evaluating scene content. This article outlines a new procedure for automatically evaluating video quality based on automated object and activity recognition and demonstrates the method on several ground-based and maritime examples. The detection and measurement of in-scene targets make it possible to assess video quality without relying on source metadata. A methodology is given for comparing automated assessment with human assessment. For the human assessment, objective video quality ratings can be obtained through a menu-driven, crowd-sourced scheme of video tagging, in which human participants tag objects such as vehicles and people in video clips. The size, clarity, and level of detail of features present on the tagged targets are compared directly with the Video National Imagery Interpretability Rating Scale (VNIIRS).
Numerous practical applications for automated event recognition in video rely on analysis of the objects and their associated motion, i.e., the kinematics of the scene. The ability to recognize events in practice depends on accurate tracking of objects of interest in the video data and accurate recognition of changes relative to the background. Numerous factors can degrade the performance of automated algorithms. Our object detection and tracking algorithms estimate object position and attributes within the context of a dynamic assessment of video quality, to provide more reliable event recognition under challenging conditions. We present an approach to robustly modeling image quality, which informs the choice of tuning parameters for a given video stream. The video quality model rests on a suite of image metrics computed in real time from the video. We describe the formulation of the image quality model, and results from a recent experiment quantify the empirical performance for recognition of events of interest.
Automated event recognition in video data has numerous practical applications. The ability to recognize events in practice depends on accurate tracking of objects in the video data. Scene complexity has a large effect on tracker performance. Background models can address this problem by providing a good estimate of the image region surrounding the object of interest. However, the utility of a background model depends on how accurately it represents current imaging conditions. Changing imaging conditions, such as lighting and weather, render the background model inaccurate and degrade tracker performance. As a preprocessing step, developing a set of robust background models can substantially improve system performance. We present an approach to robustly modeling the background as a function of the data acquisition conditions. We describe the formulation of these models and discuss model selection in the context of real-time processing. Using results from a recent experiment, we demonstrate empirically the performance benefits of robust background modeling.
Ischemia and reperfusion injuries present major challenges for both military and civilian medicine. Improved methods for assessing their effects and predicting outcome could guide treatment decisions. Specific issues related to ischemia and reperfusion injury can include complications arising from tourniquet use, such as microvascular leakage in the limb, loss of muscle strength, and systemic failures leading to hypotension and cardiac failure. Better methods for assessing the viability of limbs and tissues during ischemia and for reducing complications arising from reperfusion are critical to improving clinical outcomes for at-risk patients. The purpose of this research is to develop and assess possible prediction models of outcome for acute limb ischemia using a pre-clinical model. Our model relies only on non-invasive imaging data acquired from an animal study. Outcome is measured by pathology and functional scores. We explore color, texture, and temporal features derived from both color and thermal motion imagery acquired during ischemia and reperfusion. The imagery features form the explanatory variables in a model for predicting outcome. Outcome predictions based on direct observation of blood chemistry, blood gas, urinalysis, and physiological measurements provide a reference standard for comparing model performance. Initial results show excellent performance for the imagery-based model compared with predictions based on direct measurements. This paper presents the models and supporting analysis, followed by recommendations for future investigations.
Automated event recognition in video data has numerous practical applications for security and transportation. The ability to recognize events in practice depends on precisely detecting and tracking objects of interest in the video data. Numerous factors, such as lighting, weather, camera placement, scene complexity, and data compression can degrade the performance of automated algorithms. As a preprocessing step, developing a set of robust background models can substantially improve system performance. Our object detection and tracking algorithms estimate the object position and attributes within the context of this model to provide more reliable event recognition under challenging conditions. We present an approach to robustly modeling the background as a function of the data acquisition conditions. One element of this approach is automated assessment of the image quality which informs the choice of which background model to use for a given video stream. The video quality model rests on a suite of image metrics computed in real-time from the video, whereas the background models are constructed from historical data collected over a range of conditions. We will describe the formulation of both models. Results from a recent experiment will quantify the empirical performance for recognition of events of interest.
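A minimal sketch of the selection idea, assuming frames are flat lists of grayscale intensities: a library stores one background model (the per-pixel mean of historical frames) per acquisition condition, along with the mean brightness and contrast of those frames, and each incoming frame is matched to the condition with the nearest metrics before foreground pixels are flagged. The two metrics, the threshold, and the `BackgroundLibrary` class are hypothetical; the paper's quality model rests on a larger suite of image metrics.

```python
import math
import statistics


def frame_metrics(frame):
    """Simple quality metrics for one frame: mean brightness and RMS contrast."""
    return (statistics.fmean(frame), statistics.pstdev(frame))


class BackgroundLibrary:
    """Background models indexed by acquisition condition (hypothetical design)."""

    def __init__(self):
        self.models = {}  # condition label -> (mean metrics, background image)

    def add(self, label, historical_frames):
        background = [statistics.fmean(px) for px in zip(*historical_frames)]
        per_frame = [frame_metrics(f) for f in historical_frames]
        metrics = tuple(statistics.fmean(values) for values in zip(*per_frame))
        self.models[label] = (metrics, background)

    def select(self, frame):
        """Label of the condition whose stored metrics are nearest the frame's."""
        m = frame_metrics(frame)
        return min(self.models, key=lambda label: math.dist(m, self.models[label][0]))

    def foreground(self, frame, threshold=25.0):
        """Mask of pixels that depart from the selected background model."""
        _, background = self.models[self.select(frame)]
        return [abs(p - b) > threshold for p, b in zip(frame, background)]
```

Keeping the model choice separate from foreground extraction mirrors the split in the abstract between the real-time quality assessment and the background models built from historical data.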
Numerous methods exist for quantifying the information potential of imagery exploited by a human observer. The National Imagery Interpretability Ratings Scale (NIIRS) is a useful standard for intelligence, surveillance, and reconnaissance (ISR) applications. Extensions of this approach to motion imagery provide an understanding of the factors affecting interpretability of video data. More recent investigations have shown, however, that human observers and automated processing methods are sensitive to different aspects of image quality. This paper extends earlier research to present a model for quantifying the quality of motion imagery in the context of automated exploitation. In particular, we present a method for predicting the tracker performance and demonstrate the results on a range of video clips. Automated methods for assessing video quality can provide valuable feedback for collection management and guide the exploitation and analysis of the imagery.
Several methods have been developed for quantifying the information potential of imagery exploited by a
human observer. The National Imagery Interpretability Ratings Scale (NIIRS) has proven to be a useful
standard for intelligence, surveillance, and reconnaissance (ISR) applications. Extensions of this approach to
motion imagery have yielded a body of research on the factors affecting interpretability of motion imagery
and the development of a Video NIIRS. Automated methods for assessing image interpretability can provide
valuable feedback for collection management and guide the exploitation and analysis of the imagery.
Prediction models that rely on image parameters, such as the General Image Quality Equation (IQE), are
useful for conducting sensor trade studies and collection planning. Models for predicting image quality after
image acquisition can provide useful feedback for collection management. Several methods exist for still
imagery. This paper explores the development of a similar capability for motion imagery. In particular, we
propose methods for predicting the interpretability of motion imagery for exploitation by an analyst. A
similar model is considered for automated exploitation.
The adversary in current threat situations can no longer be identified by what they are, but by what they are doing. This has led to a large increase in the use of video surveillance systems for security and defense applications. With the quantity of video surveillance at the disposal of organizations responsible for protecting military and civilian lives come issues of storing the data and screening it for events and activities of interest.
Activity recognition from video for such applications seeks to develop automated screening of video based on the recognition of activities of interest, rather than merely the presence of specific persons or vehicle classes, as was done for the Cold War problem of "Find the T72 Tank". This paper explores numerous approaches to activity recognition, all of which examine heuristic, semantic, and syntactic methods based on tokens derived from the video.
The proposed architecture uses a multi-level approach that divides the problem into three or more tiers of recognition, each employing the techniques best suited to that tier: heuristics, syntactic recognition, and hidden Markov models (HMMs) of token strings, which together form higher-level interpretations.
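The use of HMMs over token strings can be illustrated with the scaled forward algorithm, which returns the log-likelihood of a token string under a discrete HMM; an activity can then be labeled by whichever activity model scores highest. The two toy models below (`patrol`, `loiter`) and their probabilities are invented for the example, and the S/L/R token alphabet (straight, left, right) is an assumption.

```python
import math


def forward_log_likelihood(tokens, start, trans, emit):
    """Log-likelihood of a non-empty token string under a discrete HMM
    (scaled forward algorithm).

    start[i]     initial probability of hidden state i
    trans[i][j]  probability of moving from state i to state j
    emit[i]      dict mapping token -> emission probability in state i
    """
    if not tokens:
        raise ValueError("token string must be non-empty")
    n = len(start)
    alpha = [start[i] * emit[i].get(tokens[0], 0.0) for i in range(n)]
    log_lik = 0.0
    for t in range(len(tokens)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n))
                     * emit[j].get(tokens[t], 0.0) for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the string is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(tokens, models):
    """Label of the activity model giving the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(tokens, *models[label]))


# Toy activity models as (start, trans, emit) tuples.
MODELS = {
    "patrol": ([1.0], [[1.0]], [{"S": 0.8, "L": 0.1, "R": 0.1}]),
    "loiter": ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]],
               [{"L": 0.8, "S": 0.2}, {"R": 0.8, "S": 0.2}]),
}
```

In a tiered architecture, the tokens would come from the lower heuristic and syntactic tiers, and the HMM scores would feed the higher-level interpretation.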
We present an image quality metric and prediction model for SAR imagery that addresses automated information
extraction and exploitation by imagery analysts. This effort draws on our team's direct experience with the development
of the Radar National Imagery Interpretability Ratings Scale (Radar NIIRS), the General Image Quality Equations
(GIQE) for other modalities, and extensive expertise in ATR characterization and performance modeling. In this study,
we produced two separate GIQEs: one to predict Radar NIIRS and one to predict Automated Target Detection (ATD)
performance. The Radar NIIRS GIQE is most significantly influenced by resolution, depression angle, and depression
angle squared. The inclusion of several image metrics was shown to improve performance. Our development of an ATD
GIQE showed that resolution and clutter characteristics (e.g., clear, forested, urban) are the dominant explanatory
variables. As was the case with the NIIRS GIQE, inclusion of image metrics again increased performance, but the
improvement was significantly more pronounced. Analysis also showed that a strong relationship exists between ATD
and Radar NIIRS, as indicated by a correlation coefficient of 0.69; however, this correlation is not strong enough that we
would recommend a single GIQE be used for both ATD and NIIRS prediction.
Automated target cueing (ATC) can assist analysts with searching large volumes of imagery. Performance of most
automated systems is less than perfect, requiring an analyst to review the results to dismiss false alarms or confirm
correct detections. This paper explores methods for improving the presentation and visualization of the ATC output,
enabling more efficient and effective review of the detections flagged by the ATC. The techniques presented in this
paper are applicable to a wide range of search problems using data from different sensor modalities. The
information available to the computer increases as ATC detections are either accepted or rejected by the analyst. It
is often easy to confirm obviously correct detections and dismiss obvious false alarms, which provides the starting
point for the automated updating of the visualization. In machine learning algorithms, this information can be used
to retrain or refine the classifier. However, this retraining process is appropriate only when future sensor data is
expected to closely resemble the current set. For many applications, the sensor data characteristics (viewing
geometry, resolution, clutter complexity, prevalence and types of confusers) are likely to change from one data
collection to the next. For this reason, updating the visualization for the current data set, rather than updating the
classifier for future processing, may prove more effective. This paper presents an adaptive visualization technique
and illustrates the technique with applications.
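A minimal sketch of updating the presentation from analyst feedback, assuming each detection is described by a numeric feature vector: unreviewed detections are ordered by how much closer they lie to previously accepted detections than to rejected ones, so likely true detections surface first. Nothing is retrained; only the display order for the current data set changes. The nearest-neighbor scoring rule and the `rerank` name are assumptions, not the technique the paper presents.

```python
import math


def rerank(unreviewed, accepted, rejected):
    """Order unreviewed detection ids so that those resembling analyst-accepted
    detections come first and those resembling rejected false alarms come last.

    Each argument maps a detection id to a feature vector (tuple of floats).
    Both accepted and rejected must contain at least one example.
    """
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected detection")

    def nearest(vec, labelled):
        return min(math.dist(vec, other) for other in labelled.values())

    def score(det_id):
        vec = unreviewed[det_id]
        d_acc, d_rej = nearest(vec, accepted), nearest(vec, rejected)
        if d_acc + d_rej == 0.0:
            return 0.5  # identical to both an accepted and a rejected example
        return d_rej / (d_acc + d_rej)  # 1.0 means on top of an accepted example

    return sorted(unreviewed, key=score, reverse=True)
```

The obvious confirmations and dismissals mentioned above supply the initial `accepted` and `rejected` sets, and the order is recomputed after each new analyst decision.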
Biometrics, such as fingerprint, iris scan, and face recognition, offer methods for identifying individuals based on a
unique physiological measurement. Recent studies indicate that a person's electrocardiogram (ECG) may also
provide a unique biometric signature. Several methods for processing ECG data have appeared in the literature and
most approaches rest on an initial detection and segmentation of the heartbeats. Various sources of noise, such as
sensor noise, poor sensor placement, or muscle movements, can degrade the ECG signal and introduce errors into
the heartbeat segmentation. This paper presents a screening technique for assessing the quality of each segmented
heartbeat. Using this technique, a higher quality signal can be extracted to support the identification task. We
demonstrate the benefits of this quality screening using a principal component technique known as eigenpulse. The
analysis demonstrated the improvement in performance attributable to the quality screening.
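The abstract does not state the screening criterion. One simple criterion, used here only as an assumption, is to compare each segmented beat with a sample-wise median template and discard beats whose Pearson correlation with it falls below a threshold; the surviving beats would then feed the eigenpulse (principal component) stage. The 0.9 threshold and the function names are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def screen_beats(beats, min_corr=0.9):
    """Keep segmented heartbeats (equal-length sample lists) that correlate well
    with the sample-wise median beat."""
    template = [statistics.median(samples) for samples in zip(*beats)]
    return [b for b in beats if pearson(b, template) >= min_corr]
```

A median template is used rather than a mean so that a few corrupted beats do not distort the reference they are judged against.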
Biometrics, such as fingerprint, iris scan, and face recognition, offer methods for identifying individuals based on a
unique physiological measurement. Recent studies indicate that a person's electrocardiogram (ECG) may also
provide a unique biometric signature. Current techniques for identification using ECG rely on empirical methods
for extracting features from the ECG signal. This paper presents an alternative approach based on a time-domain
model of the ECG trace. Because Auto-Regressive Integrated Moving Average (ARIMA) models form a rich class
of descriptors for representing the structure of periodic time series data, they are well-suited to characterizing the
ECG signal. We present a method for modeling the ECG, extracting features from the model representation, and
identifying individuals using these features.
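As a minimal sketch of the time-domain modeling idea, the code below fits a pure autoregressive model, the ARIMA(p, 0, 0) special case, to a mean-removed signal by least squares and identifies a subject by the nearest enrolled coefficient vector. Omitting the differencing and moving-average terms, fitting a whole trace rather than individual beats, and the names `fit_ar` and `identify` are simplifications made for illustration; a full ARIMA fit would normally use a dedicated library such as statsmodels.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol


def fit_ar(signal, p=2):
    """Least-squares AR(p) coefficients a_1..a_p for x[t] ~ sum_k a_k x[t-k],
    computed on the mean-removed signal via the normal equations."""
    mu = sum(signal) / len(signal)
    x = [v - mu for v in signal]
    rows = [[x[t - k] for k in range(1, p + 1)] for t in range(p, len(x))]
    targets = x[p:]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(gram, rhs)


def identify(signal, enrolled, p=2):
    """Name of the enrolled subject nearest in AR-coefficient space."""
    features = fit_ar(signal, p)
    return min(enrolled, key=lambda name: math.dist(features, enrolled[name]))
```

A pure sinusoid makes a convenient check, because it satisfies x[t] = 2 cos(w) x[t-1] - x[t-2] exactly.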
Several methods have been developed for quantifying the information potential of imagery exploited by a human
observer. The National Imagery Interpretability Ratings Scale (NIIRS) has proven to be a useful standard for
intelligence, surveillance, and reconnaissance (ISR) applications. A comparable standard for automated information
extraction would be useful for a variety of applications, including tasking and collection management. This paper
examines the applicability of NIIRS to automated exploitation methods. In particular, we compare image-based
estimates of the NIIRS to observed performance of an automated target detection (ATD) algorithm. In addition, we
examine other image metrics and their relationship to ATD performance. The findings indicate that NIIRS is not a
good predictor of ATD performance, but methods that quantify the complexity of the clutter hold promise.
Computer vision methods, such as automatic target recognition (ATR) techniques, have the potential to improve the
accuracy of military systems for weapon deployment and targeting, resulting in greater utility and reduced collateral
damage. A major challenge, however, is training the ATR algorithm to the specific environment and mission. Because of
the wide range of operating conditions encountered in practice, training in advance on a pre-selected training set
may not provide the robust performance needed. Training on a mission-specific image set is a promising approach, but
requires rapid selection of a small, but highly representative training set to support time-critical operations. To remedy
these problems and make short-notice seeker missions a reality, we developed Learning and Mining using Bagged
Augmented Decision Trees (LAMBAST). LAMBAST examines large databases and extracts sparse, representative
subsets of target and clutter samples of interest. For data mining, LAMBAST uses a variant of decision trees, called
random decision trees (RDTs). This approach guards against overfitting and can incorporate novel, mission-specific data
after initial training via perpetual learning. We augment these trees with a distribution modeling component that
eliminates redundant information, ignores misrepresentative class distributions in the database, and stops training when
decision boundaries are sufficiently sampled. These augmented random decision trees enable fast investigation of
multiple images to train a reliable, mission-specific ATR. This paper presents the augmented random decision tree
framework, develops the sampling procedure for efficient construction of the sample, and illustrates the procedure using
The literature is replete with assisted target recognition (ATR) techniques, including methods for ATR evaluation. Yet,
relatively few methods find their way into practical use. Part of the problem is that the evaluation of an ATR may not go
far enough in characterizing its optimal use in practice. For example, a thorough understanding of a method's operating
conditions is crucial, e.g., performance across different sensor capabilities, scene context, target occlusions, etc. This
paper describes a process for a rigorous evaluation of ATR performance, including a sensitivity analysis. Ultimately, an
ATR algorithm is deemed valuable if it is actually utilized in practice by users. Thus, quantitative analysis alone is not
necessarily sufficient. Qualitative user assessment derived from user testing, surveys, and questionnaires is often needed
to provide a more complete interpretation of an evaluation for a particular method. We demonstrate our ATR evaluation
process using methods that perform target detection of civilian vehicles.
Automatic target detection (ATD) systems process imagery to detect and locate targets in support of intelligence,
surveillance, reconnaissance, and strike missions. Accurate prediction of ATD performance would assist in system
design and trade studies, collection management, and mission planning. Specifically, a need exists for ATD performance
prediction based exclusively on information available from the imagery and its associated metadata. In response to this
need, we undertake a modeling effort that consists of two phases: a learning phase, where image measures are computed
for a set of test images, the ATD performance is measured, and a prediction model is developed; and a second phase to
test and validate performance prediction. The learning phase produces a mapping, valid across various ATD algorithms,
which is even applicable when no image truth is available (e.g., when evaluating denied area imagery). Ongoing efforts
to develop such a prediction model have met with some success. Previous results presented models to predict
performance for several ATD methods. This paper extends the work in several ways: extension to a new ATD method,
application of the modeling to a new image set, and an investigation of systematic changes in the image properties
(resolution, noise, contrast). The paper concludes with a discussion of future research.
Recent investigations indicate cardiovascular function is a viable biometric. This paper explores biometric techniques
based on multiple modalities for sensing cardiovascular function. Analysis of data acquired with an electrocardiogram
(ECG) combined with corresponding data from pulse oximetry and blood pressure indicates that features corresponding
to individuals can be extracted from the signals. While a person's heart rate can vary with mental
and emotional state, certain features corresponding to the heartbeat appear to be unique to the individual. Our protocol
induced a range of mental and emotional states in the subject and the analysis identifies features of the cardiovascular
signals that are invariant to mental and emotional state. Furthermore, the three measures of cardiovascular
function provide independent information, which can be fused to achieve robust performance compared to a single
The motion imagery community would benefit from standard measures for assessing image interpretability. The National Imagery Interpretability Rating Scale (NIIRS) has served as a community standard for still imagery, but no comparable scale exists for motion imagery. Several considerations unique to motion imagery indicate that the standard methodology employed in the past for NIIRS development may not be applicable or, at a minimum, may require modification. The dynamic nature of motion imagery introduces factors that do not affect the perceived interpretability of still imagery, namely target motion and camera motion. We conducted a series of evaluations to understand and quantify the effects of these critical factors. This paper presents key findings about the relationship of perceived interpretability to ground sample distance, target motion, camera motion, and frame rate. Based on these findings, we modified the scale development methodology and validated the approach. The methodology adapts the standard NIIRS development procedures to the softcopy exploitation environment and focuses on image interpretation tasks that target the dynamic nature of motion imagery. This paper describes the proposed methodology, presents the findings from a methodology assessment evaluation, and offers recommendations for the full development of a scale for motion imagery.
Automatic target detection (ATD) systems process imagery to detect and locate targets in support of a
variety of military missions. Accurate prediction of ATD performance would assist in system design and trade
studies, collection management, and mission planning. A need exists for ATD performance prediction based exclusively
on information available from the imagery and its associated metadata. We present a predictor based on
image measures quantifying the intrinsic ATD difficulty of an image. The modeling effort consists of two phases:
a learning phase, in which image measures are computed for a set of test images, the ATD performance is measured,
and a prediction model is developed; and a second phase to test and validate performance prediction. The learning
phase produces a mapping, valid across various ATR algorithms, which is applicable even when no image truth is
available (e.g., when evaluating denied area imagery). The testbed has plug-in capability to allow rapid evaluation
of new ATR algorithms. The image measures employed in the model include statistics derived from a constant
false alarm rate (CFAR) processor, the Power Spectrum Signature, and others. We present performance predictors
for two trained ATD classifiers, one constructed using GENIE Pro™, a tool developed at Los Alamos National
Laboratory, and the other using eCognition™, developed by Definiens (http://www.definiens.com/products). We
present analyses of the two performance predictions and compare the underlying prediction models. The paper
concludes with a discussion of future research.
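The abstract lists CFAR statistics among the image measures but does not say which CFAR variant was used. As a minimal sketch, assuming a two-dimensional cell-averaging CFAR, the function below flags pixels that exceed a multiple of the local clutter mean. The summary statistic `cfar_exceedance_density` (the fraction of tested pixels flagged) is a hypothetical example of an image measure that could be derived from such a processor; the window sizes and `scale` factor are illustrative assumptions.

```python
def ca_cfar_2d(image, guard=1, train=2, scale=3.0):
    """Return (row, col) of pixels exceeding `scale` times the local clutter mean.

    image is a list of equal-length rows. The clutter mean is taken over a
    square training ring of width `train` outside a guard band of width
    `guard` around the cell under test. Pixels too close to the border are
    not tested.
    """
    rows, cols = len(image), len(image[0])
    half = guard + train  # half-width of the full window
    detections = []
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            total, count = 0.0, 0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    if abs(dr) <= guard and abs(dc) <= guard:
                        continue  # skip the cell under test and its guard band
                    total += image[r + dr][c + dc]
                    count += 1
            if image[r][c] > scale * (total / count):
                detections.append((r, c))
    return detections


def cfar_exceedance_density(image, guard=1, train=2, scale=3.0):
    """Hypothetical image measure: fraction of tested pixels flagged by the CFAR."""
    half = guard + train
    tested = max(0, len(image) - 2 * half) * max(0, len(image[0]) - 2 * half)
    if tested == 0:
        return 0.0
    return len(ca_cfar_2d(image, guard, train, scale)) / tested
```

On a flat 9×9 background of ones with a single bright pixel of 10 at the center, only the center is flagged, so the density is 1/9.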
A variety of change detection (CD) methods have been developed and employed to support imagery
analysis for applications including environmental monitoring, mapping, and support to military operations.
Evaluation of these methods is necessary to assess technology maturity, identify areas for improvement,
and support transition to operations. This paper presents a methodology for conducting this type of
evaluation, discusses the challenges, and illustrates the techniques. Evaluating object-level change
detection methods is more complicated than evaluating automated techniques that process a single image. We
explore algorithm performance assessments, emphasizing the definition of the operating conditions (sensor, target, and environmental factors) and the development of measures of performance. Specific challenges include image registration; occlusion due to foliage, cultural clutter and terrain masking; diurnal differences; and differences in viewing geometry. Careful planning, sound experimental design, and access to suitable imagery with image truth and metadata are critical.
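To make "measures of performance" concrete, here is a minimal sketch of object-level scoring, assuming detected and truthed changes are reduced to point locations and matched greedily within a tolerance radius. The matching rule and the name `score_change_detections` are assumptions for illustration, not the method of the paper.

```python
import math


def score_change_detections(detections, truth, tolerance):
    """Match detected change locations to truthed changes, one-to-one.

    detections, truth: lists of (x, y) points. Each detection is matched to
    the nearest still-unmatched truth point within `tolerance`; unmatched
    detections count as false alarms. Returns (probability of detection,
    number of false alarms).
    """
    unmatched = list(truth)
    hits = 0
    false_alarms = 0
    for d in detections:
        best, best_dist = None, tolerance
        for t in unmatched:
            dist = math.dist(d, t)
            if dist <= best_dist:
                best, best_dist = t, dist
        if best is None:
            false_alarms += 1
        else:
            unmatched.remove(best)  # a truth object can be found only once
            hits += 1
    pd = hits / len(truth) if truth else float("nan")
    return pd, false_alarms
```

With truth at (10, 10) and (50, 50), detections at (11, 10) and (100, 100), and a tolerance of 3, the result is a probability of detection of 0.5 and one false alarm. Misregistration between the two images would show up directly as missed matches, which is one reason image registration appears among the challenges above.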
Detection and mapping of subsurface obstacles is critical for safe navigation of littoral regions. Sidescan sonar data offers a rich source of information for developing such maps. Typically, data are collected at two frequencies using a sensor mounted on a towfish. The major features of interest depend on the specific mission but often include objects on the bottom that could pose hazards for navigation, linear features such as cables or pipelines, and the bottom type (e.g., clay, sand, or rock). A number of phenomena can complicate the analysis of the sonar data: surface return, vessel wakes, and fluctuations in the position and orientation of the towfish. Developing accurate maps of navigation hazards based on sidescan sonar data is generally labor intensive. We propose an automated approach, which employs commercial software tools, to detect these objects. This method offers the prospect of substantially reducing production time for maritime geospatial data products.
The motion imagery community would benefit from the availability of standard measures for assessing image interpretability. The National Imagery Interpretability Rating Scale (NIIRS) has served as a community standard for still imagery, but no comparable scale exists for motion imagery. Previous studies have explored the factors affecting the perceived interpretability of motion imagery and the ability to perform various image exploitation tasks. More recently, a study demonstrated an approach for adapting the standard NIIRS development methodology to motion imagery. This paper presents the first step in implementing this methodology, namely the construction of the perceived interpretability continuum for motion imagery. We conducted an evaluation in which imagery analysts rated the interpretability of a large number of motion imagery clips. Analysis of these ratings indicates that analysts rate the imagery consistently, that perceived interpretability is unidimensional, and that interpretability varies linearly with log(GSD), where GSD is the ground sample distance. This paper presents the design of the evaluation, the analysis and findings, and implications for scale development.
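The reported linear relationship with log(GSD) corresponds to a model of the form rating = a + b * log10(GSD). A minimal ordinary least-squares sketch follows. The GSD values and ratings in the usage note are hypothetical numbers chosen to show the fit, not data from the evaluation, and the base-10 logarithm is an assumption (the base only rescales b).

```python
import math


def fit_rating_vs_log_gsd(gsd_values, ratings):
    """Ordinary least-squares fit of rating = a + b * log10(GSD); returns (a, b)."""
    xs = [math.log10(g) for g in gsd_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

For the hypothetical inputs GSD = [0.1, 1.0, 10.0] and ratings [7, 5, 3], the fit returns approximately a = 5 and b = -2: each tenfold coarsening of GSD lowers the rating by two units.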
A fundamental problem in image processing is finding objective metrics that parallel human perception of image
quality. In this study, several metrics were examined to quantify compression algorithms in terms of perceived loss
of image quality. In addition, we sought to describe the relationship between image quality and bit rate. The
compression schemes used were JPEG2000, MPEG2, and H.264. The frame size was fixed at 848x480 pixels, and the
bit rate varied from 6000 kbps to 200 kbps. The metrics examined were peak signal-to-noise ratio (PSNR),
structural similarity (SSIM), edge localization metrics, and a blur metric. To varying degrees, the metrics displayed
desirable properties: they were monotonic in bit rate, the group of pictures (GOP) structure could be inferred
from them, and they tended to agree with human perception of quality degradations. Additional work is being
conducted to quantify the sensitivity of these measures with respect to our Motion Imagery Quality Scale.
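PSNR is the simplest of the listed metrics: for 8-bit frames, PSNR = 10 * log10(255^2 / MSE) dB, where MSE is the mean squared pixel difference between the reference and the decoded frame. The sketch below computes the standard definition for frames stored as nested lists; it is an illustration, not the study's code, and SSIM, the edge localization metrics, and the blur metric are not shown.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-size frames."""
    flat_ref = [p for row in reference for p in row]
    flat_dst = [p for row in distorted for p in row]
    if len(flat_ref) != len(flat_dst):
        raise ValueError("frames must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_dst)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical frames: no noise
    return 10 * math.log10(max_value ** 2 / mse)
```

A uniform error of one gray level gives MSE = 1 and a PSNR of about 48.13 dB; lowering the bit rate raises the MSE and therefore lowers the PSNR, which is the monotonic behavior described above.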
Motion imagery will play a critical role in future intelligence and military missions. The ability to provide a real-time, dynamic view and persistent surveillance makes motion imagery a valuable source of information. The ability to collect, process, transmit, and exploit this rich source of information depends on the sensor capabilities, the available communications channels, and the availability of suitable exploitation tools. While sensor technology has progressed dramatically and various exploitation tools exist or are under development, the bandwidth required for transmitting motion imagery data remains a significant challenge. This paper presents a user-oriented evaluation of several methods for compression of motion imagery. We explore various codecs and bit rates for both inter- and intra-frame encoding. The analysis quantifies the effects of compression in terms of the interpretability of motion imagery, i.e., the ability of imagery analysts to perform common image exploitation tasks. The findings have implications for sensor system design, systems architecture, and mission planning.
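The bandwidth challenge can be put in numbers with simple arithmetic. The sketch below assumes, for illustration only, uncompressed 8-bit grayscale frames at 30 frames/s, and borrows the 848x480 frame size and the 6000 to 200 kbps range from the compression-metrics study earlier in this section; none of these values is stated in this abstract.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def compression_ratio(width, height, bits_per_pixel, frames_per_second, encoded_kbps):
    """How many times smaller the encoded stream is than the raw stream."""
    raw = raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second)
    return raw / (encoded_kbps * 1000)


# Assumed: 848x480 frames, 8 bits per pixel (grayscale), 30 frames/s.
for kbps in (6000, 200):
    print(f"{kbps} kbps -> {compression_ratio(848, 480, 8, 30, kbps):.1f}:1")
```

Under these assumptions the raw stream is about 97.7 Mbps, so 6000 kbps implies roughly 16:1 compression and 200 kbps roughly 488:1. Color video would raise the raw rate further.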
Automatic target detection (ATD) systems process imagery to detect and locate targets in support of a variety of military missions. Accurate prediction of ATD performance would assist in system design and trade studies, collection management, and mission planning. A need exists for ATD performance prediction based exclusively on information available from the imagery and its associated metadata. We present a predictor based on image measures quantifying the intrinsic ATD difficulty of an image. The modeling effort consists of two phases: a learning phase, in which image measures are computed for a set of test images, the ATD performance is measured, and a prediction model is developed; and a second phase to test and validate performance prediction. The learning phase produces a mapping, valid across various ATR algorithms, which is applicable even when no image truth is available (e.g., when evaluating denied area imagery). The testbed has plug-in capability to allow rapid evaluation of new ATR algorithms. The image measures employed in the model include statistics derived from a constant false alarm rate (CFAR) processor, the Power Spectrum Signature, and others. We present a performance predictor for a trained ATD classifier constructed using GENIE, a tool developed at Los Alamos National Laboratory. The paper concludes with a discussion of future research.
Automated target recognition (ATR) methods hold promise for rapid extraction of critical information from imagery data to support military missions. Development of ATR tools generally requires large amounts of imagery data to develop and test algorithms. Deployment of operational ATR systems requires performance validation using operationally relevant imagery. For early algorithm development, however, restrictions on access to such data are a significant impediment, especially for the academic research community. To address this limitation, we have developed a set of grayscale images as a surrogate for panchromatic imagery that would be acquired from airborne sensors. This surrogate data set consists of imagery of ground order of battle (GOB) targets in an arid environment. The data set was developed by imaging scale models of these targets set in a scale model background. The imagery spans a range of operating conditions and provides a useful image set for initial explorations of new approaches to ATR development.
A major challenge for ATR evaluation is developing an accurate image truth that can be compared to an ATR algorithm's decisions to assess performance. We have developed a semi-automated video truthing application, called START, that greatly improves the productivity of an operator truthing video sequences. After previewing the video, the user selects a set of salient frames (called "keyframes"), each corresponding to significant events in the video. These keyframes are then manually truthed. We provide a spectrum of truthing tools that generate truth for additional frames from the keyframes. These tools include fully automatic feature tracking, interpolation, and completely manual methods. The application uses a set of diagnostic measures to manage the user's attention, flagging portions of the video for which the computed truth needs review. This changes the role of the operator from raw data entry to that of an expert appraiser supervising the quality of the image truth. We have implemented a number of graphical displays summarizing the video truthing at various timescales. Additionally, we can view the track information, showing only the lifespan information of the entities involved. A combination of these displays allows the user to manage their resources more effectively. Two studies have demonstrated the utility of START: one focusing on the accuracy of the automated truthing process, and the other focusing on the usability of the application as assessed by a set of expert users.
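Interpolation is one of the listed truthing tools. A minimal sketch, assuming each target's truth is an axis-aligned box (x, y, width, height) that varies linearly between manually truthed keyframes; `interpolate_boxes` is a hypothetical name, not part of START.

```python
def interpolate_boxes(keyframes, frame):
    """Linearly interpolate a target's box between the two bracketing keyframes.

    keyframes: dict mapping frame index -> (x, y, width, height), truthed by hand.
    Returns the box for `frame`, or None if `frame` lies outside the keyframe span.
    """
    if frame in keyframes:
        return keyframes[frame]  # manually truthed frame: use it directly
    indices = sorted(keyframes)
    for f0, f1 in zip(indices, indices[1:]):
        if f0 < frame < f1:
            w = (frame - f0) / (f1 - f0)  # fraction of the way from f0 to f1
            b0, b1 = keyframes[f0], keyframes[f1]
            return tuple((1 - w) * a + w * b for a, b in zip(b0, b1))
    return None
```

With keyframes at frame 0, box (0, 0, 10, 10), and frame 10, box (20, 10, 10, 10), frame 5 receives (10.0, 5.0, 10.0, 10.0). Linear interpolation is adequate only when motion between keyframes is smooth, which is why the diagnostics that flag frames for review matter.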
Motion imagery will play a critical role in future combat operations. The ability to provide a real-time, dynamic view of the battlefield, together with the capability to maintain persistent surveillance, makes motion imagery a valuable source of information for the soldier. Acquisition and exploitation of this rich source of information, however, depend on the communications bandwidth available to transmit the necessary information to users. Methods for reducing bandwidth requirements include a variety of image compression and frame decimation techniques. This study explores spatially differential compression, in which targets in the clips are losslessly compressed while the background regions are highly compressed. This study evaluates the ability of users to perform standard target detection and identification tasks on the compressed product, compared to performance on uncompressed imagery or imagery compressed by other methods. The paper concludes with recommendations for future investigations.
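The abstract does not name the codec used for spatially differential compression. As a stand-in, the sketch below keeps target pixels exactly and replaces background pixels with coarse uniform quantization, which illustrates the idea of spending bits only on target regions. A real system would pass the mask to an encoder instead. The function name and `background_step` are hypothetical.

```python
def differential_quantize(frame, target_mask, background_step=32):
    """Keep pixels inside target regions exactly; coarsely quantize the rest.

    frame, target_mask: equal-size lists of rows of 8-bit integers and booleans;
    a mask entry is True for a target pixel.
    """
    out = []
    for row, mask_row in zip(frame, target_mask):
        new_row = []
        for value, is_target in zip(row, mask_row):
            if is_target:
                new_row.append(value)  # lossless: target pixel untouched
            else:
                # Uniform quantization to the center of a `background_step`-wide bin.
                bin_index = value // background_step
                new_row.append(bin_index * background_step + background_step // 2)
        out.append(new_row)
    return out
```

With a step of 32, background pixels take only eight distinct levels, so they cost far fewer bits, while target pixels keep their full 8-bit values.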
The motion imagery community would benefit from the availability of standard measures for assessing image interpretability. The National Imagery Interpretability Rating Scale (NIIRS) has served as a community standard for still imagery, but no comparable scale exists for motion imagery. Several considerations unique to motion imagery indicate that the standard methodology employed in the past for NIIRS development may not be applicable or, at a minimum, requires modifications. The dynamic nature of motion imagery introduces a number of factors that do not affect the perceived interpretability of still imagery, namely target motion and camera motion. A set of studies sponsored by the National Geospatial-Intelligence Agency (NGA) has been conducted to understand and quantify the effects of critical factors. This study discusses the development and validation of a methodology that has been proposed for the development of a NIIRS-like scale for motion imagery. The methodology adapts the standard NIIRS development procedures to the softcopy exploitation environment and focuses on image interpretation tasks that target the dynamic nature of motion imagery. This paper describes the proposed methodology, presents the findings from a methodology assessment evaluation, and offers recommendations for the full development of a scale for motion imagery.
The development of a motion imagery (MI) quality scale, akin to the National Imagery Interpretability Rating Scale (NIIRS) for still imagery, would have great value to designers and users of surveillance and other MI systems. A multiphase study has adopted a perceptual approach to identifying the main MI attributes that affect interpretability. The current perceptual study measured frame rate effects for simple motion imagery interpretation tasks of detecting and identifying a known target. Using synthetic imagery allowed full control of the contrast and speed of moving objects, motion complexity, the number of confusers, and the noise structure. To explore the detectability threshold, the contrast between the darker moving objects and the background was set at 5%, 2%, and 1%. Nine viewers were asked to detect or identify a moving synthetic "bug" in each of 288 10-second clips. We found that frame rate, contrast, and confusers had a statistically significant effect on image interpretability (at the 95% level), while speed and background showed no significant effect. Generally, there was a significant loss in correct detection and identification for frame rates below 10 F/s. Increasing the contrast improved detection, and at high contrast, confusers did not affect detection. Confusers reduced detection of higher speed objects. Higher speed improved detection but complicated identification, although this effect was small. Higher speed made detection harder at 1 F/s but improved detection at 30 F/s. The small loss of quality at moderately lower frame rates may have implications for bandwidth-limited systems. A study is underway to confirm, with live-action imagery, the results reported here with synthetic imagery.
The motion imagery community would benefit from the availability of standard measures for assessing image interpretability. The National Imagery Interpretability Rating Scale (NIIRS) has served as a community standard for still imagery, but no comparable scale exists for motion imagery. Several considerations unique to motion imagery indicate that the standard methodology employed in the past for NIIRS development may not be applicable or, at a minimum, requires modifications. Traditional methods for NIIRS development rely on a close linkage between perceived image quality, as captured by specific image interpretation tasks, and the sensor parameters associated with image acquisition. The dynamic nature of motion imagery suggests that this type of linkage may not exist or may be modulated by other factors. An initial study was conducted to understand the effects of target motion, camera motion, and scene complexity on perceived image interpretability for motion imagery. This paper summarizes the findings from this evaluation. In addition, several issues emerged that require further investigation:
- The effect of frame rate on the perceived interpretability of motion imagery
- Interactions between color and target motion which could affect perceived interpretability
- The relationships among resolution, viewing geometry, and image interpretability
- The ability of an analyst to satisfy specific image exploitation tasks relative to different types of motion imagery clips
Plans are being developed to address each of these issues through direct evaluations. This paper discusses each of these concerns, presents the plans for evaluations, and explores the implications for development of a motion imagery quality metric.
Growing military requirements and shorter timelines are placing greater demands on imagery analysts. At the same time, advances in sensor technology have vastly increased the quantity and types of imagery data available. Together, these factors are driving toward greater reliance on automated exploitation tools, such as automated target cueing (ATC). Several studies indicate that operational performance depends not only on the accuracy of the ATC algorithm, but also on effectively conveying the ATC information to the user. Sonification, the presentation of information through audio signals, provides a novel method for assisting analysts with visual search tasks. This paper presents a recent proof-of-concept experiment in which analysts search for geometric targets in synthetic, two-band color imagery. The performance results indicate that sonification can enhance performance, particularly through false alarm mitigation. The range of performance across users also suggests that user training may play a significant role in effective operational use of sonification methods.
A major challenge for ATR evaluation is developing an accurate image truth that can be compared to an ATR algorithm's decisions to assess performance. While many standard truthing methods and scoring metrics exist for stationary targets in still imagery, techniques for dealing with motion imagery and moving targets are not as prevalent. This is partly because the moving imagery / moving targets scenario introduces the data association problem of assigning targets to tracks. Video datasets typically contain far more imagery than static collections, increasing the size of the truthing task. Specifying the types and locations of the targets present for a large number of images is tedious, time consuming, and error prone. In this paper, we present an updated version of a complete truthing system we call the Scoring, Truthing, And Registration Toolkit (START). The application consists of two components: a truthing component that assists in the automated construction of image truth, and a scoring component that assesses the performance of a given algorithm relative to the specified truth. In motion imagery, both stationary and moving targets can be detected and tracked over portions of a motion imagery clip. We summarize the capabilities of START with emphasis on the target tracking and truthing diagnostics. The user manually truths certain key frames; truth for intermediate frames is then inferred, and sets of diagnostics verify the quality of the truth. If ambiguous situations are encountered in the intermediate frames, diagnostics flag the problem so that the user can intervene manually. This approach can dramatically reduce the effort required for truthing video data while maintaining high fidelity in the truth data. We present the results of two user evaluations of START, one addressing accuracy and the other focusing on the human factors aspects of the design.
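START's diagnostics are not specified in detail here. One plausible check, assumed for illustration, compares two independent estimates of a target's box in each intermediate frame (for example, a feature tracker's output and the box interpolated from keyframes) and flags frames whose intersection over union (IoU) falls below a threshold. The 0.5 threshold and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def flag_frames(tracked, inferred, min_iou=0.5):
    """Return sorted frame indices where two truth estimates disagree.

    tracked, inferred: dicts mapping frame index -> (x, y, width, height).
    Only frames present in both dicts are compared.
    """
    common = tracked.keys() & inferred.keys()
    return sorted(f for f in common if iou(tracked[f], inferred[f]) < min_iou)
```

Flagged frames are the ones an operator would review by hand, which matches the shift described above from raw data entry to supervising the quality of the truth.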
The receiver operating characteristic (ROC) curve is a standard method for quantifying the performance of a detection task where the signal of interest is embedded in noise. ROC analysis has been applied to a variety of signal detection problems including medical imaging, acoustics, and automated target detection (ATD). The free response operating characteristic (FROC) curve generalizes the ROC model for search problems. The FROC model is appropriate when multiple detections are possible and the number of false alarms is unconstrained. The shape of the FROC curve depends on the underlying probability distributions for the signal and noise. The general FROC model is presented and parameter estimation is discussed. An example illustrates the approach.
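The abstract does not give the model's functional form. Here is a minimal sketch of one simple two-parameter form, assumed for illustration: clutter produces on average lam candidate detections per image with scores distributed N(0, 1), and each target produces one candidate with score N(mu, 1). At threshold t, the expected false alarms per image are lam * (1 - Phi(t)) and the probability of detection is 1 - Phi(t - mu), where Phi is the standard normal CDF. Sweeping t traces the FROC curve. This is not necessarily the parameterization used in the paper.

```python
from statistics import NormalDist


def froc_point(threshold, lam, mu):
    """Expected (false alarms per image, probability of detection) at a threshold.

    lam: expected number of clutter candidates per image (false-alarm propensity).
    mu: separation between target and clutter score distributions (unit variance).
    """
    std = NormalDist()  # standard normal, mean 0 and sigma 1
    false_alarms = lam * (1.0 - std.cdf(threshold))
    pd = 1.0 - std.cdf(threshold - mu)
    return false_alarms, pd


def froc_curve(lam, mu, thresholds):
    """FROC operating points for a list of thresholds."""
    return [froc_point(t, lam, mu) for t in thresholds]
```

For example, `froc_point(0.0, lam=4.0, mu=2.0)` gives 2.0 expected false alarms per image and a detection probability of about 0.977. Lowering the threshold moves along the curve toward more detections and more false alarms. Because the number of false alarms is not bounded by 1, the horizontal axis is a rate rather than a probability, which is what distinguishes the FROC from the ROC.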
A major challenge for ATR evaluation is developing an accurate image truth that can be compared to an ATR algorithm's decisions to assess performance. While many standard truthing methods and scoring metrics exist for stationary targets in still imagery, techniques for dealing with motion imagery and moving targets are not as prevalent. This is partially because the moving imagery / moving targets scenario introduces the data association problem of assigning targets to tracks. This problem complicates the truthing and scoring task in two ways. First, video datasets typically contain far more imagery that must be truthed than static collections. Specifying the types and locations of the targets present for a large number of images is tedious, time consuming and error prone. Second, scoring ATR performance is ambiguous when assessing performance over a collection of video sequences. For example, if a target is tracked and successfully identified for 90% of a single video sequence, is the identification rate 90%, or is the single sequence evaluated in its entirety and the vehicle identification simply recorded as correct? In the former case, a bias will be introduced for easily identified targets that show up frequently in a sequence. In the latter case, the bias is avoided but system accuracy could be overstated.
In this paper, we present a complete truthing system we call the Scoring, Truthing, And Registration Toolkit (START). The first component is registration, which involves aligning the images of the same scene to a common reference frame. Once that reference frame has been determined, the second component, truthing, is used to specify target identity, position, orientation, and other scene characteristics. The final component, scoring, is used to assess the performance of a given algorithm as compared to the specified truth. In motion imagery, both stationary and moving targets can be detected and tracked over portions of a motion imagery clip. We present an approach to scoring performance in this context that provides a natural generalization of the standard methods for dealing with still imagery.
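The frame-level versus sequence-level ambiguity raised above can be illustrated with a small sketch. Each sequence is a list of per-frame identification outcomes (True if the identification is correct). The sequence-level rule used here, which counts a sequence as correct when at least half its frames are correct, is a hypothetical choice; the text does not specify one.

```python
def frame_level_rate(sequences):
    """Pool every frame across sequences: correct frames / total frames."""
    correct = sum(sum(seq) for seq in sequences)
    total = sum(len(seq) for seq in sequences)
    return correct / total


def sequence_level_rate(sequences, min_fraction=0.5):
    """Score each sequence once: correct if at least `min_fraction` of its frames are."""
    hits = sum(1 for seq in sequences if sum(seq) / len(seq) >= min_fraction)
    return hits / len(sequences)
```

Consider a hypothetical easy target correctly identified in all 90 frames of one sequence and a hard target identified in 6 of 10 frames of another. The frame-level rate is 0.96, dominated by the frequently seen easy target, and the sequence-level rate is 1.0, even though the hard target was misidentified in 40% of its frames. These are the two biases described above.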
Every year, large volumes of imagery are collected for the sole purpose of evaluating Automatic Target Recognition (ATR) algorithms. However, this data cannot be used without adequate truthing information for each image. Truthing information typically consists of the types and locations of the targets present in the imagery. Specifying this information for a large number of images is tedious, time consuming, and error prone. In this paper, we present a complete truthing system we call the Scoring, Truthing, And Registration Toolkit (START). The first component is registration, which involves aligning heterogeneous and homogeneous sensor images of the same scene to a common reference frame. Once that reference frame has been determined, the second component, truthing, is used to specify target identity, position, orientation, and other scene characteristics. The final component, scoring, is used to assess the performance of a given algorithm as compared to the specified truth. The scoring module allows statistical comparisons to assess algorithm sensitivity to specific operating conditions (e.g., sensitivity to object occlusion).
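The statistical comparisons across operating conditions can be sketched as follows. This assumes each truthed target has already been scored as detected or missed and labeled with a condition. The code computes the probability of detection (PD) for each condition and a pooled two-proportion z statistic for the difference between two conditions. The choice of the z test and the function names are assumptions; other tests would serve as well.

```python
import math
from collections import defaultdict


def pd_by_condition(results):
    """Probability of detection per operating condition.

    results: iterable of (condition, detected) pairs, one per truthed target.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, detected in results:
        totals[condition] += 1
        hits[condition] += int(detected)
    return {c: hits[c] / totals[c] for c in totals}


def two_proportion_z(hits_1, n_1, hits_2, n_2):
    """Pooled two-proportion z statistic for PD(condition 1) - PD(condition 2)."""
    p = (hits_1 + hits_2) / (n_1 + n_2)  # pooled detection rate
    se = math.sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))
    if se == 0:
        return 0.0  # both conditions all hits or all misses: no difference
    return (hits_1 / n_1 - hits_2 / n_2) / se
```

For hypothetical counts of 40 of 50 targets detected when unoccluded and 25 of 50 when occluded, z is about 3.14. That value exceeds the usual 1.96 cutoff, suggesting the algorithm is sensitive to occlusion.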
Reliance on Automated Target Recognition (ATR) technology is essential to the future success of Intelligence, Surveillance, and Reconnaissance (ISR) missions. Although benefits may be realized through ATR processing of a single data source, fusion of information across multiple images and multiple sensors promises significant performance gains. A major challenge, as ATR fusion technologies mature, is the establishment of sound methods for evaluating ATR performance in the context of data fusion. The Deputy Under Secretary of Defense for Science and Technology (DUSD/S&T), as part of its ongoing ATR Program, has sponsored an effort to develop and demonstrate methods for evaluating ATR algorithms that utilize multiple data sources, i.e., fusion-based ATR. This paper presents results from this program, focusing on the target detection and cueing aspect of the problem. The first step in assessing target detection performance is to relate the ground truth to the ATR decisions. Once the ATR decisions have been mapped to ground truth, the second step in the evaluation is to characterize ATR performance. A common approach is to vary the confidence threshold of the ATR and compute the Probability of Detection (PD) and the False Alarm Rate (FAR) associated with each threshold. Varying the threshold, therefore, produces an empirical performance curve relating detection performance to false alarms. Various statistical methods have been developed, largely in the medical imaging literature, to model this curve so that statistical inferences are possible. One approach, based on signal detection theory, generalizes the Receiver Operating Characteristic (ROC) curve. Under this approach, the Free Response Operating Characteristic (FROC) curve models performance for search problems. The FROC model is appropriate when multiple detections are possible and the number of false alarms is unconstrained.
The parameterization of the FROC model provides a natural method for characterizing both the operational environment and the ability of the ATR algorithm to detect targets. One parameter of the FROC model indicates the complexity of the clutter by characterizing the propensity for false alarms. The second parameter quantifies the separability between clutter and targets. Thus, the FROC model provides a framework for modeling and predicting ATR performance in multiple environments. This paper presents the FROC model for single sensor data and generalizes the model to handle the fusion case.
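The threshold sweep described above can be sketched as follows. This assumes each ATR detection has already been mapped to ground truth, labeled with either a target identifier or None for a false alarm. False alarms are normalized per image here, although rates per unit area are equally common. In this sketch, repeat detections of an already-found target count neither as hits nor as false alarms; that is one scoring convention among several, and the function name is hypothetical.

```python
def empirical_froc(detections, num_targets, num_images, thresholds):
    """Empirical (false alarms per image, PD) pairs, one per threshold.

    detections: list of (confidence, target_id or None); None marks a false alarm.
    Thresholds are processed from highest to lowest.
    """
    curve = []
    for t in sorted(thresholds, reverse=True):
        kept = [(score, tid) for score, tid in detections if score >= t]
        found = {tid for _, tid in kept if tid is not None}  # each target once
        false_alarms = sum(1 for _, tid in kept if tid is None)
        curve.append((false_alarms / num_images, len(found) / num_targets))
    return curve
```

These empirical points are what a parametric FROC model would be fitted to. The fitted clutter parameter and separability parameter then summarize the environment and the algorithm, respectively.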
The Deputy Under Secretary of Defense for Science and Technology (DUSD/S&T), as part of its ongoing ATR Program, has sponsored an effort to develop and demonstrate methods for evaluating ATR algorithms that utilize multiple data sources, i.e., fusion-based ATR. The AFRL COMPASE Center has formed a strong ATR evaluation team, and this paper presents results from this program, focusing on human-in-the-loop (i.e., assisted) image exploitation. Reliance on Automated Target Recognition (ATR) technology is essential to the future success of Intelligence, Surveillance, and Reconnaissance (ISR) missions. Often, ATR technology is designed to aid the analyst, but the final decision rests with the human. Traditionally, evaluation of ATR systems has focused mainly on the performance of the algorithm. Assessing the benefits of ATR assistance for the user raises interesting methodological challenges. We will review the critical issues associated with evaluations of human-in-the-loop ATR systems and present a methodology for conducting these evaluations. Experimental design issues addressed in this discussion include training, learning effects, and human factors issues. The evaluation process becomes increasingly complex when data fusion is introduced. Even in the absence of ATR assistance, the simultaneous exploitation of multiple frames of co-registered imagery is not well understood. We will explore how the methodology developed for exploitation of a single source of data can be extended to the fusion setting.
Reliance on Automated Target Recognition (ATR) technology is essential to the future success of Intelligence, Surveillance, and Reconnaissance (ISR) missions. Although benefits may be realized through ATR processing of a single data source, fusion of information across multiple images and multiple sensors promises significant performance gains. A major challenge, as ATR fusion technologies mature, will be the establishment of sound methods for evaluating ATR performance in the context of data fusion. This paper explores the issues associated with evaluations of ATR algorithms that exploit data fusion. Three major areas of concern are examined as we develop approaches for addressing the fusion-based evaluation problem:
- Characterization of the testing problem: The concept of operating conditions, which characterize the test problem, requires some generalization in the fusion setting. For example, conditions such as articulation or model variant, which are of concern for synthetic aperture radar (SAR) data, may be of minor importance for hyperspectral imaging (HSI) methods. Conversely, solar illumination conditions, which have no effect on the SAR signature, will be critical for spectral-based target recognition. In addition, the fusion process may introduce new operating conditions, such as registration accuracy.
- Developing image truth and scoring rules: The introduction of multiple data sources raises questions about what constitutes successful target detection. Ground truth must be associated with multiple data sources to score performance.
- Performance metrics: New performance metrics that go beyond simple detection, identification, and false alarm rates are needed to characterize performance in the context of image fusion. In particular, algorithm developers would benefit from an understanding of the salient features from each data source and how these features interact to produce the observed system performance.
In November of 2000, the Deputy Under Secretary of Defense for Science and Technology Sensor Systems (DUSD (S&T/SS)) chartered the ATR Working Group (ATRWG) to develop guidelines for sanctioned Problem Sets. Such Problem Sets are intended for development and test of ATR algorithms and contain comprehensive documentation of the data in them. A problem set provides a consistent basis to examine ATR performance and growth. Problem Sets will, in general, serve multiple purposes. First, they will enable informed decisions by government agencies sponsoring ATR development and transition. Problem Sets standardize the testing and evaluation process, resulting in consistent assessment of ATR performance. Second, they will measure and guide ATR development progress within this standardized framework. Finally, they quantify the state of the art for the community. Problem Sets provide clearly defined operating condition coverage. This encourages ATR developers to consider these critical challenges and allows evaluators to assess performance across them. Thus the widely distributed development and self-test portions, along with a disciplined methodology documented within the Problem Set, permit ATR developers to address critical issues and describe their accomplishments, while the sequestered portion permits government assessment of state-of-the-art and of transition readiness. This paper discusses the elements of an ATR problem set as a package of data and information that presents a standardized ATR challenge relevant to one or more scenarios. The package includes training and test data containing targets and clutter, truth information, required experiments, and a standardized analytical methodology to assess performance.
New advanced imaging systems will soon be capable of collecting enormous volumes of imagery, placing a significant burden on the imagery analysts (IAs) who exploit these data. ATRs and other image understanding tools offer a way to assist IAs in exploiting large volumes of imagery more effectively and efficiently. The Defense Advanced Research Projects Agency (DARPA) Semi-Automated IMINT Processing (SAIP) Program focuses on these technologies to assist IAs in the timely exploitation of SAR imagery. The SAIP system is an integrated set of imagery exploitation tools designed to improve the capability of the IA to support military missions in a tactical environment. To assess the utility of the SAIP technology, a mix of live and playback exercises was conducted. IAs exploited the imagery with the assistance of the SAIP technology. As a benchmark for comparison, the same imagery was exploited in an operational exploitation system without the benefit of SAIP assistance. This paper presents the methodology for assessing exploitation performance and discusses issues related to scoring exploitation performance. The results of a recent assessment event illustrate the issues and provide guidance for future work in this area.
Most Automatic Target Recognition (ATR) algorithms consist of multiple processing stages, starting with a “detector” that locates objects of potential interest within an image. A target “classifier” then identifies these objects by assigning them to specific target classes. The classifier uses the localized information in the image to assign each object to one of a number of categories, called targets; if the object is not classifiable, it may be rejected as not being a target. This paper focuses on the properties of certain types of classifiers when applied to synthetic aperture radar (SAR) imagery. A common approach to classification is to construct a library of known templates for the targets of interest. Each object flagged by the detector is compared to every template and, based on some figure of merit, the object is classified. A popular classification rule is to calculate the mean squared error (MSE) between the detected object and each template, and to assign the object to the target type that minimizes the observed MSE. Although minimizing MSE has some intuitive appeal and is fairly easy to implement, it has undesirable properties when applied to SAR data. In this paper, we investigate the statistical properties of MSE classification when the underlying pixel values are drawn from a long-tailed, asymmetric distribution, as is typical for SAR data. More important, however, are the within-class sources of variance that arise in realistic scenarios. These sources of variance tend to inflate the MSE, even when the candidate object is compared to the correct template. This paper explores the statistical nature of this problem and illustrates it with a series of example images.
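The minimum-MSE rule described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's method or data: the function names, the optional reject threshold, the toy 16×16 templates, and the single-look exponential speckle model are all hypothetical choices made for the example. Under the multiplicative model I = T·E with E ~ Exp(1), the expected squared error of a pixel against the correct template value T is T²·Var(E) = T², so the MSE for a correct match grows with target brightness instead of staying near zero, which is one way the inflation discussed above can arise.

```python
import numpy as np


def mse_classify(chip, templates, reject_threshold=None):
    """Minimum-MSE template classification with an optional reject option.

    chip:      2-D array of pixel values for a detected object (hypothetical input).
    templates: dict mapping target name -> 2-D array with the same shape as chip.
    Returns (label, errors); label is None when the best MSE exceeds the threshold.
    """
    errors = {name: float(np.mean((chip - tmpl) ** 2)) for name, tmpl in templates.items()}
    best = min(errors, key=errors.get)
    if reject_threshold is not None and errors[best] > reject_threshold:
        return None, errors  # not close enough to any template: reject as non-target
    return best, errors


def speckle(image, rng):
    """Apply multiplicative single-look speckle: intensity * Exp(1).

    The exponential distribution is long-tailed and asymmetric; using it as
    the SAR pixel model here is a simplifying assumption for illustration.
    """
    return image * rng.exponential(scale=1.0, size=image.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical 16x16 targets with different bright-scatterer layouts.
    tank = np.ones((16, 16))
    tank[4:8, 4:12] = 10.0
    truck = np.ones((16, 16))
    truck[8:12, 2:14] = 6.0
    templates = {"tank": tank, "truck": truck}

    # A noise-free chip matches its own template with zero error.
    label, errs = mse_classify(tank, templates)
    print(label, errs["tank"])

    # Average MSE against the *correct* template under speckle; in expectation
    # it equals mean(tank**2), so the two printed values should be close.
    trials = [mse_classify(speckle(tank, rng), templates)[1]["tank"] for _ in range(500)]
    print(np.mean(trials), np.mean(tank ** 2))
```

Setting `reject_threshold` models the “not a target” outcome; because speckle alone inflates the correct-template MSE by roughly mean(T²), any such threshold would need to scale with target brightness rather than be a fixed small number.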
For over 20 years, the National Imagery Interpretability Rating Scale (NIIRS) has served as a standard for quantifying the interpretability, or usefulness, of imagery. The need for a NIIRS arose because simple physical image quality measures, such as resolution, could not adequately predict image interpretability. The NIIRS defines levels of image interpretability by the types of tasks an analyst can perform with imagery of a given rating level. The NIIRS provides a simple yet powerful tool for assessing and communicating image quality and sensor system requirements. While the scale itself is simple, the process of developing it is both complex and resource intensive. Rigorous methods are needed to develop appropriate image interpretation tasks, to relate these tasks to the various levels of image quality, and to validate that the scale is usable in practice and has the desirable properties of a rating scale. This paper presents three NIIRS scales corresponding to three types of imagery: visible, infrared (IR), and radar. The paper also discusses the methodology used to develop and validate these rating scales.