Prior work has shown that the masked target transform volume (MTTV) clutter metric provides a measure of scene clutter that correlates better with measured probability of detection for human observers than several previously published clutter metrics. Several factors involved in using the MTTV to assess clutter in imagery are discussed here. A previously published modification to the MTTV metric that provides a normalized output value, comparable across different image sets regardless of scene size, is reviewed. Initial MTTV development required knowledge of a scene's target signature and produced an unbounded metric value. Metric behavior is discussed for the case in which an average of several target signatures is used in place of a specific target signature, which allows the MTTV to be calculated for images that do not contain a target. It is shown that the user may trade computational efficiency against metric accuracy to suit a particular application. The sensitivity of the metric to variations in image noise level, target segmentation error, and viewing distance is also presented.
The Night Vision and Electronic Sensors Directorate's current time-limited search model, which makes use of the
targeting task performance (TTP) metric to describe imager quality, does not explicitly account for the effects of clutter
on observer performance. The masked target transform volume (MTTV) clutter metric has been presented previously,
but is first applied to the results of a vehicle search perception experiment with simulated thermal imagery here.
NVESD's Electro-Optical Simulator program was used to generate hundreds of synthetic images of tracked vehicles
hidden in a rural environment. Twelve observers searched for the tracked vehicles, and their performance is compared to the
MTTV clutter level, signal-to-clutter ratios using several clutter metrics from open literature, and to the product of target
size and contrast. The investigated clutter metrics included the Schmieder-Weathersby statistical variance, Silk's
statistical variance, Aviram's probability of edge detection metric, and Chang's target structural similarity metric. The
MTTV was shown to better model observer performance as measured by the perception experiment than any of the other
compared metrics, including the product of target size and contrast.
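The MTTV algorithm itself is not reproduced in these abstracts, but the simplest of the comparison metrics named above, the Schmieder-Weathersby statistical variance, has a compact conventional form: the root-mean-square of the pixel variances computed over contiguous square blocks, with the block side customarily set to about twice the target dimension. A minimal sketch of that conventional form (the `signal_to_clutter` helper is illustrative, not taken from the papers):

```python
import numpy as np

def sw_clutter(image, block_size):
    """Schmieder-Weathersby statistical variance clutter metric:
    RMS of the pixel variances over contiguous square blocks.
    `image` is a 2-D array of intensities; `block_size` is the
    block side in pixels (conventionally ~2x the target size)."""
    h, w = image.shape
    variances = []
    for r in range(0, h - block_size + 1, block_size):
        for c in range(0, w - block_size + 1, block_size):
            block = image[r:r + block_size, c:c + block_size]
            variances.append(block.var())
    return float(np.sqrt(np.mean(variances)))

def signal_to_clutter(target_contrast, clutter):
    # Signal-to-clutter ratio as used when comparing clutter
    # metrics against measured detection performance.
    return target_contrast / clutter
```

A perfectly uniform scene scores zero; scenes whose blocks vary strongly in intensity (target-sized structure) score high, which is why this metric served as a baseline against the MTTV.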
The Night Vision and Electronic Sensors Directorate's current time-limited search (TLS) model, which
makes use of the targeting task performance (TTP) metric to describe image quality, does not explicitly account for
the effects of visual clutter on observer performance. The TLS model is currently based on empirical fits to describe
human performance for a time of day, spectrum and environment. Incorporating a clutter metric into the TLS model
may reduce the number of these empirical fits needed. The masked target transform volume (MTTV) clutter metric
has been previously presented and compared to other clutter metrics. Using real infrared imagery of rural scenes
with varying levels of clutter, NVESD is currently evaluating the appropriateness of the MTTV metric. NVESD had
twenty subject matter experts (SME) rank the amount of clutter in each scene in a series of pair-wise comparisons.
MTTV metric values were calculated and then compared to the SME observers' rankings. The MTTV metric ranked
the clutter in a similar manner to the SME evaluation, suggesting that the MTTV metric may emulate SME
response. This paper is a first step in quantifying clutter and measuring the agreement with subjective human perception.
A perception experiment was performed in an effort to measure the effect of clutter on search performance while
keeping target size, target contrast, and system bandwidth constant. In the NVESD time-limited search (TLS) model,
detection performance is assumed to vary only with changes in target size and target-to-background contrast if the imaging
system and the search time limit are held constant<sup>4,8</sup>. The results of this experiment show that changes in scene clutter
produce changes in detection performance when these other factors remain unchanged, thereby making a stronger case
for the inclusion of a clutter metric into the NVESD TLS model. When using real imagery, it is difficult to find good
examples of change in clutter without changes in target size, contrast, noise, or other factors also being present. Using
computer generated imagery of triangles and tilted squares allowed the clutter aspect of search to be experimentally
isolated. When applied to imagery in the perception experiment, the masked target transform volume clutter metric was
shown to correlate well with the average observer response time.
The next generation of night vision goggles will fuse image-intensified and long-wave infrared imagery to create a hybrid image that will enable soldiers to better interpret their surroundings during nighttime missions. Paramount to the development of such goggles is the exploitation of image quality measures to automatically determine the best image fusion algorithm for a particular task. This work introduces a novel monotonic correlation coefficient to investigate how well candidate image quality features correlate with actual human performance, as measured by a perception study. The paper demonstrates how monotonic correlation can identify worthy features that could be overlooked by the traditional Pearson correlation.
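The paper's specific monotonic correlation coefficient is not given here, but Spearman's rank correlation is the standard monotonic alternative and illustrates the idea: a feature tied to performance through any monotonic, even strongly nonlinear, function earns a perfect score, while Pearson penalizes the nonlinearity. A sketch under that stand-in assumption (the helper names and the cubic example data are illustrative):

```python
import numpy as np

def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks.
    Any strictly monotonic relation scores exactly 1 (assumes
    distinct values; no tie handling in this sketch)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(np.asarray(v := x)), rank(np.asarray(y)))

# A feature that predicts performance through a monotonic but
# nonlinear law: Spearman sees a perfect relation, Pearson does not.
x = np.arange(1, 9)
y = x ** 3
```

Here `spearman(x, y)` is exactly 1 while `pearson(x, y)` is roughly 0.93, so a threshold on Pearson correlation alone could discard a feature that ranks images in the same order as the observers.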
Perception tests establish the effects of spatially band-limited noise and blur on human observer performance. Previously, Bijl showed that the contrast threshold of a target image with spatially band-limited noise is a function of noise spatial frequency. He used the method of adjustment to find the contrast thresholds for each noise frequency band. A noise band exists in which the target contrast threshold reaches a peak relative to the threshold for higher or lower noise frequencies. Bijl also showed that the peak of this noise band shifts as high-frequency information is removed from the target images.
To further establish these results, we performed three forced-choice experiments: first, a Night Vision and Electronic Sensors Directorate (NVESD) twelve (12)-target infrared tracked vehicle image set identification (ID) experiment; second, a bar-pattern resolving experiment; and third, a Triangle Orientation Discrimination (TOD) experiment. In all of the experiments, the test images were first spatially blurred, and then spatially band-limited noise was added. The noise center spatial frequency was varied in half-octave increments over seven octaves. Observers were shown images of varying target-to-noise contrast, and a contrast threshold was calculated for each spatial noise band. Finally, we compared the Targeting Task Performance (TTP) human observer model predictions for performance in the presence of spatially band-limited noise with these experimental results.
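The abstract does not specify how the noise band was shaped, only that its center frequency stepped in half-octave increments. One common way to synthesize such a stimulus is to zero all Fourier coefficients of white Gaussian noise outside an annulus around the center frequency; the one-octave band half-width below is an assumption for illustration:

```python
import numpy as np

def band_limited_noise(shape, center_freq, rng=None, bandwidth_octaves=1.0):
    """White Gaussian noise restricted to an annular band of spatial
    frequencies (in cycles per image) centered on `center_freq`.
    The band spans `bandwidth_octaves` octaves, i.e. frequencies in
    [center / 2**(b/2), center * 2**(b/2)].  A sketch of the kind of
    stimulus described in the text, not the exact filter used."""
    rng = np.random.default_rng(rng)
    noise = rng.standard_normal(shape)
    fy = np.fft.fftfreq(shape[0]) * shape[0]   # cycles per image
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    lo = center_freq / 2 ** (bandwidth_octaves / 2)
    hi = center_freq * 2 ** (bandwidth_octaves / 2)
    mask = (radius >= lo) & (radius <= hi)
    # Symmetric mask keeps the spectrum conjugate-symmetric, so the
    # inverse transform is real up to rounding error.
    return np.fft.ifft2(np.fft.fft2(noise) * mask).real
```

Scaling the returned array and adding it to a blurred target image gives one trial stimulus at a chosen target-to-noise contrast.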
The performance of image fusion algorithms is evaluated using image fusion quality metrics and observer performance
in identification perception experiments. Image Intensified (I<sup>2</sup>) and LWIR images are used as the inputs to the fusion
algorithms. The test subjects are tasked to identify potentially threatening handheld objects in both the original and
fused images. The metrics used for evaluation are mutual information (MI), fusion quality index (FQI), weighted fusion
quality index (WFQI), and edge-dependent fusion quality index (EDFQI). Some of the fusion algorithms under
consideration are based on Peter Burt's Laplacian Pyramid, Toet's Ratio of Low Pass (RoLP or contrast ratio), and
Waxman's Opponent Processing. Also considered in this paper are pixel averaging, superposition, multi-scale
decomposition, and shift invariant discrete wavelet transform (SIDWT). The fusion algorithms are compared using
human performance in an object-identification perception experiment. The observer responses are then compared to the
image fusion quality metrics to determine the amount of correlation, if any. The results of the perception test indicated
that the opponent processing and ratio of contrast algorithms yielded the greatest observer performance on average.
Task difficulty (V<sub>50</sub>) associated with the I<sup>2</sup> and LWIR imagery for each fusion algorithm is also reported.
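Of the fusion algorithms listed, pixel averaging is the simplest baseline. The sketch below pairs it with a toy detail-selection rule meant only to suggest the flavor of the multi-scale methods (Laplacian pyramid, SIDWT); it is not any of the cited algorithms, and `fuse_max_abs_detail` with its 5-pixel box filter is an illustrative assumption:

```python
import numpy as np

def fuse_average(a, b):
    # Pixel averaging: the simplest fusion baseline in the comparison.
    return (a.astype(float) + b.astype(float)) / 2.0

def fuse_max_abs_detail(a, b, ksize=5):
    """Toy multi-scale-style rule: average the local means of the two
    sources, but at each pixel keep the detail (image minus local
    mean) from whichever source has the larger magnitude."""
    def local_mean(x):
        k = np.ones(ksize) / ksize  # separable box filter
        x = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 0, x)
        return np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 1, x)
    a, b = a.astype(float), b.astype(float)
    ma, mb = local_mean(a), local_mean(b)
    da, db = a - ma, b - mb
    detail = np.where(np.abs(da) >= np.abs(db), da, db)
    return (ma + mb) / 2.0 + detail
```

Averaging tends to wash out contrast that appears in only one band; selection rules keep the stronger local detail at each location, which is the intuition behind the pyramid and wavelet approaches that scored well in the perception test.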
A perception test determined which of several image fusion metrics best correlates with relative observer preference. Many fusion techniques and fusion metrics have been proposed, but there is a need to relate them to a human observer's measure of image quality. LWIR and MWIR images were fused using techniques based on the Discrete Wavelet Transform (DWT), the Shift-Invariant DWT (SIDWT), Gabor filters, pixel averaging, and Principal Component Analysis (PCA). Two different sets of fused images were generated from urban scenes. The quality of the fused images was then measured using the mutual information metric (MINF), fusion quality index (FQI), edge-dependent fusion quality index (EDFQI), weighted-fusion quality index (WFQI), and the mean-squared errors between the fused and source images (MS(F-L), MS(F-M)). A paired-comparison perception test determined how observers rated the relative quality of the fused images. The observers based their decisions on the noticeable presence or absence of information, blur, and distortion in the images. The observer preferences were then correlated with the fusion metric outputs to see which metric best represents observer preference. The results of the paired comparison test show that the mutual information metric most consistently correlates well with the measured observer preferences.
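The mutual information metric that correlated best with observer preference can be estimated from joint intensity histograms. A sketch assuming the common formulation, the sum of the fused image's mutual information with each source, which may differ in detail from the paper's MINF (bin count and scoring helper are illustrative):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram estimate of the mutual information I(A;B) in bits
    between two images (or any equal-size intensity arrays)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0                          # avoid log(0) terms
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def minf_score(fused, src1, src2, bins=32):
    # Higher means the fused image retains more source information.
    return (mutual_information(fused, src1, bins)
            + mutual_information(fused, src2, bins))
```

An image shares maximal information with itself and none with a constant image, so a fused result that preserves the structure of both sources scores high on both terms.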
Corrections are given for cell imbalance in the design and analysis of twelve (12)-target identification (ID) perception tests. Such tests are an important tool in the development of the Night Vision and Electronic Sensors Directorate (NVESD) observer performance model used in NVThermIP to compare electro-optical systems. It is shown that the partitions of the 12-target set previously used in perception experiments exhibit statistically significant cell imbalance. Results from perception testing are used to determine the relative difficulty of identifying different images in the set. A program is presented to partition the set into lists that are balanced according to the collected observer data. The relative difficulty of image subsets is shown to be related to the best-fit <i>V<sub>50</sub></i> values for the subsets. The results of past perception experiments are adjusted to account for cell imbalance using the subset <i>V<sub>50</sub></i> terms. Under the proper conditions, the adjusted results are shown to better follow the <i>TTP</i> model for observer performance.
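The V<sub>50</sub> terms above enter the NVESD observer model through the published target transfer probability function, which maps the TTP value V delivered by a sensor and a task-difficulty parameter V<sub>50</sub> to a probability of task completion. A minimal sketch of that empirical form:

```python
def ttp_probability(V, V50):
    """NVESD target transfer probability function: probability of
    task completion given the TTP metric value V delivered to the
    observer and the task difficulty V50 (the V at which observers
    succeed half the time).  Uses the published empirical exponent
    E = 1.51 + 0.24 * (V / V50)."""
    ratio = V / V50
    E = 1.51 + 0.24 * ratio
    return ratio ** E / (1.0 + ratio ** E)
```

By construction the function returns 0.5 at V = V<sub>50</sub> regardless of the exponent, which is why fitting subset-specific V<sub>50</sub> values is a natural way to absorb the cell imbalance described above.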