Proc. SPIE. 9071, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XXV
KEYWORDS: Visual process modeling, Imaging systems, Cameras, Sensors, Quantum efficiency, Interference (communication), Modulation transfer functions, Performance modeling, Systems modeling, RGB color model
The necessity of color balancing in day color cameras complicates both laboratory measurements and modeling for task performance prediction. In this proceeding, we discuss how the raw camera performance can be measured and characterized. We further demonstrate how these measurements can be modeled in the Night Vision Integrated Performance Model (NV-IPM) and how the modeled results can be applied to additional experimental conditions beyond those used during characterization. We also present the theoretical framework behind the color camera component in NV-IPM, where an effective monochromatic imaging system is created by applying a color correction to the raw color camera and generating the color-corrected grayscale image. The modeled performance shows excellent agreement with measurements for both monochromatic and colored scenes. The NV-IPM components developed for this work are available in NV-IPM v1.2.
This paper describes the sensitivity analysis capabilities to be added to version 1.2 of the NVESD imaging sensor model NV-IPM. Imaging system design always involves tradeoffs to design the best system possible within size, weight, and cost constraints. In general, the performance of a well designed system will be limited by the largest, heaviest, and most expensive components. Modeling is used to analyze system designs before the system is built. Traditionally, NVESD models were only used to determine the performance of a given system design. NV-IPM has the added ability to automatically determine the sensitivity of any system output to changes in the system parameters. The component-based structure of NV-IPM tracks the dependence between outputs and inputs such that only the relevant parameters are varied in the sensitivity analysis. This allows sensitivity analysis of an output such as probability of identification to determine the limiting parameters of the system. Individual components can be optimized by doing sensitivity analysis of outputs such as NETD or SNR. This capability will be demonstrated by analyzing example imaging systems.
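As a rough illustration of what a sensitivity analysis of this kind computes, the sketch below estimates the normalized sensitivity of one output to each input parameter by central finite differences. It is not NV-IPM code: the function `normalized_sensitivities`, the toy model `toy_snr`, and the parameter names are all hypothetical, and the differencing scheme is an assumption rather than the method used in NV-IPM.

```python
from typing import Callable, Dict


def normalized_sensitivities(
    model: Callable[[Dict[str, float]], float],
    params: Dict[str, float],
    rel_step: float = 1e-4,
) -> Dict[str, float]:
    """Return (dY/Y) / (dp/p) for each parameter p, using central differences.

    Assumes every parameter value and the baseline output are non-zero.
    """
    base = model(params)
    result = {}
    for name, value in params.items():
        step = abs(value) * rel_step
        hi = dict(params, **{name: value + step})
        lo = dict(params, **{name: value - step})
        derivative = (model(hi) - model(lo)) / (2.0 * step)
        result[name] = derivative * value / base
    return result


def toy_snr(p: Dict[str, float]) -> float:
    """Hypothetical stand-in model: SNR ~ aperture^2 * sqrt(t_int) / noise."""
    return p["aperture"] ** 2 * p["t_int"] ** 0.5 / p["noise"]


sens = normalized_sensitivities(
    toy_snr, {"aperture": 0.1, "t_int": 0.01, "noise": 2.0}
)
```

In this toy case the parameter with the largest absolute normalized sensitivity (the aperture) would be the one flagged as limiting the output.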
Image noise originating from a sensor system is often the limiting factor in target acquisition performance, especially when limited by atmospheric transmission or low-light conditions. To accurately predict target acquisition range performance for a wide variety of imaging systems, image degradation introduced by the sensor must be properly combined with the limitations of the human visual system (HVS). This crucial step of incorporating the HVS has been improved and updated within NVESD’s latest imaging system performance model. The new noise model discussed here shows how an imaging system’s noise and blur are combined with the contrast threshold function (CTF) to form the system CTF. Model calibration constants were found by presenting low-contrast sine gratings with additive noise in a two-alternative forced-choice experiment. One of the principal improvements comes from adding an eye photon noise term, which allows the noise CTF to be accurate over a wide range of luminance. The latest HVS noise model is then applied to the targeting task performance metric responsible for predicting system performance from the system CTF. To validate this model, human target acquisition performance was measured from a series of infrared and visible-band noise-limited imaging systems.
In the past five years, significant progress has been accomplished in the reduction of infrared detector pitch and detector size. Recently, longwave infrared (LWIR) detectors in limited quantities have been fabricated with a detector pitch of 5 μm. Detectors with 12-μm pitch are now becoming standard in both midwave infrared (MWIR) and LWIR sensors. Persistent surveillance systems are pursuing 10-μm detector pitch in large format arrays. The fundamental question that most system designers and detector developers desire an answer to is: "How small can you produce an infrared detector and still provide value in performance?" If a system is mostly diffraction-limited, then developing a smaller detector is of limited benefit. If a detector is so small that it does not collect enough photons to produce a good image, then a smaller detector is not much benefit. Resolution and signal-to-noise are the primary characteristics of an imaging system that contribute to targeting, pilotage, search, and other human warfighting task performance. We investigate the task of target discrimination range performance as a function of detector size/pitch. Results for LWIR and MWIR detectors are provided and depend on a large number of assumptions that are reasonable.
The GStreamer architecture allows for simple modularized processing. Individual GStreamer elements have been
developed that allow for control, measurement, and ramping of a blackbody, for capturing continuous imagery
from a sensor, for segmenting out an MRTD target, for applying a blur equivalent to that of a human eye and a
display, and for thresholding a processed target contrast for "calling" it. A discussion of each of the components
will be followed by an analysis of its performance relative to that of human observers.
In the past five years, significant progress has been accomplished in the reduction of infrared detector pitch and detector
size. Recently, longwave infrared detectors in limited quantities have been fabricated with a detector pitch of 5
micrometers. Detectors with 12 micrometer pitch are now becoming standard in both the midwave infrared (MWIR)
and longwave infrared (LWIR) sensors. Persistent surveillance systems are pursuing 10 micrometer detector pitch in
large format arrays. The fundamental question that most system designers and detector developers desire an answer to
is: "how small can you produce an infrared detector and still provide value in performance?" If a system is mostly
diffraction-limited, then developing a smaller detector is of limited benefit. If a detector is so small that it does not
collect enough photons to produce a good image, then a smaller detector is not much benefit. Resolution and signal-to-noise
are the primary characteristics of an imaging system that contribute to targeting, pilotage, search, and other human
warfighting task performance. In this paper, we investigate the task of target discrimination range performance as a
function of detector size/pitch. Results for LWIR and MWIR detectors are provided and depend on a large number of
assumptions that are reasonable.
This paper presents an image-based system performance model. The image-based system model uses an image metric to
compare a given degraded image of a target, as seen through the modeled system, to the set of possible targets in the
target set. This is repeated for all possible targets to generate a confusion matrix. The confusion matrix is used to
determine the probability of identifying a target from the target set when using a particular system in a particular set of
conditions. The image metric used in the image-based model should correspond closely to human performance. The
image-based model performance is compared to human perception data on Contrast Threshold Function (CTF) tests,
naked eye Triangle Orientation Discrimination (TOD), and TOD including an infrared camera system.
Image-based system performance modeling is useful because it allows modeling of arbitrary image processing. Modern
camera systems include more complex image processing, much of which is nonlinear. Existing linear system models,
such as the TTP metric model implemented in NVESD models such as NV-IPM, assume that the entire system is linear
and shift invariant (LSI). The LSI assumption makes modeling nonlinear processes difficult, such as local area
processing/contrast enhancement (LAP/LACE), turbulence reduction, and image fusion.
Using post-processing filters to enhance image detail, a process commonly referred to as boost, can significantly affect
the performance of an EO/IR system. The US Army's target acquisition models currently use the Targeting Task
Performance (TTP) metric to quantify sensor performance. The TTP metric accounts for each element in the system
including: blur and noise introduced by the imager, any additional post-processing steps, and the effects of the Human
Visual System (HVS). The current implementation of the TTP metric assumes spatial separability, which can introduce
significant errors when the TTP is applied to systems using non-separable filters. To accurately apply the TTP metric to
systems incorporating boost, we have implemented a two-dimensional (2D) version of the TTP metric. The accuracy of the
2D TTP metric was verified through a series of perception experiments involving various levels of boost. The 2D TTP
metric has been incorporated into the Night Vision Integrated Performance Model (NV-IPM) allowing accurate system
modeling of non-separable image filters.
Image noise, originating from a sensor system, is often the limiting factor in target acquisition performance. This is
especially true of reflective-band sensors operating in low-light conditions. To accurately predict target acquisition
range performance, image degradation introduced by the sensor must be properly combined with the limitations of
the human visual system. This is modeled by adding system noise and blur to the contrast threshold function (CTF)
of the human visual system, creating a combined system CTF. Current U.S. Army sensor performance models
(NVThermIP, SSCAMIP, IICAM, and IINVD) do not properly address how external noise is added to the CTF as a
function of display luminance. Historically, the noise calibration constant was fit from data using image intensifiers
operating at low display luminance, typically much less than one foot-Lambert. However, noise calibration
experiments with thermal imagery used a higher display luminance, on the order of ten foot-Lamberts, resulting in a
larger noise calibration constant. To address this discrepancy, hundreds of CTF measurements were taken as a
function of display luminance, apparent target angle, frame rate, noise intensity and filter shape. The experimental
results show that the noise calibration constant varies as a function of display luminance. To account for this
luminance dependence, a photon shot noise term representing an additional limitation in the performance of the
human visual system is added to the observer model. The new noise model will be incorporated in the new U.S.
Army Integrated Performance Model (NV-IPM), allowing accurate comparisons over a wide variety of sensor
modalities and display luminance levels.
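The luminance dependence described above can be sketched numerically. The functional form below is an assumption for illustration only, not the calibrated model from this work: display noise is added in quadrature to an internal eye noise that has one term proportional to luminance and one photon-like term proportional to the square root of luminance. The constants `k_neural` and `k_photon` are hypothetical placeholders.

```python
import math


def system_ctf(ctf_eye, mtf, sigma, luminance, k_neural=0.006, k_photon=0.0004):
    """Threshold contrast with display noise added in quadrature to eye noise.

    Assumed form (illustrative only):
        internal_var = (k_neural * L)**2 + k_photon * L
        CTF_sys = CTF_eye / MTF * sqrt(1 + sigma**2 / internal_var)
    sigma and L (luminance) must be in the same luminance units.
    """
    internal_var = (k_neural * luminance) ** 2 + k_photon * luminance
    return ctf_eye / mtf * math.sqrt(1.0 + sigma ** 2 / internal_var)


def effective_alpha(luminance, k_neural=0.006, k_photon=0.0004):
    """Equivalent constant alpha in the form sqrt(1 + alpha^2 sigma^2 / L^2)."""
    internal_var = (k_neural * luminance) ** 2 + k_photon * luminance
    return luminance / math.sqrt(internal_var)


# The equivalent constant grows with display luminance under this assumed form.
alpha_dim = effective_alpha(0.1)      # well below one foot-Lambert
alpha_bright = effective_alpha(10.0)  # on the order of ten foot-Lamberts
```

Under these assumptions the equivalent calibration constant is smaller at low display luminance and larger at high luminance, which is the qualitative trend the abstract reports.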
The TTP (Targeting Task Performance) metric, developed at NVESD, is the current standard US Army model to predict
EO/IR Target Acquisition performance. This model however does not have a corresponding lab or field test to
empirically assess the performance of a camera system. The TOD (Triangle Orientation Discrimination) method,
developed at TNO in The Netherlands, provides such a measurement. In this study, we make a direct comparison
between TOD performance for a range of sensors and the extensive historical US observer performance database built to
develop and calibrate the TTP metric. The US perception data were collected doing an identification task by military
personnel on a standard 12 target, 12 aspect tactical vehicle image set that was processed through simulated sensors for
which the most fundamental sensor parameters such as blur, sampling, spatial and temporal noise were varied. In the
present study, we measured TOD sensor performance using exactly the same sensors processing a set of TOD triangle
test patterns. The study shows that good overall agreement is obtained when the ratio between target characteristic size
and TOD test pattern size at threshold equals 6.3. Note that this number is purely based on empirical data without any
intermediate modeling. The calibration of the TOD to the TTP is highly beneficial to the sensor modeling and testing
community for a variety of reasons. These include: i) a connection between requirement specification and acceptance
testing, and ii) a very efficient method to quickly validate or extend the TTP range prediction model to new systems and
Given the frequent lack of a reference image or ground truth when performance testing Bayer pattern color filter array
(CFA) demosaicing algorithms, two new no-reference quality assessment algorithms are proposed. These new quality
assessment algorithms give a relative comparison of two demosaicing algorithms by measuring the presence of two
common artifacts in their output images. For this purpose, various demosaicing algorithms are reviewed, especially
adaptive color plane, gradient based methods, and median filtering, with particular attention paid to the false color and
edge blurring artifacts common to all demosaicing algorithms. Classic quality assessment methods which require a
reference image, such as MSE, PSNR, and ΔE, are reviewed, their typical usage characterized, and their associated
pitfalls identified. With this information in mind, the motivations for no-reference quality assessment are discussed. The
new quality assessment algorithms are then designed for a relative comparison of two images demosaiced from the same
CFA data by measuring the sharpness of the edges and determining the presence of false colors. Demosaicing algorithms
described earlier are evaluated and ranked using these new algorithms. A large quantity of real images is given for
review. These images are also used to justify those rankings suggested by the new quality assessment algorithms. This
work provides a path forward for future research investigating possible relationships between CFA demosaicing and
color image super-resolution.
This paper presents an image-based model for target identification performance. This model is intended as an alternative
to existing linear models such as NVThermIP. The image-based model allows arbitrary non-linear image processing to
be applied to actual images which are compared using a human perception model. This model simulates an image from
a given sensor and compares the simulated image to a reference high-quality image. For a given target set, the image-based
model generates a confusion matrix which is used to calculate the average probability of identification. The
perception metric used to compare the images is a multiscale version of the SSIM. The output of the image-based model
is reasonably close to the output of the NVThermIP theory when tested on a standard linear sensor system. The output
also agrees well with data from a human perception test.
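The confusion-matrix step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity metric is a zero-mean normalized correlation standing in for the multiscale SSIM, the "sensor" is simple additive Gaussian noise, the targets are synthetic random images, and all function names are hypothetical.

```python
import numpy as np


def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Stand-in metric: zero-mean normalized correlation (not SSIM)."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    return float((a0 * b0).sum() / (np.linalg.norm(a0) * np.linalg.norm(b0) + 1e-12))


def confusion_matrix(references, degrade, trials, rng):
    """Row i, column j: fraction of trials in which target i was called as j."""
    n = len(references)
    counts = np.zeros((n, n))
    for i, ref in enumerate(references):
        for _ in range(trials):
            seen = degrade(ref, rng)  # image as seen through the modeled system
            scores = [similarity(seen, cand) for cand in references]
            counts[i, int(np.argmax(scores))] += 1
    return counts / trials


def average_pid(matrix: np.ndarray) -> float:
    """Average probability of identification: mean of the diagonal."""
    return float(np.trace(matrix) / matrix.shape[0])


def add_noise(img, rng):
    """Hypothetical degradation: additive Gaussian noise only."""
    return img + rng.normal(0.0, 0.2, img.shape)


rng = np.random.default_rng(0)
targets = [rng.random((16, 16)) for _ in range(4)]  # synthetic "target set"
cm = confusion_matrix(targets, add_noise, trials=20, rng=rng)
pid = average_pid(cm)
```

Swapping `add_noise` for any other image-processing chain, linear or not, leaves the rest of the procedure unchanged, which is the main appeal of the image-based approach.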
This paper presents progress in image fusion modeling. One fusion quality metric based on the Targeting Task
Performance (TTP) metric and another based on entropy are presented. A human perception test was performed with
fused imagery to determine effectiveness of the metrics in predicting image fusion quality. Both fusion metrics first
establish which of two source images is ideal in a particular spatial frequency pass band. The fused output of a given
algorithm is then measured against this ideal in each pass band. The entropy based fusion quality metric (E-FQM) uses
statistical information (entropy) from the images while the Targeting Task Performance fusion quality metric (TTP-FQM)
utilizes the TTP metric value in each spatial frequency band. This TTP metric value is the measure of available
excess contrast determined by the Contrast Threshold Function (CTF) of the source system and the target contrast. The
paper also proposes an image fusion algorithm that chooses source image contributions using a quality measure similar
to the TTP-FQM. To test the effectiveness of TTP-FQM and E-FQM in predicting human image quality preferences,
SWIR and LWIR imagery of tanks were fused using four different algorithms. A paired comparison test was performed
with both source and fused imagery as stimuli. Eleven observers were asked to select which image enabled them to
better identify the target. Over the ensemble of test images, the experiment showed that both TTP-FQM and E-FQM
were capable of identifying the fusion algorithms most and least preferred by human observers. Analysis also showed
that the performance of the TTP-FQM and E-FQM in identifying human image preferences is better than that of existing
fusion quality metrics such as the Weighted Fusion Quality Index and Mutual Information.
This paper presents a comparison of the predictions of NVThermIP to human perception experiment results in the
presence of large amounts of noise where the signal-to-noise ratio is around 1. First, the calculations used in the NVESD
imager performance models that deal with sensor noise are described outlining a few errors that appear in the
NVThermIP code. A perception experiment is designed to test the range performance predictions of NVThermIP with
varying amounts of noise and varying frame rates. NVThermIP is found to overestimate the impact of noise, leading to
pessimistic range performance predictions for noisy systems. The perception experiment results are used to find a best
fit value of the constant α used to relate system noise to eye noise in the NVESD models. The perception results are also
fit to an alternate eye model that handles frame rates below 30Hz and smoothly approaches an accurate prediction of the
performance in the presence of static noise. The predictions using the fit data show significantly less error than the
predictions from the current model.
Contrast enhancement and dynamic range compression are currently being used to improve the performance of infrared
imagers by increasing the contrast between the target and the scene content. Automatic contrast enhancement techniques
do not always achieve this improvement. In some cases, the contrast can increase to a level of target saturation. This
paper assesses the range-performance effects of contrast enhancement for target identification as a function of image
saturation. Human perception experiments were performed to determine field performance using contrast enhancement
on the U.S. Army RDECOM CERDEC NVESD standard military eight target set using an un-cooled LWIR camera.
The experiments compare the identification performance of observers viewing contrast enhancement processed images
at various levels of saturation. Contrast enhancement is modeled in the U.S. Army thermal target acquisition model
(NVThermIP) by changing the scene contrast temperature. The model predicts improved performance based on any
improved target contrast, regardless of specific feature saturation or enhancement. The measured results follow the
predicted performance based on the target task difficulty metric used in NVThermIP for the non-saturated cases. The
saturated images reduce the information contained in the target and performance suffers. The model treats the contrast
of the target as uniform over spatial frequency. As the contrast is enhanced, the model assumes that the contrast is
enhanced uniformly over the spatial frequencies. After saturation, the spatial cues that differentiate one tank from
another are located in a limited band of spatial frequencies. A frequency dependent treatment of target contrast is
needed to predict performance of over-processed images.
The current US Army target acquisition models have a dependence on magnification. This is due in part to the
structure of the observer Contrast Threshold Function (CTF) used in the model. Given the shape of the CTF,
both over-magnification and under-magnification can dramatically impact modeled performance. This paper
presents the results from two different perception studies, one using degraded imagery and the other using field
imagery. The results presented demonstrate the correlation between observer performance and model prediction
and provide guidance on accurately representing system performance in under- and over-magnified cases.
This paper presents the results of a performance comparison between superresolution reconstruction and dither, also
known as microscan. Dither and superresolution are methods to improve the performance of spatially undersampled
systems by reducing aliasing and increasing sampling. The performance measured is the probability of identification
versus range for a set of tracked, armored military vehicles. The performance improvements of dither and
superresolution are compared to the performance of the base system with no additional processing. Field data was
collected for all types of processing using the same basic sensor. This allows the performance to be compared without
comparing different sensors. The performance of the various methods is compared experimentally using human
perception tests. The perception test results are compared to modeled predictions of the range performance. The
measured and modeled performance of all of the methods agree well.
Contrast enhancement and dynamic range compression are currently being used to improve the performance of infrared
imagers by increasing the contrast between the target and the scene content, by better utilizing the available gray levels
either globally or locally. This paper assesses the range-performance effects of various contrast enhancement algorithms
for target identification with well contrasted vehicles. Human perception experiments were performed to determine field
performance using contrast enhancement on the U.S. Army RDECOM CERDEC NVESD standard military eight target
set using an un-cooled LWIR camera. The experiments compare the identification performance of observers viewing
linearly scaled images and various contrast enhancement processed images. Contrast enhancement is modeled in the US
Army thermal target acquisition model (NVThermIP) by changing the scene contrast temperature. The model predicts
improved performance based on any improved target contrast, regardless of feature saturation or enhancement. To
account for the equivalent blur associated with each contrast enhancement algorithm, an additional effective MTF was
calculated and added to the model. The measured results are compared with the predicted performance based on the
target task difficulty metric used in NVThermIP.
In this research, a sensor performance measurement technique is developed that is similar to Triangle Orientation Discrimination (TOD) but uses sinusoids instead of triangles. Also, instead of infrared systems, the technique is applied to the eye and direct view optics. This new technique is called Contrast Threshold Function Orientation Discrimination (CTFOD), and the result is a "system" contrast threshold function that can be used with Vollmerhausen's Targeting Task Performance (TTP) metric. The measurement is simple and can be made in the field using a target board; the results account for the eye, the optics transfer function and transmission, and any atmospheric turbulence effects that are present.
Superresolution processing is currently being used to improve the performance of infrared imagers through an increase in sampling, the removal of aliasing, and the reduction of fixed-pattern noise. The performance improvement of superresolution has not been previously tested on military targets. This paper presents the results of human perception experiments to determine field performance on the NVESD standard military eight (8)-target set using a prototype LWIR camera. These experiments test and compare human performance of both still images and movie clips, each generated with and without superresolution processing. Lockheed Martin's XR® algorithm is tested as a specific example of a modern combined superresolution and image processing algorithm. Basic superresolution with no additional processing is tested to help determine the benefit of separate processes. The superresolution processing is modeled in NVThermIP for comparison to the perception test. The measured range to 70% probability of identification using XR® is increased by approximately 34% while the 50% range is increased by approximately 19% for this camera. A comparison case is modeled using a more undersampled commercial MWIR sensor that predicts a 45% increase in range performance from superresolution.
Direct view optics is a class of sensors that includes the human eye and the human eye coupled to rifle scopes, spotter scopes, binoculars, and telescopes. The target acquisition model for direct view optics is based on the contrast threshold function of the eye with a modification for the optics modulation transfer function and the optical magnification. In this research, we extend the direct view model to the application of facial identification. The model is described and the experimental method for calibrating the task of human facial identification is discussed.
This study determines the effectiveness of a number of image fusion algorithms through the use of the following image metrics: mutual information, fusion quality index, weighted fusion quality index, edge-dependent fusion quality index and Mannos-Sakrison’s filter. The results obtained from this study provide objective comparisons between the algorithms. It is postulated that multi-spectral sensors enhance the probability of target discrimination through the additional information available from the multiple bands. The results indicate that more information is present in the fused image than in either single-band image. The image quality metrics quantify the benefits of fusion of MWIR and LWIR imagery.
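Of the metrics listed, mutual information is the simplest to sketch. The example below estimates mutual information from a joint gray-level histogram and scores a fused image by the sum of its mutual information with each source, which is one common definition of the mutual-information fusion metric and is assumed here rather than taken from the study. The bin count, function names, and synthetic test images are hypothetical.

```python
import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Mutual information in bits, estimated from a joint gray-level histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())


def fusion_mi(src_a, src_b, fused, bins: int = 32) -> float:
    """Assumed fusion score: MI(fused, A) + MI(fused, B)."""
    return mutual_information(fused, src_a, bins) + mutual_information(fused, src_b, bins)


rng = np.random.default_rng(1)
band_a = rng.random((64, 64))          # synthetic stand-in for one band
band_b = rng.random((64, 64))          # synthetic stand-in for the other band
averaged = 0.5 * (band_a + band_b)     # simplest possible "fusion"
unrelated = rng.random((64, 64))       # image sharing nothing with the sources

score_avg = fusion_mi(band_a, band_b, averaged)
score_unrelated = fusion_mi(band_a, band_b, unrelated)
```

Histogram estimates of mutual information are biased upward for small images and many bins, so scores are best compared only between images of the same size and binning.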
This paper describes the modeling of multispectral infrared sensors. The current NVESD infrared sensor model, NVTherm, models single spectral band sensors. The current NVTherm model is being updated to model third generation multispectral infrared sensors. A simple model for the target and its background radiance is presented here and typical results are reported for common materials. The proposed target radiance model supports band selection studies. Spectral atmospheric propagation modeling is accomplished using MODTRAN. Example radiance calculations are presented and compared to data collected for validation. The data supports rejecting the null hypothesis that the model is invalid.
Different systems are optimized for, and capable of addressing, issues in different spectral regions. Each sensor has its own advantages and disadvantages. The research presented in this paper focuses on the fusion of the MWIR (3-5 μm) and LWIR (8-12 μm) bands on one IR Focal Plane Array (FPA). The information is processed and then displayed in a single image in an effort to analyze possible benefits of combining the two bands. The analysis addresses how the two bands differ by revealing the dominant band in terms of temperature value for different objects in a given scene, specifically the urban environment.
This paper describes the use of a rotating test pattern or reticle to measure the Modulation Transfer Function (MTF) of a staring array sensor. The method finds the Edge Spread Function (ESF) from which the MTF can be calculated. The rotating reticle method of finding the ESF of a sensor has several advantages over the static tilted edge method. The need for precise edge alignment is removed. Motion blur is used to simultaneously average out the effect of undersampling and to oversample the edge. The improved oversampling allows reduction of the noise in the generated ESF while keeping a high resolution. A unique data readout technique reads edge data perpendicular to the edge. Perpendicular readout eliminates the need to know or estimate the slope of the tilted edge. This MTF measurement method is validated using simulation and actual data captured by a digital camera. The resulting ESF plots agree well with expected results.
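The final step, going from an ESF to an MTF, can be sketched as below: differentiate the ESF to obtain the line spread function, take the magnitude of its Fourier transform, and normalize to unity at zero frequency. This covers only that generic step, not the rotating-reticle readout itself. The synthetic Gaussian-blurred edge, the oversampling factor, and the function name are assumptions chosen so the result can be checked against a closed-form MTF.

```python
import math

import numpy as np


def mtf_from_esf(esf: np.ndarray, sample_spacing: float):
    """Differentiate the ESF to get the LSF, then take the normalized |FFT|."""
    lsf = np.gradient(esf, sample_spacing)
    spectrum = np.abs(np.fft.rfft(lsf))
    mtf = spectrum / spectrum[0]                       # MTF(0) = 1
    freqs = np.fft.rfftfreq(lsf.size, d=sample_spacing)
    return freqs, mtf


# Synthetic, noise-free edge blurred by a Gaussian of sigma = 2 pixels,
# oversampled at 0.25 pixel spacing (both values are illustrative).
sigma = 2.0
spacing = 0.25
x = (np.arange(512) - 256) * spacing
esf = 0.5 * (1.0 + np.vectorize(math.erf)(x / (sigma * math.sqrt(2.0))))

freqs, mtf = mtf_from_esf(esf, spacing)                # freqs in cycles/pixel
expected = np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)  # analytic Gaussian MTF
```

With measured data the differentiation amplifies noise, so the ESF is usually averaged or smoothed first; the oversampling provided by the moving edge is what makes that averaging possible without losing resolution.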