Validation of infrared sensor model with field-collected imagery of unresolved unmanned aerial vehicle targets

Abstract. An infrared (IR) sensor model is validated using experimentally derived peak pixel signal-to-noise ratio (SNR) versus range for detection of either an unresolved or a resolved unmanned aerial vehicle (UAV) target. The model provides estimates of the time-averaged peak SNR values at the ranges used in the field collection. A mid-wave infrared (MWIR) camera and a long-wave infrared (LWIR) camera provided the measured data. Commercially available UAVs are flown along a line from the cameras toward a clear sky region, which serves as the background. A laser range finder measures the range at seven stopping points along the path. The data yield five ranges of unresolved target information for the MWIR camera and four ranges for the LWIR camera. We provide details on configuring the model to match the cameras used in the field collection, and we present the processing used to extract peak SNR versus range from the imagery.


Introduction
Low-cost, commercially available unmanned aerial vehicles (UAVs) increase the urgency of developing techniques to detect, recognize, and/or identify such targets. Of the 235 commercial counter-UAV systems reviewed in the 2018 study by Bard College [1], the vast majority use radar and radio frequency sensors for detection. This paper addresses detection using infrared (IR) imaging cameras [2]. IR detection has potential advantages for UAVs with low radar cross sections and for those flying preplanned routes without active communications links.
Sensor models provide a cost-effective means of designing new sensors or upgrading existing designs, and they are effective in comparing the relative performance of two similar but different sensors. However, models are even more valuable if they are validated, that is, if their predictions are borne out by field measurements. Validation allows a model to predict the performance of a given sensor, not only its performance relative to another sensor.
The sensor model discussed here is a physics-based model for the detection of unresolved targets in either the mid-wave infrared (MWIR) or the long-wave infrared (LWIR) spectral band [3][4][5][6]. The imagery was collected with an LWIR camera and an MWIR camera, both operated as staring focal plane array devices, i.e., with no scanning or dither [7]. The data collection is described in detail in a prior paper [8], which provided radiometrically validated, highly resolved signatures for two commercially available UAVs. Those results apply directly to the recognition and/or identification tasks, whereas this paper provides data and analyses for the detection task, in which the targets become unresolved as range increases.
The discrimination tasks (detection, recognition, and identification) are discussed in detail in literature from the Army's Night Vision and Electronic Sensors Directorate [9][10][11]. Briefly, the recognition and identification tasks require that the target be resolved by the sensor, meaning that multiple pixels fall on the target. Detection can be accomplished with an unresolved target if the signal-to-noise ratio (SNR) is sufficient. Imaging sensors can provide detection capability; however, detection is often linked with searching, and the search task often involves a large field of regard. In this paper, we use the sensor SNR as the primary metric for the sensor's ability to detect an unresolved target.
In this paper, we briefly describe the sensor model, the targets, and the data collection, and we compare the model results with the measured SNR as a function of range [12,13]. Experimentally, great lengths were taken to mitigate the influence of clutter, which is typically a hindrance to detection [14][15][16][17]. Overall, this effort is a validation of the proposed model.

Sensor Model
The model used for these studies is an L3 Technologies model called the end-to-end MATLAB Sensor Model (ETEMS), a physics-based SNR model for calculating the performance of staring IR sensors in detecting unresolved UAVs. It computes SNR as a function of target, atmospheric, and sensor parameters. Although it is a basic SNR model, it is extensive in that it includes many important sensor parameters, such as dark current, read noise, and optics emission temperature, as well as target model inputs and MODTRAN atmospheric transmission and emission inputs.
Included in the temporal noise sources are the background shot noise from the scene (in this case, the dominant noise factor), shot noise of the thermal emission from the lens, shot noise from the thermal emission of the cold shield (small), dark current shot noise, and readout integrated circuit (ROIC) read noise. These terms add in quadrature:
$$\mathrm{Noise}_{\mathrm{temporal}} = \sqrt{\mathrm{shot}_{\mathrm{bg}}^{2} + \mathrm{shot}_{\mathrm{lensEmiss}}^{2} + \mathrm{shot}_{\mathrm{dark}}^{2} + \mathrm{read}^{2}}. \qquad (1)$$
The small cold-shield term does not appear explicitly in Eq. (1).
After the noise has been calculated, the signal is calculated. Using a modeled or measured target signature, the model uses either the spectral or the broadband target contrast. This contrast intensity is then converted to a contrast irradiance as a function of slant range to the target. Care is taken to ensure that the target subtends less than one pixel before the target contrast intensity is used. If the target is not subpixel, the target signal intensity and the background column behind the target are converted to radiance using the total visible target area. Once the target and background are expressed as radiances, the pixel solid angle is used to calculate the irradiance from each, and these two quantities give the target contrast irradiance. For a generic target, the user can calculate the target contrast irradiance directly from MODTRAN and blackbody radiances.
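A minimal Python sketch of the quadrature sum in Eq. (1) follows; adding the terms in quadrature assumes they are statistically independent. It further assumes that each shot-noise term is Poisson, so its standard deviation is the square root of the mean electron count, and that the read noise is already given as an rms value in electrons. The function name and the example electron counts are hypothetical, not ETEMS inputs.

```python
import math


def temporal_noise_electrons(bg_e: float, lens_e: float, dark_e: float,
                             read_noise_e: float) -> float:
    """Root-sum-square temporal noise (electrons rms), following Eq. (1).

    bg_e, lens_e, dark_e: mean electrons per integration time from the scene
    background, lens self-emission, and dark current. Shot noise is assumed
    Poisson, so each shot-noise standard deviation is sqrt(mean electrons).
    read_noise_e: ROIC read noise, already an rms value in electrons.
    """
    shot_bg = math.sqrt(bg_e)
    shot_lens = math.sqrt(lens_e)
    shot_dark = math.sqrt(dark_e)
    return math.sqrt(shot_bg**2 + shot_lens**2 + shot_dark**2 + read_noise_e**2)


# Hypothetical electron counts chosen so that background shot noise dominates.
noise = temporal_noise_electrons(bg_e=1.0e6, lens_e=2.0e4, dark_e=1.0e3,
                                 read_noise_e=300.0)
print(f"temporal noise: {noise:.1f} e- rms")
```

With these placeholder numbers the background term alone contributes 1000 e- rms, so the total stays close to it, mirroring the statement that background shot noise dominates.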
Once the target contrast irradiance is known at the sensor aperture, the model calculates the signal (in electrons) as
$$\mathrm{Signal} = A_{\mathrm{opt}}\, t_{\mathrm{int}}\, \mathrm{PVF} \int E_{\mathrm{contrast}}(\lambda)\, \frac{\lambda}{hc}\, \tau_{\mathrm{optics}}(\lambda)\, \tau_{\mathrm{ColdFilter}}(\lambda)\, \mathrm{QE}(\lambda)\, d\lambda, \qquad (2)$$
where $A_{\mathrm{opt}}$ is the sensor's entrance pupil area, $t_{\mathrm{int}}$ is the integration time, PVF is the pulse visibility factor, $E_{\mathrm{contrast}}$ is the (spectral) target contrast irradiance, $h$ is Planck's constant, $c$ is the speed of light, $\lambda$ is the wavelength, $\tau_{\mathrm{optics}}$ is the transmission of the optics, $\tau_{\mathrm{ColdFilter}}$ is the transmission of the cold filter, and QE is the quantum efficiency. Here, PVF is the fraction of the target energy that falls on the detector, averaged over all possible spatial phases between the point spread function (PSF) and the detector area. Ensquared energy (not in the equation) is the maximum PVF, obtained when the PSF is centered on the detector.
Once both signal and noise are expressed in electrons, the SNR is simply their ratio:
$$\mathrm{SNR} = \frac{\mathrm{Signal}}{\sqrt{\mathrm{Noise}_{\mathrm{temporal}}^{2} + \mathrm{Noise}_{\mathrm{spatial}}^{2}}}. \qquad (3)$$
The SNR is easily converted into other metrics, such as noise-equivalent irradiance or noise-equivalent temperature difference (NETD), but this trade study focuses on SNR. In this study, we set the spatial noise to zero in the model, and we removed the spatial noise from the collected data by temporal averaging.
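The signal and SNR expressions above can be sketched numerically as follows. This is an illustration under stated assumptions, not ETEMS code: the spectral contrast irradiance is sampled on a wavelength grid in SI units (W/m^2 per meter of wavelength), the spectral integral uses the trapezoidal rule, and every number in the usage lines (a 100 mm diameter aperture, 5 ms integration, a flat 3 to 5 µm contrast, the transmissions, and the QE) is hypothetical.

```python
import numpy as np

H = 6.62607015e-34  # Planck constant, J s
C = 2.99792458e8    # speed of light, m/s


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def signal_electrons(a_opt_m2, t_int_s, pvf, wavelength_m, e_contrast,
                     tau_optics, tau_cold_filter, qe):
    """Contrast signal in electrons: energy -> photons (lambda / hc) -> electrons."""
    photons_per_joule = wavelength_m / (H * C)
    integrand = e_contrast * photons_per_joule * tau_optics * tau_cold_filter * qe
    return a_opt_m2 * t_int_s * pvf * trapezoid(integrand, wavelength_m)


def snr(signal_e, temporal_noise_e, spatial_noise_e=0.0):
    """Signal over the quadrature sum of temporal and spatial noise."""
    return signal_e / np.hypot(temporal_noise_e, spatial_noise_e)


# Hypothetical inputs: flat spectral contrast irradiance over 3 to 5 um.
wl = np.linspace(3e-6, 5e-6, 201)
e_spec = np.full_like(wl, 5e-6)   # W/m^2 per meter of wavelength
sig = signal_electrons(np.pi * 0.05**2, 5e-3, 0.25, wl, e_spec, 0.8, 0.9, 0.6)
print(f"signal = {sig:.0f} e-, SNR (spatial noise set to 0) = {snr(sig, 100.0):.2f}")
```

Leaving `spatial_noise_e` at its default of zero reproduces the choice made in this study; a nonzero value shows how residual fixed pattern noise would lower the SNR.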

Targets and Sensors
There were two UAV targets observed during data collection. The first was the DJI Phantom 4 Pro (Fig. 1), with a characteristic dimension of 15.7 cm. The second was the DJI Inspire 1 (Fig. 2), with a characteristic dimension of 20.1 cm. The characteristic dimension was found by outlining the target (segmentation) and taking the square root of the outlined target area. This characteristic dimension is considered the "small" dimension because some flux from an unresolved target can spill into background pixels and is thus excluded from the outlined region. The large dimension is the square root of the area of a box that encompasses the UAV target, excluding the blades. Excluding the blades follows the work of Fudala et al. [8], who found the contribution of the blades to be negligible. The large dimension is 23.8 cm for the DJI Phantom 4 Pro (UAV1) and 36.7 cm for the DJI Inspire 1 (UAV2). In the LWIR, the Phantom 4 Pro had an average temperature of 35.2°C and an RSSΔT of 1.44 K. In the MWIR, the Phantom 4 Pro had an average temperature of 36.7°C and an RSSΔT of 1.08 K.
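A sketch of the two characteristic dimensions follows, assuming a boolean segmentation mask from close-range imagery with a known pixel footprint at the target. The axis-aligned bounding box, the optional blade-free body mask, and the function name are assumptions for illustration; the text does not specify how the box was drawn.

```python
import numpy as np


def characteristic_dimensions(mask, pixel_size_m, body_mask=None):
    """Return (small, large) characteristic dimensions in meters.

    mask: boolean array, True inside the segmented target outline.
    pixel_size_m: pixel footprint at the target range (assumed square).
    body_mask: optional mask of the body without blades, used for the box;
               defaults to `mask` (hypothetical convention).
    """
    pixel_area = pixel_size_m ** 2
    # Small dimension: square root of the outlined (segmented) area.
    small = np.sqrt(mask.sum() * pixel_area)
    # Large dimension: square root of the area of an axis-aligned box around the body.
    box_src = mask if body_mask is None else body_mask
    rows = np.flatnonzero(box_src.any(axis=1))
    cols = np.flatnonzero(box_src.any(axis=0))
    box_pixels = (rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1)
    large = np.sqrt(box_pixels * pixel_area)
    return float(small), float(large)


# Synthetic "X-frame" silhouette: two 4-pixel-wide arms crossing in a 20 x 20 box.
mask = np.zeros((20, 20), dtype=bool)
mask[8:12, :] = True
mask[:, 8:12] = True
small, large = characteristic_dimensions(mask, pixel_size_m=0.01)
print(f"small = {small:.3f} m, large = {large:.3f} m")
```

The open space between the arms is inside the box but outside the outline, which is why the small dimension is always the smaller of the two.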
In the LWIR, the Inspire 1 had an average temperature of 43.8°C and an RSSΔT of 9.82 K. In the MWIR, the Inspire 1 had an average temperature of 39.1°C and an RSSΔT of 10.9 K. The average temperature provides a means of evaluating calibration (compared with the source blackbodies) and calculating intensity, and the RSSΔT is useful for calculating discrimination ranges (for object discrimination of resolved targets). The calibration procedures for these measurements are provided in Ref. 5.
Two IR sensors are used in the sensor model validation study: an MWIR imager and an LWIR imager. The MWIR imager is a cryogenically cooled type II superlattice (T2SL) imager, and the LWIR imager is a VOx uncooled microbolometer imager. The specifications of the imagers are given in Table 1.
These sensors have very different characteristics, which makes for a useful functional comparison: one is an inexpensive uncooled LWIR sensor, and the other is a high-performance cryogenically cooled MWIR sensor. In both cases, the calculations and the SNR measurements use the raw output of the sensors while viewing the UAVs, and the fixed pattern noise is removed from the scenes through temporal averaging. The reported SNR therefore includes only the remaining temporal noise.

Data Collection, Unresolved Targets
The data were taken with the goal of determining peak SNR versus range. Although seven ranges were used in both the MW and the LW, only the final five ranges had unresolved targets in the MW (199, 336, 393, 451, and 508 m). In the LW, four ranges provided a suitable unresolved target (126, 199, 336, and 393 m). The difference is attributable to the MW detector being approximately one-half the size of the LW detector in each dimension, so the target becomes unresolved at a shorter range in the LW. The target moved out in range along the line of sight of the sensor but was held at a fixed range during image capture. Locating the target in the imagery was therefore aided by having seen it at closer ranges and by knowing that it remained in a clear sky background region and did not move much, either horizontally or vertically (particularly within 100 consecutive frames). The MW camera provided 60 frames per second and the LW camera 30 frames per second. Figure 3 shows an entire frame and indicates the region of interest (ROI) for locating the target. The range to the target was determined with two techniques, and both were applied in the data reduction. First, a laser range finder measured the range to the target at discrete points where the UAV was directed to remain stationary. Second, an onboard GPS sensor provided additional data on the distance from the sensor. Both data sets were used to analyze the SNR as a function of range.
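The detector-size argument can be made concrete with a small-angle sketch: a target of dimension d at range R subtends d/R, and it falls below one pixel once d/R is smaller than the pixel instantaneous field of view (IFOV), i.e., the pitch divided by the focal length. The pitches and focal length below are hypothetical placeholders, not the Table 1 values; the only point illustrated is that doubling the pitch at a fixed focal length halves the range at which the target becomes unresolved.

```python
def ifov_rad(pixel_pitch_m: float, focal_length_m: float) -> float:
    """Small-angle instantaneous field of view of one pixel."""
    return pixel_pitch_m / focal_length_m


def pixels_across_target(target_dim_m, range_m, pixel_pitch_m, focal_length_m):
    """Number of pixels spanned by the target dimension at the given range."""
    return (target_dim_m / range_m) / ifov_rad(pixel_pitch_m, focal_length_m)


def unresolved_range_m(target_dim_m, pixel_pitch_m, focal_length_m):
    """Range beyond which the target subtends less than one pixel."""
    return target_dim_m / ifov_rad(pixel_pitch_m, focal_length_m)


# Hypothetical optics: same 25 mm focal length, LW pitch twice the MW pitch.
D_SMALL = 0.157  # UAV1 small characteristic dimension from the text, m
r_mw = unresolved_range_m(D_SMALL, 12e-6, 25e-3)
r_lw = unresolved_range_m(D_SMALL, 24e-6, 25e-3)
print(f"MW unresolved beyond ~{r_mw:.0f} m, LW beyond ~{r_lw:.0f} m")
```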
The UAV was essentially stationary for frames near the end of the collection at each range, and that is where the laser range finder verified the ranges. The seven ranges for which the data were processed were 37, 126, 199, 336, 393, 451, and 508 m, called increments 0 through 6. The GPS data (including range) were captured 10 times per second, slower than the frame rate of either camera. The onboard GPS information was processed and proved useful in confirming targets, particularly at the longest ranges (508 m for the MW and 393 m for the LW). Frames chosen for processing were taken from the rightmost (latest) section of each increment's record, where the target was nearly stationary in the field of view.
Sensor imagery was collected from close range, where the UAV was easily detected, to long range, where the UAV was difficult to find. The data were collected in multiple "increments," with recording briefly stopped and then restarted, leaving small gaps in the recording. If the UAV moved significantly during a gap, it was very difficult to reacquire when the SNR was low. There was cloud clutter nearby, and there were other confusers, such as other aircraft or birds, in the vicinity of the UAV. Track confusion did occur during the analysis, so the GPS data (latitude, longitude, and altitude) were used to estimate the image pixel location in the X and Y dimensions, both to reacquire the UAV and to confirm that we had indeed reacquired it. When reacquisition was successful, the maneuvers recorded by GPS, transformed into pixel space, were mirrored in the target track locations.
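The GPS-to-pixel step can be sketched as follows, under several assumptions that the text does not state: a flat-Earth (equirectangular) conversion from latitude/longitude/altitude to local east-north-up offsets, which is adequate over ranges of about half a kilometer; a camera with a known position, boresight azimuth and elevation, no roll, and a uniform per-pixel IFOV; and linear interpolation of the 10 Hz GPS samples to the camera frame times. All function names and parameter values are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def gps_at_frames(frame_t, gps_t, *gps_series):
    """Linearly interpolate GPS samples (10 Hz) to camera frame timestamps."""
    return [np.interp(frame_t, gps_t, s) for s in gps_series]


def gps_to_enu(lat_deg, lon_deg, alt_m, lat0_deg, lon0_deg, alt0_m):
    """East/north/up offsets (m) from the camera, flat-Earth approximation."""
    coslat0 = np.cos(np.radians(lat0_deg))
    east = EARTH_RADIUS_M * coslat0 * np.radians(np.asarray(lon_deg) - lon0_deg)
    north = EARTH_RADIUS_M * np.radians(np.asarray(lat_deg) - lat0_deg)
    up = np.asarray(alt_m) - alt0_m
    return east, north, up


def enu_to_pixel(east, north, up, az0_deg, el0_deg, ifov_rad, col0, row0):
    """Project a direction into pixel coordinates for a camera with no roll."""
    az = np.arctan2(east, north)                  # azimuth, clockwise from north
    el = np.arctan2(up, np.hypot(east, north))    # elevation above the horizon
    d_az = (az - np.radians(az0_deg) + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    d_el = el - np.radians(el0_deg)
    col = col0 + d_az * np.cos(el) / ifov_rad     # azimuth offset shrinks by cos(el)
    row = row0 - d_el / ifov_rad                  # image rows increase downward
    return col, row


# Hypothetical camera site and a UAV hovering 500 m north and 50 m above it.
lat0, lon0, alt0 = 40.0, -105.0, 1600.0
gps_t = np.array([0.0, 0.1])                      # two GPS fixes, 0.1 s apart
lat = lat0 + np.degrees(np.array([500.0, 500.0]) / EARTH_RADIUS_M)
lon = np.array([lon0, lon0])
alt = np.array([alt0 + 50.0, alt0 + 50.0])
frame_t = np.array([0.0, 1 / 60, 2 / 60])         # 60 Hz MW frame times
lat_f, lon_f, alt_f = gps_at_frames(frame_t, gps_t, lat, lon, alt)
e, n, u = gps_to_enu(lat_f, lon_f, alt_f, lat0, lon0, alt0)
col, row = enu_to_pixel(e, n, u, 0.0, np.degrees(np.arctan2(50.0, 500.0)),
                        0.5e-3, 320.0, 256.0)
print(col, row)  # boresight points at the UAV, so all frames map to the image center
```

Comparing such predicted pixel tracks with the tracked target positions is one way to implement the consistency check described above.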

Processing and Imagery
The background in the images included clear sky, a variety of clouds, and land with natural and cultural objects. With visual searching, unresolved targets could be seen only against the clear sky background; a cloud or land background may have precluded finding the unresolved target.
The flight path, along a line from the cameras toward a clear sky region, kept the UAV within the clear sky region. This allowed a visual sighting of the target at all but the longest ranges. At the longer ranges, knowing the previous target location in the image led us to a small 7 × 7 ROI. The 7 × 7 ROI worked, but only barely at the longer ranges (later in the collection). The clouds moved during the data collection and generally came within a few pixels of the target. Clouds were not allowed to impinge on the rim of the 7 × 7 ROI, where the background pixels were located; when that happened, earlier frames at the same range were selected for processing.
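The peak SNR for a single frame is defined as
$$\mathrm{SNR}_{\mathrm{peak}} = \frac{P_{\mathrm{peak}} - \mu_{\mathrm{rim}}}{\sigma_{\mathrm{rim}}},$$
where $P_{\mathrm{peak}}$ is the peak pixel value in the ROI, and $\mu_{\mathrm{rim}}$ and $\sigma_{\mathrm{rim}}$ are the mean (pedestal) and standard deviation of the background pixels on the rim of the 7 × 7 ROI. This calculation was made on 100 consecutive frames (collected at 30 frames per second in the LW or 60 frames per second in the MW). The mean of the 100 per-frame peak SNR values constitutes the reported peak SNR, and their standard deviation describes its spread. Note that each frame was processed separately, so the temporal noise was not reduced by frame averaging. We adopted the convention of expressing signal, noise, and SNR in terms of optical power. Because our signal and noise are linearly related to optical power, they are not squared.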
We used a 7 × 7 pixel grid centered on the peak pixel (within a larger ROI), as shown in Fig. 4 (a real example is shown in Fig. 5). We investigated two ways of defining the background pixels: one used pixels from clear sky regions away from the target, and the other used the rim of the 7 × 7 ROI. Using pixels away from the target allows the background data to be detrended without contamination by the target, and the number of background pixels is not limited to the rim. Using the rim of the 7 × 7 ROI, in contrast, is simple and uses background pixels near the target. The results did not differ significantly between the two methods. Our sensors undersampled the PSF, so a 3 × 3 area was sufficient to contain the target when it was subpixel in size. For example, a Gaussian PSF fit to the MW camera data yielded a sigma of 0.8 pixels.
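The claim that a 3 × 3 area contains a subpixel target can be checked for the fitted Gaussian PSF (sigma = 0.8 pixels) with a short calculation. It assumes a circularly symmetric Gaussian PSF centered on the peak pixel and a point-like target; an off-center phase, a non-Gaussian PSF, or a finite target would spread more energy outward. The function name is illustrative.

```python
import math


def ensquared_energy(sigma_px: float, box_px: float,
                     x0: float = 0.0, y0: float = 0.0) -> float:
    """Fraction of a 2D circular Gaussian PSF's energy inside a square box.

    The box is box_px pixels wide and centered at the origin; the PSF is
    centered at (x0, y0). The Gaussian is separable, so the 2D fraction is
    the product of two 1D error-function integrals.
    """
    s = sigma_px * math.sqrt(2.0)
    half = box_px / 2.0

    def frac_1d(c: float) -> float:
        return 0.5 * (math.erf((half - c) / s) + math.erf((half + c) / s))

    return frac_1d(x0) * frac_1d(y0)


for n in (1, 3, 5, 7):
    print(f"{n} x {n}: {ensquared_energy(0.8, n):.4f}")
```

For sigma = 0.8 pixels this gives roughly 88% of the energy within the 3 × 3 box and more than 99% within 5 × 5, which is consistent with treating the pixels out to the 5 × 5 box as a guard between the target and the 7 × 7 rim.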
Some algorithms define the target signal with a summation mask over a region around the peak pixel; for example, a 3 × 3 or 5 × 5 mask collects the values of the surrounding pixels. Such a mask can capture more target power than the peak pixel alone, because the PSF spreads target power into neighboring pixels. However, we define the target value simply as the peak pixel within our ROI, which we identified manually in each image. We then used the rim of a 7 × 7 ROI as the background pixels. The 3 × 3 and 5 × 5 boxes around the peak pixel can be thought of as guard rims that prevent the PSF from spreading target power into the background pixel region. Our pedestal is then the mean value of the 24 pixels that make up the rim of the 7 × 7 ROI, and the noise in the denominator of the SNR is the standard deviation of those 24 pixels. As discussed below, this is the total noise: the noise of the data itself, without any processing by an algorithm or temporal averaging by a human observer.
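The per-frame peak SNR and its 100-frame average can be sketched as follows. The peak pixel locations are assumed to be supplied (the text identifies them manually), the rim standard deviation uses the sample (n − 1) normalization, which the text does not specify, and the synthetic frames in the usage lines are hypothetical.

```python
import numpy as np


def rim_mask(size: int = 7) -> np.ndarray:
    """Boolean mask of the outer ring of a size x size ROI (24 pixels for 7 x 7)."""
    m = np.zeros((size, size), dtype=bool)
    m[0, :] = m[-1, :] = True
    m[:, 0] = m[:, -1] = True
    return m


def peak_snr_frame(frame, peak_row, peak_col, size=7):
    """(peak - rim mean) / rim std for one frame; the ROI must lie inside the frame."""
    h = size // 2
    roi = frame[peak_row - h:peak_row + h + 1,
                peak_col - h:peak_col + h + 1].astype(float)
    rim = roi[rim_mask(size)]
    pedestal = rim.mean()        # background pedestal
    noise = rim.std(ddof=1)      # total noise of the rim pixels
    return (roi[h, h] - pedestal) / noise


def peak_snr_series(frames, peaks, size=7):
    """Mean and standard deviation of per-frame peak SNR (no frame averaging)."""
    snrs = np.array([peak_snr_frame(f, r, c, size)
                     for f, (r, c) in zip(frames, peaks)])
    return snrs.mean(), snrs.std(ddof=1)


# Hypothetical data: 100 noisy frames with a faint point target at (32, 40).
rng = np.random.default_rng(0)
frames = [1000.0 + rng.normal(0.0, 6.0, size=(64, 80)) for _ in range(100)]
for f in frames:
    f[32, 40] += 40.0
mean_snr, sd_snr = peak_snr_series(frames, [(32, 40)] * 100)
print(f"peak SNR = {mean_snr:.2f} +/- {sd_snr:.2f}")
```

Because the SNR is computed frame by frame and only then averaged, the temporal noise in each frame is left intact, matching the procedure described above.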

Noise Discussion and Results
The effect of noise on observers is well quantified in the imaging literature [18]; the literature on the effect of noise in IR detection systems is less extensive. Calculating noise directly from images is easy enough, but the raw images are not what detection algorithms ultimately operate on. A common way of treating noise is to classify it as temporal noise or spatial (fixed pattern) noise. Temporal noise is readily reduced by averaging frames over time, whereas spatial noise can be effectively removed by a differencer, which subtracts the previous frame from the present frame. In a single frame, the spatial noise in our data is two to three times as large as the temporal noise. The total noise is read directly from a single image, and the spatial and temporal noise add in quadrature to give the total noise. The remaining temporal noise is 6.6 digital units for the MWIR and 5.8 digital units for the LWIR. These values correspond to clear sky noise with the fixed pattern noise removed.
We define the total noise as the standard deviation of the background pixels. The background pixels can be detrended if detrending is likely to be part of the image processing; for the imaging sensors used, we generally observed a reduction of about 10% in the total noise with detrending. Our operational definition of temporal noise is the standard deviation of the background pixels after subtracting the previous frame (to remove spatial noise). The spatial noise is then obtained by subtracting the variance of the temporal noise from the variance of the total noise and taking the square root of the difference. Both temporal and spatial noise can consist of more basic forms of noise, but those are not readily determined from imagery.
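The operational noise definitions above translate into a few lines of NumPy. This sketch assumes a boolean mask that selects clear-sky background pixels and uses a least-squares plane as the optional detrending model, which the text does not specify. Note that when the temporal noise in the two frames is independent, the standard deviation of a frame difference is √2 times the single-frame temporal noise; the text's definition does not mention a correction, so the √2 division is left here as an option that is off by default.

```python
import numpy as np


def detrend_plane(values, rows, cols):
    """Subtract a least-squares plane a + b*row + c*col (assumed detrending model)."""
    A = np.column_stack([np.ones_like(values), rows, cols])
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return values - A @ coef


def noise_components(frame, prev_frame, bg_mask, detrend=False,
                     divide_diff_by_sqrt2=False):
    """Return (total, temporal, spatial) noise of the masked background pixels."""
    rows, cols = np.nonzero(bg_mask)              # same order as frame[bg_mask]
    cur = frame[bg_mask].astype(float)
    if detrend:
        cur = detrend_plane(cur, rows.astype(float), cols.astype(float))
    total = cur.std(ddof=1)
    # Frame differencing removes the fixed (spatial) pattern.
    diff = frame[bg_mask].astype(float) - prev_frame[bg_mask].astype(float)
    temporal = diff.std(ddof=1)
    if divide_diff_by_sqrt2:
        temporal /= np.sqrt(2.0)                  # per-frame temporal noise
    # Spatial noise: total and temporal noise add in quadrature.
    spatial = np.sqrt(max(total**2 - temporal**2, 0.0))
    return total, temporal, spatial


# Hypothetical frames: fixed pattern (sigma 15) plus temporal noise (sigma 6).
rng = np.random.default_rng(1)
fpn = rng.normal(0.0, 15.0, size=(200, 200))
prev = 1000.0 + fpn + rng.normal(0.0, 6.0, size=fpn.shape)
cur = 1000.0 + fpn + rng.normal(0.0, 6.0, size=fpn.shape)
mask = np.ones(fpn.shape, dtype=bool)
print(noise_components(cur, prev, mask, divide_diff_by_sqrt2=True))
```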
The sensor model provides temporal noise estimates along with the peak pixel value for the SNR. The model offers suggested estimates of spatial noise and clutter but does not attempt to model them. The suggested estimates are based on experience with similar focal planes and common algorithms for reducing spatial noise. Clutter is grouped into four categories: none, low, medium, and high. The low, medium, and high clutter levels are estimated from field-collected imagery without algorithmic processing, and the user can lower the clutter level based on the algorithms available. Consequently, the validation is based on SNR versus range with temporal noise only; the measured temporal noise values from the imagery are those given above.

Peak SNR Results
Three parameters yield the peak SNR value in the equation above: the peak pixel value, the pedestal, and the noise. The three inputs and the resulting SNR values for 100 consecutive frames are shown in Fig. 6. The sensor model uses an average PVF in estimating the peak value, converting the radiant intensity in photons/(s·sr) to electrons per second from the detectors. The phase of the PSF causes the peak value to vary from a maximum to one-fourth of the maximum, depending on the location of the center of the PSF with respect to the center of the detector. The model uses an average PVF obtained by moving the PSF to all possible positions on the detector, since all locations are equally probable. Some of the data show an almost periodic variation, in part because the target drifted (e.g., due to wind) during the 100-frame period; when the target was more stationary, the effect was much smaller. One can imagine the PSF scanning across the detectors, yielding a somewhat repetitive pattern in the data. The purpose of averaging the results of 100 frames is therefore to emulate the average PVF used in the model. Keep in mind that the averaging is done after the SNR is obtained for each frame. It is not intended to reduce the temporal noise, although some such averaging may well be inherent (e.g., with human observers). With detection systems, the processing is more likely to be an algorithm, so temporal noise averaging is not done.
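The phase dependence of the PVF, and the phase average the model uses, can be illustrated for a Gaussian PSF on a square pixel with a 100% fill factor. Both the Gaussian shape and the fill factor are assumptions for illustration; the model's own PSF (diffraction or aberration limited) will give different numbers. With a narrow PSF, the corner-phase PVF approaches one-fourth of the centered value, as described above.

```python
import numpy as np
from math import erf, sqrt


def pvf_1d(offset_px: float, sigma_px: float) -> float:
    """Fraction of a 1D Gaussian falling on a unit pixel centered at 0."""
    s = sigma_px * sqrt(2.0)
    return 0.5 * (erf((0.5 - offset_px) / s) + erf((0.5 + offset_px) / s))


def pvf(dx_px: float, dy_px: float, sigma_px: float) -> float:
    """PVF for a PSF centered at (dx, dy) relative to the pixel center."""
    return pvf_1d(dx_px, sigma_px) * pvf_1d(dy_px, sigma_px)


def average_pvf(sigma_px: float, n: int = 201) -> float:
    """PVF averaged over a uniform grid of PSF positions within the pixel."""
    offsets = np.linspace(-0.5, 0.5, n)
    mean_1d = np.mean([pvf_1d(o, sigma_px) for o in offsets])
    return float(mean_1d ** 2)  # x and y phases independent; Gaussian is separable


for sigma in (0.2, 0.8):  # 0.8 px matches the Gaussian fit quoted for the MW camera
    center, corner = pvf(0.0, 0.0, sigma), pvf(0.5, 0.5, sigma)
    print(f"sigma={sigma}: max={center:.3f}, corner={corner:.3f}, "
          f"corner/max={corner / center:.2f}, average={average_pvf(sigma):.3f}")
```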

MWIR SNR and Model Results
The sensor model is run for the targets and sensors above. MODTRAN is used to calculate the atmospheric transmission as well as the path radiance associated with the backgrounds. The MWIR results for UAV1 are shown in Fig. 7. The black curve is the measured SNR obtained with the technique described above. The "Model SNR Small UAV1" and "Model SNR Large UAV1" curves are the two top curves; one is associated with the characteristic (small) dimension and the other with the large dimension, both described above. In the model, once the target goes from resolved to unresolved, its intensity is assumed to come from a point source. Note that the small target goes from resolved to unresolved at around 340 m; at the first data points the small and large curves separate because the large target is still resolved. Also, the modeled SNR curves assume a perfect diffraction-limited optical system and uniform detector response. The two top curves account for neither the finite target size nor the real sensor PSF.
In addition, the measured path radiance associated with the background differs from the MODTRAN prediction. The background radiance in the sensor model is corrected for both the two top curves and the two bottom curves; this discrepancy amounts to a 15% to 20% error in the background flux. The two "corrected" (bottom) curves are associated with the correction for the finite target size and the real PSF. We computed an average PVF for the real sensor modulation transfer function (MTF). The average PVF (over a large number of random PSF positions) dropped from 0.25 for a diffraction-limited system to 0.082 for the aberration-limited system. In addition, the finite target size was convolved with the PSF to determine the effective drop in PVF for finite-size unresolved targets. When the target was 1.0, 0.5, and 0.25 times the pixel pitch, the average PVF dropped to 0.062, 0.076, and 0.080, respectively. This approach was used to correct the effective average PVF for the real sensor MTF and the finite target size (even though the target is unresolved).
Fig. 6 Multiplot showing the three factors involved in calculating the peak SNR and the resulting peak SNR for each of the 100 frames used at this range for the LW camera.
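The finite-target correction can be sketched by approximating both the PSF and the target as Gaussians, so that convolution simply adds their variances, and then recomputing the phase-averaged PVF. Mapping a target width w (in pixel pitches) to a Gaussian with sigma = w/√12, the standard deviation of a uniform bar of width w, is an assumption made here for illustration, as is the 0.8-pixel PSF sigma; the sketch is not expected to reproduce the 0.062, 0.076, and 0.080 values quoted above, which come from the measured sensor MTF.

```python
import numpy as np
from math import erf, sqrt


def pvf_1d(offset_px: float, sigma_px: float) -> float:
    """Fraction of a 1D Gaussian falling on a unit pixel centered at 0."""
    s = sigma_px * sqrt(2.0)
    return 0.5 * (erf((0.5 - offset_px) / s) + erf((0.5 + offset_px) / s))


def average_pvf(sigma_px: float, n: int = 201) -> float:
    """Phase-averaged PVF on a grid of PSF positions (separable Gaussian)."""
    offsets = np.linspace(-0.5, 0.5, n)
    return float(np.mean([pvf_1d(o, sigma_px) for o in offsets]) ** 2)


def blurred_sigma(psf_sigma_px: float, target_width_px: float) -> float:
    """Sigma of the PSF convolved with a Gaussian stand-in for the finite target."""
    target_sigma = target_width_px / sqrt(12.0)     # assumption: uniform-bar variance
    return sqrt(psf_sigma_px**2 + target_sigma**2)  # Gaussian variances add


for width in (0.0, 0.25, 0.5, 1.0):  # target size in units of the pixel pitch
    pvf_avg = average_pvf(blurred_sigma(0.8, width))
    print(f"width={width:4.2f} px: average PVF={pvf_avg:.4f}")
```

The trend, a smaller average PVF for a larger unresolved target, is the behavior the correction captures; the magnitudes depend on the real PSF.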
The MWIR results for UAV2 are shown in Fig. 8. Unfortunately, during the UAV2 collection the UAV flew out quickly, and the increments used for the SNR measurements were too far away to yield usable SNR data: the five planned increments reached longer ranges, but at an SNR too low to measure. We were able to obtain one good measured SNR data point, as shown. As with UAV1, the two top curves show the model without the real PSF correction and without the finite target size correction, and the two bottom curves show the model corrected for the real sensor PSF and the finite target size.
It is also worth mentioning that the measured data are closer to the small target assumption than to the large target assumption. This is reasonable, since clear background shows through the "box" that bounds the target; as the target range increases, less flux strikes the detector from regions within that box. The unresolved target therefore behaves less like a filled "box" than the large dimension assumes.
In the LWIR, the results for UAV1 are shown in Fig. 9. In this case, the average PVF for the diffraction-limited case is 0.45. The Fλ/d for the sensor is 0.73, and the optics of the sensor are close to diffraction limited, so, unlike in the MWIR, no correction is made for the real MTF. The PVF correction is performed for the finite target size, as shown in Fig. 9, but it makes little difference to the SNR relative to the original uncorrected values.
Where the target was resolved, it was treated as resolved and no correction was applied. The correction was applied only after the target became smaller than the detector size, or detector instantaneous field of view (IFOV). The measured SNR fell right between the curves for the large and small UAV1 dimensions.
The LWIR results for UAV2 are shown in Fig. 10. There are two curve pairs, where each pair corresponds to the small and large versions of UAV2. The two pairs differ in background temperature (i.e., the infinite-path radiance associated with the sky background). The closer-range measurements correspond to a background radiance equivalent to 33.5°C, and the longer-range measurements correspond to a background radiance equivalent to 29.5°C. The SNR is corrected for these background radiances, but unfortunately only a single SNR point could be retrieved from the data (as shown). Had we been able to reduce the SNR data at the longer ranges, the two higher curves would have applied.
The UAV flight path is shown in Fig. 11 (the UAV is flown to the outlined area of the background). The clouds moved slowly, and we used the hole in the clouds shown (the UAV target appears in the blow-up image to the right). However, the LWIR background temperature varied with the position of the hole between the clouds as well as from the bottom to the top of the hole, causing the LWIR background to vary from 33.5°C to 29.5°C.
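To put the 33.5°C versus 29.5°C background variation in flux terms, the in-band blackbody radiance can be integrated from Planck's law. The sketch treats the sky background as a blackbody at its equivalent temperature and assumes an 8 to 12 µm band; the actual LWIR camera passband is given in Table 1 and would change the result somewhat.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def band_radiance(temp_c, wl_min_um=8.0, wl_max_um=12.0, n=2001):
    """In-band blackbody radiance, W/(m^2 sr), via Planck's law and the trapezoidal rule."""
    t_k = temp_c + 273.15
    wl = np.linspace(wl_min_um, wl_max_um, n) * 1e-6  # wavelength grid, m
    # Spectral radiance, W/(m^2 sr m); expm1 keeps precision in the denominator.
    spectral = 2.0 * H * C**2 / wl**5 / np.expm1(H * C / (wl * K_B * t_k))
    return float(np.sum(0.5 * (spectral[1:] + spectral[:-1]) * np.diff(wl)))


warm = band_radiance(33.5)
cold = band_radiance(29.5)
print(f"8-12 um radiance: {warm:.2f} vs {cold:.2f} W/(m^2 sr), ratio {warm / cold:.3f}")
```

Under these assumptions the 4 K change in equivalent sky temperature corresponds to a background flux change of several percent, large enough to shift the background shot noise and hence the modeled SNR.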

Uncertainties and Errors
There are quite a few uncertainties associated with the measurements, as well as errors associated with the model calculations. Figure 9 shows the allocation of uncertainties in the measurements and errors in the modeling. The uncertainties include the target outline (which affects both area and radiometry), estimated at 3%. The radiometric uncertainty is derived from the blackbody calibration of the target radiometry as well as spectral errors associated with the target emission and reflection. The transmission uncertainty is associated with the actual transmission of the atmosphere compared with the MODTRAN calculation of transmission (note that this is spectral). There are two path radiance components. The path radiance error was measured in this case and found to differ by up to 20% from the MODTRAN estimate. Because the path radiance is measured, we correct this error, but the uncertainty of the path radiance measurement itself is estimated at 5%. The underlying cause is that the background temperature can vary greatly from point to point in the sky, whereas MODTRAN uses only general location information to estimate background path radiance. We measured the MTF of the sensor (and, effectively, the PSF), but the measurement error of this real MTF is estimated at about 5%. Finally, we handled target size blur (i.e., the impact of the finite target size on the PVF) by bracketing the target with a small and a large target size and comparing those sizes with the sensor's real PSF. This approach has the highest uncertainty, estimated at around 10%.
The modeling errors are the path radiance, the real (not diffraction-limited) MTF, and the target size blur associated with the finite target size. All of these errors reduce the modeled SNR, with the component errors shown in the figure. Because they are all biased toward lower SNR, they combine multiplicatively (cascade) rather than as a root sum square. Without these corrections, the basic model overestimates the SNR by a factor of about 4.39 (439%); equivalently, with all worst-case errors cascaded, the corrected model's SNR is about 23% of the uncorrected SNR.
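Why biased errors cascade rather than add in quadrature can be seen with a few multiplicative SNR factors. The first two factors are taken from values quoted in this paper (the MWIR average-PVF drop from 0.25 to 0.082 and a roughly 20% finite-target reduction); the third assumes a 20% excess background flux acting through shot-noise-limited noise, which scales the SNR by 1/√1.2. These factors are illustrative and are not the inputs behind the authors' uncertainty allocation.

```python
import math

# Illustrative SNR reduction factors (hypothetical except where noted).
factors = {
    "real MTF: average PVF 0.25 -> 0.082 (from the MWIR text)": 0.082 / 0.25,
    "finite target size: ~20% SNR reduction (from the text)": 0.80,
    "background flux +20%, shot-noise limited (assumption)": 1.0 / math.sqrt(1.20),
}

# Biased errors all push the SNR the same way, so they multiply.
cascaded = math.prod(factors.values())

# For contrast: treating the fractional errors as independent and combining by RSS.
rss_fractional = math.sqrt(sum((1.0 - f) ** 2 for f in factors.values()))

for name, f in factors.items():
    print(f"{f:.3f}  {name}")
print(f"cascaded SNR factor:        {cascaded:.3f}")
print(f"1 - RSS of fractional errs: {1.0 - rss_fractional:.3f}")
```

Because every factor pushes the SNR down, multiplying them gives a lower combined factor than an RSS combination of the fractional errors would suggest.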
The UAV shape results in a signature that contains background as well as target within a rectangular area. The resulting targets are not resolved in the sense of recognition or identification, but their energy falls on more than a single pixel (it bleeds over into adjacent pixels). In this study, the effective area of the UAV is used to determine the range at which the target becomes unresolved. Once the target becomes smaller than a detector IFOV, or pixel, an assumed Gaussian target size is convolved with the PSF to determine the effect on the average PVF. This factor is used to correct the model calculations and is range-dependent; its largest effect in this case is a reduction of about 20% in the SNR. Target size blur also has a larger effect in the MWIR band because of the camera parameters. The reason can be seen by examining how any imaging system images a "point" through its PSF. For the MWIR camera, Fλ/d is 1.25, which causes the PSF to effectively fill a single pixel, so convolving the PSF with the actual UAV affects the signal greatly. For the LWIR camera, however, Fλ/d is 0.735, which makes the PSF much smaller than a single pixel, so convolution with the actual UAV makes a much smaller difference.
The path radiance error can be characterized in several ways. In terms of temperature, the error shown comprises the background measurement correction (relative to the MODTRAN model) and the background measurement itself. A 2% change in flux corresponds to a 13.5% change in the Celsius temperature (Kelvin may be a better metric for temperature), and the resulting SNR error is 83%.