Comparison of algorithms for contrast enhancement based on triangle orientation discrimination assessments by convolutional neural networks

Abstract. Within the last decades, a large number of techniques for contrast enhancement has been proposed. There are some comparisons of such algorithms for few images and figures of merit. However, many of these figures of merit cannot assess usability of altered image content for specific tasks, such as object recognition. In this work, the effect of contrast enhancement algorithms is evaluated by means of the triangle orientation discrimination (TOD), which is a current method for imager performance assessment. The conventional TOD approach requires observers to recognize equilateral triangles pointing in four different directions, whereas here convolutional neural network models are used for the classification task. These models are trained by artificial images with single triangles. Many methods for contrast enhancement highly depend on the content of the entire image. Therefore, the images are superimposed over natural backgrounds with varying standard deviations to provide different signal-to-background ratios. Then, these images are degraded by Gaussian blur and noise representing degradational camera effects and sensor noise. Different algorithms, such as the contrast-limited adaptive histogram equalization or local range modification, are applied. Then accuracies of the trained models on these images are compared for different contrast enhancement algorithms. Accuracy gains for low signal-to-background ratios and sufficiently large triangles are found, whereas impairments are found for high signal-to-background ratios and small triangles. A high generalization ability of our TOD model is found from the similar accuracies for several image databases used for backgrounds. Finally, implications of replacing triangles with real target signatures when using such advanced digital signal processing algorithms are discussed. The results are a step toward the assessment of those algorithms for generic target recognition.


Introduction
For remote sensing applications and reconnaissance, acquisition and operation of cameras in different spectral bands are required, and each has its own pros and cons. The best possible choice among devices for procurement therefore depends on the imager performance for the desired task, e.g., the detection, recognition, or identification (DRI) of distant targets with a background composed of vegetation, urban structures, and sky. Camera data can be acquired in field trials for characterization of single devices. However, these measurements are time consuming and expensive. Furthermore, possession of the device is required. Therefore, modeling and image-based simulation of imagers are useful and important for the assessment of imagers. Such tools become even more important when scene-dependent advanced digital signal processing (ADSP) techniques are used in the device, because their impact on performance is difficult to predict. In this paper, the effect of contrast enhancement (CE) algorithms, which have so far mainly been evaluated in terms of esthetic perception, is considered.
*Address all correspondence to Daniel Wegner, daniel.wegner@iosb.fraunhofer.de
Triangle orientation discrimination (TOD) 1 is a well-established image-based approach for the characterization of electro-optical system performance, especially for range performance assessment in remote sensing applications. 2 It models the DRI tasks for real targets by a simplified recognition task. The original idea was that an observer has to determine the orientation of an equilateral triangle pointing in one of four directions (up, down, left, and right) shown on a display, which is fed by an imaging system. The ability to clearly discriminate the orientation is reduced depending on different types of degradation, e.g., optical diffraction blur and sensor noise.
However, due to the resurgence of machine learning, automatic target recognition 3 is becoming increasingly important. The importance of machine vision applications also led to the emergence of scalable compression frameworks 4,5 aimed at high lossy compression while simultaneously preserving image quality for machine and human vision. In contrast to human observers, the performance of these methods does not depend on properties of the display but merely on the digital output of the imaging system. A prominent and frequently used approach for machine vision is convolutional neural networks (CNNs). Such CNN models have also been trained on artificial images of triangles to perform the TOD task and validated on acquired camera data. 6 Therefore, these models can be used for automated camera tests in the lab by means of scene projectors. 7 In this paper, CNN models for TOD are trained and validated on degraded artificial images of single triangles superimposed over natural backgrounds from Open Images V6. 8 In addition, the training data are processed by CE algorithms from Table 1 with equal probabilities. Then, a trained model is validated on images with varying background levels $\sigma_\mathrm{background}$ and Gaussian noise levels $\sigma_\mathrm{noise}$. Parts of this work have already been published elsewhere. 32
In Sec. 2, the considered CE algorithms, the model setup, and the training procedure are described. Section 3 shows the accuracies on validation images with varying background and noise levels. Accuracy differences between individual CE algorithms and identity are shown for varying values of the signal-to-background ratio $\mathrm{SNR}_\mathrm{background}$, signal-to-noise ratio $\mathrm{SNR}_\mathrm{noise}$, and triangle circumradius $r$. Finally, results are discussed in Sec. 4, which concludes the paper.

Considered Contrast Enhancement Algorithms
Several methods for CE have been proposed within the past decades. Various modifications to conventional global histogram equalization have been proposed to counteract mean brightness shifts, 11,12 which lead to annoying artifacts, and to allow smooth transitions to identity. 25,26 Learning-based methods 33-37 as well as methods based on image decomposition 38,39 have also been proposed. The decomposition often relies on color information, making these methods inapplicable to single-channel image data. The scope of this work is therefore limited to easily implemented algorithms, given in Table 1, that operate on single-channel image data.

Data Generation
For the training of models for TOD, 1 images of triangles are generated with varying contrasts, sizes, and four orientations, i.e., up, down, left, and right. Misalignment angles uniformly distributed in [−15 deg, 15 deg] are added to the orientation angles of the triangles to make the models more robust to misalignment. Exceeding the maximum rotation angle of 15 deg would lead to incorrect labeling because a rotation of an equilateral triangle by 30 deg results in a differently labeled orientation due to the 120 deg rotational symmetry. This rotation is crucial when applying models to real camera data because some misalignment between the field of view of a camera and a target is unavoidable.
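The 15 deg limit can be checked with a short geometric sketch. The code below is only an illustration and not the data generator used in this work: the names `ORIENTATIONS`, `rotation_distance`, and `nearest_label` are hypothetical, and angles are assumed to be measured counterclockwise with 0 deg pointing right.

```python
# Illustrative sketch (not the authors' generator): why misalignment must stay below 15 deg.
# Assumed convention: apex angle measured counterclockwise, 0 deg = right.
ORIENTATIONS = {"right": 0.0, "up": 90.0, "left": 180.0, "down": 270.0}


def rotation_distance(apex_a_deg, apex_b_deg):
    """Smallest rotation (deg) that maps one equilateral triangle onto another.

    Because of the 120 deg rotational symmetry, apex angles are compared
    modulo 120 deg, so the result lies in [0, 60].
    """
    return abs((apex_a_deg - apex_b_deg + 60.0) % 120.0 - 60.0)


def nearest_label(label, misalignment_deg):
    """Label whose canonical triangle is closest to the misaligned triangle."""
    apex = ORIENTATIONS[label] + misalignment_deg
    return min(ORIENTATIONS, key=lambda name: rotation_distance(apex, ORIENTATIONS[name]))


if __name__ == "__main__":
    for misalignment in (0.0, 14.0, 16.0, 30.0):
        print(misalignment, nearest_label("up", misalignment))
```

Canonical triangles of neighboring labels differ by a rotation of only 30 deg, so a misaligned triangle stays closest to its own label only while the misalignment is below half of that, i.e., 15 deg.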

Background Overlay
Background images are extracted from OpenImages V6 8 as random square crops. RGB images are converted to floating-point grayscale images. The mean $\mu_\mathrm{crop}$ and standard deviation $\sigma_\mathrm{crop}$ are calculated within these crops. Then, the gray levels $I_\mathrm{triangle+overlay}(x, y)$ of an image with a single triangle and background overlay are calculated as
$$I_\mathrm{triangle+overlay}(x, y) = c_\mathrm{background} + \sigma_\mathrm{set}\,\frac{I_\mathrm{crop}(x, y) - \mu_\mathrm{crop}}{\hat{\sigma}_\mathrm{crop}} + \begin{cases} a_\mathrm{triangle}, & (x, y)\ \text{inside the triangle} \\ 0, & \text{otherwise.} \end{cases} \quad (1)$$
$c_\mathrm{background}$ is a constant gray level over the entire image. $a_\mathrm{triangle}$ is an offset value added only to pixels belonging to the triangle. In Eq. (1), the pixel values of the image crop $I_\mathrm{crop}(x, y)$ are normalized by subtracting the mean $\mu_\mathrm{crop}$ and dividing by the corrected standard deviation $\hat{\sigma}_\mathrm{crop}$ of Eq. (2), with $\mathrm{thres} = 10^{-4}$. Without this correction, Eq. (1) is not well-defined for uniform image sections because in this case the denominator would be $\sigma_\mathrm{crop} = 0$. Then, the normalized gray levels related to the cropped natural background are scaled to have a specific standard deviation $\sigma_\mathrm{set}$. Therefore, the signal-to-background ratio is expressed as
$$\mathrm{SNR}_\mathrm{background}\,[\mathrm{dB}] = 10 \log_{10} \frac{a_\mathrm{triangle}}{\sigma_\mathrm{set}}. \quad (3)$$
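The overlay step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `overlay_background` is hypothetical, the corrected standard deviation of Eq. (2) is assumed here to be a simple lower bound `max(sigma_crop, thres)`, and gray levels are assumed to lie in a dynamic range of [0, 1].

```python
import numpy as np


def overlay_background(crop, triangle_mask, a_triangle, snr_background_db,
                       c_background=0.5, thres=1e-4):
    """Superimpose a triangle on a normalized, rescaled background crop.

    crop:          2D grayscale background crop (float)
    triangle_mask: 2D array, 1 inside the triangle and 0 elsewhere
    """
    crop = np.asarray(crop, dtype=np.float64)
    mu_crop = crop.mean()
    sigma_crop = crop.std()
    # ASSUMPTION: the exact form of the correction in Eq. (2) is not reproduced
    # here; a lower bound avoids division by zero for uniform crops.
    sigma_corrected = max(sigma_crop, thres)
    # Eq. (3) solved for the background standard deviation sigma_set.
    sigma_set = a_triangle / 10.0 ** (snr_background_db / 10.0)
    background = sigma_set * (crop - mu_crop) / sigma_corrected
    return c_background + background + a_triangle * np.asarray(triangle_mask, dtype=np.float64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.zeros((64, 64))
    mask[20:40, 20:40] = 1.0  # square placeholder instead of a real triangle mask
    image = overlay_background(rng.random((64, 64)), mask, a_triangle=0.2, snr_background_db=3.0)
    print(image.shape, image.mean())
```

With this parameterization, a lower $\mathrm{SNR}_\mathrm{background}$ directly translates into a larger background standard deviation at fixed triangle contrast.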

Degradations
Several image degradations representing typical camera effects are applied. Temporal noise is applied as uncorrelated additive Gaussian noise. Fixed pattern noise of a sensor is modeled as line- and column-based additive Gaussian noise. Linear motion blur on the triangle is applied to represent moving targets. Stabilization errors due to camera vibration are applied as linear motion blur and Gaussian blur on the triangle with a background overlay. Blur due to optical diffraction by circular apertures is applied by the filters of Eq. (4), which are functions of a radial coordinate $r$. These filters represent Airy disks 40 as the diffraction patterns of a circular aperture. $J_1(x)$ is the Bessel function of the first kind and first order. To provide optical diffraction blur for varying values of aperture diameter $D$, detector pixel pitch $p$, wavelength $\lambda$, and focal length $f$, a dimensionless scaling factor $s$ is introduced in Eq. (5). A variety of physical parameters ($\lambda$, $f$, $D$, and $p$) can be realized by random sampling of $s$ from a uniform distribution in $[0.1, s_\mathrm{max}]$. $s_\mathrm{max} = 10$ is chosen to limit the proportion of severely degraded images, which hamper the model training because the possible accuracy gains are small compared with statistical fluctuations. Optical diffraction blur is applied by spatial filtering with random 2D kernels of width and height $K = 6 s_\mathrm{max}$. The kernel size $K$ is limited due to the lack of information beyond the borders of images of finite size. The scaling factor $s \le s_\mathrm{max}$ is therefore also limited because higher values lead to radial kernel profiles biased by clipping effects due to the limited kernel size. To reduce aliasing due to the oscillatory form of the Airy disk [Eq. (4)], larger kernels $f_{ij}$ of $f_\mathrm{os} K \times f_\mathrm{os} K$ pixels are generated with an oversampling factor $f_\mathrm{os} = 8$. These extended kernels are downsampled by average pooling with this oversampling factor to give $K \times K$ kernels $g_{ij}$.
Normalized filter kernels are then formed as
$$\hat{g}_{ij} = \frac{g_{ij}}{\sum_{k,l} g_{kl}}. \quad (6)$$
Several aliasing effects may occur due to small detector fill factors $ff < 1$ (the ratio between the detector dimension and pitch) or different shapes of detector footprints, e.g., rhombic or circular. These effects can be realized by masking extended kernels of Airy disks $f_{ij}$ with the detector profile before average pooling. However, this option is not used in this work due to the rare use of nonsquare detectors, low signal-to-noise ratio for low fill factors, 41 and faster image generation.
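The kernel construction (oversampling, average pooling, and normalization) can be sketched as below. This is an illustration under assumptions, not the implementation used in this work: the Airy profile is taken in its standard form $[2 J_1(x)/x]^2$, the argument is assumed to be $x = r/s$ with $r$ in detector pixels, and the helper names `bessel_j1` and `airy_kernel` are hypothetical. To keep the example light, the default is $s_\mathrm{max} = 2$ rather than 10, and $J_1$ is evaluated from Bessel's integral so that only NumPy is needed.

```python
import numpy as np


def bessel_j1(x, n_tau=1024):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(tau - x sin tau) dtau (midpoint rule)."""
    x = np.asarray(x, dtype=np.float64)
    tau = (np.arange(n_tau) + 0.5) * np.pi / n_tau
    return np.cos(tau[None, :] - x[:, None] * np.sin(tau)[None, :]).mean(axis=1)


def airy_kernel(s, s_max=2.0, f_os=8):
    """Normalized K x K Airy-disk kernel with K = 6 * s_max detector pixels."""
    k = int(round(6 * s_max))
    n = f_os * k
    # Centers of the oversampled sub-pixels, in units of detector pixels.
    coords = (np.arange(n) + 0.5) / f_os - k / 2.0
    xx, yy = np.meshgrid(coords, coords)
    x = np.hypot(xx, yy) / s  # ASSUMPTION: radial argument scaled as r / s
    # Tabulate the radial profile once and interpolate it onto the 2D grid.
    x_grid = np.linspace(0.0, x.max(), 1024)
    profile = np.ones_like(x_grid)  # limit of [2 J1(x) / x]^2 for x -> 0 is 1
    profile[1:] = (2.0 * bessel_j1(x_grid[1:]) / x_grid[1:]) ** 2
    f = np.interp(x.ravel(), x_grid, profile).reshape(n, n)
    # Average pooling with the oversampling factor gives the K x K kernel g_ij.
    g = f.reshape(k, f_os, k, f_os).mean(axis=(1, 3))
    return g / g.sum()  # normalization to unit sum, as in Eq. (6)


if __name__ == "__main__":
    kernel = airy_kernel(s=1.0)
    print(kernel.shape, kernel.sum())
```

Pooling the oversampled profile instead of sampling it once per pixel averages the oscillating side lobes over each detector pixel, which is the aliasing reduction described above.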

Contrast Enhancement
The algorithms in Table 1 are applied to 50% of the degraded images with equal probabilities. Due to the complex and divergent control flow of some algorithms, these methods are implemented and calculated by a separate application on the CPU. In Fig. 1, pristine training examples, as well as degraded and ADSP-processed ones, are shown. These training examples are generated online during training to provide a practically unlimited amount of training data, which reduces the risk of overfitting. However, the source of background images and the number of possible crops are large but finite.
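The selection rule can be sketched in a few lines. The function name `sample_ce_algorithm` is hypothetical, and the algorithms are represented only by their index into Table 1; the CE implementations themselves are not part of this sketch.

```python
import random

N_ALGORITHMS = 25  # number of CE algorithms listed in Table 1


def sample_ce_algorithm(rng, p_enhance=0.5, n_algorithms=N_ALGORITHMS):
    """Return None (identity) or the index of a CE algorithm chosen uniformly."""
    if rng.random() >= p_enhance:
        return None  # image is left without contrast enhancement
    return rng.randrange(n_algorithms)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_ce_algorithm(rng) for _ in range(10)]
    print(draws)
```

Each individual algorithm is therefore applied to about 2% of the training images (50% divided by 25).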

Model Setup
A conventional CNN architecture shown in Fig. 2 is used for TOD classification on images of dimensions $2^n \times 2^n$. To facilitate the model training, the input image is normalized by linear shifting and scaling of pixel values to have a mean of 0 and a standard deviation of 1 over spatial dimensions. Uniform input images with a standard deviation of 0 are not scaled. Then, the normalized image is downsampled by a chain of building blocks until the spatial dimensions are reduced to 2. Each building block consists of two 2D convolutional layers with 3 × 3 kernels and rectified linear unit activations (ReLUs) and a subsequent 2 × 2 max pooling layer. Hence, downsampling by a factor of 2 is applied per block. The spatial dimensions are reduced, and the number of feature maps, given as
$$n_\mathrm{feature\text{-}maps} = \lfloor N_0 \cdot g^i \rfloor, \quad (7)$$
increases for each block $i \ge 0$. $\lfloor \cdot \rfloor$ is the floor function yielding the largest integer that does not exceed the argument. Two subsequent dense layers and a final softmax layer provide the probabilities for the four orientations. A default configuration with the initial number of filters $N_0 = 20$, growth factor $g = 1.2$, and first dense layer size $L = 1024$ is arbitrarily chosen.
Fig. 2 [...] image is fed into a chain of blocks. Each block, consisting of two 2D convolutional layers with ReLU activation and a subsequent max pooling layer, downsamples the spatial dimensions while increasing the number of feature maps. The last block is followed by two dense layers and a final softmax layer to provide probabilities for all directions (left, up, right, and down).
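As a worked example of Eq. (7), the following sketch lists the blocks of the default configuration for a 64 × 64 input. The function name `block_plan` is hypothetical; only the block count and feature-map numbers follow from the text, and the sketch does not build an actual model.

```python
import math


def block_plan(image_size=64, n0=20, growth=1.2):
    """Return (block index, feature maps, output size) for each building block."""
    if image_size < 4 or image_size & (image_size - 1):
        raise ValueError("image_size must be a power of two >= 4")
    plan = []
    size, i = image_size, 0
    while size > 2:
        # Eq. (7); note that floating-point rounding could matter when
        # n0 * growth**i lies extremely close to an integer.
        n_maps = math.floor(n0 * growth ** i)
        size //= 2  # 2 x 2 max pooling halves the spatial dimensions
        plan.append((i, n_maps, size))
        i += 1
    return plan


if __name__ == "__main__":
    for index, n_maps, size in block_plan():
        print(f"block {index}: {n_maps} feature maps, output {size} x {size}")
```

For the default configuration this gives five blocks with 20, 24, 28, 34, and 41 feature maps, ending at a spatial size of 2 × 2.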

Model Training
The models are trained and evaluated using Python 3.9 and TensorFlow 2.8. For optimization, Adam 42 with a learning rate $\eta = 0.001$ is used. Weights are initialized by He normal initialization. 43 Models are trained for $N = 1000$ iterations. Despite slower training, techniques for acceleration of training, such as batch normalization, 44 weight normalization, 45 or adaptive gradient clipping, 46 were deliberately omitted to achieve smaller models with faster inference, which are compatible with edge TPUs. 47 The loss function is cross entropy.
In each iteration, new sample images of triangles with background overlays are generated on the fly during the training for data augmentation. Triangles have a random size, position, and one of the four orientations with a random misalignment angle in [−15 deg, 15 deg]. In addition, these images are impaired by the prescribed degradations. 50% of the training images are enhanced with one of the 25 methods for CE with equal probabilities. Corresponding disjoint sets of background images are randomly chosen from the respective partition of the OpenImages V6 database. For evaluation of model performance over the degradation parameters, i.e., $\mathrm{SNR}_\mathrm{background}$, $\mathrm{SNR}_\mathrm{noise}$, and triangle size, images are generated in the same way, with background images chosen from the test subset of the database.
The question arises how large the percentage of degraded and ADSP-processed images in the training data should be to obtain acceptable accuracies on validation sets of pristine and degraded images. 64 × 64 models with different percentages of degraded and processed images in the training data were trained and evaluated on different kinds of validation data. The respective accuracies on the validation data are shown in Fig. 3. It can be observed that models trained only with pristine images perform very poorly on degraded and ADSP-processed images with natural backgrounds. A slight increase of the percentage significantly raises the validation accuracies on degraded imagery. On the other hand, models trained with a high percentage of degraded images still perform very well on pristine images. Therefore, to make the best use of the model capacity for the image degradations and ADSP methods, all models mentioned below are trained with 100% degraded images, whereas ADSP is applied with 50% probability.
Fig. 3 Dependency of the validation accuracy on the composition of training sets. Horizontal: percentage of degraded images in the training data. Vertical: accuracies on different validation sets, each with $N = N_\mathrm{orientations} \cdot N_\mathrm{backgrounds} \cdot N_\mathrm{samples} = 400{,}000$ images, with $N_\mathrm{backgrounds} = 1000$, $N_\mathrm{orientations} = 4$, and $N_\mathrm{samples} = 100$. "All degradations" include images enhanced with one of the algorithms in Table 1.

Dependency of Accuracy on Background Variance
A trained 64 × 64 model is validated on images of a fixed target, a centered triangle with a circumradius of $r = 10$ pixel. The triangle circumradius $r$ is converted to the often-used square root area $S = \sqrt{A}$ 1 using the Pythagorean theorem:
$$S = \sqrt{A} = \sqrt{\frac{3\sqrt{3}}{4}}\; r \approx 1.14\, r. \quad (8)$$
1000 random crops of different background images from OpenImages V6 8 were used. The background standard deviation $\sigma_\mathrm{background}$ of gray levels was varied to have different $\mathrm{SNR}_\mathrm{background} \in \{0, 1, 2, 3, 4, 5, 10, 20\}$ dB. White Gaussian noise was added to have $\mathrm{SNR}_\mathrm{noise} \in \{0, 5, 10\}$ dB, where
$$\mathrm{SNR}_\mathrm{noise}\,[\mathrm{dB}] = 10 \log_{10} \frac{a_\mathrm{triangle}}{\sigma_\mathrm{noise}}, \quad (9)$$
and $a_\mathrm{triangle}$ is about one-sixth of the dynamic range. In Fig. 4, accuracies over $\mathrm{SNR}_\mathrm{background}$ for different $\mathrm{SNR}_\mathrm{noise}$ and corresponding example images for the lowest $\mathrm{SNR}_\mathrm{noise} = 0$ dB are shown.
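The two conversions used here can be written out as a short numerical example. The function names are hypothetical; the geometry is that of an equilateral triangle with circumradius $r$, and the decibel convention is the $10 \log_{10}$ form stated in Eq. (9).

```python
import math


def sqrt_area_from_circumradius(r):
    """Square root area S = sqrt(A) of an equilateral triangle with circumradius r."""
    side = math.sqrt(3.0) * r  # side length of the equilateral triangle
    height = 1.5 * r           # height, from the Pythagorean theorem
    area = 0.5 * side * height
    return math.sqrt(area)


def sigma_from_snr_db(a_triangle, snr_db):
    """Standard deviation that yields the requested SNR in dB for contrast a_triangle."""
    return a_triangle / 10.0 ** (snr_db / 10.0)


if __name__ == "__main__":
    print(round(sqrt_area_from_circumradius(10.0), 2))  # about 11.4 pixel
    for snr_db in (0.0, 5.0, 10.0):
        print(snr_db, sigma_from_snr_db(1.0 / 6.0, snr_db))
```

For the fixed target with $r = 10$ pixel, the square root area is about 11.4 pixel, and at 0 dB the noise standard deviation equals the triangle contrast.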
Obviously, accuracies are about 100% for $\mathrm{SNR}_\mathrm{background} \ge 5$ dB and $\mathrm{SNR}_\mathrm{noise} \ge 5$ dB. The accuracies drop monotonically with decreasing $\mathrm{SNR}_\mathrm{background}$. A similar behavior can be observed for varying triangle sizes, as shown in Fig. 5. As expected, the accuracies also drop for decreasing triangle sizes. Further variation of the relative triangle position in the subpixel range shows high fluctuations of accuracies for a low triangle circumradius of $r = 1$ pixel. This finding is consistent with the problem of recognition near the resolution limit mentioned before. 1

Upscaling of Receptive Field
Models according to the CNN architecture shown in Fig. 2 were trained for various resolutions of receptive fields, i.e., 128 × 128, 256 × 256, 512 × 512, and 1024 × 1024. For 512 × 512 and 1024 × 1024, a reduction of the learning rate to $\eta = 10^{-4}$ was required to achieve significant model improvements compared with the initial model states. Otherwise, model performances stagnated at the guessing rate of 25% on average. In Fig. 6, validation accuracies over $\mathrm{SNR}_\mathrm{background}$ are shown for different sizes of receptive fields.
There is a general trend of decreasing accuracies for larger receptive fields. This may be because larger receptive fields can contain more objects with high similarity to triangles. Furthermore, the growth factor $g = 1.2$ for the number of feature maps per block $n_\mathrm{feature\text{-}maps}$ may be too small to provide enough model capacity for the increasing range of triangle sizes to handle. A surprising fact is the lower accuracy for the higher $\mathrm{SNR}_\mathrm{noise} = 20$ dB compared with $\mathrm{SNR}_\mathrm{noise} = 5$ dB for receptive fields of 256 × 256 pixels and larger. This might indicate beneficial properties of Gaussian noise, which may suppress structures in the background resembling the triangle target.

Comparison of Methods for Contrast Enhancement
A trained 64 × 64 model was validated by images processed with the 25 ADSP algorithms shown in Table 1. Many algorithms only operate on integer pixel values based on gray level distributions, which are widespread for natural images. Hence, to prevent saturation due to clipping, the pixel values are shifted to have a mean of half of the dynamic range, upscaled by a factor of $2^8 - 1 = 255$, and converted to 8 bit. After ADSP processing, this shifting and upscaling is reverted, and pixel values are converted to floating point numbers.
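The conversion to 8 bit and back can be sketched as follows. This is an illustration under assumptions rather than the code used in this work: the helper names `to_uint8` and `from_uint8` are hypothetical, and floating-point gray levels are assumed to use a dynamic range of [0, 1].

```python
import numpy as np


def to_uint8(image):
    """Shift the mean to half of the dynamic range, scale by 255, and quantize."""
    image = np.asarray(image, dtype=np.float64)
    shift = 0.5 - image.mean()  # ASSUMPTION: dynamic range is [0, 1]
    scaled = (image + shift) * 255.0
    quantized = np.clip(np.rint(scaled), 0, 255).astype(np.uint8)
    return quantized, shift


def from_uint8(quantized, shift):
    """Revert the scaling and shifting after ADSP processing."""
    return quantized.astype(np.float64) / 255.0 - shift


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = 0.3 + 0.05 * rng.standard_normal((64, 64))
    q, shift = to_uint8(image)
    restored = from_uint8(q, shift)  # a CE algorithm would act on q before this step
    print(q.dtype, np.abs(restored - image).max())
```

Without any enhancement in between, the round trip changes each pixel by at most half a quantization step, i.e., 0.5/255 of the dynamic range, as long as no values are clipped.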
In Fig. 7, differences of accuracies between single CE algorithms and the identity are shown for varying $\mathrm{SNR}_\mathrm{background}$ and $\mathrm{SNR}_\mathrm{noise}$. For convenience, the algorithms were ranked with respect to the maximum value, and only the top 10 algorithms are shown.
It can be observed that there are accuracy gains for low $\mathrm{SNR}_\mathrm{background} < 5$ dB, and the accuracy differences are quite similar for the top 10 algorithms. Because the accuracy is about 100% for $\mathrm{SNR}_\mathrm{background} \ge 5$ dB and $\mathrm{SNR}_\mathrm{noise} \ge 5$ dB without ADSP processing according to Fig. 4, no significant improvement by CE can be expected for these cases. By contrast, a severe degradation of model performance occurs for some CE algorithms and high $\mathrm{SNR}_\mathrm{noise} = 20$ dB.
To validate CE algorithms on a variety of degradations, different triangle parameters and degradation parameters were varied and uniformly distributed in the ranges given in Table 2.
$N_\mathrm{targets} = 1000$ random samples of triangles and degradation parameters were combined with $N_\mathrm{backgrounds} = 1000$ natural backgrounds as random crops from Open Images V6. 8 This procedure was repeated $N_\mathrm{chunks} = 10$ times with varying random seeds, resulting in different triangles and background images. Model accuracies were calculated on $N_\mathrm{total} = N_\mathrm{chunks} \cdot N_\mathrm{targets} \cdot N_\mathrm{backgrounds} = 10^7$ images with 64 × 64 pixels. The same procedure was repeated with each of the 25 CE algorithms from Table 1 applied as the final step. Compared with grid variation of individual triangle and degradation parameters, random sampling of many of these parameters allows for investigation of individual parameters by arbitrary parameter cuts, whereas other parameters are widely distributed. This gives better insights into possible fluctuations of model performance when those parameters are unknown.
As shown in Fig. 8, accuracy differences between each of the 25 CE algorithms and identity were calculated for parameter cuts of the triangle circumradius $r$ and the signal-to-background ratio $\mathrm{SNR}_\mathrm{background}$. Only accuracies on images with values in $r \in [1, 3]$ pixel (top), $r \in [14, 16]$ pixel (bottom), $\mathrm{SNR}_\mathrm{background} \in [0, 2]$ dB (left), and $\mathrm{SNR}_\mathrm{background} \in [18, 20]$ dB (right) were selected. The interquartile ranges (IQRs), shown as orange boxes, contain values between the 25th and the 75th percentile. The whiskers extend the boxes by at most 1.5 IQR on each side, but they are limited by the respective minimal and maximal values in the data. Outliers are shown as circle markers. Evidently, a high $\mathrm{SNR}_\mathrm{background}$ and a low triangle circumradius $r$ lead to significant impairment of model accuracies by most of the 25 CE algorithms. Accuracy differences at high triangle circumradius $r$ and high $\mathrm{SNR}_\mathrm{background}$ (bottom right) show small IQRs, as the accuracy is often saturated at 100% for high $\mathrm{SNR}_\mathrm{noise}$. Hence, accuracies for low $\mathrm{SNR}_\mathrm{noise}$ are rendered as outliers. Only for high $r \in [14, 16]$ pixel and low $\mathrm{SNR}_\mathrm{background} \in [0, 2]$ dB (bottom left) do some CE algorithms show accuracy gains. Also, monotonic transitions of accuracy differences for varying $r$ and $\mathrm{SNR}_\mathrm{background}$ were observed. The reason for the significant impairment at high $\mathrm{SNR}_\mathrm{background}$ could be that a narrow gray level distribution of background values leads to excessive enhancement of the background by most CE algorithms, resulting in textures with a low dynamic range, steep edges, and a high similarity to the triangle to be discriminated. On the other hand, a large triangle reduces the number of background pixels and hence their contribution to the gray level distribution of the entire image. Most of the investigated CE algorithms depend on the image gray level distribution.
Fig. 8 Whisker-box plots of accuracy differences of the 64 × 64 model between 25 CE algorithms and identity based on varying cuts for (a) triangle circumradius $r \in [1, 3]$ pixel, (b) $r \in [14, 16]$ pixel, $\mathrm{SNR}_\mathrm{background} \in [0, 2]$ dB (left), and $\mathrm{SNR}_\mathrm{background} \in [18, 20]$ dB (right). Insets show examples of centered triangles for $r$, $\mathrm{SNR}_\mathrm{background}$ in the corresponding ranges with $\mathrm{SNR}_\mathrm{noise} = \infty$. For each algorithm, means (green triangle marker), medians (red lines), and IQRs (orange boxes) of accuracy differences for the counts in the respective parameter intervals are shown (further details in the text). At high $\mathrm{SNR}_\mathrm{background}$ and low triangle circumradius $r$, the 25 CE algorithms lead to significant impairment of the model accuracies.
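The box-plot statistics described in the text can be reproduced with the Python standard library. The sketch below uses a hypothetical helper `box_stats` and assumes linearly interpolated ("inclusive") quartiles; the quartile convention behind Fig. 8 is not stated in the text, and the sample values are made up for illustration.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers as used for whisker-box plots."""
    data = sorted(values)
    # ASSUMPTION: linearly interpolated quartiles that include the extremes.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme data values inside the fences.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Made-up accuracy differences (percentage points), for illustration only.
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Because the whiskers stop at actual data values, a tightly saturated distribution produces a very small box, and the few deviating values appear as outlier markers.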

Generalization on Background Images of Different Image Databases
To investigate the ability of the TOD model to generalize to a larger variety of background images, the trained 64 × 64 model was validated on artificial images with a single centered triangle with a fixed circumradius of $r = 10$ pixel superposed with background images resulting from random crops of images from different image databases. Examples of such image crops with 64 × 64 pixels are shown in Fig. 9 for different image databases: Pascal VOC, 48 ILSVRC2012, 49 FLIR ADAS, 50 OpenImages V6, 8 Stanford dogs, 51 Oxford flowers 102, 52 Caltech 101, 53 and Gaussian noise.
In Fig. 10, the model accuracies over $N = N_\mathrm{direction} \cdot N_\mathrm{background}$ images and different $\mathrm{SNR}_\mathrm{background}$ are shown, with $N_\mathrm{direction} = 4$ triangle orientations and $N_\mathrm{background} = 1000$ different backgrounds from several image databases. In addition, the generated artificial images are impaired by Gaussian noise with a high noise level $\mathrm{SNR}_\mathrm{noise} = 0$ dB (left) and a low noise level $\mathrm{SNR}_\mathrm{noise} = 20$ dB (right). It can be observed that model accuracies are very similar for most of the image databases. In contrast, background images of Gaussian noise yield significantly better accuracies than those of the image databases. Images of FLIR ADAS 50 show accuracies between those of Gaussian noise and the image databases, which may be due to the relatively high noise content in the FLIR ADAS images. This fact indicates that structured backgrounds from the image databases have a higher degradational effect on the triangle recognition than Gaussian noise for equal standard deviations of pixel value fluctuations.
Fig. 9 Examples of image crops from different image databases: Pascal VOC, 48 ILSVRC2012, 49 FLIR ADAS, 50 OpenImages V6, 8 Stanford dogs, 51 Oxford flowers 102, 52 Caltech 101, 53 and Gaussian noise. Images are converted to single-channel data by averaging over channels of RGB data. For clarity, example images are shown using the colormap "viridis," and pixel values are scaled to have a mean that equals the center of the dynamic range and a standard deviation of one-fourth of the dynamic range. Compared with other image databases, FLIR ADAS contains images with a high noise content.
The qualitative behavior of the model accuracy is similar for different databases when applying the methods for CE. The same is true for the ranges of $\mathrm{SNR}_\mathrm{background}$ values for which CE yields an improvement in accuracy. For convenience, only an example for applying CLAHE is shown in Fig. 11.

Variation of Model Size
The architecture used so far was an arbitrary initial choice. One might ask whether similar or better accuracies could have been achieved by smaller or larger models. To answer this question, further models were trained based on the default configuration (Sec. 2.6) with modifications of single parameters, i.e., the initial number of filters $N_0$, the growth factor $g$, the dense layer size $L$, and the number of dense layers. In Fig. 12, validation accuracies for varying initial number of filters $N_0$, growth factor $g$, dense layer size $L$, and number of extra dense layers in addition to the final dense layer with four units are shown. Models are validated on images with $N_\mathrm{orientations} = 4$ orientations, $N_\mathrm{background} = 1000$ backgrounds, and $N_\mathrm{sample} = 100$ samples, resulting in $N_\mathrm{total} = N_\mathrm{orientations} \cdot N_\mathrm{background} \cdot N_\mathrm{sample} = 400{,}000$ images. Accuracies denoted with "all degradations" contain images enhanced by one of the CE algorithms (Table 1) with equal probabilities.

Model Complexity
We benchmarked our trained TOD model for 64 × 64 pixels on our machine (a Ryzen 9 3900X processor with an NVIDIA GeForce RTX 2080Ti graphics card and 64 GB RAM). Table 3 gives the average running time on the NVIDIA GeForce RTX 2080Ti, as well as the number of parameters and floating point operations (FLOPs), which equals twice the number of multiply-accumulate computations (MACs).
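How parameter and FLOP counts of this kind are obtained can be illustrated with a rough count for the default configuration. The sketch below is not the source of the values in Table 3: the function name `count_model` is hypothetical, and it assumes a single-channel input, zero-padded ("same") 3 × 3 convolutions with biases, and one dense layer of size $L$ followed by the final four-unit layer. Pooling, activations, and softmax are neglected.

```python
import math


def count_model(image_size=64, n0=20, growth=1.2, dense_units=1024, n_classes=4):
    """Rough per-layer parameter and MAC counts for the described CNN layout."""
    layers = []
    size, c_in, i = image_size, 1, 0
    while size > 2:
        c_out = math.floor(n0 * growth ** i)  # Eq. (7)
        for _ in range(2):  # two 3 x 3 convolutions per block
            params = (3 * 3 * c_in + 1) * c_out          # weights plus biases
            macs = 3 * 3 * c_in * c_out * size * size    # ASSUMPTION: "same" padding
            layers.append(("conv3x3", params, macs))
            c_in = c_out
        size //= 2  # 2 x 2 max pooling
        i += 1
    features = size * size * c_in
    for units in (dense_units, n_classes):
        layers.append(("dense", (features + 1) * units, features * units))
        features = units
    total_params = sum(p for _, p, _ in layers)
    total_macs = sum(m for _, _, m in layers)
    return layers, total_params, total_macs, 2 * total_macs  # FLOPs = 2 * MACs


if __name__ == "__main__":
    _, n_params, n_macs, n_flops = count_model()
    print(n_params, n_macs, n_flops)
```

In such a layout the first blocks dominate the MAC count because they operate at full resolution, whereas the first dense layer holds a large share of the parameters.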
The trained TOD models are smaller and faster compared with current machine vision backbones, which are also shown in Table 3. Faster model inference allows for a stronger focus on several image degradations. Compared with the classification of RGB images in the visible spectrum, the TOD models are applicable on single-channel data, and the four triangle classes are symmetric and balanced. Furthermore, the triangle shape and texture are independent of any spectral band, in contrast to many image databases in the visible band. This is crucial, e.g., for range performance assessment of imagers in several infrared spectral bands [long-wavelength infrared (LWIR), mid-wavelength infrared (MWIR), and short-wavelength infrared (SWIR)].

Comparison with Other Image Quality Metrics
Different image quality metrics were proposed for assessment of methods for CE, such as absolute mean brightness error (AMBE), 20 discrete entropy, 12 measure of enhancement (EME) and EME based on entropy (EMEE), 65 QRCM, 66 UIQ, 67 EBCM, 68 and CII. 29 A more detailed overview of further image quality metrics and methods for CE can be found in another work. 69 However, most of these metrics were validated by subjective image quality assessments and may not correlate well with accuracies of models for TOD recognition or other machine vision tasks. To investigate some current nonreference metrics on images used in the evaluation of TOD models, these metrics were calculated for 64 × 64 images with a centered triangle superposed by backgrounds taken from OpenImages V6 scaled to different $\mathrm{SNR}_\mathrm{background}$ and impaired by Gaussian noise with different $\mathrm{SNR}_\mathrm{noise}$. In addition, these images were enhanced by three CE methods, CLAHE, 13 EHS, 17 and SUACE, 28 which were among the top 10 algorithms in Fig. 7. In Fig. 13, values for the nonreference image quality metrics EBCM, 68 EME, EMEE, 65 and entropy 12 over $\mathrm{SNR}_\mathrm{background}$ are shown.
It can be observed that metric values $Q_0$ are high for low $\mathrm{SNR}_\mathrm{background}$ and low $\mathrm{SNR}_\mathrm{noise}$, representing high variances of background and Gaussian noise, respectively. The metric values $Q_0$ decrease monotonically with increasing $\mathrm{SNR}_\mathrm{background}$ and $\mathrm{SNR}_\mathrm{noise}$. The only exception is EBCM for $\mathrm{SNR}_\mathrm{noise} = 20$ dB, which we assume is due to the regularization of denominators in the algorithm. Therefore, low metric values represent conditions under which TOD accuracies are high. However, very similar metric values $Q_0$ could also be observed when the triangle was omitted. This indicates that these metrics are mainly determined by the background for triangles with $r \le 10$ pixel. Thus, the metrics are only weakly related, or not related at all, to the TOD task performance if the triangle is very small. Enhancement by the three CE methods generally leads to positive shifts of the metric values $Q_\mathrm{CE}$, in other words $\Delta Q_\mathrm{CE} = Q_\mathrm{CE} - Q_0 > 0$, which can be interpreted as predominant enhancement of the background and noise, which hampers TOD recognition. Similar results were found for the evaluation of the full-reference metrics AMBE, 20 CII, 29 QRCM, 66 and UIQ. 67 In summary, the metrics can be meaningful for the assessment of CE, but they cannot provide insights into whether the CE is beneficial for TOD recognition and possibly other classification tasks with small targets.
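Two of the simpler metrics named above can be written down directly, which makes it plausible why they carry little information about a small target. The sketch uses the common textbook definitions of discrete entropy and AMBE on flattened 8-bit gray levels; the function names are hypothetical, and the exact implementations evaluated in this work may differ.

```python
import math
from collections import Counter


def discrete_entropy(pixels):
    """Shannon entropy (bits) of the gray level histogram of a flattened image."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def ambe(original, enhanced):
    """Absolute mean brightness error between an image and its enhanced version."""
    return abs(sum(original) / len(original) - sum(enhanced) / len(enhanced))


if __name__ == "__main__":
    image = [10, 10, 200, 200, 10, 200, 10, 200]
    print(discrete_entropy(image))          # two equally frequent gray levels
    print(ambe(image, [v + 5 for v in image]))
```

Both quantities depend only on the gray level histogram or its mean, not on where the pixels are located, so they cannot distinguish triangle orientations and are dominated by the background when the triangle covers few pixels.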

Conclusion
Accuracies of a sequential CNN model performing TOD were compared with respect to 25 different methods for CE. The background overlay was crucial because the accuracy is significantly impaired for high background variance and the CE algorithms strongly depend on it. Accuracy gains for low signal-to-background ratios $\mathrm{SNR}_\mathrm{background} < 5$ dB and a sufficiently large triangle with $r = 10$ pixel were shown. Model accuracies on images with randomly sampled triangle and degradation parameters revealed significant impairment by the investigated CE algorithms for a high $\mathrm{SNR}_\mathrm{background}$ and low triangle circumradius $r$. The strong fluctuations of accuracy differences highlight the difficulty in showing clear superiority of individual algorithms. Models with increased resolution of the receptive field have shown decreasing accuracies, which may indicate that the growth of the number of model parameters was insufficient to represent the increasing range of triangle sizes. Another reason may be a higher number of background artifacts mimicking triangles. Larger images have more pixels. Therefore, their gray level distributions are statistically more stable. Hence, CE algorithms based on these gray level distributions should provide lower variations in the processed images and the associated accuracies. To test this hypothesis, further investigations on larger receptive fields are required.
Fig. 13 Means of nonreference metric values $Q_0$ (EBCM, 68 EME, 65 EMEE, 65 and entropy, 12 left column) and metric differences $\Delta Q_\mathrm{CE} = Q_\mathrm{CE} - Q_0$ for three CE methods, i.e., CLAHE, 13 EHS, 17 and SUACE 28 (right columns) over $N_\mathrm{background} = 1000$ images of 64 × 64 pixels with a centered triangle with circumradius $r = 10$ pixel for different $\mathrm{SNR}_\mathrm{background}$ (dB) and $\mathrm{SNR}_\mathrm{noise}$ (dB). $\Delta Q_\mathrm{CE} = 0$ is shown as a horizontal blue dashed line. Example images with centered triangles superposed with background and impaired by Gaussian noise with $\mathrm{SNR}_\mathrm{noise} = \mathrm{SNR}_\mathrm{background} = 3$ dB (bottom), (left) without CE and (right) with CE by the respective algorithm.
Variations of model size parameters, i.e., the number of filters $N_0$, the growth factor $g$, the number of dense layers, and the activation function, have shown that the default configuration is close to optimal for the model architecture used and the maximal values of degradation parameters used for the generation of training/validation data. Stronger degradations may require larger models for optimal accuracies.
The presented method can be used in an analogous way to assess the impact of other scene-based ADSP on military tasks. Moreover, the trained models can be used together with a test bed with an infrared scene projector for hardware-in-the-loop testing of imagers including embedded ADSP. Finally, the methodology may be easily extended to more sophisticated classification tasks with real target signatures. In contrast to the triangle, real target signatures also have textures with spatial variations. Therefore, the gray level distribution and the CE based on it depend more strongly on the variations within the target, especially if the target covers many image pixels. Features related to these textures may require larger models compared with those investigated in this work.