Lossy compression of noisy remote sensing images with prediction of optimal operation point existence and parameters

Abstract. We address lossy compression of noisy remote sensing images, where the noise is supposed to be spatially uncorrelated (white), additive originally or after a proper variance-stabilizing transformation (VST). In such situations, the so-called optimal operation point (OOP) might exist. The OOP is associated with the parameter that controls compression (e.g., quantization step) for which the compressed image is the closest to the noise-free image according to a certain criterion and is closer than the original noisy image. Lossy compression in the neighborhood of OOP, if it exists, relates to an essential noise filtering effect and some distortions. Then such lossy compression (in the neighborhood of OOP) becomes expedient. However, it may be that OOP does not exist for a given image and the observed noise intensity. In such a situation, it can be reasonable to carry out a more “careful” image compression (with a lower compression ratio). Also, it is expedient to predict the existence of OOP and the compression parameters at this point in advance in order to perform adaptive and automated compression. The OOP existence that can be predicted for some coders based on the discrete cosine transform (DCT) is shown. The proposed prediction procedure is simple and fast. It presumes the calculation of DCT coefficient statistics in nonoverlapping 8×8  pixel blocks for a given image and uses an approximating curve obtained in advance. It is shown that it is possible to predict values for both conventional metrics, such as mean square error or peak-signal-to-noise ratio, and some visual quality metrics for the coder parameters that correspond to a possible OOP. The designed prediction procedure is tested on Hyperion and AVIRIS hyperspectral remote sensing data.


Introduction
Today, Earth monitoring systems find more and more applications. 1,2 Most modern remote sensing (RS) systems are multichannel (e.g., full-polarization, multispectral or hyperspectral). On one hand, the multichannel character of the provided RS data increases their usefulness and information content. On the other hand, the amount of acquired data for multichannel sensors is large or even huge 3 due to several factors. The first is the aforementioned number of sub-bands (channels). The second is the good spatial resolution of existing RS systems and/or a rather large area of imaged terrains.
For these reasons, such RS data often has to be compressed before the transferring downlink and its archiving in on-land centers of data processing or dissemination. [3][4][5][6] Note that there are many data compression techniques that belong to two major groups of lossless and lossy compression. The best lossless compression techniques provide a compression ratio (CR) up to 4.5 and this happens only for hyperspectral images under a condition that spatial and spectral data redundancies are exploited to a full extent. [3][4][5][6] However, a desired CR might be larger 3,7 and then one would have to apply lossy compression methods.
For lossy compression of hyperspectral data, strategies and requirement priorities can be different. 3 According to one strategy, the first-order task can be to provide a CR not less than the desired one; then compressed image quality is the second-order task. This paper concentrates on a different strategy associated with another set of requirements' degradation. It is assumed that the first-order task is to have such "acceptable" losses that they do not result in hyperspectral data degradation, [7][8][9] while it is also desired to provide a higher CR (second-order task).
A common assumption in design and testing lossy compression techniques is that compressed images are noise free. 10,11 Used quantitative criteria such as the mean square error (MSE) or peak-signal-to-noise ratio (PSNR) that characterize compression are then calculated accordingly, i.e., for compressed and original images. RS and other images are often noisy, in fact, all real-life images are noisy, but noise can be invisible or visible by considered image inspection. [12][13][14] For hyperspectral data, noise is visible or, at least, can be noticed in about 20% of the sub-band images. 12,13 The noisiest sub-bands are sometimes removed from consideration, but some of these sub-bands contain useful information which is worth preserving and retrieving. 14 The noise presence in conventional and RS images is often not taken into account in lossy compression. Al-Shaykh and Mersereau 15 were the first to demonstrate the peculiarities of noisy image lossy compression and to stress the necessity of taking noise and its properties into account.
First, Al-Shaykh and Mersereau 15 outlined the phenomenon of image denoising (filtering effect) due to lossy compression. Certainly, this denoising is specific and is less efficient compared to the best image denoising techniques, 16 but the partial effect of noise filtering can be regarded as a favorable phenomenon. In particular, it can lead to improving hyperspectral RS data classification accuracy. 8,9 Second, it has been stressed in Ref. 15 that it is worth carrying out an analysis of noisy image lossy compression using untraditional quantitative criteria (or criteria determined in an unconventional way). While performing simulations [when one has true (noise-free), noisy, and compressed noisy images], it is worth analyzing metrics calculated for compressed and true images. Such metrics may have values which are better than the corresponding metric calculated for noisy (uncompressed, compressed in a lossless manner) and true images. In other words, a compressed image under certain conditions [15][16][17][18][19] can be better (closer to the noise-free image according to a considered metric) than the noisy image. It can take place for different techniques of lossy image compression, e.g., based on discrete cosine transform (DCT) 17,18 and wavelets, 19 for different noise types 15,17,19 and for different used metrics (criteria). 9,20 If a compressed image can be closer to the true image than the noisy one, then a used metric dependence on a parameter that controls compression (PCC) can have an extreme [it is a minimum in the case of, e.g., mean square error (MSE) or a maximum in the case of peak signal-tonoise ratio (PSNR)]. This extreme in a wide sense is called OOP. In a narrow sense, OOP is associated with PCC that provides the extreme and it can be, for example, quantization step (QS), scaling factor (SF) or bits per pixel (bpp), depending on an applied coder. 15,17,19 Lossy compression of noisy images in OOP has certain advantages. On one hand, it is possible to expect that the provided CR is considerably larger than for lossless compression. On the other hand, RS data compressed in OOP can be classified better than the original data or data compressed in a lossless manner. 8,9 There are certain problems in compressing noisy images, in general, and hyperspectral images, in particular, to demonstrate that OOP exists. First, true images are not available in practice. This does not allow considering dependences that determine OOP. Fortunately, there are methods that allow rough estimation of PCC at OOP and provide compression in the OOP neighborhood. 17,18,20 For example, for DCT coders, this can be done by setting the QS or SF proportional to the noise standard deviation with a properly chosen proportionality factor. 18,20 Second, noise in hyperspectral images is rather complicated. [21][22][23] One often has to take into account the signal-dependent noise component [22][23][24] and its different intensity. Fortunately, there are several ways to cope with this, including blind estimation of noise characteristics, 23,24 data normalization in sub-bands before compression, 25 and application of VSTs. 22,26 Third, OOP does not always exist, 15,20 in fact, it exists if noise is rather intensive and the image is not textural (complex). This fact has been ignored earlier in the methods described in literature 21,[25][26][27] for which the OOP existence has been supposed for any subband image. Such compression can introduce losses of an undesired level in compressed data.
Our idea is the following. 28,29 If OOP exists for a given (sub-band) image, it is worth carrying out compression in an OOP neighborhood. If OOP does not exist, it is worth performing a more careful compression (introducing fewer losses). To realize this idea, it is necessary to predict the OOP existence. Such a prediction can be helpful in practice if it is fast and accurate (reliable).
Our recent paper 29 demonstrated that there is an approach that works rather well in predicting OOP existence and the corresponding compression parameters for lossy noisy image compression using four coders. There are several aspects not considered in this paper 29 and the obtained results have some shortcomings. First, a limited number of test images (12) and noise standard deviation values (3) have been used to obtain prediction dependences by curve fitting. Second, fitting has not been considered in detail. Third, the approach has not been tested on real-life RS data. Thus, the present paper can be regarded as an extended version of Ref. 29, where the aforementioned aspects are paid special attention. Its main novelty consists in a specific adaptation of PCC to image/noise properties in hyperspectral data sub-bands. Another contribution is in method verification for real hyperspectral data.
The paper is organized as follows: Sec. 2 revisits the used image-noise model and quantitative criteria (metrics) of lossy compression applied componentwise (or to a single-channel image). OOP and filtering effects at this point are discussed. This section also briefly describes the considered DCT-based coders. Section 3 shows the obtained numerical simulation results for lossy compression of test noisy images. Aspects of curve fitting are discussed and improvements (compared to Ref. 29) are shown. Comparison of coder performance is done. Section 4 contains practical details for componentwise noisy image compression. Results for hyperspectral images after VST are presented. Then the conclusions and directions of future research are given in Sec. 5.

Used Image-Noise Model, Coders, and Lossy Compression Metrics
Our further research is based on several assumptions. First, we concentrate on considering lossy compression of single-channel images or componentwise compression of multichannel (hyperspectral) data. Clearly, joint [three-dimensional (3-D)] compression of hyperspectral data is able to provide better CR, 5,7,21,27 but it is necessary to understand from the beginning how to set the PCC for each (sub-band) image. Second, a simple case of additive white Gaussian noise is considered. Here, one has to keep in mind that if noise is signal dependent (Poissonian, additive and Poissonian, additive and multiplicative, etc.), there is a VST able to convert such noise to the additive one. 22,27,30 This is possible if parameters of signal-dependent noise are estimated with appropriate accuracy and modern existing methods. 23,24 Third, major attention is paid to consider DCT-based coders for which it is simpler to provide compression in an OOP neighborhood 17,18,20,21 and for which we are more experienced in their design and performance analysis. Finally, we will focus on analyzing two metrics: conventional PSNR and visual quality metric PSNR-human visual system (HVS)-masking (M). 31 It might seem that these metrics (especially the latter one) do not have a direct relation to RS data processing. However, one has to take into account the following. On one hand, the criteria used in RS data compression are application oriented. 8,32 On the other hand, metrics exploited in compression of conventional images are also used in lossy compression of hyperspectral images and this relates to HVS metrics as well. 33 One reason is that HVS metrics highly correlate with such important properties as edge, detail, and texture feature preservation by lossy compression techniques.
Then the noisy image model in a n'th sub-band image is where ij denotes image indices in the spatial domain, η ij ðnÞ is zero mean additive white and Gaussian (AWGN) with variance equal to σ 2 0 ðnÞ, I true ij ðnÞ denotes the true image value in the ij'th pixel of the n'th sub-band image; i ¼ 1; : : : ; I Im ; j ¼ 1; : : : ; J Im , I Im and J Im determine the image size. As has been noted above, even if Eq. (1) is not valid for original images, it becomes adequate after the corresponding VSTs. Besides, let us start from considering single-channel images supposing the componentwise processing of multichannel images. Hence, let us omit the index n in this and the next sections.
In fact, only the noisy image I noisy ij , i ¼ 1; : : : ; I Im ; j ¼ 1; : : : ; J Im is available in practice. Usually, for analysis purposes, one has a true image I true ij , i ¼ 1; : : : ; I Im ; j ¼ 1; : : : ; J Im and noise is generated and added artificially. After applying a considered lossy compression technique to the noisy image, a compressed image is obtained, which will be further denoted as I c ij , i ¼ 1; : : : ; I Im ; j ¼ 1; : : : ; J Im .
Then having all three images, it is possible to calculate any metric Metr for any pair of them. Recall that only a metric Metr nc for the pair of images fI noisy ij g and fI c ij g can be calculated in practice. But in simulations, it is also possible to calculate a metric Metr tc for the images fI true ij g and fI c ij g. For any metric Metr nc , its dependence on CR is clear. It becomes worse for any coder if CR increases (bpp reduces, QS or SF increases). The reason is that the introduced distortions characterized by Metr nc increase if CR becomes larger.
The behavior of a metric Metr tc can be specific. 15,20 It might happen that the metric Metr tc has an extreme (optimum) for a certain PCC of a used coder. Then one might have bpp OOP or QS OOP or SF OOP . To get some ideas of dependences and OOP that might be observed for them, let us present several typical dependences. For this purpose, consider three test images, namely, Airfield, Aerial, and Frisco [see Figs. 1(a)-1(c), respectively]. These images differ sufficiently from each other in complexity, where we associate complexity with the percentage of pixels that belong to heterogeneous image regions.
The AWGN has been added with two variance values equal to 100 and 200 to these three images and they have been compressed by the coder AGU. 34 This coder uses DCT in 32 × 32 pixel blocks and deblocking after the decompression. Recall that QS serves as PCC for this coder. The behavior is controlled by two quality metrics: conventional PSNR tc and PSNR-HVS-M tc , both determined between the true and compressed images. Both quality metrics are expressed in dB and their larger values relate to a better quality. The obtained dependences are presented in Fig. 2.
As is seen [ Fig. 2(b)], four out of six curves PSNR-HVS-M tc ðQSÞ are monotonously decreasing. According to this metric, the quality of compressed images steadily reduces if QS increases and, respectively, CR increases as well. This means that for these test images and noise intensities, OOP does not exist and a trade-off between the reduction of compressed image quality and the reached CR should be found. Meanwhile, there are also two dependences in Fig. 2(b) that have maxima (both are observed for the test image Frisco that has a simple structure).
The curves PSNR tc ðQSÞ in Fig. 2(a) are somewhat specific. Five out of six curves have OOPs according to its definition, i.e., the extreme metric value is better than for the uncompressed (original, noisy) image-this value is practically the same as for QS ¼ 1. One dependence has a local maximum in the neighborhood of QS about 40 (the image Airfield, σ 2 0 ¼ 100, QS ≈ 4.0σ 0 ). Formally, this is not OOP since PSNRðQS ¼ 1Þ is larger than PSNR for QS ≈ 4.0σ 0 .
Analysis of the dependences, PSNR tc ðQSÞ and PSNR-HVS-M tc ðQSÞ, presented in Fig. 2 shows the following. If OOP is observed, it happens when QS OOP ≈ ð3.5: : : 4.0Þσ 0 . The same observations can be found for the coder AGU applied to other images and noise variance values in the paper. 20 The positions (arguments) of OOP slightly differ depending upon an image and noise variance, but QS OOP ≈ ð3.5: : : 4.0Þσ 0 can be considered as the recommended value to attain OOP (if it exists) for the AGU coder. 20 This means that the compression in OOP or its close neighborhood can be provided without having a noise-free image if σ 0 is known in advance or accurately pre-estimated.
It has already been mentioned that OOP appears due to the filtering effect provided by lossy compression applied to noisy images. Let us explain why it happens and give one illustration. For the very beginning, recall how DCT-based filtering is carried out. 35,36 The basic operations are direct DCT in blocks, DCT coefficient thresholding (assigning them zero values if the magnitudes are smaller than a set hard threshold), and inverse DCT. An additional operation is filtered value averaging for overlapping blocks. Noise removal is achieved due to the thresholding of DCT coefficients with small magnitudes, which are supposed to be related to noise.
DCT-based lossy compression 14,34,37 has similar operations at the initial processing stages. Direct DCT in blocks is carried out. Then DCT coefficient quantization is applied. For DCT coefficients with small magnitudes, this quantization (which leads to zero values after performing it) is very similar to hard thresholding and this explains the filtering effect for lossy compression. The quantization of larger magnitude DCT coefficients introduces distortions; besides, blocks used in lossy compression are usually nonoverlapping. These two factors explain why the filtering effect caused by lossy compression is less efficient than the corresponding orthogonal transform based denoising. The explanation for wavelet-based compression and denoising is almost the same. OOP exists if the positive effect of noise removal is more pronounced than the negative effect of losses introduced by quantization.
An example of the filtering effect is shown in Figs. 1(d) and 1(e). The noisy image Frisco Fig. 1(e). The noise suppression effect is obvious and the information content is sufficiently preserved.
The dependences presented in Fig. 2 and the example given in Fig. 1 indicate that OOP might or might not exist. Thus, it is expedient to predict that does OOP exists for a given coder, metric, image, and noise variance (assumed known or pre-estimated in advance). This is quite a general task that is addressed here. Since there is a wide variety of image compression techniques and coders, our "narrower" task is to demonstrate that the prediction task can be solved for two metrics (conventional and visual quality) and several DCT-based coders with QS or SF serving as the PCC. The proposed approach, although based on DCT, could be useful for wavelet-based coders (this has not been checked yet).
It has been shown earlier 15,17,20 that OOP might exist for different coders, in particular, for the standard JPEG, for the DCT-based AGU 34 and advanced DCT (ADCT) 37 coders with QS serving as PCC, and for the wavelet-based JPEG2000 or SPIHT coders with CR controlled by bpp. [38][39][40] OOP might also exist for coders with SF used as PCC, and the examples are the recently proposed modifications of AGU and ADCT called AGU-M and ADCT-M. 41 Below, the following four DCT-based coders are considered: AGU, 34 ADCT, 37 AGU-M, and ADCT-M. 41 The AGU coder has been briefly described above. The ADCT coder uses DCT in blocks of different sizes obtained as the result of partition scheme optimization. The ADCT coder outperforms AGU as well as JPEG2000 and SPIHT 37 (see also Ref. 42). Both AGU and ADCT employ uniform quantization of DCT coefficients. Similar to AGU, ADCT provides compression in the OOP neighborhood for QS ≈ 3.5σ 0 . 20 The modifications of these coders called AGU-M and ADCT-M have only one difference with respect to their counterparts. They employ different quantization for different DCT coefficients (similar to the standard JPEG). The quantization factors are calculated as SF multiplied by the quantization table (matrix). For AGU-M, there is a fixed 32 × 32 pixel matrix. For ADCT-M, there are matrices for all possible sizes of the blocks obtained by the partition scheme. The coders, AGU-M and ADCT-M, were designed to improve compressed image visual quality. Our studies have shown that to get into an OOP neighborhood, one has to apply the fixed SF OOP equal to 2.0σ 0 .
Thus, the recommendations of how to set QS OOP or SF OOP if OOP exists can be given. If OOP does not exist, QS or SF should be smaller. This will be analyzed later more in detail.
Let us discuss what can be indicators of the existence of OOP. One standard metric frequently used in lossy compression analysis is MSE, defined as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 6 ; 1 1 2 If OOP (according to this metric) exists, then MSE tc has to be smaller than σ 2 0 . This can be also treated in another manner-the ratio MSE tc ∕σ 0 has to be less than unity in OOP and its neighborhood.
Another standard metric is PSNR which is strictly connected with MSE tc as PSNR tc ¼ 10 log 10 ½ðI max − I min Þ 2 ∕MSE tc ; where I max , I min are the maximal and minimal image values (the difference I max − I min is assumed to be fixed and equal to 255 if one deals with 8-bit images). In a similar manner, for the original (noisy) image, one has E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 6 ; 6 2 0 PSNR or ¼ 10 log 10 ½ðI max − I min Þ 2 ∕σ 2 0 : Additionally, we propose using a parameter that can be called improvement or the difference of PSNR defined as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 6 ; 5 6 3 ΔPSNR ¼ PSNR tc − PSNR or ¼ 10 log 10 ðσ 2 0 ∕MSE tc Þ: Positive ΔPSNR (expressed in dB) in the neighborhood of (possible) OOP indicates that OOP exists and it is expedient to carry out compression with the recommended PCC. Otherwise, if MSE tc ∕σ 0 is larger than unity or ΔPSNR is smaller than zero, then in our opinion, compression with a smaller CR is desired.
The decision on the existence of OOP can also be based on an analysis of metrics other than MSE or PSNR. HVS-metrics have become popular. 32,43,44 Therefore, let us consider the visual quality metric PSNR-HVS-M. 31 This metric takes into consideration two important peculiarities of the HVS: different sensitivity to distortions in low and high spatial frequencies as well as masking effects. For characterizing the visual quality of single-channel images, PSNR-HVS-M is among the best. 45 This parameter is expressed in dB; OOP is supposed to exist if ΔPHVS is positive. Before starting to consider the prediction of the OOP existence, we would like to note the following. Recently, approaches to predict the denoising efficiency for two DCT-based filters and the metrics ΔPSNR and ΔPHVS have been proposed and tested. [47][48][49] For carrying out such predictions, one needs two very simple operations. First, some simple statistic parameter has to be calculated for a given image. Calculations are made in nonoverlapping 8 × 8 pixel blocks of a source noisy image to obtain one of two probabilities: either probability P 2σ that absolute values of DCT coefficients in blocks are less than 2σ 0 or probability P 2.7σ that absolute values of DCT coefficients exceed 2.7σ 0 . The second operation is to calculate a predicted metric value using the available analytical dependence of this metric on P 2σ or P 2.7σ . Such analytical dependences may be obtained for a considered metric in advance. Using a set of test images and different values of noise variances, the dependences of the considered metrics on P 2σ (and on P 2.7σ ) are obtained by fitting curves to the corresponding scatter plots.
Keeping in mind similarities between denoising and the lossy compression of noisy images, a similar approach has been proposed to predict the OOP existence and its compression parameters in our paper. 29 The advantage of this approach is its high computational efficiency. It uses DCT in 8 × 8 pixel blocks that is a standard operation in image processing. 10 Moreover, blocks are nonoverlapping and for large size images, it is enough to have 500 nonoverlapping blocks chosen randomly to estimate P 2σ or P 2.7σ with an appropriate accuracy. 48,49 All other operations needed to calculate P 2σ or P 2.7σ are very simple (logical and arithmetic).

Simulation Methodology and Result Analysis
Recall the basic aspect of the prediction approach. We assume that some statistical parameter is calculated for a given image. The requirements for this parameter are the following. First, it should be simple enough and computed quickly (desirably faster than the compression, see discussion at the end of Sec. 2). Second, these parameters should be able to "jointly describe" image complexity and noise intensity. This requirement assumes that the probability of OOP existence increases for a larger noise intensity and lower complexity of a compressed image. Third, the prediction has to be carried out with appropriate accuracy. We mean here that the STD of the prediction error should be small, about 0.2 dB, for the metrics ΔPSNR and ΔPHVS.
We do not know at the moment what statistical parameter is the best for our purpose and try not to invent new statistical parameters that can be used in the prediction (this is outside the scope of this paper). Therefore, let us analyze the applicability of statistical parameters that have been already used with a certain success. 47 Let us use the aforementioned probabilities P 2σ and P 2.7σ . More formally, the probability P 2σ is defined as the mean of the local probabilities estimated for the considered image blocks as Consider now the methodology of obtaining and processing simulated data. In Ref. 29, 12 grayscale test images have been used. Eight of them can be considered as the standard optical test images: Baboon, Stream&Bridge, Man, Lenna, Tiffany, F − 16, Peppers, Boats. These images have different complexity starting from rather simple ones as Lenna and ending with the complex structure (textural) test image Baboon. Four other test images have more relation to RS. They are the images Frisco, Aerial, and Airfield, as shown in Fig. 1, and a quite popular image San Diego. The results presented in Ref. 29 have already shown that the observed tendencies are the same for both groups of images. The same tendencies for two groups of the test images have been earlier found in Ref. 48. Thus, the image content does not influence the results. The main requirement is to have a rather large number of test images with properties varying in wide limits. 48 Another aspect is what levels of noise to study. In Ref. 29, we considered three noise levels characterized by variances equal to 25, 100, and 225. This makes the probabilities P 2σ and P 2.7σ vary in the limits from 0.53 to 0.93 and from 0.03 to 0.35, respectively. Meanwhile, both probabilities can potentially vary from 0 to 1. Thus, since different values of these probabilities can be met in practice, we had to consider wider limits of probability variation in the simulated noisy images. For this purpose, we have also added noisy images corrupted by AWGN with σ 0 (STD) equal to 0.5, 1, 2, and 20.
Let us present the results obtained for rather small noise STD values (0.5, 1.0, and 2.0) that are at the same level as the "internal noise" STD of the test images. The simulation data are given for the AGU coder in Table 1. The values of P 2σ vary in the limits from 0.07 (very close to zero) to over 0.6 (for simple structure test images if σ 0 ¼ 2.0). In turn, the values of P 2.7σ vary from 0.23 to about 0.9 (i.e., almost unity for σ 0 ¼ 0.5). The values of MSE tc ∕σ 0 are larger than unity (are close to 2) and ΔPSNR is about −3 dB, respectively, for almost noise-free images (if STD equals to 0.5 or 1.0). Only if the noise STD is equal to 2, might MSE tc ∕σ 0 differ from 2 and be sufficiently smaller (this happens for the simplest structure images).
Analysis of ΔPHVS values shows that they are even smaller than ΔPSNR and are from −2.5 to −4.6 dB. Obviously, the visual quality of images compressed in the considered way (with QS equal to 3.5σ 0 ) does not improve. Moreover, according to the analyzed metrics, the visual quality becomes worse. However, an interesting fact is that for all cases considered in Table 1, the introduced distortions are not visually seen 46 since PSNR-HVS-M nc exceeds 47 dB (see data in the lines for each STD in Table 1). CR values in these cases are small (see CR values in the corresponding column). For σ 0 ¼ 0.5, CR is about 2.0 and it does not exceed 4.5 for σ 0 ¼ 2.0 (when QS ¼ 3.5σ 0 ¼ 7) and simple structure images. Thus, for these three values of noise variance, one practically deals with near-lossless image compression. 7 Let us now present data for four other values of noise STD: 5, 10, 15, and 20. They are given in Table 2 and grouped for noise STD. For each group, we put the images in such an order that the values of P 2.7σ (they are in the second column) are reduced.
Analysis of data presented in Table 2 shows the following: 1. Now the values of P 2σ and P 2.7σ (together with data in Table 1) practically cover the entire interval of their possible variation (from almost zero to almost unity). 2. In each group, there is a tendency of increasing the probability P 2σ and decreasing the probability P 2.7σ . This is not surprising since probabilities P 2σ and P 2.7σ are mutually dependent. 3. The values of P 2σ are the smallest and the values of P 2.7σ are the largest for the most complex images corrupted by nonintensive noise (this has also been seen for data in Table 1). Accordingly, P 2σ increases and P 2.7σ decreases for simple structure images and/or for intensive noise. 4. This means that both considered statistical parameters are able to jointly indicate noise intensity and image complexity. This is the property we need from statistical parameters to be used in prediction. 5. OOP most probably does not exist for complex structure images corrupted by nonintensive noise (then MSE tc ∕σ 0 is larger than unity, ΔPSNR and ΔPHVS are negative). Then one has to decide what QS to set (one possible decision could be to set QS smaller than QS ¼ 3.5σ 0 ). 6. According to the metrics MSE tc ∕σ 0 and ΔPSNR, there is a large probability that OOP exists if P 2σ exceeds 0.81 or P 2.7σ is less than 0.11. More insight will be given to this threshold later. 7. According to ΔPHVS, OOP exists with a high probability if P 2σ exceeds 0.88 or P 2.7σ is less than 0.05; this happens only for simple structure images and if noise is intensive. 8. Obviously, OOP (ΔPSNR reaches a few dB and ΔPHVS exceeds 1 dB) occurs only for simple structure images and high intensity noise. 9. Noise in original images is visible and distortions due to lossy compression are visible as well; this happens if PSNR-HVS-M nc is of the order 30 to 40 dB and ΔPHVS is from −1 to −3 dB; this takes place for P 2σ in the limits from 0.55 to 0.85 or for P 2.7σ in the limits from 0.06 to 0.35. 10. While for σ 0 ¼ 5, CR varies in the limits from 3.86 to 7.79, it is considerably larger for σ 0 ¼ 20 and can be larger than 20. This is due to a large QS value being used. Tables 1 and 2 could be presented in another manner. In particular, it was also possible to separately put data for each test image. Such a study is also possible by a joint analysis of data in Tables 1 and 2 for any particular test image. Consider, for example, the test image Airfield. If σ 0 equals to 0.5, 1, or 2 (see data in Table 1), P 2σ is small, MSE tc ∕σ 0 is about 2.0 and, respectively, ΔPSNR is about −3 dB. The metric ΔPHVS is about −4 dB, i.e., formal lossy compression leads to image quality degradation. However, the quality of the original images is so high that noise in them cannot be noticed by visual inspection. Introduced losses cannot be noticed as well. Hence, it  is possible to ignore these losses. For σ 0 ¼ 5 (see Table 2), we have P 2σ ¼ 0.60 and noise becomes visible. MSE tc ∕σ 0 is still larger than unity, but starts decreasing, and ΔPSNR and ΔPHVS are both negative. This means that lossy compression leads to considerable (visually noticeable) image quality degradation and it is desirable to avoid or to diminish this effect. Similar effects are observed for σ 0 ¼ 10, one has quite large P 2σ ¼ 0.78 but MSE tc ∕σ 0 is still larger than unity, while ΔPSNR and ΔPHVS are both negative. This means that the compression with QS OOP ¼ 3.5σ 0 is undesired. The situation starts radically changing for σ 0 ¼ 15 (P 2σ ¼ 0.84). The ratio MSE tc ∕σ 0 occurs to be less than unity, ΔPSNR is positive while ΔPHVS is still negative. Thus, OOP exists, at least for the metrics MSE tc ∕σ 0 and ΔPSNR. Finally, for σ 0 ¼ 20, we have an obvious OOP according to the metrics MSE tc ∕σ 0 and ΔPSNR. The observations and conclusions for other test images are similar. If σ 0 increases, P 2σ becomes larger, P 2.7σ reduces, MSE tc ∕σ 0 decreases, ΔPSNR, ΔPHVS, and CR increase. Because of these, it is more probable that OOP exists.

Data in
Such dependences are observed not only for the 8-bit images considered above. Suppose that one has multiplied all values of I n ij , i ¼ 1; : : : ; I Im ; j ¼ 1; : : : ; J Im and I true ij , i ¼ 1; : : : ; I Im ; j ¼ 1; : : : ; J Im by some constant C > 1, then more than 8 bits are needed to represent data, noise standard deviation, dynamic range, and QS OOP increase by C times, P 2σ , P 2.7σ , ΔPSNR, and ΔPHVS do not change. Thus, the conclusions are valid for other than 8 bit data representations as well.
Consider now other approaches to the aggregated presentation of the obtained data. Figure 3 shows the scatter plot (ΔPSNR versus P 2σ ) for the AGU coder with four fitted polynomials, of the second, third, fourth, and fifth orders. Fitting has been carried out with the standard MATLAB tools; several parameters characterizing fitting are presented, and the coefficient of determination R 2 and root mean square error (RMSE) are the main ones. 50 It is usually considered that an R 2 larger than 0.9 relates to good fitting and, together with a small RMSE, this indicates the possibility of accurate prediction. The obtained polynomials are also presented by scatter-plots for providing careful analysis.
Analysis of the data shows the following. A higher order of the used polynomial results in a better approximation according to all analyzed quantitative criteria. It seems enough to use the fifth order polynomial since it reflects the behavior of the considered dependence correctly and accurately. The fitted curves allow accurate determining of the point where ΔPSNR becomes positive. The best fitted curves take place for a P 2σ about 0.80 to 0.82, and this is in agreement with conclusion 6 mentioned above.  Let us present data for the prediction of ΔPSNR based on P 2.7σ . Note that the tendencies for the polynomial order are the same as for the scatter plot of ΔPSNR versus P 2σ (see data in Table 3). For this reason, we present only the data for the fifth order polynomial in Fig. 4(a). The fitting is not as good as for the scatter plot ΔPSNR versus P 2σ (R 2 is slightly smaller and RMSE is larger). This means that it is better to use the statistical parameter P 2σ for the prediction.
Let us now analyze the results obtained for the ΔPHVS prediction. Data for the prediction based on P 2σ are presented in Table 3 (the lower part). As is seen, better results are obtained by increasing the polynomial order where the fifth order seems to be enough [see also the scatter plot and the fitted curve in Fig. 4(b)] Obviously, OOP exists only if P 2σ exceeds 0.9. Analysis of data obtained for P 2.7σ has shown that again, the prediction is less accurate (R 2 reaches 0.946 and RMSE attains 0.47 for the fifth order polynomial).
Let us consider data for the ADCT coder (the recommended QS ¼ 3.5σ 0 ). Some data in table format are presented in Ref. 29. It confirms conclusions 1 and 2 given above. Here, we prefer to represent the best results as scatter plots for the fifth order polynomial and the prediction based on probability P 2σ (the results for the probability P 2.7σ are again worse). These scatter-plots are depicted in Fig. 5.
The scatter plot in Fig. 5(a) is very similar to the scatter plot in Fig. 3(d). The only difference is that the fitted curve crosses the zero level for a smaller P 2σ (about 0.75). This means that the probability of having OOP for the ADCT coder in practice is larger than for the AGU coder. Other conclusions for the AGU coder practically coincide with the conclusions 3 to 5 for the AGU coder given above.
The scatterplot in Fig. 5(b) is very similar to the one in Fig. 4(b). The conclusions 7 and 8 given above for AGU are, in general, valid for the ADCT coder as well. The difference is that the fitted curve crosses zero level at a smaller P 2σ (about 0.87). This means that the ADCT coder provides a slightly better visual quality of the compressed noisy images (under the condition of the same QS).
The results for the AGU-M coder have been obtained for SF ¼ 2.0σ 0 . Some quantitative data are given in Table 3 in the paper. 29 Here, we prefer to present the obtained results as scatter-plots (see Fig. 6). The best fitting has been provided by the fourth (for ΔPSNR) and fifth (for ΔPHVS) order polynomials. Therefore, we give the fitted curves only for these cases.
Analysis shows the following. OOP does not occur for the complex structure images corrupted by nonintensive noise. According to the metric ΔPSNR, there is a high probability that OOP exists if P 2σ exceeds 0.8 [this is seen in Fig. 6(a)] or if P 2.7σ is less than 0.11. According to the metric ΔPHVS, OOP very likely exists if P 2σ exceeds 0.9 [analyze the zero crossing in Fig. 6(b)] or if P 2.7σ is less than 0.05.
There are no obvious advantages of the AGU-M coder compared to the AGU and ADCT coders. The results are similar for ΔPSNR if P 2σ > 0.7 and for ΔPHVS if P 2σ > 0.5. However, there are essential differences for a small P 2σ , especially for the metric ΔPSNR. It might seem a serious drawback of the AGU-M coder that ΔPSNR for P 2σ < 0.5 is about −6 to −12 dB. But this is not a problem. Large values of ΔPSNR happen for the cases of nonintensive noise when PSNR or is high (over 40 dB, see Table 1). Then, PSNR tc is also large, and the introduced distortions are not seen. The values for ΔPHVS of about −3 to −6 dB can be regarded as large (sufficient loss of visual quality) if the PSNR-HVS-M nc is rather low (e.g., 30 to 35 dB). But ΔPHVS of about −3 to −6 dB occurs for noisy images with P 2σ < 0.5, i.e., for nonintensive noise. Then the visual quality is very high and distortions in compressed data cannot be noticed. Consider now the results for the ADCT-M coder (SF ¼ 2.0σ 0 ). For brevity, let us present only the scatterplots (Fig. 7) and the fitted polynomials of the fourth (for ΔPSNR) and fifth (for ΔPHVS) orders. Analyzing the metric ΔPSNR, we can conclude that OOP exists if P 2σ exceeds 0.80 [see the fitted curve in Fig. 7(a)] or if P 2.7σ is lower than 0.10. If the metric ΔPHVS is considered, compression in an OOP neighborhood is quite probable if P 2σ exceeds 0.88 [analyze the fitted curve in Fig. 7(b)] or if P 2.7σ is less than 0.06. The results are, in general, very close to the corresponding data for the AGU-M coder (see Fig. 6). Compared to the AGU and ADCT coders, the ADCT-M coder does not have obvious advantages.
Completing our analysis for the test images and simulated AWGN, we can state the following. The proposed prediction methodology is fast and accurate enough. The AGU coder and, especially, the ADCT coder provide, on average, better results for the considered cases of noisy image lossy compression than the AGU-M and ADCT-M coders, especially when oriented on visual quality. There could be three types of practical situations.
The "first one" takes place for low intensity noise which is invisible in the original images. It is difficult to expect that OOP exists, and formal lossy compression (according to quantitative criteria) leads to a worse image quality. This is usually not a problem since the introduced distortions are small and they are either invisible or hardly noticeable. Then it seems reasonable to carry out compression at OOP according to the earlier recommendations. Such a decision can be undertaken if, e.g., P 2σ < 0.6.
The "second situation" is that ΔPSNR and ΔPHVS are negative (or ΔPSNR is positive but small while ΔPHVS is negative). This happens if 0.6 < P 2σ < 0.8. Then if lossy compression is performed in an OOP neighborhood, there is some filtering effect but the introduced distortions are at the same level or more. In such a case, it seems reasonable to carry out "a more careful" lossy compression. An example and a more thorough discussion will be given in the next section.
The "third practical situation" relates to P 2σ exceeding 0.8. It seems obvious that it is worth carrying out lossy compression in an OOP neighborhood. Recall here that we assume that there is no limitation on CR provided by our approach to lossy compression. That is, we discuss how to make compressed data appropriate for further use (having a high enough quality) simultaneously wishing to provide a CR adaptive to image and noise properties and sufficiently larger than that for lossless compression.

Real-Life Processing Results
For the aforementioned second situation (0.6 < P 2σ < 0.8), it seems a reasonable decision to have a smaller CR than that reached for QS or SF recommended CRs for attaining OOP. One possible decision could be to apply a smaller QS or SF, e.g., QS ¼ 2.0σ 0 for the AGU and ADCT coders and SF ¼ 1.2σ 0 for the AGU-M or ADCT-M coders. It is expected then that noise suppression is smaller, but distortions are also less than for QS ¼ 3.5σ 0 or SF ¼ 2.0σ 0 .
Let us see what happens in this case. Figure 8(a) presents the noisy image Airfield (σ 0 ¼ 10). The noisy image has been compressed by the AGU coder with QS ¼ 3.5σ 0 (in OOP), and the result is demonstrated in Fig. 8(b). There is a noise-filtering effect that can be observed in homogeneous image regions (see the wide road in the left lower part). The introduced distortions are seen as well (sharp edges are partly smeared). The provided CR ¼ 6.35, ΔPSNR and ΔPHVS are both negative (they are equal to −0.64 and −2.61 dB, respectively, see data in Table 2), P 2σ ¼ 0.78, i.e., we deal with the second situation and, according to our recommendation, it is reasonable to use a smaller QS.
The image compressed by the same coder but with QS ¼ 2.0σ 0 is shown in Fig. 8(c). The visual analysis allows stating that the compressed image quality is higher. This conclusion is confirmed by analysis of quantitative metrics: ΔPSNR and ΔPHVS are equal to −0.23 dB and −1.10 dB, respectively, i.e., they are sufficiently smaller than in the previous case. The negative outcome of such a decision is that CR reduces to 4.0. Certainly, in practice, one has to take into account both the quality aspects of the compressed images and the CR. The final decision is to be undertaken by a user.
Along with a hard decision to apply a smaller and fixed QS or SF if 0.6 < P 2σ < 0.8, soft adaptation is also possible. One variant is to use Only this variant has been applied to the componentwise compression of real-life data. Based on the obtained data and the analysis carried out, we can propose the following automatic procedure for RS sensing data lossy compression. The first operation is estimation of the noise characteristics, e.g., noise variance for AWGN for each image to be compressed if noise characteristics are a priori unknown (the case when noise is not additive is shown below). Then the statistical parameter P 2σ has to be calculated. It can be substituted into the corresponding expression to calculate a given metric (e.g., ΔPSNR) for a given coder (it can be needed if a decision on the PCC setting is undertaken by a human or is based on ΔPSNR and/or ΔPHVS automatic analysis). The determination of ΔPSNR and/or ΔPHVS could be unnecessary if the decision is undertaken on the base of analysis (comparison of the thresholds 0.6 and 0.8) of P 2σ , as shown at the end of Sec. 3.
The procedure described above can be applied componentwise if multichannel images have to be compressed. Let us show this for Hyperion hyperspectral data. We have to take into account several aspects here. First, the noise in such images is not additive (it is signal dependent 23,24,27 ). This shortcoming can be overcome by applying a special algorithm of a generalized Anscombe    20 and 28 show that this is true. Third, for real-life RS data, we do not have a true image, so no metric Metr tc can be calculated. Thus, we can only calculate the predicted values of ΔPSNR and/or ΔPHVS based on estimated P 2σ or P 2.7σ and look at how these predicted values behave, depending upon the sub-band index. Fourth, there are sub-bands in Hyperion RS data, which are usually not exploited (with indices 1 to 12, 58 to 76, 224 to 242). We also do not present data for these sub-bands.
Let us start by analyzing the values P 2σ and P 2.7σ for real-life data (the set EO1H1800252002116110KZ). The plots are given in Fig. 9. Analysis shows that both probabilities vary practically from zero to unity. Therefore, our decision undertaken above to obtain fitting curves for the entire intervals of probability variation was correct (see the beginning of Sec. 3). Joint analysis of the plots P 2σ and P 2.7σ shows that they behave in an "opposite manner" where the sum P 2σ þ P 2.7σ is close to unity.
Analysis also shows that there is a very small percentage of sub-bands for which P 2σ > 0.8, i.e., for which compression in an OOP neighborhood providing visible effects of noise filtering can be carried out. These are sub-bands with indices 13, 14, and 127. There are also many subbands (most sub-bands of the optical range, 14 < n < 21, and some sub-bands of the infrared range (n ¼ 83, 125, 127, 173, 175, 177 to 181) for which "careful" compression is desirable. Recall here that n from 13 to 57 relates to the optical range and from 83 to 224-to the infrared range of the sensor Hyperion. Fig. 9 Dependences of (a) P 2σ and (b) P 2.7σ on sub-band index n for real-life Hyperion data set EO1H1800252002116110KZ (sub-bands with indices 1 to 12, 58 to 76, 224 to 242 are usually not exploited).
These conclusions also follow from the analysis of the predicted values of ΔPSNRðnÞ for the AGU coder shown in Fig. 10. As is seen, the obtained prediction results are very similar and they usually differ by not more than by 1 dB. The predicted values for ΔPHVSðnÞ based on P 2σ are depicted in Fig. 11. They confirm the conclusions and recommendations given above. The number of sub-bands for which quality improvement of images compressed in the OOP neighborhood is expected is quite small; this is probable only for the sub-bands 13 and 14 of the optical range.
The dependence CRðnÞ for fixed QSðnÞ ¼ 3.5 (recommended for OOP, noise STD equals to unity after the generalized Anscombe transform) is presented in Fig. 12 (blue line). This dependence correlates with all dependences presented in Figs. 10 and 11. It means that CR is quite large (exceeds 10) for sub-bands where noise is quite intensive (or, more correctly, where the input PSNR or SNR are quite small). In other sub-bands, noise is negligible. Because of this, compression is carried out by "carefully" introducing relatively small and practically invisible distortions. As a result, CR is in the limits 4 to 10.
Even more "careful" compression with a slightly smaller CR takes place if QSðnÞ is set adaptively according to Eq. (10) [CR for such sub-bands is shown by the red color, it can be considerably smaller than for QSðnÞ ¼ 3.5]. Note that the direct VST has been carried out as described in the paper. 29 It might seem that the obtained values of CR are only slightly larger than those for the advanced lossless compression techniques. However, we feel that the proposed approach can be modified to 3-D compression applied in groups of sub-bands. 21,29 Then an increase of CR by about twice is to be expected without additional degradation of the compressed image quality. Figure 13 shows the original (uncompressed) image in the 14th sub-band and the image compressed using the proposed procedure with VST (after decompression and inverse VST). It is seen that noise is partly suppressed and the introduced distortions are not essential. In    The proposed approach to lossy compression has been also tested on the hyperspectral images acquired by airborne sensor AVIRIS. A peculiarity of this sensor is that it has four ranges with 32 sub-bands in the optical range and 64 sub-bands in the three latter infrared ranges. The dependences of P 2σ ðnÞ and ΔPSNRðnÞ for the standard test image Lunar Lake are shown in Fig. 15. As is seen, there are many sub-bands for which P 2σ ðnÞ > 0.8 and for which  0.6 < P 2σ ðnÞ < 0.8. The obtained dependence of CRðnÞ is given in Fig. 16, where the curve for a fixed QSðnÞ ¼ 3.5 is again shown in blue color and the red dots indicate CR values obtained for a QSðnÞ determined according to the adaptive algorithm [Eq. (10)] for the AGU coder.
The conclusions that follow from the analysis of data in Fig. 16 are the following. There are sub-bands where the noise intensity is high (e.g., optical range sub-bands with n ¼ 1, 2, 3, 4) for which P 2σ is large and the compression is carried out, providing OOP and CR > 10. Meanwhile, there are also sub-bands where the compression is "careful" and, thus, the CR is only about 4.
In our opinion, the proposed approaches and the obtained results can be also useful for applications other than RS. First, the results for the optical test images (see Tables 1 and 2) do not differ from the results obtained for the test RS images, i.e., the observed dependences and tendencies are general enough. Second, the results can be interesting for the researchers dealing with the compression of medical images and video where the prediction of ΔPSNR or ΔPHVS can be exploited to properly set QS or SF.

Conclusions and Future Work
We have demonstrated that the existence of OOP for a given image corrupted by AWGN can be quite accurately predicted if the variance of AWGN is a priori known or accurately pre-estimated. Such a prediction can be performed for several quality metrics. Moreover, the metric values for OOP can be predicted and used in further analysis.
Prediction curves have been provided (dependences) for four lossy compression techniques based on DCT and have been compared. The prediction is very fast and is quite accurate (the prediction error rarely exceeds 1 dB) and this happens if the fourth or fifth order polynomials are used for fitting.
Practical recommendations for componentwise lossy compression are given. If OOP exists, then the lossy compression in the OOP neighborhood is expedient. If, according to the prediction, OOP does not exist, then "more careful" compression is necessary. This can lead to less distortion being introduced and a better quality of compressed data. Also, it seems possible to carry out an adaptive setting of QS or SF depending on image local content. This is one possible direction for future research.
The proposed approach can be put into the basis of componentwise compression of multichannel RS images, as has been shown for hyperspectral RS data of Hyperion and AVIRIS sensors. We expect that 3-D compression of such data collected into groups might increase CR sufficiently. This is another possible direction for future research.