Introduction and Objective
Remote sensing provides large amounts of data to the scientific community in terms of spatial, spectral, radiometric, and temporal resolutions1 that can be used in a vast range of applications in agriculture,2 climate change research,3 urban management,4 ecosystems monitoring,5 and thematic mapping,6 among others. It is a very interesting field for evaluating the quality of lossy compression techniques.
Lossy data compression facilitates accessing, sharing, and transmitting huge spatial datasets in environments with limited storage or with limited bandwidth. JPEG2000 is one of the possible lossy compression algorithms, and is currently an ISO standard.7 However, lossy data compression procedures modify the original information,8 and therefore rigorous studies are needed to understand the effects and consequences of this manipulation. Some previous works have analyzed these alterations in remote sensing images in different fields and with varying objectives: spectral analysis,9 digital classification,10 texture analysis,11 stereoscopy,12 and multivariate regression,13 among others.
The present work aims at studying the effect of JPEG2000 lossy compression by determining the differences between the spatial pattern domains in the original image and the compressed image, and more particularly, the impact on the geostatistical usage of remotely sensed imagery. Comparing the geostatistical properties of compressed images before and after compression is a different and novel approach, explored here instead of the usual comparison of the global parameter peak signal to noise ratio (PSNR).14
Exploring and describing the spatial variation in images is one of the main applications of geostatistics in remote sensing15 as it can provide parameters for describing spatial patterns,16 measurements for spatial autocorrelation,17 procedures for downscaling images,18 tools for estimating continuous variables,19 data for radiometric coregionalization analysis,20 and optimal sampling designs for ground surveys.21 The variogram is an appropriate tool often used in geostatistics to carry out exploratory analyses.22 Section 2.4 of the present work outlines this geostatistic tool and its specific characteristics when it is applied to remote sensing images. It is, however, a particularly slow procedure for processing large amounts of remotely sensed data. This computational constraint makes it necessary to use distributed environments, such as high performance computing (HPC). HPC provides methods and infrastructures for distributing computation in order to reduce the total execution time. Some examples of the benefits of HPC applied to remote sensing are hyperspectral image modeling,23 fire monitoring,24 meteorological applications,25 image processing,26 some classification methods27 and web grid environments,28 among others.
In this work, parallel computing plays an important role in allowing us to perform spatial pattern analyses within an acceptable period of time. Section 2.5 details the architecture, design paradigm and language, and programming libraries, as well as results in time execution performance in order to generate variogram analyses in an efficient scientific environment.
Many different cases were studied in order to obtain robust conclusions regarding the main questions raised: is the degradation similar at short and far distances applying different compression ratios? Do the analyzed compression methods modify patterns at different directions? Do possible pattern alterations depend on three or two dimension compression methods? Are the short and large remote imagery series behaviors similar in spectral and time dimension?
The rapid generation of variograms allowed applying these massive analyses to:
• Types of sensor images: multispectral series (Landsat), hyperspectral series (CASI), and large time series (MODIS) in a spatial resolution range from 3 to 250 m.
• Landscape regions with different spatial patterns.
• Scenes with different season phenology.
• A wide range of compression ratios.
This complete test bed is detailed in Sec. 2.1.
These different scenarios allow studying the possible spatial pattern alterations generated by the lossy compression methodologies and exploring the magnitudes and properties of these hypothetical changes. The conclusions of the work provide recommendations in each specific case about applying JPEG2000 lossy compression algorithms to remote sensing images for geostatistical analysis purposes.
The proposed methodology is a chain of image processing, lossy compression procedures, and geostatistical analyses in an HPC environment. Figure 1 shows the different stages and elements of this methodology as well as the position of the parallel task within the processing chain. These stages are detailed in the following subsections.
In this study, four scenes of multispectral (seven bands) Landsat-5 Thematic Mapper (TM) images have been used.29 The images selected for this work correspond to the dates 13-4-2006, 02-7-2006, 19-08-2006, and 11-9-2006, and cover two areas of about (spatial resolution of 20 m) of the 197-031 and the 198-31 path-rows. They thus provide a set of images that is very representative of the main phenological cycle in this Mediterranean region. These two areas, in the Ripollès and Penedès counties, (Fig. 2, pink squares) are located in Catalonia, a region of approximately situated in the northeast Iberian Peninsula at the extreme southwest of Europe (map location Fig. 2). The main landscape of the Ripollès region is deciduous forest, while the Penedès region has a vineyard landscape. Landsat images were chosen because they are the remote sensing images most used for research purposes in different applications,30 and they are also used in a large variety of applied projects.31
The compact airborne spectrographic imager (CASI) is an optic sensor for hyperspectral scanning based on a small CCD bar. In this study, two images (provided by the Institut Cartogràfic de Catalunya, ICC) with 72 spectral bands from red to near infrared (409.15 to 955.70 nm and an average bandwidth of 7.6 nm) were used. 32 The two scenes selected correspond to 18-05-2007 and 19-06-2007. For these two dates, two different study areas of at a spatial resolution of 3 m (Fig. 2, yellow rectangles) called Vallès and Selva located in the Montseny region, a mountain landscape in Catalonia. CASI images were chosen as an interesting test bed for three-dimensional compression methodologies (hyperspectral images are large radiometric third dimension examples).
A moderate resolution imaging spectroradiometer (MODIS) Surface Reflectance Daily L2G Global 250 m product33 was used in this study as an example of a large time series dataset used for analyzing the geostatistical patterns of compressed and noncompressed images. The time series include 73 selected cloud free images from 2007 (from 5-5-2007 to 30-09-2007) from the same regions as the two Landsat images analyzed (Penedès and Ripollès), but covering a more extensive area. Higher spatial resolution MODIS bands were used, i.e., the near infrared and red spectral bands. The region covered is a square of centered in the square of the corresponding Landsat dataset. The study area of MODIS images was expanded in order to obtain a number of pixels with a resolution of 250 m large enough for statistical purposes, in this case .
In summary, this remote sensing material was selected because Landsat images are used most widely, CASI serves to extend the third spectral dimension, and MODIS serves to extend the third time dimension tests. Both CASI and MODIS provide large datasets, which improves the lossy compression analyses.
Different image processing methods were applied for each type of image depending essentially on the previous image processing level and on their particular characteristics.
Landsat images were acquired at the L1G processing level, and were then georeferenced and radiometrically corrected with MiraMon GIS software34 according to the methodologies detailed in Refs. 35 and 36, and applied to the SatCat server.37 The optical reflectance range of the corrected images extends from 0 to 10,000, representing the % of reflectance*100, and so numbers were written with two decimal places in a short integer. This factor and type of data (signed short integer) are also used to represent the possible range of ground temperatures in °C (derived from the radiometric correction) of the TM thermal band. All images use a value to represent nodata regions, which in this case are normally caused by sensor problems or by post-processing artifacts. Defining the nodata value as is not arbitrary because nodata values that are defined too far from the valid data range can negatively affect normalization compression procedures.38 Consistent treatment of nodata values is one of the keys to performing correct image processing and subsequent data analysis.
However, CASI images were acquired geometrically georeferenced using the SISA system developed by ICC.39 The flight orientation was SW-NE, which resulted in the geometrically corrected image being very large (), and therefore the CASI images were rotated with respect to the north projection in order to reduce the large amount of nodata pixels. The value range, data type and nodata definition in the CASI images were unified to match those of the Landsat images, also using the MiraMon software.
Finally, the MODIS images were downloaded from the Warehouse Inventory Search Tool40 as a georeferenced reflectance product. It was only necessary to clip them to the study regions already detailed in the previous section and unify the nodata values.
The reflectance product obtained was compressed at a wide range of different quality levels with the following compression ratios (CR): , , , , , , and (from soft to hard compression). Two methodologies, based on the JPEG2000 lossy compression procedures, were evaluated in this work:
• The band-independent fixed-rate (BIFR) method is an independent compressor in which no inter-band redundancy is exploited and the bit-rate is split equally among all bands. BIFR is therefore a two-dimensional (space) compression method.
• Three-dimensional discrete wavelet transform (3d-DWT) is an inter-band decorrelation technique that exploits the spectral (or temporal) redundancy between multiple bands or scenes: the third dimension. This transform is applied to the input image to obtain a spectrally (or temporally) decorrelated image.
Both compression procedures were applied using the Kakadu software,41 which is a complete implementation of the JPEG2000 standard, Part 1, completed with a great deal of Part 2 and 3.42 The BOÍ software,43 an implementation of the JPEG 2000 (Part 1) standard, was used for calculating the PSNR quality compression indicator in reference to noncompressed images.
Geostatistics is a subset of statistics specialized in the analysis, interpretation, and inference of geographically referenced data.44 It was initially developed by Matheron45 and defined as a set of methods for studying the spatial distribution of a variable from a statistical approach in order to estimate the corresponding unknown value of a particular location or simulate its variability. When this variation has a spatial structure, the regionalized variable theory makes it possible to take spatial properties into account following a stochastic approach, and assumes a constant local mean and a stationary variance of the differences between regions separated by a given distance and direction.46 The variogram (also referred to as semivariogram) is based on these previous assumptions and, as defined in Eq. (1), it plots the variance function depending on sample distance separation and, consequently, the spatial patterns analyzed. This expression implies, for remote sensing images, that pixel values variance is only a function of distance and direction between pixels; then, there is no dependence on their particularly locations and, in statistics sense, this variance pattern is stationary for all study regions. Therefore, the variogram describes the pattern distribution of image spatial resolution, and it can identify a study region, a subscene of remote sensing image in this work.1: variogram definition. Note that the dependence is on distance () and on the squared difference of values (pixel values), and independent of particular positions ( and ).
The sampled variogram can be modeled and fitted by a continuous function to identify structural parameters that characterize the spatial pattern for any quantitative variable distribution, including, of course, those of remotely sensed images, as carried out here in this study and in other previous studies47 (Fig. 3). These parameters are:
• Nugget: the variance near the origin, so at very low distances, it represents the fluctuations at scales smaller than the sampling interval (lag distance), and the component of the nonspatially correlated error.
• Range: the distance at which the variogram reaches saturation. It means that variance becomes stabilized despite distances increase and it defines the limit distance of autocorrelation.
• Sill (partial): the variance at the variogram saturation excluding nugget variance, it means a measure for spatially correlated variability.
• Total sill: sum of the partial sill and nugget. It represents the global ( correlated) variability at the autocorrelation threshold.
For each specific geostatistical application, the key role parameter depends upon the focus of a particular spatial pattern analysis. For example, range is the most suitable for autocorrelation analysis, nugget for determining stochastic noise and sill for variability’s measurements.
The total sill was chosen as the main quality measure shown in Sec. 3 for analyzing alterations in the spatial patterns at studied compression ratios and with different methodologies. Different selected plots were used for three sensors for several dates, bands, and regions, which made it possible to determine the complete variogram structure by comparing the theoretical plot in Fig. 3 with the experimental plots in Sec. 3.
Most variogram analyses are used to explore the spatial patterns of irregularly distributed samples.48 Variogram analysis is usually a previous step to interpolation (kriging), which is a process with a high computational cost.49
Nevertheless, the variogram analysis of remotely sensed images has higher computing demands to those of irregularly distributed samples. Indeed, a large number of pixels are involved even in geographically small scenes with a high or medium spatial resolution, and therefore a very large number of pairs are necessary to obtain the variogram. Table 1 compares four examples of the amount of data involved and the time necessary for executions running on a single, normal PC (Intel(R) Xeon (R) 3.0 GHz and 1 GB of RAM).
Comparison of processing time of variogram analyses between two irregularly distributed point measurements and remote sensing images. The sparse irregular distribution corresponds to a network of weather stations, and the dense, irregular, distribution to a GPS network of altimetry measurements, both detailed in Ref. 44; the remotely sensed image corresponds to two (Landsat and CASI) of the three examples used in this work.
|Type||Extension (km2)||N data||N. pairs examined||Time (s)|
|Sparse point irregular sampling||69,284.96||100||1070||26|
|Dense point irregular sampling||0.216||2855||1,627,350||946|
|Remotely sensed image: medium resolution||225||562,500||16,493,978,252||43,850|
|Remotely sensed image: high resolution||6.5||725,145||33,554,502,751||57,727|
In this work, a large number of spatial pattern analyses have been executed. All spectral bands (the number of bands for Landsat, CASI, and MODIS are 7, 72, and 2, respectively) on several scene dates (4, 2, and 73) and in two different study regions were analyzed. In addition, the images were compressed at several compression ratios (eight in all cases) and compared with noncompressed images. This implied 504 analyses for Landsat, 2592 for CASI, and 2628 for MODIS. Performing all these analyses would be a very time-consuming process; a single CASI analysis takes 57,727 s (see Table 1). Therefore, a distributed environment was the most suitable solution for a complete and efficient study.
In order to reduce the execution time, the authors carried out a parallel implementation of the variogram analyses. The master/worker programming paradigm was used in which the master process schedules, distributes, and coordinates the tasks executed by the worker processes. The code was written in ANSI-C language, including MPI (message passing interface) as a message-passing library (version 1.2.7) that is used by the master process to communicate with the worker processes. Since MPI has become a standard de-facto message passing library,50 this solution () guarantees portability to different computer platforms.
The distributed load design51 is not specifically defined and optimized for the characteristics of the images used in this work. However, it is a flexible design, tested with a wide range of image dimension that maintains a good balance and scalability suitable for other studies with a wide range of image dimensions and cluster properties. The scalability solution (adapting point samples analysis from Ref. 49 to remote sensing image analysis in the present work) can be achieved by defining a relatively small load unit: of two image rows of pixels. Each worker analyzes all possible combinations of nonrepeated pairs between two rows of the image and returns the partial variance and distance of the pairs involved to the master. The master distributes tasks (rows of data) to workers, accumulates partial results and manages input (remotely sensed image) and output (resulting variogram) tasks.
Two types of results are presented in this section: geostatistical and computational. The geostatistical results are a set of variograms that compare images compressed at different ratios and noncompressed images for three image types (Landsat, CASI, and MODIS) focusing on the spectral or time dimension and possible anisotropy. The total sill parameters of the theoretical variogram are shown in different tables, while the experimental variograms are shown in plots.
The computational results are focused on performance analyses of the parallel solution. The example shown in Sec. 3.2 corresponds to Landsat executions, but the performance results are very similar for all the image types analyzed.
As explained above, lossy compression mechanisms modify the original data information. Figure 4 illustrates the effects of compression ratios on quality visualization of the images. In this figure, the central image is a noncompressed CASI example, the top image sequence corresponds to three BIFR compressed images; from left to right, it is shown increasing compression ratios and decreasing quality. The bottom image sequence corresponds to the same sequence improved using 3d-DWT techniques. This composition summarizes most of the results presented in this section, and the following tables and figures quantify these effects in a geostatistical sense using the variogram spatial pattern analysis.
The first plot of Fig. 5 is a representative result of the alterations in the spatial pattern caused by the lossy compression (BIFR in this case) mechanisms. The main alteration to the spatial pattern due to lossy compression is the reduction in data variability at high compression ratios (more than ), although at lower CRs (from to ) the variograms have very similar shapes and structural parameters (nugget, range and sill). Table 2 is a summary of the behavior of the total sill, which is the most representative parameter in this study, for two dates and two regions for the BIFR method. Comparing the sill between compressed and noncompressed images represents a measurement of the loss of detail and variability caused by a compression method for all Landsat spectral bands. Table 3 shows the same information in relation to the 3d-DWT compression method. As seen in Fig. 6, there are only small differences between the two methods for Landsat images with the highest CR. Therefore, the following tables and figures corresponding to Landsat images only show the results for the BIFR method. The next cases (CASI and MODIS) demonstrate that 3d-DWT is a more useful method for image series that are larger than Landsat image series.
Two examples (Ripollès region on 02-07-2006, and Penedès region on 11-9-2006) of the variation in the variogram parameter total sill at different compression ratios for the BIFR method.
Two examples (Ripollès region on 02-07-2006, and Penedès region on 11-9-2006) of the variation in the variogram parameter total sill at different compression ratios for the 3d-DWT method.
The sequenced plot in Fig. 5 shows that there are no significant differences in the variogram shape between bands, although there are differences in band range variances, and consequently in structural parameters depending on the region of the landscape.
Time comparison of variogram results (total sill) for BIFR compression of Landsat images from the Penedès region on the 1-B and nIR bands.
Furthermore, lossy compression may produce different variance alterations in relation to the spatial direction. In order to analyze this possible anisotropy modification, variance pattern at different azimuth angles has been studied. Figure 8 shows that the lossy compression mechanisms used in this work maintain the expected directional patterns at all compression ratios; thus, this result proves that the previous variograms were always omnidirectional.
The main characteristic of CASI images in relation to Landsat images is their higher spectral resolution (much more data in the spectral domain). The higher spatial and radiometric resolution are also notable, but they are not the focus of the present study. It is therefore an interesting test bed for exploiting three-dimensional compression methods. As it is impossible to show the results for 72 bands for two regions and two dates, the following figures display representative results for selected bands for different regions and dates. For example, Table 5 shows the total sill parameter of the 3d-DWT method for four selected bands (10, 21, 34, and 58) in the scene (19-06-2007), while Fig. 9 plots variograms at different compression ratios for three selected bands. Figures 10 and 11 show the variogram analyses of the scene for the two regions applying the 3d-DWT compression method.
Total sill parameters for the k4p scene in the Selva region and for the complete sequence of compression ratios at four representative spectral bands.
Table 6 shows the total sill parameter for the other scene ( on 18-05-2007) in the two study regions (Selva and Vallès) comparing the BIFR and 3d-DWT compression methods at selected CRs and bands (10, 21, 34). This table comparison demonstrates that spatial patterns have better fidelity with the 3d-DWT method than the BIFR method for CASI images. Figure 12 shows that 3d-DWT plots are closer to the noncompressed patterns, and that BIFR significantly alters the spatial patterns, especially for high compression ratios.
Comparison between total sill parameters for the BIFR and 3d-DWT methods applied to the c3p scene in two regions at four representative compression ratios and three spectral bands.
The MODIS test bed allows a much larger amount of data to be explored in the time domain than the Landsat images because the MODIS daily revisit period makes it possible to obtain a large cloud-free time series. These series are especially appropriate for analyzing the pattern alterations of three-dimensional compression methods versus two-dimensional ones.
The results for the MODIS images are similar to those obtained for CASI images, but both are different from the Landsat images. In these cases, the 3d-DWT method improves the quality of BIFR compressions. Table 7 shows that, at low compression ratios, 3d-DWT and BIFR maintain the spatial variability patterns of the original images, and that, at high compression ratios, BIFR loses quality compared to the 3d-DWT method. The plots in Figs. 13 and 14 (corresponding to the same region but to different spectral bands) show a reduction in the total sill parameter at increasing compression ratios for the 3d-DWT method, while Fig. 15 shows that 3d-DWT produces the pattern variability more faithfully than BIFR.
Comparison between the BIFR and 3d-DWT methods of the total sill for four selected dates in the Penedès region for two spectral bands at representative compression ratios.
In summary, these results quantify spatial pattern alterations for lossy compression images depending on compression ratios, compression methods, and series dimension through analyzing differences between structural variogram parameters (focusing on total sill) of compressed and noncompressed images. These results shows that, for compression ratios higher than , the variogram is clearly degraded, reducing variability and thus, some capabilities or accuracy in related applications. This degradation could be partially solved, in some cases, by 3d-DWT, but this method needs a large amount of redundant data for significantly improve the BIFR method. These improvements are similar for time series and spectral series.
This significant alteration (CR higher than ) could be relevant in several geostatistical studies of remote sensing images, such as geostatistical applications referred in Sec. 1 of this work, and for example, in landscape scale classification based on variogram analysis.52 In this example, if someone uses compressed remote sensing images instead of original images, and this compression is not done with the appropriate parameters, it could significantly alter classification results and, consequently, its main goal of determining the optimal support size (i.e., optimal spatial resolution) for characterizing forest ecosystems.
All the variogram analyses were processed using the IBM cluster of the research laboratory of the Computer Architecture and Operating Systems Department at the Universitat Autònoma de Barcelona. The IBM cluster is formed by 32 nodes, each with two Dual Core Intel(R) Xeon (R) 3.0 GHz processors with 12 Gbyte of RAM, and communicated with an integrated dual gigabit ethernet. Table 8 shows the average time of 14 independent executions using a different number of workers, while Fig. 16 shows the graphical representation and corresponding speedup evaluation.53
Execution time and speedup using n (0 to 24) workers.
|No. workers||Time (s)||Speedup|
The proximity between the empirical speedup behavior and the theoretical linear speedup confirms that the parallel design and implementation provide a satisfactory solution for the computational problem at hand. These performance results demonstrate the validity of the proposed parallel solution, and the significant time reduction evidences the benefits of using distributed environments for processing large amounts of data, such as remotely sensed images. In fact, they provide an exhaustive test bed allowing running many executions with different parameters at several compression ratios in different regions on different dates, etc., and obtaining faster results and more agile comparisons.
The main conclusions of this study on the alterations in the spatial pattern produced by JPEG2000 compression methodologies applied to remote sensing images are drawn hereafter:
• Low compression ratios (less than in the present study) maintain the radiometric image variability in all distance analyses, and therefore generate very similar variograms. This general behavior is slightly different for the 3d-DWT and BIFR methods: if the third dimension is large enough (CASI and MODIS) 3d-DWT is slightly more accurate to the noncompressed images than BIFR (less than a 0.5% reduction in variability).
• High compression ratios (over in the present study) alter all spatial patterns of remote sensing images, but 3d-DWT is considerably better than BIFR for large three-dimensional images; 3d-DWT maintains about 25% more radiometric variability than BIFR.
• For Landsat images, seven spectral bands as a third spectral dimension are not enough to exploit three-dimensional compression, and therefore the BIFR method is preferable due to its simplicity.
• Despite the fact that pattern variability depends on the spectral band, all variogram alterations are very similar in all spectral bands and spatial directions.
Regarding to the computational methods developed in this work to reduce the execution time required for variogram analyses of remote sensing images:
This work was partially supported by the Spanish government, by FEDER, and by the Catalan government, under Grants TIN2009-14426-C02-02, TIN2007-64974, PTA2010-3619-I, and SGR2009-1511. The authors are especially grateful to the Institut Cartogràfic de Catalunya, ICC, for providing the CASI data. Xavier Pons is a recipient of an ICREA Acadèmia Excellence in Research grant (2011 to 2015). Finally, the authors also wish to thank to the anonymous reviewers for their constructive comments that helped to improve the manuscript.
Lluís Pesquer received his BS degree in physics in 1994 at the Universitat de Barcelona (UB) and a MS degree in computational science in 2009 at the Universitat Autònoma de Barcelona (UAB). He is researcher at CREAF since 2000 and he is currently the main developer of analysis tools, remote sensing modules and geodesy libraries of MiraMon GIS software. His main research interests are: spatial analysis, geostatistics, parallel computing in geoprocessing environments, remote sensing and applied geodesy to GIS.
Xavier Pons received his BS degree in biology in 1988, a MS degree in botany in 1990, a MS degree in geography in 1995, and a PhD degree in remote sensing and GIS in 1992, all from the Universitat Autònoma de Barcelona (UAB). His main work has been done in radiometric and geometric corrections of satellite imagery, in cartography of ecological and forest parameters from airborne sensors, in studies of the spectral response of Mediterranean vegetation and in GIS development, both in terms of data structure and organization and in terms of software writing. He has recently worked in descriptive climatology models, in modeling forest fire hazards and in analysis of landscape changes from long series of satellite images. He is Full professor at the Department of Geography of the Autonomous University of Barcelona and coordinates research activities in GIS and Remote Sensing.
Ana Cortés received both her first degree and her PhD in computer science from the Universitat Autònoma de Barcelona (UAB), Spain, in 1990 and 2000, respectively. She is currently associate professor of the Computer Architecture and Operating Systems Department at UAB. Her areas of interest are high performance computing, distributed and grid computing and also scheduling and load balancing in parallel systems. Her current research interest is focused on high performance computing applied to dynamic data driven applications systems for forest fire spread prediction.
Ivette Serral received her degrees in environmental sciences, MSc in remote sensing and GIS, both at the Universitat Autònoma de Barcelona (UAB), she is researcher at CREAF since 2005. The main objective of her work is to develop and manage GIS for the public administration: the Catalan marine and coastal information system, the Catalan healthcare information system, the Andorran environmental information system, etc. She developed new methodological tools in reference systems integration and multicriteria studies and Web geodata portals. She has been collaborating with secondary schools to disseminate GIS science and applications.