Spatiotemporal image-fusion model for enhancing the temporal resolution of Landsat-8 surface reflectance images using MODIS images

Abstract Our aim was to evaluate a spatiotemporal image-fusion model (STI-FM) for enhancing the temporal resolution (i.e., from 16 to 8 days) of Landsat-8 surface reflectance images by utilizing the moderate-resolution imaging spectroradiometer (MODIS) images, and assess its applicability over a heterogeneous agriculture dominant semiarid region in Jordan. Our proposed model had two major components: (i) establishing relationships between two 8-day MODIS composite images acquired at two different times (i.e., time 1 and time 2); and (ii) generating synthetic Landsat-8 surface reflectance images at time 2 as a function of Landsat-8 images available at time 1 and the relationship constructed in the first component. We evaluated the synthetic images with the actual Landsat-8 images and observed strong relations between them. For example: the coefficient of determination ( r 2 ) was in the range: (i) 0.72 to 0.82; (ii) 0.71 to 0.79; and (iii) 0.78 to 0.83; for red, near-infrared (NIR), and shortwave infrared ( SWIR 2.2   μ m ) spectral bands, respectively. In addition, root mean square error (RMSE) and absolute average difference (AAD) values were: (i) in between 0.003 and 0.004, and 0.0002, respectively, for red band; (ii) 0.005 and 0.0003, respectively, for NIR band; and (iii) 0.004 and in between 0.0001 and 0.0002, respectively, for SWIR 2.2   μ m band. The developed method would be useful in understanding the dynamics of environment issues (e.g., agriculture drought and irrigation management), which require both relatively high spatial (i.e., 30 m) and high temporal resolution (i.e., 8 days) images.


Introduction
Since the 1970s earth observation remote sensing satellites have been providing an enormous amount of information in the form of images for monitoring various environmental phenomena. 1hese images can be categorized into several classes on the basis of their spectral characteristics, such as (i) visible (i.e., in the range 0.4 to 0.7 μm); (ii) near-infrared (NIR: 0.7 to 0.9 μm); (iii) shortwave infrared (SWIR: 1.0 to 2.5 μm); (iv) thermal infrared (TIR: 3 to 14 μm); and (iv) microwave (i.e., 3 mm to 3 m).In general, different satellites may acquire images over a particular spectral range; however, these images may differ significantly in both spatial and temporal resolutions.Usually the relatively high spatial resolution images have low temporal resolution, and vice versa. 2 For example, the moderate-resolution imaging spectroradiometer (MODIS) and Landsat satellites have similar spectral ranges and also provide consistent surface reflectance [3][4][5] while they vary in their spatial and temporal resolutions (see Table 1 for details).It is interesting to note that some environmental issues (e.g., agricultural drought, irrigation management, and grassland) require high spatial (e.g., 30 m) and high temporal (e.g., weekly) resolutions [6][7][8] due to the rapid changes in these applications.In order to address this, a new data fusion research area (called as spatiotemporal data fusion) has emerged during the past several years.These techniques have been applied and successfully predicted synthetic high resolution images for different environmental issues such as vegetation indices, 5,9 evapotranspiration, 10 urban heat island, 11 forest change detection, 12 and surface temperature. 24][15][16][17][18] In addition, some researchers have used other data, such as: (i) Landsat images and medium resolution imaging spectrometer (MERIS) having spatial resolution in the range 300 to 1200 m with 3 days temporal resolution; 19,20 and (ii) HJ-1 CCD satellite images having 30-m spatial resolution with 4 days temporal resolution and MODIS. 21In general, these techniques can be broadly divided into three groups, (i) the spatial-temporal adaptive reflectance-fusion model (STAR-FM)-based techniques, (ii) unmixingbased fusion techniques, and (iii) sparse representation-based fusion techniques.Some of the example cases are briefly described in Table 2.
In general, the above mentioned techniques were complex and required relatively long processing times.In order to address these, we developed a new data fusion technique called spatiotemporal image-fusion model (STI-FM) and demonstrated its effectiveness in enhancing the temporal resolution of Landsat-8 surface temperatures using MODIS data. 24Here, we proposed to evaluate the applicability of the STI-FM with required modifications in generating a synthetic Landsat-8 image [i.e., synth-Lðt 2 Þ] for the red, NIR, and SWIR spectral bands by integrating a pair of MODIS images [i.e., Mðt 1 Þ and Mðt 2 Þ] and a Landsat-8 image [i.e., Lðt 1 Þ].The rationale behind choosing these spectral bands were that these had been widely used in the calculation of vegetation greenness and wetness conditions and are effective in monitoring these parameters in short time periods, such as the plants' growing seasons.Our aim with regard to this paper was to implement this technique over a heterogeneous agriculture-dominant semiarid region in Jordan, Middle East.

General Description of the Study Area
The country of Jordan is located in the Middle East.It is divided into three major geographic areas (i.e., Jordan Valley, Mountainous Heights Plateau, and Eastern Desert) [see Fig. 1(a)].Our study area [Fig.1(b)] is located in the northwestern part of the Mountainous Heights Plateau, where the elevation varies in the range 600 to 1100 m above mean sea level.Geographically, it is located between 31°47′N to 32°29′N and 35°37′E to 36°00′E covering approximately 3500 km 2 [see Fig. 1(b)].In terms of climate, it experiences Mediterranean climatic conditions with a hot and dry summer (i.e., average temperature ∼25°C with no rainfall during May to August); and cool and wet winter (i.e., average temperature ∼5 to 7°C with 250 to 650 mm rainfall during

Data Requirements and Its Preprocessing
In this study, we employed remote sensing data acquired by two satellite systems: (i) Landsat-8 Operational Land Imager freely available from United States Geological Survey (USGS); and (ii) MODIS freely available from National Aeronautics and Space Administration (NASA).For both satellites, we selected the spectral bands of red (i.e., in the range 0.62 to 0.67 μm), NIR (i.e., 0.84 to 0.88 μm), and SWIR (i.e., 2.10 to 2.29 μm).For Landsat-8, we obtained five almost cloudy-free images acquired between June and August 2013 at 30-m spatial resolution.On the other hand, we obtained nine MODIS-based 8-day composite of surface reflectance products (i.e., MOD09A1) at 500-m spatial resolution for the same period.The 8-day composite images would lessen the probability of cloud-contamination of the daily MODIS products. 27The selected dates are presented in Fig. 2.

Preprocessing of MODIS images
The acquired MODIS surface reflectance products were originally provided in sinusoidal projection.We used the MODIS reprojection tool (MRT 4.1) 28 to subset the images into the spatial extent of the study area and reproject them into the coordinate system of Landsat-8 images [i.e., Universal Transverse Mercator (Zone 36N-WGS84)].Then, we coregistered these images using Landsat-8 images to allow for accurate geographic comparisons and to reduce the potential geometric errors (e.g., position and orientation) as effects of spatial miss-registration can influence the derived information.Finally, we also checked the cloud-contaminated pixels by using the quality control band (i.e., 500 m flag; another layer available in the MOD09A1 dataset) and excluded them from further analysis.

Preprocessing of Landsat-8 images
The Landsat-8 images were available in the form of digital numbers (DN).These DN values were converted into top of atmosphere (TOA) reflectance using the following equation illustrated in Ref. 29: where ρ TOA is the band-specific Landsat-8 TOA reflectance; M is the band-specific multiplicative rescaling factor; A is the band-specific additive rescaling factor; and θ SE is the local sun elevation angle at the scene center.The values of A, M, and θ SE were available in the metadata file of each image.
In order to transform the TOA reflectance into surface reflectance, we employed a simple but effective atmospheric correction method that would not require any information about the atmosphere conditions during the image acquisition period.This was done using MODIS surface reflectance images based on the fact that MODIS and Landsat have consistent surface reflectance values. 3,4,13,30This was accomplished in three distinct steps.In the first step, we employed an averaging method over a moving window of 17 × 17 pixels (i.e., approximately the equivalent of 500 × 500 m) for up-scaling pixels of Landsat-8 ρ TOA images into the spatial resolution of MODIS images (i.e., 500 m).This was done in order to make both the Landsat-8 and MODIS images similar in the context of their spatial resolutions and to increase the spectral reliability. 313][34][35] In the second step, we determined linear relationships between the up-scaled Landsat-8 and MODIS images for each of the spectral bands by generating scatter plots between them.The coefficients of the linear relationships (i.e., slope and intercept) were then used with the original Landsat-8 ρ TOA images (i.e., 30 m) in order to generate Landsat-8 surface reflectance images in the scope of the third step.It would be worthwhile to mention that the use of the Landsat Ecosystem Disturbance Adaptive Processing System atmospheric correction algorithm was not applicable for Landsat-8 due to the absence of climate data records. 36Finally, we employed the Landsat-8 quality assessment (QA) band for determining the cloud-contaminated pixels and excluded them from further analysis.In order to establish relations between the two MODIS images [i.e., Mðt 1 Þ and Mðt 2 Þ], we performed the following steps: • We calculated a ratio image [i.e., Mðt 2 Þ∕Mðt 1 Þ] using the pair of MODIS images for each of the spectral bands of interest in order to determine the rate of the temporal change between the two dates.• We plotted Mðt 1 Þ by Mðt 2 Þ in order to assess the trends of changes in surface reflectance between the two images; this could be determined using a pixel-wise linear regression analysis.
• Based on the results of the previous two steps, the ratio image was classified into three clusters of land cover change on the basis of assuming that approximately AE15% variation (rate of change) in surface reflectance (i.e., albedo) would be common for various natural surfaces (e.g., conifer forests, deciduous forest, agriculture crops, grass, etc.). 37,38These clusters included: (i) negligible changes [i.e., rate of change is within AE15%; We considered that this was an important issue that was not taken in consideration in other spatial and temporal fusion models.For each of the three clusters, we produced cluster-specific scatter plots between Mðt 1 Þ and Mðt 2 Þ, and performed linear regressions (see Fig. 4 for details).
In generating the synthetic Landsat-8 surface reflectance image [i.e., synth-Lðt 2 Þ] at 8-day intervals, we employed the Landsat-8 image acquired at time 1 [i.e., Lðt 1 Þ] in conjunction with the classified image and cluster-specific linear regression models derived from the previous steps.In order to perform this combination, we applied different conditional linear functions based on MODIS classified image to assign the surface reflectance value of each pixel in the synthetic image instead of using one linear equation for the entire scene.For example,  in order to generate synthetic Landsat-8 image of the day of year (DOY) 185, we first calculated the linear regression between MODIS image on DOY 169 to 176 and DOY 185 to 192, and then we applied the regression coefficients to Landsat-8 LST image of DOY 169.Here, our image-fusion model assumed that the linear relationship between two MODIS images should be applicable and comparable to the linear relationship between the two corresponding Landsat-8 images as they would have consistent values.Therefore, the use of this linear regression would be practical for generating synthetic Landsat-8 images.

Validating STI-FM
Upon producing the synth-Lðt 2 Þ images, we evaluated them with the actual Lðt 2 Þ images acquired at 16-day intervals as the actual Landsat-8 were only available at every 16 day temporal resolution.In this case, we used two methods: (i) qualitative evaluation that involved visual examination and (ii) quantitative evaluation using statistical measurements, such as coefficient of determination (r 2 ), root mean square error (RMSE), and absolute average difference (AAD).The equations for these statistical measures are as follows:  MODIS surface reflectance at time 1, M(t 1 ) Negligible change; M(t 2 ) M(t 1 ) Negative change; M(t 2 )<M(t 1 ) Positive change; M(t 2 )>M(t 1 ) Fig. 4 Conceptual relationships between the two MODIS images at two different times.
4 Results and Discussion that the negligible change clusters revealed the highest correlation values because no significant changes occurred in the study area during the two 8-day composite MODIS images of interest at 16-day intervals.Note that we were unable to compare our findings as there were no similar studies found in the literature so far.Although the use of 8-day composites of MODIS might reduce the cloud-contamination, it might bring another issue.For example, two consecutive MODIS images might potentially be apart in the range of 2 to 16 days as the 8-day composite were generated based on the minimum-blue criterion, 27 which coincided with the most clear-sky day during the composite period of interest.The quantification of the impact of such an 8-day composite against daily data was, in fact, beyond the scope of this paper, but might be an interesting issue for further exploration.

Evaluation of the Synthetic Landsat-8 Surface Reflectance Images
Prior to conducting quantitative evaluations, we performed qualitative evaluations by comparing the actual and synthetic Landsat-8 images.In these cases, we generated pseudocolor composite images by putting the NIR, red, and SWIR spectral bands in the red, green, and blue color planes of the computer; and such an example is shown in Fig. 6.In general, we observed that the visual clues (e.g., location, shape, size, and texture in particular) were reproduced in the synthetic images with negligible differences in comparison to that of the actual images.However, the tones (i.e., the DN representing the surface reflectance values) had some differences.These might happen due to the use of 500-m spatial resolution MODIS surface reflectance images in calculating the Landsat-8 surface reflectance values at 30-m resolution.
Figure 7 shows the relationship between the actual Landsat-8 surface reflectance images and the synthetic Landsat-8 surface reflectance images for red, NIR, and SWIR 2.2 μm spectral bands for the DOY 169, 185, 201, and 217.It demonstrated that strong relations existed between the actual and synthetic images for the spectral bands of interest over the period of study.In the context of linear regression analysis, the r 2 , slope, and intercept values were in the range: (i) 0.72 to 0.82, 0.86 to 0.98, and 0.0009 to 0.042, respectively, for red spectral band; (ii) 0.71 to 0.79, 0.80 to 0.87, and 0.166 to 0.0800, respectively, for NIR spectral band; and (iii) 0.78 to 0.83, 0.91 to 0.94, and 0.0096 to 0.0367, respectively, for SWIR 2.2 μm spectral band.In the context of RMSE analyses, they were: (i) in between 0.003 an d0.004 for red spectral band; (ii) 0.005 for NIR spectral band; and (iii) 0.004 for SWIR 2.2 μm spectral band.In addition, the AAD values were 0.0002, 0.0003, and between 0.0001 and 0.0002 for the red, NIR, and SWIR 2.2 μm spectral bands, respectively.
It would be worthwhile to note that our findings were quite similar or even better in some cases compared to other studies.For example: (i) Gao et al. 13 18 used ESTDFM and observed r 2 values of 0.73 and 0.82, and AAD values of 0.009 and 0.0167 for red and NIR spectral bands, respectively.It is also important to mention that the proposed model would be applicable for other satellite systems that would have similar spectral and orbital characteristics, such as other Landsat series, MODIS, MERIS, and ASTER.In addition, it is also interesting to point out that the model is a relatively simple and easily reproducible approach, which might satisfy most of the user's needs; such a simple and less sophisticated method might be the most suitable for different applications.Although our results demonstrated strong relations between actual and synthetic Landsat-8 images, some issues would be worthwhile to consider for further improvements, such as: • In this study, we used MODIS surface reflectance images at 500-m spatial resolution.
However, it would be possible to use such images acquired at 250-m spatial resolution in the case of red and NIR spectral bands in particular, which might enhance the quality of the synthetic image. 31• One of the major requirements for the input images [i.e., Lðt 1 Þ, Mðt 1 Þ, and Mðt 2 Þ] was to be free from cloud-contamination.However, it might not be possible to have images completely free from such contamination.In such events, we might use the cloud-infilling algorithm described in Chowdhury and Hassan 39 that required images acquired at current and previous dates.• Although the use of MODIS surface reflectance products to generate Landsat-8 surface reflectance images led to the prediction of reasonable synthetic Landsat-8 surface reflectance images, it would be interesting to use climate data records and compare its outcome with the method adopted here.However, such climatic records were not available for Landsat-8 images at the time of this study. 40

Concluding Remarks
In this study, we demonstrated the applicability of the STI-FM technique for enhancing the temporal resolution of Landsat-8 images from 16 to 8 days using 8-day MODIS based surface reflectance images and demonstrated its implementation over heterogeneous agriculture-dominant semiarid region in Jordan.Our results showed that the proposed method could generate synthetic Landsat-8 surface reflectance images for red, NIR, and SWIR spectral bands with relatively strong accuracies (r 2 , RMSE, and AAD values were in the range 0.71 to 0.83; 0.003 to 0.005; and 0.0001 to 0.0003, respectively).In general, our method would be considered as a simple one because it would not require any correction parameters or high quality land-use maps in order to predict the synthetic images.Despite the accuracy and simplicity, we would recommend that the proposed method should be thoroughly evaluated prior to adoption in other environmental conditions except for semiarid regions like ours.

Fig. 1
Fig. 1 (a) Map of Jordan illustrating three major geographic regions; (b) a Landsat-8 image at 30-m spatial resolution covering the study area as shown by the black polygon.

Fig. 3
Fig. 3 Schematic diagram of the proposed spatiotemporal image-fusion model for enhancing the temporal resolution of Landsat-8 surface reflectance images.

2 6 4 PX
ðA ðtÞ − A ðtÞ ÞðS ðtÞ − S ðtÞ Þ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P ðA ðtÞ − A ðtÞ Þ 2 jS ðtÞ − A ðtÞ j;(4)where AðtÞ and SðtÞ are the actual and the synthetic Landsat-8 surface reflectance images; A ðtÞ and S ðtÞ are the mean values of the actual and the synthetic Landsat-8 images; and n is the number of observations.

Figure 5 Fig. 5
Figure 5 shows the relation between 8-day composites of MODIS images acquired in two different dates for the spectral bands of red, NIR, and SWIR 2.13 μm during the period June 2 to August 12, 2013.It revealed that a strong relation existed for each of the clusters (i.e., negligible change, negative change, and positive change) over all of the spectral bands during the period of observation.For example: the r 2 , slope, and intercept values were in the range: (i) 0.85 to 0.95, 0.95 to 1.07, and 0.003 to 0.01, respectively, for the negligible change cluster; (ii) 0.81 to 0.94, 0.77 to 0.94, and 0.0005 to 0.04, respectively, for the negative change cluster; and (iii) 0.79 to 0.91, 1.05 to 1.21, and 0.01 to 0.08 respectively, for the positive change cluster.Regression analysis showed

Figure 6 aFig. 6
Fig. 6 Example comparison between pseudocolor composite images by putting the NIR, red, and SWIR spectral bands in the red, green, and blue color planes of the computer, respectively, for actual and synthetic Landsat-8 images during June 18, 2013.Note that the panels [(a 1 ), (b 1 )], [(a 2 ), (b 2 )], [(a 3 ), (b 3 )], and [(a 4 ), (b 4 )] show enlarged images for agricultural land, forest, water body, and urban area, respectively, for both actual and synthetic images.Note that the synthetic image was produced using an image pair of Landsat-8 and MODIS acquired in the DOY 153 and one MODIS image acquired in the DOY 169.

Table 1
Comparison between spectral, spatial, and temporal resolutions of MODIS and Landsat-8 images.

Table 2
Description of some of the spatiotemporal data fusion techniques implemented over the visible and shortwave infrared spectral bands.Mðt 1 Þ, Mðt 2 Þ, and Mðt 3 Þ] and Landsat images taken at two times [i.e., Lðt 1 Þ andLðt 3 Þ] in order to predict synth-Lðt 2 Þ.Lðt 1 Þ with Mðt 1 Þ] and [Lðt 3 Þ with Mðt 3 Þ]; and (iv) generating synth-Lðt 2 Þ by integrating the outcomes of steps (ii) and (iii), and Mðt 2 Þ November to February); and two transition seasons (i.e., spring during March and April; and fall during September and October).In general, about 70% of the annual evaporation is observed in the dry season (i.e., May-August) with an annual potential evaporation of ∼1900 mm.In our 25,26) unmixing of the three MODIS images using those endmembers; (iii) predicting two synthetic images by calculating the sum of Lðt 1 Þ and Lðt 3 Þ with the corresponding difference images of unmixed MODIS images; and (iv) the two predicted images were temporally weighted to generate the final predicted image Used MODIS images taken at three times [i.e., Mðt 1 Þ, Mðt 2 Þ, and Mðt 3 Þ] and Landsat images taken at two times [i.e., Lðt 1 Þ and Lðt 3 Þ] to generate synth-Lðt 2 Þ.The technique consisted of four steps: (i) enhancing the three MODIS images to the equivalent spatial resolution of Landsat image (i.e., 30 m) through a sparse representation technique; (ii) building a dictionary pair using the two counterpart MODIS and Landsat images [i.e., Mðt 1 Þ and Lðt 1 Þ; Mðt 3 Þ and Lðt 3 Þ] and the other MODIS image [i.e., Mðt 2 Þ]; (iii) predicting the difference image of Landsat data using the learned dictionary pair; and (iv) reconstructing the predicted Landsat image using different weighting parameters study area, the major agriculture activities include: (i) agricultural cereal crops (i.e., wheat and/or barley grown between November and July) account for ∼65.5% of the area; (ii) orchards (i.e., olives, apple, nectarine, etc.) occupy ∼25.5%; and (iii) grazing and forestry account for ∼9.0%.25,26

Table 2 (
Continued).Then two transition images taken at t 1 and t 2 and the Landsat image [i.e., Lðt 1 Þ] were used to predict the synth-Lðt 2 Þ by using a high pass modulation technique Acquisition dates of moderate-resolution imaging spectroradiometer (MODIS) and Landsat-8 imagery used in the study.Figure3shows a schematic diagram of the STI-FM framework.It consisted of two major components, (i) establishing the relationships between MODIS images acquired at two different times [i.e., Mðt 1 Þ and Mðt 2 Þ], and (ii) generating the synthetic Landsat-8 surface reflectance images at time two [i.e., synth-Lðt 2 Þ] by combining the Landsat-8 images acquired at time 1 [i.e., Lðt 1 Þ] and the relationship constructed in the first component and its validation.
55plemented STARFM over boreal forest and obtained AAD values of 0.004, 0.0129, and 0.0078 for red, NIR, and SWIR 2.2 μm spectral bands, respectively; (ii) Roy et al.14applied a semiphysical fusion model over two study sites in the United States (Oregon and Idaho) and got AAD values of 0.015, 0.22, and 0.28 for Oregon site for red, NIR, and SWIR 2.2 μm spectral bands, respectively; (iii) Zhu et al.15applied ESTARFM over heterogeneous regions and achieved AAD values of 0.0095 and 0.0196 for red and NIR spectral bands, respectively; (iv) Walker et al.5used STARFM to generate synthetic Landsat ETM+ surface reflectance images over dry-land forests; and found that the r 2 values were 0.85 and 0.51 for red and NIR spectral bands; (v) Song and Hang 23 employed a sparse representation-based synthetic technique over boreal forests and found r 2 values of 0.71 and 0.90; RMSE values of 0.02 and 0.03; and AAD values of 0.01 and 0.21 for red and NIR spectral bands, respectively; and (vi) Zhang et al.