Cloud shadow removal for optical satellite data

Abstract. An improved cloud shadow removal algorithm for high spatial resolution optical satellite data over land is presented. The method is based on the matched filter method, which consists of the calculation of a covariance matrix and the corresponding zero-reflectance matched filter vector and the computation of the shadow function. The new additions consist of the usage of an improved cloud shadow map and further evaluations performed on the shadow function. The performance of the cloud shadow removal algorithm incorporated in the software package Python-based atmospheric correction (PACO) is compared to the deshadowing algorithm in atmospheric correction on a set of 25 Sentinel-2 scenes distributed over the globe covering a wide variety of environments and climates. Furthermore, an evaluation of the relative ratio between clear and shadow pixels with and without deshadowing is performed. The visual, spectral, and statistical results show that the new additions performed on the deshadowing algorithm can improve the cloud shadow removal performance used so far.


Introduction
For optical remote sensing of the Earth's surface, clouds and their shadows have always been a major disadvantage, since a lot of remote sensing applications are impacted by their presence. These applications involve, for example, radiation, image classification, the calculation of surface reflectance or land surface temperature, vegetation indices, etc. 1,2 The annual cloud coverage of the Earth lies around ∼70%. 3 Therefore, it is inevitable that no observations of a specific location on Earth will be continuously cloud-and cloud shadow-free, and the information that can be extracted from a scene will have a high percentage of degradation. 4 This means that scientists will have to find ways to work around or with the presence of clouds and cloud shadows. This is especially crucial for land applications for which the amount of usable data per scene and specific timing is of high importance, for example for crop yield estimation. 5 Furthermore, the use of cloud and cloud shadow free images enables to determine ground properties of the Earth's surface [6][7][8][9] and facilitates crop monitoring tasks. 2 Even geological applications 10 are disturbed if clouds and their shadows cover parts of high spatial resolution optical satellite data. This proves how important it is nowadays to have a correct and exact masking of clouds and cloud shadows as preprocessing step for atmospheric correction (ATCOR) and shadow removal of multi-spectral imagery. Hence, in the past years, more and more cloud and cloud shadow detection and removal approaches have been developed [11][12][13][14][15][16][17][18] and used to enable various applications. *Address all correspondence to Viktoria Zekoll, vz13110@my.bristol.ac.uk To undergo a cloud shadow removal algorithm, the cloud and their shadows have to be detected and mapped. The detection of clouds can be done by studying each satellite scene separately using a mono-temporal approach [19][20][21][22][23][24][25][26] or through a multi-temporal methodology 27,28 and hence studying a time series of images. For the detection of the correct location and geometry of a cloud shadow, the direction of observation is crucial since they represent projections of clouds in an image. 1 In this paper, a mono-temporal cloud shadow detection approach is used as preprocessing step for cloud shadow removal, called thresholds, indices, projections (TIP) method. 29 Due to remotely sensed optical imagery of the Earth's surface being contaminated by cloud and cloud shadows, the surface information underneath a cloud covered region cannot be retrieved with optical sensors. The surface information underneath a cloud shadow, on the other hand, can be retrieved since the ground reflected solar radiance is a small non-zero signal. Now the total radiation signal that is measured at the sensor is composed out of a direct beam and a diffuse, reflected skylight component. This means that even if there is no direct solar beam arriving at the sensor from the shadow region, there will still be some information arriving from the reflected diffuse flux. 30 The proposed shadow removal method works with this knowledge and uses the estimate of the fraction of direct solar irradiance for a fully or partially shadowed pixel as basis for the removal algorithm. The aim is to provide an improved cloud shadow removal algorithm based on the current version of the matched filter proposed by ATCOR. 30 As opposed to the IDL ATCOR algorithm, the new cloud shadow removal algorithm is implemented into the Python-based atmospheric correction (PACO) software.
In this paper, the multispectral instruments (MSIs) of the Copernicus Sentinel-2 satellites are used. 31 The MSIs are sensors on-board of the satellite, which allow free access to the data and a high revisit time. 32 The 13 spectral bands of a Sentinel-2 scene are composed of 4 bands at 10 m, 6 bands at 20 m and 3 bands at 60 m spatial resolution. Furthermore, the PACO atmospheric processor is used. This is the python-based version of ATCOR. For PACO, the input data are in L1C radiances in units of [mW∕ðcm 2 Ã sr Ã μmÞ]. If the input scene is given in terms of top of atmosphere (TOA) reflectance, it has to be converted into TOA radiance. PACO performs the ATCOR using the spectral information in all bands by resampling them to a 20-m cube, yielding an image cube of 13 bands with a size of 5490 × 5490 pixels. This so called "merged cube" will be a Sentinel-2 TOA cube considered in the rest of this study.

Radiation Components and Surface Reflectance
The radiation signal in the solar region (0.35 to 2.5 μm) arriving at the sensor is due to four different components. 33,34 • Path radiance, L path : from photons that did not have contact with the ground and are scattered into the field-of-view of the sensor. • Ground reflected radiation from a pixel, L ground : the fraction of the diffuse and direct solar radiation incident on the pixel that is getting reflected from the surface. • Reflected radiation from the surrounding, L adj : the fraction of the solar radiation reflected from the neighborhood and scattered by the air volume into the field-of-view of the sensor. This radiation is also called adjacency radiation. • Reflected terrain radiance from opposite mountains, L terrain . Figure 1 shows the four different components arriving at the sensor. For a full evaluation of the radiation component in rugged terrain, please refer to Ref. 34 and Sec. 6.2 in Ref. 30.
From the four components, only the reflected radiation from a pixel contains the necessary information about the viewed pixel. Hence, in ATCOR, it is important to remove the other components and to retrieve the correct ground reflectance from the pixel of interest.
If we now combine all four components of the radiation to get the total radiation arriving at the sensor we can write 3 Methods

MF method: surface reflectance and covariance matrix
Deshadowing is the compensation process, which uses an estimate of the fraction of direct solar irradiance for a fully or partially shadowed pixel. The MF method needs at least one channel in visible and at least one spectral band in the NIR. The bands used in the MF are: blue, green, red, near-infrared (NIR), short-wave infrared 1 (SWIR1), and short-wave infrared 2 (SWIR2), if existing.
The method starts with the calculation of the surface reflectance image cube, ρ. The surface reflectance, ρ, is computed with the assumption of fully solar illumination, excluding water and clouds. Then the covariance matrix, CðρÞ, is calculated where ρ represents the surface reflectance vector of the three selected bands [see Eq. (2)] E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 7 ; 3 1 4 The MF vector, V MF , is tuned to a certain target reflectance spectrum, ρ t , to be detected. ρ is the scene-average spectrum without water and cloud pixels. For the shadow target, a target reflectance spectrum of ρ t ¼ 0 is selected, which will give the simplified version of the shadow MF vector, V sh , as follows: 35 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 7 ; 2 2 7

MF method: unscaled shadow function
The MF shadow vector, V sh , can be applied to the non-water and non-cloud part of the scene to give the un-normalized values, Φ, also called unscaled shadow function. Φðx; yÞ gives a relation measure of the fractional direct illumination for each pixel x; y [see Eq. Φðx; yÞ ¼ V T sh ðρðx; yÞ − ρÞ:

MF method: rescaling and scaled shadow function
The MF calculates a minimum RMS shadow target abundance for the entire scene. The values of Φ can be both, positive and negative. Therefore, Φ is rescaled into the physical range from 0 (full shadow) to 1 (full direct illumination). The histogram of the unscaled shadow function is used for rescaling and an illustration can be found in Fig. 3 of Ref. 36. The first peak of the histogram of Φ, Φ 2 , represents the shadow pixels. On the other hand, the highest peak of the histogram, ϕ max , represents the fully illuminated areas.
The rescaling of Φ is done by linear mapping of the Φ values from the unscaled interval (Φ min , Φ max ) onto the physically scaled interval (0,1). Hence, the scaled shadow function, Φ Ã , is calculated as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 4 ; 6 0 9 The scaling and normalizing of the MF vector into the (0,1) interval is based on the assignment of: • min and max direct sun fraction in the shadow regions (a min and a max ; the defaults are a min ¼ 0.20 and a max ¼ 0.95). 36 • Corresponding shadow thresholds Φ min and Φ max obtained from the normalized histogram of Φ.
The normalized and scaled shadow function is given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 1 1 4 ; 4 7 1

MF method: iteration
The potential shadow pixels are those which satisfy ϕ n < 1, but as can be seen from Eq. (6), the value strongly depends on Φ max . Therefore, an iterative strategy is applied and the exact steps of the ATCOR MF iteration method can be found in Ref. 36.

Deshadowing reflectance equation
The scaled shadow function, Φ n , represents the fraction of the direct illumination for each pixel in the surface reflectance vector, ρ. The MF method tries to find the core shadows and then subsequently expands these core regions. This enables a smooth shadow to clear transition.
The scaled shadow function is only applied to the pixels in the final mask. The core shadow mask is defined by the pixels with Φðx; yÞ < Φ T , where Φ T is a threshold that is set in the neighbourhood of the local minimum of the histogram. The final deshadowing is performed by multiplying the direct illumination, E dir , with the pixel-dependent Φ n . This reduces the direct solar term and increases the brightness of a shadow pixel, since it is located in the denominator of the deshadowing equation [see Eq. (7)] E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 7 ; 1 1 4 ; 2 1 6

Proposed Method: Cloud Shadow Removal MF Method with New Additions
The proposed cloud shadow removal algorithm was created for data acquired by satellite/airborne sensors in multispectral and hyperspectral imagery. It is computed to create a fully automated shadow removal algorithm and based on the main concept of the MF method. 30,36 The MF method was first implemented from the IDL ATCOR version into the python-based PACO software. The new cloud shadow removal algorithm incorporates the main equations from the MF method used by the ATCOR deshadowing algorithm after. 30 To improve the results, additions have been added and are proposed in this paper.
Zekoll: Cloud shadow removal for optical satellite data Figure 2 shows a flow chart of the eight main steps performed during the new cloud shadow removal algorithm. The steps that have to be performed are: the calculation of the surface reflectance; the evaluation of the cloud shadow map with the TIP method after; 29 the calculation of the MF vector, which gives the unscaled shadow function, Φ; the calculation of the scaled shadow function, Φ Ã ; the calculation of the minimum direct sun fraction, a min and the normalized scaled shadow function, Φ n ; and the final step of the shadow removal.
The first step of the new method is the calculation of the surface reflectance for the bands required to perform the MF evaluation (blue, green, red, NIR, SWIR1, and SWIR2, if available). Constant atmospheric conditions, a standard atmosphere and a fixed visibility are assumed. Hence, the default of the visibility is set to 30 km, which corresponds to an aerosol optical thickness at 550 nm of 0.32 for sea level.
The new additions are the cloud shadow map from the TIP method developed in Ref. 29 and a newly defined iteration over the scaled minimum shadow function, a min .
The TIP cloud shadow detection is based on thresholds, indices and projections and has been able to improve the results of the previous ATCOR cloud shadow map calculation. The detailed evaluation of the TIP cloud shadow map calculation can be found in Ref. 29. Having included a better cloud shadow map will improve the results of the shadow removal algorithm.
The second newly implemented addition is performed within the computation of the matched filter vector Φ. The normalization of Φ into the physical range between 0 and 1 is evaluated with the minimum and maximum direct sun fraction in the cloud shadow area a min and a max and their corresponding shadow thresholds Φ min and Φ max . Φ min and Φ max are obtained from the normalized histogram of Φ. The default values of a max is set to 0.95, and as opposed to the previous method, the starting default value of a min is set to 0.01. Then the normalized scaled shadow function is calculated using Eq. (6).
The proposed iteration calculates the mean reflectance of all shadow pixels (Φ sh ) and all sun-lit pixels (Φ sun ) for all bands, j, and terminates when the overall absolute difference of these two values, P DðjÞ, is less than the difference of the previous iteration, P DðjÞ previous [see Eq. (8)]

X
DðjÞ ¼ absðΦ sh ðjÞ − Φ sun ðjÞÞ < X DðjÞ previous : When this limit is reached, then the two signals are as closely as possible. If the condition is not reached for a min in the range 0.01 to 0.3, then the iteration stops and takes the upper limit of a min ¼ 0.3 as default value. The iteration condition is performed for the total reflectance, hence taking into account all bands.
The final calculation of the reflectance is done using Eq. (7) with the corresponding scaled shadow function.

Results
The results of the deshadowing algorithm presented in this paper are first shown through three selected scenes from the data set where a visual and spectral comparison is performed (see Sec. 4.2). Additionally, a metric quantitative comparison for all evaluated scenes is shown and discussed in Sec. 4.3.

Data and Material for Training Set
To test the new cloud shadow removal method on a set of data, 25 Sentinel-2 (S2) scenes are chosen. A list of the investigated Sentinel-2 A and B scenes is given in Table 1. The scenes were selected to cover a wide variety of regions over the entire globe (see Fig. 3). This enables to validate the shadow removal algorithm for different continents, climates, seasons, weather conditions, and land cover classes. Furthermore, they have been selected to represent flat and mountainous sites with a cloud cover from 3% to 80% and they include the presence of cumulus, thin and thick cirrus clouds. The land cover types represented are: desert, urban, cropland, grass, forest, wetlands, and sand and coastal areas. The range of solar zenith angles is from 27 deg to 67 deg.

Cloud Shadow Removal Results
In the following section, three scenes (scene ID 18, 16, and 10 from Table 1) are chosen to show the results of the cloud shadow removal algorithm. For each scene, the deshadowing results are given for a subset to better evaluate the results spectrally and visually. In each subset, a clear pixel and a shadow pixel are selected that are located close by and represent the same ground properties. The visual and spectral comparison is done between the original scene, the new presented cloud shadow removal algorithm from PACO and the cloud shadow removal algorithm as given by ATCOR. Figure 4 shows the first scene results to be analyzed in this paper. It is scene number 18 of Table 1 To compare the new method visually with the previous version, Fig. 4 provides the subset from the original scene and the two deshadowed subsets from PACO (new version) and ATCOR (old version) in the middle row, respectively. For each subset the same zoom is provided (see Fig. 4 bottom row). In this image zoom, a clear pixel and a cloud shadow pixel are chosen. To not only provide visual results, the spectra of the clear pixel, the cloud shadow pixel, the PACO deshadowed cloud shadow pixel, and the ATCOR deshadowed cloud shadow pixel are presented in Fig. 5.

Netherlands, Amsterdam (scene ID 18)
The black curve of Fig. 5 represents the reflectance spectrum obtained from the cloud shadow pixel without correction. The pink curve represents the chosen clear pixel close by. The orange curve represents the reflectance spectrum of the cloud shadow pixel after deshadowing with the new PACO version. Finally the blue curve represents the cloud shadow pixel after being deshadowed with ATCOR. As can be seen from Fig. 5 both methods nicely deshadow the Table 1 Sentinel-2 level L1C test scenes. Information on scene climate, main surface cover, and rural/urban. (SZA = solar zenith angle).

Morocco, Quarzazate (scene ID 16)
In order to prove the promising results of the cloud shadow removal algorithm presented in this paper, a second scene is illustrated in Fig. 6. Figure 6 shows scene number 16 located in Quarzazate, Morocco. The scene was taken on the 30'th of August and has a zenith angle of 27.2 deg. The top left image of Fig. 6 shows the RGB ¼ 665∕560∕490 nm true color composite of Quarzazate and the top right image the chosen subset from this scene.
The same evaluation as done in Sec. 4.2.1 for scene ID 18 is performed on the Morocco example. Figure 6 shows the scene subsets of the original, PACO deshadowed and ATCOR deshadowed scene in the middle row and a zoom of the subset in the bottom row.
In Fig. 7, the reflectance spectra for the clear and cloud shadow pixel of the original scene, the PACO deshadowed scene, and the ATCOR deshadowed scene are given. The reflectance spectra again show how the ATCOR deshadowing algorithm rather overcompensates the cloud shadow pixel, whereas the PACO deshadowing algorithm follows the reflectance spectrum of the clear pixel very nicely for low wavelengths and then slowly becomes smaller.

France (scene ID 10)
As a final example, scene ID 10 is chosen. It represents a scene from France taken on the 16'th January 2016 and has a zenith angle of 66.8 deg. The top row of Fig. 8 shows on the left the original scene stretched into the RGB ¼ 665∕560∕490 nm bands for better optical comparison and the chosen subset on the right. For this example, two image zooms of the subset were taken.
As performed in Secs. 4.2.1 and 4.2.2, Figs. 8 and 10 show the scene subsets of the original, PACO deshadowed and ATCOR deshadowed scene in the middle row and the first zoom of the subset in the bottom row. In Fig. 9, the reflectance spectra for the clear and cloud shadow pixel of the original scene, the PACO deshadowed scene and the ATCOR deshadowed scene are given for     Zekoll: Cloud shadow removal for optical satellite data zoom number 1. As can be seen from the visual zoom in Fig. 8 and the reflectance spectra given in Fig. 9, ATCOR is not able to deshadow this area of the cloud shadow. The new version on the other hand performs similar to the previous examples and nicely recovers most of the reflectance spectrum. Figure 10 shows the scene subsets of the original, PACO deshadowed and ATCOR deshadowed scene in the middle row and the second zoom of the subset in the bottom row. In Fig. 11 the reflectance spectra for the clear and cloud shadow pixel of the original scene, the PACO deshadowed scene and the ATCOR deshadowed scene are given for zoom number 2. In this example of France, ATCOR performs better visually as seen in Fig. 11, but looking at the reflectance spectras, the new deshadowing algorithm gives better results.

Validation of Dataset
To evaluate the data set, a metric for the comparison of the surface reflectance retrieval without deshadowing and the deshadowed reflectance is calculated. This metric is represented by the   relative ratio of the mean reflectance vectors over all spectral bands with deshadowing and without deshadowing. Hence, the mean reflectance of all the clear pixels is divided by the mean reflectance of all cloud shadow pixels. For perfect deshadowing, the value of the relative ratio should lie as close as possible to the perfect value of þ1. Depending on the disagreement between clear pixels and cloud shadow pixels, the relative ratio will deviate from þ1.
The computation is done with the following three steps and is performed for each scene and for PACO and ATCOR.
• Calculation of the relative ratio R1: ratio of the mean reflectance vector of the clear scene pixels with ATCOR versus the mean reflectance of the shadow pixels but without deshadowing. • Calculation of the relative ratio R2 for PACO and ATCOR: ratio of the mean reflectance vector of the clear scene pixels versus the mean reflectance of the shadow pixels after deshadowing. • Comparison of R1 and R2 (see Table 2).
For an improvement in the reflectance vector after deshadowing, the relative ratio R2 should be closer to the value þ1 than the relative ratio R1. Table 2 gives the values of R1 and R2 for each scene. R2 was evaluated for the new method, PACO, and for the previous ATCOR method. The bold face numbers of Table 2 indicate the correlation coefficient with a value closest to +1 and hence with the best correlation. No cloud shadows are present in scene ID 5 and 15, hence no values for R1 and R2 are obtained.
As can be deduced from Table 2, all of the relative ratios are improved by the deshadowing algorithm for PACO apart from the case of scene ID 8. The best performance of PACO is obtained with scene ID 1 located in Gobabeb (Africa) with a value of R2 ¼ 0.999. The worst performance of PACO is obtained with scene ID 23 located in Barrax (Spain) with a value of R2 ¼ 0.596. This is due to the value of R1 for this scene to be the worst outlier and hence represents a hard scene to be deshadowed.
For the case of scene ID 8 located in Arcachon (France) a relative ratio close to þ1 for R1 is obtained, since the scene is covered by a film of haze. Hence, the overall scene appears brighter in the reflectance spectrum, even the shadows. This results in a ratio of R1 close to þ1. When the deshadowing is done for a cloud shadow with a bit of haze on top, the corrected deshadowed image does not have a film of haze covering the deshadowed area. Hence the reflectance spectrum of this cloud shadow will appear lower. This has as a consequence, that the value of R2 is a bit less than the value of R1.
In the case of the deshadowing algorithm in ATCOR, no values were obtained for the scenes 8, 9, 11, 12, and 17. For these scenes, the MF deshadowing was turned off during the ATCOR due to not enough pixels present or due to a problem in the matrix inversion. To summarize this section, it can be seen that the PACO deshadowing algorithm with the TIP cloud shadow masking and its additions to the MF method highly improves the deshadowing of the Sentinel-2 data.

Discussion
The improved deshadowing algorithm implemented in the python ATCOR (PACO) has shown very promising results and was able to improve the previous matched filter version implemented in ATCOR through its additions. Relative ratios close to the value of þ1 are reached as shown in Table 2 with a range between 0.596 and 1.245 with deshadowing, R2 (PACO) and 0.469 and 12.434 without deshadowing, R1. The high values that are reached without going through the cloud shadow removal algorithm can be explained by the presence of haze covering the scene (scene ID 8) or when the scene has a lot of water bodies which are part of the clear pixels (scene ID 17). The cloud shadow removal algorithm performed within PACO is done without any ATCOR on the haze particles. This is one additional step that can be taken into account to further improve the deshadowing algorithm. Hence, to fully correct the scene for the visibility and then perform the cloud shadow removal. The visual and spectral comparison with the deshadowing results obtained through PACO and ATCOR are able to show the achieved improvements, but also the weaknesses that will have to be changed in the future work. One of the weaknesses of the presented cloud shadow removal algorithm is the correction at the borders of the shadows. The visual results show, that ATCOR is able to better correct for the transition region between shadow pixels and clear pixels. Nevertheless, better results are obtained overall for PACO and this is also proven by the reflectance spectra shown.
The advantages of the deshadowing method presented in this paper are that it is performed through a fully automatic algorithm which is based on the matched filter method, the TIP cloud shadow masking and a iterative process of the scaled shadow function. It works for multispectral and hyperspectral imagery over land acquired by satellite/airborne sensors. Since the deshadowing algorithm relies on the TIP method after, 29 it must be noted that it is applicable for VNIR-SWIR sensors, such as EnMAP, 37 PRISMA, 38 Landsat-8, 39 and Landsat-9, 40 but for VNIR sensors, such as for example the DESIS sensor, 41 the method can only be implemented partially.
So far the new deshadowing algorithm has been tested for Sentinel-2 scenes having a geometric resolution of 20 m. Nevertheless, it can be assumed that the method will perform similar for Landsat-8, having a resolution of 30 m. For sensors with a ground sampling distance less than 5 m, additional problems will arise due to the TIP method still having to perform test for these cases.

Conclusions
An improved cloud shadow removal algorithm for high spatial resolution optical satellite data was presented. It is based on the matched filter method with the addition of an improved cloud shadow masking and an iterative process for the final reflectance value calculation. Through visual and spectral inspection it was shown that the new method improves the previous de-shadowing algorithm. This was furthermore presented through an evaluation of the relative ratio between the reflectance of clear and cloud shadow pixel and showed promising values for the new method. Future work will have to include the cloud shadow detection improvements of the TIP method and the evaluation of the cloud shadow border correction.