Object-oriented detection of building shadow in TripleSat-2 remote sensing imagery

Abstract. The projection of objects on the earth's surface caused by sunlight produces shadows. They are inevitable in high-spatial-resolution satellite remote sensing images and reduce the accuracy of change detection, land cover classification, target recognition, and many other applications. Dark-colored land covers in these satellite images, such as water bodies, roads, and soil, have spectral properties similar to those of shadows and often make shadow detection difficult, especially in complex urban settings. We propose an object-oriented building shadow extraction method and test it on six study areas selected from TripleSat-2 satellite imagery with 3.2-m spatial resolution. The method's main steps are (1) selecting six image features that highlight the shadow information and then segmenting the image based on edges; (2) extracting shadow regions based on multiple object features; and (3) masking nonbuilding shadow regions using the shadow and dark object separation index, spectral, textural, and geometric image features, and contextual information. The average precision, recall, and F1-score of the shadow detection were 85.6%, 88.6%, and 87.0%, respectively, with ranges of 73.0% to 91.0%, 76.6% to 94.1%, and 74.7% to 91.2%. Compared with multiscale segmentation, edge-based segmentation is more efficient and helps to extract shadows more completely and accurately.

It includes four bands: blue (B), green (G), red (R), and near-infrared (NIR). Six typical cases in one image were selected and are shown in Fig. 1. Each case image is 400 pixels × 300 pixels. The distribution characteristics of the shadows and dark land covers are shown in Table 1 and are used in Sec. 3.3.

Among the dark land covers, water bodies are the most easily confused with building shadows, as seen in Figs. 1(a), 1(c), 1(e), and 1(f). Next are asphalt roads and dark vegetation, as also seen in Fig. 1.

Proposed Method
A shadow is the dark area formed when the sun's rays are projected onto an opaque object. It is divided into a self-shadow (the part of the object that is not illuminated by direct light) and a cast shadow (the shadow projected by the object away from the direction of the light source),54 and the latter can be further divided into the umbra and penumbra. Direct light is completely obscured in the umbra, whereas only a small portion of it is blocked in the penumbra. The penumbra is usually located between the umbra and the nonshadow area and may be ambiguous,4 resulting in a blurry boundary between shaded and nonshaded areas.
In satellite remote sensing images, owing to limitations of image quality and spatial resolution, cast shadows are obvious whereas self-shadows are often weak. In a cast shadow, the penumbra accounts for only a small portion8 and is difficult to distinguish from the umbra. Therefore, only the cast shadow is extracted in this paper.
The object-oriented shadow method consists of four parts: (1) extracting image features that enhance the shadow information and then segmenting the image based on edge detection to obtain image objects; (2) extracting suspected shadow regions based on the statistical properties of the objects; (3) refining the shadow by combining spectral, textural, and geometric features with spatial context information to remove nonbuilding shadows; and (4) postprocessing.
The orthorectification image with digital number (DN) is used as the input. The four bands were denoted as B, G, R, and NIR in order of increasing central wavelength.

Feature selection and extraction
The selected features should highlight the shadow information and benefit image segmentation by producing objects with closed boundaries. Previous studies38,39,55 have shown that (1) the brightness (intensity) of a shadow area is low due to the occlusion of sunlight; (2) the atmospheric Rayleigh scattering effect is strong in the short-wavelength blue and violet bands but weak in the NIR band; and (3) a shadow area has a high C3 component and a high saturation. Therefore, six image features were selected from the C1C2C3 and HSV color spaces and from remote sensing indices: C3, RATIO_H_V, RATIO_S_V, NIR, the normalized difference vegetation index (NDVI), and the visible atmospherically resistant index (VARI).
(1) C3. The C3 component of the C1C2C3 color space highlights the difference between shadow and greenish objects (water bodies, green areas, plastic runways, etc.):54

C3 = arctan[ B / max(R, G) ].   (1)

(2) RATIO_H_V and RATIO_S_V. Shadow appears more clearly in the HSV color space than in the RGB color space. RGB is converted to HSV according to the following equations:56

H = θ if B ≤ G, otherwise H = 2π − θ,   (2)

S = 1 − 3 min(R, G, B) / (R + G + B),   (3)

V = (R + G + B) / 3,   (4)

where

θ = arccos{ [(R − G) + (R − B)] / [2 √((R − G)² + (R − B)(G − B))] }.   (5)

Based on the relationship between S and V, Ma et al.57 constructed the normalized saturation-value difference index (NSVDI):

NSVDI = (S − V) / (S + V).   (6)

Referring to the S and V used in NSVDI, the ratio of S to V can be formed:

RATIO_S_V = S / V.   (7)

A comparison in Fig. 2 shows that RATIO_S_V highlights the shadow better than NSVDI. In Fig. 2, (a) is a local part of Fig. 1(b), and (b)-(e) are the RATIO_S_V and NSVDI images and their edges. As seen in Fig. 2, RATIO_S_V better highlights the edges of the shadow and the dark road. Figure 3 shows the relationship between RATIO_S_V and NSVDI in Figs. 2(b) and 2(c). At higher values, RATIO_S_V rises rapidly and provides better discrimination.

(3) NIR, NDVI, and VARI. In the NIR band, it is easier to distinguish shadow from nonshadow than in the visible bands.4 NDVI58 highlights the DN difference between a shadow area and vegetation:59

NDVI = (NIR − R) / (NIR + R).   (8)

VARI is less sensitive to atmospheric effects60 and is used as a supplement to NDVI:

VARI = (G − R) / (G + R − B).   (9)
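The six features above can be computed per pixel from the four DN bands. The following NumPy sketch is an illustration, not the paper's code: the function name, the small eps guard against division by zero, and the arccos-based (HSI-style) S and V formulas are our assumptions.

```python
import numpy as np

def shadow_features(B, G, R, NIR):
    """Compute the six shadow-enhancing features from float DN bands
    (a sketch; eps guards divisions, S/V follow the arccos-style conversion)."""
    eps = 1e-9
    C3 = np.arctan(B / (np.maximum(R, G) + eps))                      # Eq. (1)
    S = 1.0 - 3.0 * np.minimum(np.minimum(R, G), B) / (R + G + B + eps)  # Eq. (3)
    V = (R + G + B) / 3.0                                             # Eq. (4)
    theta = np.arccos(np.clip(
        ((R - G) + (R - B)) /
        (2.0 * np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps), -1, 1))  # Eq. (5)
    H = np.where(B <= G, theta, 2 * np.pi - theta)                    # Eq. (2)
    ratio_hv = H / (V + eps)
    ratio_sv = S / (V + eps)                                          # Eq. (7)
    ndvi = (NIR - R) / (NIR + R + eps)                                # Eq. (8)
    vari = (G - R) / (G + R - B + eps)                                # Eq. (9)
    return C3, ratio_hv, ratio_sv, NIR, ndvi, vari
```

In the workflow, each of these feature images would then be linearly stretched and rescaled to [0, 255] as described in Sec. 3.1.2.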

Image segmentation
The image features C3, RATIO_H_V, RATIO_S_V, NIR, NDVI, and VARI were first linearly stretched by 2% to enhance their information, and their value ranges were then rescaled to [0, 255]. The image segmentation process is shown in Fig. 4. First, the edges of each image feature are detected, and the union of these edge sets is taken as the image edge to ensure that no shadow region is missed. Nonedge pixels are then connected into regions to obtain the segmented image.
The Canny operator61 in MATLAB® 2018b62 is used to detect the edges of each image feature. The union of the edges is postprocessed by the skeleton and bridge operators62 with a 3 × 3 window to reduce the edge width to one pixel. Finally, a pixel search is made along the outermost image boundary, and the current pixel is set as an edge if it touches an edge.
The nonedge areas form the image objects.
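The union-and-label step can be sketched as follows. The Canny detection itself (done in MATLAB in the paper) is assumed to have already produced one binary edge map per feature; `segment_from_edges` is a hypothetical helper that unions the maps and labels the 4-connected nonedge regions as objects.

```python
import numpy as np
from collections import deque

def segment_from_edges(edge_maps):
    """Union the per-feature edge maps, then label 4-connected non-edge
    regions as image objects (edge pixels keep label 0)."""
    edges = np.logical_or.reduce(edge_maps)  # union of all feature edges
    labels = np.zeros(edges.shape, dtype=int)
    current = 0
    rows, cols = edges.shape
    for i in range(rows):
        for j in range(cols):
            if edges[i, j] or labels[i, j]:
                continue
            current += 1            # start a new object; flood-fill it (BFS)
            q = deque([(i, j)])
            labels[i, j] = current
            while q:
                r, c = q.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and \
                       not edges[nr, nc] and labels[nr, nc] == 0:
                        labels[nr, nc] = current
                        q.append((nr, nc))
    return labels
```

Because the union of all feature edges is used, an object boundary missed by one feature can still be closed by another, which is the rationale given in Fig. 4.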

Suspected Shadow Area Detection
Shadows are initially detected using the image features C3, NSVDI, and NDVI: obvious shadows should have a high C3 value and low NSVDI and NDVI values. First, the average value of each band is calculated as an attribute of the image object; then the C3, NSVDI, and NDVI of the object are calculated. The Otsu method43 is used to determine the threshold dividing each feature into a high-value area and a low-value area. The intersection of the low-value areas of NDVI and NSVDI with the high-value area of C3 is taken as the initial shadow area S0.
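Since the Otsu threshold is reused throughout Secs. 3.2 and 3.3, a compact implementation is worth spelling out. The sketch below (our own helper names) derives the threshold that maximizes between-class variance and then forms S0 as described above.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's method: the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)                 # class-0 (low side) probability
    m = np.cumsum(p * centers)        # cumulative mean
    mt = m[-1]                        # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mt * w0 - m) ** 2 / (w0 * (1.0 - w0))
    sigma_b = np.nan_to_num(sigma_b)  # ignore degenerate splits
    return centers[np.argmax(sigma_b)]

def initial_shadow_mask(c3, nsvdi, ndvi):
    """S0: high C3 intersected with low NSVDI and low NDVI (object means)."""
    return ((c3 > otsu_threshold(c3)) &
            (nsvdi < otsu_threshold(nsvdi)) &
            (ndvi < otsu_threshold(ndvi)))
```

Here `c3`, `nsvdi`, and `ndvi` would hold the per-object mean values computed from the segmented image.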

Nonbuilding Shadow Area
A nonbuilding shadow area is a dark area that is not a building shadow, consisting of dark land covers such as water bodies, dark buildings, soil, and roads. It is divided into two types: type 1, whose spectrum differs from that of building shadow, including dark buildings, dark soil, and dark roads; and type 2, whose spectrum is similar to that of building shadow, including dark water bodies and very dark land covers.
The type 1 area is extracted by spectral features first; then S0, after being masked with the type 1 area, is used to extract the type 2 area by multiple rules.

Type 1 area
Rayleigh scattering is inversely proportional to the wavelength, so the smallest difference in radiant energy between shaded and nonshaded areas appears in the blue band and the largest in the NIR band.63 In addition, a shaded area has a higher saturation (S) and a lower value (V) in the HSV color space.39 Based on these conclusions, the shadow and dark object separation index (SDSI) is proposed to expand the difference between shadow and nonshadow:

SDSI = α (B / NIR) + (1 − α)(S / V),   (10)

where B and NIR are the DNs of the B and NIR bands in the original image, S and V are calculated by Eqs. (3) and (4), and the values of B/NIR and S/V are both normalized to the range 0 to 1. α is an environmental parameter indicating the proportion of the shaded area in an image, α ∈ [0, 1]. Confusion between building shadows and dark roads can be reduced by using a different α. Taking the original image in Fig. 4 as an example, shadows extracted with typical α values are shown in Fig. 5. Figure 5(a) (α = 0) has the best dark-object removal effect. As α increases, the effect worsens, and the building shadows on some vegetation are removed when α ≥ 0.8, as in Figs. 5(f) and 5(g). Section 4.2 further discusses the effect of α on shadow detection accuracy.
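A minimal sketch of SDSI follows, assuming (consistent with the description in Sec. 4.2) that the index is the α-weighted sum of the two normalized ratios; the helper names and the min-max normalization are our assumptions.

```python
import numpy as np

def _norm01(x):
    """Min-max scale an array to the [0, 1] range."""
    x = x.astype(float)
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def sdsi(B, NIR, S, V, alpha=0.5):
    """Shadow and dark object separation index: a weighted combination of
    the normalized B/NIR and S/V ratios, with alpha in [0, 1]."""
    eps = 1e-9  # guard against division by zero
    return alpha * _norm01(B / (NIR + eps)) + (1 - alpha) * _norm01(S / (V + eps))
```

With α = 0 the index reduces to the normalized S/V ratio, matching the Fig. 5(a) case; larger α shifts the weight toward B/NIR.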

Type 2 area
After masking the type 1 area from S0, the type 2 area is extracted by integrating spectral, textural, geometrical, and spatial context information.
In a high-spatial-resolution image, the shadows spectra often have the highest similarity to the spectrum of dark water bodies. According to the object boundary shape, a water body is divided into slender and nonslender.
(1) Slender water body. A river is an example of a slender water body, as shown in Fig. 1(c). Its shape can be described by the ratio d of the long-axis length l1 to the short-axis length l2 of the object in the image:

d = l1 / l2.   (11)

For a slender water body, d > T_d and l1 > T_l. The threshold T_d is set to 10 and T_l to 50 here.
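The paper does not specify how l1 and l2 are measured; one common choice, sketched here as an assumption, is to estimate them from the eigenvalues of the region's pixel-coordinate covariance matrix.

```python
import numpy as np

def axis_ratio(mask):
    """Estimate the long/short axis ratio d of a binary region from the
    eigenvalues of the pixel-coordinate covariance matrix; also return a
    rough long-axis length (~4 standard deviations along the major axis)."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.vstack([ys, xs]))
    evals = np.sort(np.linalg.eigvalsh(cov))
    l2, l1 = np.sqrt(np.maximum(evals, 1e-12))  # short, long semi-axis scales
    return l1 / l2, 4.0 * l1

def is_slender_water(mask, T_d=10, T_l=50):
    """Slender-water rule from Eq. (11): d > T_d and l1 > T_l."""
    d, length = axis_ratio(mask)
    return bool(d > T_d and length > T_l)
```

A thin river-like region yields a very large d, whereas a compact pond yields d near 1 and is passed on to the nonslender rules.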
(2) Nonslender water body. Nonslender water bodies are shown in Figs. 1(a), 1(e), and 1(f), where (a) contains a pond, (e) a pool, and (f) a lake.
Spectral and textural features are used to extract this kind of water body. Compared with the shadow spectrum, the water body spectrum often has a larger value in the G band and a smaller value in the NIR band. Based on this, several remote sensing indices have been designed to distinguish water bodies, such as the normalized difference water index NDWI = (G − NIR)/(G + NIR),64 G/NIR,64 and G − NIR.
It is also possible to extract a water body by segmenting the G and NIR bands. First, each band image is segmented by the Otsu method43 into high- and low-value areas; the intersection of the high-value area of G and the low-value area of NIR is then taken as the water body.
In general, pixel values within a water body tend to be uniform. This can be described by the sum of the standard deviations (SSD) of the pixel values of each band: the SSD is low in water bodies and high in shadow areas.
Four methods are used to extract shadow when a nonslender water body exists: (1) the NDWI low-value area is intersected with the SSD high-value area; (2) the G/NIR low-value area is intersected with the SSD high-value area; (3) the G − NIR low-value area is intersected with the SSD high-value area; and (4) the G low-value area is combined with the NIR high-value area and then intersected with the SSD high-value area. The results are shown in Fig. 6.
For the shorter shadows in cases (e) and (f), NDWI and G/NIR are more suitable for distinguishing shadows from water bodies. For the longer shadows in case (a), method 4 is more suitable. More test cases containing water bodies validate this conclusion. After masking the water body, the shadow accuracy in Fig. 6 decreases in the order of methods 1, 2, 3, and 4. Method 4 may fail to extract the water body, whereas the other methods may remove some shadows but extract the water body more completely. The appropriate method depends on the features of a specific image. For example, method 4 is best for case image (a), whereas method 1 or 2 is best for case images (e) and (f). As described in Sec. 3.1.1, the ratio G/NIR is more discriminative than NDWI in the high-value areas. More multispectral TripleSat-2 test images showed that method 2 or method 4 often achieves better accuracy.
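Method 2 can be sketched per object as follows. This is an illustration under our assumptions: the thresholds (derived with Otsu in the paper) are passed in directly, and the helper name is hypothetical.

```python
import numpy as np

def mask_water(bands, labels, shadow_ids, t_ratio, t_ssd):
    """Method-2 sketch: among suspected shadow objects, keep those whose
    mean G/NIR ratio is low AND whose SSD (sum of per-band standard
    deviations) is high; objects failing either test are treated as water."""
    B, G, R, NIR = bands
    kept = []
    for oid in shadow_ids:
        m = labels == oid
        g_nir = G[m].mean() / (NIR[m].mean() + 1e-9)      # water: high G/NIR
        ssd = sum(float(band[m].std()) for band in bands)  # water: uniform, low SSD
        if g_nir < t_ratio and ssd > t_ssd:
            kept.append(oid)
    return kept
```

Methods 1, 3, and 4 follow the same pattern with NDWI, G − NIR, or the G/NIR band pair substituted for the spectral test.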
In a city, buildings and water bodies are often surrounded by green plants, trees often grow on both sides of a road, and a building shadow is adjacent to its building. Therefore, the following rules were constructed (S1 denotes the image after removal of the water body):
(1) Extract the vegetation coverage (VC) area, where NDVI > 0 is regarded as VC. Because blue buildings and vegetation appear similar in the NDVI image in some cases, obvious blue building areas can additionally be eliminated by B < T_v, where the threshold T_v lies between the maximum DN of the vegetation area and the minimum DN of the blue building area. T_v is not needed in our six case images.
(2) To eliminate edge effects, VC is dilated with a 5 × 5 window to obtain VC_dilate, and S1 is dilated with a 3 × 3 window to obtain I_dilate.
(3) VC_dilate and I_dilate are combined to obtain VI, which represents the collection of vegetation, shadows, and water bodies.
(4) Each region of S1 is traversed: a region whose pixels lie almost entirely (usually 95% or more) within VI is regarded as water and assigned 0; otherwise, the region is shadow and assigned 1.
The image S1 processed by the above steps is denoted as S2.
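The rules above can be sketched as follows. Because the published wording of rule (4) is ambiguous, this is only one plausible reading (an assumption): a suspected region whose surrounding ring lies almost entirely in dilated vegetation or other suspected regions is water (surrounded by greenery), whereas a region whose ring touches bare surroundings, such as a building, is kept as building shadow. The `dilate` helper is ours.

```python
import numpy as np

def dilate(mask, k):
    """Binary dilation with a k x k square element (zero-padded borders)."""
    r = k // 2
    p = np.pad(mask, r)
    H, W = mask.shape
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + H, dx:dx + W]
    return out

def contextual_rule(s1_labels, vc_mask, frac=0.95):
    """Assumed reading of rules (1)-(4): flag an S1 region as water when
    >= frac of its surrounding ring falls in dilated vegetation or other
    S1 regions; otherwise keep it as building shadow."""
    s1_mask = s1_labels > 0
    vc_d = dilate(vc_mask, 5)                      # VC_dilate
    shadow = np.zeros_like(s1_mask)
    for oid in np.unique(s1_labels[s1_labels > 0]):
        m = s1_labels == oid
        ring = dilate(m, 3) & ~m                   # one-pixel neighborhood
        vi = vc_d | (s1_mask & ~m)                 # VI without the region itself
        if ring.any() and vi[ring].mean() < frac:  # surroundings are not VI
            shadow |= m                            # keep as building shadow
    return shadow
```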

Postprocessing
To obtain a more complete shadow distribution, S2 is processed as follows: (1) remove small patches with an area smaller than T_a, where T_a = 9 (a 3 × 3 basic window) in the case images; (2) perform a morphological closing operation; (3) fill the holes in each region with fewer than T_h = 30 pixels, where T_h is the maximum pixel count of a hole to be filled; and (4) search the pixels along the outermost boundary of the image and set the current pixel as an edge according to the four-neighbor rule.

Threshold
The thresholds separating bright objects from shadows were determined by the Otsu method,43 including the segmentation of NDVI, NSVDI, and C3 in Sec. 3.2 and of SDSI, G, NIR, and SSD in Sec. 3.3. The specific thresholds are T_d and T_l in Sec. 3.3 and T_a and T_h in Sec. 3.4; these need to be specified for each image.

Shadow Detection Accuracy
The shadow detection workflow is divided into four steps: step 1, image segmentation; step 2, extraction of the shadow area; step 3, nonbuilding shadow area removal; and step 4, postprocessing.
Step 3 can be further divided into step 3.1, removal of type 1 area, and step 3.2, removal of type 2 area. The necessary steps include step 1, step 2, and step 4; step 3 is optional.
The parameter α of SDSI in step 3 is set to 0.5, meaning that the two components of SDSI are weighted equally. The images containing water bodies are cases (a), (e), and (f): following Sec. 3.3.2, method 4 is used for water removal in case (a), and method 2 in cases (e) and (f).
Taking manually identified shadows as the ground truth, precision (P_s), recall (U_s), and F1-score (F_1) were used to evaluate the detection accuracy:

P_s = TP / (TP + FP),   (12)

U_s = TP / (TP + FN),   (13)

F_1 = 2 · P_s · U_s / (P_s + U_s),   (14)

where TP denotes the number of pixels correctly detected as shadows, FN the number of shadow pixels detected as nonshadows, and FP the number of nonshadow pixels detected as shadows.
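Equations (12)-(14) amount to a pixel-wise comparison of two binary masks; a small sketch (our own helper name):

```python
import numpy as np

def prf1(pred, truth):
    """Pixel-wise precision, recall, and F1-score for binary shadow masks,
    per Eqs. (12)-(14)."""
    tp = np.sum(pred & truth)    # shadow detected as shadow
    fp = np.sum(pred & ~truth)   # nonshadow detected as shadow
    fn = np.sum(~pred & truth)   # shadow detected as nonshadow
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```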
The results of the shadow detection in six case images are shown in Fig. 7 followed by the four steps. The shadow ground truth was obtained by visual interpretation and is shown in Fig. 8.
In Fig. 7, step 1 shows that the segmented image is consistent with the source image, and step 2 shows that SDSI effectively eliminates the interference of other dark land covers. In step 3, the water bodies in all images except case (a) are well removed. In step 4, the pond in the upper-left corner of case (a) is removed by the contextual-information rule, and the river in cases (a) and (e) is removed by the geometric-shape rule.
According to the F1-score and visual comparison, cases (b) and (f) have the best accuracy, without obvious errors or omissions. The omissions in cases (a) and (e) mainly appear in areas of light and small shadows. Overextraction of vegetation shadow appears in cases (a), (c), and (d); case (c) also extracts road shadow. The precision error is mainly related to vegetation shadow. The recall error is mainly due to the penumbra, which is often blurred in the image and difficult to identify and extract. Owing to the skeleton and other operations in image segmentation, small shadows of fewer than 3 pixels in the case images cannot be extracted.
The lowest accuracy appears in case (e), where the shadows are short and small and a large area of dark soil and a dark water body act as interference.

Parameter Assignment
Steps 3 and 4 of the shadow detection in Sec. 4.1 are affected by parameters. Step 3 requires the parameter α of SDSI. Step 4 requires the pixel counts of the largest fragment and the largest hole to be assigned according to the actual image. The following discusses SDSI and its parameter α.
In remote sensing images, the sensitivity of dark land covers differs across color spaces. We integrated the RGB and HSV color spaces to deal with this complex situation, combining the two components B/NIR and S/V in SDSI. Further tests showed that S/V better highlights the difference between shadows and dark roads in summer images, whereas B/NIR performs better in winter images.
Parameter α, indicating the proportion of the shaded area, controls the influence of B/NIR and S/V on the result: the larger α is, the greater the influence of B/NIR and the smaller the influence of S/V. α was varied from 0 to 1 in increments of 0.1, and its relationship with shadow detection accuracy is shown in Fig. 9.
SDSI can also be used as an index for directly separating shadows from other dark objects. Without the steps in Sec. 3.2, the shadow detection accuracy for different α is shown in Fig. 10. Figure 9 shows that the sensitivity of SDSI varies across images and that the precision of shadow extraction varies greatly with α. Figure 10 shows that SDSI can achieve good results in images without objects that are easily confused with shadows (e.g., water bodies), such as cases (b) and (e), indicating that SDSI can also serve as a standalone shadow detection index.

Comparison with Results of Multiscale Segmentation
The multiscale segmentation algorithm is one of the most widely used and is included in typical remote sensing software tools. Unlike the top-down image segmentation in this paper, multiscale segmentation is bottom-up.
We used eCognition51 for multiscale segmentation. The estimation of scale parameter (ESP) tool enables fast and objective parametrization,65 but for our experimental image with 3.2-m spatial resolution it produced insufficient segmentation. We therefore chose the segmentation parameters manually. After several tests, the four multiscale segmentation parameters, i.e., scale, band weight, smoothness, and compactness, were set to 40, 1:1:1:4, 0.1, and 0.5, respectively, according to the visual display of the color image. The shadow distribution based on this segmentation is shown in Fig. 11 and the accuracy is listed in Table 3. The errors and omissions of shadows in Fig. 11 are consistent with those in Sec. 4.1. However, a comparison of Figs. 7, 8, and 11 shows that multiscale segmentation cannot extract patches of different scales at the same time because various ground features have different scales, which easily leads to the omission of shadows, particularly light shadows.
The average precision, recall, and F1-score of shadow detection based on multiscale segmentation are 78.3%, 67.0%, and 71.9%, respectively, with ranges of 55.5% to 87.6%, 61.1% to 71.3%, and 58.2% to 78.6%. Compared with the results in Table 2, the accuracy is lower and the spread is larger.
In high-spatial-resolution image applications, multiscale segmentation often achieves good results. However, for shadow detection in the 3.2-m resolution images used in this paper, its performance is poor. Its disadvantage is that it requires many manual adjustments, which must account for the scales of different regions in the image and therefore demand more experience in practical applications. In our method, closed and sharp edges determine the object boundaries, which preserves the integrity of ground objects after segmentation and yields better performance.

Conclusions
A method and workflow for building shadow detection in high-spatial-resolution satellite remote sensing images, based on multiple image features and edge-based image segmentation, were proposed in this paper. The method is highly automated, computes quickly, and isolates small shadows into independent objects better than multiscale segmentation does.
The six case images from the TripleSat-2 multispectral sensor showed that the average F1-score of shadow detection was 87%, indicating that the proposed method provides a useful reference for shadow detection in high-spatial-resolution remote sensing images. The extracted shadow can be used as a mask for invalid areas or for uncertainty analysis and can effectively support change detection, remote sensing classification, and other applications.
The adjustable parameter of the method is the α of SDSI, which needs to be tuned for each image; the empirical value is 0.5.
The applicability of the method and workflow to images from other sensors needs further validation, including how to better combine the advantages of edge-based segmentation with those of region-based segmentation to detect shadows with higher accuracy. This requires further research.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (Project No. 41471283). The authors declare that this research was conducted in the absence of any business or financial relationships that could be construed as a conflict of interest.