Superpixel generation for synthetic aperture radar imagery using edge-dominated local clustering

Abstract. Recently, superpixel-based methods have shown promising performance for synthetic aperture radar (SAR) image interpretation. Among these methods, statistical model-based local iterative clustering is the mainstream approach to superpixel generation for SAR images. However, errors in the model parameter estimation degrade the accuracy of the model-based distance measure between a pixel and a cluster, which directly affects the quality of the superpixel segmentation. Further, the relative weight between statistical similarity and spatial proximity must be carefully selected to balance the boundary adherence and regularity of superpixels. An edge-dominated local clustering method is proposed to overcome these limitations. Edge information is introduced not only to define the dissimilarity between a pixel and a cluster but also to provide an adaptive grid with multiple layers for the initialization of cluster centers. Experiments on simulated and real datasets show that, compared with previous algorithms using statistical model-based dissimilarity, the proposed method produces superpixels with better edge adherence and more stable performance.


Introduction
Due to the active coherent microwave imaging mechanism, synthetic aperture radar (SAR) provides high-resolution images independent of daylight, cloud coverage, and weather conditions. 1 Nowadays, SAR images have become a regular and powerful information source for many applications, including environmental monitoring and terrain classification. However, the interpretation of SAR images is still a challenging task because of their special imaging mechanism. In recent years, superpixel-based methods have attracted increasing attention for SAR image understanding. The basic concept of the superpixel was first presented by Ren and Malik 2 as locally coherent regions produced by an oversegmentation algorithm. As superpixels group pixels with similar characteristics into meaningful atomic regions, they can effectively capture image features and adhere well to object boundaries. Therefore, superpixels can achieve a better perceptual representation of images than pixels and reduce the complexity of subsequent image processing tasks, such as segmentation, classification, and object detection. Until now, most of the superpixel generation methods for SAR images with promising performance have been tailored from those proposed in the computer vision community, such as normalized cut, 3 turbopixels, 4 and simple linear iterative clustering (SLIC). 5 Normalized cut is the most classical algorithm; it treats image segmentation as a graph partitioning problem and globally minimizes the segmentation cost. However, its high computational complexity has limited its wide applicability. Turbopixels is an effective method for generating superpixels, and it has been applied to SAR image analysis in many studies. 6,7 It gradually dilates regularly distributed seeds using geometric flows and imposes strong constraints on the uniformity and compactness of superpixels.
Meanwhile, due to the stability and efficiency issues of the underlying level-set method, the generated superpixels present relatively low adherence to boundaries, 8 and computational results show that it runs more slowly on real-world datasets than the other O(N) superpixel algorithms. 5,9 In contrast, SLIC 5 has been widely used for SAR images because of its simple concept, easy implementation, and high efficiency in practice. SLIC assigns each pixel to the cluster of the nearest seed and iteratively updates the cluster center by computing a pixel-to-cluster distance measure. However, in the original SLIC, this measure is the five-dimensional (5-D) Euclidean distance in labxy space, 5 which cannot be applied directly to SAR images due to the multiplicative speckle noise. Thus, some alternative distance measures have been proposed in the last few years. For instance, Xiang et al. 10 used a distance based on pixel intensity and location similarity for SAR images that is derived from the Nakagami-Rayleigh distribution and the pixel intensity ratio. Zou et al. 11 combined the generalized gamma distribution-based likelihood value with spatial distance to represent the pixel-to-cluster similarity. Yu et al. 12 proposed a distance between two patches based on the likelihood ratio test statistic following the exponential distribution and used it to measure the intensity dissimilarity of a pixel and a cluster center. For polarimetric SAR (PolSAR) images, Feng et al. 13 directly used a complex Wishart distribution-based distance as a substitute for the feature-based distance in SLIC to generate superpixels. Song et al. 14 defined a dissimilarity using the Bartlett distance, which is derived from hypothesis tests on the Wishart distribution. Qin et al. 15 improved the cluster center initialization and used the revised Wishart distance for local clustering. Xiang et al. 16 defined a similarity measure that contains multiple cues, including polarimetric, texture, and spatial information.
In summary, to relieve the speckle noise effect and make the SLIC method applicable to SAR/PolSAR images, most of the existing research follows two ideas: (1) replacing the color-based distance with statistical model-based ones and making improvements and (2) combining statistical models with other features to construct a compound distance, such as $D_T = D_1 + D_2 + \cdots + D_n$. However, each of these ideas has a problem. First, to calculate the aforementioned pixel-to-cluster distance measures, the parameters of the statistical models should be estimated accurately in each cluster. However, the initial clusters are sampled on a regular grid and continually change during the local iterative clustering, which means that the assumption of independent and identically distributed (i.i.d.) samples within a cluster is usually violated, especially in heterogeneous areas. In this situation, the estimated parameters are biased, so the accuracy of the distance measures is degraded and the performance of superpixel generation is affected. Second, combining statistical models with other features can partly improve the accuracy of the pixel-to-cluster distance measure, but directly adding different distances still lacks theoretical support. If the ranges and value distributions of the individual distances differ markedly, the sum of multiple distances derived from different features can be unreliable.
In this paper, we explore the issue of distance measure from another point of view. As shown by Leung and Malik, 17 edge information can be directly used to define the dissimilarity between pairwise pixels in natural images. Additionally, in SAR images, edges are not simple sharp changes in image brightness; they reflect changes in the statistical properties of the areas in the image. In other words, edge information can be considered an abstraction of the underlying statistical characteristics and a bridge connecting statistics and superpixels. Thus, when an edge lies between two pixels, it provides a perceptually meaningful and stable representation of their dissimilarity. Liu et al. 18 computed the dissimilarity from edge information extracted by a classical region-based detector for SAR images, but that detector suffers from the scale dilemma and the orientation problem. 19 Thus, the locations of edge points are unreliable and the performance of superpixels is not satisfactory. To overcome this limitation, we adopt an up-to-date detector to extract the edge information more precisely and define an edge-dominant distance to replace the statistical model-based distance. Experimental results confirm that reliable superpixels can be obtained using only edge information and that the superpixels adhere well to the real edges.
Another problem with the model-based SLIC methods is that it is often difficult to select an appropriate relative weight between statistical similarity and spatial proximity. The weight is important for balancing the boundary adherence, compactness, and regularity of superpixels. 5 However, it is usually set manually to a constant value by trial and error, which might not be suitable for every iteration and is often so large that it leads to undersegmentation in some areas. To solve this problem, we build an initialization step for the cluster centers with an edge-adaptive grid (EG). This grid has multiple layers that are generated based on edge information and quadtree decomposition. Experiments show that it reduces the negative effect caused by a large value of the relative weight and makes the performance of superpixels less sensitive to changes of the weight.
The remainder of this paper is organized as follows. The proposed method is described in Sec. 2. The experiments and the performance evaluations are presented in Sec. 3. The conclusions are given in Sec. 4.

Edge Extraction
In this paper, the edge information is extracted using the degenerate filter with weighted maximum likelihood estimation (DG-WMLE) proposed in Ref. 19. The DG-WMLE method addresses the scale dilemma in edge extraction and provides better estimates of the edge strength and the location of edge points, which is essential for generating superpixels with good boundary adherence. The key design of the DG-WMLE method is a degenerate filter, as illustrated in Fig. 1. The edge strength of the center pixel is estimated by the dissimilarity between the two pixels adjacent to the center pixel, and the calculation of this dissimilarity requires the noise-free intensity of the two pixels. According to Refs. 20 and 21, the noise-free value can be estimated using the WMLE, which is

$$\hat{\mu}(x) = \frac{\sum_{x' \in R_{SW}(x)} \omega(x, x')\, I(x')}{\sum_{x' \in R_{SW}(x)} \omega(x, x')}, \tag{1}$$

where $I(x')$ is the intensity value of the noisy SAR image. The WMLE at $x$ uses all the values $I(x')$ in the search window $R_{SW}(x)$, and the design of the window is inherited from the classic region-based filter. The weight $\omega$ is derived from the probabilistic patch-based dissimilarity using an exponential kernel 20,21 and is calculated as follows:

$$\omega(x, x') = \exp\left[-\frac{D_{PPB}(P_x, P_{x'})}{h}\right], \tag{2}$$

where $D_{PPB}(P_x, P_{x'})$ denotes the patch-based dissimilarity measure of two patches $P_x$ and $P_{x'}$, with $x$ and $x'$ as the centers, respectively, and $h > 0$ is the kernel parameter. 19 Considering the design of the DG filter and the WMLE-based estimation method, if $x$ and $y$ are the two pixels adjacent to the center pixel $z$, the edge strength at $z$ for the current filter orientation $\theta_{df}$ is calculated with the use of the Bhattacharyya distance, 22-24 and the edge strength at the pixel $z$ is the maximum value among all the orientations, as shown in Eqs. (3) and (4):

[Eq. (3)]

$$E^*(z) = \max_{\theta_{df}} E(z \mid \theta_{df}), \tag{4}$$

where $E(z \mid \theta_{df})$ denotes the orientation-specific edge strength of Eq. (3). The related parameters are set as suggested in Ref. 19. The orientations of the filter are $\{0, \pi/4, \pi/2, 3\pi/4\}$. Detailed information about the DG-WMLE edge extractor can be found in Ref. 19.

Fig. 1 The degenerate filter design for edge extraction: $l_{df}$ and $w_{df}$ are the length and width of the search window, respectively, $d_{df}$ is the spacing between the two pixels for calculating the edge strength at the center pixel, and $\theta_{df}$ is the filter orientation. This figure is adopted from Ref. 19.
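As an illustration of the weighted estimate described above, the following Python sketch averages the noisy intensities in a search window using exponential-kernel weights. It is a simplified sketch and not the DG-WMLE implementation of Ref. 19: the square search window, the function names, and the pixel-wise squared log-ratio used in place of the patch-based dissimilarity are assumptions made for brevity.

```python
import numpy as np

def wmle_estimate(intensity, center, half_window, weight_fn):
    """Weighted average of the noisy intensities in a square search window.

    weight_fn(p, q) returns the non-negative weight between pixels p and q.
    """
    rows, cols = intensity.shape
    r0, c0 = center
    num, den = 0.0, 0.0
    for r in range(max(0, r0 - half_window), min(rows, r0 + half_window + 1)):
        for c in range(max(0, c0 - half_window), min(cols, c0 + half_window + 1)):
            w = weight_fn((r0, c0), (r, c))
            num += w * intensity[r, c]
            den += w
    return num / den

def log_ratio_weight(intensity, h):
    """Hypothetical stand-in for the patch-based weight: exp(-d / h), h > 0."""
    def weight_fn(p, q):
        # Assumption: squared log-ratio of two pixels replaces the patch dissimilarity.
        d = np.log(intensity[p] / intensity[q]) ** 2
        return float(np.exp(-d / h))
    return weight_fn

# Usage on a small four-look speckled patch with a true intensity of 100.
rng = np.random.default_rng(0)
noisy = 100.0 * rng.gamma(shape=4.0, scale=0.25, size=(9, 9))
estimate = wmle_estimate(noisy, (4, 4), 3, log_ratio_weight(noisy, h=0.5))
```

A smaller `h` concentrates the weight on pixels that resemble the center pixel, whereas a larger `h` approaches a plain box average.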

Edge-Dominated Local Clustering
The SLIC 5 is an effective and efficient method for superpixel generation. The basic idea of the SLIC is a local k-means clustering method, including three steps: (1) initialization of cluster centers by a regular grid (RG); (2) iterative local clustering based on a distance measure between a pixel and a cluster center; and (3) postprocessing to remove isolated pixels and enforce the connectivity of superpixels.
In general, the performance of the SLIC is greatly affected by the capability of the distance measure. In the original SLIC, this measure is defined as the 5-D Euclidean distance combining the color similarity and the spatial proximity. 5 Since this distance cannot be directly applied to SAR images with multiplicative speckle noise, several studies in recent years have derived suitable measures and introduced them into the SLIC, as discussed in Sec. 1. Motivated by the work of Leung and Malik 17 and Liu et al., 18 in this paper, we directly use the aforementioned DG-WMLE edge information to measure the pairwise dissimilarity of two arbitrary pixels. As shown in Fig. 2, the edge-based pairwise dissimilarity is perceptually meaningful, easy to understand, and can ensure good boundary adherence of superpixels. The dissimilarity of two pixels $x$ and $y$ is defined as follows:

$$d_{Edge}(x, y) = \max_{z \in l} E^*(z), \tag{5}$$

where $E^*(z)$ denotes the edge strength at the pixel $z$ and $l$ is the line connecting $x$ and $y$. Similar to Ref. 5, the distance measure for edge-dominated local clustering (EDLC) is defined as follows:

$$d_{ED} = \sqrt{d_{Edge}^2 + \left(\frac{d_{Sp}}{S}\right)^2 m^2}, \tag{6}$$

where the subscript ED stands for edge-dominated, $d_{Sp}$ is the spatial distance of the pairwise pixels, and $S$ is the grid interval. $m$ is a relative weight introduced to control the relative importance of the edge information against the spatial distance. As mentioned in Sec. 1, the value of $m$ should be carefully determined to balance the boundary adherence, compactness, and regularity of superpixels. A smaller $m$ emphasizes $d_{Edge}$ and makes the generated superpixels adhere better to the real boundaries, whereas a larger $m$ emphasizes $d_{Sp}$ and makes the superpixels more compact and regular. As shown in Figs. 3(b) and 3(c), an inappropriate choice of $m$ leads to an unsatisfactory segmentation result. More specifically, a large value of $m$ around the edges severely degrades the segmentation performance.
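The distance described above can be sketched in a few lines of Python. The sketch assumes that the edge-based dissimilarity is the maximum edge strength along the straight line between the two pixels (the intervening-contour form of Ref. 17) and that it is combined with the spatial term in the same way as in SLIC; the function names and the simple line sampling are illustrative choices, not the authors' implementation.

```python
import numpy as np

def line_pixels(p, q):
    """Integer pixel coordinates sampled along the segment from p to q (inclusive)."""
    (r0, c0), (r1, c1) = p, q
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = np.rint(np.linspace(r0, r1, n)).astype(int)
    cols = np.rint(np.linspace(c0, c1, n)).astype(int)
    return rows, cols

def edge_dissimilarity(edge_strength, p, q):
    """Maximum edge strength met on the line connecting p and q (assumed form)."""
    rows, cols = line_pixels(p, q)
    return float(edge_strength[rows, cols].max())

def edge_dominated_distance(edge_strength, p, q, S, m):
    """SLIC-style combination of edge dissimilarity and spatial distance."""
    d_edge = edge_dissimilarity(edge_strength, p, q)
    d_sp = np.hypot(p[0] - q[0], p[1] - q[1])
    return float(np.sqrt(d_edge ** 2 + (m * d_sp / S) ** 2))

# Usage: a vertical edge of strength 0.8 separates the two pixels.
edge_strength = np.zeros((5, 5))
edge_strength[:, 2] = 0.8
d_cross = edge_dominated_distance(edge_strength, (2, 0), (2, 4), S=4, m=1.0)
d_same = edge_dominated_distance(edge_strength, (2, 0), (2, 1), S=4, m=1.0)
```

In this toy case the pair that straddles the edge receives a much larger distance than the pair on the same side, and reducing `m` increases the relative influence of the edge term.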
Motivated by the idea of quadtree mesh generation, 25 we provide an initialization strategy with an EG instead of the RG to overcome this limitation. First, an RG is generated on the image according to the expected number of superpixels. Next, automatic thresholding 26 is applied to the output of the DG-WMLE to obtain an edge map. Then, if the number of edge points in any block of the RG exceeds a preset threshold, the block is recursively subdivided into four smaller equal-sized parts. In this way, a multilayer grid adaptive to the edge information is generated, as displayed in Fig. 3(d). Figure 3 shows that, under the same value of $m$ and a similar number of initial clusters, EG-based initialization has a larger grid interval $S$ than RG. In addition, more initial centers are generated close to the real edges, which substantially reduces the spatial distance $d_{Sp}$ between pixels and cluster centers around the edges. In both cases, according to Eq. (6), the importance of spatial proximity is weakened, i.e., the importance of edge information is emphasized. Thus, the boundary adherence of superpixels around the real edges can be improved significantly, as shown in Figs. 3(e) and 3(f).
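The grid construction described above can be sketched as a recursive quadtree subdivision. The following Python sketch assumes a binary edge map whose dimensions are multiples of the top-layer block size; the function names and the parameter `max_edge_points` (standing for the preset threshold) are hypothetical.

```python
import numpy as np

def edge_adaptive_grid(edge_map, block_size, n_layers, max_edge_points):
    """Return (row, col, size) blocks of a multilayer grid adapted to the edge map."""
    if block_size % (2 ** (n_layers - 1)) != 0:
        raise ValueError("block_size must be divisible by 2**(n_layers - 1)")
    rows, cols = edge_map.shape
    blocks = []

    def subdivide(r, c, size, layer):
        n_edge = int(edge_map[r:r + size, c:c + size].sum())
        if layer < n_layers and n_edge > max_edge_points:
            half = size // 2
            # Split the block into four equal-sized parts on the next layer.
            for dr in (0, half):
                for dc in (0, half):
                    subdivide(r + dr, c + dc, half, layer + 1)
        else:
            blocks.append((r, c, size))

    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            subdivide(r, c, block_size, 1)
    return blocks

def block_centers(blocks):
    """Initial cluster centers placed at the middle of each block."""
    return [(r + size // 2, c + size // 2) for r, c, size in blocks]

# Usage: only the top-left block contains enough edge points to be split.
edge_map = np.zeros((8, 8), dtype=bool)
edge_map[0:4, 1] = True
blocks = edge_adaptive_grid(edge_map, block_size=4, n_layers=2, max_edge_points=2)
centers = block_centers(blocks)
```

With `n_layers=1` no block is ever split, so the sketch falls back to the regular grid.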
In summary, the procedure of EDLC for superpixel generation is as follows:
(1) Parameter setting: Set the number of blocks $N_b$ in the top layer of EG, the number of layers $N_L$ for EG, the relative weight $m$, and the maximum number of iterations $N_{itr}$.
(2) Initialization of cluster centers: Generate an EG based on the edge map obtained from the DG-WMLE, and set the center of each block as an initial cluster center. To avoid placing centers on pixels with strong edge strengths, move every center to the position with the lowest edge strength in its 3 × 3 neighborhood.
(3) Local iterative clustering: For a cluster center $C$, compute the edge-dominated distance $d_{ED}$ between $C$ and each pixel $p$ in the $2S \times 2S$ region around $C$, according to Eq. (6). Here, $S$ is the grid interval of the top layer in EG. Then, assign $p$ to the cluster with the minimum $d_{ED}$, and save the cluster label for $p$. After all the cluster centers are processed, update the locations of the centers and calculate the residual error $E_r$ (the $L_1$ distance between the previous centers and the recomputed centers). Repeat the assignment and updating until the error $E_r$ converges or the number of iterations reaches $N_{itr}$. In our experiments, 20 iterations are found to be enough, and this number is used as the stopping criterion in all the tests.
(4) Postprocessing: Because connectivity is not enforced during clustering, the final clustering results may contain some broken superpixels. To correct this, find the regions smaller than 10 pixels, and reassign each pixel of these regions to a large neighboring superpixel with the minimum likelihood-based distance. After this processing, small isolated regions are removed, and the boundaries of most of the other superpixels remain the same.
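Two of the steps above, the shift of each initial center away from strong edges and the center update with its residual error, can be sketched as follows. The sketch assumes that a center is updated to the mean position of the pixels assigned to it, as in SLIC; the function names are hypothetical.

```python
import numpy as np

def shift_center_to_low_edge(edge_strength, center):
    """Move a center to the lowest edge strength in its 3 x 3 neighborhood."""
    rows, cols = edge_strength.shape
    r0, c0 = center
    best = (r0, c0)
    for r in range(max(0, r0 - 1), min(rows, r0 + 2)):
        for c in range(max(0, c0 - 1), min(cols, c0 + 2)):
            if edge_strength[r, c] < edge_strength[best]:
                best = (r, c)
    return best

def update_centers(labels, old_centers):
    """Recompute centers as mean pixel positions and return the L1 residual."""
    new_centers = old_centers.astype(float).copy()
    for k in range(len(old_centers)):
        rr, cc = np.nonzero(labels == k)
        if rr.size:  # keep the old position if the cluster is empty
            new_centers[k] = (rr.mean(), cc.mean())
    residual = float(np.abs(new_centers - old_centers).sum())
    return new_centers, residual

# Usage: the center sits on a strong edge and moves to the weakest neighbor.
edge_strength = np.full((3, 3), 0.5)
edge_strength[1, 1] = 0.9
edge_strength[0, 2] = 0.1
shifted = shift_center_to_low_edge(edge_strength, (1, 1))

labels = np.array([[0, 0], [1, 1]])
centers, residual = update_centers(labels, np.array([[0.0, 0.0], [1.0, 1.0]]))
```

The iteration would stop once `residual` no longer decreases noticeably or the maximum number of iterations is reached.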
An intuitive flowchart is shown in Fig. 4.

Datasets
In this section, we generate a simulated four-look SAR image based on the Monte Carlo procedure 27 to objectively evaluate the performance of the proposed method. The size of the image is 300 × 300. The image contains five different regions, and the intensity of each region follows the gamma distribution, as shown in Fig. 5(a). The noise-free intensity values of the five regions are set to 100, 400, 1600, 3600, and 8100, respectively. The corresponding ground truth of edges is given in Fig. 5(b). In addition, two TerraSAR-X StripMap images are used in our experiments, as shown in Figs. 6(a) and 6(c). The first one is extracted from Dessau, Germany, covering several crop areas. The second is from South Mississippi, USA, covering both water and vegetation areas. The size of each is 300 × 300. The pixel spacing is 3 m in both directions, and the number of looks is ∼6. The ground truth of edges from manual delineation is shown in Figs. 6(b) and 6(d).

Fig. 4 The flowchart of EDLC.
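A simulated image of this kind can be approximated with the common multiplicative speckle model, in which the noise-free intensity is multiplied by unit-mean gamma noise whose shape parameter equals the number of looks. The Python sketch below follows that model; it is not necessarily the Monte Carlo procedure of Ref. 27, and the layout of five vertical strips is a hypothetical stand-in for the region geometry of Fig. 5.

```python
import numpy as np

def simulate_multilook_intensity(mean_map, looks, rng):
    """Multiply noise-free intensities by unit-mean gamma speckle."""
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=mean_map.shape)
    return mean_map * speckle

rng = np.random.default_rng(0)
region_means = [100.0, 400.0, 1600.0, 3600.0, 8100.0]

# Hypothetical layout: five vertical strips, each 60 pixels wide.
mean_map = np.empty((300, 300))
for k, mu in enumerate(region_means):
    mean_map[:, 60 * k:60 * (k + 1)] = mu

image = simulate_multilook_intensity(mean_map, looks=4, rng=rng)
```

Under this model, each region has a mean close to its noise-free intensity and a coefficient of variation of about 1/sqrt(4) = 0.5.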

Performance Evaluation
To evaluate the performance of the proposed method quantitatively, two commonly used metrics 28 are applied in this section: boundary recall (BR) and under-segmentation error (USE). BR is defined as the fraction of the ground truth edges correctly recovered by the superpixel boundaries. In practice, BR measures the percentage of ground truth edges that fall within a tolerance distance $\varepsilon = 1$ of the superpixel boundaries. USE compares superpixel segment areas to measure to what extent superpixels cover the ground truth segment border. If $G_i$ is a ground truth segment, $S_k$ is a superpixel, and $|\cdot|$ indicates the size of a segment in pixels, USE is computed by

[Eq. (7)]

Next, we compare the superpixel generation results of the EDLC with those of three other methods, i.e., three different measures to represent the dissimilarity between a pixel and a cluster:
(1) The original SLIC using the grayscale-based dissimilarity: 5

[Eq. (8)]

where $A$ denotes the amplitude for SAR images.
(2) The likelihood-based SLIC (LB-SLIC) 11 with a likelihood value-based dissimilarity:

[Eq. (9)]

where $z_j$ denotes the intensity of a given pixel and $p(z \mid i)$ is the conditional PDF of the $i$th cluster $C_i$, which can be defined by the gamma distribution:

$$p(z \mid i) = \frac{L^L z^{L-1}}{\Gamma(L)\, \mu_i^L} \exp\left(-\frac{L z}{\mu_i}\right), \tag{10}$$

where $L$ is the number of looks and $\mu_i$ is the noise-free intensity value. For the $i$th cluster $C_i$, the MLE of $\mu_i$ is as follows:

$$\hat{\mu}_i = \frac{1}{|C_i|} \sum_{z_j \in C_i} z_j, \tag{11}$$

where $|C_i|$ is the number of pixels in $C_i$.
(3) The modified SLIC using a patch-based dissimilarity (PB-SLIC): 12

[Eq. (12)]

where $P_i$ and $P_j$ are two image patches with the center pixels $i$ and $j$, $\bar{I}_{P_i}$ denotes the average intensity in the patch $P_i$, and $M$ is the number of pixels in $P_i$ or $P_j$. According to Ref. 12, a 5 × 5 patch is found to be appropriate and is used in the following tests.
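The BR metric defined at the beginning of this section can be computed as in the following Python sketch. It assumes a boolean ground truth edge map and an integer superpixel label image, marks a pixel as a superpixel boundary when its right or lower neighbor has a different label, and measures the tolerance in the chessboard (square-neighborhood) sense; these conventions and the function names are assumptions, since Ref. 28 may define them differently.

```python
import numpy as np

def boundary_map(labels):
    """Mark pixels whose right or lower neighbor carries a different label."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b

def dilate(mask, eps):
    """Grow a boolean mask by eps pixels in every direction (square neighborhood)."""
    padded = np.pad(mask, eps, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    rows, cols = mask.shape
    for dr in range(2 * eps + 1):
        for dc in range(2 * eps + 1):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out

def boundary_recall(gt_edges, labels, eps=1):
    """Fraction of ground truth edge pixels within eps of a superpixel boundary."""
    near_boundary = dilate(boundary_map(labels), eps)
    n_gt = int(gt_edges.sum())
    return float((gt_edges & near_boundary).sum()) / n_gt if n_gt else 1.0

# Usage: two superpixels split between columns 2 and 3.
labels = np.zeros((4, 6), dtype=int)
labels[:, 3:] = 1
gt_near = np.zeros((4, 6), dtype=bool)
gt_near[:, 3] = True   # ground truth edge one pixel away from the marked boundary
gt_far = np.zeros((4, 6), dtype=bool)
gt_far[:, 5] = True    # ground truth edge far from any superpixel boundary
```

Here `boundary_recall(gt_near, labels)` rewards the nearby boundary, whereas `boundary_recall(gt_far, labels)` does not, which is the behavior the tolerance is meant to provide.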
To make a fair comparison, we first replace the $d_{Edge}$ in Eq. (6) with the above three dissimilarities. Then, we perform the same local clustering and postprocessing procedures to get the final results. To obtain superpixels with a good balance between boundary adherence and regularity, the values of the weight $m$ are set carefully for all three methods according to Refs. 5, 11, and 12. The number of layers $N_L$ for EDLC is set to 3. The number of blocks $N_b$ in the top layer of EG in EDLC is also set so that the number of generated superpixels is similar to that of the other three methods. The maximum number of iterations $N_{itr}$ is set to 20.
For the simulated data, the segmentation results of EDLC, LB-SLIC, PB-SLIC, and SLIC are shown in Fig. 7 from left to right. The expected number of superpixels $N_{sp}$ in LB-SLIC, PB-SLIC, and SLIC is set as {100, 200, 300, 400, 500}, increasing from top to bottom. Additionally, in each row of the figure, the number of generated superpixels in EDLC is close to that of the other three methods. To provide superpixels with better boundary adherence, $m$ is set as {0.5, 0.6, 1.0, 0.5} for the four methods, respectively. The numerical evaluation of the superpixels provided by these methods is shown in Fig. 8, using the aforementioned metrics BR and USE.
From Figs. 7 and 8, we notice that (1) The original SLIC has the worst performance among these four methods. SLIC has good boundary adherence only at the borders between two regions with a low degree of similarity, such as regions 1 and 4 and regions 1 and 5. Some irregular superpixels are produced, and their boundaries adhere poorly to the real edges. The results show that the grayscale distance is not well suited to superpixel generation on SAR images with speckle noise. (2) Although the BR values of LB-SLIC are close to those of EDLC, the regularity of superpixels in LB-SLIC is much worse, especially in regions 3 and 4 of the image. The reason is that the local clustering in LB-SLIC produces too many broken regions and orphaned pixels, so after merging in postprocessing, the nearby superpixels are likely to turn into irregular regions. Further, as shown in Fig. 7(b), some real boundaries (between regions 4 and 5 and between regions 3 and 4) are still not covered by the borders of superpixels in spite of the increase of $N_{sp}$. This indicates a limitation in the performance of LB-SLIC, which is discussed in Sec. 1. (3) The performance of PB-SLIC is worse than that of EDLC and LB-SLIC. According to Eq. (12), the average intensity of patches is used to calculate the dissimilarity of two central pixels. Thus, pixels near the edge between two regions with a low degree of similarity will nevertheless appear highly similar. As shown in Fig. 7(c), near the border between regions 1 and 4 and between regions 1 and 5, some superpixels span different regions at the same time. This overlapping clearly leads to poor adherence to the real boundaries and degrades the performance of PB-SLIC. (4) The proposed EDLC method yields a noticeable improvement in performance. The superpixels provided by EDLC obtain a higher value of BR and a lower USE than the other three methods. Although the value of $m$ is increased from 0.5 to 1.0 and the number of generated superpixels rises from 100 to 500, both changes have only a small impact on the performance of EDLC. As shown in Fig. 7(a), the compactness and regularity of superpixels are also ensured.
For the two real images, the segmentation results of the four methods are shown in Figs. 9 and 10 from left to right. The expected number of superpixels $N_{sp}$ in LB-SLIC, PB-SLIC, and SLIC is set as {200, 300, 400, 500, 600}, increasing from top to bottom. The number of generated superpixels in EDLC is close to that of the other three methods in the same rows. $m$ is set as {0.5, 0.6, 1.0, 0.3} for the four methods, respectively. The numerical evaluation of the superpixels provided by these methods is shown in Figs. 11 and 12. From these figures, the proposed EDLC still provides better results than the other three methods, considering both BR and USE. Although LB-SLIC or PB-SLIC with a low value of $m$ can obtain good boundary adherence, which is close to or even slightly better than that of EDLC, their USE performance is worse. There are also many irregular superpixels generated both near the real boundaries and inside the homogeneous areas. Furthermore, many broken regions are produced during the local clustering, so the number of superpixels in the final results is much larger than the preset value of $N_{sp}$. In general, the visual presentation of LB-SLIC and PB-SLIC is poorer because of these negative attributes.

Parameter Analysis
According to Sec. 2.2, two parameters need to be determined before EDLC: the weight $m$ and the number of layers $N_L$. As shown in Fig. 3, both $m$ and $N_L$ have a great influence on the superpixel segmentation results. To evaluate the impact of the two parameters, we set $N_L = \{1, 2, 3\}$ and $m = \{0.5, 0.8, 1.0\}$; then, we applied the EDLC to the simulated image. The performance under the different parameter settings is shown in Fig. 13. From the figures, it is noticed that, as the number of layers increases, the boundary adherence of EDLC improves remarkably. In addition, the BR and USE curves under different values of $m$ become much closer to each other. This indicates that, with EG initialization, the performance of EDLC is less sensitive to changes of $m$ than with RG. Thus, we used $N_L = 3$ in all the experiments and set $m$ in the range $[0.5, 1.0]$ for EDLC. If the proposed method is applied to a larger dataset, more layers are recommended. However, the blocks in the bottom layer of EG should not be smaller than 5 × 5.

In the figure legends, "EG3" denotes $N_L = 3$ and "EG2" denotes $N_L = 2$; when $N_L = 1$, EG is equivalent to RG. The numbers in brackets are the values of $m$.

Conclusions
In this paper, we propose an edge-dominated local clustering method to generate superpixels for SAR images. Edge information is introduced not only to define the dissimilarity between a pixel and a cluster but also to produce an adaptive grid for the initialization of cluster centers. Experiments on the simulated and real SAR images show that the proposed method provides improved boundary adherence and visual presentation compared with the other methods using statistical model-based dissimilarities. In the future, we will extend the edge-dominated dissimilarity to multitemporal data and provide a segmentation result suitable for all the temporal images. In this case, superpixels will become a basic element for multitemporal analysis.