k-Means image segmentation using Mumford–Shah model

Abstract. k-Means (KM) is well known for its ease of implementation as a clustering technique. It has been applied for color quantization in RGB, YUV, hyperspectral image, Lab, and other spaces, but this leads to fragmented segments as the pixels are clustered only in the color space without considering connectivity. The problem has been attacked by adding connectivity constraints, or using joint color and spatial features (r, g, b, x, y), which prevent fragmented and nonconvex segments. However, it does not take into account the complexity of the shape itself. The Mumford–Shah model has been earlier used to overcome this problem but with slow and complex mathematical optimization algorithms. We integrate Mumford–Shah model directly into KM context and construct a fast and simple implementation of the algorithm. The proposed approach uses standard KM algorithm with distance function derived from Mumford–Shah model so that it optimizes both the content and the shape of the segments jointly. We demonstrate by experiments that the proposed algorithm provides better results than comparative methods when compared using various error evaluation criteria. The algorithm is applied on 100 images in the Weizmann dataset and two remote sensing images.


Introduction
Image segmentation is an essential preprocessing task in many computer vision problems. It handles the problem of segregating regions of interest of an image that possess homogeneity and are spatially connected. These segregated regions may represent objects and thus help to recognize parts of satellite images, or patterns in biometric images, or detect diseased areas in MRI/x-ray images. Segmentation is also extensively used in automation and other artificial intelligence applications. Hence, profound research is still carried out and numerous image segmentation techniques are available in the literature, but a universal method applied to any type of image related to any image processing problem is not known.
In terms of approximation theory, we can represent it as: find an approximation f of g in such a way that each f i represents the region Ω i via a piece-wise smooth function.
Thresholding, clustering, and classification methods, 1-3 region-based methods, 4 and edgebased active contour methods [5][6][7] are different approaches to image segmentation. 8 The segmentation problem has also been considered as an energy minimization problem. But to date, obtaining a generalized solution that works without a priori facts about the object elements, its characteristics, such as shape, color, texture, appearance of shadows, and overlapping of objects, is an open problem.

Classical k-Means Clustering for Image Segmentation
Clustering is defined as an unsupervised learning task, where one needs to identify a finite set of unknown categories called clusters. 9 Clustering can also be applied to image segmentation via grouping the pixels by minimizing intrasegment similarity and by maximizing intersegment similarity.
One of the categories of clustering is hard clustering, where a pixel belongs to one and only one cluster. It is applicable to data sets that have sharp boundaries between clusters. 10 This is suitable also for image segmentation where the goal is to group the pixels into nonoverlapping regions.
The most popular algorithm in hard clustering is classical k-means (KM). 9 It is simple to implement, and its computational cost is low, which makes it suitable for large data sets such as images. Panwar et al. 11 observed that performance of KM segmentation improves with increase in the number of clusters when mean square error and peak signal to noise ratio are used to measure the quality of the reconstruction.
Applying KM based on the pixel values, it corresponds to color quantization. However, this is rarely enough for image segmentation because there is no control of the shapes and convexity of the segments, which can therefore become very fragmented.
KM algorithm has also other drawbacks -such as the number of clusters k must be known a priori. 12 The algorithm is also sensitive to the initialization, which can lead to an inferior clustering result. However, there are plenty of solutions in literature to attack these problems. First, if the number of clusters needs to be solved, we need to revise the original minimization function so that the number of clusters is no longer a parameter but a variable that is also solved. 13 The quality of clustering can be significantly improved using better initialization and by repeating the algorithm 100 times. 14 If not enough, a simple modification is to use random swap algorithm instead, 15 which practically never gets stuck into an inferior local minimum.
To use KM for image segmentation, the procedure is as follows: First, randomly select k pixels from the image as the initial cluster centroids. Second, segment the image by labeling every pixel by finding the minimum-squared Euclidean distance to every centroid. Third, calculate new centroids by averaging the pixels belonging to the same clusters. The steps two and three are repeated until the centroids stabilize. This procedure aims at optimizing sum-of-squared error (SSE) between the pixels and their nearest centroid E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 1 . 2 ; 1 1 6 ; 1 8 8 However, using KM as such leads to color quantization where every pixel is mapped to the nearest color regardless of its location in the image and the mapping of its neighbors. In image segmentation, it is desired that the pixels close to each other in the image would be assigned into the same segment; and vice versa, pixels far away would be assigned to different segments.
The algorithm is shown as follows. It will find local minimum for the SSE function.
Many solutions also implement modifications to the KM algorithm. Theiler and Gisler 16 proposed a variant called contig-KM. It calculates within-cluster variance (V) to measure compactness of the cluster, and discontinuity (D) at a single pixel as the fraction of its neighbors that are in a different cluster. Contig-KM minimizes a linear combination of compactness and discontinuity as λD þ ð1 − λÞV where the parameter λ defines the relative importance of these two properties. They state that spatial contiguity and spectral compactness properties are nearly orthogonal, i.e., meaning that we can make considerable improvements in the one with only minimal loss in the other. Unlike conventional KM, Contig-KM updates the centroids immediately after a pixel changes its segment. Its time complexity is the same as that of KM: OðNkdIÞ, where d is the number of attributes (d ¼ 3 with RGB images, d ¼ 103 to 204 with the remote sensing images), and I is the number of iterations.
Fuzzy c-means (FCM) has also been applied to hyperspectral image (HSI) segmentation using normalized spatial distance in addition to the normalized color distance. 17 The difference of FCM to KM is that the pixels can belong to multiple segments with a membership value in range [0,1]. It is expected to provide smoother clustering result at the cost of higher time complexity of OðNkd 2 IÞ. However, the results reported in Ref. 17 show that the result of standard FCM (0.69) is worse than that of KM (0.72) and an additional processing step called Albedo recovery was applied before clustering to reach better accuracy (0.81).
Lindsten et al. 18 proposed relaxation of KM method formulated as a convex optimization problem which does not require the number of clusters to be specified beforehand. Instead, a regularization parameter is used to control the trade-off between model fit and the number of clusters. They show that due to relaxation and convexification, the original NP-hard KM problem becomes efficiently solvable but this means time complexity of Oðn 3 Þ using semidefinite programming, which is quite high compared with KM.
A two-step KM-based method was developed by Mignotte. 19 The steps include detexturing the input image and then using spatially constrained KM segmentation. The spatial constraint is a set of color values around the pixel. The time complexity is the same as that of KM added by the pi ← arg min 1≤j≤k ðSSE j Þ //Calculate optimal centroids FOR all j ∈ ½1; k DO detexturing process and the analysis of the spatial constraint. The method was reported to perform competitively with the state-of-the-art methods at the time.
A lot of research has been done so far; however, there are many issues that still need to be addressed. Connectivity constraint can still generate complex shape. Spatial constraint tends to keep the segments small but it can prevent to find segments of variable sizes. Many solutions implement modifications to the standard KM algorithm, which itself should not be necessary if a good algorithm and proper minimization function is used. In this paper, we attack the problem by employing minimization function, which strictly controls the complexity of the shape via the Mumford-Shah functional.

Mumford-Shah Model
Mumford-Shah model is widely used and extensively studied energy minimization model for image segmentation. It was introduced by Mumford and Shah in 1989 20 with the aim at segmenting the image into nearly homogeneous regions, separated by consistent boundaries. It consists of finding the minimum of a functional, which is formed by combining the length of the boundaries, the gradient of a smoothed version of the image and the difference of the true and smoothed images. 21 The Mumford-Shah functional is defined as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 6 ; 5 1 9 (1) where function f approximates image g which is differentiable over the regions Ω i E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 1 . 3 ; 1 1 6 ; 4 6 3 Here, τ is the set of arcs forming the boundaries, μ and λ are the positive weights that control the quality of the approximation and the coarseness of the segmentation, respectively, and LðτÞ denotes the total length of all the arcs belonging to τ. A large value of λ will lead to fewer boundaries. The goal is to minimize E, i.e., find f and τ so that E is minimized; the smaller the value of E, the better is the segmentation. The first term in Eq. (1) ensures that f approximates the image g, i.e., gives us homogeneous regions. The second term ensures that only smooth changes appear inside the segments. The third term prevents boundaries from growing very large.
If the segments are homogeneous, the second term is no longer needed in Eq. (1). We then obtain a simplified, piecewise constant Mumford-Shah model E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 6 ; 3 0 3 There has been extensive use of these models in the areas of image denoising, image segmentation, and image restoration. 22 Algorithmic techniques to minimize the Mumford-Shah functional includes the use of Douglas-Rachford algorithm (MS-DR) in Ref. 23, and several implicit method used by researchers in Refs. 24-29. However, the existing approaches are rather slow, the algorithms can be complex, and they may result in poor segmentation result if the parameters are not tuned properly.
In this paper, we show how simple KM algorithm can be used to minimize Mumford-Shah functional. It has all the same benefits as other Mumford-Shah-based methods but with faster and simpler optimization algorithm. Besides the simplicity and speed, the other benefits are that it avoids complex shapes, does not produce numerous tiny segments, and no algorithmic patches are needed as the standard KM algorithm itself can be applied by simple modifications of the partition step adopted with the shape constraint.
The method can be applied to any system where low level color-based segmentation is needed as the first step of the process. These include scene classification, detection of human activity in visual surveillance, assigning a name to a human face in image, classifying handwritten characters, and identifying land cover in satellite images. In this paper, we demonstrate how the method applies to remote sensing images.
The rest of the paper is organized as follows. Section 2 introduces the proposed KM segmentation using Mumford-Shah model. Section 3 explains the evaluation criteria. In Sec. 4, we compare segmented images achieved through the proposed KM using Mumford-Shah model with the ground truth images using criteria discussed in Sec. 3. In Sec. 5, we use the compactness measure to evaluate multiphase segmentation of three methods: proposed KM using Mumford-Shah model, KM, and regularized KM (reg-KM). Section 6 shows the application of the proposed approach to HSIs. Conclusions are drawn in Sec. 7.

k-Means Using Mumford-Shah Model
The goal is to divide the image into m segments by optimizing certain criteria. Image segmentation consists of two contradicting criteria: (1) similarity criterion such as variance, which measures goodness of the segment and (2) spatial connectivity criterion that penalizes pixels belonging to different segment than their neighbors. KM includes only the similarity criterion, the intracluster variance.
A reg-KM using Mumford-Shah was proposed by Ref. 30 combining the intensity and the spatial information together. The model calculates reg-KM energy using two terms. First term introduces the notion of size of the cluster in the regularization term using the scale. Scale is obtained as PðAÞ∕jAj where PðAÞ denotes the perimeter and jAj denotes the area of the cluster. Second term measures the intracluster dissimilarity of the clusters in the same way as in the KM. Their results show better classification of objects when using the combined approach compared with using only the intensity values. However, the regularization term is optimized when, for a given perimeter, the area is maximized. This happens for round shapes whereas we should not direct the optimization toward any given shape. The time complexity of reg-KM is OðNk 2 dÞ.
We next present a new variant called Mumford-Shah KM (MS-KM). For spatial criterion, we introduce a new term adopted from the simplified Mumford-Shah functional in Eq. (2). We use the total boundary length of the segments. It drives the segmentation toward simplicity but not directly toward any certain shape. We next outline how this can be done within the KM.
When generating the optimal partition in KM, the algorithm decides into which segment a pixel will be assigned. Using Mumford-Shah function, the optimal assignment depends not only on the pixel value as in standard KM, but also on its effect on the boundary length. Fortunately, this can be calculated from image gradients easily. The result is a label for every pixel in the image to indicate which segment it is assigned to.
The segmentation process goes as follows: First, we select m random centroids and initialize the gradient at each pixel of image as 0. We obtain the label image P [see Fig. 1(a)] using Mumford-Shah distance (MSD) defined in our algorithm (see Sec. 2.2). At the first iteration the boundary length of each pixel is set to 0. Next, we calculate new centroids of each segment by averaging the pixels belonging to the same segment. The assignment and centroid steps are then repeated until maximum iterations have been reached, or until the Mumford-Shah functional stop improving (ΔE ≤ 0). The Mumford-Shah functional E is calculated at the end of each iteration as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 2 ; 1 1 6 ; where sse i is the total squared error between every pixel to its cluster centroid, and BL is the total boundary length of all the segments.
This functional is an extension of the standard SSE criterion. Similar λ-based approach is widely used in other multiobjective optimization problems. It jointly optimizes both the content and the border of the segments, and therefore, it is expected to provide more robust segmentation than simpler heuristics such as boundary smoothing of the connected components. For more details about the theoretical properties of Mumford-Shah functional, we refer to Refs. 21 and 23, and for the properties of KM algorithm, we refer to Ref. 14.

Calculating Gradient and Boundary Length of the Segment
Gradient of the image is directional change in intensity or color in the image. Mathematically, it can be defined as the derivative between the pixel value and its surrounding. In image processing, it can be obtained by calculating the change in the horizontal and in the vertical direction separately. Here, instead of intensity or color, we calculate gradient from the labeled image [see Fig. 1(a)]. We use zero padding on the boundaries. Also, the gradient is calculated in backward direction. Hence, the right and bottom borders of the image are not marked as boundary. Gradient of the pixel at point ðx; yÞ is calculated as In Fig. 1, if we trace the boundary marked with red, we will find borders of length 21 pixels, in total. The same result can be obtained by summing up the gradient values using Eq. (4).

Distance Calculation in MS-KM
From the second iteration onward, when deciding the cluster label to be assigned to a pixel, we need to find out its effect on the boundary length (here gradient) when assigned to a particular E iter ← P 1≤i≤N fSSE j g þ lambda Ã blen //Compute functional E // Calculate optimal centroids FOR all j ∈ ½1; k DO c j ← Average of x i , whose p i ¼ j; iter ← iter þ 1; Shah, Patel, and Fränti: k-Means image segmentation using Mumford-Shah model cluster. The boundary length of the pixel will be obtained as gradient using Eq. (4). It is further illustrated in Fig. 2.
The decision about which segment the pixel is assigned to, is calculated by MSD, which is calculated as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 2 . 2 ; 1 1 6 ; 6 8 7 MumfordShahDistðx; y; jÞ ← ðgðx; yÞ − c j Þ 2 þ λÃgp j ðx; yÞ; where λ is the weighting factor controlling the coarseness of the segmentation. In other words, the distance is the squared Euclidean distance between the pixel and the centroid of the clusters plus its new gradient value within that segment (weighted by λ). Thus, the distance incorporates two different criteria-similarity and spatial convexity. The algorithm for MS-KM is given in Algorithm 2. Its main structure is the same as KM containing both the optimal partitioning and calculating the new centroids.
Even though the λ·gp part is fixed during a single iteration, any change of the partition stage will influence changes in the gp part of the equation. This will eventually create different segmentation than KM. We further illustrate this with an example image of size 10 × 10 in Fig. 3. Figure 4 shows the effect of the MSD on the small red object of the sample image.

Experimental Evaluations
To evaluate the performance of our algorithm in case of two-phase segmentation, we use regionbased error metrics defined in Ref. 31 and structural similarity index. 32 For multiphase segmentation, we have used compactness measure. 33

Bidirectional Consistency Error
Considering S 1 as segmentation obtained by an algorithm, and S 2 as one of the ground truth, we calculate E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 6 ; 6 2 0 a value in the range ½0.. × 1, where zero represents no error. Here, RðS; p i Þ is the set of pixels corresponding to the region in segmentation S that contains pixel p i . Here, local error measure is not symmetric, and it gives refinement measure in one direction only. It is to be noted that EðS 1 ; S 2 ; p i Þ ¼ 0 if S 1 is a refinement of S 2 at pixel p i , but not the opposite. Using Eq. (5), the following four evaluation measures can be calculated: Equations (6)-(8) represent the global consistency error (GCE), local consistency error (LCE), and bidirectional consistency error (BCE), respectively. Equation (9) is the measure ðBCE Ã Þ obtained by computing the minimum error at every pixel, which is consistent with each ground truth segmentation S k of that image. We will use BCE*.

Structural Similarity Index
Structural similarity index (SSIM) 32 measures the visual quality between two images. It is calculated as a weighted combination of three factors luminance, contrast, and structural similarity. It can be expressed as where l is the luminance, c is the contrast, s is the structural similarity, and α, β, and γ are the positive constants. The three factors are calculated separately as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 3 . 2 ; 1 1 6 ; 1 7 9 E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 3 . 2 ; 1 1 6 ; 1 2 2 E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 3 . 2 ; 1 1 6 ; 8 6 where μ x and μ y are the local means, σ x and σ y are the standard deviations, and σ xy is the crosscovariance between the two images x and y, respectively. If α ¼ β ¼ γ ¼ 1, then SSIM simplifies to E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 3 . 2 ; 1 1 6 ; 6 9 9 We use SSIM to calculate the similarity between the image generated by the segmentation result, and the image generated from the ground truth segmentation.

Compactness Measure
Assessment of segmentation quality has been analyzed using combination of shape and compactness. 33 In Ref. 34, the authors used the idea of compactness in the area of measuring super pixel compactness. Several approaches have been proposed for measuring shape compactness of discrete regions. In Ref. 35, a review of several methods for measuring shape circularity and compactness was done. They also survey some of the methods when scaling transformation is applied to discrete regions. Compactness is defined as the ratio of the area of an object to the area of a circle with the same perimeter E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 3 . 3 ; 1 1 6 ; 4 9 5 Compactness ¼ Perimeter 2 4 · π · area : The measure takes a minimum value of 1 for a circle. Objects that have complicated, irregular boundaries have larger compactness.

Adjusted Rand Index
The Rand index (RI) computes a similarity measure between two segmentation results. All pairs of pixels are considered and pairs that are assigned in the same or different segments in the obtained segmentation and its ground truth are counted. The RI score is then "adjusted for chance" to get adjusted RI using the following formula: The adjusted RI is thus ensured to have a value close to 0.0 for random labeling independently of the number of segments and pixels and exactly 1.0 when the segmentations are identical.

Results for Two Phase Segmentations
We used nine random images from the Weizmann dataset 36 depicting one object in the foreground and their three ground truth segmentations. Figure 5 shows the original image and one of the three ground truths. The resulting segmentations obtained by KM, reg-KM, MS-DR, and the proposed MS-KM are shown in Fig. 6. The experiments were conducted on a desktop with Intel Core i5 processor with 2.50 GHz speed, 8 GB RAM, 64-bit Windows 10 operating system. The results of KM, MS-DR, and MS-KM depend on the choice of the initial centroids. We therefore executed KM 20 times, calculated the BCE* error for every result, and then selected the one with lowest BCE* value to obtain the best KM result. This was the used as the input to MS-DR and the proposed MS-KM. The number of clusters was selected to match the ground truth. The reg-KM method uses the change in the energy directly, that each pixel is moved to the phase which locally minimizes the energy compared with the previous phase. The parameter used for regularization gives an option of creating a new cluster. If the value of the parameter is large, it increases the number of clusters. Here in Fig. 6, we show the results for only two segments generated by reg-KM, by selecting the value of the parameter between 0.1 and 0.5. Visually, the results show that the proposed method gives improved results with inclusion of scattered minute segments into bigger segments. Also, the proposed method gives smoother boundaries, which can be seen in partial detailed images of Fig. 7.
Results of the BCE* error measure are given in Table 1. We can see that the proposed method reduces the error of KM and reg-KM in eight out of the nine cases. The higher values of SSIM in Table 2 indicate better quality segmentation using the proposed approach. Figure 8 shows the effect of different values of λ on the segmentation. The larger is the λ value, the smoother are the segments. Eventually, too many details will be lost.
We tested statistical significance of the results for the nine images in Table 2 by running KM and MS-KM 10 times with different randomly chosen centroids as the initial solution. We used one-tailed Wilcoxon signed-rank test for paired samples as normality of our distribution is not satisfied. The observed value test-statistic W ¼ 1 and the critical value at N ¼ 9 (p < 0.05) is of W ¼ 8, which shows that the difference between KM (0.69) and MS-KM (0.79) is statistically significant. However, the difference between MS-DR (0.78) and MS-KM (0.79) is not significant.
The processing times are compared in Table 3. The proposed method is almost as fast as KM (25% slower), and significantly than MS-DR (10 times faster) and reg-KM (600 times faster). All KM variants share the same time complexity of OðNkdIÞ except reg-KM, which uses slow OðNk 2 dÞ initialization. MS-DR also uses KM as the initialization but the following proximal splitting step by Douglas-Rachford has a higher number of iterations and more operations in each step, which together make the algorithm slower than the faster KM variants. Figure 9 shows the effect of λ on the results. As the images have nothing in common, we cannot predict much about the value of λ accept that the value 0.5 gives us the average result in all the cases. Figure 10 shows the compactness measure when KM and MS-KM methods are applied to 100 images of the Weizmann dataset. Resolution of the images in Weizmann dataset vary from 300 × 170 (lowest) to 300 × 400 pixels (highest). For the proposed MS-KM method, we have taken λ ¼ 0.5 which gives average results. The results can be even better if appropriate λ is taken for each image. These results are also statistically significant according to ANOVA test (p < 0.00001).

Results for Multiphase Segmentation
We have used three selected images: horse, cows, and flowers. Figure 11 shows the resulting segmentations obtained by KM, the proposed MS-KM, and reg-KM as per. 30 Visually, the results show that the proposed method gives less fragmented segments with smooth and regular boundaries. We can observe from Figs. 12-14 that the compactness value of all clusters by the case of proposed MS-KM method is significantly smaller than that of the other techniques. We also observe that the results of KM and reg-KM techniques are almost similar, except Fig. 13 where reg-KM works better.
We also note that all KM variants tend to produce fragmented results despite the spatial constraint. The reason is not the spatial constraint being too weak, but actually the opposite, being too strong. Figure 15 demonstrates the issue with MS-KM using larger 4-pixel neighborhood gradient instead of the proposed 2-pixel neighborhood. It is theoretically more solid but     will lead to fragmentation in practice. The reason is the KM initialization, which contains many isolated subsegments, and KM with too strong spatial constraint is unable to erode these away as well as the simpler 2-pixel gradient.

Application to HSIs
In this section, we apply the proposed MS-KM method to HSIs. Unlike RGB color image, which stores the intensity value of three bands only (red, green, and blue), HSIs can have intensity  Shah, Patel, and Fränti: k-Means image segmentation using Mumford-Shah model information of more than 100 spectral bands, which are in discrete intervals of 5 to 10 nm across the visible to the infrared region. The spectral resolution is very high due to narrow band gap, which leads to high dimensionality. However, the results in Ref. 15 have shown that the performance of KM depends mostly on the overlap of the clusters in the feature space and is less dependent of the dimensionality. This is also the case with the HSI images, which makes KM as a suitable choice here. We have used two images. First image is AVIRIS-Salinas HSI captured by 224-band AVIRIS sensor in 1998 over agricultural fields in the Salinas Valley, California. 37 The number of spectral bands used is 204 (discarding 20 water absorption bands) and the total number of pixels is 512 × 217. Its ground truth with 16 classes is shown in Fig. 16. There are also unclassified data in the ground truth, which are shown by white color.  The second image is Pavia University HSI acquired by the ROSIS sensor over Pavia, northern Italy. 37 The number of spectral bands is 103, and the number of pixels is 610 × 610, but some of the samples contain no information and have been discarded before the analysis. Hence, the actual resolution is 340 × 610 pixels. The ground truth has nine classes. In this image, there are also unclassified data in the ground truth, which are shown by white color in Fig. 16.
As a preprocessing step, we normalize bands using the following formula: E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 6 ; 1 1 6 ; 6 6 3 norm_val ¼ ðval − band_ minÞ∕ðband_ max −band_ minÞ: The number of clusters is set to 16 for the AVIRIS-Salinas image and to 9 for the Pavia University image according to the number of classes in their ground truth data. The value of λ is set to 0.5. The results of KM method and the proposed MS-KM are shown in Fig. 16. The implementation of the reg-KM did not support HSIs. As expected, the segments generated by the proposed method are much less fragmented than that of generated by KM. More detailed analysis will follow. Table 4 shows the BCE* error, adjusted Rand index (ARI), and SSIM for the KM and the proposed method. In case of Pavia University image, the BCE* values are 0.27 by KM and 0.18 by MS-KM. Smaller BCE* values indicate better accuracy. The ARI values are very close to 1.0 by both methods. While the proposed method improves based on BCE*, the segmentation is worse when measured by SSIM (0.94 versus 0.87). While MS-KM provides more convex and visually pleasant segments, the color difference of those fragmented pixels is so high that they cause measurable degradation in SSIM.
In case of AVIRIS-Salinas image, both BCE* values are close to 0.50 (0.47 and 0.46). While RI values are still high (0.92 and 0.93), the results indicate that, regardless of the algorithm, segmenting the image by spectral features alone is not sufficient even when using >100 bands. This is evidenced by the fact that even the less fragmented segmentation by the proposed method did not provide significant improvement according to any of the used measure.
The authors of Ref. 38 have concluded that their nearest-neighbor density-based (NN-DB) methods perform better than density-based spatial clustering of applications with noise (DBSCAN), fast sparse affinity propagation (FSAP), and fuzzy C-means (FCM) regardless the chosen number of clusters (obtained or initialized). For AVIRIS-Salinas image, the highest ARI for all the methods mentioned in Ref. 38 is ∼0.7. They reported that classes A and B cannot be clearly identified by any of the methods in their tests, and by any methods in papers published earlier. We have observed the same phenomenon with our tests.
The quality assessment of the segmentation results of HSIs requires precise and unbiased ground truth. 39 When analyzed, the spectral signatures of the ground truth (GT) classes reveal that several classes exhibit high heterogeneity. Especially for Pavia University HSI, it is observed by the authors that the GT classes are not perfectly homogeneous and do not correctly represent the diversity and variability of the land cover.

Conclusions
The approach is introduced by embedding the well-known Mumford-Shah model into KM. The key contribution is to combine KM with Mumford-Shah model. Its main benefits are: (1) better segmentation performance than other models used with KM and (2) significantly faster optimization than that of the existing optimization methods for Mumford-Shah model. The results of the proposed approach are better than KM and reg-KM techniques in terms of shorter boundary lengths, and 25% better SSIM-value than Reg-KM (0. 24  The proposed method has two limitations. First, algorithm has the weakness of KM of being sensitive to the initialization which can lead to suboptimal clustering results and also fragmented segmentation. These could be overcome by better initialization and repeats, 14 or by adding random swap step around the KM as in Ref. 40. The second limitation is that it was tested only with color features (RBG and HSI), which are not good enough in all applications. For example, the HSI examples showed that some segments cannot be modeled by the color spectrum. This could be overcome by applying detexturing method as a preprocessing 19 or extending the method to be used with texture or some application-specific features. These are points for future research.