Automated extraction of homogeneous regions by seeded region shrinkage

Abstract. Drawing a spectrally homogeneous region of interest in a remotely sensed image is a common task for an image analyst when performing, for instance, atmospheric correction or end-member selection. Manually selecting a homogeneous sample of pixels can be tedious and error prone due to the limits of human perception and data visualization. I present a region shrinkage method that automates the extraction of a spectrally homogeneous and spatially contiguous region from a user selected seed pixel. The proposed technique combines divisive clustering, connected component analysis, and image noise estimation to generate a series of candidate regions of increasingly smaller size until they converge to the seed pixel through similarity space. From these candidate regions, an optimal one is identified that is spectrally homogeneous, spatially contiguous, and as large as possible. Experimental results demonstrate that the proposed method achieved detection rates of up to 95%, false alarm rates below 1%, and was robust to the main user input, the seed pixel location.


Introduction
Seed region growing (SRG) techniques are simple, fast, and effective image segmentation algorithms. 1,2 These methods require the selection of seed pixels that satisfy some user criterion and then grow the seeds into segments by absorbing adjacent pixels until a statistical, visual, or physical criterion is met. The goal of these algorithms is frequently to segment the entire image 3 or to extract a domain-specific region of interest (ROI) such as vegetation, 4 cancer masses, 5 or road networks. 6 Another popular technique, superpixels, 7 over-segments an image into patches of similar pixels that help define visually meaningful boundaries. This method requires a priori knowledge, a database of exemplars, and a classifier trained using features based on the classical gestalt theory of grouping. However, these approaches introduce additional requirements that are not needed when the goal is the simpler problem of extracting spectrally homogeneous ROIs, which need not correspond to an object's visual boundaries.
Drawing accurate and homogeneous ROIs is a common yet often difficult task even for trained image analysts when exploiting remote sensing imagery. 8 For instance, drawing spectrally homogeneous ROI samples of several materials can be a first step in applying an in-scene atmospheric compensation algorithm such as the empirical line method. 9 However, several well-known problems arise when visually exploiting imagery, such as (1) contrast stretch selection, (2) the spatial scale of image statistics used to compute contrast histograms, and (3) the selection of which spectral bands to display. 10 Therefore, image exploitation algorithms should automate any steps that rely on human perception wherever possible. 11 There are surprisingly few tools that address this common problem of extracting a spectrally homogeneous, spatially contiguous region. For example, ENVI, a standard image analysis software package developed by L3Harris, has a seed growing tool. However, it requires (1) drawing a seed polygon or multiple points, (2) providing a threshold that depends on knowledge of the target statistics, and (3) using only a single band. Those inputs are either susceptible to the human perception issues described above or require a priori knowledge.
To overcome these common limitations, this work proposes a method that requires two inputs that are easier to provide: (1) a single seed point and (2) a target-agnostic noise level threshold that is shown to be robust over a reasonably wide range of values. The method combines divisive clustering, connected component analysis, and image noise estimation to gauge the homogeneity of a series of shrinking candidate ROIs and then selects an optimal ROI from among them. Region shrinkage itself is not new and has previously been proposed to solve problems such as image segmentation. 12 However, in the proposed approach, the region shrinkage is guided by the seed pixel and therefore converges to the seed through similarity space. Finally, since the goal is neither image nor object segmentation, the resulting ROI need not comprise the entire underlying physical object.
Figure 1 provides an overview of the problem. The user wants an ROI of a target (e.g., the reflectance panel) and places a seed at its center. The scene is clustered into a series of shrinking ROIs using the proposed method. Each ROI (e.g., #2) is contained by the previous ROI (e.g., #1). The spectral variance of the initial ROIs (#1 to #4) is due to both the target and the background. The next ROIs (#5 to #7) contain only target pixels. The final ROI (#8) is just the seed pixel, which has zero variance. Generally, but not always, the shrinking ROIs have a decreasing variance. ROI #4 has a spike because it contains the target and a neighboring panel but is no longer dominated by the background statistics. However, the target is not always as simple as a calibration panel for which ground truth is likely available. Therefore, the goal is to build a classifier that automates finding a threshold to discriminate between the background and target regions without requiring a priori knowledge of either.
This paper is organized as follows. Section 2 describes the proposed method. Section 3 describes the experimental setup used to validate the method. Section 4 presents the experimental results of applying the proposed method to multiband imagery. Section 5 provides the conclusions drawn from this work and recommendations for future research.

Proposed Method
This work proposes to reverse the SRG concept to extract a homogeneous ROI. The intuition behind the approach is that since SRG methods start from a seed and grow outward, they require assumptions of what constitutes closeness between pixels to decide which to absorb. However, by starting with the entire image and proceeding inward, closeness can be learned by how the pixels group together. There are three steps to the proposed method outlined below and detailed in the following sections.
1. Candidate ROI generation. A series of shrinking candidate ROIs is generated using divisive clustering and connected component analysis guided by the user-selected seed.
2. Noise level estimation. A patch-based approach estimates a signal-dependent noise model, which is used to assess the spectral homogeneity of each candidate ROI.
3. Optimal ROI selection. The candidate ROI that is spectrally homogeneous in all spectral bands, spatially contiguous, and as large as possible is labeled the optimal ROI.

Candidate Region of Interest Generation
Unsupervised divisive clustering is used to create a series of shrinking candidate ROIs. A Gaussian mixture model 13 (GMM) solved using expectation maximization 14 (EM) was selected due to its ability to allow fuzzy memberships instead of hard assignments (e.g., k-means) and not require performing full image segmentation or object extraction 15 (e.g., semantic labeling). GMM-EM uses all bands simultaneously during clustering. The candidate ROIs are generated as follows. A seed is selected by the user that is contained within the desired target. This seed remains fixed throughout the generation process. The first candidate ROI is the entire image. Then this ROI is divided into two clusters using the GMM-EM algorithm. The connected component within these two clusters that contains the seed pixel is the next candidate ROI. A connected component is defined as a set of pixels that have the same cluster labeling and each pixel is spatially adjacent to some other pixel with the same labeling. This next candidate is further subdivided into two clusters and the connected component with the seed is the next candidate. This process of bilevel clustering and connected component extraction is continued until the last candidate ROI is the seed pixel itself. Bilevel clustering is used because the goal of the proposed method is to identify increasingly homogeneous clusters that contain the seed and not to segment the entire scene which would likely require more than two clusters.
This clustering process yields a series of shrinking ROIs where each ROI is contained by the previous ROI. Neither the largest candidate (which is the entire image and likely spectrally heterogeneous) nor the smallest candidate (which is the seed pixel and too small to provide meaningful statistics) are expected to be of interest to the user. Therefore, one of the other intermediate ROIs is expected to be more useful for further exploitation. The next section describes a noise estimation approach that computes the signal-dependent noise level in an image which is used to identify spectrally homogeneous ROIs.
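The generation loop described above can be sketched in a few lines. The sketch below is illustrative only and makes several simplifying assumptions that are not part of the proposed method: it works on a single band stored as a list of lists, and it swaps the two-component GMM-EM step for a much simpler two-cluster split on intensity (`two_means`). The function names are hypothetical. It keeps the essential structure: split the current ROI into two clusters, keep the eight-connected component that contains the seed, and repeat until only the seed remains.

```python
from collections import deque


def two_means(values, iters=20):
    """Split scalar values into two clusters (simple stand-in for GMM-EM)."""
    lo, hi = min(values), max(values)
    if lo == hi:                      # uniform region: nothing to split
        return [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo, new_hi = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels


def seed_component(members, seed):
    """Pixels of `members` that are 8-connected to the seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nb = (r + dr, c + dc)
                if nb in members and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return seen


def shrinking_candidates(image, seed):
    """Nested candidate ROIs (sets of (row, col)), whole image down to the seed."""
    roi = {(r, c) for r in range(len(image)) for c in range(len(image[0]))}
    candidates = [roi]
    while len(roi) > 1:
        pixels = sorted(roi)
        labels = two_means([image[r][c] for r, c in pixels])
        seed_label = labels[pixels.index(seed)]
        same = {p for p, lab in zip(pixels, labels) if lab == seed_label}
        nxt = seed_component(same, seed)
        if len(nxt) == len(roi):      # region could not be split further
            nxt = {seed}
        candidates.append(nxt)
        roi = nxt
    return candidates


# Toy scene: a bright 3 x 3 panel with slight variation on a dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 100, 101, 100, 10],
    [10, 101, 100, 101, 10],
    [10, 100, 101, 100, 10],
    [10, 10, 10, 10, 10],
]
sizes = [len(c) for c in shrinking_candidates(image, (2, 2))]
print(sizes)  # [25, 9, 5, 1]
```

In the toy scene the first split separates the panel from the background, the second keeps only the pixels most similar to the seed, and the last candidate is the seed itself. Each candidate is a subset of the previous one.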

Noise Level Estimation
Among the simplest metrics that characterize the variability of the pixels in an ROI is the variance of their intensity values. The variance of a spectrally homogeneous ROI should only be due to the noise of the image acquisition process 16 and not the variability in a natural image's scene content. 17 There are different types of noise present in digital imagery including Gaussian noise, salt-and-pepper noise, shot noise, quantization noise, and speckle noise. 18 A common assumption in remote sensing is that the image corrupting noise is additive white Gaussian. However, digital remote sensors with increasingly smaller pixel resolution are becoming more sensitive to photon counting, which yields signal-dependent noise. Therefore, this work assumes a Poisson-Gaussian model, 19 which yields a signal-dependent variance given by
$$\sigma^2(\mu) = a\mu + b, \tag{1}$$
where the first term is a signal-dependent component, a depends on the sensor quantum efficiency and other camera-specific settings, μ is the expected sensor output pixel intensity, and b is the signal-independent component due to thermal and electronic noise. 20
These model parameters can be approximated using in-scene noise estimation techniques. A variety of noise estimation algorithms have been proposed including principal components analysis, 21 kurtosis estimation, 22 bit planes, 23 and patch-based approaches. 24 Recent advances in image denoising also include deep learning approaches with convolutional neural networks. 25 However, the goal of this work is to develop a tool that is fast and efficient, does not require significant expertise or training databases, and can be easily used by an image analyst. Therefore, machine learning methods were not considered suitable for this purpose. To account for both scene content variability and signal-dependent noise, the patch-based approach presented in Ref. 24 was selected. This approach estimates the local noise using a Laplacian-based filter to compute the noise standard deviation and is summarized below; for further details, see Ref. 24.
The following noise estimation is performed for each image band separately. The image is partitioned into M nonoverlapping patches. For the jth patch, the sample average intensity $\mu_j$ is computed and the local noise standard deviation $\sigma_j$ is computed using the following estimator:
$$\sigma_j = \sqrt{\frac{\pi}{2}}\,\frac{1}{6(W-2)(H-2)} \sum_{i} \left| (I * N)_i \right|, \tag{2}$$
$$N = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}, \tag{3}$$
where W is the patch width, H is the patch height, I is the noisy image, N is a Laplacian-based mask, * denotes convolution, and the sum is over the ith pixels in the jth patch. Finally, a and b are estimated using least squares regression between the M values of $\mu_j$ and $\sigma_j^2$. Specifically, the ordinary least squares solver (lscov) provided by the Octave 26 software package was used to perform the regression, where $\sigma_j^2$ are the response variables and $\mu_j$ are the regressor variables. This approach is simple, fast (requiring only a single convolution and a few arithmetic operations), and it mitigates the effects of varying terrain without having to explicitly identify edges 27 or weakly textured patches, 28 which can be difficult in multiband imagery.
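As a rough illustration of the per-band procedure, the following sketch tiles one band into nonoverlapping patches, computes a mean and a Laplacian-based noise estimate for each patch, and fits the variance as a linear function of the mean. It is a sketch under stated assumptions, not the implementation used in this work: the mask and the sqrt(π/2)/6 scaling follow the widely used fast noise estimator often attributed to Immerkaer, which is assumed here to be the form used by Ref. 24; the closed-form line fit stands in for the Octave lscov call; and all function names are hypothetical.

```python
import math
import random

# Assumed form of the Laplacian-based mask N (difference of two Laplacians).
MASK = [[1, -2, 1], [-2, 4, -2], [1, -2, 1]]


def patch_stats(patch):
    """Return (mean intensity, noise standard deviation estimate) for one patch."""
    h, w = len(patch), len(patch[0])
    mean = sum(map(sum, patch)) / (h * w)
    total = 0.0
    for r in range(1, h - 1):             # interior pixels only, no padding
        for c in range(1, w - 1):
            resp = sum(MASK[i][j] * patch[r - 1 + i][c - 1 + j]
                       for i in range(3) for j in range(3))
            total += abs(resp)
    sigma = math.sqrt(math.pi / 2) * total / (6 * (w - 2) * (h - 2))
    return mean, sigma


def fit_noise_model(means, variances):
    """Ordinary least squares fit of variance = a * mean + b; returns (a, b).

    Requires at least two distinct mean values.
    """
    n = len(means)
    mx, my = sum(means) / n, sum(variances) / n
    sxx = sum((x - mx) ** 2 for x in means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, variances))
    a = sxy / sxx
    return a, my - a * mx


def estimate_noise(band, size=13):
    """Tile one band into nonoverlapping size x size patches and fit (a, b)."""
    stats = []
    for r0 in range(0, len(band) - size + 1, size):
        for c0 in range(0, len(band[0]) - size + 1, size):
            patch = [row[c0:c0 + size] for row in band[r0:r0 + size]]
            stats.append(patch_stats(patch))
    return fit_noise_model([m for m, _ in stats], [s ** 2 for _, s in stats])


# Synthetic band: four flat intensity levels with signal-dependent Gaussian noise.
random.seed(0)
true_a, true_b = 0.5, 4.0                 # hypothetical sensor parameters
levels = [20.0, 60.0, 100.0, 140.0]       # one level per column of patches
band = [[lv + random.gauss(0.0, math.sqrt(true_a * lv + true_b))
         for lv in levels for _ in range(13)] for _ in range(52)]
a_hat, b_hat = estimate_noise(band)
print(a_hat, b_hat)  # rough estimates of (0.5, 4.0); values depend on the draw
```

Because the mask responds only to second-order differences in both directions, flat regions and linear intensity ramps contribute nothing to the estimate, which is why scene content such as gently varying terrain has limited influence on the fitted noise model.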

Spectral Homogeneity and Optimal ROI Selection
The spectral homogeneity of a candidate ROI is determined using the estimated noise level in Eq. (1). The intensity sample mean $\mu_R$ and sample standard deviation $\sigma_R$ are computed for each candidate ROI. The predicted noise level standard deviation $\sigma_P$ is computed with Eq. (1) using $\mu_R$ and the estimated parameters a and b. Then an ROI is labeled as spectrally homogeneous if
$$\sigma_R \le T \cdot \sigma_P, \tag{4}$$
that is, if the variability of the ROI $\sigma_R$ is at or below the noise level $\sigma_P$ times a noise level threshold factor T. For multiband images, this test is performed on each band separately and the ROI is labeled homogeneous if the test is satisfied for all bands. The value of T can be determined empirically for a given sensor and collection environment. However, experiments showed that the proposed method is robust around a T value of one. It is reasonable to label an ROI homogeneous if its variability is within one standard deviation of the noise floor. In fact, a range near one was evaluated by computing the detection and false alarm rates using ground truth and it performed well (see Sec. 3).
Finally, the optimal ROI is the candidate that is spectrally homogeneous with the most pixels and has no spectrally heterogeneous smaller candidate. This ROI is optimal because larger candidates introduce heterogeneity and smaller ones are simply partitioning pixels among noise.
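The homogeneity test and the selection rule can be sketched as follows. This is a minimal illustration under stated assumptions: each candidate is represented as a list of per-band pixel value lists, the noise parameters (a, b) for each band are already known, and a single-pixel candidate is given a standard deviation of zero. The function names and the sample values are hypothetical.

```python
import math
import statistics


def is_homogeneous(bands, noise_params, T=1.0):
    """Per-band test sigma_R <= T * sigma_P; all bands must pass.

    bands: one list of pixel values per band for a single ROI.
    noise_params: one (a, b) pair per band for the model variance = a * mu + b.
    """
    for values, (a, b) in zip(bands, noise_params):
        mu_r = statistics.mean(values)
        sigma_r = statistics.stdev(values) if len(values) > 1 else 0.0
        sigma_p = math.sqrt(max(a * mu_r + b, 0.0))
        if sigma_r > T * sigma_p:
            return False
    return True


def select_optimal(candidates, noise_params, T=1.0):
    """Index of the optimal ROI, or None if even the smallest candidate fails.

    candidates are ordered from largest to smallest. Walking up from the
    smallest, the search stops at the first heterogeneous candidate, so the
    result is the largest homogeneous ROI with no heterogeneous smaller one.
    """
    best = None
    for idx in range(len(candidates) - 1, -1, -1):
        if not is_homogeneous(candidates[idx], noise_params, T):
            break
        best = idx
    return best


# Single-band toy candidates, largest first (values are made up).
candidates = [
    [[10.0] * 16 + [100.0, 101.0] * 4 + [100.0]],   # background plus panel
    [[100.0, 101.0] * 4 + [100.0]],                 # panel only
    [[100.0] * 5],                                  # pixels matching the seed
    [[100.0]],                                      # seed pixel
]
noise = [(0.01, 0.0)]            # predicted sigma_P is about 1.0 near mu = 100
print(select_optimal(candidates, noise, T=1.0))   # 1
print(select_optimal(candidates, noise, T=0.25))  # 2
```

With T = 1.0 the whole panel is accepted, since its standard deviation of about 0.53 is below the predicted noise level of about 1.0. Tightening the threshold to 0.25 rejects the panel and falls back to the smaller, more uniform candidate.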

Experimental Setup
The proposed method is a binary classifier that discriminates between two classes of targets: spectrally homogeneous versus heterogeneous ROIs. Receiver operating characteristic (ROC) metrics were calculated to evaluate the diagnostic ability of the classifier. The experiments used to evaluate the classifier are divided into two groups, those with and those without ground truth. For the experiments with ground truth, the accuracy and the sensitivity of the optimal ROIs were tested using spectrally homogeneous reflectance calibration panels. The ground truth consists of the polygon shapes that outline the calibration panels. For the experiments without ground truth, the sensitivity of the optimal ROIs was evaluated, and a visual assessment was performed. For imagery both with and without ground truth, the author selected the test seed pixels near the center of the target regions.

Experimental Methodology
To determine a viable operating interval for the noise level threshold factor T from Eq. (4), the ground truth data sets were evaluated using a range of values from zero to ten. From this analysis, an ROC curve and its area under the curve (AUC) were computed. In addition, the interval near one noise level standard deviation was evaluated since this would be the range many users would expect to set as a noise threshold. A threshold from this range was used to evaluate the sensitivity of the classifier.
To evaluate the accuracy of any optimal ROI generated, a truth mask was created of the ground truth panels. The pixels within the truth mask are the expected output of the classifier for any seed pixel selected within the corresponding panel. The actual classifier output ROI was then compared to the expected truth to create a confusion matrix from which the detection rate and false alarm rate were computed. The seed pixels for the ground truth cases were selected near the panel centers. Note that for cases without ground truth only a relative evaluation is possible, and the optimal ROI generated from an initial user-selected seed pixel is considered "ground truth." The procedure described in the remainder of this section is used to evaluate the sensitivity to this seed location.
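The confusion matrix rates and the area under an ROC curve can be computed as in the sketch below. It assumes ROIs are represented as sets of (row, column) pixels that all lie inside the evaluated window, and it uses the trapezoidal rule for the AUC. Both choices, the function names, and the sample numbers are illustrative assumptions rather than details taken from the experiments.

```python
def confusion_rates(predicted, truth, all_pixels):
    """Detection rate and false alarm rate for pixel sets inside all_pixels."""
    tp = len(predicted & truth)          # target pixels found
    fp = len(predicted - truth)          # background pixels wrongly included
    fn = len(truth - predicted)          # target pixels missed
    tn = len(all_pixels) - tp - fp - fn  # background pixels correctly excluded
    detection = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return detection, false_alarm


def roc_auc(points):
    """Trapezoidal area under (false_alarm_rate, detection_rate) points."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical 10 x 10 window with a 4 x 4 truth panel.
window = {(r, c) for r in range(10) for c in range(10)}
truth = {(r, c) for r in range(3, 7) for c in range(3, 7)}
predicted = {(r, c) for r in range(3, 6) for c in range(3, 7)} | {(0, 0), (0, 1)}
detection, false_alarm = confusion_rates(predicted, truth, window)
print(detection, false_alarm)  # 0.75 and 2/84
```

Sweeping the threshold factor T and collecting one (false alarm rate, detection rate) pair per value would give the list of points passed to `roc_auc`.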
It is unrealistic to require a user to select any specific seed to achieve a good result. Generally, an image analyst will try to select a seed near the center of the desired target and not be too concerned with its exact position to within a few pixels. So any ROI extraction algorithm must also be insensitive to the seed location to within a few pixels. To evaluate the sensitivity of the optimal ROI, similar confusion matrix metrics were computed. However, the optimal ROI using an initial seed pixel was used as the expected, or "truth," mask. Then the seed was perturbed by a few pixels in the line and sample directions. The optimal ROI for each perturbation was compared to the unperturbed ROI. This yields a relative accuracy whose behavior as a function of the perturbation distance characterizes the sensitivity of the algorithm to the seed location. The perturbation window was 5 × 5 pixels centered at the seed.
Note that, to reduce the computational load of the clustering process, the initial ROI can be a subset of the full image. For this analysis, the initial ROI is a window of size 101 × 101 pixels centered at the seed. However, the noise estimation was performed using the entire image and used patch sizes of 13 × 13 pixels. This patch size was determined empirically to be large enough to still find many homogeneous patches and small enough to minimize the computational load. The connected component analysis used an eight-connectivity neighborhood.
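The seed perturbation test can be sketched as a loop over a small window of seeds. The sketch below assumes that the perturbation distance is the Euclidean distance from the original seed and that the ROI extractor is supplied as a callable returning a set of pixels. The toy extractor and all names are hypothetical stand-ins for the full method.

```python
import math


def perturbed_seeds(seed, half_width=2):
    """Seeds in a (2 * half_width + 1)^2 window, each with its distance."""
    r0, c0 = seed
    return [((r0 + dr, c0 + dc), math.hypot(dr, dc))
            for dr in range(-half_width, half_width + 1)
            for dc in range(-half_width, half_width + 1)]


def seed_sensitivity(extract_roi, seed, half_width=2):
    """Relative detection rate of each perturbed ROI versus the unperturbed ROI.

    Returns (distance, rate) pairs sorted by distance. The unperturbed ROI
    plays the role of the truth mask.
    """
    reference = extract_roi(seed)
    results = []
    for new_seed, dist in perturbed_seeds(seed, half_width):
        roi = extract_roi(new_seed)
        results.append((dist, len(roi & reference) / len(reference)))
    return sorted(results)


# Toy extractor: returns a fixed 4 x 4 block when the seed falls inside it.
block = {(r, c) for r in range(9, 13) for c in range(9, 13)}


def toy_extract(seed):
    return block if seed in block else {seed}


results = seed_sensitivity(toy_extract, (10, 10))
print(len(results))                           # 25 seeds in a 5 x 5 window
print(sum(rate == 1.0 for _, rate in results))  # 16 seeds still land in the block
```

Averaging the rates over seeds at the same distance gives a curve of relative accuracy against perturbation distance, which is the quantity used here to characterize sensitivity to the seed location.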

Data Sets
Testing was limited to RGB images. The ground truth data sets consisted of the RGB imagery from the SHARE 2012 29 collection and the Forest Radiance I hyperspectral data from the HYDICE 30 sensor. The HYDICE data were reduced to RGB by selecting the bands corresponding to red, green, and blue. These images contain 12 spectrally homogeneous calibration panels.
Data sets without suitable ground truth included the Moffett Field imagery collected using the AVIRIS 31 sensor and the Pavia University imagery collected using the ROSIS 32 sensor. These data cubes were also similarly reduced to RGB images. Although these data cubes had land cover mask ground truth, they did not have ground truth targets deemed to be spectrally homogeneous like reflectance calibration panels.

Experimental Results
The detector ROC curve [Fig. 2(a)], parameterized by T, has an AUC of 98%. Its detection rates were 82%, 90%, and 95% with false alarm rates of 0.2%, 0.3%, and 0.31% for T values of 0.5, 0.75, and 1.0, respectively. Since the goal is to find a homogeneous ROI and not the entire object, extracting dozens or hundreds of pixels will be sufficient for most users. So even having detection rates as low as, say, 50% and false alarm rates as high as 1% would be satisfactory. For example, if the desired target had 100 pixels these rates yield 50 true pixels and one false pixel. A few false pixels potentially contaminating ROI statistics can be easily mitigated with a robust estimator. So the classifier does a good job of detecting homogeneous ROIs in the range of expected thresholds. A threshold in this interval, T = 1.0, was used for subsequent tests.
The accuracy [Fig. 2(b)], detection rate [Fig. 3(a)], and false alarm rate [Fig. 3(b)] for each test case and their average demonstrate that the detector is insensitive to the exact location of the seed. The average detection rate showed a gradual decline from 95% to a minimum of 90% at a perturbation distance of about 3 pixels. All cases yielded similar declines in detection rate with nearly constant false alarm rates. The Moffett image showed the largest drop to 75%. The total accuracy also showed a gradual decline across the perturbation distances. Note for the Forest Radiance and SHARE 2012 cases, Fig. 2(b) presents absolute accuracies since ground truth is available but for the Moffett and Pavia cases, they represent relative accuracies since no ground truth is available. The same is true for Fig. 3.
Optimal ROIs were extracted for six Forest Radiance calibration panels [Fig. 4(a)] and six SHARE 2012 calibration panels [Fig. 4(b)]. The optimal ROIs visually appear to do a good job covering most of the panel pixels.
Optimal ROIs for targets without ground truth in the Pavia (Fig. 5), Moffett (Fig. 6), and SHARE 2012 (Fig. 7) images show similar results. The optimal ROI does not always fill the underlying physical object (e.g., a building roof) but visually appears to occupy spectrally similar nearby pixels. In the Pavia image, the optimal ROI discriminates between the different shadings on each side of the tilted roof and avoids a group of anomalous blue pixels in the upper right corner. Also, when those anomalous pixels were used as a seed, the optimal ROI detected was just those pixels. So the proposed method did not force the anomalies to be grouped with their neighbors.
Fig. 2 ROC curve (a) shows a high AUC and detection rates with low false alarm rates near one noise level standard deviation. The accuracy (b) shows a gradual decline across the seed pixel perturbation distances.
Fig. 3 The detection rate (a) shows a gradual decline across perturbations with (b) a nearly constant false alarm rate.
The Moffett image shows the ROIs generated for a light and dark area of terrain. For the dark area, the classifier was able to exclude the similar but spectrally different areas of lighter pixels to its right. The SHARE 2012 image shows optimal ROIs taking the shape of the basketball court regions without including the surrounding paint lines or anomalous, neighboring yellow pixels.

Conclusions
An automated extraction method of spectrally homogeneous ROIs has been developed for remotely sensed images. The method requires the user to select a single seed pixel and a target-agnostic noise threshold. The ROI is generated based on an in-scene estimation of the image noise level. Performance and sensitivity tests showed the extracted ROIs achieved detection rates up to 95%, false alarm rates below 1%, and remained robust to the main user input, the initial seed location. The method was also able to exclude anomalous pixels and subtle terrain shading that a user might incorrectly include in an ROI due to suboptimal visualization decisions.
Four key improvements over current approaches are: (1) simplifying the user input to a single seed pixel, (2) preventing the misapplication of algorithms that segment the physical object and not spectrally homogeneous pixels, (3) eliminating the need for the user to possess a priori knowledge of image or target specific thresholds, and (4) mitigating mistakes introduced by human visualization of multiband spectral data. The proposed method avoids these problems by estimating an accurate noise model and reversing the growing process by converging to the seed through similarity space.
The approach described can be reasonably extended to imagery with additional bands in the same spectral region. However, more research is needed to evaluate its efficacy with, for example, hyperspectral images (HSI) with hundreds of bands. In particular, the ROI selection process (Sec. 2.3), which requires target homogeneity in all bands, may be too stringent since a target's spectral behavior can change significantly between, say, the visible and the infrared range.
Future research will include evaluating (1) a more efficient noise estimation since the current approach may be slow for large images, (2) different noise models for other sensor modalities (e.g., SAR), (3) the behavior of the proposed method at various nominal and stressing noise levels, and (4) different data types (e.g., HSI).