Urban area detection from high-spatial resolution remote sensing imagery using Markov random field-based region growing

Abstract. Dynamically changing urban areas require periodic automatic monitoring, but urban areas include various objects and different objects show diverse appearances. This makes it difficult to effectively detect urban areas. A region-growing method using the Markov random field (MRF) model is proposed for urban detection. It consists of three modules. First, it provides an automatic urban seed objects extraction approach by designing three features with respect to urban characteristics. Second, the method uses an object-based MRF to model the spatial relationship between urban seed objects and surrounding objects. Third, a MRF-based region-growing criterion is proposed to detect urban areas based on seed points and spatial constraints. The strength of the proposed method lies in two aspects. One is that automatic selection of seed points is presented instead of manual selection. The other one is that the region-growing technique, instead of probabilistic inference, is used to solve the MRF optimization problem. Experiments on aerial images and SPOT5 images demonstrate that our method provides a better performance compared with the region-growing method, the classical and object-based MRF methods, or some other state-of-art methods.

Urban area detection from high-spatial resolution remote sensing imagery using Markov random field-based region growing 1 Introduction In recent years, urban detection has become more and more crucial for many applications. It helps government agencies and urban region planners in updating the geographic information system and forming plans. Moreover, due to an enormous number of human activities, the scope of urban areas quickly changes from time to time. Considering the conflict between the need for periodically detecting urban areas and the high-human cost, many approaches had been proposed to automatically detect urban areas from remote sensing images. [1][2][3][4][5][6][7][8] However, an urban area is an abstract semantic object. It is a comprehensive region including several subobjects such as buildings, roads, trees, water bodies, grass spaces, etc. This means that classical spectral-based recognition methods cannot be simply transferred to extract urban areas. Hence, besides spectral value, features that are more effective are needed for urban detection. Since urban scenes usually have a unique texture with respect to natural scenes, texture analysis becomes one main approach for urban monitoring. [9][10][11] However, the texture pattern of urban scenes is not consistent in all kinds of areas. Methods of texture analysis may suffer from a lack of robustness. In order to answer this problem, several methods have been studied.
For instance, Benediktsson et al. 1 adopted morphological transformations to extract features of urban areas and classify them using a neural network. Weizman and Goldberger 12 built a visual dictionary to learn the urban visual words and then detected the urban regions based on the dictionary. Sirmacek and Ünsalan 13 employed the local feature points extracted by the Gabor filter to vote for the candidate urban areas. Furthermore, Kajimoto and Susaki 14 and Liu et al. 15 extracted the urban areas from polarimetric SAR images using the polarization orientation angle and only positive samples, respectively. However, algorithms may have less transferability with respect to different urban characteristics, as no single-feature descriptor is available for all kinds of the urban objects. On the contrary, some subobjects that consist of a typical urban pattern can be well detected according to their own characteristics. For instance, man-made objects, such as buildings 6,7,16 and roads, [17][18][19] usually have compact shapes. In contrast, spectral features are important for detecting natural objects, e.g., vegetations 20,21 and water bodies. 21 Hence, an alternative way of urban detection is to first detect some urban subobjects and then extract the entire urban area based on the extracted subobjects. The region-based classification is a widely used approach to detect certain land cover objects. [22][23][24][25] However, different urban areas may consist of different subobjects. Meanwhile, some subobjects, such as trees and water bodies, may appear in both urban areas and the nonurban areas. This phenomenon makes the region-based urban detection methods challenging, even though each urban subobject can be accurately classified. As urban objects are spatially adjacent, one possible way to answer this problem is to take the spatial information of objects into account. The Markov random field (MRF) 26 model provides a statistical way to model spatial contextual information, and it has been extended to the region level for image classification. [23][24][25] For example, Wu et al. 23 used some rectangular regions as the initial objects and then classified the polarimetric SAR images using the Wishart MRF. However, the accuracy of classification is still limited when the rectangular region is located on the edge of some objects. Zhang et al. 24 improved this method by using a mean shift to obtain the finer initial regions. Wang and Zhang 25 used the Gaussian distribution to recognize images instead of the Wishart distribution. Although these MRF-based classification approaches usually obtained remarkable results, they assumed that each land class obeyed a certain probability distribution, e.g., the Wishart or Gaussian distribution. Nevertheless, the assumption about the probability distribution does not hold in the case of detecting urban areas, as urban areas are often represented as complex regions with various subobjects. Using the probabilistic inference of the MRF model in terms of common probability distributions cannot appropriately detect urban areas.
Motivated by this observation, this paper proposes an MRF-based region-growing method to extract urban areas. Our main contributions include two aspects. First, the proposed method introduces a new MRF-based region-growing criterion to overcome the limitation of the traditional probabilistic inference way of the MRF model. The method retains the advantages of the MRF model in the description of the regional spatial constraints. Both the spatial constraints and the characteristic of urban areas are considered to design a region-growing criterion. Second, an automatic seed objects extraction method is proposed for the MRF-based region growing. The method automatically extracts three features to describe the spectral and granularity information and uses these three features to detect buildings and their shadows as seed points. Our method provides an unsupervised way to detect urban areas, which makes it possible to capture the correlations among various urban objects by combining the benefits of region growing and the MRF model.
The rest of this paper is organized as follows. Section 2 introduces the method for initializing seeds, and Sec. 3 presents the details of the MRF-based region-growing method. Section 4 discusses the results obtained by applying our method on remote sensing images. Finally, Sec. 5 draws a conclusion.

Selection of Seed Points
The selection of seed points is a fundamental step for a region-growing algorithm. The main concept of the selection of seed points is grounded in the observation that the buildings are located in every corner of the city and are often adjacent to shadow areas. Hence, we extract them and their shadows as seed points in this section. In order to appropriately detect seed points, we will first explore three features F 1 , F 2 and F 3 . The details are given in the following sections.

Extract the Pixel-Level Spectral Value F 1
Because buildings usually show a bright appearance in an image and their shadows are dark, a spectral value F 1 is used to describe this feature. Namely, for a given image Y ¼ ðY 1 ; Y 2 ; : : : ; Y P Þ, each spectral channel Y t (1 ≤ t ≤ P) is defined on an M × N rectangular lattice S, i.e., S ¼ fsjs ¼ ði; jÞ; 1 ≤ i ≤ M; 1 ≤ j ≤ Ng and Y t ¼ ðy t s Þ M×N . Then, spectral value which can describe the spectral value of each pixel s on different channels.

Extract the Region-Level Spectral Variance F 2
Different urban objects have various appearances, so their spectral variance should be relatively large. Hence, we design a region-level spectral variance F 2 to capture this feature. First, the initial objects are obtained using a mean shift method, 27 which constructs a probability density to reflect the underlying distribution of points in some feature space and to map each point to the mode of the density which is closest to it. Then, the given image Y is divided into an oversegmented region set R, i.e., R ¼ fR 1 ; R 2 ; : : : ; R k g. Each R i of R denotes an over-segmented region (i ¼ 1;2; : : : ; k), R i ∩ R j ¼ ∅ (i ≠ j), and k is the number of these regions. With the region set R, we can further define the neighborhood system N ¼ fN i ji ¼ 1;2; : : : ; kg to describe the spatial context of regions. Here, each N i denotes the set of regions neighboring R i . Let MðR i Þ be the mean value of pixels in R i , and the local spectral variance between region R i and its adjacent regions can be calculated as follows: (1) where μ i ¼ ð1 þ jN i jÞ −1 ½MðR i Þ þ P j∈N i MðR j Þ, and jN i j is the number of regions in N i . In Eq. (1), every region has the same impact on VðR i Þ. Intuitively, it may be preferable to determine the impacts in Eq. (1) using an adaptive way. Hence, the equation for VðR i Þ is revised as In Eq. (2), M Ã ðR j ; R i Þ is defined as follows: region size jR i j is the number of pixels in region R i . In this revised equation, the impact of R j will be reduced when the ratio of jR j j to jR i j is less than p. That is to say, the effect of each region is affected by its region size. Here, p is a threshold with which to measure the ratio between the sizes of two regions. Since the relationship of region sizes among different objects is relatively stable, we empirically set p as 0.3 in this paper.
Based on the VðR i Þ, to reflect the spectral variance among regions. Here, RðsÞ is the region to which pixel s belongs.

Extract the Granularity Information F 3
Urban areas have more different types of objects and more complicated appearances than nonurban areas. Therefore, in the over-segmented region set R ¼ fR 1 ; R 2 ; : : : ; R k g, objects of urban areas usually have smaller region sizes than objects of nonurban areas. In other words, the granularity of urban areas is finer than that of nonurban areas. Hence, we employ the region size and the spatial relationship among regions to define where P½RðsÞ ¼ jRðsÞj · ðM · NÞ −1 . In the above equation, P½RðsÞ − P½RðsÞ · logfP½RðsÞg is used to reflect the region size of RðsÞ and P j∈N RðsÞ fPðR j Þ − PðR j Þ · log½PðR j Þg is used to describe the context information of regions. Note that fðxÞ ¼ x − x · logðxÞ is a monotonically increasing convex function when x ∈ ½0;1. Hence, the monotonicity of fðxÞ can make P½RðsÞ − P½RðsÞ · logfP½RðsÞg to indicate the region size. What is more, if we assume that jN RðsÞ j is fixed, the convexity of fðxÞ can make P j∈N RðsÞ fpðR j Þ − pðR j Þ · log½pðR j Þg take a small value when the sizes of regions neighboring RðsÞ are close. It will lead to a consistent result with a smooth region size, which is suitable for capturing the granularity information since the granularities of regions are usually similar for one certain object.
An example to illustrate these features is shown in Fig. 1, where Figs. 1(b), 1(d), and 1(e) are features F 1 , F 2 and F 3 extracted from Fig. 1(a). From this example, one can see that the buildings in Fig. 1(b) are bright, which denotes a high F 1 value, and their shadows are of the low F 1 value. Similarly, the spectral variance F 2 of urban areas is larger than that of others areas, and urban areas have a small granularity F 3 value. Based on these features, we design to describe the buildings' spectral values, dark shadows' spectral values, regional spectral variance, and granularity information, respectively. They are where F 1 γ , F 1 1−γ , F 2 λ , and F 3 π denote the γ, 1 − γ, λ, and π fractile of F 1 , F 2 and F 3 , respectively. γ,λ, and π are the key parameters for the selection of seed points. The parameter γ is used to make E 1 capture the spectral feature of buildings. Since buildings usually take a high spectral value, they are often expressed as the tail of the histogram of F 1 . Hence, γ is set to a high value to get the tail of the histogram of F 1 , such as Fig. 1(f). Correspondingly, E 2 uses 1 − γ to obtain the first peak of the histogram of F 1 , which describes the dark shadows with a low F 1 value. For the same reason, E 3 and E 4 are set with a high λ value and low π value to catch the tail of the histogram of F 2 and the first peak of the histogram of F 3 , respectively. These can extract buildings' spectral variance and granularity features. An illustration of setting γ, λ, and π is shown in Figs. 1 Then, by sequentially combining E 1 , E 2 , E 3 , and E 4 , seed points can be obtained. Namely, we first use D 1 ¼ ðd 1 s Þ M×N to get pixels belonging to buildings and adjoining the shadows, or pixels belonging to shadows and adjoining the buildings. This is defined as where the local square window wðs; rÞ is centered at site s and its radius is r. Then, we further consider the information of E 3 and E 4 by defining At last, seed points will be selected as the set D ¼ fsjd 3 s ¼ 1; s ∈ Sg. For r and l, these seed points are used to determine whether a local window wðs; rÞ simultaneously contains pixels from E 1 , E 2 , E 3 , and E 4 and whether pixels of each kind are not less than l. Because a building is spatially adjacent to its shadow, they can be effectively detected together using a relative small patch of the given image. Hence, by setting r to 2 for D 1 , D 2 , and D 3 , we use the local window wðs; r ¼ 2Þ as the small patch to select seed points in the following. At the same time, if there are buildings and their shadows in the small patch, there will be at least one pixel labeled 1 in the patch for each e i s , i ¼ 1, 2, 3, 4. Therefore, l is set to 1. It means that only a pixel which simultaneously possesses or neighbors E 1 , E 2 , E 3 , and E 4 within the small local window wðs; 2Þ can be chosen as the seed point. An example is shown in Fig. 1(i). Note that one pixel would show different sizes of the Earth's surface in remote sensing images with various spatial resolutions, which may affect the setting of parameter r. Namely, r can be set to 1 for the low-spatial resolution remote sensing images and be set larger than 2 for extreme high-spatial resolution remote sensing images.

MRF-Based Region Growing
Based on extracted seed points, a MRF-based region-growing criterion is proposed in this section. First, the MRF model is briefly reviewed. Then, the proposed criterion for urban detection is introduced.

MRF Model
Let X ¼ fX R i jR i ∈ Rg be the label random field defined on the over-segmented region set R. We use 1 to flag urban areas and 0 to flag nonurban areas, and each random variable X R i takes a value of 1 or 0 to represent the label of region R i it belongs to. If x ¼ fx R i jR i ∈ Rg denotes the realization of X, the optimal realizationx can be obtained by maximizing the posterior probability, i.e.,x ¼ argmax The energy form of Eq. (4) iŝ x ¼ argmin In Eq. (5), the likelihood function PðYjXÞ is used to describe image features. In this paper, we assume that all Y R i of Y are independent given labels. That is The distribution of random field PðXÞ is assumed to be of the Markovianity property, i.e., Therefore, Eq. (5) can be rewritten aŝ Due to the complexity caused by interactions among labels, it is difficult to find the solution of the MRF model. Hence, the local optimal solutionx ¼ ðx R i Þ can be obtained as follows: where the likelihood energy E f ðR i Þ is the cost of the observation of R i , and the label energy E l ðR i Þ is the cost of the label of R i .

MRF-Based Region Growing
In this section, an MRF-based region-growing criterion is introduced to find the optimal realizationx. To minimize the total energy of the MRF model, the proposed method will iteratively merge adjacent regions that could decrease the total energy. Namely, for neighboring regions R i and R t , the total changed energy EðR i ; R t Þ is first calculated these two regions are merged. Based on Eq. (7), EðR i ; R t Þ equals the sum of the changed likelihood energy E f ðR i ; R t Þ and the changed label energy E l ðR i ; R t Þ, i.e., Here, where jR i j · ½MðR i Þ − MðR i ∪ R t Þ 2 and jR t j · ½MðR t Þ − MðR i ∪ R t Þ 2 can reflect the change of the observations in region R i and R t , respectively. The changed label energy of R i is defined as where the pair-clique potential E l ðR i ; R t Þ uses jR i j to consider all changed label energies for each pixel in R i and its neighbors when x R i is relabeled as x R t . Then, by merging region R i and its neighboring region that can minimize the total changed energy, a MRF-based region-growing approach can realize urban detection step by step. The details of the rule of region growing are given in Algorithm 1. The proposed criterion is different from traditional region-growing methods, as it does not begin from seed points but from nonseed points. We only consider the nonurban regions labeled 0 and their region sizes are less than the threshold. For each selected region R i , the energy values are calculated between R i and its neighbor regions, respectively. Then, R i is merged with the one neighbor region that has the minimum energy value. Hence, R i merged with an urban region will lead to a larger urban region; in contrast, R i merged with a nonurban region will result a new nonurban region. Therefore, the rule of our approach is a competition rule of region growing for both urban and nonurban regions.
Urban areas can be extracted using the region-growing criterion. Namely, urban areas are first initialized using the label field x ¼ fx R i jR i ∈ Rg based on seed points D, i.e., set Then, by increasing the thresholds, the growing criterion gradually updates the urban areas. Note that different sun angles may affect the shadow length and direction, but it does not change the spatial topological relationship between buildings and their shadows. Hence, the proposed method is robust for effective detection of varying urban areas contained in different remote sensing images.

Parameter Setting
There are two parameters in the MRF-based region-growing criterion, i.e., β and T. The potential parameter β is used to balance the influence between E f ðR i ; R t Þ and E l ðR i ; R t Þ. A high β value emphasizes E l ðR i ; R t Þ and leads to results with large homogeneous objects. On the contrary, a low β value emphasizes E f ðR i ; R t Þ and is suitable for getting results with many details. Hence, β should select different values for various applications. However, as the relationship between urban and nonurban areas is quite stable, β is fixed and is empirically set as 0.05 for simplifying the parameter setting.
The threshold T is used to control the process of region growing. By gradually increasing T, small regions labeled nonurban are merged into larger urban regions or nonurban regions, then urban areas are extracted. In practice, we used T ¼ 25 as the initial threshold and doubled the threshold each time. The final termination threshold was determined by the change of the spectral variance. The assumptions supporting this threshold selection are that urban areas consist of various subobjects and their spectral variance should be large; if the nonurban areas are wrongly recognized as urban areas, an abrupt change of the spectral variance should be observed. Here, we use CRði; i þ 1Þ to show the change rate of spectral variances, i.e., where StdTðiÞ denotes the standard deviation of detected urban areas with termination threshold T ¼ TðiÞ. Then, we can take the inflection point of CRði; i þ 1Þ as the final termination threshold, after which CRði; i þ 1Þ will abruptly decrease. An example is shown in Fig. 2, where we use T ¼ ½25; 50; 100; 200; 400; 800; 1600 as the candidates of termination thresholds. Some extracted urban areas are illustrated in Figs. 2(a)-2(g). StdTðiÞ with different Ts is calculated and given in Fig. 2(h), where the corresponding CRði; i þ 1Þs are also shown in Fig. 2(i). As CRð200;400Þ is an inflection point, we take T ¼ 400 as the final termination threshold for this experiment and Fig. 2(e) shows the corresponding detection result.

Algorithm 1
Input: the observed image.
Output: urban detection result.
1) Set a threshold T .
2) If there exists a region R i satisfying jR i j < T and x R i ¼ 0, select R i and go to step 3; else, stop.
3) For R i and its neighbor region R t , based on Eqs. (8)(9)(10), calculate the total changed energy E ðR i ; R t Þ.
4) Find the region R Ã i that has the minimum energy value, i.e., R Ã i ¼ argmin Merge R i and R Ã i as a new region labeled x R Ã i , then go to step 2.

Experiments
The MRF-based region-growing method provides an unsupervised way for the monitoring of urban areas. With the aim of fully evaluating the performance of the proposed method, experiments and comparisons were carried on two groups of images, i.e., aerial images (Sec. 4.1) and SPOT5 images (Sec. 4.2).

Experiments of Aerial Images
In this experiment, three aerial images, as shown in Fig. 3, are used to test our method and other urban extraction methods. These aerial images were acquired in 2009 and are located in Taizhou City, China. The three images have the same size of 500 × 500, and the spatial resolution is 0.4 m. The test images contain plane agriculture fields and small villages, where urban objects show various spectral appearances and some nonurban objects are similar to seed points in terms of spectral characteristics. This makes urban detection challenging. Moreover, the following competitive methods are also considered for comparison: 1. The traditional region-growing method: 28 it detects urban areas without employing the MRF model. 2. The classical MRF model: 29 it uses the generated probabilistic model at the pixel level to obtain results. 3. The object-based MRF (OMRF) model: 25 it extends the MRF model from the pixel level to the object level for capturing the macrotexture pattern of a given image; this uses  initial over-segmented regions to build the region adjacency graph (RAG) and defines the MRF model on the RAG to realize the segmentation. 4. The two-class support vector machine (SVM): 30 it is provided by ENVI software, which is a commonly used classification approach with training data. 5. The object-based SVM: 22 it extracts the regional features from a hierarchical tree of the scene and obtains a classification using the SVM classifier.
For the sake of fairness, we chose the same seed points to train the urban areas for the traditional region-growing method and the two SVM methods and deliberately selected samples to train the nonurban areas for these SVM methods as well. We also tuned the parameters of these methods to get their optimal performances. For the traditional region-growing method, we chose the threshold parameter following the instructions in the literature. 28 For the two-class SVM, we set the radial basis function as the kernel type, the gamma in kernel function as 0.33, and the penalty function as 100, respectively. For the object-based SVM, we use 0.1% as the ratio of training samples based on the literature. 22 Therefore, the comparison can demonstrate the difference between our model and other state-of-the-art methods.
Experimental results of aerial images are shown in Fig. 3. Here, the caption of Fig. 3 consists of two parts, where the first part using the alphabetical order denotes different test images and the second part using the number order denotes different detection methods. Detected urban objects are represented as yellow masks over the test images. From the comparative test, one can see that the proposed method exhibits a remarkable improvement for urban detection. Namely, the traditional region-growing method, as shown in Figs. 3(a2)-3(c2), still has huge misclassifications which belong to different object categories and have similarity spectral appearances. The main reason is that the traditional region-growing method only uses the spectral features which do not consider the spatial constraint. By employing the spatial context information, the classical MRF model has less misclassification of nonurban areas. However, this pixel-level generate model can just recognize the parts of the urban areas with similar appearances, since it cannot model the complex and macropatterns by incorporating the long-range interactions. It also wrongly labels some urban objects as nonurban, such as the roofs of buildings and vegetation. The OMRF model utilizes the regions to describe the macrospatial constraints and improves the classical MRF model, e.g., Figs. 3(a4) and 3(c4). However, the OMRF model usually leaves the characteristic of urban areas out of consideration, which may lead to some undesirable results such as Fig. 3(b4). The SVM method trains data to obtain urban areas. Although it can effectively recognize buildings, urban vegetation objects are sometimes classified as nonurban areas because of the lack of spatial information. The object-based SVM improves the pixel-based SVM and gets results that are more consistent by considering the object semantic information with regional features. Nevertheless, it still cannot sufficiently use spatial information whose results have some misclassifications. Compared with these methods, our MRF-based region-growing method first considers the urban characteristics when we select seed points, then employs the MRF defined on the region level to capture regional spatial constraints, and finally proposes a corresponding region-growing criterion that utilizes these features to detect urban areas. Hence, our method demonstrates a better performance than the other methods.
Experimental results are quantitatively evaluated by the overall accuracy (OA) and kappa coefficient κ. OA and κ are the two indicators that measure the degree of similarity between two images. 31 If P ij is the proportion of subjects that were assigned to the i'th class by the first image and the j'th class by the second image and denotes P i• ¼ P k j¼1 P ij and The OA and κ of aerial images are given in Table 1. From these quantitative indexes, we know that MRF-based region growing can enhance both the OA and kappa for each experimental image. This also shows that our method extracts a better scope of urban areas than do the other methods. In particular, when the topographic features are     complex, the enhancement of indices is obvious. For clarity, the quantitative indices of Table 1 are illustrated in Fig. 4.

Experiments of SPOT5 Images
The effectiveness of the proposed method is further tested in this section. Two SPOT 5 remote sensing images, as shown in Fig. 5, are employed for the next experiment. These test images are located on the Pingshuo area of China. Both sizes are 438 × 438. These test images mainly consist of three object types, i.e., urban areas, cultivated land, and woodland. Among them, urban green space and woodland and urban building and cultivated land have similar spectral appearances, respectively. This phenomenon increases the difficulty of urban detection. Experiments of SOPT5 images are illustrated in Fig. 5. Compared with the ground truth, the MRF-based region-growing method performs well and the results are close to the ground truth. This demonstrates that our model can effectively extract urban areas from different datasets.

Conclusions
To summarize, we proposed an unsupervised urban detection method by unifying the regiongrowing method and the MRF model. It first uses the granularity information and spectral features to automatically extract some typical urban objects as the seed points, which can be treated as the skeleton for the urban areas. Then, the MRF is employed to model the spatial relationships between urban seed points and other urban objects. At last, the region-growing criterion uses these relationships to recognize urban nonseed objects, which will lead to consistent results. The main novelty of the method the automatic extraction of urban seed points and the detection of urban areas using a region-growing criterion under the regional MRF-based spatial constraints. The effectiveness of the proposed method is validated by experimental results obtained from various high-spatial resolution remote sensing images. Compared to a traditional region-growing method, the classical and object-based MRF models, and the common and object-based SVM, our method can provide more precise and more meaningful results, which verifies that our method is suitable to detect urban areas. However, this method is only proper for urban detection. If it is used to extract other terrestrial objects, then one has to design a new seed extraction method and modify the region-growing criterion.
For the method presented, the potential parameter β need to be empirically set. If this parameter can be estimated in an adaptive way, then it will improve the current method.