## 1.

## Introduction

In recent years, urban detection has become increasingly important for many applications. It helps government agencies and urban planners update geographic information systems and form plans. Moreover, because of the enormous amount of human activity, the extent of urban areas changes quickly. Given the conflict between the need to detect urban areas periodically and the high human cost of doing so, many approaches have been proposed to automatically detect urban areas from remote sensing images.^{1–8} However, an urban area is an abstract semantic object. It is a comprehensive region that includes several subobjects such as buildings, roads, trees, water bodies, and grass spaces. This means that classical spectral-based recognition methods cannot simply be transferred to extract urban areas. Hence, besides spectral values, more effective features are needed for urban detection. Since urban scenes usually have a distinctive texture compared with natural scenes, texture analysis has become one of the main approaches for urban monitoring.^{9–11} However, the texture pattern of urban scenes is not consistent across all kinds of areas, so texture analysis methods may lack robustness. To address this problem, several methods have been studied. For instance, Benediktsson et al.^{1} adopted morphological transformations to extract features of urban areas and classified them using a neural network. Weizman and Goldberger^{12} built a visual dictionary to learn urban visual words and then detected urban regions based on the dictionary. Sirmacek and Ünsalan^{13} employed local feature points extracted by the Gabor filter to vote for candidate urban areas. Furthermore, Kajimoto and Susaki^{14} and Liu et al.^{15} extracted urban areas from polarimetric SAR images using the polarization orientation angle and only positive samples, respectively. However, these algorithms may transfer poorly across different urban characteristics, as no single feature descriptor suits all kinds of urban objects.

In contrast, some subobjects that make up a typical urban pattern can be detected well according to their own characteristics. For instance, man-made objects, such as buildings^{6,7,16} and roads,^{17–19} usually have compact shapes, whereas spectral features are important for detecting natural objects, e.g., vegetation^{20,21} and water bodies.^{21} Hence, an alternative way of urban detection is to first detect some urban subobjects and then extract the entire urban area based on the extracted subobjects. Region-based classification is a widely used approach to detect certain land cover objects.^{22–25} However, different urban areas may consist of different subobjects. Meanwhile, some subobjects, such as trees and water bodies, may appear in both urban and nonurban areas. This makes region-based urban detection challenging, even when each urban subobject can be accurately classified. As urban objects are spatially adjacent, one possible way to address this problem is to take the spatial information of objects into account. The Markov random field (MRF)^{26} model provides a statistical way to model spatial contextual information, and it has been extended to the region level for image classification.^{23–25} For example, Wu et al.^{23} used rectangular regions as the initial objects and then classified polarimetric SAR images using the Wishart MRF. However, the classification accuracy is still limited when a rectangular region lies on the edge of an object. Zhang et al.^{24} improved this method by using mean shift to obtain finer initial regions. Wang and Zhang^{25} used the Gaussian distribution instead of the Wishart distribution to classify images. Although these MRF-based classification approaches usually obtain remarkable results, they assume that each land class obeys a certain probability distribution, e.g., the Wishart or Gaussian distribution. This assumption does not hold when detecting urban areas, as urban areas are often complex regions with various subobjects. Consequently, probabilistic inference of the MRF model in terms of common probability distributions cannot appropriately detect urban areas.

Motivated by this observation, this paper proposes an MRF-based region-growing method to extract urban areas. Our contributions are twofold. First, the proposed method introduces a new MRF-based region-growing criterion to overcome the limitation of the traditional probabilistic inference of the MRF model. The method retains the advantage of the MRF model in describing regional spatial constraints: both the spatial constraints and the characteristics of urban areas are considered in designing the region-growing criterion. Second, an automatic seed-object extraction method is proposed for the MRF-based region growing. It automatically extracts three features that describe spectral and granularity information and uses them to detect buildings and their shadows as seed points. Our method provides an unsupervised way to detect urban areas, which makes it possible to capture the correlations among various urban objects by combining the benefits of region growing and the MRF model.

The rest of this paper is organized as follows. Section 2 introduces the method for initializing seeds, and Sec. 3 presents the details of the MRF-based region-growing method. Section 4 discusses the results obtained by applying our method to remote sensing images. Finally, Sec. 5 draws conclusions.

## 2.

## Selection of Seed Points

The selection of seed points is a fundamental step for a region-growing algorithm. The main concept of the selection of seed points is grounded in the observation that the buildings are located in every corner of the city and are often adjacent to shadow areas. Hence, we extract them and their shadows as seed points in this section. In order to appropriately detect seed points, we will first explore three features ${\mathit{F}}^{1}$, ${\mathit{F}}^{2}$ and ${\mathit{F}}^{3}$. The details are given in the following sections.

## 2.1.

### Extract the Pixel-Level Spectral Value ${\mathsf{F}}^{\mathsf{1}}$

Because buildings usually show a bright appearance in an image and their shadows are dark, a spectral value ${\mathit{F}}^{1}$ is used to describe this feature. Namely, for a given image $\mathit{Y}=({\mathit{Y}}^{1},{\mathit{Y}}^{2},\dots ,{\mathit{Y}}^{P})$, each spectral channel ${\mathit{Y}}^{t}$ ($1\le t\le P$) is defined on an $M\times N$ rectangular lattice $\mathit{S}$, i.e., $\mathit{S}=\{s|s=(i,j),1\le i\le M,1\le j\le N\}$ and ${\mathit{Y}}^{t}={({y}_{s}^{t})}_{M\times N}$. Then, spectral value ${\mathit{F}}^{1}={({f}_{s}^{1})}_{M\times N}$ is defined as ${f}_{s}^{1}=\prod _{t=1}^{P}{y}_{s}^{t}$, which can describe the spectral value of each pixel $s$ on different channels.
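The definition of ${\mathit{F}}^{1}$ can be illustrated with a minimal sketch, assuming the image is held as a NumPy array of shape $(M,N,P)$ with the spectral channels on the last axis. The function name `spectral_value` and the array layout are assumptions of this sketch, not part of the original method description.

```python
import numpy as np


def spectral_value(image):
    """Return F1, the per-pixel product of all P spectral channels.

    `image` is assumed to have shape (M, N, P); the result has shape (M, N).
    """
    img = np.asarray(image, dtype=np.float64)
    # f_s^1 = prod_{t=1..P} y_s^t, taken along the channel axis.
    return np.prod(img, axis=-1)
```

Working in floating point avoids integer overflow when several 8-bit or 16-bit channels are multiplied together.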

## 2.2.

### Extract the Region-Level Spectral Variance ${\mathsf{F}}^{\mathsf{2}}$

Different urban objects have various appearances, so their spectral variance should be relatively large. Hence, we design a region-level spectral variance ${\mathit{F}}^{2}$ to capture this feature. First, the initial objects are obtained using a mean shift method,^{27} which constructs a probability density to reflect the underlying distribution of points in some feature space and to map each point to the mode of the density which is closest to it. Then, the given image $\mathit{Y}$ is divided into an over-segmented region set $\mathit{R}$, i.e., $\mathit{R}=\{{R}_{1},{R}_{2},\dots ,{R}_{k}\}$. Each ${R}_{i}$ of $\mathit{R}$ denotes an over-segmented region ($i=\mathrm{1,2},\dots ,k$), ${R}_{i}\cap {R}_{j}=\varnothing $ ($i\ne j$), and $k$ is the number of these regions. With the region set $\mathit{R}$, we can further define the neighborhood system $N=\{{N}_{i}|i=\mathrm{1,2},\dots ,k\}$ to describe the spatial context of regions. Here, each ${N}_{i}$ denotes the set of regions neighboring ${R}_{i}$. Let $M({R}_{i})$ be the mean value of pixels in ${R}_{i}$, and the local spectral variance between region ${R}_{i}$ and its adjacent regions can be calculated as follows:

$$V({R}_{i})=\frac{1}{|{N}_{i}|}\left\{{[M({R}_{i})-{\mu}_{i}]}^{2}+\sum _{j\in {N}_{i}}{[M({R}_{j})-{\mu}_{i}]}^{2}\right\}, \tag{1}$$

In Eq. (1), every region has the same impact on $V({R}_{i})$. Intuitively, it may be preferable to determine these impacts adaptively. Hence, the equation for $V({R}_{i})$ is revised as

$$V({R}_{i})=\frac{1}{|{N}_{i}|}\left\{{[M({R}_{i})-{\mu}_{i}^{*}]}^{2}+\sum _{j\in {N}_{i}}{[{M}^{*}({R}_{j},{R}_{i})-{\mu}_{i}^{*}]}^{2}\right\}. \tag{2}$$

In Eq. (2), ${M}^{*}({R}_{j},{R}_{i})$ is defined as follows:

Based on the $V({R}_{i})$, ${\mathit{F}}^{2}={({f}_{s}^{2})}_{M\times N}$ is defined as ${f}_{s}^{2}=V{[R(s)]}^{1/2}$ to reflect the spectral variance among regions. Here, $R(s)$ is the region to which pixel $s$ belongs.
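The unweighted form in Eq. (1) can be sketched on a region adjacency structure as follows. This is only an illustration under stated assumptions: ${\mu}_{i}$ is taken to be the average of $M(\cdot)$ over ${R}_{i}$ and its neighbors, which the text above does not spell out, and the adaptive weighting of Eq. (2) is not reproduced. The function name and the dictionary layout are hypothetical.

```python
def local_variance(means, neighbors):
    """Eq. (1)-style local spectral variance V(R_i) for every region.

    means:     dict mapping region id -> M(R_i), the mean pixel value
    neighbors: dict mapping region id -> set of adjacent region ids (N_i)
    """
    variance = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            # Isolated region: no spatial context (a choice made by this sketch).
            variance[i] = 0.0
            continue
        values = [means[i]] + [means[j] for j in nbrs]
        # ASSUMPTION: mu_i is the average over R_i and its neighbors.
        mu = sum(values) / len(values)
        # Sum of squared deviations, normalized by |N_i| as in Eq. (1).
        variance[i] = sum((v - mu) ** 2 for v in values) / len(nbrs)
    return variance
```

The pixel-level feature then follows by assigning $\sqrt{V[R(s)]}$ to every pixel $s$ of each region.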

## 2.3.

### Extract the Granularity Information ${\mathsf{F}}^{3}$

Urban areas have more different types of objects and more complicated appearances than nonurban areas. Therefore, in the over-segmented region set $\mathit{R}=\{{R}_{1},{R}_{2},\dots ,{R}_{k}\}$, objects of urban areas usually have smaller region sizes than objects of nonurban areas. In other words, the granularity of urban areas is finer than that of nonurban areas. Hence, we employ the region size and the spatial relationship among regions to define ${\mathit{F}}^{3}={({f}_{s}^{3})}_{M\times N}$, i.e.,

$${f}_{s}^{3}=\left(P[R(s)]-P[R(s)]\cdot \mathrm{log}\{P[R(s)]\}\right)+\frac{1}{|{N}_{R(s)}|}\sum _{j\in {N}_{R(s)}}\{P({R}_{j})-P({R}_{j})\cdot \mathrm{log}[P({R}_{j})]\}, \tag{3}$$

An example illustrating these features is shown in Fig. 1, where Figs. 1(b), 1(d), and 1(e) are the features ${\mathit{F}}^{1}$, ${\mathit{F}}^{2}$, and ${\mathit{F}}^{3}$ extracted from Fig. 1(a). From this example, one can see that the buildings in Fig. 1(b) are bright, which denotes a high ${\mathit{F}}^{1}$ value, and their shadows have low ${\mathit{F}}^{1}$ values. Similarly, the spectral variance ${\mathit{F}}^{2}$ of urban areas is larger than that of other areas, and urban areas have a small granularity ${\mathit{F}}^{3}$ value. Based on these features, we design ${E}^{1}={({e}_{s}^{1})}_{M\times N}$, ${E}^{2}={({e}_{s}^{2})}_{M\times N}$, ${E}^{3}={({e}_{s}^{3})}_{M\times N}$, and ${E}^{4}={({e}_{s}^{4})}_{M\times N}$ to describe the buildings’ spectral values, dark shadows’ spectral values, regional spectral variance, and granularity information, respectively. They are

The parameters $\gamma $, $\lambda $, and $\pi $ are key to the selection of seed points. The parameter $\gamma $ makes ${E}^{1}$ capture the spectral feature of buildings. Since buildings usually have high spectral values, they often appear in the tail of the histogram of ${\mathit{F}}^{1}$. Hence, $\gamma $ is set to a high value to capture this tail, as in Fig. 1(f). Correspondingly, ${E}^{2}$ uses $1-\gamma $ to obtain the first peak of the histogram of ${\mathit{F}}^{1}$, which describes the dark shadows with low ${\mathit{F}}^{1}$ values. For the same reason, ${E}^{3}$ and ${E}^{4}$ are set with a high $\lambda $ value and a low $\pi $ value to catch the tail of the histogram of ${\mathit{F}}^{2}$ and the first peak of the histogram of ${\mathit{F}}^{3}$, respectively. These capture the spectral variance and granularity features of buildings. An illustration of setting $\gamma $, $\lambda $, and $\pi $ is shown in Figs. 1(f)–1(h).
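One possible reading of this thresholding step is sketched below. It assumes that $\gamma $, $\lambda $, and $\pi $ act as quantile levels on the histograms of the three features; the exact definitions of ${E}^{1}$ to ${E}^{4}$ are not reproduced here, so the comparison rules, the function name `indicator_maps`, and the default parameter values are all hypothetical.

```python
import numpy as np


def indicator_maps(f1, f2, f3, gamma=0.9, lam=0.9, pi_=0.1):
    """Binary maps E1..E4 from the feature maps F1, F2, F3.

    ASSUMPTION: gamma, lam and pi_ are quantile levels in [0, 1];
    the defaults are placeholders, not values taken from the paper.
    """
    f1, f2, f3 = (np.asarray(a, dtype=np.float64) for a in (f1, f2, f3))
    e1 = f1 >= np.quantile(f1, gamma)        # bright tail of F1: buildings
    e2 = f1 <= np.quantile(f1, 1.0 - gamma)  # dark end of F1: shadows
    e3 = f2 >= np.quantile(f2, lam)          # high regional spectral variance
    e4 = f3 <= np.quantile(f3, pi_)          # fine granularity (small F3)
    return e1, e2, e3, e4
```

Quantile levels are used here only because they adapt to each image's histogram; fixed thresholds read off the histogram would fit the description equally well.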

Then, by sequentially combining ${E}^{1}$, ${E}^{2}$, ${E}^{3}$, and ${E}^{4}$, seed points can be obtained. Namely, we first use ${D}^{1}={({d}_{s}^{1})}_{M\times N}$ to get pixels belonging to buildings and adjoining the shadows, or pixels belonging to shadows and adjoining the buildings. This is defined as

At last, seed points will be selected as the set $D=\{s|{d}_{s}^{3}=1,s\in S\}$.

The parameters $r$ and $l$ are used to determine whether a local window $w(s,r)$ simultaneously contains pixels from ${E}^{1}$, ${E}^{2}$, ${E}^{3}$, and ${E}^{4}$ and whether the number of pixels of each kind is not less than $l$. Because a building is spatially adjacent to its shadow, they can be effectively detected together using a relatively small patch of the given image. Hence, by setting $r$ to 2 for ${D}^{1}$, ${D}^{2}$, and ${D}^{3}$, we use the local window $w(s,r=2)$ as the small patch to select seed points. At the same time, if there are buildings and their shadows in the small patch, there will be at least one pixel labeled 1 in the patch for each ${e}_{s}^{i}$, $i=1$, 2, 3, 4. Therefore, $l$ is set to 1. This means that only a pixel that simultaneously possesses or neighbors ${E}^{1}$, ${E}^{2}$, ${E}^{3}$, and ${E}^{4}$ within the small local window $w(s,2)$ can be chosen as a seed point. An example is shown in Fig. 1(i). Note that one pixel covers different areas of the Earth’s surface in remote sensing images with different spatial resolutions, which may affect the setting of $r$. Namely, $r$ can be set to 1 for low-spatial-resolution remote sensing images and larger than 2 for extremely high-spatial-resolution remote sensing images.
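The window test described above can be sketched as follows. The sketch assumes that $w(s,r)$ is a square window of side $2r+1$ centered on $s$, and it collapses the sequential construction of ${D}^{1}$, ${D}^{2}$, and ${D}^{3}$ into a single check that every indicator map has at least $l$ pixels inside the window. Both simplifications, as well as the function names, are assumptions of this illustration.

```python
import numpy as np


def window_count(mask, r):
    """Number of True pixels of `mask` in the (2r+1) x (2r+1) window of each pixel."""
    m = np.asarray(mask, dtype=np.int64)
    padded = np.pad(m, r, mode="constant")  # zeros outside the image
    rows, cols = m.shape
    counts = np.zeros_like(m)
    # Sum the shifted copies of the mask that fall inside the window.
    for di in range(2 * r + 1):
        for dj in range(2 * r + 1):
            counts += padded[di:di + rows, dj:dj + cols]
    return counts


def select_seeds(e_maps, r=2, l=1):
    """Mark pixels whose window holds at least `l` pixels of every map in `e_maps`."""
    seeds = np.ones(np.asarray(e_maps[0]).shape, dtype=bool)
    for e in e_maps:
        seeds &= window_count(e, r) >= l
    return seeds
```

With `r=2` and `l=1`, a pixel is kept only if a building pixel, a shadow pixel, a high-variance pixel, and a fine-granularity pixel all occur within two pixels of it.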

## 3.

## MRF-Based Region Growing

Based on the extracted seed points, an MRF-based region-growing criterion is proposed in this section. First, the MRF model is briefly reviewed. Then, the proposed criterion for urban detection is introduced.

## 3.1.

### MRF Model

Let $X=\{{X}_{{R}_{i}}|{R}_{i}\in \mathit{R}\}$ be the label random field defined on the over-segmented region set $R$. We use 1 to flag urban areas and 0 to flag nonurban areas, so each random variable ${X}_{{R}_{i}}$ takes a value of 1 or 0 to represent the label of region ${R}_{i}$. If $\mathit{x}=\{{x}_{{R}_{i}}|{R}_{i}\in \mathit{R}\}$ denotes a realization of $\mathit{X}$, the optimal realization $\widehat{\mathit{x}}$ can be obtained by maximizing the posterior probability, i.e.,

$$\widehat{\mathit{x}}=\underset{\mathit{x}}{\mathrm{argmax}}\,P(\mathit{X}|\mathit{Y})=\underset{\mathit{x}}{\mathrm{argmax}}\,P(\mathit{Y}|\mathit{X})\cdot P(\mathit{X}). \tag{4}$$

The energy form of Eq. (4) is

$$\widehat{\mathit{x}}=\underset{\mathit{x}}{\mathrm{argmin}}\{-\mathrm{log}[P(\mathit{Y}|\mathit{X})]-\mathrm{log}[P(\mathit{X})]\}. \tag{5}$$

In Eq. (5), the likelihood function $P(\mathit{Y}|\mathit{X})$ is used to describe image features. In this paper, we assume that all ${Y}_{{R}_{i}}$ of $\mathit{Y}$ are independent given the labels. That is,

$$P(\mathit{Y}|\mathit{X})=\prod _{{R}_{i}\in R}P({Y}_{{R}_{i}}|{X}_{{R}_{i}}).$$

The distribution of the random field $P(\mathit{X})$ is assumed to satisfy the Markov property, i.e.,

Therefore, Eq. (5) can be rewritten as

$$\widehat{\mathit{x}}=\underset{\mathit{x}}{\mathrm{argmin}}\left\{\sum _{{R}_{i}\in R}[-\mathrm{log}\,P({Y}_{{R}_{i}}|{X}_{{R}_{i}})-\mathrm{log}\,P({X}_{{R}_{i}})]\right\}. \tag{6}$$

Because of the complexity caused by interactions among labels, it is difficult to find the global solution of the MRF model. Hence, a locally optimal solution $\widehat{\mathit{x}}=({\widehat{x}}_{{R}_{i}})$ is obtained as follows:

$${\widehat{x}}_{{R}_{i}}=\underset{{x}_{{R}_{i}}}{\mathrm{argmin}}[-\mathrm{log}\,P({Y}_{{R}_{i}}|{X}_{{R}_{i}})-\mathrm{log}\,P({X}_{{R}_{i}})]=\underset{{x}_{{R}_{i}}}{\mathrm{argmin}}[{E}_{f}({R}_{i})+{E}_{l}({R}_{i})], \tag{7}$$

## 3.2.

### MRF-Based Region Growing

In this section, an MRF-based region-growing criterion is introduced to find the optimal realization $\widehat{\mathit{x}}$. To minimize the total energy of the MRF model, the proposed method iteratively merges adjacent regions whose merging decreases the total energy. Namely, for neighboring regions ${R}_{i}$ and ${R}_{t}$, the total changed energy $E({R}_{i},{R}_{t})$ that would result from merging these two regions is first calculated. Based on Eq. (7), $E({R}_{i},{R}_{t})$ equals the sum of the changed likelihood energy ${E}_{f}({R}_{i},{R}_{t})$ and the changed label energy ${E}_{l}({R}_{i},{R}_{t})$, i.e.,

$$E({R}_{i},{R}_{t})={E}_{f}({R}_{i},{R}_{t})+{E}_{l}({R}_{i},{R}_{t}). \tag{8}$$

Here,

$${E}_{f}({R}_{i},{R}_{t})={E}_{f}({R}_{i}\cup {R}_{t})-{E}_{f}({R}_{i})-{E}_{f}({R}_{t})=|{R}_{i}|\cdot {[M({R}_{i})-M({R}_{i}\cup {R}_{t})]}^{2}+|{R}_{t}|\cdot {[M({R}_{t})-M({R}_{i}\cup {R}_{t})]}^{2}, \tag{9}$$

## Algorithm 1

Input: the observed image.

Output: urban detection result.

1) Set a threshold $T$.

2) If there exists a region ${R}_{i}$ satisfying $|{R}_{i}|<T$ and ${x}_{{R}_{i}}=0$, select ${R}_{i}$ and go to step 3; else, stop.

3) For ${R}_{i}$ and each neighbor region ${R}_{t}$, based on Eqs. (8)–(10), calculate the total changed energy $E({R}_{i},{R}_{t})$.

4) Find the region ${R}_{i}^{*}$ that has the minimum energy value, i.e., ${R}_{i}^{*}=\underset{{R}_{t},\,t\in {N}_{i}}{\mathrm{argmin}}\,E({R}_{i},{R}_{t})$.

5) Merge ${R}_{i}$ and ${R}_{i}^{*}$ as a new region labeled ${x}_{{R}_{i}^{*}}$, then go to step 2.

The proposed criterion differs from traditional region-growing methods, as it begins not from seed points but from nonseed points. We consider only the nonurban regions labeled 0 whose sizes are less than the threshold. For each selected region ${R}_{i}$, the energy value is calculated between ${R}_{i}$ and each of its neighbor regions. Then, ${R}_{i}$ is merged with the neighbor region that has the minimum energy value. Hence, merging ${R}_{i}$ with an urban region leads to a larger urban region; in contrast, merging ${R}_{i}$ with a nonurban region results in a new nonurban region. Therefore, our approach is a competition rule of region growing for both urban and nonurban regions.

Urban areas can be extracted using the region-growing criterion. Namely, urban areas are first initialized using the label field $x=\{{x}_{{R}_{i}}|{R}_{i}\in R\}$ based on the seed points $D$, i.e., set ${x}_{{R}_{i}}=1$ if ${R}_{i}\cap D\ne \varnothing $; otherwise, set ${x}_{{R}_{i}}=0$. Then, as the threshold is increased, the growing criterion gradually updates the urban areas. Note that different sun angles may affect the shadow length and direction, but they do not change the spatial topological relationship between buildings and their shadows. Hence, the proposed method is robust in detecting varying urban areas contained in different remote sensing images.
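The merging loop of Algorithm 1 can be sketched on a region adjacency structure as shown below. This is an illustration under stated assumptions rather than the authors' implementation: region means are scalars, $M({R}_{i}\cup {R}_{t})$ is the size-weighted mean, the changed likelihood energy follows Eq. (9), and the changed label energy is passed in as a hypothetical callable `label_energy` that defaults to zero, because its definition is not reproduced here. All function and variable names are illustrative.

```python
def merge_cost(size_i, mean_i, size_t, mean_t):
    """Changed likelihood energy E_f(R_i, R_t) of Eq. (9) for scalar means."""
    merged_mean = (size_i * mean_i + size_t * mean_t) / (size_i + size_t)
    return (size_i * (mean_i - merged_mean) ** 2
            + size_t * (mean_t - merged_mean) ** 2)


def grow(sizes, means, labels, neighbors, threshold, label_energy=None):
    """Run steps 2-5 of Algorithm 1 at one fixed threshold T.

    sizes, means, labels: dicts keyed by region id (labels are 0 or 1).
    neighbors: dict mapping region id -> set of adjacent region ids.
    label_energy: HYPOTHETICAL callable (i, t, labels, neighbors) -> float
        standing in for the changed label energy; zero if omitted.
    Returns updated copies of (sizes, means, labels, neighbors).
    """
    sizes, means, labels = dict(sizes), dict(means), dict(labels)
    neighbors = {k: set(v) for k, v in neighbors.items()}
    if label_energy is None:
        label_energy = lambda i, t, lab, nbr: 0.0

    while True:
        # Step 2: small regions still labeled nonurban (isolated ones skipped).
        small = [i for i in sizes
                 if sizes[i] < threshold and labels[i] == 0 and neighbors[i]]
        if not small:
            break
        i = min(small)  # deterministic pick; the order is a choice of this sketch
        # Steps 3-4: neighbor with the minimum total changed energy.
        best = min(sorted(neighbors[i]),
                   key=lambda t: merge_cost(sizes[i], means[i], sizes[t], means[t])
                   + label_energy(i, t, labels, neighbors))
        # Step 5: merge R_i into the chosen neighbor, which keeps its own label.
        total = sizes[i] + sizes[best]
        means[best] = (sizes[i] * means[i] + sizes[best] * means[best]) / total
        sizes[best] = total
        for n in neighbors.pop(i):
            neighbors[n].discard(i)
            if n != best:
                neighbors[n].add(best)
                neighbors[best].add(n)
        del sizes[i], means[i], labels[i]
    return sizes, means, labels, neighbors


# Toy example: a small dark-labeled region (id 1) sits between an urban seed
# region (id 0) and a large, spectrally different nonurban region (id 2).
sizes = {0: 100, 1: 10, 2: 500}
means = {0: 200.0, 1: 190.0, 2: 50.0}
labels = {0: 1, 1: 0, 2: 0}
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
for T in (25, 50):  # gradually increased thresholds
    sizes, means, labels, neighbors = grow(sizes, means, labels, neighbors, T)
```

In this toy case the small region is absorbed by the spectrally closer urban region, so the urban area grows while the large nonurban region is left untouched.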

## 3.3.

### Parameter Setting

There are two parameters in the MRF-based region-growing criterion, i.e., $\beta $ and $T$. The potential parameter $\beta $ balances the influence of ${E}_{f}({R}_{i},{R}_{t})$ and ${E}_{l}({R}_{i},{R}_{t})$. A high $\beta $ value emphasizes ${E}_{l}({R}_{i},{R}_{t})$ and leads to results with large homogeneous objects. Conversely, a low $\beta $ value emphasizes ${E}_{f}({R}_{i},{R}_{t})$ and is suitable for results with many details. Hence, different $\beta $ values should be selected for different applications. However, as the relationship between urban and nonurban areas is quite stable, $\beta $ is fixed and empirically set to 0.05 to simplify the parameter setting.

The threshold $T$ is used to control the process of region growing. By gradually increasing $T$, small regions labeled nonurban are merged into larger urban regions or nonurban regions, then urban areas are extracted. In practice, we used $T=25$ as the initial threshold and doubled the threshold each time. The final termination threshold was determined by the change of the spectral variance. The assumptions supporting this threshold selection are that urban areas consist of various subobjects and their spectral variance should be large; if the nonurban areas are wrongly recognized as urban areas, an abrupt change of the spectral variance should be observed. Here, we use $\mathrm{CR}(i,i+1)$ to show the change rate of spectral variances, i.e.,

where $\mathrm{Std}\_T(i)$ denotes the standard deviation of the detected urban areas with termination threshold $T=T(i)$. Then, we can take the inflection point of $\mathrm{CR}(i,i+1)$, after which $\mathrm{CR}(i,i+1)$ abruptly decreases, as the final termination threshold. An example is shown in Fig. 2, where we use $T=[25,50,100,200,400,800,1600]$ as the candidate termination thresholds. Some extracted urban areas are illustrated in Figs. 2(a)–2(g). $\mathrm{Std}\_T(i)$ for the different values of $T$ is given in Fig. 2(h), and the corresponding values of $\mathrm{CR}(i,i+1)$ are shown in Fig. 2(i). As $\mathrm{CR}(\mathrm{200,400})$ is an inflection point, we take $T=400$ as the final termination threshold for this experiment, and Fig. 2(e) shows the corresponding detection result.

## 4.

## Experiments

The MRF-based region-growing method provides an unsupervised way to monitor urban areas. To fully evaluate the performance of the proposed method, experiments and comparisons were carried out on two groups of images, i.e., aerial images (Sec. 4.1) and SPOT5 images (Sec. 4.2).

## 4.1.

### Experiments of Aerial Images

In this experiment, three aerial images, as shown in Fig. 3, are used to test our method and other urban extraction methods. These aerial images were acquired in 2009 over Taizhou City, China. The three images have the same size of $500\times 500$, and the spatial resolution is 0.4 m. The test images contain flat agricultural fields and small villages, where urban objects show various spectral appearances and some nonurban objects are similar to seed points in terms of spectral characteristics. This makes urban detection challenging. The following competitive methods are also considered for comparison:

1. The traditional region-growing method:^{28} it detects urban areas without employing the MRF model.

2. The classical MRF model:^{29} it uses a generative probabilistic model at the pixel level to obtain results.

3. The object-based MRF (OMRF) model:^{25} it extends the MRF model from the pixel level to the object level to capture the macrotexture pattern of a given image; it uses initial over-segmented regions to build the region adjacency graph (RAG) and defines the MRF model on the RAG to realize the segmentation.

4. The two-class support vector machine (SVM):^{30} it is provided by ENVI software and is a commonly used classification approach with training data.

5. The object-based SVM:^{22} it extracts the regional features from a hierarchical tree of the scene and obtains a classification using the SVM classifier.

For the sake of fairness, we chose the same seed points to train the urban class for the traditional region-growing method and the two SVM methods, and deliberately selected samples to train the nonurban class for the SVM methods as well. We also tuned the parameters of these methods to obtain their best performance. For the traditional region-growing method, we chose the threshold parameter following the instructions in the literature.^{28} For the two-class SVM, we used the radial basis function as the kernel type, set the gamma in the kernel function to 0.33, and set the penalty parameter to 100. For the object-based SVM, we used 0.1% as the ratio of training samples based on the literature.^{22} Therefore, the comparison can demonstrate the difference between our model and other state-of-the-art methods.

Experimental results for the aerial images are shown in Fig. 3. Here, each panel label of Fig. 3 consists of two parts: the letter denotes the test image and the number denotes the detection method. Detected urban objects are shown as yellow masks over the test images. From the comparative test, one can see that the proposed method exhibits a remarkable improvement for urban detection. The traditional region-growing method, as shown in Figs. 3(a2)–3(c2), still misclassifies many objects that belong to different categories but have similar spectral appearances. The main reason is that the traditional region-growing method uses only spectral features and does not consider spatial constraints. By employing spatial context information, the classical MRF model misclassifies fewer nonurban areas. However, this pixel-level generative model can recognize only the parts of urban areas with similar appearances, since it cannot model complex macropatterns by incorporating long-range interactions. It also wrongly labels some urban objects as nonurban, such as the roofs of buildings and vegetation. The OMRF model uses regions to describe macrospatial constraints and improves on the classical MRF model, e.g., Figs. 3(a4) and 3(c4). However, the OMRF model leaves the characteristics of urban areas out of consideration, which may lead to undesirable results such as Fig. 3(b4). The SVM method learns urban areas from training data. Although it can effectively recognize buildings, urban vegetation objects are sometimes classified as nonurban areas because of the lack of spatial information. The object-based SVM improves on the pixel-based SVM and gets more consistent results by considering object semantic information with regional features. Nevertheless, it still cannot make sufficient use of spatial information, so its results contain some misclassifications. Compared with these methods, our MRF-based region-growing method first considers the urban characteristics when selecting seed points, then employs the MRF defined at the region level to capture regional spatial constraints, and finally applies a corresponding region-growing criterion that uses these features to detect urban areas. Hence, our method demonstrates better performance than the other methods.

Experimental results are quantitatively evaluated by the overall accuracy (OA) and the kappa coefficient $\kappa $, two indicators that measure the degree of similarity between two images.^{31} Let ${P}_{ij}$ be the proportion of pixels assigned to the $i$’th class by the first image and to the $j$’th class by the second image, and denote ${P}_{i\bullet}=\sum _{j=1}^{k}{P}_{ij}$ and ${P}_{\bullet j}=\sum _{i=1}^{k}{P}_{ij}$; then

$$\mathrm{OA}=\sum _{i=1}^{k}{P}_{ii},\qquad \kappa =\frac{\sum _{i=1}^{k}{P}_{ii}-\sum _{i=1}^{k}{P}_{i\bullet}{P}_{\bullet i}}{1-\sum _{i=1}^{k}{P}_{i\bullet}{P}_{\bullet i}}.$$

The OA and $\kappa $ of aerial images are given in Table 1.
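For readers who wish to reproduce such indices, the two measures can be computed from a detection result and a ground-truth map as in the following sketch. It uses the standard definitions: OA is the proportion of pixels on which the two label maps agree, and $\kappa $ corrects this agreement for the agreement expected by chance from the marginal class proportions. The function name and the flattened-sequence input are assumptions of this illustration.

```python
from collections import Counter


def oa_and_kappa(labels_a, labels_b):
    """Overall accuracy and Cohen's kappa between two label sequences.

    Both inputs are flattened label maps of equal, nonzero length.
    """
    a, b = list(labels_a), list(labels_b)
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    classes = set(a) | set(b)
    joint = Counter(zip(a, b))       # counts behind P_ij
    row, col = Counter(a), Counter(b)  # counts behind the two marginals
    oa = sum(joint[(c, c)] for c in classes) / n
    chance = sum(row[c] * col[c] for c in classes) / (n * n)
    if chance == 1.0:
        # Both maps contain a single identical class: kappa is undefined.
        return oa, float("nan")
    return oa, (oa - chance) / (1.0 - chance)
```

For example, `oa_and_kappa([1, 1, 0, 0], [1, 0, 0, 0])` agrees on three of four pixels, with a chance agreement of one half.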

## Table 1

Comparison of results.

| Method | Fig. 3(a) κ | Fig. 3(a) OA | Fig. 3(b) κ | Fig. 3(b) OA | Fig. 3(c) κ | Fig. 3(c) OA |
|---|---|---|---|---|---|---|
| Traditional region growing | 0.379 | 0.591 | 0.398 | 0.604 | 0.489 | 0.648 |
| Classical MRF | 0.778 | 0.913 | 0.460 | 0.684 | 0.615 | 0.758 |
| OMRF | 0.883 | 0.953 | 0.663 | 0.803 | 0.770 | 0.863 |
| Two-class SVM | 0.806 | 0.923 | 0.617 | 0.770 | 0.683 | 0.796 |
| Object-based SVM | 0.911 | 0.966 | 0.740 | 0.850 | 0.832 | 0.905 |
| MRF-based region growing | **0.914** | **0.967** | **0.902** | **0.952** | **0.886** | **0.938** |

Note: For each column, the bold value denotes the best index among all the indexes in this column.

From these quantitative indexes, we know that MRF-based region growing can enhance both the OA and kappa for each experimental image. This also shows that our method extracts a better scope of urban areas than do the other methods. In particular, when the topographic features are complex, the enhancement of indices is obvious. For clarity, the quantitative indices of Table 1 are illustrated in Fig. 4.

## 4.2.

### Experiments of SPOT5 Images

The effectiveness of the proposed method is further tested in this section. Two SPOT5 remote sensing images, as shown in Fig. 5, are employed for this experiment. These test images cover the Pingshuo area of China, and both are of size $438\times 438$. They mainly consist of three object types, i.e., urban areas, cultivated land, and woodland. Among them, urban green space resembles woodland, and urban buildings resemble cultivated land, in spectral appearance. This increases the difficulty of urban detection.

Experimental results for the SPOT5 images are illustrated in Fig. 5. The MRF-based region-growing method performs well, and its results are close to the ground truth. This demonstrates that our model can effectively extract urban areas from different datasets.

## 5.

## Conclusions

To summarize, we proposed an unsupervised urban detection method that unifies the region-growing method and the MRF model. It first uses granularity information and spectral features to automatically extract some typical urban objects as seed points, which can be treated as the skeleton of the urban areas. Then, the MRF is employed to model the spatial relationships between urban seed points and other urban objects. Finally, the region-growing criterion uses these relationships to recognize urban nonseed objects, which leads to consistent results. The main novelty of the method is the automatic extraction of urban seed points and the detection of urban areas using a region-growing criterion under regional MRF-based spatial constraints. The effectiveness of the proposed method is validated by experimental results obtained from various high-spatial-resolution remote sensing images. Compared with a traditional region-growing method, the classical and object-based MRF models, and the common and object-based SVM, our method provides more precise and more meaningful results, which verifies that it is suitable for detecting urban areas. However, this method is suitable only for urban detection. If it is used to extract other terrestrial objects, one has to design a new seed extraction method and modify the region-growing criterion.

For the method presented, the potential parameter $\beta $ needs to be set empirically. Estimating this parameter adaptively would improve the current method.

## Acknowledgments

The authors are very grateful to the editor and the anonymous referees for comments and suggestions, which led to the present improved version of the manuscript. This work is supported jointly by the National Natural Science Foundation of China, under Grants 41301470, 41001286, 41101425, and 41001251, and the basic research funds for the provincial universities. The authors would like to thank Associate Prof. Tiancan Mei, Wuhan University, China, for kindly providing aerial images.

## References

J. A. Benediktsson, M. Pesaresi and K. Arnason, “Classification and feature extraction for remote sensing images from urban areas based on morphological transformations,” IEEE Trans. Geosci. Remote Sens. 41(9), 1940–1949 (2003). http://dx.doi.org/10.1109/TGRS.2003.814625

D. Lu et al., “Detection of urban expansion in an urban-rural landscape with multitemporal QuickBird images,” J. Appl. Remote Sens. 4(1), 041880 (2010). http://dx.doi.org/10.1117/1.3501124

P. Gamba, M. Aldrighi and M. Stasolla, “Robust extraction of urban area extents in HR and VHR SAR images,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 4(1), 27–34 (2011). http://dx.doi.org/10.1109/JSTARS.2010.2052023

X. Huang, L. Zhang and P. Li, “Classification and extraction of spatial features in urban areas using high-resolution multispectral imagery,” IEEE Geosci. Remote Sens. Lett. 4(2), 260–264 (2007). http://dx.doi.org/10.1109/LGRS.2006.890540

C. Corbane et al., “Comparative study on the performance of multiparameter SAR data for operational urban areas extraction using textural features,” IEEE Geosci. Remote Sens. Lett. 6(4), 728–732 (2009). http://dx.doi.org/10.1109/LGRS.2009.2024225

P. Gamba, B. Houshmand and M. Saccani, “Detection and extraction of buildings from interferometric SAR data,” IEEE Trans. Geosci. Remote Sens. 38(1), 611–617 (2000). http://dx.doi.org/10.1109/36.823956

B. Sirmacek and C. Ünsalan, “Urban-area and building detection using SIFT keypoints and graph theory,” IEEE Trans. Geosci. Remote Sens. 47(4), 1156–1167 (2009). http://dx.doi.org/10.1109/TGRS.2008.2008440

C. Chen and L. Chang, “Rapid change detection of land use in urban regions with the aid of pseudo-variant features,” J. Appl. Remote Sens. 6(1), 063574 (2012). http://dx.doi.org/10.1117/1.JRS.6.063574

P. C. Smits and A. Annoni, “Updating land-cover maps by using texture information from very high-resolution space-borne imagery,” IEEE Trans. Geosci. Remote Sens. 37(3), 1244–1254 (1999). http://dx.doi.org/10.1109/36.763282

S. Yu, M. Berthod and G. Giraudon, “Toward robust analysis of satellite images using map information-application to urban area detection,” IEEE Trans. Geosci. Remote Sens. 37(4), 1925–1939 (1999). http://dx.doi.org/10.1109/36.774705

G. Rellier et al., “Texture feature analysis using a Gauss-Markov model in hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 42(7), 1543–1551 (2004). http://dx.doi.org/10.1109/TGRS.2004.830170

L. Weizman and J. Goldberger, “Urban-area segmentation using visual words,” IEEE Geosci. Remote Sens. Lett. 6(3), 388–392 (2009). http://dx.doi.org/10.1109/LGRS.2009.2014400

B. Sirmacek and C. Ünsalan, “Urban area detection using local feature points and spatial voting,” IEEE Geosci. Remote Sens. Lett. 7(1), 146–150 (2010). http://dx.doi.org/10.1109/LGRS.2009.2028744

M. Kajimoto and J. Susaki, “Urban-area extraction from polarimetric SAR images using polarization orientation angle,” IEEE Geosci. Remote Sens. Lett. 10(2), 337–341 (2013). http://dx.doi.org/10.1109/LGRS.2012.2207085

Y. Liu et al., “Urban area extraction from polarimetric SAR imagery using only positive samples,” in ICSP Proc., pp. 2332–2335, IEEE (2010). http://dx.doi.org/10.1109/ICOSP.2010.5655181

A. Thiele et al., “Building recognition from multi-aspect high-resolution InSAR data in urban areas,” IEEE Trans. Geosci. Remote Sens. 45(11), 3583–3593 (2007). http://dx.doi.org/10.1109/TGRS.2007.898440

S. Hinz and A. Baumgartner, “Automatic extraction of urban road networks from multi-view aerial imagery,” ISPRS J. Photogramm. Remote Sens. 58(1–2), 83–98 (2003). http://dx.doi.org/10.1016/S0924-2716(03)00019-4

Y. He, H. Wang and B. Zhang, “Color-based road detection in urban traffic scenes,” IEEE Trans. Intell. Transp. Syst. 5(4), 309–318 (2004). http://dx.doi.org/10.1109/TITS.2004.838221

J. Hu et al., “Road network extraction and intersection detection from aerial images by tracking road footprints,” IEEE Trans. Geosci. Remote Sens. 45(12), 4144–4157 (2007). http://dx.doi.org/10.1109/TGRS.2007.906107

J. Tigges, T. Lakes and P. Hostert, “Urban vegetation classification: benefits of multitemporal RapidEye satellite data,” Remote Sens. Environ. 136(9), 66–75 (2013). http://dx.doi.org/10.1016/j.rse.2013.05.001

I. Sebari and D. He, “Automatic fuzzy object-based analysis of VHSR images for urban objects extraction,” ISPRS J. Photogramm. Remote Sens. 79(5), 171–184 (2013). http://dx.doi.org/10.1016/j.isprsjprs.2013.02.006

L. Wang et al., “Adaptive regional feature extraction for very high spatial resolution image classification,” J. Appl. Remote Sens. 6(1), 061708 (2012). http://dx.doi.org/10.1117/1.JRS.6.061708

Y. Wu et al., “Region-based classification of polarimetric SAR images using Wishart MRF,” IEEE Geosci. Remote Sens. Lett. 5(4), 668–672 (2008). http://dx.doi.org/10.1109/LGRS.2008.2002024

B. Zhang et al., “Region-based classification by combining MS segmentation and MRF for POLSAR images,” J. Syst. Eng. Electron. 24(3), 400–409 (2013).

X. Wang and X. Zhang, “A new localized superpixel Markov field for image segmentation,” in Proc. IEEE Conf. Multimedia and Expo, pp. 642–645, IEEE (2009). http://dx.doi.org/10.1109/ICME.2009.5202578

S. Z. Li, Markov Random Field Modeling in Computer Vision, 3rd ed., Springer-Verlag, New York (2009).

D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002). http://dx.doi.org/10.1109/34.1000236

R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing Using MATLAB, Pearson Prentice Hall, Upper Saddle River, New Jersey (2003).

J. Besag, “On the statistical analysis of dirty pictures,” J. R. Stat. Soc. B 48(3), 259–302 (1986).

C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn. 20(3), 273–297 (1995).

R. Unnikrishnan and M. Hebert, “Measure of similarity,” in Seventh IEEE Workshop on Application of Computer Vision, pp. 394–394 (2005). http://dx.doi.org/10.1109/ACVMOT.2005.71

## Biography

**Chen Zheng** is currently an assistant professor at the School of Mathematics and Information Sciences, Henan University. He received his BS degree in mathematics (information sciences) from Henan University in 2007 and his MS and PhD degrees in statistics and image processing of remote sensing from Wuhan University, in 2009 and 2012, respectively. His current research interests include various topics in remote sensing and image processing.

**Leiguang Wang** received his PhD degree from Wuhan University in 2009, where he studied at the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS). Since 2012, he has been an associate professor with Southwest Forestry University, Kunming, China. He is the author of more than 10 articles. His research interests include remote sensing image segmentation and pattern recognition.