Sparsity-guided saliency detection for remote sensing images

Abstract. Traditional saliency detection can effectively detect possible objects using an attentional mechanism instead of automatic object detection, and thus is widely used in natural scene detection. However, it may fail to extract salient objects accurately from remote sensing images, which have their own characteristics such as large data volumes, multiple resolutions, illumination variation, and complex texture structure. We propose a sparsity-guided saliency detection model for remote sensing images that uses a sparse representation to obtain the high-level global and background cues for saliency map integration. Specifically, it first uses pixel-level global cues and background prior information to construct two dictionaries that are used to characterize the global and background properties of remote sensing images. It then employs a sparse representation for the high-level cues. Finally, a Bayesian formula is applied to integrate the saliency maps generated by both types of high-level cues. Experimental results on remote sensing image datasets that include various objects under complex conditions demonstrate the effectiveness and feasibility of the proposed method.


Introduction
Object detection in remote sensing images is of vital importance and has great potential in many fields, such as navigation reconnaissance, autonomous navigation, scene understanding, geological surveys, and precision-guided systems. Remote sensing images are captured as aerial views by sensors on airplanes or other aircraft under various illumination and viewing-angle conditions. In contrast to natural scene images taken from the ground, remote sensing images have more complex backgrounds (e.g., forests, lakes, sand, roads, and lawns) that sometimes share characteristics with the objects of interest. In addition, remote sensing images with downward-looking or forward-downward views are more susceptible to noise, illumination fluctuation, fog, cloud cover, and blur caused by flight vibration. Therefore, extracting objects precisely and quickly from complex backgrounds is difficult and time-consuming in practical applications. […] 9 There are two main types of models for saliency detection: data-driven bottom-up models 10–24 and task-driven top-down models. 25 Studies of bottom-up models have shown that low-level cues (e.g., frequency 26,27 and contrast 10,11,13–20,22,28–31) are useful for saliency detection. Itti et al. 10 exploited center–surround contrast at multiple scales with multiple features to detect salient regions in an image.
Bruce and Tsotsos 11 used local Shannon self-information to generate the saliency map. Color contrast (e.g., RGB or LAB) 10,13–20,22,28–32 has been used to form low-level cues, and many studies 7,15–18,20,22,28,31 have shown that the LAB color space better matches human visual perception. […] 22 exploiting foreground and background priors have proven efficient. In particular, extracting background information 16–18 provides a background template and enables unsupervised saliency detection. Nevertheless, models that employ only low-level cues fail to generate object-level saliency maps. To discover more effective cues for detecting salient regions, high-level saliency cues have been investigated. Shen and Wu 19 designed a unified model based on low-rank matrix recovery to obtain the saliency map. Margolin et al. 20 computed saliency from the reconstruction error of principal component analysis to measure the distinctness of a region. Xie et al. 21 proposed a Bayesian model that uses low- and mid-level cues to produce a saliency map. Borji and Itti 22 detected salient regions by computing local and global patch rarities after reconstructing the image with a sparse representation. Li et al. 18 obtained effective saliency maps from dense and sparse reconstruction errors. Compared with low-level cues, these high-level cues yield better saliency detection performance. Some researchers combine existing saliency models. Sun et al. 1 fused the saliency maps of edge- and graph-based visual saliency models to detect salient regions in remote sensing images. Zhang and Yang 6 proposed a method based on frequency-domain analysis and salient region detection to extract salient regions. However, fusing two saliency maps generated by different models can easily degrade saliency detection in remote sensing images because of their complex and abundant content. It is therefore important to seek new cues that effectively predict the salient regions where candidate objects are likely to exist in remote sensing images.
Because objects in remote sensing images differ from complex backgrounds in the visible spectrum, we seek persuasive cues to extract salient regions from complex backgrounds. In this paper, we propose a sparsity-guided saliency model (SGSM) that combines global cues with background priors for saliency detection in remote sensing images. The model takes a sparse representation approach, measuring the relationship between image patches and a dictionary to generate an objective saliency map. Specifically, it uses sparse representation to produce high-level cues via global-based and background-based dictionaries. These two dictionaries are built from low-level cues derived from the global cues and the background prior, respectively, and they contain category information (i.e., object or background). Hence, the high-level cues can reveal the intrinsic similarity of image patches and determine their categories. Using the patch category information, the saliency map is obtained by a clustering algorithm. Because there are no benchmark datasets for saliency detection in remote sensing images, we constructed two datasets to validate the effectiveness of the proposed model. The images contain various objects (e.g., houses and vehicles) collected from Google Earth under varying conditions. The single-object dataset (SOD) contains 500 images of a single object, and the multiple-object dataset (MOD) contains 1000 images of multiple objects.
The remainder of this paper is organized as follows. Sec. 2 first presents the theory and motivation of the proposed model and then describes its implementation. Sec. 3 presents the experimental results and analysis. Finally, Sec. 4 concludes the paper.

Sparsity-Guided Saliency Model
This section presents the theoretical basis of SGSM in detail.
First, we provide the general theory needed to understand the proposed model. SGSM combines global cues and background prior information to provide global and background information, respectively. With the global cues, regions that contain candidate objects are less likely to be misdetected, especially when these regions resemble the background. In addition, the background prior makes regions that differ from the background stand out. The low-level cues based on the global cues and on the background prior are clustered into global-based and background-based dictionaries, respectively. These two dictionaries separately contain the category information (i.e., object or background) of the global and background cues. Based on these two low-level dictionaries, high-level cues are generated using a sparse representation. Finally, these high-level cues are clustered to obtain a saliency map. The overall procedure is presented in Fig. 1.

Low-level Feature Description via Global Cues and Background Prior
To determine the visual uniqueness of image regions, we decompose the image into nonoverlapping patches of uniform size. Because the LAB color space 7,15–18,20,22,28,31 corresponds more closely to human vision, we use it for the low-level representation. Global information comes from global cues, and background information stems from the background prior. Following the background prior assumption 17,18 that salient objects usually appear near the center of the image while the boundaries are mostly background, we use boundary-based cues to extract background information.
Given a color image I of size $T = W \times H$ (W and H are the image width and height, respectively), we first divide it into nonoverlapping patches of size $T_p = P \times Q$, so that the whole image contains $t = T/T_p$ patches. The $n = 2(W/P + H/Q) - 4$ patches along the four boundaries form the background set. For the i'th ($1 \le i \le t$) patch containing $T_p$ pixels, the values of all its pixels in the three LAB channels form the rows of the matrix $G_{lab}(i)$, and the pixel values of the j'th ($1 \le j \le n$) patch in the background set form the rows of the matrix $B_{lab}(j)$.
$$G_{lab}(i) \in \mathbb{R}^{T_p \times 3}, \quad 1 \le i \le t, \tag{1}$$

$$B_{lab}(j) \in \mathbb{R}^{T_p \times 3}, \quad 1 \le j \le n, \tag{2}$$

where each row holds the L, a, and b values of one pixel of the patch. Furthermore, all t patches form the global information set $G_{lab}$, and all n patches at the four boundaries of the image form the background information set $B_{lab}$:

$$G_{lab} = \left\{ G_{lab}(1), G_{lab}(2), \ldots, G_{lab}(t) \right\}, \tag{3}$$

$$B_{lab} = \left\{ B_{lab}(1), B_{lab}(2), \ldots, B_{lab}(n) \right\}. \tag{4}$$

The two matrices $G_{lab}$ and $B_{lab}$ are clustered into the global-based dictionary $D_{\text{Global}}$ and the background-based dictionary $D_{\text{Background}}$, respectively, using K-means with $K_D$ clusters. These two dictionaries contain the global and background information, respectively. The details of this procedure are illustrated in part I of Fig. 1.
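The patch decomposition and dictionary construction described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes the image is already in LAB, that W and H are multiples of P and Q, and it uses plain Lloyd's K-means with random initialization because the initialization and iteration count are not specified; the function names `split_into_patches` and `kmeans_dictionary` are hypothetical. For the 400 × 400 test images with P = Q = 2 (Sec. 3.2), the decomposition yields t = 40,000 patches, of which n = 2(200 + 200) − 4 = 796 lie on the border.

```python
import numpy as np

def split_into_patches(img, P, Q):
    """Split an H x W x 3 LAB image into non-overlapping patches of
    P (width) x Q (height) pixels.

    Returns (patches, boundary_idx). patches has shape (t, Q*P, 3), so
    patches[i] plays the role of G_lab(i), one LAB row per pixel;
    boundary_idx indexes the n border patches that form B_lab.
    Assumes W % P == 0 and H % Q == 0.
    """
    H, W, C = img.shape
    rows, cols = H // Q, W // P
    # (rows, Q, cols, P, C) -> (rows, cols, Q, P, C) -> (t, Q*P, C)
    grid = img.reshape(rows, Q, cols, P, C).transpose(0, 2, 1, 3, 4)
    patches = grid.reshape(rows * cols, Q * P, C)
    r, c = np.divmod(np.arange(rows * cols), cols)  # row-major patch grid
    on_border = (r == 0) | (r == rows - 1) | (c == 0) | (c == cols - 1)
    return patches, np.flatnonzero(on_border)

def kmeans_dictionary(X, K, n_iter=50, seed=0):
    """Lloyd's K-means on the rows of X (N x 3 LAB vectors).
    The K centroids (a K x 3 array) serve as the dictionary atoms."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every vector to every centroid.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                centers[k] = X[labels == k].mean(axis=0)
    return centers

# Usage on a random 100 x 100 stand-in for a LAB image, P = Q = 2, K_D = 10.
img = np.random.default_rng(1).random((100, 100, 3))
patches, boundary_idx = split_into_patches(img, P=2, Q=2)
D_global = kmeans_dictionary(patches.reshape(-1, 3), K=10)
D_background = kmeans_dictionary(patches[boundary_idx].reshape(-1, 3), K=10)
```

Building the dictionaries from pixel rows (rather than whole patches) keeps the atoms three-dimensional, which is what allows each pixel to receive its own sparse code in the next step.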

High-Level Feature Transformation Using a Sparse Representation
Sparse representation 22,30,33–35 has been a focus of research in computer vision and pattern recognition. Given a dictionary consisting of a set of bases, sparse representation expresses an image by a sparse coefficient vector, in which each nonzero element reflects the correlation between the image and the corresponding basis in the dictionary. Because we divide the image into patches, the sparse coefficients of each patch can be learned by sparse representation. Each of the $T_p$ pixels in a patch yields one group of sparse coefficients, and we max-pool these $T_p$ groups into a single coefficient vector that expresses the patch–dictionary relationship. These pooled coefficient vectors are used to determine the patch categories.
Concretely, we obtain the sparse representation by minimizing the $\ell_1$-norm of the coefficients over a given dictionary. Every patch in the global set $G_{lab}$ can be represented by global coefficients $\alpha_{\text{Global}}$ over the global-based dictionary $D_{\text{Global}}$. Similarly, each patch in the background set $B_{lab}$ can be represented by background coefficients $\alpha_{\text{Background}}$ over the background-based dictionary $D_{\text{Background}}$:

$$G_{lab}(i) = D_{\text{Global}}\, \alpha_{\text{Global}}(i), \qquad B_{lab}(j) = D_{\text{Background}}\, \alpha_{\text{Background}}(j). \tag{5}$$

We then encode all the patches in image I by

$$\alpha_{\text{Global}}(i) = \arg\min_{\alpha} \left\| G_{lab}(i) - D_{\text{Global}}\, \alpha \right\|_2^2 + \beta \|\alpha\|_1, \qquad \alpha_{\text{Background}}(j) = \arg\min_{\alpha} \left\| B_{lab}(j) - D_{\text{Background}}\, \alpha \right\|_2^2 + \beta \|\alpha\|_1, \tag{6}$$

where $\beta \ge 0$ is a tuning parameter. The sparse coefficients of all patches, $\alpha_{\text{Global}}$ and $\alpha_{\text{Background}}$, are optimized using the least absolute shrinkage and selection operator (Lasso). 36
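The paper optimizes Eq. (6) with Lasso but does not name a solver. The sketch below swaps in ISTA (iterative shrinkage-thresholding), a standard proximal-gradient method for this objective, and then max-pools the per-pixel codes of each patch. It assumes the dictionary atoms are stored as the rows of a K × 3 array (so a pixel is reconstructed as α D) and that max pooling takes the element-wise maximum of the signed coefficients; the value of β and the function names are illustrative only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_codes(X, D, beta, n_iter=300):
    """ISTA for min_C ||X - C D||_F^2 + beta * ||C||_1.

    X: N x 3 pixel LAB vectors; D: K x 3 dictionary (one atom per row).
    The objective separates over rows, so each row of C (N x K) is the
    Lasso code of one pixel.
    """
    L = 2.0 * np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    C = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = 2.0 * (C @ D - X) @ D.T   # gradient of the squared error
        C = soft_threshold(C - grad / L, beta / L)
    return C

def max_pool_patches(C, pixels_per_patch):
    """Collapse the T_p pixel codes of each patch into one K-vector by an
    element-wise max; rows of C must be grouped patch by patch."""
    return C.reshape(-1, pixels_per_patch, C.shape[1]).max(axis=1)

# Usage: code 2 x 2 patches (T_p = 4) against a random 10-atom dictionary.
rng = np.random.default_rng(0)
D = rng.random((10, 3))
X = rng.random((8 * 4, 3))          # 8 patches, 4 pixels each
alpha_max = max_pool_patches(lasso_codes(X, D, beta=0.05), 4)
```

With an orthonormal dictionary, ISTA reaches the closed-form Lasso solution (each coefficient shrunk by β/2) in one step, which is a convenient sanity check.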

Sparse Representation-Based Saliency Computation
According to the background prior principle 16,18,22 mentioned in Sec. 2.1, we assume that the edges of the image are generally background. We obtain the patch object probability $P_{\text{Object}}$ as the ratio of edge patches labeled as object to all edge patches and, similarly, the patch background probability $P_{\text{Background}}$ as the ratio of edge patches labeled as background to all edge patches. These probabilities are computed separately for the estimated maps $EM_{\text{Global}}$ and $EM_{\text{Background}}$. According to the background prior, $P_{\text{Object}}$ should be less than $P_{\text{Background}}$; therefore, we take the cluster with the lower probability to be the object and the cluster with the higher probability to be the background. We then form a binary object map $BM(i)$ of the clustered patches:

$$BM(i) = \begin{cases} 1, & P_{\text{Object}} < P_{\text{Background}} \\ 0, & \text{otherwise} \end{cases}, \quad i = 1, 2, \ldots, t, \tag{7}$$

that is, $BM(i) = 1$ when patch i belongs to the cluster whose edge-patch ratio is the lower of the two.

The mean value of a pooled sparse coefficient vector reflects the strength of the patch–dictionary relationship: similar patches have similar pooled coefficients, so their mean values differ only slightly. We therefore define the mean of the pooled sparse coefficients as the saliency score of a patch. A labeled map $S(z)$ is obtained by assigning these saliency scores to the corresponding patches that are confirmed as objects:

$$S(z) = \begin{cases} \operatorname{mean}\left(\alpha_i^{\max}\right), & BM(i) = 1 \\ 0, & BM(i) = 0 \end{cases}, \quad z \in \text{patch } i,\; i = 1, 2, \ldots, t,\; z = 1, 2, \ldots, T, \tag{8}$$

where $\alpha_i^{\max}$ denotes the sparse coefficients of the i'th patch after max pooling. The primary saliency maps $S_{\text{Global}}(z)$ and $S_{\text{Background}}(z)$ are obtained from $\alpha_{\text{Global}}$ and $\alpha_{\text{Background}}$, respectively, according to Eq. (8). This high-level feature transformation is illustrated in part III of Fig. 1.
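The two-class clustering, the labeling of Eq. (7), and the scoring of Eq. (8) can be sketched together as below. Assumptions, all ours: the K-means step is a plain two-cluster Lloyd's iteration initialized at the rows with the smallest and largest mean coefficient, patches are indexed row-major with the border indices produced by the patch decomposition, and the names `two_means` and `primary_saliency` are hypothetical.

```python
import numpy as np

def two_means(A, n_iter=50):
    """Lloyd's K-means with K = 2 on the rows of A (t x K pooled codes).
    Initialized at the rows with the smallest and largest mean coefficient."""
    m = A.mean(axis=1)
    centers = A[[m.argmin(), m.argmax()]].astype(float)
    for _ in range(n_iter):
        d = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = A[labels == k].mean(axis=0)
    return labels

def primary_saliency(A_max, boundary_idx, grid_shape, patch_hw):
    """Eqs. (7)-(8) for one dictionary.

    A_max: t x K max-pooled codes (row-major patch order);
    boundary_idx: indices of the border patches;
    grid_shape: (rows, cols) of the patch grid; patch_hw: (Q, P).
    Returns the binary patch labels BM and the pixel map S(z).
    """
    labels = two_means(A_max)                      # estimated map, per patch
    edge = labels[boundary_idx]
    # Fraction of border patches in each cluster; under the background
    # prior, the cluster covering fewer border patches is the object.
    frac = np.array([np.mean(edge == k) for k in range(2)])
    BM = (labels == frac.argmin()).astype(float)   # Eq. (7)
    scores = BM * A_max.mean(axis=1)               # Eq. (8), per patch
    # Spread each patch score over its Q x P pixels to obtain S(z).
    S = np.kron(scores.reshape(grid_shape), np.ones(patch_hw))
    return BM, S

# Toy 4 x 4 grid of 2 x 2 patches: the four central patches respond strongly.
rows, cols = 4, 4
r, c = np.divmod(np.arange(rows * cols), cols)
border = np.flatnonzero((r == 0) | (r == rows - 1) | (c == 0) | (c == cols - 1))
A = np.tile([0.0, 0.1], (rows * cols, 1))
A[[5, 6, 9, 10]] = [1.0, 0.8]
BM, S = primary_saliency(A, border, (rows, cols), (2, 2))
```

In the toy example, all border patches fall in one cluster, so the other cluster (the central block) is labeled as object and each of its pixels receives the mean pooled coefficient 0.9.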

Saliency Map Integration
Because remote sensing images are captured by sensors on aircraft, the location of objects in the images is uncertain. Therefore, an object-biased Gaussian model 18 is more suitable than a center-biased Gaussian model 22 for suppressing interference. Finally, we employ a Bayesian formula to integrate the primary saliency maps $S_{\text{Global}}(z)$ and $S_{\text{Background}}(z)$ using posterior probability.

Object-biased Gaussian smoothing
We employ object-biased Gaussian smoothing to suppress interference judged to be noise. Borji and Itti 22 noted that a center bias exists in some saliency detection datasets and hence removed noise with the Gaussian model

$$G(z) = \exp\left(-\left(\frac{(x_z - x)^2}{2\sigma_x^2} + \frac{(y_z - y)^2}{2\sigma_y^2}\right)\right), \tag{9}$$

where $\sigma_x$ and $\sigma_y$ control the spread of the Gaussian along the two axes, $(x, y)$ denotes the coordinates of the object center, and $(x_z, y_z)$ are the coordinates of any pixel z in the map; in the center-biased model, $x = 0$ and $y = 0$ indicate the image center. Li et al. 18 refined the model to be object-biased using dense and sparse reconstruction errors. In this paper, we instead adopt the patch labels from Eq. (7) to determine a more accurate object center. We set the coordinates $(x, y)$ of the object center to the position determined from the labels of the image regions:

$$\begin{cases} x = \dfrac{\sum_{z} BM(z)\, x_z}{\sum_{z} BM(z)}, \\[2ex] y = \dfrac{\sum_{z} BM(z)\, y_z}{\sum_{z} BM(z)}, \end{cases} \tag{10}$$

where $BM(z)$ denotes the label of the patch containing pixel z. An object-biased Gaussian model is generated using Eq. (9) with the coordinates $(x, y)$ from Eq. (10). The final result S is the convolution of the primary saliency map $S(z)$ with the refined object-biased Gaussian model $G(z)$. We refine the global-based saliency map (G-map) $S_{\text{Global}}$ and the background-based saliency map (B-map) $S_{\text{Background}}$ with this object-biased Gaussian model and its more accurate object center:
$$S_{\text{Global}} = G(z) * S_{\text{Global}}(z), \qquad S_{\text{Background}} = G(z) * S_{\text{Background}}(z). \tag{11}$$
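A sketch of the weight map G(z) from Eqs. (9) and (10). It assumes the object center is the centroid of the pixels whose patches are labeled as object by Eq. (7), falls back to the image center when nothing is labeled (our choice), and uses array indices instead of center-origin coordinates, which leaves G(z) unchanged because only coordinate differences enter Eq. (9). The defaults σ_x = σ_y = 100 are the values from Sec. 3.2; the function name is hypothetical, and the combination with S(z) in Eq. (11) is not shown.

```python
import numpy as np

def object_biased_gaussian(BM_pixels, sigma_x=100.0, sigma_y=100.0):
    """Eqs. (9)-(10): Gaussian weight map G(z) centred on the labeled object.

    BM_pixels: H x W map, 1 where the containing patch is labeled as object.
    The object centre is taken as the centroid of those pixels (assumed form
    of Eq. (10)); the image centre is used if no pixel is labeled.
    """
    H, W = BM_pixels.shape
    ys, xs = np.mgrid[0:H, 0:W]          # row (y) and column (x) indices
    mask = BM_pixels > 0
    if mask.any():
        cx, cy = xs[mask].mean(), ys[mask].mean()
    else:
        cx, cy = (W - 1) / 2.0, (H - 1) / 2.0
    return np.exp(-((xs - cx) ** 2 / (2.0 * sigma_x ** 2)
                    + (ys - cy) ** 2 / (2.0 * sigma_y ** 2)))

# Usage: a single labeled pixel at row 1, column 3 of a 5 x 5 map.
BM = np.zeros((5, 5))
BM[1, 3] = 1
G = object_biased_gaussian(BM, sigma_x=1.0, sigma_y=1.0)
```

The weight is exactly 1 at the estimated object center and decays with distance from it, so responses far from the labeled object are attenuated regardless of where the object lies in the frame.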

Bayesian integration
As shown in Ref. 18, an effective saliency map can be obtained by Bayesian integration of two given saliency maps. Bayes' formula states that

$$p(F \mid S_{\text{map}}) = \frac{p(F)\, p(S_{\text{map}} \mid F)}{p(F)\, p(S_{\text{map}} \mid F) + \left(1 - p(F)\right) p(S_{\text{map}} \mid B)}, \tag{12}$$

where the prior probability $p(F)$ is given by a saliency map, $p(S_{\text{map}} \mid F)$ is the likelihood of the saliency map given the foreground, and $p(S_{\text{map}} \mid B)$ is the corresponding likelihood given the background.
We use the global-based saliency map $S_{\text{Global}}$ or the background-based saliency map $S_{\text{Background}}$ as the prior, and the other map ($S_{\text{Background}}$ or $S_{\text{Global}}$, respectively) is used to compute the likelihood. Together, these maps determine the final saliency map S:

$$S = p\left(F_{\text{Global}} \mid S_{\text{Background}}(z)\right) + p\left(F_{\text{Background}} \mid S_{\text{Global}}(z)\right), \tag{13}$$

where $F_{\text{Global}}$ and $F_{\text{Background}}$ denote the foregrounds segmented by the mean saliency values of $S_{\text{Global}}$ and $S_{\text{Background}}$, respectively. The saliency map integration procedure is shown in part IV of Fig. 1.
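The Bayesian integration of Eqs. (12) and (13) can be sketched as follows. The text does not say how the likelihoods p(S_map | F) and p(S_map | B) are estimated; this sketch assumes normalized histograms of the likelihood map inside the foreground and background regions, with the foreground taken as the pixels at or above the prior's mean (as stated for F_Global and F_Background) and both maps scaled to [0, 1]. The bin count and function names are illustrative.

```python
import numpy as np

def bayes_posterior(prior, other, n_bins=10):
    """One term of Eq. (13): p(F | other) with `prior` serving as p(F).

    F is the set of pixels where `prior` >= its mean. The likelihoods
    p(other | F) and p(other | B) are estimated from normalized histograms
    of `other` inside F and B (an assumed estimator). Maps lie in [0, 1].
    """
    fg = prior >= prior.mean()
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(other, bins) - 1, 0, n_bins - 1)
    h_f = np.bincount(idx[fg], minlength=n_bins) / max(fg.sum(), 1)
    h_b = np.bincount(idx[~fg], minlength=n_bins) / max((~fg).sum(), 1)
    num = prior * h_f[idx]                  # p(F) p(S_map | F)
    den = num + (1.0 - prior) * h_b[idx]    # denominator of Eq. (12)
    return np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)

def bayes_integrate(S_global, S_background):
    """Eq. (13): each map is used once as the prior and once as the likelihood."""
    return (bayes_posterior(S_global, S_background)
            + bayes_posterior(S_background, S_global))

# Usage: two identical maps whose left half is fully salient.
S = np.zeros((4, 4))
S[:, :2] = 1.0
S_int = bayes_integrate(S, S)
```

Each posterior lies in [0, 1], so the integrated map lies in [0, 2]; how it is rescaled before evaluation is not specified in the text.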

Algorithm
The full SGSM algorithm consists of the following steps:
Step 1: Divide the input color image I into patches of size P × Q.
Step 2: Extract the global information $G_{lab}$ and the background information $B_{lab}$ from the three LAB channels and cluster them into the dictionaries $D_{\text{Global}}$ and $D_{\text{Background}}$, respectively, using K-means with $K_D$ clusters.
Step 3: Learn the coefficients $\alpha_{\text{Global}}$ and $\alpha_{\text{Background}}$ using Eq. (6) via a sparse representation based on $D_{\text{Global}}$ and $D_{\text{Background}}$.
Step 4: Max-pool the coefficients within each patch to obtain $\alpha^{\max}_{\text{Global}}$ and $\alpha^{\max}_{\text{Background}}$, and cluster each set into two categories by K-means to obtain the estimated maps $EM_{\text{Global}}$ and $EM_{\text{Background}}$.
Step 5: Compute the patch saliency values to obtain the primary saliency maps $S_{\text{Global}}(z)$ and $S_{\text{Background}}(z)$ by Eqs. (7) and (8).
Step 6: Smooth $S_{\text{Global}}(z)$ and $S_{\text{Background}}(z)$ with the object-biased Gaussian model of Eq. (11) to obtain $S_{\text{Global}}$ and $S_{\text{Background}}$, respectively.
Step 7: Obtain the saliency map S by Bayesian integration of $S_{\text{Global}}$ and $S_{\text{Background}}$ with Eq. (13).

Multiple Scales Integration
Because objects lie at different depths and have different sizes, different spatial scales give different results. We therefore divided the input image into patches of size $(kP) \times (kQ)$ at the k'th scale to generate the SGSM saliency map at that scale. Large patches help capture region-level properties, but they produce jagged edges because a few pixels in a patch may not share the property of the majority of its pixels. The final saliency map was obtained by fusing the maps at the k scales:

$$S_{\text{final}} = \varepsilon S^{(1)} + \psi S^{(2)} + \cdots + \vartheta S^{(k)}, \tag{14}$$

where $S^{(k)}$ denotes the SGSM saliency map at the k'th scale and $\varepsilon, \psi, \ldots, \vartheta$ are the weights of the different scales. We then normalized $S_{\text{final}}$ to the range [0, 1] to obtain the final saliency map.
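Eq. (14) and the normalization step amount to a weighted sum followed by rescaling. This sketch assumes the per-scale maps have already been resized to a common resolution and uses min-max scaling for the [0, 1] normalization, which the text does not specify; ε = 0.2 and ψ = 0.8 are the two-scale weights reported in Sec. 3.2.2, and the function name is hypothetical.

```python
import numpy as np

def fuse_scales(maps, weights):
    """Eq. (14): weighted sum of per-scale SGSM maps, then min-max scaling
    to [0, 1]. All maps must already share the same shape."""
    fused = sum(w * np.asarray(m, dtype=float) for w, m in zip(weights, maps))
    lo, hi = fused.min(), fused.max()
    if hi == lo:                     # constant map: nothing to rescale
        return np.zeros_like(fused)
    return (fused - lo) / (hi - lo)

# Usage with the two-scale weights epsilon = 0.2 and psi = 0.8.
S1 = np.array([[0.0, 1.0], [2.0, 3.0]])
S2 = np.array([[3.0, 2.0], [1.0, 0.0]])
S_final = fuse_scales([S1, S2], [0.2, 0.8])
```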

Experiments
This section presents the databases used to validate the effectiveness of the proposed method and compares it with 10 other state-of-the-art methods.

Databases
SGSM aims to detect salient objects in remote sensing images that mainly contain houses and oil tanks. All images were collected from Google Earth and were captured under diverse illumination conditions and from various viewpoints. The images were taken at altitudes of 300 to 2000 m, with resolutions of about 0.4 to 1.9 m, which ensures that sufficiently detailed images are captured. There are 500 images containing a single object and 1000 images containing multiple objects; these two groups form the SOD and MOD databases, respectively, and their binary ground truth GT was obtained manually. In remote sensing images, objects of interest differ in appearance and shape, yet they share much with the surrounding background in color, texture, and shape. Complicated backgrounds (such as forests, lakes, sand, roads, and lawns) and various conditions (including fog, shadow, and illumination fluctuation) easily lead to false detections. Sample images from the two datasets are shown in Fig. 2.

Experimental Setup
The test images were resized to 400 × 400 pixels. For these experiments, we set the patch size to P = 2 and Q = 2, the dictionary clustering number to $K_D = 10$, and the parameters $\sigma_x = 100$ and $\sigma_y = 100$ in Eq. (9).
We carried out experiments to verify the effectiveness of combining global cues and the background prior; the results are detailed in Sec. 3.2.1. The selection of patch size affects the performance of SGSM, and different scales produce different outputs. Hence, we employed multiple scales to produce a better saliency map; the selection of these scales is based on the experimental results in Sec. 3.2.2.

Combining global cues and background prior information
The global-based saliency map (G-map), the background-based saliency map (B-map), and the final saliency map obtained by combining both (C-map) were computed for all 1500 images from the SOD and MOD. Figure 3(a) shows these three saliency maps; the information selected to generate the dictionaries clearly affects the saliency detection results. The sparse coefficients computed with the global- and background-based dictionaries express different relationships for the same patches. Objects are easily confused with the background if their sparse coefficients are similar to those of the background. In addition, sparse coefficients computed with the global-based dictionary describe the relationship between image patches and all the categories in the image, whereas those computed with the background-based dictionary describe the relationship between image patches and the categories in the background. The C-map clearly gives the best results. Figure 3(b) shows that integrating global cues and background prior information yields better precision and recall (PR) values and detects salient regions more accurately and efficiently.

Selection of multiple scales
Applying the procedure in Sec. 2.5 at k scales yields k saliency maps, and we chose the scales on the basis of experimental analysis. According to the results at different scales shown in Fig. 4, we chose k = 2, generating SGSM saliency maps at two scales to obtain an efficient and accurate saliency map. Furthermore, we set ε = 0.2 and ψ = 0.8 in Eq. (14).

Precision and recall curves and F-measure
We evaluated the results of our algorithm against manually generated ground truth using the PR curve 28,37 and the F-measure. 28,37 Precision is the ratio of correctly detected salient pixels to all pixels in the extracted regions. Recall is the ratio of correctly detected salient pixels to all salient pixels in the ground truth of the same image. For each threshold $T \in [0, 255]$, a binary map is generated and compared with the ground truth, and the PR values are averaged over all images in the dataset to measure the overall performance. The F-measure is the weighted harmonic mean of precision and recall:

$$F_{\beta} = \frac{(1 + \beta^2)\, \text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}}. \tag{15}$$

We set $\beta^2 = 0.3$ for these experiments. 9,15,16,28
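A per-image sketch of the PR and F-measure evaluation. It assumes an 8-bit saliency map, marks pixels with saliency ≥ T as salient (the comparison direction is our choice), and sets precision to 0 when nothing is detected; dataset-level curves would then be the mean over images. The β in this code is the F-measure weight of Eq. (15), unrelated to the Lasso parameter in Eq. (6), and the function names are hypothetical.

```python
import numpy as np

def pr_curve(sal, gt):
    """Precision and recall of an 8-bit saliency map `sal` against a
    binary mask `gt`, for every threshold T = 0, 1, ..., 255."""
    sal = np.asarray(sal)
    gt = np.asarray(gt).astype(bool)
    precision, recall = np.zeros(256), np.zeros(256)
    for T in range(256):
        pred = sal >= T                      # binary map at threshold T
        tp = np.logical_and(pred, gt).sum()  # correctly detected salient pixels
        precision[T] = tp / pred.sum() if pred.any() else 0.0
        recall[T] = tp / gt.sum() if gt.any() else 0.0
    return precision, recall

def f_measure(p, r, beta2=0.3):
    """Eq. (15): weighted harmonic mean of precision and recall."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    num = (1.0 + beta2) * p * r
    den = beta2 * p + r
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Usage: a toy map whose left column is fully salient.
sal = np.array([[255, 0], [255, 0]])
gt = np.array([[1, 0], [1, 0]])
p, r = pr_curve(sal, gt)
F = f_measure(p, r)
```

With β² = 0.3, precision is weighted more heavily than recall, so a map that marks everything as salient (precision 0.5, recall 1 at T = 0) scores only about 0.57.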

Mean absolute error
Following Ref. 28, we also evaluated the mean absolute error (MAE) between the binary ground truth GT and the final saliency map $S_{\text{final}}$ to obtain a more balanced comparison. MAE is defined as

$$\text{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| S_{\text{final}}(x, y) - GT(x, y) \right|, \tag{16}$$

where W and H denote the width and height of the saliency map and ground truth image, respectively.
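The MAE of Eq. (16) in NumPy, as a minimal sketch assuming the saliency map is scaled to [0, 1] (as $S_{\text{final}}$ is after normalization) and GT is a binary mask of the same size; the function name is hypothetical.

```python
import numpy as np

def mae(s_final, gt):
    """Eq. (16): mean absolute difference between a [0, 1] saliency map
    and a binary ground-truth mask of the same W x H size."""
    s_final = np.asarray(s_final, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if s_final.shape != gt.shape:
        raise ValueError("saliency map and ground truth must have the same size")
    return float(np.abs(s_final - gt).mean())

# Usage on a 2 x 2 toy example: (0.5 + 0 + 0 + 1) / 4
print(mae([[0.5, 1.0], [0.0, 0.0]], [[1, 1], [0, 1]]))  # 0.375
```

Unlike the PR curve, MAE uses the continuous saliency values directly, so it also penalizes weak responses on background pixels that no threshold would turn into detections.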

Comparison with 10 State-of-the-Art Methods
We compared the proposed method (SGSM) with 10 state-of-the-art methods: dense and sparse reconstruction (DSR), 18 graph-based manifold ranking (GBMR), 16 global cues (GC), 14 a model of information maximization (AIM), 11 saliency-based visual attention (Itti), 10 frequency-tuned (FT), 27 histogram-based contrast (HC), 15 spatial attention model (LC), 8 spectral residual (SR), 26 and region-based contrast (RC). 15 All of these methods have difficulty detecting salient regions in remote sensing images precisely. Two experiments were performed to validate the effectiveness of the proposed method: the first detected a single salient object in SOD images, and the second detected multiple salient objects in MOD images. The results of single-object detection are illustrated in Figs. 5 and 6, and those of multiple-object detection in Figs. 7 and 8. Figures 5 and 7 show the saliency maps of all 11 models, and Figs. 6 and 8 show the PR curves and F-measure values. Table 1 lists the MAE results for both the SOD and MOD.

Single salient object detection
Methods exploiting low-level cues, such as AIM, Itti, FT, and SR, tend to find only the boundaries of the salient object. Methods employing global cues, such as GC, RC, and LC, are likely to mistake background noise for salient points. Methods based on background priors, such as DSR and GBMR, fail to detect salient regions accurately, especially when the salient regions look similar to the background. Our method, which exploits both global cues and background prior information, produces a more precise saliency map and can distinguish object regions from background regions even when they look alike. The high-level cues of the patches, which are learned from the global and background dictionaries, reveal the category of each patch, so the categories of all patches can be obtained by K-means clustering. Figure 6 shows that SGSM highlights the entire salient region of an object and has a higher F-measure; it outperforms the 10 competing methods in both the integrity and the accuracy of object segmentation. When an object has a similar color but a different structure compared with the background, SGSM can detect the difference and highlight the corresponding region. Objects of interest in the test database vary in color, size, type, and form, which makes every detection task unique and difficult; in these complicated situations, our method still obtains better saliency detection results. Table 1 shows that our method is closer to the ground truth, reducing MAE by 24.44% with respect to the best competing method, GBMR.

Multiple salient object detection
Detecting two or more objects with different colors and shapes in one image is more difficult than detecting a single object. Figure 7 shows that our model achieves the best visual results of all the saliency models. Methods exploiting low-level cues, such as AIM, Itti, FT, and SR, hardly detect the objects at all. Methods employing global cues, such as GC, RC, and LC, cannot generate accurate saliency maps because of noise interference. GBMR fails to detect objects with different appearances because it depends on ranking with queries and tends to mistake objects with lower ranking scores for background. GC fails to detect objects whose appearance is similar to the background because it relies on the color histogram. Our method avoids these failures because it assigns patch categories by clustering sparse codes learned from the global and background dictionaries, and it can categorize the patches precisely even when an image contains multiple salient objects.
Figure 8 shows that our model also achieves better PR values and F-measures than the other 10 saliency methods. Because it suppresses background patches that the other methods judge to be salient, the final saliency maps of SGSM are much closer to the ground truth. Compared with single-object detection, detecting multiple salient objects in one image is harder, not only because the objects differ in type and appearance but also because parts of the objects resemble the background. In addition, the objects may be covered by fog, occluded by trees, or affected by their own shadows. The results demonstrate that our method can weaken these interferences and precisely detect the edges of multiple salient objects. Table 1 shows that our method also has a lower error when detecting multiple objects, reducing MAE by 52.11% with respect to the second-best method, GBMR.

Conclusion
In this paper, we proposed a sparsity-guided saliency detection method based on global cues and background prior information for remote sensing images. The method uses a sparse representation to obtain high-level global and background cues and then integrates the saliency maps generated by these two cues using a Bayesian formula. Consequently, SGSM not only considers the global and background properties of the image content but also introduces sparse representation for high-level cues. The proposed method was evaluated on a database of remote sensing images containing diverse textures, structures, and complex conditions. Experimental results showed that our method outperforms 10 state-of-the-art saliency detection methods, yielding higher precision and better recall, in particular when multiple salient objects have similar appearances. However, the proposed method is less effective for low-resolution remote sensing images with few detail features, and its computational cost also needs to be reduced. In future work, we intend to use reinforcement learning or deep learning to obtain more high-level cues and achieve fast and precise saliency detection.
In addition, quickly and accurately extracting salient object regions, rather than performing an exhaustive search, can be useful for large volumes of remote sensing data and will in turn improve object detection and recognition rates in cluttered scenes. Hence, our future work will also focus on automatically detecting and recognizing objects (e.g., houses and oil depots) based on SGSM.

Fig. 3 Comparison of G-map, B-map, and C-map: (a) saliency maps computed from different clustering dictionaries and (b) average precision and recall (PR) curves of 1500 images from the SOD and MOD.

Fig. 5 Saliency maps of the proposed method and 10 state-of-the-art methods for SOD images.

Fig. 4 Comparison of saliency maps at different scales: (a) visual results of four scales from SOD and MOD and (b) average PR curves in four scales and the combination of multiple scales.

Fig. 6 Performance of the proposed method and 10 state-of-the-art methods: (a) average PR curves and (b) F-measures.

Fig. 7 Saliency maps of the proposed method and 10 state-of-the-art methods for MOD images. The images show man-made objects including houses and oil depots.

Fig. 8 Performance of the proposed method compared to 10 other methods: (a) average PR curves and (b) F-measures.
After max pooling within every patch, we obtain the global-based coefficient set $\alpha^{\max}_{\text{Global}}$ and the background-based coefficient set $\alpha^{\max}_{\text{Background}}$. The coefficient sets $\alpha^{\max}_{\text{Global}}$ and $\alpha^{\max}_{\text{Background}}$ are separately clustered into two categories (i.e., object and background) by K-means to determine the patch category labels. We obtain the global-based estimated map $EM_{\text{Global}}$ and the background-based estimated map $EM_{\text{Background}}$ by assigning the category labels to the corresponding patches. This procedure is shown in part II of Fig. 1.

Table 1 Mean absolute errors of the proposed method and 10 state-of-the-art methods.