1 May 2010 Color conspicuity map based on wavelet low-pass pyramid for popping out contours of salient objects
Abstract
In Itti's model, which was one of the representative saliency models proposed in 1998, a Gaussian pyramid is used to analyze color information in scene images, and to generate a color conspicuity map. In this conspicuity map, some important objects can be located by salient areas, but their contours cannot be described clearly and perfectly. In this work, a wavelet low-pass pyramid is used to generate a color conspicuity map, and the contours of important objects pop out perfectly from salient areas. Experimental results validate the superiority of the proposed method.
Li, Fang, Huo, and Zhu: Color conspicuity map based on wavelet low-pass pyramid for popping out contours of salient objects

1. Introduction

In recent years, many saliency models have been proposed to simulate human visual attention. These models have attracted much attention because they enable a rapid and successful search for important objects in scene images (Refs. 1–7). Depending on whether the salient features come from image segmentation (Refs. 5–7) or directly from the pixels of an image (Refs. 1–4), most previous saliency models can be divided into two categories: region-based and pixel-based approaches. For region-based saliency, a representative model was presented by Aziz and Mertsching (Ref. 5). In this model, the original image is first segmented into fragments by an image segmentation algorithm. Based on the segmented image, a color contrast map is generated according to the color theory of human vision, a symmetry map is constructed using a novel scanning-based method, and a size contrast map is generated using a new algorithm proposed by Aziz. Eccentricity and orientation maps are computed using the moments of the segmented regions. For pixel-based saliency (Refs. 1–4), a classic and representative model is that of Itti, Koch, and Niebur (Ref. 1). In this model, each input image is processed in parallel through three feature channels: intensity, color, and orientation. The outputs of these channels are ultimately combined to form a saliency map, which indicates the locations of important objects.

In Ref. 5, Aziz and Mertsching compared their results with those of the model in Ref. 1 and concluded that the salient regions in the color conspicuity map from the model by Itti, Koch, and Niebur could not reflect the contours of important objects. As shown later, a similar phenomenon appears in our experiments. To solve this problem, a new method of generating color conspicuity maps, based on wavelet low-pass pyramids instead of the Gaussian pyramid of the Ref. 1 model, is proposed in this work. In addition, several heuristic techniques are used to improve the results. In the color conspicuity map generated by our method, the contours of important objects pop out clearly from salient areas.

2. Algorithm for Generating Color Conspicuity Map

Let R, G, and B be the red, green, and blue components of the input image, respectively. Let r(x,y), g(x,y), and b(x,y) denote the values at location (x,y) in the R, G, and B channels, respectively. An intensity image I is produced by

$$ i(x,y) = \frac{r(x,y) + g(x,y) + b(x,y)}{3}, \tag{1} $$
where i(x,y) is the intensity value at location (x,y) in I. When the intensity value of a pixel in a scene image is very small, the color information of the pixel is hardly perceived. Thus, when i(x,y) is smaller than 1/10 of the maximum over the whole intensity image I, the values of r(x,y), g(x,y), and b(x,y) are set to zero. From R, G, and B, four new color component images are constructed and denoted by RN, GN, BN, and YN, respectively. Let rn(x,y), gn(x,y), bn(x,y), and yn(x,y) denote the values at location (x,y) in RN, GN, BN, and YN, respectively. RN, GN, BN, and YN are defined as follows:

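$$ rn(x,y) = r(x,y) - \frac{g(x,y) + b(x,y)}{2}, \tag{2} $$

$$ gn(x,y) = g(x,y) - \frac{r(x,y) + b(x,y)}{2}, \tag{3} $$

$$ bn(x,y) = b(x,y) - \frac{r(x,y) + g(x,y)}{2}, \tag{4} $$

$$ yn(x,y) = g(x,y) + r(x,y) - |r(x,y) - g(x,y)| - b(x,y). \tag{5} $$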
When the values of rn(x,y), gn(x,y), bn(x,y), and yn(x,y) are negative, they are set to zero. From RN, GN, BN, and YN, four image pyramids RN(k), GN(k), BN(k), and YN(k) are constructed, where k denotes the pyramid level. Unlike the model of Itti, Koch, and Niebur, this work uses a wavelet low-pass pyramid instead of the Gaussian pyramid to generate each image pyramid.
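The color-component step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `color_components` and the array layout (height × width × 3) are assumptions, and the expression for yn assumes that the operators missing from Eq. (5) are minus signs, i.e. g + r − |r − g| − b.

```python
import numpy as np

def color_components(rgb):
    """Return (RN, GN, BN, YN) for an H x W x 3 array `rgb` (hypothetical helper)."""
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    # Eq. (1): intensity image.
    i = (r + g + b) / 3.0
    # Pixels darker than 1/10 of the maximum intensity carry little
    # perceivable color, so their r, g, b values are set to zero.
    dark = i < i.max() / 10.0
    r, g, b = (np.where(dark, 0.0, ch) for ch in (r, g, b))
    # Eqs. (2)-(5): the four color components.
    rn = r - (g + b) / 2.0
    gn = g - (r + b) / 2.0
    bn = b - (r + g) / 2.0
    yn = g + r - np.abs(r - g) - b  # assumed reading of Eq. (5)
    # Negative values are set to zero.
    return tuple(np.maximum(ch, 0.0) for ch in (rn, gn, bn, yn))
```

For a pure red pixel (255, 0, 0) only rn is nonzero, while a pure yellow pixel (200, 200, 0) responds in rn, gn, and most strongly in yn.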

Before a color conspicuity map is generated, the color feature maps must be constructed. Let RN(c), GN(c), BN(c), and YN(c) denote the images on level c, and let RN(s), GN(s), BN(s), and YN(s) denote the images on level s, where c ∈ {2,3,4}, s = c + p, and p ∈ {3,4}. To generate the color feature maps, the coarser level-s images are resized to a finer size. In the Ref. 1 model, this size is that of level 4 of the pyramid. In this work, it is the size of the level-c pyramid image. If the original image is very small and the pyramid images are resized to the size of level 4, the resized pyramid images may be smaller than 1×1. Let RN(s)*, GN(s)*, BN(s)*, and YN(s)* be the resized images. Let rn(s)*(x,y), gn(s)*(x,y), bn(s)*(x,y), and yn(s)*(x,y) denote the values at location (x,y) in RN(s)*, GN(s)*, BN(s)*, and YN(s)*, respectively. Let rn(c)(x,y), gn(c)(x,y), bn(c)(x,y), and yn(c)(x,y) denote the values at location (x,y) in RN(c), GN(c), BN(c), and YN(c), respectively. The three values of c and two values of p give six (c,s) pairs, so 12 color feature maps, denoted by RG(c,s) and BY(c,s), can be generated by

$$ rg(c,s)(x,y) = \left| [rn(c)(x,y) - gn(c)(x,y)] - [rn(s)^{*}(x,y) - gn(s)^{*}(x,y)] \right|, \tag{6} $$

and

$$ by(c,s)(x,y) = \left| [bn(c)(x,y) - yn(c)(x,y)] - [bn(s)^{*}(x,y) - yn(s)^{*}(x,y)] \right|, \tag{7} $$
where rg(c,s)(x,y) and by(c,s)(x,y) are the values at location (x,y) in RG(c,s) and BY(c,s), respectively.
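As a rough sketch of Eqs. (6) and (7), the function below builds the 12 feature maps from four pyramids given as lists of 2-D arrays indexed by level (levels 0 to 8 are needed). The names `resize_nearest` and `color_feature_maps` are hypothetical, and nearest-neighbor interpolation is an assumption, since the text does not state how images are resized. The operators missing from the extracted equations are read as minus signs.

```python
import numpy as np

def resize_nearest(img, shape):
    """Nearest-neighbor resize of a 2-D array (assumed interpolation choice)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]

def color_feature_maps(rn, gn, bn, yn):
    """Return dicts RG and BY keyed by (c, s), with s = c + p."""
    rg, by = {}, {}
    for c in (2, 3, 4):
        for p in (3, 4):
            s = c + p
            size = rn[c].shape  # level-s images are resized to the level-c size
            # Nearest-neighbor resizing only selects pixels, so resizing the
            # difference equals the difference of the resized images.
            rg_s = resize_nearest(rn[s] - gn[s], size)
            by_s = resize_nearest(bn[s] - yn[s], size)
            rg[(c, s)] = np.abs((rn[c] - gn[c]) - rg_s)  # Eq. (6)
            by[(c, s)] = np.abs((bn[c] - yn[c]) - by_s)  # Eq. (7)
    return rg, by
```

Each dictionary holds six maps, one per (c, s) pair, for 12 maps in total.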

In the model by Itti, Koch, and Niebur, a color conspicuity map is constructed by integrating all color feature maps after they are resized to the size of level 4. In this work, the color feature maps are resized to the size of the original image and are then combined by addition to generate the color conspicuity map. Thereafter, a simple technique is applied: each element of the color conspicuity map is squared to generate a new map. This squaring stretches the range of values in the color conspicuity map, which leaves fewer redundant salient areas in the map.
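The combination step can be illustrated as follows. This sketch assumes a plain element-wise sum with no normalization operator and nearest-neighbor resizing, neither of which is specified in the text; `resize_nearest` and `color_conspicuity` are hypothetical names.

```python
import numpy as np

def resize_nearest(img, shape):
    """Nearest-neighbor resize of a 2-D array (assumed interpolation choice)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]

def color_conspicuity(feature_maps, shape):
    """Sum the feature maps at the original image size, then square each element."""
    total = np.zeros(shape, dtype=float)
    for fmap in feature_maps:
        total += resize_nearest(fmap, shape)
    # Squaring stretches the value range: summed responses of 3 and 2
    # (ratio 1.5) become 9 and 4 (ratio 2.25).
    return total ** 2
```

Because squaring widens the gap between strong and weak responses, weak background responses fall further below any later cutoff relative to the strongest ones.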

Wavelet low-pass pyramid

Level 0, the base of the pyramid, is the original image. The ith-level image is obtained from the (i−1)th-level image by (1) applying a wavelet transform to the (i−1)th-level image and (2) extracting the low-pass part of the result as the ith-level image.
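For the Haar wavelet, the two steps above reduce to a simple block operation, sketched here in NumPy. The function names are hypothetical, the orthonormal Haar scaling (low-pass coefficient of a 2×2 block equal to the block sum divided by 2) is one common convention, and cropping odd-sized images to even dimensions is an assumption.

```python
import numpy as np

def haar_lowpass(img):
    """One 2-D Haar transform level, keeping only the low-pass (LL) band."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # assumption: crop odd sizes to even
    # Orthonormal Haar LL coefficient of each 2x2 block: (a + b + c + d) / 2.
    return (img[0::2, 0::2] + img[0::2, 1::2]
            + img[1::2, 0::2] + img[1::2, 1::2]) / 2.0

def wavelet_lowpass_pyramid(img, levels):
    """Level 0 is the image itself; level i is the LL band of level i - 1."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        if min(pyramid[-1].shape) < 2:
            break  # too small to decompose further
        pyramid.append(haar_lowpass(pyramid[-1]))
    return pyramid
```

Each level halves the image size in both directions, so an 8×8 input yields levels of size 8×8, 4×4, 2×2, and 1×1.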

Gaussian pyramid

Level 0, the base of the pyramid, is the original image. The ith-level image is obtained from the (i−1)th-level image by (1) convolving the (i−1)th-level image with a Gaussian filter kernel and (2) downsampling the result.
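For comparison, a Gaussian pyramid along the lines just described might look like the sketch below. The 5-tap binomial kernel, the edge-replicating border handling, and the downsampling factor of 2 are assumptions; the text only says that a Gaussian kernel is applied before downsampling.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed 5-tap binomial kernel

def blur(img):
    """Separable convolution with KERNEL, replicating edge pixels at the borders."""
    padded = np.pad(img, 2, mode="edge")
    # Horizontal pass, then vertical pass.
    rows = sum(k * padded[:, j : j + img.shape[1]] for j, k in enumerate(KERNEL))
    return sum(k * rows[i : i + img.shape[0], :] for i, k in enumerate(KERNEL))

def gaussian_pyramid(img, levels):
    """Level 0 is the image itself; level i is level i - 1 blurred and halved."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        if min(pyramid[-1].shape) < 2:
            break  # too small to downsample further
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid
```

In this sketch each pixel of a level blends a 5×5 neighborhood of the level below, whereas a Haar low-pass coefficient depends only on its own 2×2 block.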

3. Experimental Results

In our experiment, original images from Refs. 1, 5, and 8 are used as input images to generate color conspicuity maps. These images are shown in Figs. 1(a) and 2(a). Because the color conspicuity maps based on the Haar wavelet function are similar to those based on other wavelet functions, only the maps based on the Haar wavelet function are generated. Figures 1(b) and 2(b) show the color conspicuity maps from the Ref. 1 model based on the Gaussian pyramid. All salient regions in these maps point to important objects, such as the red telephone, golden building, balloon, and dog. However, the contours of these objects are not represented by the salient regions, which consist of many fragments that cover only part of each object. Figures 1(c) and 2(c) show the color conspicuity maps based on a wavelet low-pass pyramid. Compared with the maps in Figs. 1(b) and 2(b), the contours represented by salient regions based on the wavelet low-pass pyramid are better than those based on the Gaussian pyramid. Objects such as the postbox, building, and animal can be distinctly perceived in the color conspicuity maps in Figs. 1(c) and 2(c).

Fig. 1 (a) Input images. Row 1: the image from Ref. 1. Row 2: the image from Ref. 8. (b) The color conspicuity maps generated by Gaussian pyramid. (c) The color conspicuity maps generated by wavelet low-pass pyramid. (Color online only.)

Fig. 2 (a) Input images from Ref. 5. (b) The color conspicuity maps generated by Gaussian pyramid. (c) The color conspicuity maps generated by wavelet low-pass pyramid. (Color online only.)

For a quantitative comparison of our results with those from Ref. 1, a comparison criterion is defined. The labeled maps corresponding to the images in Figs. 1 and 2 are manually annotated as follows. Twenty individuals are invited to view each image for one second and are then asked what they saw first. If one object in the image is noticed first by most of the individuals, this object is regarded as the salient one and is labeled to generate a labeled map, as shown in Fig. 3. In the labeled map, the white areas represent the salient areas. The salient fragments in the color conspicuity maps from our model and from the model of Itti, Koch, and Niebur are then analyzed. Salient units (pixels) in the conspicuity map that are also salient in the labeled map are defined as hit units; all other salient units are defined as false units. For the same image, if the number of hit units is nearly the same for the two models, the model with fewer false units is better. If the number of false units is nearly the same, the model with more hit units is better. The comparative results are listed in Table 1. “S area” is the number of labeled salient units (the white pixels in the labeled map), “H units” is the number of hit units, and “F units” is the number of false units. Overall, the results indicate that our model outperforms the model in Ref. 1.
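The counting behind this criterion can be sketched as follows. The function name `hit_false_units` is hypothetical, and the sketch assumes that the conspicuity map has already been converted to a binary mask of salient pixels; the text does not say how that binarization is done.

```python
import numpy as np

def hit_false_units(salient, labeled):
    """Return (S area, H units, F units) for two binary masks of equal shape."""
    salient = np.asarray(salient, dtype=bool)   # salient pixels in the conspicuity map
    labeled = np.asarray(labeled, dtype=bool)   # white pixels in the labeled map
    s_area = int(labeled.sum())
    h_units = int((salient & labeled).sum())    # salient in both maps
    f_units = int((salient & ~labeled).sum())   # salient only in the conspicuity map
    return s_area, h_units, f_units
```

By construction, H units can never exceed S area, which gives a quick sanity check on any row of such a comparison.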

Fig. 3 Labeled maps corresponding to Figs. 1 and 2.

Table 1 Comparison data corresponding to criterion.

Image | S area | Our model: H units | Our model: F units | Ref. 1 model: H units | Ref. 1 model: F units
Fig. 1.17095629144272
Fig. 1.256,63956,09211023,184105
Fig. 2.17488701112321610
Fig. 2.229862704157128
Fig. 2.314,87410,89635103223
Fig. 2.4185986213348124
Fig. 2.576467608821533116
Fig. 2.6899268021431612
Fig. 2.7426367125339134
Fig. 2.822342141231419
Fig. 2.919733842627921
Fig. 2.10611127621315

4. Conclusion

The salient regions in the color conspicuity map of the model of Itti, Koch, and Niebur, which is based on the Gaussian pyramid, cover only part of each object. In this work, the proposed method generates the color conspicuity map with a wavelet low-pass pyramid, and the salient regions in this map describe the contours of the objects in an image much more clearly. Therefore, our method can improve the quality of scene analysis and object search.

Acknowledgments

This work is supported by the National Basic Research Program of China (973 program) under grant number 2006CB701303, and the National High Technology Research and Development Program of China (863 program) under grant number 2006AA12Z105.

References

1. L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 20(11), 1254–1259 (1998). https://doi.org/10.1109/34.730558

2. X. D. Hou and L. Q. Zhang, “Saliency detection: a spectral residual approach,” IEEE Intl. Conf. Computer Vision Patt. Recog., pp. 1–8 (2007).

3. J. K. Tsotsos and S. M. Culhane, “Modeling visual attention via selective tuning,” Artif. Intell. 78(1), 507–545 (1995). https://doi.org/10.1016/0004-3702(95)00025-9

4. B. A. Olshausen, C. H. Anderson, and D. C. Van Essen, “A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information,” J. Neurosci. 13, 4700–4719 (1993).

5. M. Z. Aziz and B. Mertsching, “Fast and robust generation of feature maps for region-based visual attention,” IEEE Trans. Image Process. 17(5), 633–644 (2008). https://doi.org/10.1109/TIP.2008.919365

6. C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: image segmentation using expectation maximization and its application to image querying,” IEEE Trans. Pattern Anal. Mach. Intell. 24(8), 1026–1038 (2002). https://doi.org/10.1109/TPAMI.2002.1023800

7. J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: semantics-sensitive integrated matching for picture libraries,” IEEE Trans. Pattern Anal. Mach. Intell. 23(9), 947–963 (2001). https://doi.org/10.1109/34.955109

Zhiqiang Li, Tao Fang, Hong Huo, Julian Zhu, "Color conspicuity map based on wavelet low-pass pyramid for popping out contours of salient objects," Optical Engineering 49(5), 050502 (1 May 2010). https://doi.org/10.1117/1.3425655
JOURNAL ARTICLE
3 PAGES

