In recent years, many saliency models have been proposed to simulate human visual attention. These models have attracted much attention for enabling fast and successful search for important objects in scene images.1, 2, 3, 4, 5, 6, 7 Depending on whether salient features are extracted from image segments5, 6, 7 or directly from the pixels of an image,1, 2, 3, 4 most previous saliency models can be divided into two categories: region-based and pixel-based approaches. As for region-based saliency, a representative model was presented by Aziz and Mertsching.5 In this model, the original image is first segmented into fragments by an image segmentation algorithm. Based on the segmented image, a color contrast map is generated in terms of the color theory of human vision, a symmetry map is constructed using a novel scanning-based method, a size contrast map is generated using a new algorithm proposed by Aziz, and eccentricity and orientation maps are computed from the moments of the segmented regions. As for pixel-based saliency,1, 2, 3, 4 a classic and representative saliency model is that of Itti, Koch, and Niebur.1 In this model, each input image is processed in parallel through three feature channels: intensity, color, and orientation. The outputs of these channels are ultimately combined to form a saliency map, which indicates the locations of important objects.
In Ref. 5, Aziz and Mertsching compared their results with those of the model in Ref. 1 and concluded that the salient regions in the color conspicuity map from the model by Itti, Koch, and Niebur could not reflect the contours of important objects. A similar phenomenon can also be observed in our experiment, as is seen later. To solve this problem, a new method for generating color conspicuity maps based on wavelet low-pass pyramids, instead of the Gaussian pyramid of the Ref. 1 model, is proposed in this work. In addition, several heuristic techniques are used to improve the results. In the color conspicuity map generated by our method, the contours of important objects pop out clearly from the salient areas.
Algorithm for Generating Color Conspicuity Map
Let r, g, and b be the red, green, and blue components of the input image, respectively, and let r(x, y), g(x, y), and b(x, y) denote their values at location (x, y). An intensity image I is produced by I(x, y) = [r(x, y) + g(x, y) + b(x, y)]/3, where I(x, y) is the intensity value at location (x, y) in I. When the intensity value of a pixel in a scene image is very small, the color information of the pixel is hardly perceived. Thus, when I(x, y) is smaller than 1/10 of the maximum over the whole intensity image I, the values of r(x, y), g(x, y), and b(x, y) are set to zero. In terms of r, g, and b, four new color component images are constructed and denoted by R, G, B, and Y, respectively, with values R(x, y), G(x, y), B(x, y), and Y(x, y) at location (x, y) defined as follows: R(x, y) = r(x, y) − [g(x, y) + b(x, y)]/2, G(x, y) = g(x, y) − [r(x, y) + b(x, y)]/2, B(x, y) = b(x, y) − [r(x, y) + g(x, y)]/2, and Y(x, y) = [r(x, y) + g(x, y)]/2 − |r(x, y) − g(x, y)|/2 − b(x, y). Where any of R(x, y), G(x, y), B(x, y), and Y(x, y) are negative, these values are set to zero. In terms of R, G, B, and Y, four image pyramids R(σ), G(σ), B(σ), and Y(σ), σ = 0, 1, ..., N − 1, are constructed respectively, where N is the number of levels in the pyramid. Unlike the model of Itti, Koch, and Niebur, in this work a wavelet low-pass pyramid is used to generate each image pyramid instead of the Gaussian pyramid.
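The construction of the intensity image and the four color component images can be sketched in Python. This is a minimal illustration of the standard broadly tuned color channels described above; the function name and the NumPy implementation are ours, not from the paper.

```python
import numpy as np

def color_components(img):
    """Compute the intensity image and the four broadly tuned color
    components R, G, B, Y from an H x W x 3 float array (r, g, b)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    inten = (r + g + b) / 3.0                      # intensity image I
    # Suppress color at pixels whose intensity is below 1/10 of the
    # maximum intensity (hue is barely perceivable there).
    mask = inten < 0.1 * inten.max()
    r, g, b = r.copy(), g.copy(), b.copy()
    r[mask] = g[mask] = b[mask] = 0.0
    R = r - (g + b) / 2.0                          # red component
    G = g - (r + b) / 2.0                          # green component
    B = b - (r + g) / 2.0                          # blue component
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b    # yellow component
    # Negative responses are clipped to zero.
    R, G, B, Y = (np.maximum(c, 0.0) for c in (R, G, B, Y))
    return inten, R, G, B, Y
```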
Before generating a color conspicuity map, the color feature maps need to be constructed. Let R(c), G(c), B(c), and Y(c) denote the pyramid images on a center level c, and let R(s), G(s), B(s), and Y(s) denote the pyramid images on a surround level s, with c ∈ {2, 3, 4} and s = c + δ, δ ∈ {3, 4}. To generate color feature maps, these images are resized to a common finer size. In the Ref. 1 model, the resizing size is that of level 4 of the pyramid. In this work, the finer size is that of level 0, i.e., the size of the original image: when the original image is too small and the pyramid images are resized to the size of level 4, the resized pyramid images may become too small. Let R′(c), G′(c), B′(c), Y′(c) and R′(s), G′(s), B′(s), Y′(s) be the resized images, and let R′(c)(x, y), etc., denote their values at location (x, y). Twelve color feature maps, denoted by RG(c, s) and BY(c, s), can be generated by RG(c, s)(x, y) = |[R′(c)(x, y) − G′(c)(x, y)] − [G′(s)(x, y) − R′(s)(x, y)]| and BY(c, s)(x, y) = |[B′(c)(x, y) − Y′(c)(x, y)] − [Y′(s)(x, y) − B′(s)(x, y)]|, where RG(c, s)(x, y) and BY(c, s)(x, y) are the values at location (x, y) in RG(c, s) and BY(c, s), respectively.
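The center-surround differencing above can be sketched as follows. This is an illustrative NumPy version under simplifying assumptions (nearest-neighbour resizing, differences taken at the center-level size); the helper names are ours.

```python
import numpy as np

def upsample_to(img, shape):
    """Nearest-neighbour resize of a coarse map to `shape`
    (a simple stand-in for the interpolation used in practice)."""
    ys = (np.arange(shape[0]) * img.shape[0]) // shape[0]
    xs = (np.arange(shape[1]) * img.shape[1]) // shape[1]
    return img[np.ix_(ys, xs)]

def color_feature_maps(Rp, Gp, Bp, Yp, centers=(2, 3, 4), deltas=(3, 4)):
    """Center-surround color feature maps RG(c, s) and BY(c, s),
    s = c + delta, computed at the size of the center level.
    Rp, Gp, Bp, Yp are pyramids (lists of 2-D arrays, level 0 = finest).
    The default settings yield 12 maps (6 RG + 6 BY)."""
    maps = []
    for c in centers:
        for d in deltas:
            s = c + d
            shape = Rp[c].shape
            rg = np.abs((Rp[c] - Gp[c]) - upsample_to(Gp[s] - Rp[s], shape))
            by = np.abs((Bp[c] - Yp[c]) - upsample_to(Yp[s] - Bp[s], shape))
            maps.extend([rg, by])
    return maps
```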
In the model by Itti, Koch, and Niebur, a color conspicuity map is constructed by integrating all color feature maps after resizing them to the size of level 4. In this work, the color feature maps are resized to the size of the original image and then combined to generate the color conspicuity map. Thereafter, a simple but effective operation is applied: each element of the color conspicuity map is squared to generate a new map. The square operation stretches the range of values in the color conspicuity map, which suppresses redundant salient areas.
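The squaring step can be illustrated in a few lines. The normalization to [0, 1] before squaring is an assumption of ours to make the stretching effect explicit; the paper only specifies the element-wise square.

```python
import numpy as np

def sharpen_conspicuity(cmap):
    """Square each element of a color conspicuity map.
    After normalizing to [0, 1], squaring stretches the value range,
    so weakly salient (redundant) regions are suppressed relative to
    strong peaks."""
    if cmap.max() > 0:
        cmap = cmap / cmap.max()
    return cmap ** 2
```

For example, a region at 20% of the peak response drops to 4% after squaring, while the peak itself is unchanged.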
Wavelet low-pass pyramid
Level 0, the base of the pyramid, is the original image. The (l + 1)'th-level image is obtained from the l'th-level image by: 1. transforming the l'th-level image by the wavelet transform, and 2. extracting the low-pass part of the resultant image from step 1 as the (l + 1)'th-level image.
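The two steps above can be sketched for the Haar wavelet, whose low-pass (LL) subband reduces to a scaled 2x2 block average. This is a minimal NumPy sketch assuming the Haar basis; a library such as PyWavelets would be used for other wavelet functions.

```python
import numpy as np

def haar_lowpass(img):
    """One Haar analysis step: the LL (low-pass) subband equals the
    average of each 2x2 block scaled by 2. The image is cropped to
    even dimensions first."""
    h, w = img.shape[0] & ~1, img.shape[1] & ~1
    blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3)) * 2.0   # Haar LL scaling

def wavelet_lowpass_pyramid(img, levels):
    """Level 0 is the original image; each further level is the
    low-pass subband of the wavelet transform of the level below."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(haar_lowpass(pyr[-1]))
    return pyr
```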
In our experiment, original images from Refs. 1, 5, and 8 are used as input images to generate color conspicuity maps. These images are shown in Figs. 1 and 2. Because the color conspicuity maps based on the Haar wavelet function are similar to the ones based on other wavelet functions, only the color conspicuity maps based on the Haar wavelet function are generated. Figures 1 and 2 show the color conspicuity maps from the Ref. 1 model based on the Gaussian pyramid. It is easy to see that all salient regions in these color conspicuity maps point to important objects, such as the red telephone, golden building, balloon, dog, etc. However, the contours of these important objects are not captured by the salient regions: the salient regions consist of many fragments that cover only part of each object. Figures 1 and 2 also show the color conspicuity maps based on the wavelet low-pass pyramid. Compared with the maps based on the Gaussian pyramid, the contours represented by the salient regions based on the wavelet low-pass pyramid are clearly better. Objects such as the postbox, building, animal, etc., can be distinctly perceived in these color conspicuity maps.
For the quantitative comparison of our results with the ones from Ref. 1, a comparison criterion is set. All of the labeled maps corresponding to images in Figs. 1 and 2 are manually annotated in the following way: 20 individuals are invited to view an image for one second each. After that, the individuals are asked what they saw first. If one object in the image is noticed first by most of the individuals, this object is taken as the salient one and labeled to generate a labeled map, as shown in Fig. 3. In the labeled map, the white areas represent the salient areas. Then, the salient fragments in the color conspicuity maps from our model and the model from Itti, Koch, and Niebur are analyzed. If the salient units (pixels) in the conspicuity map are also salient in the labeled map, they are counted as hit units; otherwise, they are counted as false units. For the same image, if the number of hit units is nearly the same in the two models, then the model with fewer false units is better; if the number of false units is nearly the same, then the model with more hit units is better. The comparative results are listed in Table 1. The term “S area” means the number of labeled salient units (the white pixels in the labeled map), “H units” is the number of hit units, and “F units” is the number of false units. As a whole, the results indicate that our model outperforms the model in Ref. 1.
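The hit/false-unit criterion can be sketched as follows. The binarization of the conspicuity map by a fixed threshold is our assumption (the paper does not state how a pixel of the conspicuity map is declared salient); the function name is ours.

```python
import numpy as np

def hit_false_units(conspicuity, labeled, thresh=0.5):
    """Count hit and false units against a hand-labeled map.
    A pixel of the conspicuity map is treated as salient if it
    exceeds `thresh` after normalization to [0, 1] (an assumed
    binarization rule). `labeled` is a boolean map, True = salient.
    Hits are salient pixels inside the labeled region; false units
    are salient pixels outside it."""
    cm = conspicuity
    if cm.max() > 0:
        cm = cm / cm.max()
    salient = cm > thresh
    hits = np.count_nonzero(salient & labeled)
    false = np.count_nonzero(salient & ~labeled)
    return hits, false
```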
Comparison data corresponding to criterion.
|Image|S area|Our model||Ref. 1 model||
|||H units|F units|H units|F units|
In the model by Itti, Koch, and Niebur, based on the Gaussian pyramid, the salient regions in the color conspicuity map cover only part of each object. In this work, the proposed method based on a wavelet low-pass pyramid generates color conspicuity maps whose salient regions describe the contours of the objects in an image well. Therefore, our method can improve the quality of scene analysis and object search.
This work is supported by the National Basic Research Program of China (973 program) under grant number 2006CB701303, and the National High Technology Research and Development Program of China (863 program) under grant number 2006AA12Z105.