3D processing techniques are promising, but several hurdles remain. This paper examines two of
them. The first is the management of high disparities, which is currently not well mastered and strongly
affects the viewing of 3D scenes on stereoscopic screens. The second concerns the salient regions of the scene,
commonly called regions of interest (RoIs) in the image processing domain. A problem arises when a video scene
contains more than one RoI: it is then difficult for the eyes to scan them, especially
when the depth difference between them is large. In this contribution, the 3D experience is improved by applying
RoI-related effects. The shift between the two views is adaptively adjusted to obtain a null disparity on a given
area of the scene. In the proposed approach, these areas are the visually interesting ones. Keeping a constant disparity on the
salient areas improves the viewing experience over the video sequence.
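The view-shift adjustment described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a dense disparity map and a binary RoI mask are already available, it assumes the convention that a point at column x in the left view appears at column x - d in the right view, and the function names `roi_zero_disparity_shift` and `shift_view` are hypothetical. The median RoI disparity is used as the shift, which is one possible choice among others.

```python
import numpy as np

def roi_zero_disparity_shift(disparity, roi_mask):
    """Return the integer horizontal shift that nulls the RoI disparity.

    Assumption: the median disparity inside the RoI represents the region.
    """
    roi_values = disparity[roi_mask]
    if roi_values.size == 0:
        return 0  # no RoI detected: leave the views untouched
    return int(np.round(np.median(roi_values)))

def shift_view(view, shift):
    """Shift an image horizontally by `shift` pixels (positive = to the right).

    Columns uncovered by the shift are filled by repeating the edge column,
    a simplification chosen for this sketch.
    """
    if shift == 0:
        return view.copy()
    shifted = np.empty_like(view)
    if shift > 0:
        shifted[:, shift:] = view[:, :-shift]
        shifted[:, :shift] = view[:, :1]
    else:
        shifted[:, :shift] = view[:, -shift:]
        shifted[:, shift:] = view[:, -1:]
    return shifted

# Toy example: background disparity of 1 pixel, RoI disparity of 3 pixels.
disparity = np.full((4, 6), 1.0)
roi_mask = np.zeros((4, 6), dtype=bool)
disparity[1:3, 2:4] = 3.0
roi_mask[1:3, 2:4] = True

shift = roi_zero_disparity_shift(disparity, roi_mask)
right_view = np.arange(24).reshape(4, 6)
adjusted_right = shift_view(right_view, shift)
residual = disparity - shift  # disparity remaining after the adjustment
```

After the adjustment, the residual disparity is zero inside the RoI, while the rest of the scene keeps its disparity relative to that region. Applying the same rule frame by frame keeps the salient area at a constant (null) disparity over the sequence.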
Most efficient objective image or video quality metrics are based on properties and models of the Human Visual System (HVS). This paper deals with two major drawbacks related to the HVS properties used in such metrics when applied in the DWT domain: subband decomposition and masking effect. The multi-channel behavior of the HVS can be emulated by applying a perceptual subband decomposition. Ideally, this is performed in the Fourier domain, but the computational cost is too high for many applications. A spatial transform such as the DWT is a good alternative for reducing the computational effort, but the correspondence between the perceptual subbands and the usual wavelet ones is not straightforward. The advantages and limitations of the DWT are discussed and compared with models based on the DFT. Visual masking is a sensitive issue, and several models exist in the literature. The simplest models can only predict the visibility threshold for very simple cues, while natural images call for more complex approaches such as entropy masking. The main issue lies in finding a revealing measure of the surround influences and of the adaptation: should we use the spatial activity, the entropy, the type of texture, etc.? In this paper, different visual masking models using the DWT are discussed and compared.
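To make the subband mismatch concrete, the following minimal sketch computes one level of a 2D Haar DWT with NumPy. The Haar filter and the function name `haar_dwt2` are assumptions made for illustration; the abstract does not state which wavelet is used. Each level yields only three detail subbands, and the diagonal one mixes both diagonal orientations, which is one reason the mapping to orientation-selective perceptual channels is not direct.

```python
import numpy as np

def haar_dwt2(image):
    """One level of an orthonormal 2D Haar DWT (illustrative sketch).

    Returns a dict with the approximation band "LL" and three detail bands.
    The first letter is the filter applied horizontally (along each row),
    the second the filter applied vertically (along each column).
    """
    x = np.asarray(image, dtype=float)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]  # crop to even dimensions

    s = np.sqrt(2.0)
    # Horizontal filtering: combine pairs of neighbouring columns.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Vertical filtering: combine pairs of neighbouring rows.
    return {
        "LL": (lo[0::2, :] + lo[1::2, :]) / s,  # coarse approximation
        "LH": (lo[0::2, :] - lo[1::2, :]) / s,  # responds to horizontal edges
        "HL": (hi[0::2, :] + hi[1::2, :]) / s,  # responds to vertical edges
        "HH": (hi[0::2, :] - hi[1::2, :]) / s,  # both diagonal orientations
    }

image = np.arange(64.0).reshape(8, 8)
bands = haar_dwt2(image)
# Energy per subband, a simple starting point for a spatial-activity measure.
energy = {name: float(np.sum(band ** 2)) for name, band in bands.items()}
```

Because this transform is orthonormal, the subband energies sum to the energy of the input image, so they can be compared directly across bands.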
In this paper, a coherent computational model of visual selective attention for color pictures is described and its performance is evaluated. The model, based on some important behaviours of the human visual system, is composed of four parts: visibility, perception, perceptual grouping and saliency map construction. This paper focuses mainly on assessing its performance through extended subjective and objective comparisons with real fixation points captured by an eye-tracking system while observers viewed the images in a task-free mode. Using this ground truth, qualitative and quantitative comparisons have been made in terms of the linear correlation coefficient (CC) and the Kullback-Leibler divergence (KL). On a set of 10 natural color images, the results show that the linear correlation coefficient and the Kullback-Leibler divergence are about 0.71 and 0.46, respectively. The CC and KL measures obtained with this model are improved by about 4% and 7%, respectively, compared to the best model proposed by L. Itti. Moreover, by comparing the ability of our model to predict eye movements produced by an average observer, we can conclude that our model succeeds quite well in predicting the spatial locations of the most important areas of the image content.
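The two measures named above can be sketched as follows for a predicted saliency map and a fixation density map of the same size. This is a minimal illustration under stated assumptions: the abstract does not specify how the maps are normalized or in which direction the divergence is taken, so here both maps are normalized to sum to one and the divergence is computed from the ground truth to the prediction. The function names `linear_cc` and `kl_divergence` and the `eps` parameter are hypothetical.

```python
import numpy as np

def linear_cc(predicted, ground_truth):
    """Pearson linear correlation coefficient between two maps."""
    p = np.asarray(predicted, dtype=float).ravel()
    t = np.asarray(ground_truth, dtype=float).ravel()
    return float(np.corrcoef(p, t)[0, 1])

def kl_divergence(predicted, ground_truth, eps=1e-12):
    """Kullback-Leibler divergence KL(ground_truth || predicted).

    Both maps are turned into probability distributions; `eps` avoids
    taking the logarithm of zero (an assumption of this sketch).
    """
    p = np.asarray(predicted, dtype=float).ravel() + eps
    t = np.asarray(ground_truth, dtype=float).ravel() + eps
    p /= p.sum()
    t /= t.sum()
    return float(np.sum(t * np.log(t / p)))

# Toy maps: a fixation density and a prediction that is partly misplaced.
fixation_density = np.array([[0.0, 1.0, 0.0],
                             [1.0, 4.0, 1.0],
                             [0.0, 1.0, 0.0]])
prediction = np.array([[0.0, 1.0, 1.0],
                       [1.0, 3.0, 2.0],
                       [0.0, 1.0, 0.0]])

cc = linear_cc(prediction, fixation_density)
kl = kl_divergence(prediction, fixation_density)
```

A CC close to 1 and a KL close to 0 both indicate that the prediction resembles the recorded fixations; the two measures are complementary because CC is insensitive to linear rescaling while KL compares the normalized distributions.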
The saliency-based or bottom-up model of visual attention presented in this paper deals with still color images. The
model we built is based on numerous properties of the human visual system (HVS), thus providing a biologically
plausible system. The computation of early visual features such as color and orientation is a key step for any bottom-up
model, and the way these features are extracted readily distinguishes one model from another. The novelty of
the proposed approach lies in the fact that the computation of early visual features is fully based on an HVS model
that projects the picture into an opponent-color space and applies a perceptual decomposition, contrast
sensitivity and masking functions. Moreover, a strategy essentially based on a center surround mechanism and on the
perceptual grouping phenomenon highlights conspicuous locations by combining visual feature maps. A saliency map,
defined as a 2D topographic representation of conspicuity, is then deduced. The model is applied to a number of
natural images, and the results are compared with those of a well-known bottom-up model.
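As a rough illustration of the center-surround idea, the sketch below builds a toy saliency map with NumPy. It is a heavily simplified stand-in, not the model described above: the opponent channels are a crude linear combination of RGB, the center-surround step is a difference of Gaussians, and the perceptual decomposition, contrast sensitivity, masking and perceptual grouping stages are omitted. All function names and the sigma values are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def blur(channel, sigma):
    """Separable Gaussian blur with edge padding (output keeps input shape)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(channel, radius, mode="edge")
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def center_surround(channel, sigma_center=1.0, sigma_surround=4.0):
    """Absolute difference between a fine (center) and a coarse (surround) blur."""
    return np.abs(blur(channel, sigma_center) - blur(channel, sigma_surround))

def saliency_map(rgb):
    """Average of normalized center-surround maps over three opponent channels."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Crude opponent channels (assumption): achromatic, red-green, blue-yellow.
    channels = [(r + g + b) / 3.0, r - g, b - (r + g) / 2.0]
    maps = []
    for channel in channels:
        response = center_surround(channel)
        peak = response.max()
        maps.append(response / peak if peak > 0 else response)
    return sum(maps) / len(maps)

# Toy image: a small red patch on a black background.
image = np.zeros((32, 32, 3))
image[14:18, 14:18, 0] = 1.0
saliency = saliency_map(image)
peak_row, peak_col = np.unravel_index(np.argmax(saliency), saliency.shape)
```

In this toy case the peak of the map falls on the red patch, which is the only location that differs from its surround.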