Predicting which areas of an image are perceptually salient or attended to has become an essential pre-requisite
of many computer vision applications. Because observers are notoriously unreliable in remembering where they
look a posteriori, and because asking where they look while observing the image necessarily influences the results,
ground truth about saliency and visual attention has to be obtained by gaze tracking methods.
From the early work of Buswell and Yarbus to the most recent forays in computer vision, there has been, perhaps
unfortunately, little agreement on standardisation of eye tracking protocols for measuring visual attention.
As the number of parameters involved in experimental methodology can be large, their individual influence on
the final results is not well understood. Consequently, the performance of saliency algorithms, when assessed by
correlation techniques, varies greatly across the literature.
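One such correlation technique is the Pearson correlation coefficient between a model's saliency map and a fixation density map built from recorded gaze data. The following is a minimal Python/NumPy sketch of that metric; the function and argument names are illustrative assumptions, not the evaluation code used in this work.

```python
import numpy as np

def saliency_cc(predicted, ground_truth):
    """Pearson correlation coefficient (CC) between a predicted saliency map
    and a ground-truth fixation density map, both 2-D arrays on the same grid.
    Illustrative sketch only."""
    # Standardise each map to zero mean and unit variance.
    p = (predicted - predicted.mean()) / (predicted.std() + 1e-12)
    g = (ground_truth - ground_truth.mean()) / (ground_truth.std() + 1e-12)
    # The mean of the element-wise product of standardised maps is the CC.
    return float((p * g).mean())
```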
In this paper, we concern ourselves with the problem of image quality. Specifically: where people look when
judging images. We show that in this case, the performance gap between existing saliency prediction algorithms
and experimental results is significantly larger than otherwise reported. To understand this discrepancy, we first
devise an experimental protocol that is adapted to the task of measuring image quality. In a second step, we
compare our experimental parameters with those of existing methods and show that much of the variability
can be directly ascribed to these differences in experimental methodology and choice of variables.
In particular, the choice of a task, e.g., judging image quality vs. free viewing, has a great impact on measured
saliency maps, suggesting that even for a mildly cognitive task, ground truth obtained by free viewing does not
transfer well. Careful analysis of the prior art also reveals that systematic bias can occur depending on instrument
calibration and the choice of test images.
We conclude this work by proposing a set of parameters, tasks and images that can be used to compare the
various saliency prediction methods in a manner that is meaningful for image quality assessment.
Digital camera sensors are sensitive to wavelengths ranging from the ultraviolet (200-400nm) to the near-infrared
(700-1100nm) bands. This range is, however, reduced because the aim of photographic cameras is to capture and
reproduce the visible spectrum (400-700nm) only. Ultraviolet radiation is filtered out by the optical elements of
the camera, while a specifically designed "hot-mirror" is placed in front of the sensor to prevent near-infrared
contamination of the visible image.
We propose that near-infrared data can prove remarkably useful in colour constancy, both to estimate the
incident illumination and to detect the location of different illuminants in a multiply lit scene.
Examining the spectral power distributions of common illuminants shows that very strong differences exist between
the near-infrared and visible bands: incandescent illumination peaks in the near-infrared, while fluorescent
sources emit almost exclusively within the visible band.
We show that illuminants can be estimated by simply looking at the ratios of two images: a standard RGB
image and a near-infrared-only image. As the differences between illuminants are amplified in the near-infrared,
this estimation proves to be more reliable than using only the visible band. Furthermore, in most multiple
illumination situations, one of the light sources will be a predominantly near-infrared emitter (e.g., flash, incandescent)
while the other will emit mostly in the visible band (e.g., fluorescent, skylight). Using near-infrared to RGB image
ratios allows us to accurately pinpoint the location of the different illuminants and to recover a lighting map.
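As a rough illustration of this idea, the Python/NumPy sketch below thresholds the per-pixel near-infrared to visible ratio into a crude two-illuminant lighting map. It assumes linear, exposure-matched inputs; the function name and the fixed threshold are illustrative assumptions, not the exact procedure described here.

```python
import numpy as np

def nir_visible_lighting_map(rgb, nir, threshold=1.0):
    """Crude lighting map from the per-pixel near-infrared / visible ratio.

    rgb : H x W x 3 array of linear visible-band intensities
    nir : H x W     array of near-infrared intensities on the same exposure scale

    High ratios suggest a NIR-rich source (incandescent, flash); low ratios
    suggest a mostly visible-emitting source (fluorescent, skylight).
    Illustrative sketch only.
    """
    visible = rgb.mean(axis=2) + 1e-6   # average of R, G, B; avoid division by zero
    ratio = nir / visible               # per-pixel NIR-to-visible ratio
    nir_dominant = ratio > threshold    # rough two-illuminant segmentation
    return ratio, nir_dominant
```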