Predicting which areas of an image are perceptually salient or attended to has become an essential prerequisite
for many computer vision applications. Because observers are notoriously unreliable at remembering where they
look a posteriori, and because asking where they look while observing the image necessarily influences the results,
ground truth about saliency and visual attention has to be obtained by gaze tracking methods.
From the early work of Buswell and Yarbus to the most recent forays in computer vision there has been, perhaps
unfortunately, little agreement on standardisation of eye tracking protocols for measuring visual attention.
As the number of parameters involved in experimental methodology can be large, their individual influence on
the final results is not well understood. Consequently, the performance of saliency algorithms, when assessed by
correlation techniques, varies greatly across the literature.
In this paper, we concern ourselves with the problem of image quality, specifically where people look when
judging images. We show that in this case, the performance gap between existing saliency prediction algorithms
and experimental results is significantly larger than otherwise reported. To understand this discrepancy, we first
devise an experimental protocol that is adapted to the task of measuring image quality. In a second step, we
compare our experimental parameters with those of existing methods and show that much of the variability
can be ascribed directly to these differences in experimental methodology and choice of variables.
In particular, the choice of a task, e.g., judging image quality vs. free viewing, has a great impact on measured
saliency maps, suggesting that even for a mildly cognitive task, ground truth obtained by free viewing does not
transfer well. Careful analysis of the prior art also reveals that systematic bias can occur depending on instrumental
calibration and the choice of test images.
We conclude this work by proposing a set of parameters, tasks and images that can be used to compare the
various saliency prediction methods in a manner that is meaningful for image quality assessment.