Letter-scanning cameras (LSCs) form the front-end imaging systems for virtually all mail-scanning systems currently used to automatically sort mail products. As with any vision-dependent technology, the quality of the images generated by the camera is fundamental to the overall performance of the system. We present novel techniques for the objective evaluation of LSCs using comparative imaging, a technique that measures the fidelity of target images produced by a camera against an image of the same target captured at very high quality. Such a framework provides a unique opportunity to directly quantify the camera's ability to capture real-world targets, such as handwritten and printed text. Noncomparative techniques were also used to measure properties such as the camera's modulation transfer function, dynamic range, and signal-to-noise ratio. To simulate real-world imaging conditions, application-specific test samples were designed using actual mail product materials.
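To make the comparative-imaging idea concrete, a simple fidelity measure can be computed between an aligned high-quality reference scan and the camera's image of the same target. The sketch below uses PSNR purely as a stand-in; the abstract does not specify which fidelity measures the authors used, and the function and toy data here are illustrative assumptions.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio between a high-quality reference
    scan and a camera image of the same target (aligned, same size).
    Higher values indicate closer fidelity to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a synthetic "reference" target and a noisier "camera" capture.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
cam = np.clip(ref + rng.normal(0, 5, size=ref.shape), 0, 255)
fidelity = psnr(ref, cam)
```

In practice the camera image would first be registered (aligned and scaled) to the reference before any pixelwise comparison.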
Perceptual image distortion measures can play a fundamental role in evaluating and optimizing imaging systems
and image processing algorithms. Many existing measures are formulated to represent "just noticeable differences"
(JNDs), as measured in psychophysical experiments on human subjects. However, some image distortions,
such as those arising from small changes in the intensity of the ambient illumination, are far more tolerable to
human observers than those that disrupt the spatial structure of intensities and colors. Here, we introduce a
framework in which we quantify these perceptual distortions in terms of "just intolerable differences" (JIDs).
As in (Wang & Simoncelli, Proc. ICIP 2005), we first construct a set of spatio-chromatic basis functions to
approximate (as a first-order Taylor series) a set of "non-structural" distortions that result from changes in
lighting/imaging/viewing conditions. These basis functions are defined on local image patches, and are adaptive,
in that they are computed as functions of the undistorted reference image. This set is then augmented with a
complete basis arising from a linear approximation of the CIELAB color space. Each basis function is weighted
by a scale factor to convert it into units corresponding to JIDs. Each patch of the error image is represented
using this weighted overcomplete basis, and the overall distortion metric is computed by summing the squared
coefficients over all such (overlapping) patches. We implement an example of this metric, incorporating invariance
to small changes in the viewing and lighting conditions, and demonstrate that the resulting distortion values
are more consistent with human perception than those produced by CIELAB or S-CIELAB.
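The final steps of the metric (representing each overlapping patch of the error image in the weighted basis and summing the squared coefficients) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the basis and JID scale factors are left generic, and a least-squares (pseudoinverse) analysis is assumed for the overcomplete representation.

```python
import numpy as np

def jid_distortion(reference, distorted, basis, weights, patch=4):
    """Illustrative JID-style metric.  `basis` is a (patch*patch, K)
    matrix of basis functions; `weights` (length K) are the scale
    factors that convert coefficients into just-intolerable-difference
    units.  Each overlapping patch of the error image is represented in
    the weighted basis and the squared coefficients are summed."""
    err = np.asarray(distorted, float) - np.asarray(reference, float)
    B = basis * weights              # scale each basis function
    analysis = np.linalg.pinv(B)     # least-squares coefficients
    total = 0.0
    h, w = err.shape
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            e = err[i:i + patch, j:j + patch].ravel()
            c = analysis @ e         # coefficients in JID units
            total += float(c @ c)
    return total
```

Note that a larger scale factor on a basis function means distortions along that direction count for fewer JIDs, which is how tolerable ("non-structural") distortions are discounted.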
Recent years have seen a resurgent interest in eye movements during natural scene viewing. Aspects of eye movements that are driven by low-level image properties are of particular interest due to their applicability to biologically motivated artificial vision and surveillance systems. In this paper, we report an experiment in which we recorded observers’ eye movements while they viewed calibrated greyscale images of natural scenes. Immediately after viewing each image, observers were shown a test patch and asked to indicate if they thought it was part of the image they had just seen. The test patch was either randomly selected from a different image from the same database or, unbeknownst to the observer, selected from either the first or last location fixated on the image just viewed. We find that several low-level image properties differed significantly according to whether observers successfully recognized each patch. We also find that the differences between patch statistics for first and last fixations are small compared to the differences between hit and miss responses. The goal of the paper was to measure, in a natural setting free of cognitive task demands, the image properties that facilitate visual memory, and to observe the role played by the temporal location (first or last fixation) of the test patch. We propose that a memorability map of a complex natural scene may be constructed to represent the low-level memorability of local regions in a similar fashion to the familiar saliency map, which records bottom-up fixation attractors.
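The "low-level image properties" compared between hit and miss responses can be illustrated with two standard patch statistics, mean luminance and RMS contrast. The function below is a hypothetical sketch (the abstract does not list the exact statistics computed), assuming a calibrated greyscale image array and a fixation location in pixel coordinates.

```python
import numpy as np

def patch_stats(image, x, y, size=32):
    """Low-level statistics of a square patch centred on a fixation:
    mean luminance and RMS contrast (standard deviation of luminance
    divided by mean luminance).  Patch is clipped at image borders."""
    half = size // 2
    p = np.asarray(image, float)[max(y - half, 0):y + half,
                                 max(x - half, 0):x + half]
    mean_lum = p.mean()
    rms_contrast = p.std() / mean_lum if mean_lum > 0 else 0.0
    return mean_lum, rms_contrast
```

Statistics like these, aggregated over hit- versus miss-designated patches, would form the basis of the proposed memorability map.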
Seemingly complex tasks like visual search can be analyzed using a cognition-free, bottom-up framework. We sought to reveal strategies used by observers in visual search tasks using accurate eye tracking and image analysis at point of gaze. Observers were instructed to search for simple geometric targets embedded in 1/f noise. By
analyzing the stimulus at the point of gaze using the classification image (CI) paradigm, we discovered CI templates that indeed resembled the target. No such structure emerged for a random searcher. We demonstrate, qualitatively and quantitatively, that these CI templates are useful in predicting stimulus regions that draw
human fixations in search tasks. Filtering a 1/f noise stimulus with a CI results in a 'fixation prediction map'. A qualitative evaluation of the prediction was obtained by overlaying k-means clusters of observers' fixations on the prediction map. The fixations clustered around the local maxima in the prediction map. To obtain a quantitative comparison, we computed the Kullback-Leibler distance between the recorded fixations and the prediction. Using random-searcher CIs in Monte Carlo simulations, a distribution of this distance was obtained. The z-scores for the human CIs and the original target were -9.70 and -9.37, respectively, indicating that even in noisy stimuli, observers deploy their fixations efficiently toward likely targets rather than casting them randomly in the hope of fortuitously finding the target.
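The two quantitative steps described above (filtering the stimulus with a CI template to obtain a prediction map, and comparing recorded fixations to that map with a Kullback-Leibler distance) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the cross-correlation is done in 'valid' mode with a mean-removed template, and the KL computation assumes both maps are normalised to probability distributions.

```python
import numpy as np

def prediction_map(stimulus, template):
    """Cross-correlate the stimulus with a (mean-removed) CI template;
    high values mark regions resembling the template ('valid' mode)."""
    th, tw = template.shape
    t = template - template.mean()
    H = stimulus.shape[0] - th + 1
    W = stimulus.shape[1] - tw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(stimulus[i:i + th, j:j + tw] * t)
    return out

def kl_distance(fix_density, pred_map, eps=1e-12):
    """KL divergence D(fixations || prediction) between two maps,
    each normalised to a probability distribution."""
    p = fix_density / fix_density.sum()
    q = pred_map - pred_map.min() + eps   # shift to non-negative
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / q)))
```

In the Monte Carlo comparison, `kl_distance` would be evaluated for maps produced by random-searcher CIs to build the null distribution from which the reported z-scores are computed.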