Proc. SPIE. 10410, Unconventional and Indirect Imaging, Image Reconstruction, and Wavefront Sensing 2017
KEYWORDS: 3D image reconstruction, Computer simulations, 3D modeling, Inverse problems, Machine vision, Visual system, Human vision and color perception, Virtual reality, 3D vision, 3D image processing
Virtual reality applications provide an opportunity to test human vision in well-controlled scenarios that would be
difficult to generate in real physical spaces. This paper presents a study intended to evaluate the importance of the
regularity priors used by the human visual system. Using a CAVE simulation, subjects viewed virtual objects in a variety
of experimental manipulations. In the first experiment, the subject was asked to count the objects in a scene that was
viewed either right-side-up or upside-down for 4 seconds. The subject counted more accurately in the right-side-up
condition regardless of the presence of binocular disparity or color. In the second experiment, the subject was asked to
reconstruct the scene from a different viewpoint. Reconstructions were accurate, but the position and orientation error
was twice as high when the scene was rotated by 45°, compared to 22.5°. Similarly to the first experiment, there was
little difference between monocular and binocular viewing. In the third experiment, the subject was asked to adjust the
position of one object to match the depth extent to the frontal extent among three objects. Performance was best with
symmetrical objects and became poorer with asymmetrical objects and poorest with only small circular markers on the
floor. Finally, in the fourth experiment, we demonstrated reliable performance in monocular and binocular recovery of
3D shapes of objects standing naturally on the simulated horizontal floor. Based on these results, we conclude that
gravity, horizontal ground, and symmetry priors play an important role in veridical perception of scenes.
We present an approach to figure/ground organization using mirror symmetry as a general-purpose and biologically motivated prior. Psychophysical evidence suggests that the human visual system makes use of symmetry in producing three-dimensional (3-D) percepts of objects. 3-D symmetry aids in scene organization because (i) almost all objects exhibit symmetry, and (ii) configurations of objects are not likely to be symmetric unless they share some additional relationship. No general-purpose approach is known for solving 3-D symmetry correspondence in two-dimensional (2-D) camera images, because few invariants exist. Therefore, we present a general-purpose method for finding 3-D symmetry correspondence by pairing the problem with the two-view geometry of the binocular correspondence problem. Mirror symmetry is a spatially global property that is not likely to be lost in the spatially local noise of binocular depth maps. We tested our approach on a corpus of 180 images collected indoors with a stereo camera system. K-means clustering was used as a baseline for comparison. The informative nature of the symmetry prior makes it possible to cluster data without a priori knowledge of which objects may appear in the scene, and without knowing how many objects there are in the scene.
To achieve high perceptual quality of compressed images, many objective image quality metrics for compression artifact evaluation and reduction have been developed based on characterization of local image features. However, it is the end user who judges image quality in various applications, so validating how well these metrics predict human perception is important and necessary. In this paper, we present a preliminary psychophysical experiment designed to capture human perception of local ringing artifacts in JPEG images at different severity levels. Observers are asked to annotate the compressed image where they perceive artifacts along the edges, directly on the screen using an interactive tablet display. They are asked to classify the severity of artifacts into one of three levels: Strong, Medium, and Light. We process the hand-marked data into a ringing visibility edge map showing a ringing severity mean opinion score (MOS) at every edge pixel. The perceptual information captured in this experiment enables us to study the correlation between human perception and local image features, which is an important step toward the goal of developing a no-reference (NR) objective metric that predicts the visibility of JPEG ringing artifacts in alignment with the assessments of human observers.
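The per-pixel MOS aggregation described above can be sketched simply. The severity-to-score mapping (Strong=3, Medium=2, Light=1, unmarked=0), the averaging rule, and all names below are illustrative assumptions, not the study's actual pipeline:

```python
# Hypothetical sketch: fold per-observer severity annotations into a mean
# opinion score (MOS) at each edge pixel. The 3/2/1/0 scale is assumed.
SEVERITY = {"strong": 3, "medium": 2, "light": 1}

def ringing_mos_map(annotations, edge_pixels):
    """annotations: one {pixel: severity_label} dict per observer.
    Returns a {pixel: MOS} map averaged over all observers."""
    n = len(annotations)
    return {p: sum(SEVERITY.get(a.get(p, ""), 0) for a in annotations) / n
            for p in edge_pixels}

# Two toy observers marking pixels along one edge
obs1 = {(10, 4): "strong", (11, 4): "light"}
obs2 = {(10, 4): "medium"}
mos = ringing_mos_map([obs1, obs2], [(10, 4), (11, 4), (12, 4)])
```

Unmarked pixels contribute a score of 0 here, pulling partially marked pixels toward invisibility; whether the actual study averages over all observers or only over those who marked a pixel is not stated in the abstract.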
Automatically quantifying the aesthetic appeal of images is an interesting problem in computer science and image
processing. In this paper, we incorporate aesthetic properties and convert them into computable image features for
classifying photographs taken by amateur and professional photographers. In particular, color histograms, spatial edge
distribution, and repetition identification are used as features. Results of experiments on professional and amateur
photograph data sets confirm the discriminative power of these features.
Applications that classify and search documents based on their visual appearance need to recognize what
document features are the most critical to human perception when humans compare the documents. This paper presents
the results of a psychophysical experiment where subjects were asked to group the documents based on their visual
similarity. Results from 15 subjects were saved into similarity matrices, and tested for inter-rater agreement. The
similarity matrix averaged across the subjects was analyzed using agglomerative hierarchical clustering to identify the
clusters. The subjects' clustering was approximated with a weighted sum of four distance matrices that we calculated
based on four document features. We identified the relative importance of the document features using an optimization
method. Then, we tested the approximation using K-fold cross validation and the K-nearest neighbor algorithm. The
results of the testing confirm the effectiveness of our approach.
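The approximation step described above, combining per-feature distance matrices with learned weights and then classifying by nearest neighbor, can be sketched as follows. The toy matrices, the weights, and the 1-nearest-neighbor choice are illustrative assumptions; the paper's optimization method and fold counts are not reproduced here:

```python
# Hypothetical sketch: weighted sum of per-feature distance matrices,
# then nearest-neighbor classification over the combined distances.

def combine(distance_matrices, weights):
    """Weighted sum of K distance matrices, each an NxN list of lists."""
    n = len(distance_matrices[0])
    return [[sum(w * d[i][j] for w, d in zip(weights, distance_matrices))
             for j in range(n)] for i in range(n)]

def nearest_neighbor_label(dist, labels, i):
    """Label of document i's nearest neighbor (excluding i itself)."""
    j = min((k for k in range(len(labels)) if k != i),
            key=lambda k: dist[i][k])
    return labels[j]

# Two toy feature distances over four documents in clusters {0,1} and {2,3}
d1 = [[0, 1, 9, 9], [1, 0, 9, 9], [9, 9, 0, 1], [9, 9, 1, 0]]
d2 = [[0, 2, 8, 8], [2, 0, 8, 8], [8, 8, 0, 2], [8, 8, 2, 0]]
dist = combine([d1, d2], [0.7, 0.3])
labels = ["A", "A", "B", "B"]
preds = [nearest_neighbor_label(dist, labels, i) for i in range(4)]
```

Leave-one-out prediction as done here generalizes directly to the K-fold cross-validation mentioned in the abstract by holding out a fold of documents instead of a single one.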
The use of color electrophotographic (EP) laser printing systems is growing because of their declining cost. Thus, the print quality of color EP laser printers has become increasingly important. Since text and lines are indispensable to print quality, many studies have proposed methods for measuring these print quality attributes. Toner scatter caused by toner overdevelopment in color EP laser printers can significantly impact print quality. A conventional approach to reduce toner overdevelopment is to restrict the color gamut of printers. However, this can result in undesired color shifts and the introduction of halftone texture. Coring, defined as a process where the colorant level is reduced in the interior of text or characters, is a remedy for these shortcomings. The desired amount of reduction for coring depends on line width and overall nominal colorant level. In previous work, these amounts were chosen on the basis of data on the perception of edge blur obtained from softcopy simulation of the blurring. We describe psychophysical studies to directly establish optimal coring values as a function of line width and nominal colorant level. For each line width and nominal colorant level, this is done by asking human subjects to choose the minimum amount of coring that is necessary to eliminate the perception of toner scatter. We conduct four separate psychophysical studies to address different aspects of this question.
The use of color electrophotographic (EP) laser printing systems is growing because of their declining cost.
Thus, the print quality of color EP laser printers is more important than ever before. Since text and lines are
indispensable to print quality, many studies have proposed methods for measuring these print quality attributes.
Toner scatter caused by toner overdevelopment in color EP laser printers can significantly impact print quality.
A conventional approach to reduce toner overdevelopment is to restrict the color gamut of printers. However,
this can result in undesired color shifts and the introduction of halftone texture in light regions. Coring, defined
as a process whereby the colorant level is reduced in the interior of text or characters, is a remedy for these
shortcomings. The desired amount of reduction for coring depends on line width and overall nominal colorant
level. In previous work, these amounts were chosen on the basis of data on the perception of edge blur that was
published over 25 years ago.
Fine-pitch banding is one of the most unwanted artifacts in laser electrophotographic (EP) printers. It is
perceived as a quasiperiodic fluctuation in the process direction. Therefore, it is essential for printer vendors to
know how banding is perceived by humans in order to improve print quality. Monochrome banding has been
analyzed and assessed by many researchers, but there is no literature that deals with the banding of color laser
printers as measured from actual prints. The study of color banding is complicated by the fact that the color
banding signal is physically defined in a three-dimensional color space, while banding perception is described
in a one-dimensional sense such as more banding or less banding. In addition, the color banding signal arises
from the independent contributions of the four primary colorant banding signals. It is not known how these four
distinct signals combine to give rise to the perception of color banding. In this paper, we develop a methodology
to assess the banding visibility of the primary colorant cyan based on human visual perception. This is our
first step toward studying the more general problem of color banding in combinations of two or more colorants.
In our method, we print and scan the cyan test patch, and extract the banding profile as a one-dimensional signal so that we can freely adjust the intensity of banding. Thereafter, by exploiting the pulse
width modulation capability of the laser printer, the extracted banding profile is used to modulate a pattern
consisting of periodic lines oriented in the process direction, to generate extrinsic banding. This avoids the
effect of the halftoning algorithm on the banding. Furthermore, to conduct various banding assessments more
efficiently, we also develop a softcopy environment that emulates a hardcopy image on a calibrated monitor, which
requires highly accurate device calibration throughout the whole system. To achieve the same color appearance
as the hardcopy, we perform haploscopic matching experiments that allow each eye to independently adapt to
different viewing conditions; and we find an appearance mapping function in the adapted XYZ space. Finally, to
validate the accuracy of the softcopy environment, we conduct a banding matching experiment at three different
banding levels by the memory matching method, and confirm that our softcopy environment produces the same
banding perception as the hardcopy. In addition, we perform two more separate psychophysical experiments
to measure the differential threshold of the intrinsic banding in both the hardcopy and softcopy environments,
and confirm that the two thresholds are statistically identical. The results show that with our target printer,
human subjects can see a just noticeable difference with a 9% reduction in the banding magnitude for the cyan colorant.
The retinal image of a symmetric object is itself symmetric only for a small set of viewing directions. Interestingly, human
subjects have little difficulty in determining whether a given retinal image was produced by a symmetric object,
regardless of the viewing direction. We tested perception of planar (2D) symmetric figures (dotted patterns and
polygons) when the figures were slanted in depth. We found that symmetry could be detected reliably with polygons,
but not with dotted patterns. Next, we tested the role of image features representing the symmetry of the pattern itself
(orientation of projected symmetry axis and symmetry lines) vs. those representing the 3D viewing direction
(orientation of the axis of rotation). We found that symmetry detection is improved when the projected symmetry axis
or lines are known to the subject, but not when the axis of rotation is known. Finally, we showed that performance with
orthographic images is higher than that with perspective images. A computational model, which measures the
asymmetry of the presented polygon based on its single orthographic or perspective image, is presented. Performance
of the model is similar to the performance of human subjects.
We present a new algorithm for reconstructing 3D shapes. The algorithm takes one 2D image of a 3D shape and
reconstructs the 3D shape by applying a priori constraints: symmetry, planarity and compactness. The shape is
reconstructed without using information about the surfaces, such as shading, texture, binocular disparity or motion.
Performance of the algorithm is illustrated on symmetric polyhedra, but the algorithm can be applied to a very wide
range of shapes. Psychophysical plausibility of the algorithm is discussed.
We test perception of 3-D spatial relations in 3-D images rendered by a 3-D display and compare it to that of a high-resolution flat panel display. Our 3-D display is a device that renders a 3-D image by displaying, in rapid succession, radial slices through the scene on a rotating screen. The image is contained in a glass globe and can be viewed from virtually any direction. We conduct a psychophysical experiment where objects with varying complexity are used as stimuli. On each trial, an object or a distorted version is shown at an arbitrary orientation. The subject's task is to decide whether or not the object is distorted under several viewing conditions (monocular/binocular, with/without motion parallax, and near/far). The subject's performance is measured by the detectability d′, a conventional dependent variable in signal detection experiments. Highest d′ values are measured for the 3-D display when the subject is allowed to walk around the display.
We will provide psychophysical evidence that recognition of parts of object contours is a necessary component of object recognition. It seems to be obvious that the recognition of parts of object contours is performed by applying a partial shape similarity measure to the query contour part and to the known contour parts. The recognition is completed once a sufficiently similar contour part is found in the database of known contour parts. We will derive necessary requirements for any partial shape similarity measure based on this scenario. We will show that existing shape similarity measures do not satisfy these requirements, and propose a new partial shape similarity measure.
The quality of the prints produced by an inkjet printer is highly dependent on the characteristics of the dots produced by the inkjet pens. While some literature discusses metrics for the objective evaluation of print quality, few of the efforts have combined automated quality tests with subjective assessment. We develop an algorithm for analyzing printed dots and study the effects of the dot characteristics on the perceived print alignment. We establish the perceptual preferences of human observers via a set of psychophysical experiments.
Binocular reconstruction of a 3D shape is an ill-conditioned inverse problem: in the presence of visual and oculomotor noise the reconstructions based solely on visual data are very unstable. A question, therefore, arises about the nature of a priori constraints that would lead to accurate and stable solutions. Our previous work showed that planarity of contours, symmetry of an object and minimum variance of angles are useful priors in binocular reconstruction of polyhedra. Specifically, our algorithm begins with producing a 3D reconstruction from one retinal image by applying priors. The second image (binocular disparity) is then used to correct the monocular reconstruction. In our current study, we performed psychophysical experiments to test the importance of these priors. The subjects were asked to recognize shapes of 3D polyhedra from unfamiliar views. Hidden edges of the polyhedra were removed. The recognition performance, measured by the detectability measure d′, was high when shapes satisfied regularity constraints, and was low otherwise. Furthermore, the binocular recognition performance was highly correlated with the monocular one. The main aspects of our model will be illustrated by a demo, in which binocular disparity and monocular priors are put in conflict.
Banding is a printer artifact perceived as one-dimensional luminance variations across the print-out caused by the vibrations of different printer components. In the printing industry, banding is considered to be one of the worst defects that dominates overall perceived image quality. Understanding the visibility of banding will help us in developing strategies to reduce the banding artifact. We developed a softcopy environment to conduct various experiments for investigating the visibility of banding. This environment includes the methodology to duplicate the print on the monitor, and a banding extraction technique. This technique enables us to freely adjust the magnitude of banding of any printer. We validated the accuracy of this methodology by conducting a banding matching experiment. We used this platform to conduct banding visibility assessment experiments. One of them was a banding discrimination experiment. The results showed that
for the printers investigated, a reduction of 6.5% in the banding magnitude will be just noticeable by an average observer. We were also able to find the detection thresholds of banding in grayscale images for three laser electrophotographic printers. The detection threshold of the best printer was about 50% of its original banding. So there is still plenty of room to reduce the visibility of the banding artifact. We were also able to compare the banding visibility of different printers quantitatively by conducting a cross-platform experiment. This methodology can form the basis for a metric for visibility of banding.
Sharpness is an important attribute that contributes to the
overall impression of image quality. As digital photography
becomes more and more popular, digital photo enhancement has been
a topic of great interest. In this paper, we investigate two
issues related to digital photo sharpness. 1) How do we
quantitatively measure the sharpness of a digital image? 2) What is
the preferred sharpness of a digital image, and what is the relation
between preferred sharpness and sharpness detection threshold? Both
issues are of practical use to the digital photography market.
First, we present the design and properties of three sharpness metrics
to answer the first question. Next, we describe psychophysical experiments to investigate the second question. It is found that 1) the sharpness metrics Digital Sharpness Scale (DSS) and Average Edge Transition Slope (AETS) are highly correlated with perceived sharpness; 2) both DSS and AETS predict sharpness equality with acceptable error; 3) the sharpness detection threshold is relatively consistent across subjects and across image contents, compared with the sharpness preference; 4) the average level of preferred sharpness is consistently higher than the detection threshold across image contents and across subjects, which implies that observers in general prefer a sharpened image to the original image; and 5) the preferred level of sharpness has a strong dependency on image content.
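The exact definitions of DSS and AETS are given in the paper; purely as a loose illustration of the "average edge transition slope" idea, one could measure the mean intensity change across above-threshold transitions on a scanline. The function name and threshold below are assumptions, not the published metric:

```python
# Illustrative sketch only: mean absolute intensity change across samples
# whose local slope exceeds an (assumed) edge threshold. Sharper edges
# concentrate the same intensity swing into fewer, steeper transitions.

def average_edge_slope(scanline, threshold=10):
    slopes = [abs(b - a) for a, b in zip(scanline, scanline[1:])]
    edges = [s for s in slopes if s >= threshold]
    return sum(edges) / len(edges) if edges else 0.0

sharp = [0, 0, 120, 240, 240]    # steep transition over two samples
blurry = [0, 60, 120, 180, 240]  # same swing spread over four samples
```

On these toy scanlines the sharp edge scores twice the blurry one, matching the intuition that a higher average transition slope corresponds to higher perceived sharpness.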
We test perception of 3D spatial relations in 3D images rendered by a 3D display (Perspecta from Actuality Systems) and compare it to that of a high-resolution flat panel display. 3D images provide the observer with such depth cues as motion parallax and binocular disparity. Our 3D display is a device that renders a 3D image by displaying, in rapid succession, radial slices through the scene on a rotating screen. The image is contained in a glass globe and can be viewed from virtually any direction. In the psychophysical experiment several families of 3D objects are used as stimuli: primitive shapes (cylinders and cuboids), and complex objects (multi-story buildings, cars, and pieces of furniture). Each object has at least one plane of symmetry. On each trial an object or its “distorted” version is shown at an arbitrary orientation. The distortion is produced by stretching an object in a random direction by 40%. This distortion must eliminate the symmetry of an object. The subject's task is to decide whether or not the presented object is distorted under several viewing conditions (monocular/binocular, with/without motion parallax, and near/far). The subject's performance is measured by the discriminability d', which is a conventional dependent variable in signal detection experiments.
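The discriminability d′ reported in such experiments is computed from hit and false-alarm rates under the equal-variance Gaussian model of signal detection theory. A minimal sketch follows; the trial counts and the log-linear correction are illustrative assumptions, not values from the study:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), equal-variance Gaussian model."""
    # Log-linear correction keeps z finite when a rate would be 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts from one viewing condition: 40 signal, 40 noise trials
d = d_prime(hits=32, misses=8, false_alarms=10, correct_rejections=30)
```

A d′ near 0 means distorted and undistorted objects are indistinguishable, while values above roughly 2 indicate reliable discrimination.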
Prior theories have assumed that human problem solving involves estimating distances among states and performing search through the problem space. The role of mental representation in those theories was minimal. Results of our recent experiments suggest that humans are able to solve some difficult problems quickly and accurately. Specifically, in solving these problems humans do not seem to rely on distances or on search. It is quite clear that producing good solutions without performing search requires a very effective mental representation. In this paper we concentrate on studying the nature of this representation. Our theory takes the form of a graph pyramid. To verify the psychological plausibility of this theory we tested subjects in a Euclidean Traveling Salesman Problem in the presence of obstacles. The role of the number and size of obstacles was tested for problems with 6-50 cities. We analyzed the effect of experimental conditions on solution time per city and on solution error. The main result is that time per city is systematically affected only by the size of obstacles, but not by their number, or by the number of cities.
We present a new CRT characterization technique that improves the accuracy of the characterization. This is achieved by optimizing the linear transformation matrix of the two-stage model in the uniform CIE L*a*b* space. We also introduce an approach to improve the characterization performance for low-luminance colors. These methods are used to calibrate two CRT monitors, and better accuracies are obtained compared to existing methods, especially for low-luminance colors. We present a systematic way to adjust the white point of the monitor using hardware settings. This allows us to adjust the monitor white accurately without losing any digital counts, which is the case if a software approach is used. We propose a novel search algorithm to achieve very high-accuracy calibration for experiments where a limited number of colors has to be displayed. We apply this search algorithm to the case of a monochrome image display application and verify the performance of our method.
There is a growing body of experimental evidence showing that human perception and cognition involve mechanisms that can be adequately modeled by pyramid algorithms. The main aspect of those mechanisms is hierarchical clustering of information: visual images, spatial relations, and states as well as transformations of a problem. In this paper we review prior psychophysical and simulation results on visual size transformation, size discrimination, speed-accuracy tradeoff, figure-ground segregation, and the traveling salesman problem. We also present our new results on graph search and on the 15-puzzle.
Brunelleschi (1413) was the first to demonstrate that a 3D scene can be represented by a 2D perspective picture in such a way that retinal images produced by the scene and the picture are identical (subsequently, Leonardo pointed out that this is true only when the observer's eye is placed at the center of perspectivity that was used to produce this picture). It follows that in the absence of depth cues, the percepts are identical as well. A question arises as to the effect on the percept of viewing the picture from a point different from the center of perspectivity. According to Pirenne's (1970) theory, the percept involves taking the cues to the orientation and position of the picture relative to the observer into account, in order to compensate for the incorrect viewing point; when these cues are available, the percept is accurate. We will demonstrate a new visual phenomenon called the 'cuboid illusion' which contradicts Pirenne's theory. Our experimental results show that the percept of a 3D object from its picture systematically depends on the orientation and position of the picture relative to the observer even in the presence of many cues.
In recent years a number of image fidelity measures have been developed. These measures are designed to predict a person's ability to perceive differences between two nearly identical images. Successful image fidelity measures allow digital imaging developers to replace difficult and time consuming subjective evaluations with automated evaluations. Although a number of image fidelity measures have been developed, no method for evaluating and comparing the accuracy of these measures has been commonly accepted. In this paper we describe a new method for evaluating image fidelity measures. The method involves comparing spatially localized ratings from a human subject with distortion maps generated by an image fidelity measure.
Reliable image quality assessments are necessary for evaluating digital imaging methods (halftoning techniques) and products (printers, displays). Typically the quality of the imaging method or product is evaluated by comparing the fidelity of an image before and after processing by the imaging method or product. It is well established that simple approaches like mean squared error do not provide meaningful measures of image fidelity. A number of image fidelity metrics have been developed whose goal was to predict the amount of differences that would be visible to a human observer. In this paper we outline a new model of the human visual system (HVS) and show how this model can be used in image quality assessment. Our model departs from previous approaches in three ways: (1) We use a physiologically and psychophysically plausible Gabor pyramid to model a receptive field decomposition; (2) We use psychophysical experiments that directly assess the percept we wish to model; and (3) We model discrimination performance by using discrimination thresholds instead of detection thresholds. The first psychophysical experiment tested the visual system's sensitivity as a function of spatial frequency, orientation, and average luminance. The second experiment tested the relation between contrast detection and contrast discrimination.
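A single receptive field in such a Gabor pyramid can be sketched as an oriented sinusoid under a Gaussian envelope. The parameter values and the isotropic envelope below are illustrative assumptions, not the model's actual filter bank:

```python
import math

def gabor(x, y, wavelength, theta, sigma):
    """Cosine-phase Gabor: a sinusoid along orientation theta,
    windowed by an isotropic Gaussian envelope."""
    xr = x * math.cos(theta) + y * math.sin(theta)  # coordinate along the wave
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * xr / wavelength)

# A 7x7 kernel tuned to vertically oriented structure (theta = 0)
kernel = [[gabor(x, y, wavelength=4.0, theta=0.0, sigma=2.0)
           for x in range(-3, 4)] for y in range(-3, 4)]
```

A pyramid stacks such kernels over several scales (wavelengths) and orientations and correlates each with the image, which yields the receptive-field decomposition the model uses.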
This paper considers shape constancy, which is a fundamental perceptual phenomenon. Shape constancy refers to the fact that the percept of the shape of a given object remains constant despite changes in the object's retinal image. The image may change because of changes in the orientation of the object relative to the observer. In conventional approaches based on the concept of groups of transformations, formulating a theory of shape constancy requires that a group that adequately represents the viewing conditions be identified first. As a result, a given theory holds only for the assumed group, and therefore many theories are needed, one for each viewing condition. It is conjectured that explaining shape constancy requires a new approach, in which one theory holds across different viewing conditions. In this new approach, shape constancy is explained by using geometrical properties of image formation under perspective. Image formation is not represented by a group of transformations, but this leads to a theory that is more general than prior theories: it explains shape constancy in the case of monocular and binocular viewing for both the stationary and the active observer.
To determine the effectiveness of stereo imaging in aiding the detection of objects in a scene, we are conducting experiments in which subjects are shown computer-generated stereo and mono images and are asked to determine if there is an object with particular characteristics in the image. The experimental data are analyzed using receiver operating characteristic (ROC) approaches to determine which types of objects may be easier to detect using stereo viewing. In this paper, issues that arise in the design of ROC studies to determine the statistical effectiveness of stereo imagery are discussed. These include traps and pitfalls such as varying viewing conditions, image intensity differences, ghosting, flicker, the speed/accuracy tradeoff, subjects' stereo acuity, and degree of difficulty in the discrimination task. Our experimental results show that when these problems are properly addressed, stereo viewing increases the sensitivity and specificity of observer performance in detecting subtle features in simulated x-ray transmission images.
We have conducted a systematic study of the effectiveness of stereo imaging in aiding the detection of abnormalities in soft tissue geometry and density. Experiments have been designed whereby subjects are shown computer generated stereo mammograms and are asked to determine if there is an abnormal object (brighter or regularly arranged) in the mammogram. The experimental data have been analyzed using ROC approaches to determine which types of soft tissue abnormalities may be easier to detect using stereo viewing. Preliminary results suggest that stereo imaging may be more effective than current single view or biplane mammography for detecting abnormal densities and arrangements in soft tissue elements.
Shape recognition plays a central role in object recognition and while a large number of technical papers on shape representation and similarity exist, shape recognition still remains an unsolved problem. The main goal of this course is to present results on human shape perception, including 2D as well as 3D shape representation and similarity. This course will provide needed background knowledge about important features of shape recognition from the point of view of human visual perception. It will include a tutorial about human shape perception with an emphasis on the most important psychophysical experiments on shape recognition and reconstruction that have been performed during the last 100 years, as well as on computational models of human shape perception. The course will also include an overview of computational approaches to shape similarity and to shape-based retrieval in multimedia databases. We will report an experimental evaluation of their performance on the dataset used in MPEG-7 Core Experiment CE-Shape-1. This dataset provides a unique opportunity to compare various shape descriptors.