In this paper we present an overview of our research into the perception and biologically inspired modeling of illumination (flow) from 3D textures, and into the influence of roughness and illumination on material perception. Here a 3D texture is defined as an image of an illuminated rough surface. In a series of theoretical and empirical papers we studied how the illumination orientation (in the image plane) can be estimated from 3D textures of globally flat samples. We found that both humans and computers can estimate this orientation well using an approach based on second-order statistics. This approach exploits the dipole-like structures in 3D textures that result from the illumination of bumps and troughs. For 3D objects, the local illumination direction varies over the object, resulting in surface illuminance flow. This in turn produces image illuminance flow in the image of a rough 3D object: the observable projection, in the image, of the field of local illumination orientations. Here we present results of image illuminance flow analysis for images from the Utrecht Oranges database, the CUReT database, and two vases. These results show that the image illuminance flow can be estimated robustly for various rough materials. In earlier studies we showed that image illuminance flow can be used to infer shape and illumination. Recently, in psychophysical experiments, we found that adding 3D texture to a matte spherical object improves human observers' judgments of the direction and diffuseness of its illumination. This shows that human observers indeed use the illuminance flow as a cue to the illumination.
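The second-order approach can be illustrated with a small sketch. The assumptions are ours, not the authors' implementation: the flat sample is covered with isotropic Gaussian bumps and troughs, it is rendered with Lambertian shading, and the "second-order statistics" are taken to be the gradient structure tensor, whose dominant axis is read off as the illumination orientation modulo 180 degrees. All function names are hypothetical.

```python
import math


def render_bumpy_surface(size, bumps, tilt_deg, slant_deg=60.0):
    """Lambertian image of a flat sample covered with Gaussian bumps (amp > 0) and troughs (amp < 0).

    bumps: list of (cx, cy, amp, sigma) in pixel units. The light has the given tilt
    (orientation in the image plane) and slant (angle from the mean surface normal).
    """
    tilt, slant = math.radians(tilt_deg), math.radians(slant_deg)
    lx = math.sin(slant) * math.cos(tilt)
    ly = math.sin(slant) * math.sin(tilt)
    lz = math.cos(slant)
    img = []
    for y in range(size):
        row = []
        for x in range(size):
            hx = hy = 0.0  # analytic gradient of the height field at (x, y)
            for cx, cy, amp, sigma in bumps:
                dx, dy = x - cx, y - cy
                g = amp * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                hx -= dx / (sigma * sigma) * g
                hy -= dy / (sigma * sigma) * g
            # Unit normal of z = h(x, y) is (-hx, -hy, 1) / norm.
            norm = math.sqrt(hx * hx + hy * hy + 1.0)
            row.append(max(0.0, (-hx * lx - hy * ly + lz) / norm))
        img.append(row)
    return img


def illumination_orientation(img):
    """Orientation (degrees, modulo 180) of the dominant axis of the gradient structure tensor."""
    jxx = jyy = jxy = 0.0
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = 0.5 * (img[y][x + 1] - img[y][x - 1])  # central differences
            gy = 0.5 * (img[y + 1][x] - img[y - 1][x])
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    # Angle of the principal eigenvector of [[jxx, jxy], [jxy, jyy]].
    return (0.5 * math.degrees(math.atan2(2.0 * jxy, jxx - jyy))) % 180.0


# Hypothetical texture: well separated bumps and troughs on a 64 x 64 sample.
texture = [(16, 16, 1.0, 3.0), (48, 16, -1.0, 3.0), (32, 32, 1.0, 3.0),
           (16, 48, -1.0, 3.0), (48, 48, 1.0, 3.0)]
print(illumination_orientation(render_bumpy_surface(64, texture, tilt_deg=30.0)))
```

Lighting the same texture from tilt 210 degrees instead of 30 yields essentially the same estimate: second-order statistics fix the orientation but not its sign, consistent with the fact that a trough lit from one side resembles a bump lit from the other.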
Looking at a picture fills part of the visual field. In the case of straight photographs there is a notion of the “Field of View” of the camera at the time of exposure. Is there a corresponding notion for the perception of the picture? In most cases the part of the visual field (measured in degrees) filled by the picture will be quite different from the field of view of the camera. Works of art are even more complicated: there need not even exist a well-defined central viewpoint. With several examples we show that there is essentially no notion of a corresponding “field of view” in pictorial perception. This is the case even for drawings in conventional linear perspective. Apparently the “mental eye” of the viewer is often unrelated to the geometry of the camera (or to the perspective center used in the drawing). Observers often substitute templates for an analysis of perspective.
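A worked example of the mismatch, using the standard pinhole relation angle = 2 * atan(extent / (2 * distance)) for both the camera and the viewer. The sensor width, focal length, print width and viewing distance are hypothetical numbers chosen for illustration.

```python
import math


def visual_angle_deg(extent, distance):
    """Full angle (degrees) subtended by a centred extent seen from the given distance."""
    return math.degrees(2.0 * math.atan(extent / (2.0 * distance)))


# Hypothetical camera: 36 mm wide sensor behind a 50 mm lens.
camera_fov = visual_angle_deg(36.0, 50.0)      # horizontal field of view at exposure
# Hypothetical viewing: a 15 cm wide print looked at from 50 cm.
picture_angle = visual_angle_deg(15.0, 50.0)   # part of the visual field the print fills
print(f"camera FOV {camera_fov:.1f} deg, print fills {picture_angle:.1f} deg")
# camera FOV 39.6 deg, print fills 17.1 deg
```

Only at a viewing distance of 15 * 50 / 36, about 20.8 cm, would this print subtend the camera's field of view; the point made above is that pictorial perception does not track this geometry in any case.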
The world is all physical reality (Higgs bosons, and so forth); the “environment” is a geographical locality (your city, …); the “Umwelt” is the totality of possible actions of the environment on the sensitive body surface of an agent (you, your dog, …) and of possible actions of the agent on the environment (mechanical, chemical, …); and the “innerworld” is what it is for the agent to be, that is, awareness. Awareness is pre-personal, proto-conscious, and (perhaps) proto-rational. The various “worlds” described above are on distinct ontological levels. The world and the environment are studied in the exact sciences; the Umwelt is studied by physiology and ethology. Ethology is like behavioristic psychology, except that it applies to all animals. It skips the innerworld; for example, it considers speech to be a movement of air molecules. The innerworld can only be known through first-person reports and is thus intrinsically subjective. It can only be approached through “experimental phenomenology”, which is based on intersubjectivity among humans. In this setting speech may mean something in addition to the movements of molecules. These views lead to a model of vision as an “optical user interface”, which has consequences for many applications.
We present a novel setup in which real objects made of different materials can be mixed optically. We chose
materials that differ strongly from one another, which we assume to represent canonical modes. The appearance of 3D
objects made of any material can then be described as a linear superposition of 3D objects made of different canonical
materials, as in "painterly mixes". In this paper we study mixtures of matte, glossy and velvety objects, representing
diffuse, forward and asperity scattering modes.
Observers rated the optical mixtures on four scales: matte-glossy, hard-soft, cold-warm, and light-heavy. Ratings were
collected for the three pairwise combinations of glossy, matte, and velvety green birds, with 7 weightings per
combination. Matte-glossy ratings varied most over the stimuli, with the highest (most glossy) scores for the
predominantly glossy bird and the lowest (most matte) for the predominantly velvety bird. Hard-soft and cold-warm
ratings were highest (most soft and warm) for predominantly velvety birds and lowest (most hard and cold) for
predominantly glossy birds. Light-heavy ratings were only somewhat higher (heavier) for predominantly glossy birds. The
ratings varied systematically with the weights of the contributions, corresponding to gradually changing mixtures of
material modes. We discuss a range of possible uses of our novel setup.
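Because light adds linearly, an optical mixture can be mimicked numerically as a pixelwise weighted sum of radiance images. This is a minimal sketch assuming linear (not gamma-encoded) images and, hypothetically, seven evenly spaced weightings; the actual weights are not stated above, and the toy pixel values are invented for illustration.

```python
def optical_mix(image_a, image_b, weight_a):
    """Pixelwise linear superposition weight_a * A + (1 - weight_a) * B of two radiance images."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [[weight_a * a + (1.0 - weight_a) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


# Assumption: seven evenly spaced weightings, from pure B (0) to pure A (1).
WEIGHTS = [k / 6 for k in range(7)]

# Toy 2 x 2 radiance images, not measured data.
glossy = [[0.9, 0.1], [0.2, 0.1]]
velvety = [[0.3, 0.5], [0.5, 0.6]]
series = [optical_mix(glossy, velvety, w) for w in WEIGHTS]
for w, img in zip(WEIGHTS, series):
    print(round(w, 3), img)
```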
A "picture" is a flat object covered with pigments in a certain pattern. Human observers, when looking "into" a
picture (photograph, painting, drawing, . . . say), often report experiencing a three-dimensional "pictorial space."
This space is a mental entity, apparently triggered by so-called pictorial cues. The latter are sub-structures of
color patterns that are pre-consciously designated by the observer as "cues," and that are often considered to
play a crucial role in the construction of pictorial space. In the case of the visual arts these structures are
often introduced by the artist with the intention of triggering certain experiences in prospective viewers, whereas
in the case of photographs the intentionality is limited to the viewer. We have explored various methods to
operationalize geometrical properties of pictorial space, typically relative to some observer perspective. Here
"perspective" is to be understood in a very general, not necessarily geometric sense, akin to Gombrich's
"beholder's share". Examples include pictorial depth, in either a metrical or a merely ordinal sense. We find that
different observers tend to agree remarkably well on ordinal relations, but show dramatic differences in metrical
relations.
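The distinction between ordinal and metrical agreement can be made concrete with two hypothetical observers whose depth settings are monotonically related but differently scaled. In this illustration (invented numbers, not data from the study) Spearman's rank correlation measures ordinal agreement and the ratio of depth ranges serves as a crude metrical comparison.

```python
def ranks(values):
    """0-based rank of each value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman_rho(a, b):
    """Spearman rank correlation for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def depth_range_ratio(a, b):
    """How much deeper observer b's pictorial relief is than observer a's."""
    return (max(b) - min(b)) / (max(a) - min(a))


# Hypothetical depth settings of two observers at the same six image points.
observer_1 = [1.0, 2.5, 3.0, 4.2, 5.0, 6.1]
observer_2 = [2.0, 6.0, 8.5, 12.0, 15.5, 20.0]
print(spearman_rho(observer_1, observer_2), depth_range_ratio(observer_1, observer_2))
```

Here the ordinal agreement is perfect while one relief is about three and a half times as deep as the other.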
The egg-rolling behavior of the graylag goose is an often-quoted example of a fixed-action pattern. The bird will
even attempt to roll a brick back to its nest! Despite excellent visual acuity it apparently takes a brick for an
"egg." Evolution optimizes utility, not veridicality. Yet textbooks take it for a fact that human vision evolved
so as to approach veridical perception. How do humans manage to dodge the laws of evolution? I will show
that they don't, and that human vision is instead an idiosyncratic user interface. By way of example I consider the
case of pictorial perception. Gleaning information from still images is an important human ability and is likely
to remain so for the foreseeable future. I will discuss a number of instances of extreme non-veridicality and
huge inter-observer variability. Despite their importance in applications (information dissemination, personnel
selection, ...) such huge effects have remained undocumented in the literature, although they can be traced to
artistic conventions. The reason appears to be that conventional psychophysics, by design, fails to address the
qualitative, that is, the meaningful, aspects of visual awareness, whereas these are the very target of the visual arts.
In this study we demonstrate that touch decreases the ambiguity of a visual image. It has previously been
found that visual perception of three-dimensional shape is subject to certain variations, which can
be described by affine transformations. While the visual system thus seems unable to capture the Euclidean
structure of a shape, touch could potentially serve as a source of information to disambiguate the image.
Participants performed a so-called 'attitude task' from which the structure of the perceived three-dimensional
shape was calculated. One group performed the task with vision only, and a second group could touch the stimulus
while viewing it. We found that consistency within the haptics+vision group was higher than within the vision-only
group. Thus, haptics decreases the visual ambiguity. Furthermore, the touched shape was consistently perceived as
having more relief than the untouched shape. We also found that the directions of the affine shear differences
between observers were more consistent when touch was used. We thus show that haptics has a significant
influence on the perception of pictorial relief.
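The affine variations can be quantified by regressing one depth map on another. This is a minimal sketch assuming the commonly used relief-transformation form z2 = a*x + b*y + c*z1 + d, where (a, b) is the shear, c the depth scaling and d an offset; the fitting routine and the synthetic data are hypothetical, not the analysis used in the study.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_affine_relief(samples):
    """Least-squares fit of z2 = a*x + b*y + c*z1 + d.

    samples: iterable of (x, y, z1, z2): image position and the depths reported in two
    conditions (or by two observers) at that position.
    Returns [a, b, c, d]: shear along x, shear along y, depth scaling, offset.
    """
    rows, targets = [], []
    for x, y, z1, z2 in samples:
        rows.append((x, y, z1, 1.0))
        targets.append(z2)
    # Normal equations (A^T A) p = A^T t for the four unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(4)]
    return solve_linear(ata, atb)


# Synthetic check: z2 is an exact affine transform of z1 on a 5 x 5 grid.
grid = []
for x in range(5):
    for y in range(5):
        z1 = x * x - y * y + 0.5 * x * y
        grid.append((x, y, z1, 0.2 * x - 0.1 * y + 1.5 * z1 + 3.0))
print(fit_affine_relief(grid))
```

With real settings the residual of such a fit indicates how much of the difference between two reliefs is not accounted for by an affine transformation.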
Pictorial relief depends strongly on “cues” in the image. For isoluminant renderings some cues are missing, namely all information that is related to luminance contrast (e.g., shading, atmospheric perspective). It has been suggested that spatial discrimination, and especially pictorial space, suffer badly under isoluminant conditions. We have investigated the issue through quantitative measurement of pictorial depth structure under normal and isoluminant conditions. As stimuli we used monochrome halftone photographs, either as such or “transposed” to Red/Green or Green/Red hue modulations. We used two distinct methods: one to probe pictorial pose (by way of correspondence settings between pictures of an object in different poses), the other to probe pictorial depth (by way of attitude settings of a gauge figure to a perceptual “fit”). In both experiments the depth reconstructions for the Red/Green, Green/Red and monochrome conditions were very similar. Moreover, observers performed equally well in all three conditions. The general conclusion is thus that observers did not do markedly worse with the isoluminant Red/Green and Green/Red transposed images. Although the transposed images certainly looked weird, they were easily interpreted. Much of the structure of pictorial space was apparently preserved. Hence the notion that spatial representations are not sustained under isoluminant conditions should be applied with caution.
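One way to "transpose" a monochrome image to a Red/Green or Green/Red hue modulation is to trade red against green while holding luminance fixed. This sketch assumes linear RGB and the Rec. 709 luminance weights (0.2126 for R, 0.7152 for G); a real experiment would calibrate isoluminance per observer and display (for example by flicker photometry), so treat the constants and function names as assumptions rather than the procedure used in the study.

```python
# Assumption: linear RGB with Rec. 709 luminance weights; blue is left at zero.
W_RED, W_GREEN, W_BLUE = 0.2126, 0.7152, 0.0722


def transpose_isoluminant(gray, luminance=0.2, light_is_red=True):
    """Map a gray value in [0, 1] to an (r, g, b) triple of constant luminance.

    light_is_red=True gives a Red/Green transposition (light areas become red),
    False gives Green/Red. Requires luminance <= W_RED so that r stays within [0, 1].
    """
    if not 0.0 <= gray <= 1.0:
        raise ValueError("gray must lie in [0, 1]")
    t = gray if light_is_red else 1.0 - gray
    r = t * luminance / W_RED            # red share of the fixed luminance
    g = (1.0 - t) * luminance / W_GREEN  # green supplies the remainder
    return (r, g, 0.0)


def luminance_of(rgb):
    r, g, b = rgb
    return W_RED * r + W_GREEN * g + W_BLUE * b


for gray in (0.0, 0.5, 1.0):
    rgb = transpose_isoluminant(gray)
    print(gray, [round(c, 3) for c in rgb], round(luminance_of(rgb), 6))
```

The gray-level variation survives only as a hue variation, which is exactly why luminance-based cues such as shading disappear.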
Shading is an important shape cue. Theories of 'shape from shading' assume that the shading is due to collimated beams irradiating an opaque, smooth Lambertian surface. Many objects are not at all opaque, though. In translucent objects, photons penetrate the surface and enter the volume of the object, perhaps to re-emerge from the surface at another location. In such cases there can be no 'shading' proper. In the limit of very strong scattering these materials approach opaque Lambertian surfaces; in the limit of very weak scattering they approach transparent objects such as glass or water. A general theory of 'shading' for translucent objects is not available. We study the optical properties for a number of geometries. In simple cases the scattering problem can be solved, and we obtain models of 'shading' of translucent material that are distinct from the opaque Lambertian case. In more general cases one needs to make certain approximations. We show how to develop rules of thumb for generic cases. Such rules are likely candidates for models of human visual perception of wrinkles in human skin or of the articulations of cumulus clouds.
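The two limits can be illustrated with a deliberately crude stand-in: the one-dimensional "rod" model of scattering in a non-absorbing slab (our choice, not one of the geometries analyzed in the study). Photons take exponentially distributed free paths and, at each collision, continue forward or backward with equal probability. For this model a two-stream calculation gives the reflected fraction R = tau / (tau + 2), where tau is the slab thickness in mean free paths, so R tends to 1 for strong scattering (opaque-like) and to 0 for weak scattering (transparent-like). The function names are hypothetical.

```python
import random


def slab_reflectance(optical_depth, photons=10000, seed=1):
    """Monte Carlo estimate of the fraction of photons leaving through the lit face.

    Toy 1D 'rod' model of a non-absorbing slab of the given optical depth (thickness
    in mean free paths): free paths are exponential with mean 1, and at every collision
    the photon moves forward or backward with equal probability.
    """
    rng = random.Random(seed)
    reflected = 0
    for _ in range(photons):
        x, direction = 0.0, 1.0  # photon enters through the lit face, heading inward
        while True:
            x += direction * rng.expovariate(1.0)
            if x < 0.0:            # re-emerged from the lit face
                reflected += 1
                break
            if x > optical_depth:  # transmitted through the back face
                break
            direction = rng.choice((-1.0, 1.0))
    return reflected / photons


def slab_reflectance_two_stream(optical_depth):
    """Closed form for the same rod model: R = tau / (tau + 2)."""
    return optical_depth / (optical_depth + 2.0)


for tau in (0.1, 2.0, 20.0):
    print(tau, slab_reflectance(tau), slab_reflectance_two_stream(tau))
```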
Observers routinely perceive 3D pictorial spaces when looking at 2D photographs. If an object is photographed in different poses, the photographs are different and so are the pictorial spaces. Observers can easily identify corresponding points in photographs of a single object in different poses. This is perhaps surprising, since no algorithm can presently do this except when extreme constraints are met. In this study we find correspondences and subsequently probe the pictorial surface attitude at corresponding points. Since we can fit a surface to a dense field of surface attitude samples, we obtain two surfaces in pictorial space that correspond to the two poses of the object. We explore the relation between these two surfaces. In Euclidean space the surfaces of an object in different poses are related through an isometry. However, because pictorial space has a non-Euclidean structure, the empirical correspondence is not an isometry. The results allow us to draw conclusions concerning the geometrical structure of pictorial space. They are also of practical importance, because many scenes are routinely documented through a sequence of photographs taken from different vantage points.
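An isometry preserves all distances between points, which suggests a simple check on two reconstructed surfaces: compare the distance between every pair of corresponding points. This is a minimal sketch with hypothetical point sets and function names; a rigid rotation gives zero distortion, whereas a depth stretch of the kind pictorial space permits does not.

```python
import math
from itertools import combinations


def distance_distortion(points_a, points_b):
    """Largest absolute change in pairwise distance between corresponding 3D points.

    Corresponding surfaces related by an isometry (rigid motion) give ~0.
    """
    if len(points_a) != len(points_b):
        raise ValueError("point lists must correspond one-to-one")
    worst = 0.0
    for i, j in combinations(range(len(points_a)), 2):
        change = abs(math.dist(points_a[i], points_a[j]) - math.dist(points_b[i], points_b[j]))
        worst = max(worst, change)
    return worst


def rotate_about_vertical(p, angle):
    """Rigid rotation of a point (x, y, z) about the y axis, i.e. a change of pose."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


# Hypothetical corresponding points (x, y, depth) on one surface.
surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 1.0), (1.0, 1.0, 0.2)]
rotated = [rotate_about_vertical(p, math.radians(40)) for p in surface]
stretched = [(x, y, 2.0 * z) for x, y, z in surface]
print(distance_distortion(surface, rotated), distance_distortion(surface, stretched))
```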
The optical structure sampled by the human observer is insufficient to determine the structure of a scene. The equivalence class of scenes that lead to the same optical structure can be worked out precisely for specific 'cues' (shading, texture, ...). If the 'observed scene' is a member of the correct class, the observation must be considered 'veridical,' even if the observed scene differs from the actual one. In many cases it is impossible to specify the equivalence classes, and it must therefore remain undecided whether observations that deviate from physical reality should be denoted 'veridical' or not. For observations based on images, such as straight photographs of physical scenes, the equivalence classes are unknown, but they are certainly large. This case is especially important in the design of computer interfaces in which a scene is presented to the user as an image. We find that observations differ appreciably depending on the precise task. The observer uses the freedom resulting from the iconic underdetermination to choose some idiosyncratic perspective by directing the 'mind's eye.' This can be demonstrated with simple means. Depth stretchings of a few hundred percent and changes in viewing direction (not rotations, but shears) of tens of degrees are quite common.
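For the shading cue, one well-known equivalence class is the generalized bas-relief (GBR) ambiguity of Lambertian surfaces described by Belhumeur, Kriegman and Yuille. It consists precisely of a depth scaling combined with a shear, z' = lam*f(x, y) + mu*x + nu*y with lam > 0; if the light source and albedo are transformed accordingly, every pixel value is unchanged. The sketch below checks this pointwise; it ignores cast shadows, the function names are hypothetical, and it is our illustration rather than the analysis in the text.

```python
import math


def lambert(fx, fy, albedo, light):
    """Lambertian radiance of the surface z = f(x, y) at a point with gradient (fx, fy)."""
    n = math.sqrt(fx * fx + fy * fy + 1.0)
    return max(0.0, albedo * (-fx * light[0] - fy * light[1] + light[2]) / n)


def gbr_equivalent(fx, fy, albedo, light, lam, mu, nu):
    """Transform a surface point and the light through a GBR map with matrix
    G = [[1, 0, 0], [0, 1, 0], [mu, nu, lam]].

    The surface gradient maps to (lam*fx + mu, lam*fy + nu), the scaled normal b to
    G^{-T} b (which rescales the albedo), and the light s to G s.
    """
    fx2, fy2 = lam * fx + mu, lam * fy + nu
    n1 = math.sqrt(fx * fx + fy * fy + 1.0)
    n2 = math.sqrt(fx2 * fx2 + fy2 * fy2 + 1.0)
    albedo2 = albedo * n2 / (lam * n1)
    sx, sy, sz = light
    light2 = (sx, sy, mu * sx + nu * sy + lam * sz)
    return fx2, fy2, albedo2, light2


light = (0.5, 0.2, 0.8)
for fx, fy in [(0.3, -0.2), (-0.5, 0.4), (1.2, 0.7)]:
    original = lambert(fx, fy, 0.8, light)
    transformed = lambert(*gbr_equivalent(fx, fy, 0.8, light, lam=2.0, mu=0.4, nu=-0.1))
    print(round(original, 6), round(transformed, 6))
```

The parameter lam corresponds to a depth stretching and (mu, nu) to a shear, the same kinds of deviations reported above.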
We analyze material properties underlying visual appearance, such as the surface bidirectional reflectance distribution function (BRDF) and texture. We perform gonioradiometric measurements on bricks and fit the data to sets of models of specular and diffuse reflectance on rough surfaces in order to describe the composite reflection mechanisms of the surfaces under study. We also acquire images and apply statistical texture-discrimination techniques to determine the differences in surface appearance that result from variations in illumination and viewing.
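This is a minimal sketch of such a fit, assuming in-plane measurement geometry and a deliberately simplified two-term model (a Lambertian term plus a Gaussian specular lobe in the half-angle) rather than the specific rough-surface models used in the study. The model is linear in the diffuse and specular weights kd and ks, so these follow from least squares, while the lobe width sigma is found by grid search. All names and the synthetic data are hypothetical.

```python
import math


def half_angle(theta_i, theta_v):
    """Angle between the surface normal and the half vector, in-plane geometry (radians).

    theta_v is measured on the opposite side of the normal, so theta_v == theta_i
    is the mirror direction.
    """
    hx = math.sin(theta_i) - math.sin(theta_v)
    hz = math.cos(theta_i) + math.cos(theta_v)
    return math.atan2(abs(hx), hz)


def model_terms(theta_i, theta_v, sigma):
    """Basis functions: Lambertian term and a Gaussian specular lobe of width sigma."""
    return 1.0 / math.pi, math.exp(-(half_angle(theta_i, theta_v) / sigma) ** 2)


def fit_brdf(measurements, sigmas):
    """Fit f = kd / pi + ks * lobe(sigma) to (theta_i, theta_v, f) samples.

    Linear least squares for (kd, ks) at each candidate sigma; returns (kd, ks, sigma, sse)
    for the candidate with the smallest sum of squared errors.
    """
    best = None
    for sigma in sigmas:
        s11 = s12 = s22 = r1 = r2 = 0.0
        for ti, tv, f in measurements:
            b1, b2 = model_terms(ti, tv, sigma)
            s11 += b1 * b1
            s12 += b1 * b2
            s22 += b2 * b2
            r1 += b1 * f
            r2 += b2 * f
        det = s11 * s22 - s12 * s12
        kd = (r1 * s22 - s12 * r2) / det  # Cramer's rule for the 2 x 2 normal equations
        ks = (s11 * r2 - s12 * r1) / det
        sse = 0.0
        for ti, tv, f in measurements:
            b1, b2 = model_terms(ti, tv, sigma)
            sse += (f - (kd * b1 + ks * b2)) ** 2
        if best is None or sse < best[3]:
            best = (kd, ks, sigma, sse)
    return best


# Synthetic "measurements" from known parameters (hypothetical, not brick data).
data = []
for ti in (10, 30, 50, 70):
    for tv in range(0, 85, 5):
        d_term, s_term = model_terms(math.radians(ti), math.radians(tv), 0.2)
        data.append((math.radians(ti), math.radians(tv), 0.6 * d_term + 0.3 * s_term))
kd, ks, sigma, sse = fit_brdf(data, sigmas=(0.1, 0.2, 0.3, 0.4))
print(round(kd, 6), round(ks, 6), sigma)
```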
If one direction of (three-dimensional) space is singled out, it makes sense to describe embedded curves and surfaces in a frame that is adapted both to the embedded manifold and to the special direction, rather than in a frame based upon the curvature landscape. Such a case occurs often in computer vision, where the direction of view plays a role that differs essentially from that of the image plane. The classical case is that of geomorphology, where the vertical is the singled-out direction. In computer vision 'ridges' and '(water-)courses' are recognized as important entities, and attempts have been made to make these intuitive notions precise. These attempts repeat the unfortunate misunderstandings that marked the late-19th-century struggle to define the 'Talweg' ('valley path' or '(water-)course'). We elucidate the problems and their solution via novel examples.