In this keynote I will present some of the work from our virtual reality laboratory at the Max Planck Institute for Biological Cybernetics in Tübingen. Our research philosophy for understanding the brain is to study human information processing in experimental settings that are as close as possible to our natural environment. Using computer graphics and virtual reality technology, we can now study perception not only in a well-controlled natural setting but also in a closed perception-action loop, in which the actions of the observer also change the input to the senses. In psychophysical studies we have shown that humans can integrate multimodal sensory information in a statistically optimal way, weighting each cue according to its reliability. A better understanding of multimodal sensor fusion will allow us to build new virtual reality platforms in which the design effort for visual, auditory, haptic, vestibular, and proprioceptive simulation is guided by the weight of each cue in multimodal sensor fusion.
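The statistically optimal integration mentioned above is commonly modeled as maximum-likelihood cue combination, in which each cue's weight is proportional to its reliability (inverse variance). A minimal sketch of that rule, with hypothetical values for a visual and a haptic estimate:

```python
def fuse_cues(estimates, variances):
    """Combine independent Gaussian cue estimates by maximum likelihood.

    Each cue's weight is its inverse variance (reliability), normalized:
    w_i = (1/var_i) / sum_j (1/var_j). The fused variance is smaller
    than that of any single cue, which is what makes fusion worthwhile.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    fused_estimate = sum(w * e for w, e in zip(weights, estimates))
    fused_variance = 1.0 / total
    return fused_estimate, fused_variance

# Hypothetical example: vision reports size 10.0 (variance 1.0),
# haptics reports 12.0 (variance 4.0); vision gets weight 0.8.
est, var = fuse_cues([10.0, 12.0], [1.0, 4.0])
# -> fused estimate 10.4, fused variance 0.8
```

The same weighting directly motivates the design argument in the abstract: simulation effort per modality can be allocated in proportion to that modality's weight in the fused percept.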
Many current optical flow algorithms are unsuited to practical applications such as tracking because they require massively parallel supercomputers, specialized hardware, or up to several hours on a scientific workstation. One particular reason for this is the quadratic nature of the search these algorithms perform. We present two modifications that can convert quadratic-time optical flow algorithms into linear-time ones. The first uses a variable image sampling rate, trading space for time, and yields an algorithm that is at worst linear, and at best constant, in the speed of the moving objects in the image. This technique finds the fastest motion in an image and is ideal for tracking, since the fastest-moving objects in a robot's environment are generally the most interesting. The second modification extends this approach to create a multiple-speed optical flow field by transforming quadratic searches over space into linear searches in time. This space-time inversion has the effect of searching for faster-moving objects in each image earlier than for slower-moving ones, with additional effort spent searching for slower objects only when desired. A system of velocity masking allows angular resolution (but not magnitude resolution) to be traded off, yielding an optical flow algorithm that is only linear in the range of velocities present.
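The space-time inversion described above can be illustrated with a simplified, hypothetical 1-D sketch (not the paper's implementation): instead of searching all displacements within one frame pair, the search displacement is held fixed at `d_max` and the frame delay selects the speed, since a match at shift `d_max` after `t` frames implies a speed of `d_max / t`. Fast motions therefore surface at early delays, and work for slower ones is spent only if the search continues.

```python
def detect_speeds(frames, d_max=4, threshold=0.0):
    """Return {position: speed} for 1-D intensity rows, fastest first.

    frames: list of equal-length lists (one row per time step).
    Frame 0 is compared against frame t at the single fixed shift d_max,
    so the work is linear in the number of frame delays rather than
    quadratic in the displacement range searched per frame pair.
    """
    detected = {}
    reference = frames[0]
    for t in range(1, len(frames)):
        speed = d_max / t  # speed implied by shift d_max after t frames
        for x in range(len(reference) - d_max):
            if reference[x] == 0 or x in detected:
                continue  # no feature here, or already matched faster
            if abs(frames[t][x + d_max] - reference[x]) <= threshold:
                detected[x] = speed
    return detected

# Hypothetical example: a feature moving 2 px/frame is detected at
# delay t=2, before a feature moving 1 px/frame is found at t=4.
```

This also mirrors the tracking argument: if only the fastest motion matters, the loop can terminate after the first detection instead of exhausting every delay.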
The Bayesian approach to vision provides a fruitful theoretical framework for integrating different depth modules. In this formulation, depth is represented by one or more surfaces. Prior probabilities, corresponding to natural constraints, can be defined on these surfaces to counteract the ill-posedness of vision. We advocate strong coupling between different depth cues, so that the different modules can interact during computation. The framework is rich enough to accommodate both consonant and contradictory cue integration straightforwardly, through the use of binary decision units. These units can be interpreted in terms of robust statistics. A number of existing psychophysical experiments can be understood within this framework.
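One way to picture the binary decision units is as a gate on the coupling between two depth cues: when the cues are consonant they are fused with reliability weights, and when their discrepancy is too large the gate opens and the less reliable cue is rejected, which is the behavior of a redescending robust estimator. The sketch below is an illustrative simplification under that interpretation, not the formulation used in the work itself; the `gate` threshold is a hypothetical parameter.

```python
import math

def couple_depth_cues(d1, var1, d2, var2, gate=3.0):
    """Return (depth, coupled) from two Gaussian depth cue estimates.

    The binary decision unit is the gate test: discrepancy is measured
    in units of the combined standard deviation. Consonant cues are
    strongly coupled (reliability-weighted fusion); contradictory cues
    decouple, and the more reliable cue alone determines the depth.
    """
    discrepancy = abs(d1 - d2) / math.sqrt(var1 + var2)
    if discrepancy <= gate:  # consonant: strong coupling
        w1 = (1.0 / var1) / (1.0 / var1 + 1.0 / var2)
        return w1 * d1 + (1.0 - w1) * d2, True
    # contradictory: binary unit rejects the less reliable cue
    return (d1 if var1 <= var2 else d2), False
```

Seen this way, the gate plays the role of an outlier test in robust statistics: small conflicts are averaged away, while large ones are treated as evidence that one module's assumptions have failed.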