We are building a cognitive vision system for mobile robots that works in a manner similar to the human vision
system, using saccadic, vergence and pursuit movements to extract information from visual input. At each fixation,
the system builds a 3D model of a small region, combining information about distance, shape, texture and motion to
create a local dynamic spatial model. These local 3D models are composed to create an overall 3D model of the
robot and its environment. This approach turns the computer vision problem into a search problem whose goal is the
acquisition of sufficient spatial understanding for the robot to succeed at its tasks.
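The fixate-compose loop described above can be sketched as a search procedure. This is an illustrative sketch only, under assumed names (`LocalModel`, `WorldModel`, `fixate`, and the adequacy test are all hypothetical, not the system's actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class LocalModel:
    """3D model of a small region built at one fixation."""
    center: tuple                                 # fixation point (x, y, z)
    extent: float                                 # rough radius of the region
    features: dict = field(default_factory=dict)  # distance, shape, texture, motion

@dataclass
class WorldModel:
    """Global model composed from local fixation models."""
    regions: list = field(default_factory=list)

    def compose(self, local: LocalModel) -> None:
        # Composition is shown as simple accumulation; a real system would
        # register and merge overlapping local models.
        self.regions.append(local)

    def sufficient_for(self, goal: dict) -> bool:
        # Placeholder adequacy test: enough fixations for the current goal.
        return len(self.regions) >= goal["min_fixations"]

def fixate(point: tuple) -> LocalModel:
    # Stand-in for the saccade/vergence/pursuit machinery that builds
    # a local dynamic spatial model around one fixation point.
    return LocalModel(center=point, extent=0.5)

def build_world_model(candidate_points, goal: dict) -> WorldModel:
    """Treat vision as search: fixate until the model is good enough."""
    world = WorldModel()
    for point in candidate_points:
        if world.sufficient_for(goal):
            break                      # stop as soon as the goal is met
        world.compose(fixate(point))
    return world
```

The search terminates as soon as the model is adequate for the task, rather than after exhaustively modelling the scene.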
The research hypothesis of this work is that the robot's camera movements should be only those necessary to build a world model sufficiently accurate for the robot's current goals. For example, if the goal is to navigate through a room, the model need only contain the obstacles that would be encountered, with their approximate positions and sizes. Other information need not be rendered into the virtual world, so this approach trades model accuracy for speed.
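The navigation example above can be made concrete with a small sketch of the accuracy-for-speed trade: only obstacles near the planned path are retained, and only at coarse resolution. The function name, percept layout, and clearance threshold are assumptions for illustration, not part of the described system:

```python
def relevant_obstacles(percepts, path, clearance=0.5):
    """Keep only obstacles near the planned path, at coarse precision.

    percepts: list of dicts with keys "x", "y", "size" (metres)
    path:     list of (x, y) waypoints the robot intends to traverse
    """
    kept = []
    for p in percepts:
        x, y, size = p["x"], p["y"], p["size"]
        # Cheap relevance test: does the obstacle lie within clearance
        # of any waypoint on the path?
        near = any((x - wx) ** 2 + (y - wy) ** 2 <= (clearance + size) ** 2
                   for wx, wy in path)
        if near:
            # Approximate position and size suffice for obstacle avoidance,
            # so round to a coarse grid instead of storing full precision.
            kept.append({"x": round(x, 1), "y": round(y, 1),
                         "size": round(size, 1)})
    return kept
```

Everything the relevance test rejects is simply never entered into the world model, which is where the speed gain comes from.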