Many sensor systems, such as security cameras and satellite imaging platforms, face the problem of where to point their sensors at any given time. With directional control of the sensor, the amount of space available to cover far exceeds the sensor's field of view. Given a task domain and a set of constraints, we seek coverage strategies that achieve effective area coverage of the environment. We develop metrics that measure the quality of these strategies and provide a basis for comparison. In addition, we explore what it means for an area to be "covered" and how that is affected by the domain, the sensor constraints, and the algorithms. We built a testbed in which we implement various sensor coverage strategies and measure their performance. We modeled the domain of a camera mounted on pan and tilt servos with appropriate constraints and time delays on movement. Next, we built several coverage strategies for selecting where the camera should look at any given time, based on concepts such as force-mass systems, scripted movements, and the time since an area was last viewed. Finally, we describe several metrics with which we can compare the effectiveness of different coverage strategies. These metrics capture how well the whole space is covered, how relevant the covered areas are to the domain, how much time is spent acquiring data, how much time is wasted moving the servos, and how well the strategies detect new objects moving through the space.
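To make the staleness-based idea concrete, the following is a minimal sketch, not the implementation described above, of a coverage strategy driven by the time since an area was last viewed. It assumes the pan-tilt space is discretized into a grid of cells and that servo travel cost can be approximated from angular distance; the grid sizes and slew rate are illustrative assumptions.

```python
# Minimal sketch (assumed discretization and servo model) of a staleness-based
# coverage strategy: steer the camera toward the cell that has gone unviewed
# the longest, discounted by the servo travel time needed to reach it.
import numpy as np

PAN_CELLS, TILT_CELLS = 12, 6        # assumed discretization of the view space
SERVO_DEG_PER_SEC = 60.0             # assumed servo slew rate

class StalenessCoverage:
    def __init__(self):
        self.last_viewed = np.zeros((PAN_CELLS, TILT_CELLS))  # last-view timestamps
        self.position = (0, 0)                                 # current cell

    def travel_time(self, cell):
        # Approximate servo delay as proportional to angular distance.
        dp = abs(cell[0] - self.position[0]) * (360.0 / PAN_CELLS)
        dt = abs(cell[1] - self.position[1]) * (90.0 / TILT_CELLS)
        return max(dp, dt) / SERVO_DEG_PER_SEC

    def next_cell(self, now):
        # Score each cell by staleness minus movement cost; pick the best.
        best, best_score = None, -np.inf
        for p in range(PAN_CELLS):
            for t in range(TILT_CELLS):
                staleness = now - self.last_viewed[p, t]
                score = staleness - self.travel_time((p, t))
                if score > best_score:
                    best, best_score = (p, t), score
        return best

    def mark_viewed(self, cell, now):
        self.position = cell
        self.last_viewed[cell] = now
```

A strategy like this directly optimizes one of the metrics mentioned above (time since last view) while penalizing wasted servo motion; the other strategies, such as force-mass systems or scripted movements, would replace the scoring rule.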
Many vision research projects involve a sensor or camera doing one thing and doing it well. Fewer projects have examined a sensor trying to satisfy simultaneous and conflicting tasks. Satisfying a task involves pointing the sensor in the direction that task demands. We seek ways to arbitrate between competing tasks and also, where possible, to merge tasks so that they can be satisfied by the sensor simultaneously; these two approaches are task-selection and task-merging, respectively. Together they would make a simple pan-tilt camera a very powerful instrument. We built a simple testbed to implement our task-selection and task-merging schemes: a digital camera serving as the sensor, mounted on pan and tilt servos capable of pointing it in different directions. We use three types of tasks in our research: target tracking, surveillance coverage, and initiative. Target tracking is the task of following a target with a known set of features. Surveillance coverage is the task of ensuring that all areas of the space are routinely scanned by the sensor. Initiative is the task of focusing on new things of potential interest should they appear in the course of other activities. Given these heterogeneous task descriptions, we achieve task-selection by assigning priority functions to each task and letting the camera select which task to service. To achieve task-merging, we introduce "task maps," which represent the regions of space each task wants the sensor to attend to. We then merge the task maps and select a region to attend that satisfies multiple tasks at once whenever possible.
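The sketch below illustrates the task-map merging idea under assumed representations: each task contributes a 2D priority map over a discretized pan-tilt space, the maps are combined by weighted summation, and the camera attends the region with the highest combined priority. The grid size, the particular map shapes, and the weights are assumptions for illustration, not the system's actual parameters.

```python
# Minimal sketch of task-map merging: per-task priority maps over the
# pan-tilt space are weighted, summed, and the best cell is selected.
import numpy as np

GRID = (12, 6)  # assumed pan x tilt discretization

def tracking_map(target_cell, sigma=1.5):
    """Priority concentrated around the tracked target's last known cell."""
    p, t = np.meshgrid(np.arange(GRID[0]), np.arange(GRID[1]), indexing="ij")
    d2 = (p - target_cell[0]) ** 2 + (t - target_cell[1]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def coverage_map(last_viewed, now):
    """Priority proportional to how long each cell has gone unviewed."""
    stale = now - last_viewed
    return stale / (stale.max() + 1e-9)

def merge_and_select(maps, weights):
    """Weighted sum of task maps; return the cell that best serves all tasks."""
    combined = sum(w * m for w, m in zip(weights, maps))
    return np.unravel_index(np.argmax(combined), combined.shape)

# Example: a tracking task and a coverage task competing for the sensor.
last_viewed = np.random.uniform(0, 30, GRID)
cell = merge_and_select(
    [tracking_map((4, 2)), coverage_map(last_viewed, now=60.0)],
    weights=[0.7, 0.3],
)
```

Pure task-selection corresponds to choosing the single map with the highest priority and attending its peak, whereas merging lets one camera movement serve several tasks when their maps overlap.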
We are developing a distributed system for tracking people and objects in complex scenes and environments using biologically based algorithms. An important component of such a system is its ability to track targets from multiple cameras at multiple viewpoints. As such, our system must be able to extract and analyze target features in a manner that is sufficiently viewpoint-invariant that cameras can share information about targets, for purposes such as tracking. Since biological organisms can describe targets to one another from very different visual perspectives, we hope that by discovering the mechanisms by which they understand objects, similar abilities can be imparted to a system of distributed agents with many camera viewpoints. Our current methodology draws on work on saliency and center-surround competition among visual components, which allows targets to be located in real time without prior information about their visual features. For instance, gestalt principles of color opponency, continuity, and motion form a basis for locating targets in a principled manner. From this, targets can be located and tracked relatively reliably for short periods. Features can then be extracted from salient targets and stored as a signature that describes the target's basic visual features. This signature can then be used to share target information with other cameras at other viewpoints, or to create the prior information needed by other types of trackers. Here we discuss such a system which, without prior target feature information, extracts salient features from a scene, binds them, and uses the bound features as a set for understanding trackable objects.
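As a rough illustration of the saliency-and-signature pipeline, the sketch below is an assumed simplification, not the authors' implementation: it approximates center-surround competition with a difference of Gaussians on color-opponent channels, then builds a coarse histogram signature over the most salient region that could be passed to another camera. Channel definitions, filter scales, and the histogram descriptor are all illustrative choices.

```python
# Minimal sketch: center-surround saliency on color-opponent channels, plus a
# coarse feature "signature" for sharing a target description across cameras.
import numpy as np
from scipy.ndimage import gaussian_filter

def opponent_channels(rgb):
    """Red-green and blue-yellow opponency plus intensity, from float RGB."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g
    by = b - (r + g) / 2.0
    intensity = rgb.mean(axis=-1)
    return [rg, by, intensity]

def center_surround(channel, center_sigma=2, surround_sigma=8):
    """Difference-of-Gaussians approximation of center-surround competition."""
    center = gaussian_filter(channel, center_sigma)
    surround = gaussian_filter(channel, surround_sigma)
    return np.abs(center - surround)

def saliency_map(rgb):
    maps = [center_surround(c) for c in opponent_channels(rgb)]
    sal = sum(maps)
    return sal / (sal.max() + 1e-9)

def signature(rgb, mask, bins=8):
    """Per-channel histograms over the salient region: a crude,
    viewpoint-tolerant descriptor that another camera could match against."""
    feats = []
    for c in opponent_channels(rgb):
        hist, _ = np.histogram(c[mask], bins=bins, range=(-1.0, 1.0), density=True)
        feats.append(hist)
    return np.concatenate(feats)

# Example: locate the most salient region of a frame and extract its signature.
frame = np.random.rand(120, 160, 3)          # stand-in for a camera frame
sal = saliency_map(frame)
mask = sal > np.percentile(sal, 95)          # top 5% most salient pixels
sig = signature(frame, mask)
```

The key point the sketch reflects is that no prior target model is needed: saliency proposes the region, and the signature extracted from it becomes the shared or prior information that other cameras and trackers can use.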