The estimation of human attention has recently been addressed in the context of human-robot interaction. Joint workspaces already exist today and challenge cooperating systems to jointly focus on common objects, scenes and work niches. With the advent of Google Glass and increasingly affordable wearable eye tracking, the monitoring of human attention will soon become ubiquitous. The presented work describes, for the first time, a method for the estimation of human fixations in 3D environments that does not require any artificial landmarks in the field of view and enables attention mapping in 3D models. It recovers the full 3D human view frustum and the gaze pointer in a previously acquired 3D model of the environment in real time. The study on the precision of this method reports a mean projection error of ≈1.1 cm and a mean angular error of ≈0.6° within the chosen 3D model; the error thus stays below the accuracy of the eye-tracking instrument itself (≈1°). This innovative methodology will open new opportunities for joint attention studies and bring new potential to automated processing for human factors technologies.
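The core geometric step can be pictured as casting the gaze ray, transformed from head to world coordinates, against the triangles of the acquired 3D model. The sketch below is a minimal illustration of that idea, not the authors' implementation; the pose (`head_R`, `head_t`), the gaze direction `gaze_dir_head` and the triangle list are hypothetical inputs assumed to come from head tracking, the eye tracker, and the model acquisition step.

```python
import numpy as np

def ray_triangle_t(o, d, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore ray/triangle test; returns hit distance t or None."""
    e1, e2 = v1 - v0, v2 - v0
    h = np.cross(d, e2)
    a = e1 @ h
    if abs(a) < eps:                       # ray parallel to triangle plane
        return None
    f, s = 1.0 / a, o - v0
    u = f * (s @ h)
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = f * (d @ q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * (e2 @ q)
    return t if t > eps else None

def gaze_pointer(head_R, head_t, gaze_dir_head, triangles):
    """Map a gaze direction (head frame) to a 3D fixation point in the model."""
    d = head_R @ gaze_dir_head             # rotate the gaze ray into world frame
    hits = [t for tri in triangles
            if (t := ray_triangle_t(head_t, d, *tri)) is not None]
    return head_t + min(hits) * d if hits else None   # nearest surface hit
```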
A vision system designed to detect people in complex backgrounds is presented. The purpose of the proposed algorithms is to allow the identification and tracking of single persons under difficult conditions: in crowded places, under partial occlusion, and in low-resolution images. To detect people reliably, we combine different information channels from the video stream. The main emphasis for the initialization of trajectories and the subsequent pedestrian recognition is placed on the detection of the head-shoulder contour. In a first step, a simple and fast shape model selects promising candidates; a local active shape model is then matched against the gradients found in the image with the help of a cost function.
Texture analysis in the form of co-occurrence features ensures that shape candidates form coherent trajectories over time. To reduce the number of false positives and to increase robustness, a pattern analysis step based on eigenimage analysis is presented.
The cues that form the basis of pedestrian detection are integrated into a tracking algorithm that uses the shape information for initial pedestrian detection and verification, propagates positions into new frames using local motion, and matches pedestrians with the help of texture information.
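The abstract does not give the cost function itself; the sketch below only illustrates the general idea of scoring a head-shoulder contour against image gradients and locally refining its placement. `contour` (an array of 2D points assumed to stay within the image) and the gradient-magnitude image `grad_mag` are illustrative inputs.

```python
import numpy as np
from itertools import product

def contour_cost(contour, grad_mag):
    """Negative mean gradient magnitude along the contour points:
    strong image edges under the contour give a low cost."""
    xs = contour[:, 0].astype(int)
    ys = contour[:, 1].astype(int)
    return -grad_mag[ys, xs].mean()

def refine_candidate(contour, grad_mag, search=3):
    """Crude local fit: test small translations of the candidate contour and
    keep the lowest-cost placement; a stand-in for the active shape model
    iteration described above, not the actual fitting procedure."""
    shifts = product(range(-search, search + 1), repeat=2)
    best = min(shifts, key=lambda s: contour_cost(contour + np.array(s), grad_mag))
    return contour + np.array(best)
```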
We describe a mobile vision system that is capable of automated object identification using images captured from a PDA or a camera phone. We present a solution for the enabling technology of outdoor vision-based object recognition that will extend state-of-the-art location and context aware services towards object-based awareness in urban environments. In the proposed application scenario, tourist pedestrians are equipped with GPS, W-LAN and a camera attached to a PDA or a camera phone. They are interested in whether their field of view contains tourist sights that would point them to more detailed information. Multimedia data about the related history, the architecture, or other cultural context of historic or artistic relevance can be explored by a mobile user who intends to learn within the urban environment. Learning from ambient cues is in this way achieved by pointing the device towards the urban sight, capturing an image, and consequently receiving information about the object on site and within the focus of attention, i.e., the user's current field of view.
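One way to picture such a system is as a two-stage query: the GPS position prunes the database of sights to nearby candidates, and appearance matching ranks what remains. The outline below is a hypothetical sketch under that assumption; `Sight`, the descriptors and the similarity measure are illustrative placeholders, not the system's actual interface.

```python
import math
from dataclasses import dataclass

@dataclass
class Sight:
    name: str
    lat: float
    lon: float
    descriptor: list        # appearance feature vector for this sight

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def similarity(a, b):
    """Placeholder appearance score: normalised dot product of descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def identify_sight(query_desc, gps, database, radius_m=300.0):
    """GPS-prune the candidate sights, then rank by appearance similarity."""
    lat, lon = gps
    nearby = [s for s in database if distance_m(lat, lon, s.lat, s.lon) <= radius_m]
    return max(nearby, key=lambda s: similarity(query_desc, s.descriptor), default=None)
```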
A highly challenging object detection task is the recognition of relevant events in outdoor applications, as is the case in sports broadcasts. Changing illumination, different weather conditions, and noise in the imaging process are the most important issues that require a truly robust detection system. The original contribution of this work is to take advantage of a dynamic integration of object beliefs from different evidences of spatial and temporal context into a recursively updated object hypothesis, with the aim of making object detection more robust. The object representation is formulated in a probabilistic framework to enable reasoning over multiple instances of detection results and decision making based on statistical evaluation. The representation is based on the local appearances of the objects and therefore makes the interpretation more robust to occlusion, enabling reasoning on the spatial context between the appearances of individual object parts. Reasoning is driven by Bayesian decision fusion of the individual probabilistic local image interpretations.
The detection system is evaluated on the detection of company logos in extensive video material from Formula One broadcasts. The experimental results demonstrate that fusion is crucial for improving the robustness and accuracy of the outdoor detection system.
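At its core, recursive Bayesian fusion of local interpretations multiplies the current posterior by each new piece of evidence and renormalises, assuming the local observations are conditionally independent given the hypothesis. A minimal numerical sketch of that update, with made-up likelihoods, is given below.

```python
import numpy as np

def fuse(prior, likelihoods):
    """Recursive Bayes update over hypotheses, one evidence at a time."""
    post = np.asarray(prior, dtype=float)
    for lik in likelihoods:
        post = post * np.asarray(lik)   # multiply in the new evidence
        post /= post.sum()              # renormalise to a distribution
    return post

# Two hypotheses: [logo present, background]; three local part detections.
prior = [0.5, 0.5]
evidence = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
print(fuse(prior, evidence))            # -> approx. [0.93, 0.07]
```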
Object detection is an enabling technology that plays a key role in many application areas, such as content-based media retrieval. Attentive cognitive vision systems are proposed here in which the focus of attention is directed towards the most relevant target. The most promising information is interpreted in a sequential process that dynamically makes use of knowledge and enables spatial reasoning on the local object information. The presented work proposes an innovative application of attention mechanisms for object detection that is most general in its understanding of information and action selection. The attentive detection system uses a cascade of increasingly complex classifiers for the stepwise identification of regions of interest (ROIs) and recursively refined object hypotheses. While the coarsest classifiers are used to determine first approximations of a region of interest in the input image, more complex classifiers are applied to the refined ROIs to give more confident estimates. Objects are modeled by local appearance-based representations and in terms of posterior distributions of the object samples in eigenspace. The discrimination function used to discern between objects is modeled by a radial basis function (RBF) network, which was compared with alternative networks and proved consistently superior to other artificial neural networks for appearance-based object recognition. The experiments were conducted on the automatic detection of brand objects in Formula One broadcasts within the European Commission's cognitive vision project DETECT.
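A classifier cascade of this kind can be sketched as a filter chain: each stage discards the ROIs that fall below its threshold before the next, more expensive stage runs, and the final stage could be an RBF scoring function. The outline below is a schematic under that reading; the stage classifiers are hypothetical callables, not the DETECT implementation.

```python
import numpy as np

def cascade_detect(image, stages, rois):
    """Coarse-to-fine detection: 'stages' is a list of (classifier, threshold)
    pairs ordered by increasing complexity; each stage keeps only the ROIs it
    scores above its threshold, so expensive stages see few candidates."""
    for classifier, threshold in stages:
        rois = [roi for roi in rois if classifier(image, roi) > threshold]
        if not rois:                     # early exit: nothing survived
            break
    return rois

def rbf_scores(x, centers, widths, weights):
    """RBF network: Gaussian activations around prototype centers in
    eigenspace, linearly combined into per-class scores."""
    d2 = ((centers - x) ** 2).sum(axis=1)       # squared distances to centers
    phi = np.exp(-d2 / (2.0 * widths ** 2))     # Gaussian basis responses
    return weights.T @ phi                      # one score per object class
```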
Omnidirectional visual sensors have recently been introduced successfully to robot navigation, providing improved localization performance and more stable path following. As a consequence of the sensor characteristics, occlusion of the entire panoramic visual field becomes very unlikely. The presented work exploits this property, providing a Bayesian framework that gains even partial evidence about the current location by applying decision fusion to the multidirectional visual context. The panoramic image is first partitioned into a fixed number of overlapping unidirectional camera views, i.e., appearance sectors. For each sector image, a posterior distribution over potential locations within a predefined environment is then learned. The ambiguity in a local sector interpretation is resolved by Bayesian reasoning over the spatial context of the current position, discriminating occlusions that do not fit the appearance model of subsequent sector views. Results from navigation experiments in an office, using a robot equipped with an omnidirectional camera, demonstrate that the Bayesian reasoning allows highly occlusion-tolerant localization, enabling visual navigation of autonomous robots even in crowded places such as offices, factories and urban environments.
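The fusion over appearance sectors can be sketched as a robust Bayes update: mixing each sector's posterior with a uniform "occluded" component keeps a single occluded view from vetoing the location estimate. This mixture is a common robustification and an assumption here, not necessarily the paper's exact mechanism.

```python
import numpy as np

def fuse_sector_posteriors(sector_posteriors, occlusion_prob=0.2):
    """Combine per-sector location posteriors into one location estimate.
    Each sector may be occluded with probability `occlusion_prob`, in which
    case it is treated as uninformative (uniform over locations)."""
    n = len(sector_posteriors[0])
    uniform = np.full(n, 1.0 / n)
    post = uniform.copy()                        # uniform location prior
    for p in sector_posteriors:
        robust = (1 - occlusion_prob) * np.asarray(p) + occlusion_prob * uniform
        post *= robust                           # fuse this sector's evidence
        post /= post.sum()                       # renormalise
    return post

# Three locations; the middle sector is occluded and points the wrong way.
sectors = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]
print(fuse_sector_posteriors(sectors))  # location 0 still dominates
```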