Computer systems capable of analyzing complex, dynamic scenes play an essential role in video annotation. A scene can be complex in that it contains many cluttered objects of different colors, shapes, and sizes, and dynamic in that multiple interacting objects move against a constantly changing background. Many real-world scenes are complex, dynamic, and challenging enough for computers to describe, including sports games, air traffic, car traffic, street intersections, and cloud transformations. Our research takes on the challenge of building a descriptive computer system that analyzes scenes of hockey games, in which multiple moving players interact with each other against a background that moves constantly due to camera motion. Ultimately, such a system should be able to acquire reliable data by extracting the players' motion as trajectories, query the data by analyzing its descriptive information, and predict the motions of players based on the query results. Among these three major aspects of the system, we focus primarily on the visual information in the scenes, that is, on automatically acquiring the motion trajectories of hockey players from video. More precisely, we automatically analyze hockey scenes by estimating the parameters (i.e., pan, tilt, and zoom) of the broadcast cameras, tracking the hockey players, and constructing a visual description of the data by displaying the players' trajectories. Many difficult vision problems, such as fast, unpredictable player motion and rapid camera motion, make this challenge worth tackling. To the best of our knowledge, no automatic video annotation system for hockey has been developed before.
Although there are many obstacles to overcome, we hope that our efforts and accomplishments will establish the infrastructure of an automatic hockey annotation system and serve as a milestone for research in automatic video annotation in this domain.
Our goal is to enable queries about the motion of objects in a video sequence. Tracking objects in video is a difficult task, involving signal analysis, estimation, and often semantic information particular to the targets. That is not our focus; rather, we assume that tracking is done and turn to the task of representing the motion for query. The position of an object over time results in a motion trajectory, i.e., a sequence of locations. We propose a novel representation of trajectories: we use the path and speed curves as the motion representation. The path curve records the position of the object, while the speed curve records the magnitude of its velocity. This separates positional information from temporal information, since position may be more important in specifying a trajectory than its actual velocity. Velocity can be recovered from our representation. We derive a local geometric description of the curves that is invariant under scaling and rigid motion. We adopt a warping method in matching so that it is robust to variation in the feature vectors. We show that R-trees can be used to index the multidimensional features so that search is efficient and scales to a large database.
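The path/speed decomposition described above can be sketched in a few lines, assuming for illustration that a trajectory is given as timestamped (x, y, t) samples (the paper's exact curve construction may differ):

```python
import math

def path_and_speed(samples):
    """Split a timestamped trajectory into a path curve (the sequence of
    positions) and a speed curve (velocity magnitudes between samples)."""
    path = [(x, y) for x, y, t in samples]
    speed = []
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)  # distance travelled
        speed.append(dist / (t1 - t0))       # magnitude of velocity
    return path, speed
```

Because the path keeps every position and the speed curve keeps every per-segment magnitude, the original velocity (and hence the trajectory's timing) can be recovered from the two curves together.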
Proc. SPIE. 3974, Image and Video Communications and Processing 2000
KEYWORDS: Image processing algorithms and systems, Data compression, Data storage, Image segmentation, Image processing, Computer science, Image filtering, Object recognition, Binary data, RGB color model
The goal of our work is the efficient clustering of object pixels from a sequence of live images for use in real-time applications such as object recognition and tracking. We propose a novel approach to clustering object pixels into separate objects using density and spatial cues. The proposed method runs in linear time, accounts for image noise, and yields real-time performance.
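To illustrate how density and spatial cues can yield linear-time clustering, here is a generic grid-based sketch (not the authors' algorithm; the `cell` size and `min_density` threshold are illustrative assumptions): pixels are bucketed into grid cells, sparse cells are discarded as noise, and adjacent dense cells are flood-filled into clusters.

```python
from collections import defaultdict, deque

def cluster_pixels(pixels, cell=8, min_density=4):
    """Bucket pixels into grid cells, drop sparse cells as noise, then
    flood-fill 4-connected dense cells into clusters. Runs in time linear
    in the number of pixels plus the number of occupied cells."""
    cells = defaultdict(list)
    for x, y in pixels:
        cells[(x // cell, y // cell)].append((x, y))
    dense = {c for c, pts in cells.items() if len(pts) >= min_density}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for n in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        clusters.append(members)
    return clusters
```

Isolated pixels never reach the density threshold, which is one simple way such a scheme can account for image noise.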
Two stereo-vision-based mobile robots navigate and autonomously explore their environment safely while building occupancy grid maps of the environment. The robots maintain position estimates within a global coordinate frame using landmark recognition. This allows them to build a common map by sharing position information and stereo data. Stereo vision processing and map updates are done at 3 Hz, and the robots move at speeds of 200 cm/s. Cooperative mapping is achieved through autonomous exploration of unstructured and dynamic environments. The map is constructed conservatively, so as to be useful for collision-free path planning. Each robot maintains a separate copy of a shared map and posts updates to the common map when it returns to observe a landmark at home base. Issues addressed include synchronization, mutual localization, navigation, exploration, registration of maps, merging repeated views (fusion), and centralized versus decentralized maps.
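Occupancy grid updates of this kind are commonly expressed in the standard Bayesian log-odds form; a minimal per-cell sketch follows (the 0.7/0.3 sensor-model probabilities are illustrative assumptions, not values from the paper):

```python
import math

# Log-odds increments from an assumed inverse sensor model:
# P(occupied | hit) = 0.7, P(occupied | miss) = 0.3.
L_OCC = math.log(0.7 / 0.3)
L_FREE = math.log(0.3 / 0.7)

def update_cell(log_odds, hit):
    """Bayesian log-odds update for one grid cell given one stereo reading."""
    return log_odds + (L_OCC if hit else L_FREE)

def occupancy(log_odds):
    """Convert accumulated log odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))
```

Repeated agreeing observations drive a cell's probability toward 0 or 1, while a conservative planner can treat uncertain cells (probability near 0.5) as not known to be free.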
We describe a new interaction technique based on head coupling, designed to free a user's hands for other tasks such as 3D tracing. We use head motion to manipulate the 3D view of confocal microscope data of neurons. The simplest interaction mode shows the projected view from the user's eye point; other modes implement more sophisticated motions such as ratcheting. Tracing is done by marking a point of interest in two different views controlled by the head. Under suitable restrictions this locates a point in 3D corresponding to a feature of interest in the data set. By repeatedly marking points in this way, the user traces a 3D path through the data. Because of pointing errors and rendering artifacts, the path traced by a user needs to be refined by consulting the original data set. Using a rough data model, along with several successive points in a feature, the program can automatically supply the next point, thereby implementing a form of semi-automatic tracing. An advantage of our method is that we have access to the entire 3D volume, so feature geometry can be computed with reference to the complete 3D data and not just to a 2D projected view.
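Locating a 3D point from marks in two views amounts to intersecting two viewing rays, which in general do not meet exactly because of pointing error. A minimal sketch takes the midpoint of the shortest segment between the rays (the ray origins and directions are assumed to come from the head-coupled viewpoints, which we take as given):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two viewing rays.
    Each ray is origin o + s * direction d; directions need not be unit."""
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    w = [p - q for p, q in zip(o1, o2)]
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b          # zero only for parallel rays
    s = (b * e - c * d) / denom    # closest-approach parameter on ray 1
    t = (a * e - b * d) / denom    # closest-approach parameter on ray 2
    p1 = [p + s * u for p, u in zip(o1, d1)]
    p2 = [p + t * u for p, u in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

The length of the segment between `p1` and `p2` also gives a natural measure of marking error that a refinement step could use.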
Many computational vision routines can be regarded as the recognition and retrieval of echoes in space or time. Cepstral analysis is a powerful nonlinear adaptive signal processing methodology widely used in areas such as echo retrieval and removal, speech processing and phoneme chunking, radar and sonar processing, seismology, medicine, image deblurring and restoration, and signal recovery. The aims of this paper are: (1) to provide a brief mathematical and historical review of cepstral techniques; (2) to introduce computational and performance improvements to the power and differential cepstrum for use in echo detection, and to compare these methods with traditional cepstral techniques; (3) to apply the cepstrum to visual tasks such as motion analysis and trinocular vision; and (4) to draw a brief comparison between the cepstrum and other matching techniques. The computational and performance improvements introduced in this paper can be applied in other areas that frequently utilize the cepstrum.
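The basic idea behind cepstral echo detection is that the power cepstrum (the inverse transform of the log power spectrum) of a signal containing an echo shows a peak at the echo delay. A minimal sketch using a naive O(n²) DFT follows (for illustration only; it does not implement the improved methods of the paper):

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * math.pi * j * k / n)
                for j in range(n)) / n for k in range(n)]

def power_cepstrum(x):
    """Inverse DFT of the log power spectrum. For a signal plus an
    attenuated echo at delay d, a peak appears at quefrency d."""
    spec = dft(x)
    log_power = [math.log(abs(v) ** 2 + 1e-12) for v in spec]  # epsilon guards log(0)
    return [c.real for c in idft(log_power)]
```

In practice an FFT would replace the naive DFT, and the low-quefrency region (dominated by the signal itself rather than the echo) is ignored when searching for the peak.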
Design of a high-speed stereo vision system in analog VLSI technology is reported. The goal is to determine how the advantages of analog VLSI--small area, high speed, and low power--can be exploited, and how the effects of its principal disadvantages--limited accuracy, inflexibility, and lack of storage capacity--can be minimized. Three stereo algorithms are considered, and a simulation study is presented to examine details of the interaction between algorithm and analog VLSI implementation. The Marr-Poggio-Drumheller algorithm is shown to be best suited for analog VLSI implementation. A CCD/CMOS stereo system implementation is proposed, capable of operating at 6000 image frame pairs per second on 48 x 48 images, and at faster-than-frame-rate speeds on 256 x 256 binocular image pairs.
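The Marr-Poggio-Drumheller algorithm itself (edge-based matching with disparity support regions) is beyond a short sketch, but the underlying stereo correspondence computation can be illustrated with a generic sum-of-absolute-differences block matcher (a software illustration only, unrelated to the analog VLSI implementation; `max_d` and `win` are illustrative parameters):

```python
def disparity(left, right, max_d, win=1):
    """Per-pixel disparity for a rectified pair by SAD block matching
    along horizontal scanlines, with the left image as reference."""
    h, w = len(left), len(left[0])
    out = [[0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            best, best_d = float("inf"), 0
            for d in range(min(max_d, x - win) + 1):
                # Sum of absolute differences over a (2*win+1)^2 window,
                # comparing against the right image shifted by d.
                cost = sum(
                    abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                    for dy in range(-win, win + 1)
                    for dx in range(-win, win + 1))
                if cost < best:
                    best, best_d = cost, d
            out[y][x] = best_d
    return out
```

The inner cost computation is the part that maps naturally onto parallel analog hardware, since every candidate disparity can be evaluated simultaneously.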