We describe a method for searching large video databases based on the activities present in the videos. Searching videos by content (such as human activities) has many applications, including security, surveillance, and commercial uses such as on-line video search. Conventional video content-based retrieval
(CBR) systems are either feature based or semantics based: the former model the dynamics of video content
using statistics of image features, while the latter rely on automated scene understanding of the video content.
Neither approach has been successful. Our approach is inspired by the success of visual vocabulary of "Video Google"
by Sivic and Zisserman, and the work of Nister and Stewenius who showed that building a visual vocabulary tree can
improve performance in both scalability and retrieval accuracy for 2-D images. We apply the visual vocabulary and
vocabulary tree approach to spatio-temporal video descriptors for video indexing, taking advantage of both the
discriminative power of these descriptors and the scalability of the vocabulary tree. Furthermore, this
approach does not rely on any model-based activity recognition; the vocabulary tree is trained off-line
on unlabeled data with unsupervised learning, so the approach is widely applicable. We present experimental results
on standard human activity recognition videos that demonstrate the feasibility of this approach.
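The vocabulary-tree indexing described above can be sketched with hierarchical k-means: descriptors are clustered recursively, and a new descriptor is quantized by descending the tree greedily, its leaf path serving as the visual-word index. This is a minimal illustration assuming generic descriptor vectors; the branch factor, depth, and k-means details are placeholders, not the settings used in the paper.

```python
import numpy as np

def build_vocab_tree(descriptors, branch=4, depth=2, seed=0):
    """Recursively cluster descriptors with k-means to form a vocabulary tree.

    Illustrative sketch: `branch` and `depth` are placeholder values.
    Returns a nested dict of cluster centers and child subtrees.
    """
    rng = np.random.default_rng(seed)

    def kmeans(X, k, iters=10):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for j in range(k):
                pts = X[labels == j]
                if len(pts):
                    centers[j] = pts.mean(axis=0)
        return centers, labels

    def grow(X, d):
        if d == 0 or len(X) < branch:
            return {"centers": None, "children": None}  # leaf node
        centers, labels = kmeans(X, branch)
        children = [grow(X[labels == j], d - 1) for j in range(branch)]
        return {"centers": centers, "children": children}

    return grow(np.asarray(descriptors, dtype=float), depth)

def quantize(tree, desc, path=()):
    """Descend the tree greedily; the leaf path is the visual-word index."""
    if tree["centers"] is None:
        return path
    j = int(np.argmin(((tree["centers"] - desc) ** 2).sum(-1)))
    return quantize(tree["children"][j], desc, path + (j,))
```

Indexing then reduces to counting visual-word occurrences per video, with TF-IDF-style weighting making retrieval a sparse-vector comparison.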
Volitional search systems that assist the analyst by searching for specific targets or objects such as vehicles, factories,
and airports in wide-area overhead imagery must overcome multiple problems present in current manual and automatic
approaches. These problems include finding targets hidden in terabytes of data, relatively few pixels on target,
long intervals between interesting regions, time-consuming analysis requiring many analysts, no a priori representative
examples or templates of interest, the need to detect multiple classes of objects, and the requirement for very high
detection rates with very low false alarm rates.
This paper describes a conceptual analyst-centric framework that utilizes existing technology modules to search and
locate occurrences of targets of interest (e.g., buildings, mobile targets of military significance, factories, nuclear plants,
etc.), from video imagery of large areas. Our framework takes simple queries from the analyst and finds the queried
targets with minimal interaction from the analyst. It uses a hybrid approach that combines biologically
inspired bottom-up attention, socio-biologically inspired object recognition for volitionally recognizing targets, and
hierarchical Bayesian networks for modeling and representing the domain knowledge. This approach offers
high accuracy and a low false alarm rate, and can handle both low-level visual information and high-level domain knowledge
in a single framework. Such a system would be of immense help for search and rescue efforts, intelligence gathering,
change detection systems, and other surveillance systems.
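As a toy illustration of the bottom-up attention component, a center-surround contrast map can be computed as the difference of a fine and a coarse blur, so that locally distinctive regions stand out. This is a crude stand-in: the kernel sizes and box-blur formulation are illustrative assumptions, not the framework's biologically inspired model.

```python
import numpy as np

def saliency_map(img, small=3, large=9):
    """Center-surround contrast as |fine blur - coarse blur|.

    Illustrative sketch only; `small` and `large` window sizes are
    arbitrary assumptions standing in for a multi-scale attention model.
    """
    def box_blur(x, k):
        pad = k // 2
        xp = np.pad(x, pad, mode="edge")
        out = np.zeros_like(x, dtype=float)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                out[i, j] = xp[i:i + k, j:j + k].mean()
        return out
    return np.abs(box_blur(img, small) - box_blur(img, large))
```

Peaks of the map would then seed candidate regions for the object-recognition and Bayesian-network stages.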
We describe cognitive swarms, a new method for efficient visual recognition of objects in an image or video sequence
that combines feature-based object classification with search mechanisms based on swarm intelligence. Our approach
utilizes the particle swarm optimization (PSO) algorithm, a population-based evolutionary algorithm, which is effective
for optimization of a wide range of functions. PSO searches a multi-dimensional solution space for a global optimum
using a population or swarm of "particles" that cooperate using a low-overhead communication scheme to search the
solution space efficiently. We use a system of local and global swarms to detect and track multiple objects in video
sequences. In our implementation, each particle in the swarm consists of a cascade of classifiers that utilize wavelet and
edge-symmetry features to recognize objects. PSO update equations are used to control the movement of the swarm in
solution space as the particles cooperate to find objects efficiently by maximizing classification confidence. By
performing this optimization, the classifier swarm finds objects in the scene, determines their size, and optimizes other
classifier parameters such as the object rotation angle. Map-based attention feedback is used to further increase the
efficiency of cognitive swarms. Performance results are presented for human and vehicle detection.
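The core loop of a classifier-driven swarm can be sketched as follows: particles move through a solution space such as (x, y, scale), and the objective being maximized is classification confidence. Here a generic `confidence` callable stands in for the paper's wavelet/edge-symmetry classifier cascade, and the PSO parameter values are illustrative assumptions.

```python
import numpy as np

def pso_detect(confidence, bounds, n_particles=30, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch: maximize a classifier-confidence function.

    `bounds` is a list of (low, high) per dimension, e.g. x, y, scale.
    The confidence callable replaces the actual classifier cascade.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = len(lo)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal bests
    pbest_val = np.array([confidence(p) for p in pos])
    g = int(np.argmax(pbest_val))                       # global best index
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # standard PSO update: inertia + cognitive + social terms
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (pbest[g] - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([confidence(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = int(np.argmax(pbest_val))
    return pbest[g], pbest_val[g]
```

The swarm converges on the location and scale where the stand-in classifier responds most strongly, mirroring how the cognitive swarm localizes an object while optimizing classifier parameters.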
Behavior analysis deals with understanding and parsing a video sequence to generate a high-level description of object
actions and inter-object interactions. We describe a behavior recognition system that can model and detect spatio-temporal
interactions between detected entities in a visual scene by using ideas from swarm optimization, fuzzy graphs,
and object recognition. Two extensions of the Particle Swarm Optimization algorithm are explored: the first uses
classifier-based object recognition to detect entities in video scenes and then employs fuzzy graphs to model their
associations, while the second directly searches for graph-based object associations. Our hierarchical generic event
detection scheme uses fuzzy graphical models for representing the spatial associations as well as the temporal dynamics
of the discovered scene entities. The spatial and temporal attributes of associated objects and groups of objects are
handled in separate layers in the hierarchy. We also describe a new behavior specification language that helps the user
easily describe the event that needs to be detected using simple linguistic or graphical queries. Preliminary results are
promising and studies are underway to evaluate the use of the system in more complicated scenarios.
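The fuzzy spatial-association idea can be illustrated in miniature: each frame yields a graph whose edge weights are fuzzy "near" memberships between detected entities, and a rising membership over time signals an approach-style event. The membership thresholds and the event rule below are illustrative assumptions, not the paper's fuzzy graphical models.

```python
import numpy as np

def near_membership(d, d_near=20.0, d_far=80.0):
    """Fuzzy 'near' relation: 1 within d_near, 0 beyond d_far,
    linear ramp in between. Thresholds are illustrative assumptions."""
    return float(np.clip((d_far - d) / (d_far - d_near), 0.0, 1.0))

def fuzzy_graph(positions):
    """Fuzzy spatial-association graph for one frame: edge (i, j)
    carries the degree to which entities i and j are near."""
    n = len(positions)
    return {(i, j): near_membership(
                np.linalg.norm(np.subtract(positions[i], positions[j])))
            for i in range(n) for j in range(i + 1, n)}

def detect_approach(frames, pair, rise=0.5):
    """Hypothetical event rule: flag 'approach' when the pair's 'near'
    membership rises by more than `rise` over the frame sequence."""
    memberships = [fuzzy_graph(f)[pair] for f in frames]
    return memberships[-1] - memberships[0] > rise
```

Stacking such per-frame graphs over time is what lets the hierarchy separate spatial associations from temporal dynamics.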
Behavior analysis deals with understanding and parsing a video sequence to generate a high-level
description of object actions and inter-object interactions. In this paper, we describe a behavior recognition system that
can model and detect spatio-temporal interactions between detected entities in a visual scene by using ideas from swarm
optimization, fuzzy graphs, and object recognition. Extensions of Particle Swarm Optimization based approaches for
object recognition are first used to detect entities in video scenes. Our hierarchical generic event detection scheme uses
fuzzy graphical models for representing the spatial associations as well as the temporal dynamics of the discovered scene
entities. The spatial and temporal attributes of associated objects and groups of objects are handled in separate layers in
the hierarchy. We also describe a new behavior specification language that helps the analyst easily describe the
event that needs to be detected using either simple linguistic queries or graphical queries. Our experimental results show
that the approach is promising for detecting complex behaviors.
Good pedestrian classifiers that analyze static images for the presence of pedestrians exist. However, even a low false positive rate is sufficient to flood a real system with false warnings. We address the problem of pedestrian motion (gait) modeling and recognition using sequences of images rather than static individual frames, thereby exploiting information in the dynamics. We use two different representations, and corresponding distances, for gait sequences. In the first, a gait is represented as a manifold in a lower-dimensional space corresponding to the gait images. In the second, a gait image sequence is represented as the output of a dynamical system whose underlying driving process is an action such as walking or running. We examine distance functions corresponding to these representations. For dynamical systems, we formulate distances derived from the parameters of the system that take into account both the structure of the output space and the dynamics within it. Given appearance-based models, we present results demonstrating the discriminative power of the proposed distances.
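The first (manifold) representation can be sketched simply: flatten each gait image, project the frames onto a low-dimensional PCA basis so the sequence traces a manifold, and compare two sequences with a symmetric mean nearest-neighbour set distance. This is an illustrative sketch assuming raw appearance vectors; the paper's actual embeddings and distance functions differ.

```python
import numpy as np

def pca_basis(frames, k=3):
    """PCA basis from flattened gait images (top-k right singular vectors)."""
    X = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def embed(frames, basis):
    """Project a gait sequence onto the basis; the sequence becomes a
    set of points tracing a manifold in the low-dimensional space."""
    X = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    return (X - X.mean(axis=0)) @ basis

def manifold_distance(A, B):
    """Symmetric mean nearest-neighbour distance between two embedded
    gait trajectories (a Hausdorff-style set distance)."""
    d = np.linalg.norm(A[:, None] - B[None], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Under this distance, sequences that trace the same region of the embedding space (e.g. the same gait at a different phase) score closer than sequences with a different motion envelope.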
We present a new approach for extending the particle swarm optimization (PSO) algorithm to multi-optima problems using ideas from possibility theory. An elastic constraint lets the particles dynamically explore the solution space in two phases. In the exploration phase, particles explore the space to track the global minima while also traversing local minima. In the exploitation phase, particles disperse into local neighborhoods to locate the best local minima. The proposed possibilistic PSO (PPSO) has been applied to data clustering and object detection. Our preliminary results indicate that the proposed approach is efficient and robust.
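The two-phase exploration/exploitation idea can be sketched with a cognition-only PSO variant: each particle is attracted only to its own best position, so particles can settle into different basins instead of collapsing onto one global best, with a high-inertia exploration phase followed by a low-inertia exploitation phase. Note the hedge: the possibility-theoretic elastic constraint of the PPSO is replaced here by a simple inertia schedule, so this is an illustration of the two-phase structure, not the paper's algorithm.

```python
import numpy as np

def two_phase_pso(f, lo, hi, n=30, iters=(60, 60), c=1.5, seed=0):
    """Cognition-only, two-phase PSO sketch for multi-optima search (1-D).

    Phase 1: high inertia (exploration across basins).
    Phase 2: low inertia (local exploitation).
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n)
    vel = rng.uniform(-0.25, 0.25, n) * (hi - lo)  # nonzero start so particles move
    pbest, pval = pos.copy(), np.array([f(x) for x in pos])
    for w, steps in zip((0.9, 0.4), iters):
        for _ in range(steps):
            # attraction to each particle's own best only (no global term)
            vel = w * vel + c * rng.random(n) * (pbest - pos)
            pos = np.clip(pos + vel, lo, hi)
            vals = np.array([f(x) for x in pos])
            better = vals < pval
            pbest[better], pval[better] = pos[better], vals[better]
    return pbest, pval
```

On a bimodal objective such as f(x) = (x² − 1)², the returned personal bests tend to cover both minima at x = ±1 rather than a single one, which is the behavior a multi-optima extension is after.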