We present a prototype video tracking and person categorization system that uses face and person soft biometric features
to tag people while tracking them in multiple camera views. Our approach exploits the temporal nature of video by extracting and accumulating feasible soft biometric features for each person in every frame, building a dynamic soft biometric feature list for each tracked person in surveillance videos. We developed algorithms for extracting face soft
biometric features to achieve gender and ethnicity classification and session soft biometric features to aid in camera
hand-off in surveillance videos with low resolution and uncontrolled illumination. To train and test our face soft
biometry algorithms, we collected over 1500 face images of both genders and three ethnicity groups with varying sizes, poses, and illumination conditions. These soft biometric feature extractors and classifiers are implemented on our existing
video content extraction platform to enhance video surveillance tasks. Our algorithms achieved promising results for
gender and ethnicity classification, and tracked person re-identification for camera hand-off on low to good quality
surveillance and broadcast videos. With the proposed system, a high-level description of each extracted person's soft biometric data can be stored for later use, for example to provide categorical information about people, to create database partitions that accelerate searches in response to user queries, and to track people between cameras.
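The accumulation idea described above can be sketched in code: per-frame classifier outputs for each tracked person are collected as votes, and a majority label is reported once enough evidence has been gathered. This is an illustrative sketch, not the actual system; the class name, attribute names, and vote threshold are all assumptions.

```python
from collections import Counter, defaultdict

class SoftBiometricAccumulator:
    """Accumulate per-frame soft biometric classifications for each tracked
    person and report the majority label once enough votes are collected.
    Hypothetical sketch; thresholds and attribute names are illustrative."""

    def __init__(self, min_votes=5):
        self.min_votes = min_votes
        # track_id -> attribute name -> Counter of per-frame labels
        self.votes = defaultdict(lambda: defaultdict(Counter))

    def add_observation(self, track_id, attribute, label):
        """Record one frame's classifier output (e.g. gender='female')."""
        self.votes[track_id][attribute][label] += 1

    def summary(self, track_id):
        """Return the majority label per attribute, or None while evidence is thin."""
        result = {}
        for attribute, counter in self.votes[track_id].items():
            label, count = counter.most_common(1)[0]
            result[attribute] = label if count >= self.min_votes else None
        return result

acc = SoftBiometricAccumulator(min_votes=3)
for _ in range(4):
    acc.add_observation(7, "gender", "female")
acc.add_observation(7, "gender", "male")  # one noisy frame is outvoted
print(acc.summary(7))  # {'gender': 'female'}
```

Accumulating votes over many frames is what makes the per-frame classifiers usable on low-resolution video: a single misclassified frame does not change the reported label.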
In this paper we describe a prototype surveillance system that leverages smart sensor motes, intelligent video, and
Sensor Web technologies to aid in large area monitoring operations and to enhance the security of borders and
critical infrastructures. Intelligent video has emerged as a promising tool amid growing concern about border
security and vulnerable entry points. However, numerous barriers limit the effectiveness of surveillance video in large-area protection, such as the number of cameras needed to provide coverage, the large volumes of data to be processed and disseminated, the lack of smart sensors to detect potential threats, and the limited bandwidth available to capture and distribute video data. We present a concept prototype that addresses these obstacles by employing a Smart Video Node in a Sensor Web framework. A Smart Video Node (SVN) is an IP video camera with automated event detection capability. SVNs are cued by inexpensive sensor motes that detect the presence of humans or vehicles. Based on the sensor motes' observations, cameras are slewed to observe the activity, and automated video analysis detects potential threats, which are disseminated as "alerts". The Sensor Web framework enables quick and efficient identification of
available sensors, collects data from disparate sensors, automatically tasks various sensors based on observations or
events received from other sensors, and receives and disseminates alerts from multiple sensors. The prototype
system is implemented by leveraging intuVision's intelligent video, Northrop Grumman's sensor motes and
SensorWeb technologies. Implementation of a deployable system with Smart Video Nodes and sensor motes within
the SensorWeb platform is currently underway. The final product will have many applications in commercial,
government and military systems.
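The cue-and-slew flow described above can be sketched as follows: a mote detection tasks the nearest SVN, which aims at the reported position, runs video analysis, and emits an alert if the activity is confirmed. All class and function names here are illustrative assumptions, not the actual SVN or Sensor Web interfaces.

```python
from dataclasses import dataclass

def _distance(a, b):
    """Euclidean distance between two (x, y) positions."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

@dataclass
class Mote:
    position: tuple  # where the mote reported activity

@dataclass
class VideoNode:
    id: str
    position: tuple
    aimed_at: tuple = None

    def slew_to(self, pos):
        self.aimed_at = pos

    def capture_frame(self):
        # Stand-in for grabbing a frame from the IP camera.
        return {"aimed_at": self.aimed_at}

def handle_mote_detection(mote, nodes, analyze):
    """Slew the closest SVN toward the mote's reported position; return an
    alert dict if the video analysis callback confirms a threat, else None."""
    node = min(nodes, key=lambda n: _distance(n.position, mote.position))
    node.slew_to(mote.position)
    detection = analyze(node.capture_frame())
    if detection:
        return {"source": node.id, "type": detection, "location": mote.position}
    return None

nodes = [VideoNode("svn-1", (0, 0)), VideoNode("svn-2", (100, 0))]
alert = handle_mote_detection(Mote((90, 10)), nodes, lambda frame: "human")
print(alert)  # {'source': 'svn-2', 'type': 'human', 'location': (90, 10)}
```

The point of the design is visible even in this sketch: the cheap motes decide *when* and *where* to look, so the bandwidth-heavy video analysis only runs on cued activity.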
Proc. SPIE 6806, Human Vision and Electronic Imaging XIII
KEYWORDS: Visual process modeling, Detection and tracking algorithms, Visualization, Video, Error analysis, Visual system, Zoom lenses, Human vision and color perception, Information visualization, Neurons
Multiple Object Tracking (MOT) experiments show that human observers can track up to five moving targets among several moving distractors over several seconds. We extended these studies by designing modified MOT experiments
to investigate the spatio-temporal characteristics of human visuo-cognitive mechanisms for tracking and applied the
findings and insights obtained from these experiments in designing computational multiple object tracking algorithms.
Recent studies indicate that attention both enhances the neural activity of relevant information and suppresses the
irrelevant visual information in the surround. Results of our experiments suggest that the suppressive surround of
attention extends up to 4 deg from the target stimulus and takes at least 100 ms to build. We suggest that when the
attentional windows corresponding to separate target regions are spatially close, they can be grouped to form a single
attentional window to avoid interference originating from suppressive surrounds. The grouping experiment results
indicate that the attentional windows are grouped into a single one when the distance between them is less than 1.5 deg.
Preliminary implementation of the suppressive surround concept in our computational video object tracker resulted in fewer unnecessary object merges in video tracking experiments.
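The window-grouping finding translates naturally into tracker logic: tracking windows closer than a grouping threshold are merged into a single window so that their suppressive surrounds do not interfere. The 1.5 deg figure comes from the experiments above; the greedy single-link grouping below is an illustrative sketch, not the actual tracker implementation.

```python
# Greedy single-link grouping of attentional (tracking) window centers.
# Coordinates are in degrees of visual angle; the threshold is taken from
# the grouping experiments, the algorithm itself is an assumption.

GROUPING_THRESHOLD_DEG = 1.5

def _dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def group_windows(centers, threshold=GROUPING_THRESHOLD_DEG):
    """Merge window centers closer than `threshold` into groups and return
    one centroid per group, i.e. one attentional window per group."""
    groups = []  # each group is a list of member centers
    for c in centers:
        target = None
        for g in groups:
            if any(_dist(c, p) < threshold for p in g):
                target = g
                break
        if target is None:
            groups.append([c])
        else:
            target.append(c)
    # Collapse each group to its centroid: a single merged window.
    return [tuple(sum(coord) / len(g) for coord in zip(*g)) for g in groups]

print(group_windows([(0, 0), (1, 0), (5, 0)]))  # [(0.5, 0.0), (5.0, 0.0)]
```

In a tracker, replacing two nearby windows with their merged centroid is what reduces the spurious object merges mentioned above: the merged window is treated as one attentional unit instead of two mutually suppressing ones.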
This paper describes an automated video-analysis-based monitoring system with processing at the sensor edge, designed to watch for and report predetermined events and unusual activities in remote areas that may lie in unfriendly zones. The prototype system involves content extraction from video streams collected by unattended ground cameras, tracking of objects, detection of events, and assessment of scenes for anomalous situations. The application requirements impose efficiency constraints on the video analysis algorithms due to the low-power sensor processing board. We present efficient video analysis algorithms for detection, tracking, and classification of objects, and for analysis of extracted object and scene information to detect specific events as well as anomalous or novel situations at the video camera level. Our multi-tier, modular video analysis approach uses a fast space-based peripheral vision component for quick spatial tracking of objects, detailed object- or scene-based feature extractors, and data-driven Support Vector Machine (SVM) classifiers that handle feature-based analysis at multiple data levels. Our algorithms are developed and tested on a PC platform but designed to match the processing and power limitations of the target hardware platform. The video object detection and tracking components have been implemented on a Texas Instruments DM642 evaluation board to assess the feasibility of the prototype system.
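The multi-tier idea can be sketched as a two-stage loop: a cheap peripheral pass flags image regions with motion, and only those regions are handed to the expensive feature extraction and classification stage (e.g. an SVM). This is a minimal sketch under assumed interfaces; the frame representation, threshold, and callbacks are illustrative, not the actual edge pipeline.

```python
def analyze_frame(frame, prev_frame, extract_features, classify,
                  motion_threshold=10):
    """Two-tier analysis: a fast per-cell motion check (tier 1) gates a
    detailed feature-based classification (tier 2). Frames are modeled as
    lists of pixel-value cells for illustration."""
    detections = []
    for idx, (cell, prev_cell) in enumerate(zip(frame, prev_frame)):
        # Tier 1: cheap peripheral check, sum of absolute pixel differences.
        if sum(abs(a - b) for a, b in zip(cell, prev_cell)) < motion_threshold:
            continue  # no activity: skip the expensive stage entirely
        # Tier 2: detailed feature extraction + classifier (e.g. a trained SVM).
        detections.append((idx, classify(extract_features(cell))))
    return detections

prev = [[0, 0, 0], [0, 0, 0]]
cur = [[0, 0, 0], [50, 50, 50]]
# Toy stand-ins for the feature extractor and SVM classifier:
result = analyze_frame(cur, prev, lambda cell: cell,
                       lambda f: "vehicle" if sum(f) > 100 else "person")
print(result)  # [(1, 'vehicle')]
```

Gating the classifier on the cheap motion test is what makes the approach fit a low-power board: in a mostly static scene, tier 2 runs on only a small fraction of the frame.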
In this paper we describe an Advanced Digital Video Surveillance system based on a TASC-developed prototype for object-behavior-based video analysis and indexing. The advantages of using video analysis in surveillance and physical security applications are twofold. First, the ability to automatically analyze surveillance video content facilitates timely detection of events that require immediate attention. Second, the amount of video to be archived can be reduced considerably by recording, out of the vast amount of surveillance data collected every day, only the portions of video that include behaviors and events of interest. Our object-behavior and event-based indexing paradigm for video data treats an identifiable object behavior, action, or event as the basic indexing unit, facilitating efficient querying and report generation as well as derivation of statistical information about behavior patterns over periods of time. We describe our methodology and present preliminary results in near-real-time behavior and event detection.
In this paper we present a new method for indexing video data based on descriptions of object behaviors within a scene. The behaviors of objects over time are the most important features of a scene and make up its semantic content. The indices generated from these temporal features complement still-image features in achieving fast and efficient search and retrieval of video content. To provide a conceptual framework for indexing based on object behaviors, we propose an object Behavior Description Scheme (BDS) for video data. The Behavior Description Scheme is based on spatio-temporal features, and the associated object behavior descriptions are directly extractable from the video data itself (and support data), facilitating automatic indexing.
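The indexing paradigm in the spirit of the proposed BDS can be sketched as follows: each detected behavior becomes an index record with its spatio-temporal extent, and queries and statistics run over those records rather than over raw video. The record fields below are illustrative assumptions, not the actual description scheme.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorRecord:
    """One indexing unit: an identifiable object behavior, action, or event."""
    object_id: int
    behavior: str     # e.g. "loitering", "entering", "left-object"
    start_frame: int
    end_frame: int
    region: tuple     # bounding region in the scene, (x, y, w, h)

@dataclass
class BehaviorIndex:
    """Query-able index over behavior records extracted from video."""
    records: list = field(default_factory=list)

    def add(self, record):
        self.records.append(record)

    def query(self, behavior=None, after_frame=None):
        """Filter records by behavior type and/or earliest start frame."""
        hits = self.records
        if behavior is not None:
            hits = [r for r in hits if r.behavior == behavior]
        if after_frame is not None:
            hits = [r for r in hits if r.start_frame >= after_frame]
        return hits

idx = BehaviorIndex()
idx.add(BehaviorRecord(1, "loitering", 100, 400, (10, 10, 50, 80)))
idx.add(BehaviorRecord(2, "entering", 150, 170, (0, 0, 20, 40)))
print(len(idx.query(behavior="loitering")))  # 1
```

Because each record carries its temporal extent, the same index also supports the statistical queries mentioned above, such as counting occurrences of a behavior over a time window.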