Our Multi-INT Data Association Tool (MIDAT) learns patterns of life (POL) of a geographical area from video analyst observations called out in textual reporting. Typical approaches to learning POLs from video use computer vision algorithms to extract the locations in space and time of various activities, and are therefore limited by the detection and tracking performance of the video processing algorithms. There are, however, numerous settings in which human analysts monitor live video streams and annotate, or "call out," relevant entities and activities, such as security analysis, crime-scene forensics, news reporting, and sports commentary. These call-outs are typically captured as text, such as chat. Although the primary purpose of these text products is to describe events as they happen, organizations typically archive the reports for extended periods, and this archive provides a basis for building POLs. Such POLs are useful for diagnosis, assessing activities in an area against historical context, and for consumers of downstream products, who gain an understanding of historical patterns. MIDAT combines natural language processing, multi-hypothesis tracking, and Multi-INT Activity Pattern Learning and Exploitation (MAPLE) technologies in an end-to-end laboratory prototype that processes the textual products produced by video analysts, infers POLs, and highlights anomalies relative to those POLs, with links to "tracks" of related activities performed by the same entity. MIDAT performs well, achieving, for example, a 90% F1 score on extracting activities from the textual reports.
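As an illustrative sketch only (not the MIDAT pipeline), the snippet below pulls activity call-outs from chat-style report lines with a simple keyword matcher and scores the output against hand-labeled truth using precision, recall, and the F1 measure cited above; the activity vocabulary, sample reports, and labels are all hypothetical.

```python
import re

# Hypothetical activity vocabulary an analyst might call out in chat reports.
ACTIVITY_TERMS = {"digging", "loitering", "meeting", "departing", "loading"}

def extract_activities(report):
    """Return the set of activity terms mentioned in one chat line."""
    tokens = re.findall(r"[a-z]+", report.lower())
    return {t for t in tokens if t in ACTIVITY_TERMS}

def prf1(predicted, gold):
    """Micro-averaged precision, recall, and F1 over per-report activity sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

reports = [
    "1432Z two adults digging near the north wall",
    "1450Z white pickup departing compound, one adult loading crates",
]
gold = [{"digging"}, {"departing", "loading"}]
predicted = [extract_activities(r) for r in reports]
print(prf1(predicted, gold))
```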
Continuous classification of dismount types (including gender, age, and ethnicity) and their activities (such as walking or running) evolving over space and time is challenging. Limited sensor resolution (often exacerbated by platform standoff distance), clutter from shadows in dense target environments, unfavorable environmental conditions, and the usual imperfections of real data all contribute to the challenge. The unique and innovative aspect of our approach is the synthesis of multimodal signal processing with incremental, non-parametric, hierarchical Bayesian machine learning methods to create a new kind of target classification architecture. This architecture is designed from the ground up to exploit correlations among the multiple sensing modalities (multimodal data fusion) and to rapidly and continuously learn (online self-tuning) the patterns of distinct classes of dismounts given little a priori information, increasing classification performance in the presence of the challenges posed by anti-access/area denial (A2/AD) sensing. To fuse multimodal features, Long-range Dismount Activity Classification (LODAC) develops a novel statistical, information-theoretic approach that jointly models the multimodal data (i.e., a probabilistic model of cross-modal signal generation) and discovers the critical cross-modal correlations by identifying components (features) with maximal mutual information (MI), which is estimated efficiently using non-parametric entropy models. LODAC also develops a generic probabilistic pattern learning and classification framework, based on a new class of hierarchical Bayesian learning algorithms, for efficiently discovering recurring patterns (classes of dismounts) in multiple simultaneous time series (sensor modalities) at multiple levels of feature granularity.
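The following is a minimal, histogram-based sketch of the mutual-information criterion described above, not LODAC's actual non-parametric entropy estimator: it scores every cross-modal feature pair and keeps the pair with maximal MI. The feature names and data are synthetic stand-ins.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of I(X;Y) in nats for two 1-D feature series."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
n = 2000
gait_rate = rng.normal(size=n)                         # e.g., a video-derived feature
micro_doppler = gait_rate + 0.3 * rng.normal(size=n)   # correlated radar-derived feature
texture = rng.normal(size=n)                           # uncorrelated video feature

video_feats = {"gait_rate": gait_rate, "texture": texture}
radar_feats = {"micro_doppler": micro_doppler}

# Pick the cross-modal feature pair with maximal estimated MI.
best = max(((v, r, mutual_information(video_feats[v], radar_feats[r]))
            for v in video_feats for r in radar_feats), key=lambda t: t[2])
print("most informative cross-modal pair:", best)
```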
Military operations in urban areas often require detailed knowledge of the location and identity of commonly occurring
objects and spatial features. The ability to rapidly acquire and reason over urban scenes is critically important to such
tasks as mission and route planning, visibility prediction, communications simulation, target recognition, and inference
of higher-level form and function. Under DARPA's Urban Reasoning and Geospatial ExploitatioN Technology
(URGENT) Program, the BAE Systems team has developed a system that combines a suite of complementary feature
extraction and matching algorithms with higher-level inference and contextual reasoning to detect, segment, and classify
urban entities of interest in a fully automated fashion. Our system operates solely on colored 3D point clouds, and
considers object categories with a wide range of specificity (fire hydrants, windows, parking lots), scale (street lights,
roads, buildings, forests), and shape (compact shapes, extended regions, terrain). As no single method can recognize the
diverse set of categories under consideration, we have integrated multiple state-of-the-art technologies that couple
hierarchical associative reasoning with robust computer vision and machine learning techniques. Our solution leverages
contextual cues and evidence propagation from features to objects to scenes in order to exploit the combined descriptive
power of 3D shape, appearance, and learned inter-object spatial relationships. The result is a set of tools designed to
significantly enhance the productivity of analysts in exploiting emerging 3D data sources.
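As a rough illustration of the shape-based portion of such a pipeline (and only that; the contextual and associative reasoning stages are omitted), the sketch below computes covariance-based shape descriptors for synthetic colored point clusters and classifies them with a standard random forest. The category names and data are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shape_features(points, colors):
    """Eigenvalue-based shape descriptors plus mean color for one point cluster."""
    evals = np.sort(np.linalg.eigvalsh(np.cov(points.T)))[::-1]  # l1 >= l2 >= l3
    l1, l2, l3 = evals / (evals.sum() + 1e-9)
    linearity, planarity, sphericity = (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1
    height = points[:, 2].max() - points[:, 2].min()
    return np.hstack([[linearity, planarity, sphericity, height], colors.mean(axis=0)])

rng = np.random.default_rng(1)

def make_cluster(kind):
    # Toy geometry: "pole" is tall and thin, "flat" is an extended planar region.
    if kind == "pole":
        pts = rng.normal(scale=[0.1, 0.1, 2.0], size=(200, 3))
    else:
        pts = rng.normal(scale=[5.0, 5.0, 0.05], size=(200, 3))
    return pts, rng.uniform(size=(200, 3))

kinds = ["pole", "flat"] * 50
X = np.array([shape_features(*make_cluster(k)) for k in kinds])
y = np.array(kinds)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(shape_features(*make_cluster("pole")).reshape(1, -1)))
```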
Even though the definitions of the Joint Directors of Laboratories (JDL) "fusion levels" were established in 1987, published in 1991, and revised in 1999 and 2004, the meaning, effects, control, and optimization of interactions among the fusion levels have not yet been fully explored and understood. This is apparent from the abstract JDL definitions of "Levels 2/3 Fusion" - situation and threat assessment (SA/TA) - which involve deriving relations among entities: SA aggregates object states (i.e., classification and location), while TA uses SA products to estimate or predict the impact on situations of the actions and interactions taken by the participating entities. Given all the existing knowledge in the information fusion and human factors literature (both prior to and after the introduction of the "fusion levels" in 1987), open questions remain regarding the implementation of knowledge representation and reasoning methods under uncertainty to support SA/TA. Therefore, to promote the exchange of ideas and to illuminate the historical, current, and future issues associated with Levels 2/3 implementations, leading experts were invited to present their respective views on various facets of this complex problem. This paper is a retrospective, annotated view of the invited panel discussion organized by Ivan Kadar (first author), supported by John Salerno, intended both to provide a historical perspective on the evolution of the state-of-the-art (SOA) in higher-level "Levels 2/3" information fusion implementations by looking back over the past ten or more years (before JDL), and, based upon the lessons learned, to forecast where focus should be placed to further enhance and advance the SOA by addressing key issues and challenges. To convey the panel discussion to audiences not present at the panel, annotated position papers summarizing the panel presentations are included.
SeeCoast is a prototype US Coast Guard port and coastal area surveillance system that aims to reduce operator workload while maintaining domain awareness by shifting operators' focus from detecting events to analyzing and acting upon knowledge derived from automatically detected anomalous activities. The automated scene understanding capability provided by the baseline SeeCoast system (as currently installed at the Joint Harbor Operations Center at Hampton Roads, VA) results from the integration of several components. Machine vision technology processes the real-time video streams provided by USCG cameras to generate vessel track and classification (based on vessel length) information. A multi-INT fusion component generates a single, coherent track picture by combining the information available from the video processor with that from surface surveillance radars and AIS reports. Based on this track picture, SeeCoast analyzes vessel activity to detect user-defined unsafe, illegal, and threatening vessel activities using a rule-based pattern recognizer, and to detect anomalous vessel activities using automatically learned behavior normalcy models. Operators can optionally guide the learning system by providing examples and counter-examples of activities of interest, and can refine its performance by confirming alerts or flagging false alarms. The fused track picture also provides a basis for automated control and tasking of cameras to detect vessels in motion. Real-time visualization combining the products of all SeeCoast components into a common operating picture is provided by a thin web-based client.
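A minimal sketch of the learned-normalcy idea, not SeeCoast's actual models: fit a kernel density estimate to historical (speed, course) samples for one traffic lane and flag new track updates whose log-likelihood falls below a low quantile of the training scores. The lane statistics and threshold are illustrative.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(2)

# Illustrative historical behavior for one approach lane: ~10 kt inbound on ~045 deg.
history = np.column_stack([
    rng.normal(10.0, 1.5, size=5000),   # speed over ground (kt)
    rng.normal(45.0, 8.0, size=5000),   # course over ground (deg)
])

normalcy = KernelDensity(bandwidth=1.0).fit(history)
threshold = np.quantile(normalcy.score_samples(history), 0.01)  # 1% most unusual

def is_anomalous(speed_kt, course_deg):
    """Flag a track update whose likelihood under the normalcy model is very low."""
    return normalcy.score_samples([[speed_kt, course_deg]])[0] < threshold

print(is_anomalous(10.5, 47.0))   # typical inbound transit -> False
print(is_anomalous(32.0, 210.0))  # fast outbound run -> True
```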
SeeCoast extends the US Coast Guard Port Security and Monitoring system by adding capabilities to detect, classify, and
track vessels using electro-optic and infrared cameras, and uses learned normalcy models of vessel activities to
generate alert cues for the watch-standers when anomalous behaviors occur. SeeCoast fuses the video data with
radar detections and Automatic Identification System (AIS) transponder data to generate composite fused tracks
for vessels approaching the port, as well as for vessels already in the port. SeeCoast then applies rule-based and
learning-based pattern recognition algorithms to alert the watch-standers to unsafe, illegal, threatening, and other
anomalous vessel activities. The prototype SeeCoast system has been deployed to Coast Guard sites in Virginia. This
paper provides an overview of the system and outlines the lessons learned to date in applying data fusion and automated
pattern recognition technology to the port security domain.
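For illustration, a simplified sketch of the rule-based side of that pattern recognition (the learned-normalcy models are omitted): each rule inspects a fused track update and returns an alert string. The zone flag, speed limit, and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrackUpdate:
    vessel_id: str
    lat: float
    lon: float
    speed_kt: float
    ais_on: bool
    in_security_zone: bool   # assumed to be set upstream by a geofence check

# Hypothetical watch-stander rules over fused track updates.
def no_wake_rule(t):
    if t.in_security_zone and t.speed_kt > 5.0:
        return f"{t.vessel_id}: exceeds 5 kt no-wake limit inside security zone"

def dark_vessel_rule(t):
    if not t.ais_on and t.speed_kt > 1.0:
        return f"{t.vessel_id}: underway with AIS transponder silent"

RULES = [no_wake_rule, dark_vessel_rule]

def evaluate(track):
    """Run every rule against one track update and collect the alerts that fire."""
    return [alert for rule in RULES if (alert := rule(track))]

print(evaluate(TrackUpdate("V123", 36.95, -76.33, 9.0, ais_on=False, in_security_zone=True)))
```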