Herein we describe theory and algorithms for detecting covariance structures in large, noisy data sets. Our work uses ideas from matrix completion and robust principal component analysis to detect the presence of low-rank covariance matrices, even when the data is noisy, distorted by large corruptions, and only partially observed. In fact, the ability to handle partial observations combined with ideas from randomized algorithms for matrix decomposition enables us to produce asymptotically fast algorithms. Herein we will provide numerical demonstrations of the methods and their convergence properties. While such methods have applicability to many problems, including mathematical finance, crime analysis, and other large-scale sensor fusion problems, our inspiration arises from applying these methods in the context of cyber network intrusion detection.
This paper presents an approach to attribute estimation incorporating data association ambiguity. In modern tracking
systems, time pressures often leave all but the most likely data association alternatives unexplored, possibly producing track
inaccuracies. Numerica's Bayesian Network Tracking Database, a key part of its Tracker Adjunct Processor, captures and
manages the data association ambiguity for further analysis and possible ambiguity reduction/resolution using subsequent data.
Attributes are non-kinematic discrete sample space sensor data. They may be as distinctive as aircraft ID, or as broad as
friend or foe. Attribute data may provide improvements to data association by a process known as Attribute Aided Tracking
(AAT). Indeed, certain uniquely identifying attributes (e.g. aircraft ID), when continually reported, can be used to define
data association (tracks are the collections of observations with the same ID). However, attribute data arriving infrequently,
combined with erroneous choices from ambiguous data associations, can produce incorrect attribute and kinematic state estimates.
Ambiguous data associations define the tracks that are entangled with each other. Attribute data observed on an entangled
track then modify the attribute estimates on all tracks entangled with it. For example, if a red track and a blue
track pass through a region of data association ambiguity, these tracks become entangled. Later red observations on one
entangled track make the other track more blue, and reduce the data association ambiguity. Methods for this analysis have
been derived and implemented for efficient forward filtering and forensic analysis.
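The red/blue example admits a small Bayesian sketch. The two-hypothesis entanglement model and the sensor confusion probability `p_correct` are illustrative assumptions, not the implemented forward-filtering method.

```python
# Two tracks A and B became entangled in a region of data association
# ambiguity: either (A=red, B=blue) or (A=blue, B=red).  A later "red"
# attribute report on track A is then evidence that B is blue.
P_H = {("red", "blue"): 0.5, ("blue", "red"): 0.5}  # (color of A, color of B)

def update(P_H, track, observed_color, p_correct=0.9):
    """Bayes update of the joint hypothesis distribution given an attribute
    report on one entangled track; the (hypothetical) sensor reports the
    true color with probability p_correct."""
    idx = 0 if track == "A" else 1
    post = {}
    for h, p in P_H.items():
        like = p_correct if h[idx] == observed_color else 1.0 - p_correct
        post[h] = p * like
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

P_H = update(P_H, "A", "red")
p_B_blue = sum(p for h, p in P_H.items() if h[1] == "blue")
# the red report on A made the other entangled track "more blue"
```

Note that a single report on one track shifts the posterior on both tracks at once, which is exactly the entanglement effect described above.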
Effective multi-sensor, multi-target, distributed composite tracking requires the management of limited network
bandwidth. In this paper we derive from first principles a value of information for measurements that
can be used to sort the measurements in order from most to least valuable. We show the information metric
must account for the models and filters used by the composite tracking system. We describe how this value
of information can be used to optimize bandwidth utilization and illustrate its effectiveness using simulations
that involve lossy and latent network models.
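One plausible instance of such a value of information (not necessarily the metric derived in the paper) scores a measurement by how much a Kalman update with it shrinks the log-volume of the track covariance; the models `P`, `H`, and `R` below are invented for illustration. Because the score depends on the filter's covariance and measurement models, it naturally "accounts for the models and filters" used by the system.

```python
import numpy as np

def info_value(P, H, R):
    """Value of a measurement: reduction in the log-volume of the track
    covariance (log det P - log det P_post) after a Kalman update with
    measurement model H and measurement noise covariance R."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    P_post = (np.eye(P.shape[0]) - K @ H) @ P
    return 0.5 * (np.linalg.slogdet(P)[1] - np.linalg.slogdet(P_post)[1])

P = np.diag([100.0, 4.0])          # track covariance: position uncertain
H = np.array([[1.0, 0.0]])         # position-only measurement model
cheap = info_value(P, H, np.array([[50.0]]))  # noisy sensor's measurement
good = info_value(P, H, np.array([[1.0]]))    # accurate sensor's measurement
# sort from most to least valuable before spending bandwidth
ordered = sorted([("cheap", cheap), ("good", good)], key=lambda t: -t[1])
```

Transmitting in this order lets a bandwidth-limited link carry the most informative measurements first.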
We present a theory and algorithm for detecting and classifying weak, distributed patterns in network data
that provide actionable information with quantifiable measures of uncertainty. Our work demonstrates the
effectiveness of space-time inference on graphs, robust matrix completion, and second order analysis for the
detection of distributed patterns that are not discernible at the level of individual nodes. Motivated by the
importance of the problem, we are specifically interested in detecting weak patterns in computer networks related
to Cyber Situational Awareness. Our focus is on scenarios where the nodes (terminals, routers, servers, etc.)
are sensors that provide measurements (of packet rates, user activity, central processing unit usage, etc.) that,
when viewed independently, cannot provide a definitive determination of the underlying pattern but, when fused
with data from across the network both spatially and temporally, allow the relevant patterns to emerge. The approach
is applicable to many types of sensor networks including computer networks, wireless networks, mobile sensor
networks, and social networks, as well as in contexts such as databases and disease outbreaks.
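A toy illustration of why fusion reveals what individual nodes cannot (this is not the paper's graph inference machinery; the node count, anomaly size, and thresholds are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, shift = 400, 0.5                  # a weak, network-wide anomaly
x = rng.normal(loc=shift, size=n_nodes)    # one noisy measurement per node

# Viewed independently, almost no node crosses a 3-sigma threshold ...
strong_alone = int(np.sum(np.abs(x) > 3.0))

# ... but fusing the same data across the whole network makes the
# pattern emerge: under "no pattern" this statistic is ~N(0, 1)
z_fused = x.sum() / np.sqrt(n_nodes)
```

The fused statistic sits many standard deviations above its null distribution even though nearly every individual node looks unremarkable.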
Chemical and biological monitoring systems are faced with the challenge of detecting weak signals from contam-
inants of interest while at the same time maintaining extremely low false alarm rates. We present methods to
control the number of false alarms while maintaining power to detect, and we evaluate these methods on a fixed
sensor grid. Contaminants are detected using signals produced from underlying sensor-specific detection algorithms.
By learning from past data, an adaptive background model is constructed and used with a multi-hypothesis
testing method to control the false alarm rate.
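One standard multi-hypothesis testing procedure for this kind of false alarm control is the Benjamini-Hochberg step-up rule; the paper's adaptive method may differ, and the p-values below (as if produced against a learned background model) are illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery
    rate at level q over m simultaneous tests.  Returns a boolean mask of
    rejected (i.e., alarmed) hypotheses."""
    p = np.asarray(p_values)
    order = np.argsort(p)
    m = len(p)
    thresh = q * np.arange(1, m + 1) / m        # sorted-rank thresholds
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                    # reject the k smallest p-values
    return reject

# p-values against the background model: mostly background (uniform-ish),
# a few genuine contaminant signals with tiny p-values
p = [0.001, 0.0005, 0.40, 0.72, 0.91, 0.03, 0.55, 0.0002]
alarms = benjamini_hochberg(p, q=0.05)
```

Only the three strongest signals alarm; the marginal 0.03 p-value is suppressed, which is how the procedure keeps the false alarm rate low without discarding clear detections.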
Detection methods for chemical/biological releases often depend on specific models for release types and
missed detection rates at the sensors. This can be problematic in field situations where environment specific
effects can alter both a sensor's false alarm and missed detection characteristics. Using field data, the false
alarm statistics of a given sensor can be learned and used for inference; however the missed detection statistics
for a sensor are not observable while in the field. As a result, we pursue methods that do not rely on accurate
estimates of a sensor's missed detection rate. This leads to the development of the Adaptive Regions Method
that under certain assumptions is designed to conservatively control the expected rate of false alarms produced
by a fusion system over time, while maintaining power to detect.
In this paper we describe an approach for the detection and classification of weak, distributed patterns in sensor
networks. Of course, before one can begin development of a pattern detection algorithm, one must first define the
term "pattern", which by nature is a broad and inclusive term. One of the key aspects of our work is a definition
of pattern that has already proven effective in detecting anomalies in real world data. While designing detection
algorithms for all classes of patterns in all types of networks sounds appealing, this approach would almost
certainly require heuristic methods and only cursory statements of performance. Rather, we have specifically
studied the problem of intrusion detection in computer networks in which a pattern is an abnormal or unexpected
spatio-temporal dependence in the data collected across the nodes. We do not attempt to match an a priori
template, but instead have developed algorithms that allow the pattern to reveal itself in the data by way of
dependence or independence of observed time series. Although the problem is complex and challenging, recent
advances in ℓ₁ techniques for robust matrix completion, compressed sensing, and correlation detection provide
promising opportunities for progress. Our key contribution to this body of work is the development of methods
that make an accounting of uncertainty in the measurements on which the inferences are based. The performance
of our methods will be demonstrated on real world data, including measured data from the Abilene Internet2 network.
False alarms generated by sensors pose a substantial problem to a variety of fusion applications. We focus
on situations where the frequency of a genuine alarm is "rare" but the false alarm rate is high. The goal is
to mitigate the false alarms while retaining power to detect true events. We propose to utilize data streams
contaminated by false alarms (generated in the field) to compute statistics on a single sensor's misclassification
rate. The nominal misclassification rate of a deployed sensor is often suspect because it is unlikely that these
rates were tuned to the specific environmental conditions in which the sensor was deployed. Recent categorical
measurement error methods will be applied to the collection of data streams to "train" the sensors and provide
point estimates along with confidence intervals for the parameters characterizing sensor performance. By pooling
a relatively small collection of random variables arising from a single sensor and using data-driven misclassification
rate estimates along with estimated confidence bands, we show how one can transform the stream of
categorical random variables into a test statistic with a limiting standard normal distribution. The procedure
shows promise for normalizing sequences of misclassified random variables coming from different sensors (with
a priori unknown population parameters) to comparable test statistics; this facilitates fusion through various
downstream processing mechanisms. We have explored some possible downstream processing mechanisms that
rely on false discovery rate (FDR) methods. The FDR methods exploit the test statistics we have computed in a
chemical sensor fusion context where reducing false alarms and maintaining substantial power is important. FDR
methods also provide a framework to fuse signals coming from non-chem/bio sensors in order to improve performance.
Simulation results illustrating these ideas are presented. Extensions, future work and open problems are
also briefly discussed.
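The normalization step described above can be sketched as a central-limit-theorem transform built on a data-driven false-alarm-rate estimate; the sensor counts and rate below are hypothetical, and the propagation of confidence-band uncertainty in the estimate is omitted.

```python
import math

def alarm_z_statistic(alarm_count, n, p_hat):
    """CLT normalization: n pooled binary alarm indicators with estimated
    false-alarm rate p_hat yield a statistic that is approximately standard
    normal when only false alarms occur, so streams from sensors with
    different (unknown) rates become comparable."""
    mean = n * p_hat
    sd = math.sqrt(n * p_hat * (1.0 - p_hat))
    return (alarm_count - mean) / sd

# Hypothetical sensor with a learned 10% false-alarm rate that raised
# 130 alarms over 1000 observation windows
z = alarm_z_statistic(130, 1000, 0.10)
```

Because every sensor's stream maps to the same null distribution, downstream FDR-based fusion can treat the resulting z-statistics uniformly.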
Most modern maximum likelihood multiple target tracking systems (e.g., Multiple Hypothesis Tracking (MHT) and Numerica's
Multiple Frame Assignment (MFA)) need to determine how to separate their input measurements into subsets
corresponding to the observations of individual targets. These observation sets form the tracks of the system, and the
process of determining these sets is known as data association. Real-time constraints frequently force the use of only the
maximum likelihood choice for data association (over some time window), although alternative data association choices
may have been considered in the process of choosing the most likely.
This paper presents a Tracker Adjunct Processing (TAP) system that captures and manages the uncertainty encountered
in making data association decisions. The TAP combines input observation data and the data association alternatives
considered by the tracker into a dynamic Bayesian network (DBN). The network efficiently represents the combined
alternative tracking hypotheses. Bayesian network evidence propagation methods are used to update the network in light of
new evidence, which may consist of new observations, new alternative data associations, newly received late observations,
hypothetical connections, or other flexible queries. The maximum likelihood tracking hypothesis can then be redetermined,
which may result in changes to the best tracking hypothesis. The recommended changes can then be communicated back
to the associated tracking system, which can then update its tracks. In this manner, the TAP's interpretation makes the firm,
fixed (formerly maximum likelihood) decisions of the tracker "softer," i.e., less absolute. The TAP can also assess (and
reassess) track purity regions by ambiguity level.
We illustrate the working of the TAP with several examples, one in particular showing the incorporation of critical, late
or infrequent data. These data are critical in the sense that they are very valuable in resolving ambiguities in tracking and
combat identification; thus, the motivation to use these data is high even though there are complexities in applying them. Some
data may be late because of significant network delays, while other data may be infrequently reported because they come
from "specialized" sensors that provide updates only every once in a while.
The fusion of Chemical, Biological, Radiological, and Nuclear (CBRN) sensor readings from both point and
stand-off sensors requires a common space in which to perform estimation. In this paper we suggest a common
representational space that allows us to properly assimilate measurements from a variety of different sources
while still maintaining the ability to correctly model the structure of CBRN clouds. We design this space with
sparse measurement data in mind in such a way that we can estimate not only the location of the cloud but also
our uncertainty in that estimate. We contend that a treatment of the uncertainty of an estimate is essential in
order to derive actionable information from any sensor system; especially for systems designed to operate with
minimal sensor data. A companion paper1 further extends and evaluates the uncertainty management introduced
here for assimilating sensor measurements into a common representational space.
Proc. SPIE. 7698, Signal and Data Processing of Small Targets 2010
Reliable detection of hazardous materials is a fundamental requirement of any national security program. Such
materials can take a wide range of forms including metals, radioisotopes, volatile organic compounds, and
biological contaminants. In particular, detection of hazardous materials in highly challenging conditions - such
as in cluttered ambient environments, where complex collections of analytes are present, and with sensors lacking
specificity for the analytes of interest - is an important part of a robust security infrastructure. Sophisticated
single sensor systems provide good specificity for a limited set of analytes but often have cumbersome hardware
and environmental requirements. On the other hand, simple, broadly responsive sensors are easily fabricated
and efficiently deployed, but such sensors individually have neither the specificity nor the selectivity to address
analyte differentiation in challenging environments. However, arrays of broadly responsive sensors can provide
much of the sensitivity and selectivity of sophisticated sensors but without the substantial hardware overhead.
Unfortunately, arrays of simple sensors are not without their challenges - the selectivity of such arrays can only
be realized if the data is first distilled using highly advanced signal processing algorithms. In this paper we will
demonstrate how the use of powerful estimation algorithms, based on those commonly used within the target
tracking community, can be extended to the chemical detection arena. Herein our focus is on algorithms that
not only provide accurate estimates of the mixture of analytes in a sample, but also provide robust measures of
ambiguity, such as covariances.
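As a sketch of mixture estimation with an ambiguity covariance, consider generic weighted least squares (not the tracker-derived algorithms the paper extends): each column of a known matrix `H` is assumed to be an analyte signature, and `y` a measured response; all numbers below are invented.

```python
import numpy as np

H = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.5, 0.5]])            # 3 sensor channels x 2 analytes
R = np.diag([0.01, 0.01, 0.04])       # per-channel noise covariance
x_true = np.array([2.0, 1.0])         # true concentrations (unknown in practice)
rng = np.random.default_rng(3)
y = H @ x_true + rng.multivariate_normal(np.zeros(3), R)

W = np.linalg.inv(R)                  # weight channels by inverse noise
cov = np.linalg.inv(H.T @ W @ H)      # ambiguity (covariance) of the estimate
x_hat = cov @ (H.T @ W @ y)           # estimated mixture concentrations
```

The point is that `cov` comes out of the estimator for free: the same machinery that produces the concentration estimate quantifies how ambiguous it is.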
The coordinated use of multiple distributed sensors by network communication has the potential to substantially
improve track state estimates even in the presence of enemy countermeasures. In the modern electronic warfare
environment, a network-centric tracking system must function in a variety of jamming scenarios. In some
scenarios hostile electronic countermeasures (ECM) will endeavor to deny range and range rate information,
leaving friendly sensors to depend on passive angle information for tracking. In these cases the detrimental
effects of ECM can be at least partially ameliorated through the use of multiple networked sensors, due to the
inability of the ECM to deny angle measurements and the geometric diversity provided by having sensors in
distributed locations. Herein we demonstrate algorithms for initiating and maintaining tracks in such hostile
operating environments with a focus on maximum likelihood estimators and provide Cramer-Rao bounds on
the performance one can expect to achieve.
In multi-sensor tracking systems, observations are often exchanged over a network for processing. Network delays
create situations in which measurements arrive out-of-sequence. The out-of-sequence measurement (OOSM)
update problem is of particular significance in networked multiple hypothesis tracking (MHT) algorithms. The
advantage of MHT is the ability to revoke past measurement assignment decisions as future information becomes
available. Accordingly, we not only have to deal with network delays for initial assignment, but must also address
delayed assignment revocations. We study the performance of extant algorithms and two algorithm modifications
for the purpose of OOSM filtering in MHT architectures.
In distributed tracking systems, multiple non-collocated trackers cooperate to fuse local sensor data into a
global track picture. Generating this global track picture at a central location is fairly straightforward, but the
single point of failure and excessive bandwidth requirements introduced by centralized processing motivate the
development of decentralized methods. In many decentralized tracking systems, trackers communicate with their
peers via a lossy, bandwidth-limited network in which dropped, delayed, and out of order packets are typical.
Oftentimes the decentralized tracking problem is viewed as a local tracking problem with a networking twist;
we believe this view can underestimate the network complexities to be overcome. Indeed, a subsequent 'oversight'
layer is often introduced to detect and handle track inconsistencies arising from a lack of robustness to network conditions.
We instead pose the decentralized tracking problem as a distributed database problem, enabling us to draw
inspiration from the vast extant literature on distributed databases. Using the two-phase commit algorithm, a
well known technique for resolving transactions across a lossy network, we describe several ways in which one
may build a distributed multiple hypothesis tracking system from the ground up to be robust to typical network
intricacies. We pay particular attention to the dissimilar challenges presented by network track initiation vs.
maintenance and suggest a hybrid system that balances speed and robustness by utilizing two-phase commit for
only track initiation transactions. Finally, we present simulation results contrasting the performance of such a
system with that of more traditional decentralized tracking implementations.
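The role of two-phase commit in network track initiation can be sketched as follows; the `Peer` class, the vote criterion, and the track IDs are hypothetical simplifications, with message passing, timeouts, and the trackers themselves elided.

```python
# Minimal two-phase-commit sketch: the initiating tracker (coordinator)
# promotes a tentative track to a confirmed network track only if every
# peer votes to commit, so all trackers agree on the new track or none do.

class Peer:
    def __init__(self, name, will_commit):
        self.name = name
        self.will_commit = will_commit  # e.g., no conflicting local track
        self.committed = False

    def prepare(self, track_id):        # phase 1: vote
        return self.will_commit

    def commit(self, track_id):         # phase 2: make the track durable
        self.committed = True

    def abort(self, track_id):
        self.committed = False

def initiate_track(track_id, peers):
    """Phase 1: collect votes from all peers.  Phase 2: commit only on a
    unanimous yes; otherwise abort everywhere."""
    if all(p.prepare(track_id) for p in peers):
        for p in peers:
            p.commit(track_id)
        return True
    for p in peers:
        p.abort(track_id)
    return False

peers = [Peer("tracker-A", True), Peer("tracker-B", True)]
ok = initiate_track("T-42", peers)       # unanimous vote: track confirmed
peers.append(Peer("tracker-C", False))   # a dissenting peer ...
ok2 = initiate_track("T-43", peers)      # ... aborts the transaction
```

Reserving this heavier protocol for initiation only, as the hybrid system suggests, keeps routine track maintenance fast while preventing inconsistent track births.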
Fusion of data from multiple sensors can be hindered by systematic bias errors. This may lead to severe degradation
in data association and track quality and may result in a large growth of redundant and spurious tracks.
Multi-sensor networks will generally attempt to estimate the relevant bias values (usually, during sensor registration),
and use the estimates to debias the sensor measurements and correct the reference frame transformations.
Unfortunately, the biases and navigation errors are stochastic, and the estimates of the means account only
for the "deterministic" part of the biases. The remaining stochastic errors are termed "residual" biases and
are typically modeled as a zero-mean random vector. Residual biases may cause inconsistent covariance estimates,
misassociation, multiple track swaps, and redundant/spurious track generation; we therefore require
some efficient mechanism for mitigating the effects of residual biases. We present here results based on the
Schmidt-Kalman filter for mitigating the effects of residual biases. A key advantage of this approach is that it
maintains the cross-correlation between the state and the bias errors, leading to a realistic covariance estimate.
The current work expands on the work previously performed by Numerica through an increase in the number
of bias terms used in a high fidelity simulator for air defense. The new biases considered revolve around the
transformation from the global earth-centered-earth-fixed (ECEF) coordinate frame to the local east-north-up
(ENU) coordinate frame. We examine not only the effect of bias mitigation for the full set of biases, but also
analyze the interplay between the various bias components.
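The Schmidt-Kalman ("consider") measurement update referred to above has a standard textbook form, sketched here for a single measurement; the dimensions and the scalar example are illustrative.

```python
import numpy as np

def schmidt_kalman_update(x, Pxx, Pxb, Pbb, z, Hx, Hb, R):
    """One Schmidt-Kalman measurement update: the zero-mean residual bias b
    is never estimated, but its covariance Pbb and the state-bias cross
    term Pxb are carried so the state covariance Pxx stays realistic."""
    S = Hx @ Pxx @ Hx.T + Hx @ Pxb @ Hb.T + Hb @ Pxb.T @ Hx.T + Hb @ Pbb @ Hb.T + R
    K = (Pxx @ Hx.T + Pxb @ Hb.T) @ np.linalg.inv(S)
    x_new = x + K @ (z - Hx @ x)               # bias mean is zero
    I = np.eye(len(x))
    Pxx_new = (I - K @ Hx) @ Pxx - K @ Hb @ Pxb.T
    Pxb_new = (I - K @ Hx) @ Pxb - K @ Hb @ Pbb
    return x_new, Pxx_new, Pxb_new

# Scalar example: a position state x with a zero-mean residual range bias b
x, Pxx, Pxb, Pbb = np.array([0.0]), np.array([[10.0]]), np.array([[0.0]]), np.array([[4.0]])
Hx = Hb = np.array([[1.0]])
x1, Pxx1, Pxb1 = schmidt_kalman_update(x, Pxx, Pxb, Pbb, np.array([2.0]), Hx, Hb, np.array([[1.0]]))
# Pxx1 = 10/3, versus 10/11 for a filter that ignores the bias: the
# Schmidt-Kalman covariance stays realistic instead of overconfident
```

The key design point is visible in `Pxb_new`: the cross-correlation between state and bias errors is updated rather than discarded, which is what keeps the covariance estimate consistent.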