The success of motion imagery sensors over the past ten years has created high demand from operational forces and
emergency responders. The demand has been especially high for motion imagery sensors mounted on unmanned aerial
systems (UAS). Motion imagery sensing has led to new, more effective ways for our forces to fight and our emergency
responders to handle crises. This high demand for motion imagery has driven dramatic growth in bandwidth, data storage volumes, and the capability of analytic tools. The National Geospatial-Intelligence Agency (NGA) has unveiled a new vision to provide data and services that better meet its users' needs and enable NGA to focus its expertise on much tougher problems.
This paper examines the potential, benefits, and challenges of the confluence, integration, and operation of Geospatial Intelligence (GEOINT) capabilities, products, and techniques within the larger context of the Intelligence, Surveillance and Reconnaissance (ISR) arena, particularly with regard to persistent surveillance and Full Motion Video (FMV).
Activity-Based Intelligence (ABI) was defined by the Office of the Undersecretary of Defense for Intelligence as “a
discipline of intelligence where the analysis and subsequent collection is focused on activity and transactions associated
with an entity, population, or area of interest.” ABI is inherently multi-INT, and motion imagery is a rich data source for
ABI analysis. Motion imagery provides a unique temporal aspect which is critical for activity detection and
classification. Additionally, motion imagery tends to have high spatial oversampling, which is useful for detecting activities and patterns above the noise threshold.
A key to any robust automated surveillance system is continuous, wide field-of-view sensor coverage and high-accuracy target detection algorithms. Newer systems typically employ an array of multiple fixed cameras that provide individual data streams, each of which is managed by its own processor. This array can continuously capture the entire field of view, but collecting all the data and running the back-end detection algorithm consume additional power and increase the size, weight, and power (SWaP) of the package. This is often unacceptable, as many potential surveillance applications have
strict system SWaP requirements. This paper describes a wide field-of-view video system that employs multiple fixed
cameras and exhibits low SWaP without compromising the target detection rate. We cycle through the sensors, fetch a
fixed number of frames, and process them through a modified target detection algorithm. During this time, the other
sensors remain powered-down, which reduces the required hardware and power consumption of the system. We show
that the resulting gaps in coverage and irregular frame rate do not affect the detection accuracy of the underlying
algorithms. This reduces the power consumption of an N-camera system by up to approximately N-fold compared to baseline operation. This work was applied to Phase 2 of the DARPA Cognitive Technology Threat Warning System (CT2WS) program and was used during field testing.
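A minimal sketch of the sensor-cycling scheme described above, in Python. The camera interface (power_on, power_off, grab_frame), the FRAMES_PER_BURST value, and the detector are hypothetical stand-ins, not the CT2WS implementation:

```python
# Round-robin sensor cycling for a low-SWaP N-camera system (sketch).
FRAMES_PER_BURST = 8  # fixed number of frames fetched per sensor (assumed)

def cycle_sensors(cameras, detector, bursts=100):
    """Power one camera at a time, grab a short burst, run detection.

    All other cameras stay powered down, so average power approaches
    1/N of an always-on N-camera baseline.
    """
    detections = []
    for _ in range(bursts):
        for cam in cameras:
            cam.power_on()
            frames = [cam.grab_frame() for _ in range(FRAMES_PER_BURST)]
            cam.power_off()  # sensor idles while the others take their turn
            # The detector must tolerate the resulting coverage gaps and
            # irregular frame rate, as the paper reports it does.
            detections.extend(detector.detect(frames))
    return detections
```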
Transport of live video requires a robust backbone, as live video decoders are subject to dropouts and buffer starvation. A short-duration packet loss will often cause a decoder to go black for many seconds as it reacquires the stream and clock. IP networks, due to their connectionless approach and support for variable-length packets, inherently exhibit packet delivery variability. These characteristics typically include packet loss, packet delay variation, and packets being delivered out of order.
Deep Packet Recovery (DPR) techniques correct IP-network-induced errors and impairments. DPR can provide much broader and stronger protection than traditional Forward Error Correction (FEC) techniques, enabling transport of live video across severely impaired networks.
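Since the abstract does not specify DPR internals, the sketch below illustrates only the traditional per-block FEC baseline that DPR is compared against: a single XOR parity packet per block of equal-length packets, which lets a receiver rebuild any one lost packet. All names are illustrative:

```python
# Minimal XOR-parity FEC sketch (the traditional baseline, not DPR).

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(block):
    """XOR all packets in a block into one parity packet."""
    parity = bytes(len(block[0]))  # all-zero start
    for pkt in block:
        parity = xor_bytes(parity, pkt)
    return parity

def recover(block, parity):
    """Rebuild a block in which at most one packet arrived as None."""
    missing = [i for i, pkt in enumerate(block) if pkt is None]
    if len(missing) > 1:
        raise ValueError("XOR parity corrects only one loss per block")
    if missing:
        rebuilt = parity
        for pkt in block:
            if pkt is not None:
                rebuilt = xor_bytes(rebuilt, pkt)
        block[missing[0]] = rebuilt
    return block

# Example: sender computes parity; receiver loses packet 1 and rebuilds it.
pkts = [b"abcd", b"efgh", b"ijkl"]
par = make_parity(pkts)
assert recover([b"abcd", None, b"ijkl"], par) == pkts
```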
Digital modulation technologies, maximal-ratio combining diversity reception, and special-purpose modulation schemes, such as deep interleaving, provide a reliable and dynamic downlink solution. Multiple receive antennae are employed in the application, simultaneously combining the energy from all antenna inputs and blending individual carriers on a COFDM/DVB-T symbol basis to generate optimum signal strength.
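A minimal numpy sketch of the maximal-ratio combining principle (an illustration of the technique, not the product's implementation); channel gains and per-branch noise variances are assumed known, e.g., from pilot symbols:

```python
import numpy as np

# Maximal-ratio combining sketch: weight each receive branch by the
# conjugate of its channel gain scaled by the branch noise power, then
# sum; this maximizes the post-combining signal-to-noise ratio.

def mrc_combine(received, h, noise_var):
    """received: (branches, symbols) complex array, one row per antenna;
    h: (branches,) complex channel gains; noise_var: (branches,) floats."""
    weights = np.conj(h) / noise_var                # MRC branch weights
    combined = weights @ received                   # weighted sum over branches
    return combined / np.sum(np.abs(h) ** 2 / noise_var)  # unbiased estimate

# Example: two branches receiving the same symbol through different channels.
s = np.array([1 + 1j]) / np.sqrt(2)
h = np.array([0.9 * np.exp(1j * 0.3), 0.4 * np.exp(-1j * 1.1)])
r = h[:, None] * s + 0.01 * np.random.randn(2, 1)   # noisy branch samples
print(mrc_combine(r, h, noise_var=np.array([1e-4, 1e-4])))
```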
TCP/IP remote control and configuration management permit remote reconfiguration of the network through Web-based graphical user interfaces.
Airborne platforms are increasingly being used as vehicles to capture intelligence data for defense, state and civil
applications. The aerial vehicles are equipped with technology for both video and sensor data collection; the data is then
sent to a ground mission control center for further processing. When the airborne platform is outside the reach of direct data relay due to distance or environment, satellite communications are used for Beyond Line of Sight (BLoS) transmission.
It is a key requirement for the satellite link in ISR (Intelligence, Surveillance and Reconnaissance) operations to get as
much data and video as possible through the available bandwidth. The satellite link also needs to be available at all times
during operations to ensure mission-critical communications and not endanger ground operations. Only by using robust satellite technology can the demand for more data and the highest efficiency be satisfied while keeping OPEX under control.
This paper will highlight both the technical and practical challenges faced by operators in airborne ISR missions, moving from technical requirements to efficiency-driven solutions. It will also look at the final results achieved in the field when transmitting ISR data and video from the airborne platform over satellite in highly adaptive environments. The existing qualified and deployed BLoS airborne solution already achieves over 20 Mbps from the aircraft to the ground in active operations, but requirements and capabilities continue to increase as more comprehensive ISR data is transmitted.
With the pervasiveness of still and full-motion imagery in commercial and military applications, the need to ingest and
analyze these media has grown rapidly in recent years. Additionally, video hosting and live camera websites provide a
near real-time view of our changing world with unprecedented spatial coverage. To take advantage of these controlled
and crowd-sourced opportunities, sophisticated visual analytics (VA) tools are required to accurately and efficiently
convert raw imagery into usable information. Whether investing in VA products or evaluating algorithms for potential
development, it is important for stakeholders to understand the capabilities and limitations of visual analytics tools.
Visual analytics algorithms are being applied to problems related to Intelligence, Surveillance, and Reconnaissance
(ISR), facility security, and public safety monitoring, to name a few. The diversity of requirements means that a one-size-fits-all approach to performance assessment will not work. We present a process for evaluating the efficacy of
algorithms in real-world conditions, thereby allowing users and developers of video analytics software to understand
software capabilities and identify potential shortcomings. The results-based approach described in this paper uses an
analysis of end-user requirements and Concept of Operations (CONOPS) to define Measures of Effectiveness (MOEs),
test data requirements, and evaluation strategies. We define metrics that individually do not fully characterize a system,
but when used together, are a powerful way to reveal both strengths and weaknesses. We provide examples of data
products, such as heatmaps, performance maps, detection timelines, and rank-based probability-of-detection curves.
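As a concrete illustration of one metric from such a suite, the sketch below computes a rank-based probability-of-detection curve; the ID-based matching is a deliberate simplification of real detection-to-truth association, and all names are illustrative:

```python
import numpy as np

# Rank-based probability-of-detection curve (sketch). Detections are
# sorted by confidence; at each rank k we record the fraction of
# ground-truth objects covered by the top-k detections.

def rank_pd_curve(detections, truth_ids):
    """detections: list of (confidence, matched_truth_id_or_None);
    truth_ids: set of ground-truth object IDs.
    Returns Pd as a function of detection rank."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    found, pd = set(), []
    for conf, match in ranked:
        if match in truth_ids:
            found.add(match)
        pd.append(len(found) / len(truth_ids))
    return np.array(pd)

# Example: 5 truth objects, 6 detections (one false alarm, one duplicate).
dets = [(0.9, 1), (0.8, 2), (0.7, None), (0.6, 2), (0.5, 3), (0.4, 4)]
print(rank_pd_curve(dets, {1, 2, 3, 4, 5}))  # -> [0.2 0.4 0.4 0.4 0.6 0.8]
```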
Geolocation of objects or points of interest on the ground from airborne sensors is an enabler for many useful purposes. While many commercial handheld cameras today perform
rudimentary geo-tagging of images, few outside of commercial or military tactical airborne sensors
have implemented the methods necessary to produce full three-dimensional coordinates as well as
perform rigorous metric error propagation to estimate the uncertainties of those calculated
coordinates. The critical ingredients for this fully metric capability include careful characterization
of the sensor system, capturing and disseminating a complete metadata profile with the imagery, and
having a validated sensor model to support the necessary transformations between the image space
and the ground space. This paper describes important characteristics of metadata and the methods of geopositioning that can be applied, including their advantages and limitations. In addition, it presents the benefits of using active sensors and some recent efforts focusing on geopositioning from
full-motion video (FMV) sensors.
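The sketch below illustrates the simplest form of such geopositioning: intersecting a pixel ray with a flat ground plane, given a pinhole model and a sensor pose taken from metadata. Real FMV sensor models add lens distortion, geodetic datums, terrain elevation data, and rigorous error propagation; all names here are illustrative:

```python
import numpy as np

# Flat-terrain geopositioning sketch: cast a ray from the sensor through
# an image pixel and intersect it with a horizontal ground plane. The
# pinhole parameters and sensor pose would come from the image metadata.

def pixel_to_ground(pix, focal_px, principal_pt, R_cam_to_world,
                    sensor_pos, ground_z=0.0):
    """pix: (u, v) pixel; focal_px: focal length in pixels;
    R_cam_to_world: 3x3 rotation from metadata; sensor_pos: (x, y, z)."""
    sensor_pos = np.asarray(sensor_pos, dtype=float)
    u, v = pix
    cx, cy = principal_pt
    ray_cam = np.array([u - cx, v - cy, focal_px], dtype=float)  # camera-frame ray
    ray_w = R_cam_to_world @ ray_cam            # rotate ray into world frame
    t = (ground_z - sensor_pos[2]) / ray_w[2]   # scale factor to reach the plane
    return sensor_pos + t * ray_w               # 3-D ground coordinate
```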
Just as we face a global environment where the latest technology is most needed by Department of Defense (DoD) and
the Intelligence Community (IC), we find ourselves hindered by the lack of smart, useful data that our intelligent
systems and workforce can fully exploit. Consider the ISR Enterprise “system of systems.” Our inability to properly
populate metadata fields for making data discoverable and useful is as harmful to system performance as putting low-grade fuel in a race car. In order for our downstream ISR Enterprise systems (and analysts) to achieve their full
performance potential, we must take measures upstream to make the data stream smart. This paper will examine the
challenges and ongoing efforts that will benefit analysts at all echelons.
Using multi-frame change detection methods, we estimate which pixels include objects that are in motion relative to
the background. We utilize both a sequential statistical change detection method and a sparsity-based change detection
method. We perform foreground estimation in videos in which the background is static as well as in images in which
apparent background motion is induced by camera motion. We show the results of our techniques on the background
subtraction data set from the Statistical Visual Computing Lab at the University of California, San Diego (UCSD).
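A minimal sketch in the spirit of the sequential statistical approach, using a per-pixel running-Gaussian background model (an illustration, not necessarily the exact method used):

```python
import numpy as np

# Per-pixel running-Gaussian background model (sketch). A pixel is flagged
# as foreground when its intensity deviates from the model mean by more
# than k standard deviations; the model adapts only where the scene is
# judged static, so moving objects are not absorbed into the background.

def update_and_detect(frame, mean, var, alpha=0.02, k=2.5):
    """frame, mean, var: float arrays of identical shape (updated in place).
    Returns the boolean foreground mask."""
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    bg = ~foreground
    mean[bg] += alpha * (frame[bg] - mean[bg])
    var[bg] += alpha * ((frame[bg] - mean[bg]) ** 2 - var[bg])
    return foreground
```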
A frequently occurring interaction task in UAS video exploitation is the marking or selection of objects of interest in the
video. If an object of interest is visually detected by the image analyst, its selection/marking for further exploitation,
documentation and communication with the team is a necessary task. Today object selection is usually performed by
mouse interaction. Because all objects in the video move due to sensor motion, object selection can be rather challenging, especially when strong and fast ego-motions are present, e.g., with small airborne sensor platforms. In addition, objects of interest are sometimes visible too briefly to be selected by the analyst using mouse interaction. To address this
issue we propose an eye tracker as input device for object selection. As the eye tracker continuously provides the gaze
position of the analyst on the monitor, it is intuitive to use the gaze position for pointing at an object. The selection is
then actuated by pressing a button. We integrated this gaze-based “gaze + key press” object selection into Fraunhofer
IOSB's exploitation station ABUL using a Tobii X60 eye tracker and a standard keyboard for the button press.
Representing the object selections in a spatial relational database, ABUL enables the image analyst to efficiently query
the video data in a post processing step for selected objects of interest with respect to their geographical and other
properties. An experimental evaluation is presented, comparing gaze-based interaction with mouse interaction in the
context of object selection in UAS videos.
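The core of the gaze + key press idea can be sketched in a few lines; this is an illustration, not the ABUL implementation, and the object and eye-tracker interfaces are assumed:

```python
# Gaze + key press selection sketch. tracked_objects is assumed to carry
# screen-space bounding boxes (e.g., from the video tracker); gaze_xy is
# the latest gaze point reported by the eye-tracker API.

def select_at_gaze(gaze_xy, tracked_objects):
    """Return the object whose bounding box contains the gaze point,
    preferring the smallest box when several overlap, else None."""
    x, y = gaze_xy
    hits = [obj for obj in tracked_objects
            if obj.bbox[0] <= x <= obj.bbox[2]
            and obj.bbox[1] <= y <= obj.bbox[3]]
    if not hits:
        return None
    return min(hits, key=lambda o: (o.bbox[2] - o.bbox[0]) *
                                   (o.bbox[3] - o.bbox[1]))

# On each key press, the exploitation loop would call something like:
#   selected = select_at_gaze(eye_tracker.current_gaze(), tracker.objects())
# and write the selection into the spatial relational database.
```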
This paper describes the role that the National Geospatial-Intelligence Agency (NGA) has in motion imagery research and development (R&D). Motion imagery R&D is ubiquitous. Commercial technology is strongly leveraged by the Department of Defense (DoD), and each component in DoD has unique needs that it invests R&D dollars against. DoD Directive 5106.60 gives NGA full responsibility for geospatial intelligence (GEOINT), including a wide range of R&D functions. InnoVision, NGA's R&D component, has specific areas of focus for motion imagery R&D that are designed to complement and enhance service and industry efforts.
Airborne Wide Area Motion Imagery (WAMI) sensors provide the opportunity for continuous high-resolution
surveillance of geographic areas covering tens of square kilometers. This is both a blessing and a curse. Data volumes
from “gigapixel-class” WAMI sensors are orders of magnitude greater than for traditional “megapixel-class” video
sensors. The amount of data greatly exceeds the capacities of downlinks to ground stations, and even if this were not
true, the geographic coverage is too large for effective human monitoring. Although collected motion imagery is
recorded on the platform, typically only small “windows” of the full field of view are transmitted to the ground; the full
set of collected data can be retrieved from the recording device only after the mission has concluded. Thus, the WAMI
environment presents several difficulties: (1) data is too massive for downlink; (2) human operator selection and control
of the video windows may not be effective; (3) post-mission storage and dissemination may be limited by inefficient file
formats; and (4) unique system implementation characteristics may thwart exploitation by available analysis tools. To
address these issues, the National Geospatial-Intelligence Agency’s Motion Imagery Standards Board (MISB) is
developing relevant standard data exchange formats: (1) moving target indicator (MTI) and tracking metadata to support
tipping and cueing of WAMI windows using “watch boxes” and “trip wires”; (2) control channel commands for
positioning the windows within the full WAMI field of view; and (3) a full-field-of-view spatiotemporal tiled file format
for efficient storage, retrieval, and dissemination. The authors previously provided an overview of this suite of
standards. This paper describes the latest progress, with specific concentration on a detailed description of the
spatiotemporal tiled file format.
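To make the tiling idea concrete, the sketch below shows how a spatiotemporal tile scheme maps a spatial watch box over a time interval to a small set of tile reads; the addressing and tile size are illustrative, not the MISB specification:

```python
# Spatiotemporal tile addressing (sketch). Each full-field frame is cut
# into fixed-size tiles keyed by (frame time, tile row, tile column), so
# reading a watch box over a time interval touches only the tiles that
# the box overlaps rather than the whole gigapixel frame.

TILE = 512  # tile edge length in pixels (illustrative, not the MISB value)

def tiles_for_window(x0, y0, x1, y1, t0, t1):
    """Yield (t, row, col) keys covering pixel window [x0, x1) x [y0, y1)
    over frame times [t0, t1)."""
    for t in range(t0, t1):
        for row in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for col in range(x0 // TILE, (x1 - 1) // TILE + 1):
                yield (t, row, col)

# Example: a 1000 x 800 pixel watch box over 30 frames touches 2 x 2 tiles
# per frame -- 120 reads, independent of the full sensor's pixel count.
keys = list(tiles_for_window(4096, 2048, 5096, 2848, 0, 30))
assert len(keys) == 120
```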