Recent flight tests of the Airborne Reconfigurable Imaging System (ARIS) developed by BAE Systems have
demonstrated several key capabilities for day-night multiple target detection and tracking of maritime surface vessels.
The flight test system includes the integration of BAE Systems-developed real-time image processing algorithms with
multispectral band electro-optical (EO) (RGB)/mid-wave infrared (MWIR) sensors and a COTS turreted system.
Presented here are significant flight test results that demonstrate real-time multispectral sensor image co-registration,
geo-registration, fusion, target detection and tracking. High performance geo-pointing and line-of-sight stabilization
capabilities enable the airborne system to meet maritime domain awareness objectives for search and rescue,
persistent surveillance of moving and stationary targets, contraband traffic control through detection and tracking of
concealed vessels, and autonomous tracking of both moving and stationary vessels. Solar glint and cloud coverage false
alarms are minimized while multiple target detections and tracks are maintained. The fundamental real-time processing
methodologies used for ARIS are applied to a COTS multiple field of view system with high-resolution RGB and mid-wave
infrared (MWIR) video rate imaging. The techniques discussed for this four spectral band system can be applied to
both extended multispectral systems (greater than four spectral bands) and hyperspectral systems to further enhance
system capabilities for analogous terrestrial applications.
SeeCoast is a prototype US Coast Guard port and coastal area surveillance system that aims to reduce operator workload while maintaining optimal domain awareness by shifting their focus from having to detect events to being able to analyze and act upon the knowledge derived from automatically detected anomalous activities. The automated scene understanding capability provided by the baseline SeeCoast system (as currently installed at the Joint Harbor Operations Center at Hampton Roads, VA) results from the integration of several components. Machine vision technology processes the real-time video streams provided by USCG cameras to generate vessel track and classification (based on vessel length) information. A multi-INT fusion component generates a single, coherent track picture by combining information available from the video processor with that from surface surveillance radars and AIS reports. Based on this track picture, vessel activity is analyzed by SeeCoast to detect user-defined unsafe, illegal, and threatening vessel activities using a rule-based pattern recognizer and to detect anomalous vessel activities on the basis of automatically learned behavior normalcy models. Operators can optionally guide the learning system in the form of examples and counter-examples of activities of interest, and refine the performance of the learning system by confirming alerts or indicating examples of false alarms. The fused track picture also provides a basis for automated control and tasking of cameras to detect vessels in motion. Real-time visualization combining the products of all SeeCoast components in a common operating picture is provided by a thin web-based client.
The theory of opponent-sensor image fusion is based on neural circuit models of adaptive contrast enhancement and opponent-color interaction, as developed and previously presented by Waxman, Fay et al. This approach can directly fuse 2, 3, 4, and 5 imaging sensors, e.g., VNIR, SWIR, MWIR, and LWIR for fused night vision. The opponent-sensor images also provide input to a point-and-click fast learning approach for target fingerprinting (pattern learning and salient feature discovery) and subsequent target search. We have recently developed a real-time implementation of multi-sensor image fusion and target learning & search on a single board attached processor for a laptop computer. In this paper we will review our approach to image fusion and target learning, and demonstrate fusion and target detection using an array of VNIR, SWIR and LWIR imagers. We will also show results from night data collections in the field. This opens the way to digital fused night vision goggles, weapon sights and turrets that fuse multiple sensors and learn to find targets designated by the operator.
We have continued development of a system for multisensor image fusion and interactive mining based on neural models of color vision processing, learning and pattern recognition. We pioneered this work while at MIT Lincoln Laboratory, initially for color fused night vision (low-light visible and uncooled thermal imagery) and later extended it to multispectral IR and 3D ladar. We also developed a proof-of-concept system for EO, IR, SAR fusion and mining. Over the last year we have generalized this approach and developed a user-friendly system integrated into a COTS exploitation environment known as ERDAS Imagine. In this paper, we will summarize the approach and the neural networks used, and demonstrate fusion and interactive mining (i.e., target learning and search) of low-light Visible/SWIR/MWIR/LWIR night imagery, and IKONOS multispectral and high-resolution panchromatic imagery. In addition, we will demonstrate how target learning and search can be enabled over extended operating conditions by allowing training over multiple scenes. This will be illustrated for the detection of small boats in coastal waters using fused Visible/MWIR/LWIR imagery.
This paper presents a novel approach to higher-level (2+) information fusion and knowledge representation using
semantic networks composed of coupled spiking neuron nodes. Networks of spiking neurons have been shown to
exhibit synchronization, in which sub-assemblies of nodes become phase locked to one another. This phase locking
reflects the tendency of biological neural systems to produce synchronized neural assemblies, which have been
hypothesized to be involved in feature binding. The approach in this paper embeds spiking neurons in a semantic
network, in which a synchronized sub-assembly of nodes represents a hypothesis about a situation. Likewise, multiple
synchronized assemblies that are out-of-phase with one another represent multiple hypotheses. The initial network is
hand-coded, but additional semantic relationships can be established by associative learning mechanisms. This
approach is demonstrated with a simulated scenario involving the tracking of suspected criminal vehicles between
meeting places in an urban environment.
We have extended our previous capabilities for fusion of multiple passive imaging sensors to now include 3D imagery obtained from a prototype flash ladar. Real-time fusion of low-light visible + uncooled LWIR + 3D LADAR, and SWIR + LWIR + 3D LADAR is demonstrated. Fused visualization is achieved by opponent-color neural networks for passive image fusion, which is then textured upon segmented object surfaces derived from the 3D data. An interactive viewer, coded in Java3D, is used to examine the 3D fused scene in stereo. Interactive designation, learning, recognition and search for targets, based on fused passive + 3D signatures, is achieved using Fuzzy ARTMAP neural networks with a Java-coded GUI. A client-server web-based architecture enables remote users to interact with fused 3D imagery via a wireless palmtop computer.
We present recent work on methods for fusion of imagery from multiple sensors for night vision capability. The fusion system architectures are based on biological models of the spatial and opponent-color processes in the human retina and visual cortex. The real-time implementation of the dual-sensor fusion system combines imagery from either a low-light CCD camera (developed at MIT Lincoln Laboratory) or a short-wave infrared camera (from Sensors Unlimited, Inc.) with thermal long-wave infrared imagery (from a Lockheed Martin microbolometer camera). Example results are shown for an extension of the fusion architecture to include imagery from all three of these sensors as well as imagery from a mid-wave infrared imager (from Raytheon Amber Corp.). We also demonstrate how the results from these multi-sensor fusion systems can be used as inputs to an interactive tool for target designation, learning, and search based on a Fuzzy ARTMAP neural network.
As part of an advanced night vision program sponsored by DARPA, a method for real-time color night vision based on the fusion of visible and infrared sensors has been developed and demonstrated. The work, based on principles of color vision in humans and primates, achieves an effective strategy for combining the complementary information present in the two sensors. Our sensor platform consists of a 640 × 480 low-light CCD camera developed at MIT Lincoln Laboratory and a 320 × 240 uncooled microbolometer thermal infrared camera from Lockheed Martin Infrared. Image capture, data processing, and display are implemented in real time (30 fps) on commercial hardware. Recent results from field tests at Lincoln Laboratory and in collaboration with U.S. Army Special Forces at Fort Campbell will be presented. During the tests, we evaluated the performance of the system for ground surveillance and as a driving aid. Here, we report on the results using both wide field-of-view (42 deg.) and narrow field-of-view (7 deg.) platforms.
We present an approach to color night vision through fusion of information derived from visible and thermal infrared sensors. Building on the work reported at SPIE in 1996 and 1997, we show how opponent-color processing and center-surround shunting neural networks can achieve informative multi-band image fusion. In particular, by emulating spatial and color processing in the retina, we demonstrate an effective strategy for multi-sensor color night vision. We have developed a real-time visible/IR fusion processor from multiple C80 DSP chips using commercially available Matrox Genesis boards, which we use in conjunction with the Lincoln Lab low-light CCD and a Raytheon TI Systems uncooled IR camera. Limited human factors testing of visible/IR fusion is presented showing improvements in human performance using our color fused imagery relative to alternative fusion strategies or either single image modality alone. We conclude that fusion architectures that match opponent-sensor contrast to human opponent-color processing will yield fused image products of high image quality and utility.
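The center-surround shunting idea mentioned above can be illustrated with a minimal sketch. It assumes the common shunting form dx/dt = -A*x + (B - x)*c - (D + x)*s, solved at steady state, with the center signal c taken from one sensor and the surround signal s taken as a local mean of the other sensor. The constants A, B, D, the box-shaped surround, and the names `box_mean` and `shunt` are illustrative assumptions, not the parameters or code of the published system.

```python
def box_mean(img, radius):
    """Local mean over a (2*radius+1)^2 window, clipped at the image borders.
    A box window is an assumption made for brevity; a Gaussian is more usual."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def shunt(center_img, surround_img, A=1.0, B=1.0, D=1.0, radius=1):
    """Steady state of dx/dt = -A*x + (B - x)*c - (D + x)*s, i.e.
    x = (B*c - D*s) / (A + c + s), evaluated per pixel.
    For non-negative inputs the output stays between -D and B."""
    surround = box_mean(surround_img, radius)
    return [
        [(B * c - D * s) / (A + c + s) for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(center_img, surround)
    ]


# Hypothetical 3x3 inputs: a bright visible pixel on a thermally dark scene.
visible = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
thermal = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# "+visible / -thermal" opponent channel; the center pixel is 4 / (1 + 4) = 0.8.
vis_minus_ir = shunt(visible, thermal)
```

The division by (A + c + s) is what gives the adaptive normalization: the response depends on the ratio of center to surround activity rather than on absolute intensity, which is one reason this form suits imagery with wide dynamic range.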
Two recently developed color image fusion techniques, the TNO fusion scheme and the MIT fusion scheme, are applied to visible and thermal images of militarily relevant scenarios. An observer experiment is performed to test if the increased amount of detail in the fused images can yield an improved observer performance in a task that requires situational awareness. The task that is devised involves the detection and localization of a person in the displayed scene relative to some characteristic details that provide the spatial context. Two important results are presented. First, it is shown that color fused imagery leads to improved target detection over all other modalities. Second, results show that observers can indeed determine the relative location of a person in a scene with a significantly higher accuracy when they perform with fused images, compared to the original image modalities. The MIT color fusion scheme yields the best overall performance. Even the simplest fusion scheme yields an observer performance that is better than that obtained for the individual images.
MIT Lincoln Laboratory is developing new electronic night vision technologies for defense applications which can be adapted for civilian applications such as night driving aids. These technologies include (1) low-light CCD imagers capable of operating under starlight illumination conditions at video rates, (2) realtime processing of wide dynamic range imagery (visible and IR) to enhance contrast and adaptively compress dynamic range, and (3) realtime fusion of low-light visible and thermal IR imagery to provide color display of the night scene to the operator in order to enhance situational awareness. This paper compares imagery collected during night driving including: low-light CCD visible imagery, intensified-CCD visible imagery, uncooled long-wave IR imagery, cryogenically cooled mid-wave IR imagery, and visible/IR dual-band imagery fused for gray and color display.
We report progress on our development of a color night vision capability, using biological models of opponent-color processing to fuse low-light visible and thermal IR imagery, and render it in realtime in natural colors. Preliminary results of human perceptual testing are described for a visual search task, the detection of embedded small low-contrast targets in natural night scenes. The advantages of color fusion over two alternative grayscale fusion products are demonstrated in the form of consistent, rapid detection across a variety of low-contrast (+/- 15% or less) visible and IR conditions. We also describe advances in our development of a low-light CCD camera, capable of imaging in the visible through near-infrared in starlight at 30 frames/sec with wide intrascene dynamic range, and the locally adaptive dynamic range compression of this imagery. Example CCD imagery is shown under controlled illumination conditions, from full moon down to overcast starlight. By combining the low-light CCD visible imager with a microbolometer array LWIR imager, a portable image processor, and a color LCD on a chip, we can realize a compact design for a color fusion night vision scope.
We introduce an apparatus and methodology to support realtime color imaging for night operations. Registered imagery obtained in the visible through near IR band is combined with thermal IR imagery using principles of biological color vision. The visible imagery is obtained using a Gen III image intensifier tube optically coupled to a conventional CCD, while the thermal IR imagery is obtained using an uncooled thermal imaging array, the two fields of view being matched and imaged through a dichroic beam splitter. Remarkably realistic color renderings of night scenes are obtained, and examples are given in the paper. We also describe a compact integrated version of our system in the form of a color night vision device, in which the intensifier tube is replaced by a high resolution low-light sensitive CCD. Example CCD imagery obtained under starlight conditions is also shown. The system described here has the potential to support safe and efficient night flight, ground, sea and search & rescue operations, as well as night surveillance.
Neural network models of early visual computation have been adapted for processing single polarization (VV channel) SAR imagery, in order to assess their potential for enhanced target detection. In particular, nonlinear center-surround shunting networks and multi-resolution boundary contour/feature contour system processing have been applied to a spotlight sequence of tactical targets imaged by the Lincoln ADT sensor at 1 ft resolution. We show how neural processing can modify the target and clutter statistics, thereby separating the populations more effectively. ROC performance curves indicating detection versus false alarm rate are presented, clearly showing the potential benefits of neural pre-processing of SAR imagery.
This paper describes how the similarities and differences among similar objects can be discovered during learning to facilitate recognition. The application domain is single views of flying model aircraft captured in silhouette by a CCD camera. The approach was motivated by human psychovisual and monkey neurophysiological data. The implementation uses neural net processing mechanisms to build a hierarchy that relates similar objects to superordinate classes, while simultaneously discovering the salient differences between objects within a class. Learning and recognition experiments both with and without the class similarity and difference learning show the effectiveness of the approach on this visual data. To test the approach, the hierarchical approach was compared to a non-hierarchical approach, and was found to improve the average percentage of correctly classified views from 77% to 84%.
This work describes the implementation of some of the neural systems that will enable a mobile robot to actively explore and learn its environment visually. These systems perform the real-time extraction of robust visual features, the segmentation of landmarks from the background and from each other using binocular attentional mechanisms, the predictive binocular tracking of landmarks, and the learning and recognition of landmarks from their features. Also described are preliminary results of incorporating most of these systems into a mobile robot called MAVIN, which can demonstrate the visual exploration of simplified landmarks. Finally, we discuss plans for using similar neural strategies to extend MAVIN's capabilities by implementing a biologically plausible system for navigating through an environment that has been learned by exploration. This explorational learning consists of quantizing the environment into orientation-specific place fields generated by the view-based spatial distribution of landmarks, and associating these place fields in order to form qualitative, behavioral, spatial maps.
We summarize a recently developed modular neural system which exploits sequences of 2D views for learning and recognizing 3D objects. An aspect network is an unsupervised module of our complete artificial vision system for detecting and learning the view transitions (as the appearance of a rotating object changes), and for later recognizing objects from sequences of views. By processing sequences of views, the system accumulates evidence over time, thereby increasing the confidence of its recognition decisions. Also, when new views are revealed following views recognized previously by an aspect network during the course of observation, the new views and view-transitions are used to refine the evolving 3D object representation automatically. Recognition is possible even from novel (previously unexperienced) view sequences. The objects used for illustration are model aircraft in flight. The computations are formulated as differential equations among analog nodes and synapses to model the temporal dynamics explicitly.
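The idea of accumulating evidence over a sequence of views can be sketched in a few lines. This is a deliberately simplified, discrete-time stand-in for the differential-equation formulation described above: each object hypothesis keeps a leaky evidence value that grows when an observed view transition matches one it has learned. The object names, aspect labels, the `decay` constant, and the function `accumulate_evidence` are all hypothetical and chosen only for illustration.

```python
def accumulate_evidence(view_sequence, learned_transitions, decay=0.9):
    """Leaky accumulation of evidence for each object hypothesis.

    view_sequence: list of aspect labels in the order they were observed.
    learned_transitions: dict mapping object name -> set of (aspect, aspect)
        pairs that were seen for that object during learning.
    Transitions are treated as undirected here (an assumption), so a pair
    matches regardless of the direction in which the object rotates.
    """
    evidence = {obj: 0.0 for obj in learned_transitions}
    for prev, curr in zip(view_sequence, view_sequence[1:]):
        for obj, transitions in learned_transitions.items():
            match = (prev, curr) in transitions or (curr, prev) in transitions
            evidence[obj] = decay * evidence[obj] + (1.0 if match else 0.0)
    return evidence


# Hypothetical learned transitions for two objects that share one transition.
learned = {
    "plane_A": {("a1", "a2"), ("a2", "a3")},
    "plane_B": {("b1", "b2"), ("a2", "a3")},
}

evidence = accumulate_evidence(["a1", "a2", "a3"], learned)
best = max(evidence, key=evidence.get)  # hypothesis with maximum evidence
```

With this input, plane_A matches both transitions while plane_B matches only the second, so the longer the consistent sequence, the wider the gap between the hypotheses becomes.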
In many situations, only some parts of an object are visible while other parts are occluded. In other situations, information about an object is available piecemeal as the parts are scanned sequentially, such as when eye-motions are used to explore an object. Part information is also crucially important for objects with articulating parts, or with removable parts. In all of these cases, the sensor-scanner system must divide an object into subcomponents, and must also be able to integrate the part-information using appropriate data concerning the spatial relationships among the parts as well as the temporal scan sequences. This work describes how such issues are addressed in recognizing human faces from their parts using a neural network approach. Parallels are drawn between neurophysiological and psychophysical experiments, as well as deficits in visual object recognition. This work extends our existing modular system, developed for learning and recognizing 3D objects from multiple views, by investigating the capabilities which need to be augmented for coping with objects which are represented hierarchically. The ability of the previous system to learn and recognize 3D objects invariant to their apparent size, orientation, position, perspective projection, and 3D pose serves as a strong foundation for the extension to more complex 3D objects.
A network is described that can be used for multiple-target grouping and tracking or directing a vision system's focus of attention. The network models a biologically plausible astroglial-neural network in the visual cortex whose parameters are tuned to match a psychophysical database on apparent motion. The architecture consists of a diffusion layer and a contrast-enhancement layer coupled by feedforward and feedback connections; input is provided by a separate feature extracting layer. The dynamics of the diffusion-enhancement bilayer exhibit grouping of static features on multiple scales as a function of time, and long-range apparent motion between time-varying inputs. The model is cast as a parallel analog circuit which is realizable in VLSI. We present simulations that reproduce static grouping phenomena useful for multiple target grouping and tracking over multiple scales, demonstrate several long-range apparent motion phenomena, and discuss single targets that split, and multiple targets that merge.
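The multi-scale grouping behavior can be illustrated with a much-reduced one-dimensional sketch. It assumes that diffusion of point features can be approximated by summing Gaussians whose width grows with time, and that contrast enhancement can be approximated by picking strict local maxima. Feedback, temporal dynamics, and the tuned parameters of the actual bilayer are omitted; the names `diffuse` and `local_maxima`, the grid size, and the feature positions are assumptions made for illustration.

```python
import math


def diffuse(feature_positions, sigma, length):
    """Activity of a 1D diffusion layer: each point feature spreads as a
    Gaussian of width sigma (larger sigma stands for later time)."""
    return [
        sum(math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))
            for p in feature_positions)
        for x in range(length)
    ]


def local_maxima(activity):
    """Crude contrast enhancement: indices strictly above both neighbors."""
    return [
        i for i in range(1, len(activity) - 1)
        if activity[i] > activity[i - 1] and activity[i] > activity[i + 1]
    ]


# Hypothetical scene: two nearby features and one distant feature.
features = [10, 14, 40]

fine = local_maxima(diffuse(features, sigma=1.0, length=60))    # [10, 14, 40]
coarse = local_maxima(diffuse(features, sigma=3.0, length=60))  # [12, 40]
```

At the fine scale each feature keeps its own peak, while at the coarser scale the two nearby features merge into a single peak midway between them and the distant feature remains separate, which is the kind of scale-dependent grouping the abstract describes.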
This paper addresses the problem of generating models of 3D objects automatically from exploratory view-sequences of the objects. Neural network techniques are described which cluster the frames of video-sequences into view-categories, called aspects, representing the 2D characteristic views. Feedforward processes ensure that each aspect is invariant to the apparent position, size, orientation, and foreshortening of an object in the scene. The aspects are processed in conjunction with their associated aspect-transitions by the Aspect Network to learn and refine the 3D object representations on-the-fly. Recognition is indicated by the object-hypothesis which has accumulated the maximum evidence. The object-hypothesis must be consistent with the current view, as well as the recent history of view transitions stored in the Aspect Network. The “winning” object refines its representation until either the attention of the camera is redirected or another hypothesis accumulates greater evidence.