Fixed targets such as bridges, airfields, and buildings are of military significance and their value is constantly being appraised as the battle scenario evolves. For example, a building thought to be of no significance may be reappraised, through intelligence reports, as a military command center. The ability to quickly strike these targets with a minimal amount of a priori information is necessary. The requirements placed on such a system are: (1) Rapid turnaround time from the moment the decision is made to attack. Depending on the user organization, this time ranges from fifteen minutes to twelve hours. (2) Minimal a priori target information. There is likely to be no imagery data base of the target, and the system may be required to operate with as little information as an overhead photograph. (3) Real time recognition of the target. Terminal guidance of the weapons delivery system to a specified destructive aimpoint will be impacted by the recognition system. (4) Flexibility to attack a variety of targets. A data base of known high value fixed targets (HVFT) may be stored, but the sudden inclusion of new targets must be accommodated. This paper will discuss a real time implementation of a model based approach to automatically recognize high value fixed targets in forward looking infrared (FLIR) imagery. This approach generates a predictive model of the expected target features to be found in the image, extracts those feature types from the image, and matches the predictive model with the image features. A generic approach to the description of the target features has been taken to allow for rapid preparation of the models from minimal a priori target information. The real time aspect has been achieved by implementing the system on a massively parallel single instruction, multiple data architecture. An overview of an entire system approach to attack high value fixed targets will be discussed.
The automatic target recognizer (ATR), which is a part of this system, will be discussed in detail and results of the ATR operating against HVFT in FLIR imagery will be shown.
We will briefly review the design of the Image Understanding Architecture (IUA), a massively parallel, multi-level processor for real-time, knowledge-based, computer vision applications. A proof-of-concept prototype of the IUA has been built jointly by the University of Massachusetts and Hughes Research Laboratories in Malibu. We then describe the programming environment that has been created for the IUA.
A recent DARPA initiative has sparked interest in software environments for computer vision. The goal is a single environment to support both basic research and technology transfer. This paper lays out six fundamental attributes such a system must possess: (1) support for both C and Lisp, (2) extensibility, (3) data sharing, (4) data query facilities tailored to vision, (5) graphics, and (6) code sharing. The first three attributes fundamentally constrain the system design. Support for both C and Lisp demands some form of database or data-store for passing data between languages. Extensibility demands that system support facilities, such as spatial retrieval of data, be readily extended to new user-defined datatypes. Finally, data sharing demands that data saved by one user, including data of a user-defined type, must be readable by another user.
A massively parallel neural network architecture is currently being developed as a potential component of a distributed information system in support of NASA's Earth Observing System. This architecture can be trained, via an iterative learning process, to recognize objects in images based on texture features, allowing scientists to search for all patterns which are similar to a target pattern in a database of images. It may facilitate scientific inquiry by allowing scientists to automatically search for physical features of interest in a database through computer pattern recognition, alleviating the need for exhaustive visual searches through possibly thousands of images. The architecture is implemented on a Connection Machine such that each physical processor contains a simulated 'neuron' which views a feature vector derived from a subregion of the input image. Each of these neurons is trained, via the perceptron rule, to identify the same pattern. The network output gives a probability distribution over the input image of finding the target pattern in a given region. In initial tests the architecture was trained to separate regions containing clouds from clear regions in 512 by 512 pixel AVHRR images. We found that in about 10 minutes we can train a network to perform with high accuracy in recognizing clouds which were texturally similar to a target cloud group. These promising results suggest that this type of architecture may play a significant role in coping with the forthcoming flood of data from the Earth-monitoring missions of the major space-faring nations.
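The perceptron rule mentioned above can be illustrated with a minimal sketch for a single simulated neuron. The feature values, function names, and learning rate below are hypothetical and are not taken from the paper. The sketch also outputs a hard cloud/clear decision, whereas the abstract describes a probability-like output over image regions.

```python
def predict(w, b, x):
    """Threshold unit: 1 if the weighted sum exceeds zero, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train one threshold unit with the classic perceptron rule."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = y - predict(w, b, x)  # +1, 0, or -1
            if delta != 0:
                # Move the decision boundary toward the misclassified sample.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return w, b


# Hypothetical 2-D texture feature vectors for four image subregions.
features = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
is_cloud = [1, 1, 0, 0]
w, b = train_perceptron(features, is_cloud)
```

In the architecture described, every processor would hold a unit like this and apply it to the feature vector of its own subregion.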
The large-scale evaluation and gradual optimization (LEGO) model of software development was designed to be capabilities-driven. Although the motivation for its structure was support of research projects, this model is applicable to development of any system where the identification and implementation of incremental levels of functionality is critical. It differs from other existing development models in that system functionality is emphasized over prototyping. Given the time and level-of-effort constraints inherent to contracted work, ground-up system development is neither desirable nor possible. At the same time, a cursory technology survey and assessment is inadequate and unacceptable. Maximal results are realized when work is focused on the judicious extension of pre-existing technologies and the development of new capabilities where required. The LEGO methodology was designed to approach system development from this perspective. This paper will describe some current models of software development, the research environment which motivated design of LEGO, the LEGO model itself, as well as application of LEGO to research and development of RADIUS, an image-understanding project currently being worked by Hughes.
The Office of Research and Development, with major involvement and support from the Defense Advanced Research Projects Agency (DARPA), has begun a highly applications-oriented project intended to provide image understanding (IU) technology in a fully and semi-automated support system of human-machine interface interactive tools to the photo interpreter and imagery analyst (IA). The central concept of Research and Development for Image Understanding Systems (RADIUS) is that of model supported exploitation. Two- and/or three-dimensional site models are developed and/or maintained by analysts using imagery source data, imagery-derived information, and appropriate non-imagery sourced information (often called collateral). IU technology and necessary non-IU technology are used where feasible to integrate this base of information which can be accessed spatially via the now-developed site model and displayed or rendered in support of the IA during the imagery exploitation and reporting process. As new imagery is obtained, it may be registered to the site model, or through the site model to other images, to support the specific exploitation tasks and applications which will be developed. The current effort is the concept definition phase. This phase will determine the viability of current technology to perform these tasks and will define future activities.
This paper describes a parallel algorithm for ranking the pixels on a curve in O(log N) time using an EREW PRAM model. The algorithm accomplishes this with N processors for a √N × √N image. After applying such an algorithm to an image, we are able to move the pixels from a curve into processors having consecutive addresses. This is important on hypercube connected machines like the Connection Machine because we can subsequently apply many algorithms to the curve using powerful segmented scan operations (i.e., parallel prefix operations). We shall illustrate this by first showing how we can find piecewise linear approximations of curves using Ramer's algorithm. This process has the effect of converting closed curves into simple polygons. As another example, we shall describe a more complicated parallel algorithm for computing the visibility graph of a simple planar polygon. The algorithm accomplishes this in O(k log N) time using O(N²/log N) processors for an N vertex polygon, where k is the link-diameter of the polygon in consideration. Both of these algorithms require only scan operations (as well as local neighbor communication) as the means of inter-processor communication. Thus, the algorithms can not only be implemented on an EREW PRAM, but also on a hypercube connected parallel machine, which is a more practical machine model. All these algorithms were implemented on the Connection Machine, and various performance tests were conducted.
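To make the idea of ranking curve pixels in a logarithmic number of steps concrete, here is a small sketch of the standard pointer-jumping technique for list ranking, simulated sequentially. It is not necessarily the algorithm used in the paper. The array layout (`succ[i]` holds the index of the next pixel on the curve, and the last pixel points to itself) and the function name are assumptions made for illustration. Each list comprehension stands in for one synchronous step executed by all processors at once.

```python
def rank_list(succ):
    """Return, for every node, its distance to the end of the linked curve.

    succ[i] is the index of the node that follows i; the final node points
    to itself. The loop runs about log2(n) rounds of pointer jumping.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Both updates read only the previous round's arrays, which mimics
        # a synchronous parallel step.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# Hypothetical curve visiting nodes in the order 2 -> 1 -> 0 -> 3 -> 4.
succ = [3, 0, 1, 4, 4]
rank = rank_list(succ)

# Position from the head of the curve gives each pixel a consecutive address.
address = [len(succ) - 1 - r for r in rank]
```

Once every pixel knows its address, moving the pixels into consecutively numbered processors is a single scatter, after which scan operations can run along the curve.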
Retrieval of image data from a centralized database may be subject to bandwidth limitations, whether due to a low-bandwidth communications link or to contention from simultaneous accesses over a high-bandwidth link. Progressive transmission can alleviate this problem by encoding image data so that any prefix of the data stream approximates the complete image at a coarse level of resolution. The longer the prefix, the finer the resolution. In many cases, as little as 1 percent of the image data may be sufficient to decide whether to discard the image, to permit the retrieval to continue, or to restrict retrieval to a subsection of the image. Our approach treats resolution not as a fixed attribute of the image, but rather as a resource which may be allocated to portions of the image at the direction of a user-specified priority function. The default priority function minimizes error by allocating more resolution to regions of high variance. The user may also point to regions of interest requesting priority transmission. More advanced target recognition strategies may be incorporated at the user's discretion. Multispectral imagery is supported. The user engineering implications are profound. There is immediate response to a query that might otherwise take minutes to complete. The data is transmitted in small increments so that no single user dominates the communications bandwidth. The user-directed improvement means that bandwidth is focused on interesting information. The user may continue working with the first coarse approximations while further image data is still arriving. The algorithm has been implemented in C on Sun, Silicon Graphics, and NeXT workstations, and in Lisp on a Symbolics. Transmission speeds reach as high as 60,000 baud using a Sparc or 68040 processor when storing data to memory; somewhat less if also updating a graphical display. The memory requirements are roughly five bytes per image pixel.
Both computational and memory costs may be reduced by increasing the time between priority computations. Progressive transmission improves the performance of lossless LZW or Huffman compression. If exact reconstruction of the image is not needed, the transmitted values may be quantized to achieve further compression. Our experience shows the technique to be flexible enough to support a variety of situations.
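The notion of resolution as a resource allocated by a priority function can be sketched as follows. This is only an illustration, assuming a quadtree-style subdivision and a priority equal to each block's summed squared error; the abstract does not describe the actual encoding. All function names are hypothetical.

```python
import heapq


def block_stats(img, x0, y0, w, h):
    """Mean and summed squared error of one rectangular block."""
    vals = [img[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    mean = sum(vals) / len(vals)
    return mean, sum((v - mean) ** 2 for v in vals)


def progressive_stream(img):
    """Yield (x0, y0, w, h, mean) records, refining high-error blocks first."""
    height, width = len(img), len(img[0])
    mean, sse = block_stats(img, 0, 0, width, height)
    yield (0, 0, width, height, mean)
    heap = [(-sse, 0, 0, width, height)] if sse > 0 else []
    while heap:
        _, x0, y0, w, h = heapq.heappop(heap)  # block with the largest error
        hw, hh = max(1, w // 2), max(1, h // 2)
        for bx, bw in ((x0, hw), (x0 + hw, w - hw)):
            for by, bh in ((y0, hh), (y0 + hh, h - hh)):
                if bw == 0 or bh == 0:
                    continue
                m, s = block_stats(img, bx, by, bw, bh)
                yield (bx, by, bw, bh, m)
                if s > 0:  # blocks already exact need no further refinement
                    heapq.heappush(heap, (-s, bx, by, bw, bh))


def reconstruct(records, width, height):
    """Paint the received block means; later records overwrite coarser ones."""
    out = [[0.0] * width for _ in range(height)]
    for x0, y0, w, h, mean in records:
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                out[y][x] = mean
    return out


img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
records = list(progressive_stream(img))
coarse = reconstruct(records[:1], 4, 4)  # any prefix gives an approximation
full = reconstruct(records, 4, 4)
```

A user-specified priority function could replace the squared-error key, for example by boosting blocks that overlap a region of interest.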
A computational framework is presented to extract geometric structures and a region of interest from a monochrome image for the detection of man-made objects in a non-urban scene. The framework is based on the principles of perceptual organization. Several new techniques are developed to implement this framework. Examples using real complex images are presented to show the effectiveness of the approach.
This work presents an information theoretic approach to the problem of building extraction in aerial imagery. Boundary curvature is selected as a suitable measure of shape information and its relationship to Shannon's concept of entropy is discussed. A novel concept of global and local shape entropy is defined. A mathematically tractable approach to connected region growing is developed utilizing this novel concept of global shape entropy. An algorithm for the automatic extraction of building silhouettes in aerial imagery is presented. The algorithm is shown to be shape independent by successfully extracting rectangular buildings and a circular storage tank in real imagery. The limitations of the algorithm are discussed and possible future research is proposed.
Most current object recognition systems are based on a 3D model which is used to describe the image projection of an object over all viewpoints. We introduce a new technique which can predict the geometry of an object under projective transformation. The object geometry is represented by a set of corresponding features taken from two views. The projected geometry can be constructed in any third view, using a viewpoint invariant derived from the correspondences.
A fundamental question in the ability to determine the effectiveness of any computer vision algorithm is the construction and application of proper test data suites. The purpose of this paper is to develop an understanding of the underlying requirements necessary in forming test suites, and the limitations that restricted sample sizes have on determining the testability of computer vision algorithms. With the relatively recent emergence of high performance computing, it is now highly desirable to perform statistically significant testing of algorithms using a test suite containing a full range of data, from simple binary images to textured images and multi-scale images. Additionally, a common database of test suites would enable direct comparisons of competing imagery exploitation algorithms. The initial step necessary in building a test suite is the selection of adequate measures necessary to estimate the subjective attributes of images, similar to the quantitative measures from speech quality. We will discuss image measures, their relation to the construction of test suites and the use of real sensor data or computer generated synthetic images. By using the latest technology in computer graphics, synthetically generated images varying in degrees of distortion both from sensor models and other noise source models can be formed if ground-truth information of the images is known. Our eventual goal is to intelligently construct statistically significant test suites which would allow for A/B comparisons between various computer vision algorithms.
This paper presents the results of a comparative study of various Fourier descriptor representations and their use in the recognition of unconstrained handwritten digits. Certain characteristics of five Fourier descriptor representations of handwritten digits are discussed, and illustrations of ambiguous digit classes introduced by use of these Fourier descriptor representations are presented. It is concluded that Fourier descriptors are practically effective only within the framework of an intelligent system capable of reasoning about digit hypotheses. We describe a hypothesis-generating algorithm based on Fourier descriptors which allows the classifier to associate more than one possible digit class with each input. Such hypothesis-generating schemes can be very effective in systems employing multiple classifiers. We compare the performance of the five Fourier descriptor representations based on experimental results produced by the hypothesis-generating classifier for a test set of 14,000 handwritten digits. It is found that some Fourier descriptor formulations are more successful than others for handwritten digit recognition.
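For readers unfamiliar with Fourier descriptors, the sketch below shows one common formulation: the boundary is treated as a sequence of complex numbers, and the magnitudes of its discrete Fourier coefficients are normalized by the first harmonic. This is only one of many possible representations and is not claimed to be any of the five compared in the paper. The function name and the sample contour are hypothetical.

```python
import cmath


def fourier_descriptors(contour, n_coeffs=4):
    """Normalized Fourier descriptor magnitudes of a closed contour.

    contour is an ordered list of (x, y) boundary points. Skipping the k = 0
    coefficient removes translation, dividing by |c1| removes scale, and
    taking magnitudes removes rotation and the choice of starting point.
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)

    def coeff(k):
        return sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t in range(n)) / n

    scale = abs(coeff(1))
    return [abs(coeff(k)) / scale for k in range(2, 2 + n_coeffs)]


# Hypothetical square outline traversed counterclockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
# Same shape, three times larger, shifted, with a different starting point.
moved = [(3 * x + 10, 3 * y - 5) for x, y in square[3:] + square[:3]]

fd_square = fourier_descriptors(square)
fd_moved = fourier_descriptors(moved)
```

Because magnitudes discard phase, distinct shapes can map to similar descriptors, which is one way ambiguous digit classes of the kind the abstract mentions can arise.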
Many Hughes tactical and strategic programs need high performance image processing. For example, photo-interpretation applications can require up to four orders of magnitude speedup over conventional computer architectures. Therefore, parallel processing systems are needed to help close the processing gap. Vision applications can usually be decomposed into three levels of processing called high, intermediate, and low level vision. Each processing level typically requires different types of numeric/symbolic computation, processing task granularities, and communications bandwidths. No parallel processing system is commercially available that is optimized for the entire range of computations. To meet these processing challenges, the image understanding architecture (IUA) has been developed by Hughes in collaboration with the University of Massachusetts. The IUA is a heterogeneous, hierarchical, associative parallel processor that is organized in three levels corresponding to the vision problem. Its lowest level consists of a large content addressable array parallel processor. This array of 'per pixel' bit serial processors is used for fixed point, low level numeric, and symbolic computations. The middle level is an interface communications array processor (ICAP). ICAP is an array of digital signal processing chips from the TI TMS320Cx line, used for high speed number crunching. The highest level is the symbolic processing array. It is an array of general purpose microprocessors in which the artificial intelligence content of the image understanding software resides. A set of benchmarks from the DARPA/ORD sponsored SCORPIUS program was developed using the IUA. The set of algorithms included low level image processing as well as high level matching algorithms. Benchmark performance on the second generation IUA hardware is over four orders of magnitude faster than equivalent algorithms implemented on a DEC VAX 8650. The first generation hardware is operational.
Development of the second generation hardware and software for DARPA's Unmanned Ground Vehicle Program is under way in collaboration with the University of Massachusetts and Amerinex Artificial Intelligence.
This paper is a case history of a spray paint optimization system based on machine vision technology in a factory automation application. The system is implemented as an industrial control for a reciprocating electrostatic sprayer used for priming and painting of armor plate for military ground vehicles. Incoming plates are highly variable in size, shape, and orientation, and are processed in very small production lots. A laser imager is used to digitize visual cross sections of each plate one line at a time. The raster lines are then assembled into a two dimensional image and processed. The spray pattern is optimized for precise paint coverage with minimum overspray. The paint optimizer system has yielded a measured 25 percent savings in bulk paint use, resulting in less booth and equipment maintenance, reduced paint fumes in the atmosphere, and reduced waste disposal, and now has several months of successful production history.
The central goal of the Research and Development for Image Understanding Systems (RADIUS) project is to focus image understanding (IU) research efforts on the needs of operational intelligence applications. Model-supported exploitation (MSE) is the underlying concept to be validated, developed, demonstrated, and evaluated. Site modeling is the principal component of the MSE concept that RADIUS must develop. IU research and development will be directed towards the creation and maintenance of site models, which will be used directly by the image analyst (IA) or by other IU algorithms as an intelligent assistant to the IA. Site models will provide a geometric means of associating collateral data with site components. Hughes Aircraft Company is teamed with BDM International, Inc. (BDM), Washington, D.C., Control Data Corporation (CDC), Minneapolis, Minnesota, Hughes Research Laboratories (HRL), and the University of Southern California (USC).
The Pap smear is the universally accepted test used for cervical cancer screening. In the United States alone, about 50 to 70 million of these tests are done annually. Every one of the tests is done manually by a cytotechnologist looking at cells on a glass slide under a microscope. This paper describes PAPNET, an automated microscope system that combines a high speed image processor and a neural network processor. The image processor performs an algorithmic primary screen of each image. The neural network performs a non-algorithmic secondary classification of candidate cells. The final output of the system is not a diagnosis. Rather it is a display screen of suspicious cells from which a decision about the status of the case can be made.
Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by LANDSAT satellite. The performances are evaluated by comparing classifications of a scene in the vicinity of Washington DC. The problem of optimal selection of categories is addressed as a step in the classification process.
Image pattern recognition is presented as three sequential tasks: feature extraction, object plausibility estimation (determining class likelihoods), and decision processing. Several data-driven techniques yield discriminant functions to produce object plausibility estimates from image features, including traditional statistical methods and neural network approaches. A statistical learning algorithm which integrates multiple-regression algorithms, functional networking strategies, and a statistical modeling criterion is presented. It provides a non-parametric learning algorithm for the synthesis of discriminant functions. Image understanding tasks such as object plausibility estimation require robust modeling techniques to deal with the uncertainty prevalent in real-world data. Specifically, these complex tasks require robust and cost-effective techniques to successfully integrate multi-source information. AbTech and others have shown that implementation of the statistical learning concepts discussed provides a modeling approach ideal for information fusion tasks such as autonomous object recognition for tactical targets and space-based assets. The results of using this approach to develop a prototype aircraft recognition system are presented.
Rome Laboratory has designed and implemented a neural network based automatic target recognition (ATR) system under contract F30602-89-C-0079 with Booz, Allen & Hamilton (BAH), Inc., of Arlington, Virginia. The system utilizes a combination of neural network paradigms and conventional image processing techniques in a parallel environment on the IE-2000 SUN 4 workstation at Rome Laboratory. The IE-2000 workstation was designed to assist the Air Force and Department of Defense to derive the needs for image exploitation and image exploitation support for the late 1990s - year 2000 time frame. The IE-2000 consists of a developmental testbed and an applications testbed, both with the goal of solving real-world problems on real-world facilities for image exploitation. To fully exploit the parallel nature of neural networks, 18 Inmos T800 transputers were utilized, in an attempt to provide a near-linear speed-up for each subsystem component implemented on them. The initial design contained three well-known neural network paradigms, each modified by BAH to some extent: the Selective Attention Neocognitron (SAN), the Binary Contour System/Feature Contour System (BCS/FCS), and Adaptive Resonance Theory 2 (ART-2), and one neural network designed by BAH called the Image Variance Exploitation Network (IVEN). Through rapid prototyping, the initial system evolved into a completely different final design, called the Neural Network Image Exploitation System (NNIES), where the final system consists of two basic components: the Double Variance (DV) layer and the Multiple Object Detection And Location System (MODALS). A rapid prototyping neural network CAD Tool, designed by Booz, Allen & Hamilton, was used to rapidly build and emulate the neural network paradigms. Evaluation of the completed ATR system included probability of detection and probability of false alarm among other measures.
Multiple sensor imaging systems are changing the approach to the challenging problem of automatic target recognition (ATR). This paper summarizes a research effort to demonstrate the utility of neural networks in processing hyperspectral imagery for target detection and classification. Pixel registered imagery containing 32 spectral bands in the 2.0 to 2.5 µm range was used to train and test a backpropagation neural network for detection of camouflaged targets. An initial neural network was trained and tested using all 32 spectral bands resulting in a probability of correct classification (Pcc) at the pixel level of 98.7 percent. Because of the high degree of correlation between features (i.e., spectral bands), the dimensionality of the feature set was reduced to 11 spectral bands using a Karhunen-Loeve expansion. The neural network was reconfigured and retrained resulting in a Pcc of 99.8 percent. This second neural network was implemented in hardware on the Intel ETANN chip, a special purpose analog neural network chip resulting in a Pcc of 96.3 percent. A single ETANN chip is capable of classifying 400,000 pixels per second. The capability of classifying each individual pixel in a hyperspectral image in real time radically alters the possible approaches in an ATR scenario.
Complex computer vision systems, such as those used for automatic target recognition, have a need to aggregate evidence from a variety of sources when making decisions. We present two complementary methodologies, based on the theory of fuzzy sets, to fuse partial support for a decision or hypothesis from a variety of sources into a final degree of confidence. The sources may include different sensors, pattern recognition algorithms, expert systems, features, evidence over time, intelligence data, etc. Both methodologies--the fuzzy integral and fuzzy aggregation networks--are powerful and flexible means of combining partial degrees of support into a final decision. The choice of technique is governed by the demands of the problem, availability and type of training data, and a priori knowledge of the situation. The fuzzy integral is a nonlinear functional which combines (possibly subjective) knowledge concerning the worth of subsets of sources, in the form of a fuzzy measure, with objective evidence supplied by the sources. Under certain conditions, the fuzzy measures are actually Dempster-Shafer belief or plausibility measures. Means to automatically determine the measures from histograms of training data have been developed. Fuzzy aggregation networks are best utilized for hierarchical decision making where input/output training sets (much like those for neural networks) are available. Each node in the network is modeled as a general fuzzy set theoretic connective: union operator, intersection operator, generalized mean, or hybrid operator. The type of node, its parameters, and connection weights are learned during training. Both approaches have been applied to multisensor automatic target recognition problems with excellent results. They have also been successfully employed in image segmentation and general multicriteria decision making. Theoretical and practical issues, along with results, will be discussed.
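As an illustration of how a fuzzy integral fuses evidence, the sketch below computes the Sugeno form of the integral for three hypothetical sources. The abstract does not say which form of fuzzy integral the authors use, so this choice is an assumption. The source names, evidence values, and fuzzy measure table are invented for the example. The measure assigns a worth to every subset of sources, is monotone, and equals 1 on the full set.

```python
def sugeno_integral(evidence, measure):
    """Sugeno fuzzy integral of per-source support with respect to a measure.

    evidence maps each source to its support in [0, 1]; measure maps a
    frozenset of sources to the worth of that subset in [0, 1].
    """
    ordered = sorted(evidence, key=evidence.get, reverse=True)
    best = 0.0
    subset = set()
    for source in ordered:
        subset.add(source)  # the sources with the strongest support so far
        # Support is limited both by the weakest member of the subset and
        # by the worth assigned to the subset as a whole.
        best = max(best, min(evidence[source], measure(frozenset(subset))))
    return best


# Hypothetical fuzzy measure over three sources (monotone, full set = 1).
G = {
    frozenset(): 0.0,
    frozenset({"flir"}): 0.5,
    frozenset({"radar"}): 0.4,
    frozenset({"intel"}): 0.2,
    frozenset({"flir", "radar"}): 0.8,
    frozenset({"flir", "intel"}): 0.6,
    frozenset({"radar", "intel"}): 0.5,
    frozenset({"flir", "radar", "intel"}): 1.0,
}

# Hypothetical support each source gives to the hypothesis "target present".
evidence = {"flir": 0.9, "radar": 0.6, "intel": 0.3}
confidence = sugeno_integral(evidence, G.get)
```

Here the strong FLIR support alone is capped by the worth of that single source (0.5), while FLIR and radar together are trusted at 0.8 and both support the hypothesis at 0.6 or better, so the fused confidence is 0.6.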
The detection of human-made objects with low false alarm rates in IR imagery remains a technically challenging problem. In addition, many currently proposed systems for autonomous air vehicles require the algorithms to process images at the rate of 30 frames/second (real-time). Parallel distributed processes, such as neural networks, offer potential solutions to problems of this complexity. The current algorithm takes advantage of the presence of both long straight lines and curvature points in human-made objects. These features are among those recognized pre-attentively by the human visual system. It is a generalization of work done by Sha'Ashua and Ullman at MIT on the extraction of so-called salient features. The addition of curvature detection, however, is what allows the algorithm to achieve acceptable false alarm rates. On simulated FLIR imagery taken from the U.S. Army C2NVEO terrain board, low false alarm rates have been achieved while maintaining 100% target detection.
RCDE is a software environment for the development of image understanding algorithms. The application focus of RCDE is on image exploitation where the exploitation tasks are supported by 2D and 3D models of the geographic site being analyzed. An initial prototype for RCDE is SRI's Cartographic Modeling Environment (CME). This paper reviews the CME design and illustrates the application of CME to site modeling scenarios.