Machine vision remains a remarkably elusive goal. Current machine vision systems fall short in two separate categories. Using a familiar dichotomy, I shall refer to these two categories as hardware and software. In a more descriptive sense, I mean by hardware and software
The sequential processing paradigm limits current solutions for computer vision by restricting the number of functions which naturally map onto von Neumann computing architectures. A variety of physical computing structures underlie the massive parallelism inherent in many visual functions; therefore, further advances in general-purpose vision must assume the inseparability of function from structure. To combine function and structure we are investigating connectionist architectures using PUNNS (Perception Using Neural Network Simulation). Our approach is inspired and constrained by the analysis of visual functions computed in the neural networks of living things. PUNNS is a massively parallel computer architecture which is evolving to allow the execution of certain visual functions in constant time, regardless of the size and complexity of the image. Because of the complexity and cost of building a neural-net machine, a flexible neural-net simulator is needed to invent, study, and understand the behavior of complex vision algorithms. Among the issues involved in building such a simulator are how to compactly describe the interconnectivity of the neural network, how to input image data, how to program the network, and how to display its results. This paper describes the implementation of PUNNS. Simulation examples and a comparison of PUNNS to other neural-net simulators are presented.
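The kind of synchronous, all-units-in-parallel update such a simulator must support can be sketched in a few lines. The function and weight names below, and the two-unit toy network, are illustrative assumptions of ours, not the actual PUNNS interface:

```python
import numpy as np

def step(weights, activations, threshold=0.0):
    """One synchronous update: every unit computes its weighted sum and
    fires in parallel, so the time per step is independent of image size
    given one processor per unit."""
    return (weights @ activations > threshold).astype(float)

# Toy "image" of 4 pixels feeding 2 hypothetical feature units.
weights = np.array([[1.0, -1.0,  1.0, -1.0],
                    [1.0,  1.0, -1.0, -1.0]])
image = np.array([1.0, 0.0, 1.0, 0.0])
print(step(weights, image))  # only the first unit fires on this pattern
```

On a serial machine this step costs time proportional to the number of connections; the point of the architecture is that a hardware network removes that dependence.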
Four optical implementations of bidirectional associative memories (BAMs) are presented. BAMs are heteroassociative content-addressable memories (CAMs). A BAM stores the m binary associations (A1, B1), ..., (Am, Bm), where Ai is a point in the Boolean n-cube and Bi is a point in the Boolean p-cube. A is a neural network of n bivalent or continuous neurons ai; B is a network of p bivalent or continuous neurons bi. The fixed synaptic connections between the A and B networks are represented by some n-by-p real matrix M. Bidirectionality (forward and backward information flow) in neural nets produces two-way associative search for the stored pair (Ai, Bi) nearest to an input key. Every matrix is a bidirectionally stable heteroassociative CAM for both bivalent and continuous networks. This generalizes the well-known unidirectional stability of autoassociative networks with square symmetric M. When the BAM neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation, or pseudo-adaptive resonance. The stable reverberation corresponds to a local minimum of the system energy. Heteroassociative pairs (Ai, Bi) are encoded in a BAM M by summing bipolar correlation matrices, M = X1ᵀY1 + ... + XmᵀYm, where Xi (Yi) is the bipolar version of Ai (Bi), with -1s replacing 0s. The BAM storage capacity for reliable recall is roughly m < min(n, p): pattern number is bounded by pattern dimensionality. BAM optical implementations divide into two approaches: matrix-vector multipliers and holographic correlators. The four optical BAMs described emphasize, respectively, a spatial light modulator, laser diodes and high-speed detectors, a reflection hologram, and a transmission hologram.
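The encoding and bidirectional recall described above can be sketched as a minimal discrete BAM. This is an illustration of the correlation-encoding scheme only, not any of the four optical implementations, and the helper names are ours:

```python
import numpy as np

def to_bipolar(a):
    return 2.0 * np.asarray(a) - 1.0      # 0/1 -> -1/+1

def threshold(v, prev):
    out = np.sign(v)
    out[out == 0] = prev[out == 0]        # hold the previous state on ties
    return out

def encode(pairs):
    # M = X1'Y1 + ... + Xm'Ym over the bipolar versions of the pairs
    return sum(np.outer(to_bipolar(a), to_bipolar(b)) for a, b in pairs)

def recall(M, a):
    """Bidirectional search A -> B -> A -> ... until the two-pattern
    reverberation is stable (a local minimum of the system energy)."""
    x = to_bipolar(a)
    y = threshold(x @ M, np.ones(M.shape[1]))
    while True:
        x2 = threshold(M @ y, x)
        y2 = threshold(x2 @ M, y)
        if np.array_equal(x2, x) and np.array_equal(y2, y):
            return (x > 0).astype(int), (y > 0).astype(int)
        x, y = x2, y2

pairs = [([1, 0, 1, 0, 1, 0], [1, 1, 0, 0]),
         ([1, 1, 0, 0, 1, 1], [1, 0, 1, 0])]   # m = 2 < min(n, p) = 4
M = encode(pairs)
a_out, b_out = recall(M, [1, 0, 1, 0, 1, 0])   # keying on A1 recovers (A1, B1)
```

Keying on either side of a stored pair recovers the whole association, which is the two-way search that bidirectionality buys over a one-way heteroassociative memory.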
Any effort to develop efficient schemes for image representation must begin by pondering the nature of image structure and image information. The fundamental insight that makes compact coding possible is that the statistical complexity of images does not correspond to their resolution (number of resolvable states) if they contain nonrandom structure, coherence, or local autocorrelation. These are the respects in which real images differ from random noise: they are optical projections of 3-D objects whose physical constitution and material unity ensure locally homogeneous image structure, whether such local correlations are as simple as luminance value or a more subtle textural signature captured by some higher-order statistic. Except in the case of synthetic white noise, it is not true that each pixel in an image is statistically independent of its neighbors and of every other pixel; yet that is the default assumption in the standard image representations employed in video transmission channels and in the data structures of storage devices. This statistical fact (that the entropy of the channel vastly exceeds the entropy of the signal) has long been recognized, but it has proven difficult to reduce channel bandwidth without loss of resolution. In practical terms, the consequence is that standard video data rates (typically 8 bits for each of several hundred thousand pixels in an image mosaic, resulting in information bandwidths in the tens of millions of bits per second) are far more costly informationally than they need to be; moreover, no image structure more complex than a single pixel at a time is explicitly extracted or encoded.
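The redundancy argument can be made concrete with a toy experiment; the histogram-entropy estimator and the two synthetic images below are our illustrative assumptions, not the paper's method. For a locally correlated image, first-order entropy drops sharply once even the simplest structure (a neighboring-pixel difference) is extracted; for white noise it does not:

```python
import numpy as np

def entropy(values, bins=256):
    """First-order (histogram) entropy in bits per sample."""
    hist, _ = np.histogram(values, bins=bins, range=(-256, 256))
    p = hist[hist > 0] / hist.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
# White noise: every pixel independent, no exploitable structure.
noise = rng.integers(0, 256, size=(64, 64))
# Locally correlated "image": each row is a small-step random walk.
smooth = np.cumsum(rng.integers(-2, 3, size=(64, 64)), axis=1) + 128

for img in (noise, smooth):
    raw = entropy(img.ravel())                      # code pixels directly
    diff = entropy(np.diff(img, axis=1).ravel())    # code horizontal differences
    print(round(raw, 2), round(diff, 2))
```

The difference image of the correlated picture is far cheaper to code than the raw pixels, while for noise the transformation buys nothing: exactly the gap between channel entropy and signal entropy described above.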
A function and its Fourier transform are related by the "uncertainty inequality" of Heisenberg. The autocorrelation and spectral power density of a function satisfy a similar relation, which can be interpreted as a duality between noise and aliasing. This duality can be applied to stochastic collections of image signals that are discretely sampled, whether by biological retinas or by pixelated computer displays. It follows that alias artifacts due to sampling an image at less than the Nyquist frequency can be diminished or entirely avoided at the expense of noise introduced by the sampling process. We suggest that the human daylight vision system employs this strategy, in part to eliminate inconsistencies in pattern recognition between the foveal high-spatial-resolution subsystem and the extra-foveal low-spatial-resolution subsystem. Implications of these ideas for the design of robotic and other machine imaging and display systems under limited bandwidth are discussed.
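The noise-for-aliasing trade can be demonstrated numerically; the sinusoidal signal, the uniform jitter model, and the peak measure below are our illustrative assumptions, not the paper's analysis. Regular sampling of a tone above the Nyquist frequency produces a coherent low-frequency alias, while randomly jittering the sample positions spreads that energy into broadband noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = np.arange(256)
fs, f_sig = 8.0, 7.0          # the signal lies above the Nyquist rate fs/2

regular = np.cos(2 * np.pi * f_sig * n / fs)
jittered = np.cos(2 * np.pi * f_sig * (n + rng.uniform(-0.5, 0.5, n.size)) / fs)

def alias_peak(x):
    """Spectral magnitude at the expected 1 Hz alias bin,
    as a fraction of the total spectral magnitude."""
    spec = np.abs(np.fft.rfft(x))
    k = int(round(1.0 * x.size / fs))   # DFT bin of the 1 Hz alias
    return spec[k] / spec.sum()

print(alias_peak(regular), alias_peak(jittered))  # coherent alias vs. spread noise
```

With regular sampling, essentially all of the energy lands in the spurious 1 Hz alias; with jitter, the alias peak collapses and the energy reappears as a raised noise floor, which is exactly the duality being exploited.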
We summarize our recent work on a) a model-based approach for estimating the kinematics and structure of a rigid object from a sequence of noisy images, b) extraction of edges and linear features from noise-free and noisy real images using directional derivatives estimated from a local nonstationary random field model, and c) segmentation of textured images using Gauss-Markov random field models and simulated annealing techniques. The research summarized in this paper is supported in part by NSF Grant DCI-84-51010, with matching funds from IBM, AT&T, Hughes Aircraft Company, and TRW, and by the Air Force Office of Scientific Research under Contract F49620-85-C-0071. The author acknowledges the contributions of T.J. Broida, S. Chatterjee, A. Rangarajan, T. Simchony, V. Venkateswar and Y.T. Zhou to the research results reported here.
The overall objective of the Strategic Computing (SC) Program of the Defense Advanced Research Projects Agency (DARPA) is to develop and demonstrate a new generation of machine intelligence technology which can form the basis for more capable military systems in the future and also maintain a position of world leadership for the US in computer technology. Begun in 1983, SC represents a focused research strategy for accelerating the evolution of new technology and its rapid prototyping in realistic military contexts. Among the very ambitious demonstration prototypes being developed within the SC Program are: 1) the Pilot's Associate, which will aid the pilot in route planning, aerial target prioritization, evasion of missile threats, and aircraft emergency safety procedures during flight; 2) the first of two battle management projects, for the Army and just getting started: the AirLand Battle Management program (ALBM), which will use knowledge-based systems technology to assist in the generation and evaluation of tactical options and plans at the Corps level; 3) the other, more established battle management program, for the Navy: the Fleet Command Center Battle Management Program (FCCBMP) at Pearl Harbor. The FCCBMP is employing knowledge-based systems and natural language technology in an evolutionary testbed situated in an operational command center to demonstrate and evaluate intelligent decision aids which can assist in the evaluation of fleet readiness and explore alternatives during contingencies; and 4) the Autonomous Land Vehicle (ALV), which integrates in a major robotic testbed the technologies for dynamic image understanding and knowledge-based route planning with replanning during execution, hosted on new advanced parallel architectures.
The goal of the Strategic Computing computer vision technology base (SCVision) is to develop generic technology that will enable the construction of complete, robust, high performance image understanding systems to support a wide range of DoD applications. Possible applications include autonomous vehicle navigation, photointerpretation, smart weapons, and robotic manipulation. This paper provides an overview of the technical and program management plans being used in evolving this critical national technology.
This paper presents an overview of the DARPA sponsored SCORPIUS program. The SCORPIUS program is a research effort whose goal is to combine mature technologies from DARPA's Image Understanding and Computer Architecture research areas in a demonstration of a real world application, automated exploitation of aerial imagery. The overall vision system under development is described. The system architecture on which the vision system is being implemented is discussed. This system is utilizing two machines developed under the Strategic Computing Initiative, the WARP systolic array and the Butterfly multiprocessor.
Automated information extraction from aerial imagery has proven to be a very difficult problem. Two decades of statistical pattern recognition research have indicated the need for more robust approaches. Image understanding, which integrates low level pattern recognition symbolizers and knowledge based artificial intelligence techniques, is the favored research approach for the future. Presently, however, the bulk of the image understanding efforts have not exploited the knowledge based aspects of information extraction from aerial imagery, but rather have been stymied in efforts devoted to pixel to symbol transformations that treat most landscape patterns as background noise. This paper outlines the physical structuring of landscapes which is believed to be a necessary ingredient for knowledge based aerial image understanding. A successful approach for manual interpretation of terrain information from aerial imagery is outlined. Two examples of computer terrain analysis that utilize terrain structural information are briefly introduced and sources of information on the physical structure of landscapes and digital terrain data are presented.
A modified version of the split-and-merge algorithm is described. During the merging and splitting phases, an F test is used to test the uniformity of the four quads. During the region formation phase, a compound predicate consisting of the F test and the mean is used. Results are presented for several images.
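The splitting phase with an F-test uniformity predicate might look like the following sketch. The one-way ANOVA form of the test and the critical value are our assumptions (the paper's exact predicate may differ), and the merge and region-formation phases, which apply the same test across neighboring regions, are omitted:

```python
import numpy as np

F_CRIT = 5.0   # illustrative critical value, not the paper's threshold

def f_statistic(quads):
    """One-way ANOVA F ratio over the four quads: between-quad variance
    divided by within-quad variance, used as the uniformity test."""
    means = np.array([q.mean() for q in quads])
    sizes = np.array([q.size for q in quads])
    grand = np.concatenate([q.ravel() for q in quads]).mean()
    between = (sizes * (means - grand) ** 2).sum() / (len(quads) - 1)
    within = sum(((q - m) ** 2).sum() for q, m in zip(quads, means))
    within /= max(sizes.sum() - len(quads), 1)
    return between / max(within, 1e-12)

def split(block, leaves, min_size=2):
    """Splitting phase only: recursively quarter any block whose four
    quads fail the uniformity test."""
    h, w = block.shape
    quads = [block[:h//2, :w//2], block[:h//2, w//2:],
             block[h//2:, :w//2], block[h//2:, w//2:]]
    if h <= min_size or w <= min_size or f_statistic(quads) < F_CRIT:
        leaves.append(block)               # accepted as a uniform region
    else:
        for q in quads:
            split(q, leaves, min_size)
    return leaves

img = np.zeros((8, 8)); img[:, 4:] = 100.0       # two flat halves
print(len(split(np.ones((8, 8)), [])),           # uniform image: one block
      len(split(img, [])))                       # two-tone image: four quads
```

A proper implementation would take the critical value from the F distribution at a chosen significance level given the quad sizes, rather than a fixed constant.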
This paper presents a concept for automated model-based image registration. The overall approach relies on computing a correspondence between a three-dimensional (3-D) data base and a reconnaissance image in terms of an appropriate sensor model as a function of such parameters as sensor location, orientation, scale, etc. The construction of such models is illustrated for frame camera and SAR sensors. Initial (platform ephemeris) parameter estimates are refined to achieve accurate correspondence by techniques which optimally match a collection of 3-D lineal features to an edge set extracted from the image. In the case of 3-D data bases consisting of stereo imagery (such as PPDBs), 3-D lineal features are automatically generated by applying the Marr-Poggio-Grimson computational models for stereo vision. The computed 3-D data-base-to-image correspondence can be used to predict accurately the image location of any 3-D point and to develop an elevation surface model associated with the image. This also leads to establishing automated model-based image-to-image registration.
Smoothing splines have been used in machine vision to reconstruct visible surfaces of objects in the scene from depth data. While they remove noise from various sources, they exhibit poor performance along edges and boundaries. To cope with such anomalies, we study a more general class of smoothing splines, which preserve corners and discontinuities. Cubic splines are investigated in detail since they are easy to implement and provide satisfactory results for most applications. In particular they produce smooth curves near all data points except those marked as discontinuities or creases. We also introduce a discrete regularization method which is used to locate corners and discontinuities in the data points before the continuous regularization is applied.
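The idea of dropping the smoothness penalty at marked discontinuities can be sketched with a simple 1-D quadratic regularizer; everything here is an illustrative stand-in of ours for the paper's cubic splines and its discrete corner-location step:

```python
import numpy as np

def smooth_with_breaks(data, breaks=(), lam=10.0):
    """Discrete 1-D regularization: minimize
         sum_i (u_i - d_i)^2 + lam * sum_i w_i (u_{i+1} - u_i)^2,
    where w_b = 0 for each marked break b (the edge between samples
    b and b+1), so smoothing never bridges a discontinuity."""
    d = np.asarray(data, dtype=float)
    n = d.size
    w = np.ones(n - 1)
    for b in breaks:
        w[b] = 0.0
    A = np.eye(n)                  # normal equations: (I + lam*L) u = d
    for i in range(n - 1):
        A[i, i] += lam * w[i]
        A[i + 1, i + 1] += lam * w[i]
        A[i, i + 1] -= lam * w[i]
        A[i + 1, i] -= lam * w[i]
    return np.linalg.solve(A, d)

step = [0, 0, 0, 0, 10, 10, 10, 10]
blurred = smooth_with_breaks(step)             # rounds off the jump
sharp = smooth_with_breaks(step, breaks=[3])   # jump preserved exactly
```

Without the break the reconstruction smears the step across several samples; with it, the two flat segments are recovered exactly, which is the behavior the discontinuity-preserving splines aim for along edges in depth data.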
Image processing places enormous computational demands on current computer systems and in part motivates the design of fine-grained, massively parallel machines. While several architectures have been proposed for image applications, the approach taken with the Connection Machine has been to use a very large number of simple processors, each with a small amount of local memory, and each capable of communicating in an efficient manner with all other processors. In this report, we give our experience with the Connection Machine on a number of computer vision and vision related problems.
The parallel processor embodied in the GAPP device was originally developed by Martin Marietta for military target recognition applications and is commercially available in integrated circuit form from NCR. The device represents the most powerful hardware technology available for extracting the size and shape information necessary to describe objects in an image. The focus of this paper is how to develop a system solution to an image understanding problem using GAPP devices. The paper discusses the basic system architecture issues, as well as hardware/software tradeoffs and development cycle considerations.
This paper will describe PIPE and how its capabilities can be used for applications involving moving, dynamic imagery. PIPE is a high performance parallel processor with an architecture specifically designed for the processing of video images at up to 60 fields per second. The unit is modular and programmable. It can process sequences of images with multiple parallel stages. Multiple data pathways between the stages in forward, recursive and backward directions allow images to interact in many useful ways. Due to its architecture, PIPE inherently allows the processing of many images simultaneously for working with dynamic scenes or multiple combinations of the same image. Originally designed for robot guidance applications, PIPE has many features that make it well suited for use in other applications where the input images are moving. Illustrations of three types of dynamic image processing examples will be presented in this paper.
This paper describes an interactive programming environment and tools designed to facilitate the rapid implementation, testing and evaluation of algorithms and systems for image processing, image understanding, and 2- and 3-D graphics processing. The environment, termed Scope, is Lisp-based, resides on a Symbolics 36xx Lisp machine, and provides a tightly-coupled interface between the Symbolics Lisp machine and a Pixar 2D Image Computer. In particular, the environment provides an integrated set of utilities for program development and program maintenance based on the Symbolics Genera operating system. In addition, a wide range of near-real-time image and symbolic operations are provided, and a variety of image and symbolic representations are supported. The environment is specifically designed to facilitate crosstalk between numeric and symbolic data representations and processes. This paper discusses the major features of the environment and their use in developing and investigating selected image understanding capabilities.
This project examines some parallel architectures designed for image processing, and then addresses their applicability to the problem of image segmentation by texture analysis. Using this information, and research into the structure of the human visual system, an architecture for textural segmentation is proposed. The underlying premise is that textural segmentation can be achieved by recognizing local differences in texture elements (texels). This approach differs from most of the previous work where the differences in global, second-order statistics of the image points are used as the basis for segmentation. A realistic implementation of this approach requires a parallel computing architecture which consists of a hierarchy of functionally different nodes. First, simple features are extracted from the image. Second, these simple features are linked together to form more complex texels. Finally, local and more global differences in texels or their organization are enhanced and linked into boundaries.
Progress in the field of image understanding is partly hindered by the gap between numerical and symbolic processing, both of which are required. Image processing algorithms are best expressed by procedural programming languages, such as C or FORTRAN, while "understanding" is perhaps better addressed by functional languages, such as LISP. Image understanding involves both processing and analysis of images, thus including numeric and symbolic computation. Hence the need for both programming paradigms. This paper describes the development of a new vision workstation - a tool that combines numeric (in C) and symbolic (in LISP) programs in one package. The system described is based on an IBM Personal Computer AT. It provides a low-cost solution to computer vision educational needs. A more powerful implementation, based on an IBM RT/PC, is currently under development.
This study addresses the question of whether static and dynamic stereopsis require the perception of form. The retinal image requirements of the visual mechanisms subserving form perception and stereopsis are not only distinct but potentially antagonistic. Form perception requires the retinal image to have luminance gradients that are steep enough to produce suprathreshold temporal transients in the receptors during normal eye movements. Stereopsis, on the other hand, requires identification of corresponding luminance gradients in the two retinal images so that their retinal disparity can be calculated. Thus, while the motion of the retinal image caused by normal eye movements is essential to form perception, it may be detrimental to stereopsis. We eliminated the motion of the retinal image that would normally have occurred during eye movements by using a pair of SRI dual-Purkinje-image eyetrackers and stimulus deflectors to stabilize the retinal images of selected form elements. We examined the thresholds for perceiving motion in depth under stabilized and unstabilized conditions and found that the perception of motion in depth continues in the absence of monocular form perception. Likewise, when we stabilized the disparate images of a line stereogram, stereopsis persisted in the absence of form perception of those elements whose retinal disparity determined their perceived depth. These results imply profound separation of the form-perception and stereopsis mechanisms.
The interpretation of aerial photographs requires extensive knowledge about the scene under consideration. Knowledge about the type of scene (airport, suburban housing development, urban city) aids low-level and intermediate-level image analysis, and will drive high-level interpretation by constraining the search for plausible, consistent scene models. Collecting and representing large knowledge bases requires specialized tools. In this paper we describe the organization of a set of tools for interactive knowledge acquisition of scene primitives and spatial constraints for the interpretation of aerial imagery. These tools include a user interface for interactive knowledge acquisition, the automated compilation of that knowledge from a schema-based representation into productions that are directly executable by our interpretation system, and a performance-analysis tool that generates a critique of the final interpretation. Finally, the generality of these tools is demonstrated by the generation of rules for a new task, suburban house scenes, and the analysis of a set of imagery by our interpretation system.
A shared representation is crucial to effective, natural man-machine interaction. Unfortunately, little is known about people's mental representation of shape; this prevents the construction of truly effective computer-supported 3-D design media. How do people represent shape? We observe that design is typically an iterative process, starting with sketching and proceeding through detailing; thus several different representations are required to support the design process. We investigate people's naive, preattentive representation of shape to understand what sort of representation would best support the initial sketching stage of design. We then use our conclusions to build and evaluate a multi-representation 3-D CAD tool, and demonstrate enhanced performance.
A program in computer-assisted photo interpretation research (CAPIR) was initiated at the U.S. Army Engineer Topographic Laboratories (USAETL) in 1979. The primary objective was to develop and implement concepts to increase the productivity of the human photo interpreter tasked with extracting terrain data from aerial imagery. Early efforts focused on implementation of interactive software for direct capture of digital map data from stereo mapping and reconnaissance photography. The development of a hardcopy stereo workstation with integral computer graphic superposition has been essential to provide a suitable interactive, closed-loop environment for the photo interpreter. This paper provides a brief introduction to the problem domain of terrain analysis and outlines the evolution of the CAPIR program emphasizing the special requirements for an effective man-machine interface. The application of current CAPIR technology to typical terrain data capture problems is discussed with selected examples of technology transfer and examples of new commercial photogrammetric instrumentation incorporating graphic superposition capabilities. Opportunities to extend CAPIR concepts to softcopy terrain analysis based on adaptation of digital image processing technology are then presented. Digital image source materials and potential softcopy processing advantages are discussed and key requirements for effective softcopy terrain analysis are summarized.
An image analysis workstation is described which supports the object-oriented representation and manipulation of grayscale sensed image objects. Segmented grayscale images are transformed automatically into a novel list-oriented representation. Image objects resemble jigsaw puzzle pieces which fit together to form a grayscale image. An image object contains information about its appearance and methods for its manipulation. Image objects are defined within the Flavors object-oriented language and are implemented on a Symbolics LISP machine. Applications include multi-sensor image fusion, data fusion, automated cartography, automated reconnaissance, and image understanding system development.
The rapid increase in the volume of digital imagery data has created the need for automated image analysis systems. The following paper suggests the design of such a system, and it also discusses the role of the imagery analyst in the development and maintenance of the system. Digital mapping, charting, and geodesy (MC&G) data are combined with the digital imagery data to aid in change detection tasks. The imagery analyst remains the key component in an automated system, but is relieved from performing repetitive, often trivial tasks that can best be performed by the system.