In its evolution, the primate visual system has developed impressive capabilities for recognizing complex patterns in natural images. This process involves many stages of analysis and a variety of information processing strategies. This paper concentrates on the importance of 'information bottlenecks,' which restrict the amount of information that can be handled at different stages of analysis. These steps are crucial for reducing the overwhelming computational complexity associated with recognizing countless objects from arbitrary viewing angles, distances, and perspectives. The process of directed visual attention is an especially important information bottleneck because of its flexibility in determining how information is routed to high-level pattern recognition centers.
The most significant property of these networks is their differential response to stimuli moving in opposite directions. A quantitative analysis shows that this directional response adapts to mean luminance levels and varies with size and speed of moving objects, as well as with coupling order among elements of a network. Both biophysical and analog hardware implementations of this class of networks are given here. Implementation of unidirectional coupling and the response to directional edges are demonstrated and shown to accord well with that of the neural network.
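The direction-selective behavior described above can be illustrated with a minimal discrete-time sketch in which each element is inhibited by the delayed output of its left neighbor; the unidirectional coupling form, weight, and rectification are illustrative assumptions, not the paper's biophysical model.

```python
import numpy as np

def ds_network(frames, w=0.9):
    """Unidirectional coupling sketch: each cell is inhibited by its left
    neighbor's activity on the previous frame, so motion in the null
    (rightward) direction is suppressed while motion in the preferred
    (leftward) direction is not. Coupling form and weight are assumed."""
    prev = np.zeros(frames.shape[1])
    total = 0.0
    for f in frames:
        inhibition = np.concatenate([[0.0], prev[:-1]])  # delayed, from the left
        resp = np.maximum(f - w * inhibition, 0.0)        # rectified response
        total += resp.sum()
        prev = resp
    return total
```

Driving the array with an impulse sweeping left versus right yields a larger summed response in the preferred direction, the differential response the abstract refers to.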
Psychophysical studies provide evidence of preattentive visual processing characterized by parallel operations performed on a limited set of features. Since these operations extend well beyond the foveal or high-resolution area of the visual field, one may assume that they are based on lower-resolution features. Such parallelism and data reduction imply computationally efficient processing that could be emulated for machine vision pattern recognition purposes. Several models of preattentive texture segmentation have recently been presented in the computational vision literature. This paper presents a model of human preattentive visual detection of pattern anomalies. Operating on a low-frequency, band-pass filtered image, the model detects singularities by comparing local to global statistics of contrast and edge orientation. The model has been applied to simple schematic images. It successfully predicts the asymmetry in search latencies whereby a target characterized by a preattentively detectable feature 'pops out' of a field of distractors not containing the feature, but when target and distractors are switched, serial search is required to locate the 'odd man out.' The model has also been shown to detect pattern defects on periodic, multilevel integrated circuits.
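The local-versus-global comparison at the heart of the model can be sketched numerically as follows; the window size, the use of intensity standard deviation as the contrast statistic, and the outlier threshold are illustrative assumptions, not the model's exact parameters.

```python
import numpy as np

def detect_anomalies(image, win=8, z_thresh=3.0):
    """Flag windows whose local contrast is an outlier relative to the
    global statistics of all windows -- a simplified sketch of the
    local-to-global comparison described above."""
    h, w = image.shape
    # Local contrast statistic: intensity standard deviation per window.
    local = np.array([
        [image[i:i + win, j:j + win].std()
         for j in range(0, w - win + 1, win)]
        for i in range(0, h - win + 1, win)
    ])
    # Global statistics over all windows.
    mu, sigma = local.mean(), local.std()
    # A window "pops out" when its local statistic is a global outlier.
    return np.abs(local - mu) > z_thresh * (sigma + 1e-12)

# A uniform field with one textured patch: only that patch pops out.
img = np.zeros((32, 32))
img[8:16, 8:16] = np.indices((8, 8)).sum(0) % 2  # checkerboard target
mask = detect_anomalies(img)
```

The same mechanism predicts the search asymmetry: a feature-bearing target among featureless distractors is an outlier of the global statistics, while the reversed display is not.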
The central problem faced by the retina is to reliably encode small local differences in image intensity over a several-decade range of background illumination. The distal layers of the retina adjust the transducing elements to make this encoding possible. Several generations of silicon retinae that integrate phototransducers and CMOS processing elements in the focal plane are modeled after the distal layers of the vertebrate retina. A silicon retina with an adaptive photoreceptor that responds with high gain to small spatial and temporal variations in light intensity is described. Comparison with a spatial and temporal average of receptor response extends the dynamic range of the receptor. Continuous, slow adaptation centers the operating point of the photoreceptor around its time-average intensity and compensates for static transistor mismatch.
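The adaptation scheme described above, high gain applied to deviations from a slowly adapting average, can be sketched as follows; the log-domain formulation, time constant, and gain are illustrative assumptions rather than the circuit's actual parameters.

```python
import numpy as np

def adaptive_receptor(log_intensity, tau=0.95, gain=10.0):
    """Sketch of the adaptive photoreceptor: the output reports the
    deviation of the instantaneous (log) intensity from a slowly
    adapting running average, so small contrasts get a high-gain
    response at any background level. tau and gain are assumed."""
    avg = log_intensity[0]
    out = []
    for x in log_intensity:
        out.append(gain * (x - avg))      # high gain on deviations only
        avg = tau * avg + (1 - tau) * x   # slow adaptation re-centers the operating point
    return np.array(out)
```

Because only the deviation is amplified, the same 1% contrast step produces the same response at backgrounds differing by several decades, which is how the dynamic range is extended.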
The authors have designed and tested a one-dimensional 64 pixel, analog CMOS VLSI chip which localizes intensity edges in real-time. This device exploits on-chip photoreceptors and the natural filtering properties of resistive networks to implement a scheme similar to and motivated by the Difference of Gaussians (DOG) operator proposed by Marr and Hildreth (1980). The chip computes the zero-crossings associated with the difference of two exponential weighting functions and reports only those zero-crossings at which the derivative is above an adjustable threshold. A real-time motion detection system based on the zero-crossing chip and a conventional microprocessor provides linear velocity output over two orders of magnitude of light intensity and target velocity.
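The chip's scheme can be sketched in one dimension: smooth the input with two exponential weighting functions of different space constants, take their difference, and report only zero-crossings with sufficient slope. The space constants and threshold below are illustrative, not the chip's settings.

```python
import numpy as np

def dog_zero_crossings(signal, s1=1.0, s2=3.0, thresh=0.05):
    """Thresholded zero-crossing detection on the difference of two
    exponential smoothings, in the spirit of the chip's scheme.
    Space constants s1 < s2 and thresh are illustrative."""
    x = np.arange(-10, 11)
    k1 = np.exp(-np.abs(x) / s1); k1 /= k1.sum()   # narrow smoothing
    k2 = np.exp(-np.abs(x) / s2); k2 /= k2.sum()   # wide smoothing
    padded = np.pad(signal, 10, mode='edge')        # avoid fake border edges
    d = np.convolve(padded, k1 - k2, mode='same')[10:-10]
    # Report only zero-crossings whose local slope exceeds the threshold.
    return [i for i in range(len(d) - 1)
            if d[i] * d[i + 1] < 0 and abs(d[i + 1] - d[i]) > thresh]

# A single step edge produces exactly one reported zero-crossing.
step = np.concatenate([np.zeros(32), np.ones(32)])
edges = dog_zero_crossings(step)
```

The slope threshold is what suppresses the shallow zero-crossings produced by noise while keeping those produced by genuine intensity edges.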
Novel use of an analog motion detection circuit is presented. The circuit, developed by Tanner and Mead, computes motion by dividing the time derivative of intensity by its spatial derivative; the four-quadrant division is realized with a multiplier within a negative feedback loop. The authors have opened the loop and characterized the circuit as a multiplication-based motion detector, in which the output is the product of the temporal and spatial derivatives of intensity, for various light levels and various moving patterns. An application to the time-to-contact computation is presented.
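The gradient scheme the circuit implements in its closed-loop form can be sketched numerically; the discretization, the validity mask on the spatial gradient, and the guard term are illustrative assumptions.

```python
import numpy as np

def gradient_velocity(frame0, frame1, dt=1.0, eps=1e-9):
    """Gradient-based motion estimate for a 1-D signal:
    v = -(dI/dt) / (dI/dx), evaluated only where the spatial
    gradient is significant. A numerical sketch of the scheme;
    eps guards the division."""
    dIdt = (frame1 - frame0) / dt
    dIdx = np.gradient((frame0 + frame1) / 2.0)
    valid = np.abs(dIdx) > 1e-3
    v = np.zeros_like(dIdt)
    v[valid] = -dIdt[valid] / (dIdx[valid] + eps)
    return v, valid

# A sinusoidal pattern shifted by 0.5 pixels per frame.
x = np.arange(64, dtype=float)
f0 = np.sin(2 * np.pi * x / 16)
f1 = np.sin(2 * np.pi * (x - 0.5) / 16)
v, valid = gradient_velocity(f0, f1)
```

The open-loop characterization in the paper corresponds to keeping only the numerator-times-denominator product; the division above is what the closed feedback loop restores.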
A custom integrated circuit that performs imaging and feature extraction is under development as a component of an automatic object recognition system. Images are focussed directly onto the custom chip, which contains a photosensor array. On the same chip are structures to scan images out of the array, multiple lines at a time, in such a way as to present the image values of a small window of the array to a set of analog neural computational elements which are also on the chip. Each neuron sees the same image window as the others, but is controlled by an individual set of synaptic weights. Overlapping windows are sequentially scanned into the neurons, which develop a corresponding sequential set of outputs. These outputs represent a set of scanned feature maps, where the set of features is defined by the set of neural weights. Possible extracted features include oriented edges, lines, and points. The feature extraction computation is only the first step of a multi-level object recognition system, but the first stage requires large computational bandwidth. A 512 X 512 pixel imager with 16 neurons, each of which simultaneously looks at a 5 X 5 pixel window, is planned. The resulting data rate for 60 frames/sec is 6 billion multiplies and additions per second. This computation, and the imaging, will all be performed by a single chip. The authors have designed a simplified proof-of-concept chip that contains a 3 X 3 pixel imager hard-wired to a single neuron. The chip, in a simple imaging test fixture, demonstrates clearly the detection of simple features and illustrates the feasibility of combining neural processing circuitry on-chip with an imaging array.
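The stated throughput follows directly from the architecture, and the scanned feature maps amount to a per-neuron correlation of its weights with every image window. The small reference implementation below is a sketch of that computation, not the chip's circuitry.

```python
import numpy as np

# Throughput arithmetic for the planned imager described above:
# 16 neurons, each forming a 5 x 5 weighted sum at every window
# position of a 512 x 512 image, at 60 frames per second.
neurons, win, side, fps = 16, 5, 512, 60
ops = neurons * win * win * side * side * fps
# = 6,291,456,000 multiply-adds per second, roughly the 6 billion quoted
# (window positions are approximated here by the full pixel count).

def scanned_feature_maps(image, weights):
    """Each neuron applies its own k x k weight set to every window;
    the outputs form one feature map per neuron (a correlation)."""
    h, w = image.shape
    n, k, _ = weights.shape
    out = np.zeros((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k]
            out[:, i, j] = (weights * patch).sum(axis=(1, 2))
    return out
```

Oriented-edge, line, or point detectors then correspond simply to different rows of the weight array.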
While charge-coupled devices (CCDs) are used extensively in many image sensor applications, they are also capable of performing many basic signal processing functions. This paper presents some analog CCD circuitry for implementing an edge detection algorithm and a boundary-preserving image filter. In particular, the parallel, pipelined architecture is introduced as an efficient architecture for integrating CCD imagers with analog CCD processors for high-throughput image processing applications.
A circuit which emulates the functioning of cone photoreceptors in the vertebrate retina has been designed and tested. Cone photoreceptors exhibit a local adaptation to background illumination over many orders of magnitude while retaining a high degree of instantaneous contrast sensitivity. This behavior permits visual discrimination of objects against difficult lighting situations such as bright backgrounds. This effort includes an examination of the trade-offs in various photodetection techniques available to the designer in silicon. Photodetection is followed by separate filter and gain stages which provide the appropriate temporal behavior. The filters and gain are independently tunable to permit extensions in operation to those environments which may fall outside the capability of human vision. The circuit also includes a UV writable floating gate which uses a locally generated error signal to provide cancellation of circuit offsets due to process variability.
In the last ten years, significant progress has been made in understanding the first steps in visual processing. Thus, a large number of algorithms exist that locate edges, compute disparities, estimate motion fields and find discontinuities in depth, motion, color and intensity. However, the application of these algorithms to real-life vision problems has been less successful, mainly because the associated computational cost prevents real-time machine vision implementations on anything but large-scale expensive digital computers. We here review the use of analog, special-purpose vision hardware, integrating image acquisition with early vision algorithms on a single VLSI chip. Such circuits have been designed and successfully tested for edge detection, surface interpolation, computing optical flow and sensor fusion. Thus, it appears that real-time, small, power-lean and robust analog computers are making a limited comeback in the form of highly dedicated, smart vision chips.
Parameter studies for Markov random field (MRF) models of early vision performed using a DAP array processor are discussed. A simple two-dimensional formulation for MRFs with coupled line processes on a rectangular grid is analyzed. Numerous experimental results are presented for real and synthetic intensity images. We empirically analyze the interrelationships between parameters controlling the degree of smoothness in the solution, the discontinuity threshold, and the importance of input data. Trends are identified, especially with regard to methods for automatically setting parameters. In addition, a comparison of results across different resolutions is made.
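The interrelated parameters can be made concrete with a one-dimensional weak-membrane energy with explicit line processes; this functional form is a standard sketch of such coupled MRF energies, with lam, alpha, and beta standing in for the smoothness, discontinuity-threshold, and data-importance parameters discussed above.

```python
import numpy as np

def mrf_energy(f, lines, data, lam=1.0, alpha=0.5, beta=1.0):
    """1-D weak-membrane energy with explicit line processes:
    smoothness is paid only where no line is asserted, each asserted
    line costs alpha, and beta weights fidelity to the input data."""
    smooth = lam * np.sum((np.diff(f) ** 2) * (1 - lines))
    penalty = alpha * np.sum(lines)
    fidelity = beta * np.sum((f - data) ** 2)
    return smooth + penalty + fidelity
```

A line element lowers the energy exactly where lam times the squared intensity step exceeds alpha, which is why the ratio alpha/lam acts as the discontinuity threshold whose interaction with the other parameters the study maps out.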
Pixel-level image processing algorithms have to work with noisy sensor data to extract spatial features. This often requires the use of operators which amplify high-frequency noise. One method of dealing with this problem is to perform image smoothing prior to any use of spatial differentiation. Such spatial smoothing results in the spread of object characteristics beyond the object boundaries. Identification of discontinuities and their explicit use as boundaries for smoothing has been proposed as a technique to overcome this problem. This approach has been used to perform cooperative computations between multiple descriptions of the scene, e.g., fusion of edge and motion fields for a given scene. Here the approach is extended to multisensor systems: the discontinuities detected in the output of one sensor are used to define regions of smoothing for a second sensor. For example, the depth discontinuities present in laser radar can be used to define smoothing boundaries for infrared focal plane arrays. The authors have recently developed a CMOS chip (28 X 36) which performs this task in real time. The chip consists of a resistive network and elements that can be switched ON or OFF by loading a suitable bit pattern. The bit pattern for the control of switches can be generated from the discontinuities found in the output of sensor #1. The output of sensor #2 is applied to the resistive network for data smoothing. If all the switches are held in the conducting state, the chip performs ordinary data smoothing; if switches along object boundaries are turned OFF, a region of bounded smoothing is created. In this chip, information from a third sensor (e.g., intensity data from laser radar) can be incorporated in the form of a map of 'confidence in data.' The results obtained with this chip on synthetic data, and other potential applications of the chip, are described.
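The switched resistive network can be sketched as a discrete diffusion in which neighboring nodes are coupled only where the switch bit is ON; the relaxation scheme and coupling constant below are illustrative, not the chip's continuous-time dynamics.

```python
import numpy as np

def bounded_smoothing(data, switches, iters=200, g=0.25):
    """Discrete analogue of the switched resistive network: current
    flows between neighbors only through ON switches, so smoothing
    stops at discontinuities supplied by the other sensor."""
    f = data.astype(float).copy()
    for _ in range(iters):
        flux = np.zeros_like(f)
        d = np.diff(f) * switches          # coupling only where switch is ON
        flux[:-1] += g * d
        flux[1:] -= g * d
        f += flux
    return f

# Two regions separated by an OFF switch stay distinct: each relaxes
# toward its own mean instead of blurring across the boundary.
noisy = np.array([1.0, 1.2, 0.8, 5.1, 4.9, 5.0])
switches = np.array([1, 1, 0, 1, 1])       # OFF at the region boundary
smooth = bounded_smoothing(noisy, switches)
```

With all switches ON the network reduces to ordinary smoothing, exactly as described for the chip.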
Segmentation is a basic problem in computer vision. The tiny-tanh network, a continuous-time network that segments scenes based upon intensity, motion, or depth, is introduced. The tiny-tanh algorithm maps naturally to analog circuitry, since it was inspired by previous experiments with analog VLSI segmentation hardware. A convex Lyapunov energy is utilized so that the system does not get stuck in local minima. No annealing algorithms of any kind are necessary -- a sharp contrast to previous software and hardware solutions of this problem.
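Why convexity removes the need for annealing can be illustrated with a toy gradient flow: every initial state descends to the same minimum. The quadratic energy below is only a stand-in for the tiny-tanh network's actual energy, which is not reproduced here.

```python
import numpy as np

def descend(energy_grad, f0, steps=2000, lr=0.01):
    """Gradient flow on a convex energy: any starting state reaches
    the same (unique) minimum, so no annealing schedule is needed."""
    f = f0.astype(float).copy()
    for _ in range(steps):
        f -= lr * energy_grad(f)
    return f

# Stand-in convex energy: data fidelity plus quadratic smoothness.
data = np.array([0.0, 0.1, 1.0, 0.9])
grad = lambda f: 2 * (f - data) + 0.5 * np.concatenate([
    [f[0] - f[1]],
    2 * f[1:-1] - f[:-2] - f[2:],
    [f[-1] - f[-2]],
])
a = descend(grad, np.zeros(4))
b = descend(grad, 10.0 * np.ones(4))
```

Two very different initial states, a and b, converge to the same solution; with a non-convex energy this is exactly what fails and what annealing was previously used to repair.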
Design of a high-speed stereo vision system in analog VLSI technology is reported. The goal is to determine how the advantages of analog VLSI -- small area, high speed, and low power -- can be exploited, and how the effects of its principal disadvantages -- limited accuracy, inflexibility, and lack of storage capacity -- can be minimized. Three stereo algorithms are considered, and a simulation study is presented to examine details of the interaction between algorithm and analog VLSI implementation. The Marr-Poggio-Drumheller algorithm is shown to be best suited for analog VLSI implementation. A CCD/CMOS stereo system implementation is proposed, capable of operation at 6000 image frame pairs per second for 48 X 48 images, and faster than frame rate operation on 256 X 256 binocular image pairs.
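A quick consistency check on the two stated rates, assuming throughput scales inversely with pixel count (an assumption for illustration, not a claim from the paper):

```python
# If the system processes 6000 pairs/s at 48 x 48, the implied rate
# at 256 x 256 under inverse pixel-count scaling is:
rate_small = 6000
implied = rate_small * (48 * 48) / (256 * 256)
# about 211 pairs/s, comfortably above standard 30-60 Hz video rates,
# consistent with the "faster than frame rate" claim.
```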
The authors describe the negative fuse, an analog model which encourages boundary completion in early vision regularization algorithms. This algorithm is an extension of the successful implementation of line processes in analog VLSI using the resistive fuse (Harris et al.). The negative fuse provides for true negative resistance regions for the enhancement of edges, making long connected edges more likely to occur. This model has a natural mapping into inexpensive, fast, low-power analog hardware. The authors discuss the performance of a negative fuse element fabricated in VLSI and show simulations of network performance on digitized camera images.
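One simple functional form with the properties described, ohmic for small voltage differences (smoothing), a collapsing current above a threshold (the resistive fuse), and a true negative-resistance region beyond it (the negative fuse, driving neighbors apart across strong edges), is sketched below. The form and constants are assumptions for illustration, not the fabricated element's measured characteristic.

```python
import numpy as np

def resistive_fuse(v, g=1.0, vth=1.0):
    """Smooth resistive-fuse I-V sketch: ohmic for small voltage
    differences, current collapsing toward zero above vth so that
    smoothing stops across large (edge) differences."""
    return g * v * np.exp(-(v / vth) ** 2)

def negative_fuse(v, g=1.0, vth=1.0, a=0.3):
    """Negative-fuse sketch: same small-signal behavior, but a true
    negative-resistance branch for large differences, which actively
    enhances edges rather than merely preserving them."""
    return resistive_fuse(v, g, vth) - a * v * (1 - np.exp(-(v / vth) ** 2))
```

Within a resistive network, the negative branch is what makes long connected edges energetically favorable, encouraging the boundary completion described above.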
A digital image correlation chip was developed as part of an electronic module library for real-time industrial machine vision applications. A core algorithm of the authors' systems is binary edge image correlation. The other electronic modules described in this paper include a monochrome edge detector and color edge detector. These modules were combined in a variety of ways for different applications. High efficiency, in terms of processing speed and hardware size, of the vision systems was confirmed in laboratory prototypes.
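The core operation named above reduces to counting coincident edge bits at each offset; a reference sketch (the chip pipelines this, but the arithmetic is the same popcount of ANDed bits):

```python
import numpy as np

def binary_edge_correlation(edge_image, template):
    """Binary edge correlation: at each offset, count edge pixels
    present in both the image window and the template."""
    H, W = edge_image.shape
    h, w = template.shape
    out = np.zeros((H - h + 1, W - w + 1), dtype=int)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(edge_image[i:i + h, j:j + w] & template)
    return out
```

The offset with the highest count locates the template in the scene, which is why the operation maps so cheaply onto counting hardware.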
Many applications require the ability to detect and track moving objects against moving backgrounds. If an object's signal is less than or comparable to the variations in the background, sophisticated techniques must be employed to detect the object. An analog retina model that adapts to the motion of the background in order to enhance objects moving with a velocity different than the background velocity is presented. A computer simulation that preserves the analog nature of this model and its application to real and simulated data are described. The concept of an analog 'Z' focal plane implementation is also presented.
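The enhancement principle can be sketched simply: compensate one frame by the background motion, then difference. Background pixels cancel while an object moving at a different velocity survives. The whole-pixel shift and the direct differencing are simplifying assumptions; the model itself adapts to the background motion in analog fashion.

```python
import numpy as np

def enhance_independent_motion(frame0, frame1, bg_shift):
    """Compensate frame1 by the (known or estimated) background shift,
    then difference: the moving background cancels, leaving only
    objects whose velocity differs from the background's."""
    compensated = np.roll(frame1, -bg_shift)
    return compensated - frame0
```

In the simulation below a stationary object on a background drifting 2 pixels per frame survives the differencing, while the background vanishes entirely.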
A CMOS VLSI chip that determines the position and orientation of an object is described. The chip operates in a continuous-time analog fashion, with a response time as short as 200 μs and power consumption under 50 mW. A self-contained phototransistor array acquires the image directly, and the output is a set of eight currents from which the position and orientation can be found. Orientation is determined to within 2% or better for moderately sized and sufficiently elongated objects. Chip dimensions are 7900 μm by 9200 μm.
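Position and orientation are classically recovered from a handful of global image moments, which is the kind of quantity a small set of summed output currents can carry; the sketch below shows that standard recovery, without reproducing the chip's mapping onto its eight currents.

```python
import numpy as np

def position_orientation(img):
    """Moment-based position and orientation: centroid from first
    moments, orientation of the principal axis from central second
    moments. A standard recovery, used here for illustration."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    m = img.sum()
    cx, cy = (x * img).sum() / m, (y * img).sum() / m
    mu20 = ((x - cx) ** 2 * img).sum() / m
    mu02 = ((y - cy) ** 2 * img).sum() / m
    mu11 = ((x - cx) * (y - cy) * img).sum() / m
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return (cx, cy), theta
```

Elongation matters for exactly the reason visible in the formula: as mu20 and mu02 approach each other the arctangent's argument degenerates and the orientation becomes ill-conditioned.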
The retina computes to let us see, but can we see the retina compute? Until now, the answer has been 'no' because the unconscious nature of the processing hides it from our view. Here we overcome the barrier of our closeness and describe what (to our knowledge) is the first method of seeing computations performed throughout the retina. This is achieved by using neurophysiological data to construct a model of the retina, and using a special purpose image processing computer (PIPE) to implement the model in real time. Processing in the model is organized into stages corresponding to computations performed by each retinal cell type. The final stage is the transient (change detecting) ganglion cell. A CCD camera forms the input image and the activity of a selected retinal cell type is the output which is displayed on a TV monitor. By changing the retinal cell type driving the monitor, the progressive transformations of the image by the retina can be observed. Our simulations demonstrate the ubiquitous presence of temporal and spatial variations in the patterns of activity generated by the retina which are fed into the brain. The dynamical aspects make these patterns very different from those generated by the common DOG (Difference of Gaussian) model of receptive field. Because the retina is so successful in biological vision systems, the processing we describe here may be useful in machine vision.
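The staging can be sketched as a pipeline in the spirit of the description above: horizontal cells form a local spatial average, bipolar cells signal center minus surround, and the transient ganglion stage signals frame-to-frame change. The filters themselves (a box surround, simple frame differencing) are simplifications, not the authors' fitted model.

```python
import numpy as np

def box_blur(img, r=1):
    """Crude horizontal-cell surround: local box average with
    edge-replicated borders."""
    P = np.pad(img, r, mode='edge')
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += P[r + dy: r + dy + h, r + dx: r + dx + w]
    return out / (2 * r + 1) ** 2

def retina_outputs(frames):
    """Stage-by-stage sketch: bipolar = center minus surround;
    transient ganglion = frame-to-frame change of the bipolar signal.
    Tapping either list corresponds to selecting which cell type
    drives the display."""
    bipolar = [f - box_blur(f) for f in frames]
    transient = [b1 - b0 for b0, b1 in zip(bipolar, bipolar[1:])]
    return bipolar, transient
```

A static scene produces a sustained bipolar pattern but no transient output at all, which is the dynamical difference from a static DOG receptive-field model that the simulations highlight.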