A review of optical pattern recognition algorithms and techniques for the various levels of computer vision, extending to the recent upper levels of artificial intelligence, is presented and briefly summarized.
A two-stage discriminant analysis method has been proposed for multi-class recognition. In the second stage, multiple discriminant analysis is applied to identify each set of classes that is not distinctly separated in the first stage. The proposed method is applied to character recognition to evaluate its performance. The recognition rate was 99.3% for 91 categories of alphanumerics and special symbols, and the recognition speed was 20 milliseconds per character when the analysis program was executed on image-pipelined processors. It has been shown that the method is further applicable to character-sequence recognition without the need for a character isolation process.
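As a rough sketch of the two-stage idea (not the paper's actual implementation, which uses multiple discriminant analysis on image-pipelined hardware), the fragment below resolves only the classes that remain ambiguous after a coarse first stage. The data, the margin parameter, and the one-dimensional second-stage discriminant are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented training data: 3 classes in 2-D; classes 1 and 2 overlap.
means = np.array([[0.0, 0.0], [4.0, 4.0], [4.6, 4.4]])
X = np.vstack([m + 0.3 * rng.standard_normal((50, 2)) for m in means])
y = np.repeat([0, 1, 2], 50)
class_means = np.array([X[y == k].mean(axis=0) for k in range(3)])

def classify(x, margin=1.0):
    # Stage 1: nearest class mean over all classes.
    d = np.linalg.norm(class_means - x, axis=1)
    order = np.argsort(d)
    if d[order[1]] - d[order[0]] > margin:
        return order[0]
    # Stage 2: a finer decision restricted to the confusable pair
    # (a one-dimensional stand-in for multiple discriminant analysis).
    a, b = order[0], order[1]
    w = class_means[b] - class_means[a]
    mid = (class_means[a] + class_means[b]) / 2.0
    return b if (x - mid) @ w > 0 else a

accuracy = np.mean([classify(x) == yi for x, yi in zip(X, y)])
```

Only the ambiguous samples pay the cost of the second stage, which is the efficiency argument behind the two-stage structure.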
A hybrid model for structured natural textures is presented. The texture is considered to be composed of subpatterns, called primitives, which occur with a certain regularity. The description of the texture is complete when the various classes of primitives are described together with their placement rules. Classes of primitives are described in two steps: first, the shape is coded with Fourier descriptors; second, the microtexture is modeled using a reduced set of joint probability distributions (obtained by vector quantization). This model is well adapted to texture synthesis.
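The Fourier-descriptor shape coding in the first step can be illustrated as follows; the elliptical primitive, the number of harmonics kept, and the normalization choices are assumptions for the sketch, not the paper's specification:

```python
import numpy as np

def fourier_descriptors(z, keep=8):
    # z: closed contour as complex samples x + iy around the boundary.
    Z = np.fft.fft(z) / len(z)
    Z[0] = 0.0                      # drop DC term  -> translation invariance
    Z = Z / np.abs(Z[1])            # normalize     -> scale invariance
    return np.concatenate([Z[1:1 + keep], Z[-keep:]])  # low-order shape code

t = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
ellipse = 3.0 * np.cos(t) + 1j * 1.5 * np.sin(t)   # illustrative primitive

code = fourier_descriptors(ellipse)
code_shifted = fourier_descriptors(ellipse + (5.0 + 2.0j))  # translated copy
code_scaled = fourier_descriptors(2.0 * ellipse)            # scaled copy
```

The translated and scaled copies yield the same code, which is what makes such descriptors usable as a class description independent of primitive placement.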
Stereo vision measurement is useful for applications in inspection and could lead to the integration of the inspection task into the manufacturing process, resulting in an improvement in product quality. In this paper, an analysis of an inspection cell using single-image stereo vision is described, and the practical problems and error sources are discussed. This paper also presents the error specifications for designing a system accurate to within 0.001 inch.
A practical system for computing attitude angles from the image of four rectangular, coplanar points has a number of potential error sources that degrade the accuracy of the attitude computation. An error analysis is presented for the Robot Locating System (RLS), which is a multisensor laboratory testbed that seeks to compute attitude angles from an image of four points. This paper discusses potential error sources and their relative effects for the RLS system. The error sources are discussed from a physical point of view and expected error characteristics are described. These error sources are then evaluated using a software simulation of the RLS system.
Hybrid optical correlators, which use spatial light modulators and solid state imagers to interface computers and optical Fourier correlators, will allow the rapid changing of filters and digital pre- or post-correlation processing in vision systems. However, optical correlation (particularly with phase-only filtering) is notoriously sensitive to mismatches of rotation and magnification between the reference image and the input image. Such sharpness is not always desirable if the correlation pattern is to be used in a control system. Two methods of conditioning the response of an inherently sharp-response optical correlation to an in-plane rotation between reference and object images are discussed. The objective is to condition the correlation for better suitability to a vision-controlled system. The first method is controlled convolutional blurring of a reference image followed by construction of a matched filter (perhaps implemented as a phase-only filter). An alternative is discussed in more detail: a synthesis from sharp filters, weighted so as to produce a specified off-center behavior.
This paper shows the application of waveguide holograms to optical matched filtering. Waveguide holograms of one- and two-dimensional images were formed in two-layer waveguides. A 6000 Å thick As-S film was used as the recording medium. The diffraction efficiency of the waveguide Fourier hologram was about 30%. Complete optical correlator schemes were not studied experimentally.
In this paper, we present a newly developed hybrid multi-channel real-time pattern recognition system. Two modified commercial liquid crystal televisions (LCTVs) are applied as a real-time incoherent-to-coherent image transducer and as a device to produce converging wavelets with different focal positions. Taking advantage of the cross-grating nature of the LCTV screen, a multi-channel correlator becomes possible. This hybrid system has both the high processing speed of an optical system and the flexibility of an electronic system.
This article is in two parts. Part I discloses the invention of a new range finding method, and Part II describes a novel video display architecture useful for viewing data acquired by such a range finder. Range finding by diffraction uses the curvature of the wavefront radiated from a point source as a measure proportional to distance. The novel display architecture, Random Address Processing, is a form of content-addressable memory built into a raster-scan graphics display.
Various new 1-D processing techniques for handling typically 2-D data problems are advanced, with applications including synthetic aperture radar, spotlight synthetic aperture radar, and pattern recognition. Quantitative data for 2-D optical character recognition using 1-D processing are detailed; perfect recognition of all 26 characters is obtained using only 1-D processing. Acousto-optic signal processing architectures are emphasized since they are typically the most attractive and fastest 1-D optical processing units. We conclude with attention to new architectures that achieve image interpolation and subtraction for new target tracking applications.
In the twenty-two years since VanderLugt's introduction of holographic matched filtering, the intensive research carried out throughout the world has led to no applications in complex environments. This leads one to suspect that the VanderLugt filter technique is insufficiently complex to handle truly complex problems. It is therefore of great interest to increase the complexity of the VanderLugt filtering operation. We introduce here an approach to real-time filter assembly: the use of page-oriented holographic memories and optically addressed SLMs to achieve intelligent and fast reprogramming of the filters from a base of 10^4 to 10^6 stored patterns.
The standard optic-flow gradient constraint equation relates temporal and spatial derivatives of the image to the image-plane velocities. Differentiating this equation with respect to time produces a non-standard wave equation. Since the resulting equation involves second spatial and temporal derivatives of the intensity, it is probably too noise sensitive for use in computing the velocity field. However, it is likely to be of use in checking the results of a velocity computation for consistency. Most optic-flow computations assume that the intensity of a scene point remains unchanged as it moves around. The wave equation constraint can be used to check this assumption. If the equation is strongly violated, then it is a signal that the velocity field should not be believed, probably because the illumination is changing.
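A minimal numerical check of the wave-equation constraint, assuming a 1-D pattern translating at constant velocity u, so that brightness constancy I_t = -u I_x implies I_tt = u^2 I_xx. The Gaussian profile, grid, and parameter values are arbitrary choices for the sketch:

```python
import numpy as np

u = 1.3                              # assumed constant image velocity
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]
dt = 0.01

f = lambda s: np.exp(-s ** 2)        # Gaussian intensity profile

# Three frames of the translating pattern for a central second difference.
I0, I1, I2 = f(x), f(x - u * dt), f(x - 2.0 * u * dt)

I_tt = (I2 - 2.0 * I1 + I0) / dt ** 2
I_xx = np.gradient(np.gradient(I1, dx), dx)

# Relative violation of the wave-equation constraint I_tt = u^2 I_xx.
residual = np.max(np.abs(I_tt - u ** 2 * I_xx)) / np.max(np.abs(I_tt))
```

For a genuinely translating pattern the residual is small; a large residual would signal changing illumination, exactly the consistency check the abstract proposes.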
We are studying the classification of symbolic surface descriptors into classes that will allow fast approaches for 3-D object recognition. In our approach to object recognition, we will use features to hypothesize objects using a parallel distributed approach, and then use models of objects to find the objects that are present in a scene. Symbolic surface descriptors represent global features of an object and do not change when the object is partially occluded, while local features (such as corners or edges) may disappear entirely. We have developed a technique to segment surfaces and compute their polynomial surface descriptors. In this paper we present results of our study to determine which types of surface descriptors (such as cylindrical, spherical, elliptical, hyperbolic, etc.) can be reliably recovered from biquadratic equation models of various surfaces.
A new stereo matching algorithm is developed and applied to natural scenes. The algorithm is based on a recent approach to matching multiple, multidimensional signals that have been deformed with respect to one another. The goal is to optimally recover the deformation map, which in this case represents the stereoscopic disparity between the left and right images. The problem is formulated as the minimization of an energy measure that combines a similarity functional with a controlled-continuity constraint. Applying the continuation method, this nonlinear, nonconvex minimization problem is solved by a deterministic dynamic system governed by a set of coupled, first-order differential equations. The system finds an optimal approximation at a coarse scale, then tracks it continuously to a fine scale, thus avoiding bad local minima. The stereo algorithm succinctly unifies the notions of matching as constrained optimization, of coarse-to-fine search, and of variational surface reconstruction.
This paper presents a method to extract and represent significant physical properties of a surface, using curvature properties of this surface. We compute curvature in 4 different directions and detect extrema and zero-crossings for each of these one dimensional curves. Since this computation is very noise sensitive, we filter these features using a scale-space tracking approach: we smooth the image with Gaussian masks of increasing variance, detecting features at the smoothest level and localizing them at the original one. We then partially group these features into junctions which correspond to significant physical properties, such as depth discontinuities, surface discontinuities, smooth extrema, and link them into curves. We believe that these descriptions capture most of the information present in the original image, but are more suited to further processing, such as matching with a model. We then illustrate this technique with several examples.
The geometric constraints available in the structure of rigid objects can easily be exploited by matching algorithms for both object recognition and localization. Two matching paradigms, pose clustering and hypothesize-and-test, are discussed and compared with respect to accuracy, computational cost, and operation in noisy and multiple-object environments. Each paradigm offers certain relative advantages and implies a certain computer architecture. All algorithms in each category ultimately depend on adequate detection of primitive features and may encounter large increases in computation time in going from a single-object to a multiple-object environment.
This paper focuses on the extraction of the parameters of individual surfaces from noisy depth maps. The basis for this is least-squares polynomial approximations to the range data and the curvature properties that can be computed from these approximations. The curvature properties are derived using the invariants of the Weingarten Map evaluated at the origin of local coordinate systems centered at the range points. The Weingarten Map is a well-known concept in differential geometry; a brief treatment of the differential geometry pertinent to surface curvature is given. We use the curvature properties of the approximations to extract certain surface parameters. We then show that curvature properties alone are not enough to obtain all the parameters of the surfaces; higher order properties (information about change of curvature) are needed to obtain full parametric descriptions. This surface parameter estimation problem arises in the design of a vision system to recognize 3D objects whose surfaces are composed of planar patches and patches of quadrics of revolution (quadrics that are also surfaces of revolution). A significant portion of man-made objects can be modeled using these surfaces. The actual process of recognition and parameter extraction is framed as a set of stacked parameter space transforms, "stacked" in the sense that any one transform computes only a partial geometric description that forms the input to the next transform. Those who are interested in the organization and control of the recognition and parameter extraction process are referred to [Sabbah86]; this paper briefly touches upon the organization but concentrates mainly on the geometrical aspects of parameter extraction.
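The first stage described above, curvature from a least-squares polynomial fit, can be sketched as follows. The sphere test surface, patch size, and biquadratic form are illustrative assumptions; for a Monge patch z(x, y), the Gaussian curvature is K = (z_xx z_yy - z_xy^2) / (1 + z_x^2 + z_y^2)^2:

```python
import numpy as np

r = 2.0                                   # sphere radius; K should be 1/r^2
g = np.linspace(-0.2, 0.2, 9)
xg, yg = np.meshgrid(g, g)
x, y = xg.ravel(), yg.ravel()
z = np.sqrt(r ** 2 - x ** 2 - y ** 2)     # noise-free depth samples near the pole

# Least-squares fit  z ~ a x^2 + b x y + c y^2 + d x + e y + f.
A = np.column_stack([x ** 2, x * y, y ** 2, x, y, np.ones_like(x)])
a, b, c, d, e, f0 = np.linalg.lstsq(A, z, rcond=None)[0]

# Derivatives of the fit at the origin: z_xx = 2a, z_yy = 2c, z_xy = b,
# z_x = d, z_y = e; Gaussian curvature from the Weingarten-map invariants.
K = (4.0 * a * c - b ** 2) / (1.0 + d ** 2 + e ** 2) ** 2
```

Recovering K close to 1/r^2 from the fitted coefficients illustrates how a surface parameter (here the sphere radius) can be read off from curvature; as the abstract notes, other parameters require higher-order information.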
Defects in glassware known as "stuck glass" are caused by flying glass particles falling into red-hot bottles during their manufacture. We describe the design, construction, and results of a prototype electro-optical inspection machine which detects sub-millimeter defects in glass bottles. This system uses pattern processing and spatial filtering in the Fourier domain to optically subtract the signature of the nondefective part of the bottle from the total image of the bottle. The huge variation in acceptable glass bottles, even within a particular bottle size, requires a unique spatial filter for each bottle being inspected. We use a two-dimensional spatial light modulator to create such an adaptive spatial filter, which allows us to compensate for these variations in real time.
An achromatic optical correlator using spatially multiplexed achromatic matched spatial filters (MSFs) for white light optical pattern recognition is presented. The MSF array is synthesized using a monochromatic laser and its achromaticity is achieved by adjusting the scale and spatial carrier frequency of each MSF to accommodate the wavelength variations in white light correlation detections. System analysis and several experimental results showing the correlation peak intensity using white-light illumination are presented.
The color characteristics of a microchannel spatial light modulator (MSLM) under white-light readout illumination are examined, and a system for performing color optical processing operations using an electro-optic crystal plate along with the MSLM is presented. Experimental results for edge enhancement and image subtraction operations are demonstrated.
Optical correlators operating with spatially coherent white light are simulated on a computer. There are two sources of error in the correlators studied: 1) diffraction spot size changes and 2) transverse dispersion for diffraction-grating encoded images and filters. The quality of the output correlation signal is analyzed in terms of peak intensity maximum, peak full-width-half-maximum (FWHM) size, and peak-related energy efficiencies. Classical Matched Filter (CMF) operation of the white-light correlator is compared with that of a monochromatic correlator. A 4-filter dispersed white-light processor is required to produce "correlation" peaks with widths approaching those of the monochromatic CMF correlator. Compensation of the diffractive Vander Lugt white-light dispersion blur in the output plane is required for narrow-peak operation of single-filter correlators. The corrected on-axis white-light CMF correlator had operating parameters very similar to those of the monochromatic correlator for discrimination between the 25 x 35 pixel letters O and G, which differ by the shifting of one 5 x 5 block of pixels. A phase-only filter dispersed-processor simulation shows a dramatic improvement in operating parameters.
A spatial light modulator (SLM) operating as an amplitude-modulating input element in a coherent optical correlator introduces a certain amount of phase shift at each pixel. We examine the effect this has on the overall quality of the correlation signal, defined in terms of SNR and correlation peak full-width-half-maximum (FWHM) response. An interesting and unexpected effect is found: if the attendant phase distortion is properly incorporated into the reference function, it can greatly enhance the correlation signal.
We study the relations among various linear mapping-based algorithms by formulating a more general unified pseudo-inverse algorithm. We show that the least-square linear mapping technique, the simplified least-square linear mapping technique, the synthetic discriminant function, the equal correlation peak method and the Caulfield-Maloney filter are in fact all special cases of the unified pseudo-inverse algorithm. When the total number of training images (KM, where K is the number of classes and M is the number of training images in each class) is larger than the dimension of the images (N), the overdetermined case of the unified pseudo-inverse algorithm is the same as the least-square linear mapping technique, because both algorithms minimize the least-squares error. When KM < N, the underdetermined case of the unified pseudo-inverse algorithm is the same as the least-square linear mapping technique and the synthetic discriminant function. Furthermore, when KM < N, the synthetic discriminant function method can be considered as the degenerate case of the least-square linear mapping technique. Experimental results on classification using the linear mapping-based algorithms are provided and show good agreement with the theoretical analysis.
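A minimal sketch of the underdetermined (KM < N) case, where the pseudo-inverse solution reduces to the classical synthetic discriminant function filter h = X (X^T X)^{-1} u. The random training images and label vector are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, KM = 256, 6                        # N pixels, KM training images (KM < N)
X = rng.standard_normal((N, KM))      # columns: lexicographically ordered images
u = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # desired central correlations

# Minimum-norm solution of X^T h = u: the SDF filter.
h = X @ np.linalg.solve(X.T @ X, u)

peaks = X.T @ h                       # central correlation with each image
```

By construction the filter reproduces the prescribed correlation peaks exactly on the training set, which is the defining constraint shared by the special cases the abstract unifies.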
The throughput of data-structure manipulation operations presently limits the applicability of relational database machines. Since most relational algebra operations can be treated as modifications of sorting algorithms, special-purpose hardware based on fast sorting algorithms should increase the performance of these machines. Parallel sorting algorithms representable as self-routing, multistage networks are ideal for optical implementation because they require global interconnects and simple parallel-processing units. The processing units perform a local operation called compare-and-exchange (C&E). Our goal is to realize fast optical sorting networks. Therefore, we describe C&E implementations in analog optics, and digital optics with all-optical, hybrid optoelectronic and polarization logic. Furthermore, we delineate application domains of the networks based on system and technology characteristics.
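The C&E primitive and a self-routing network built from it can be sketched in software as Batcher's bitonic sorter, one classical choice of such a multistage network; the paper's analog- and digital-optical implementations are not modeled here:

```python
def compare_exchange(a, b, ascending=True):
    # The single local C&E primitive: route (min, max) or (max, min).
    return (min(a, b), max(a, b)) if ascending else (max(a, b), min(a, b))

def bitonic_merge(data, ascending):
    if len(data) <= 1:
        return list(data)
    half = len(data) // 2
    data = list(data)
    for i in range(half):
        data[i], data[i + half] = compare_exchange(data[i], data[i + half],
                                                   ascending)
    return (bitonic_merge(data[:half], ascending) +
            bitonic_merge(data[half:], ascending))

def bitonic_sort(data, ascending=True):
    # Input length must be a power of two for this classical formulation.
    if len(data) <= 1:
        return list(data)
    half = len(data) // 2
    return bitonic_merge(bitonic_sort(data[:half], True) +
                         bitonic_sort(data[half:], False), ascending)

sorted_out = bitonic_sort([7, 3, 6, 0, 5, 1, 4, 2])
```

Every stage applies C&E to fixed, data-independent pairs, which is what makes the network self-routing and therefore attractive for a parallel optical realization.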
In this paper we examine the properties of high order neuron-like adaptive learning units whose output is invariant under an arbitrary finite group of transformations on the input space. The transformation invariance is imposed by averaging the input of each unit over a transformation group, thus eliminating the capacity of the units to detect features which are incompatible with the imposed group invariance. This averaging process also generates equivalence classes of interactions among the units, and thus allows a collapse of the interaction weight matrix, reducing the number of high order terms. As an example, we discuss the implementation of two types of translation invariance.
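A minimal sketch of the averaging construction for translation (cyclic-shift) invariance of a second-order unit; the unit form y = x^T W x and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n))       # unconstrained second-order weights

# Group-average over all cyclic shifts s: the collapsed weights depend
# only on (i - j) mod n, reducing n^2 parameters to n equivalence classes.
W_avg = np.zeros((n, n))
for s in range(n):
    idx = (np.arange(n) + s) % n
    W_avg += W[np.ix_(idx, idx)]
W_avg /= n

def unit(x):
    return x @ W_avg @ x              # second-order response y = x^T W x

x = rng.standard_normal(n)
y0 = unit(x)
y1 = unit(np.roll(x, 3))              # response to a cyclically shifted input
```

The averaged unit gives the same response to any cyclic shift of the input, at the cost of no longer being able to detect features that distinguish shifted patterns, exactly the trade-off the abstract describes.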
The use of optical computing techniques in an intelligent sensor is considered. A first-order model provides the conceptual context for discussing the issues, architectures, and realizations of an artificially intelligent sensor. The major modules are defined and discussed briefly, and areas for further research are identified.
A method of producing gradient-like images of phase objects is described. Photographs of various phase objects obtained by this method are presented and the relationship between the phase objects and their images is described. Finally, we show that under certain conditions the intensity of the output field at the focal point of a lens is simply related to the autocorrelation function associated with the phase object's optical thickness variations.
The response time of the MSLM is discussed. Operation at less than 3 Hz was achieved when a conventional microchannel plate (MCP) was used, whereas a 30 Hz framing rate is required for advanced applications. The framing rate is determined by the charge density that the MCP can supply continuously. By adopting an MCP with a 275 μA strip current, a 30 Hz framing rate (16.5 msec writing time and 16.5 msec erasing time) can be achieved.
Switching speeds of photoaddressed liquid crystal spatial light modulators are currently limited to several milliseconds. This is due in part to the choice of liquid crystal (nematic), and in part to the choice of photoaddressing schemes. In this paper we describe two methods for making photoaddressed liquid crystal spatial light modulators with microsecond response times.
Statistically based models of digital images are used to locate and segment objects of interest from background scenes. Three models are presented and evaluated. These models are based on a Bayesian cost function, a Neyman-Pearson constant false alarm rate function, and a maximum entropy function. Detailed algorithms are presented for separating object regions from background clutter using each of these statistical methods.
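A minimal sketch of the Bayesian (minimum-error, 0/1 cost) variant on synthetic data; the Gaussian gray-level models, priors, and image layout are invented for illustration, and the other two criteria in the abstract would change only the decision threshold:

```python
import numpy as np

rng = np.random.default_rng(3)

mu_b, sd_b, p_b = 50.0, 10.0, 0.7     # background gray-level model and prior
mu_o, sd_o, p_o = 120.0, 15.0, 0.3    # object model and prior

def log_gauss(g, mu, sd):
    return -0.5 * ((g - mu) / sd) ** 2 - np.log(sd)

# Synthetic scene: background with one bright object block.
img = rng.normal(mu_b, sd_b, (64, 64))
img[24:40, 24:40] = rng.normal(mu_o, sd_o, (16, 16))

# Label a pixel "object" when its posterior evidence is larger.
mask = (log_gauss(img, mu_o, sd_o) + np.log(p_o) >
        log_gauss(img, mu_b, sd_b) + np.log(p_b))

detection_rate = mask[24:40, 24:40].mean()   # inside the object block
false_alarm_rate = mask[:20, :].mean()       # pure-background rows
```

A Neyman-Pearson version would instead fix the false-alarm rate and solve for the threshold, which is the distinction among the three models the abstract evaluates.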
Perceptual grouping is an important mechanism of early visual processing. This paper presents a computational approach to perceptual grouping in dot patterns. Detection of perceptual organization is done in two steps. The first step, called the lowest level grouping, extracts the perceptual segments of dots that group together because of their relative locations. The grouping is accomplished by interpreting dots as belonging to interior or border of a perceptual segment, or being along a perceived curve, or being isolated. The Voronoi neighborhood of a dot is used to represent its local geometric environment. The grouping is seeded by assigning to dots their locally evident perceptual roles and iteratively modifying the initial estimates to enforce global Gestalt constraints. This is done through independent modules that possess narrow expertise for recognition of typical interior dots, border dots, curve dots and isolated dots, from the properties of the Voronoi neighborhoods. The results of the modules are allowed to influence and change each other so as to result in perceptual components that satisfy global, Gestalt criteria such as border or curve smoothness and component compactness. Such lateral communication among the modules makes feasible a perceptual interpretation of the local structure in a manner that best meets the global expectations. Thus, an integration is performed of multiple constraints, active at different perceptual levels and having different scopes in the dot pattern, to infer the lowest level perceptual structure. The result of the lowest level grouping phase is the partitioning of a dot pattern into different perceptual segments or tokens. Unlike dots, these segments possess size and shape properties in addition to locations. The second step further groups the lowest level tokens to identify any hierarchical structure present.
The grouping among tokens is again done based on a variety of constraints including their proximity, orientations, sizes, and terminations, integrated so as to mimic the perceptual roles of these criteria. The result of the grouping of lowest level tokens is a set of still larger tokens. The hierarchical grouping process repeats until no new groupings are formed. The final result of the implementation described here is a hierarchical representation of the perceptual structure in a dot pattern. Our representation of perceptual structure allows for "focus of attention" through the presence of multiple levels, and for "rivalry" of groupings at a given level through the probabilistic interpretation of the groupings present.
A knowledge-based system for interpreting aerial photographs, Picture Query (PQ), first segments an image into primitive, homogeneous regions, then searches among combinations of these to find instances which satisfy definitions of object types. If primary evidence is insufficient, there may be a hypothesis-based search for the supporting evidence of related objects. This secondary search is restricted to windows by expected spatial relations. First instances are improved by searching for overlapping variants having better goodness-of-fit. The process may be repeated using re-estimated parameters of object definitions based on instances found previously. Results are reported for images of suburban neighborhoods, including roads, houses, and their shadows.
Detection of objects is an important task for computer vision systems. In this paper we present the development of an object detection system for analyzing multispectral images. In this formulation, general knowledge about the spectral characteristics of the objects to be detected is utilized in the search for their locations in an image. Efficiency is achieved by using a hierarchical framework with a pyramid data structure to store multiresolution, multispectral copies of an image. At every level of processing, a fuzzy cluster analysis algorithm is utilized to uncover the memberships of individual picture elements. These membership values are used with the general knowledge of the spectral properties of objects to guide the search for their locations. The methodology is tested using several experiments involving multispectral satellite and aerial images. The system is shown to be successful in the efficient detection of objects such as rivers, roads, and various types of buildings.
An evidence-based recognition technique is defined which identifies 3D objects by looking for notable features of objects. 3D surface patch information is used to derive a representation of objects from their range images. A list of salient or evidence features with corresponding degrees of evidence of the various objects in the database forms the core of an object recognition system. Occurrences of these salient features are interpreted as evidence for or against the hypothesis that a given object occurs in the scene. A measure of similarity between the set of observed features and the set of salient features for a given object in the database is used to determine the identity of an object in the scene or reject the object(s) in the scene as unknown. This procedure has polynomial time complexity and correctly identifies a variety of objects in both synthetic and real range images.
Because image processing is numerically intensive, there has been much interest in parallel processing for image analysis applications. While much of low-level vision can be attacked by SIMD mesh-connected architectures, intermediate and high-level vision applications might be able to make effective use of MIMD and distributed architectures. We have taken a standard parallel connected components algorithm, and applied it to image segmentation using an MIMD architecture. The resulting version of the Shiloach/Vishkin algorithm runs on the prototype NYU Ultracomputer. We will describe the implementation and the results of some experiments. We take note of the lesson learned from this implementation: that processor power should be focused dynamically to those portions of the image requiring greatest attention. We then consider the implications of this lesson to other image processing tasks.
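For reference, a serial union-find version of connected-components labeling, the sequential analogue of the parallel Shiloach/Vishkin algorithm, can be sketched as follows (this is not the Ultracomputer implementation; the small test image is invented):

```python
import numpy as np

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]        # path halving
        i = parent[i]
    return i

def count_components(img):
    # 4-connected components of a binary image via union-find.
    h, w = img.shape
    parent = list(range(h * w))
    for r in range(h):
        for c in range(w):
            if not img[r, c]:
                continue
            for rr, cc in ((r - 1, c), (r, c - 1)):   # upper / left neighbor
                if rr >= 0 and cc >= 0 and img[rr, cc]:
                    a, b = find(parent, r * w + c), find(parent, rr * w + cc)
                    parent[a] = b
    roots = {find(parent, r * w + c)
             for r in range(h) for c in range(w) if img[r, c]}
    return len(roots)

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=bool)
n_components = count_components(img)
```

The parallel formulation distributes the union and pointer-jumping steps across processors, which is where the lesson about focusing processor power on busy image regions arises.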
A useful space telerobot for on-orbit assembly, maintenance, and repair tasks must have a sensing and perception subsystem which can provide the locations, orientations, and velocities of all relevant objects in the work environment. This function must be accomplished with sufficient speed and accuracy to permit effective grappling and manipulation. Appropriate symbolic names must be attached to each object for use by higher-level planning algorithms. Sensor data and inferences must be presented to the remote human operator in a way that is both comprehensible in ensuring safe autonomous operation and useful for direct teleoperation. Research at JPL toward these objectives is described.
Comprehensive simulation results using new data bases on automatic target recognition are presented. New organized methods for selection of filter synthesis parameters are proposed. Guidelines are provided for proper selection of training set images, the shift required for filter synthesis, and the choice of true and false class labels. The use of multiple correlation planes, symbolic filters and the effects of shading are discussed, and quantitative data are presented.
It has been shown that phase-only matched filtering can provide 100% optical efficiency in an optical correlator while at the same time producing a very well defined correlation spot. This analysis has been carried forward and applied to the case of a broad-spectral-band or white-light optical correlator. The optical correlator under white-light illumination operates at a multiplicity of wavelengths, which favorably affects the noise performance of the system. Computer simulations have been used to study the noise suppression qualities of the correlator when noise is present at the filter plane. The results indicate that the noise level at the correlation plane can be reduced by utilizing a higher number of spectral band filters. It is shown that the output signal-to-noise ratio (SNR) improves approximately as the square root of N, where N is the number of spectral bands employed in the phase-only matched filter. The coherence requirements for phase-only matched filtering under broadband illumination are studied, and the effect of partially coherent illumination on the correlation bandwidth is determined.
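The square-root-of-N behavior can be illustrated numerically by averaging correlation outputs over N bands with independent, identically distributed filter-plane noise; the signal level, noise level, and trial count are arbitrary, and this models only the statistics, not the optics:

```python
import numpy as np

rng = np.random.default_rng(4)

signal, sigma, trials = 1.0, 0.5, 20000

snrs = {}
for N in (1, 4, 16):
    # Average N band outputs: the signal adds coherently while the
    # independent noise averages down by sqrt(N).
    outputs = signal + rng.normal(0.0, sigma, (trials, N)).mean(axis=1)
    snrs[N] = outputs.mean() / outputs.std()
# Expected: snrs[N] ~ (signal / sigma) * sqrt(N) = 2 * sqrt(N)
```

Quadrupling the number of bands doubles the measured SNR, matching the scaling reported in the abstract.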
Computer generated Fourier transform matched filters have been constructed using e-beam lithography. These filters were then evaluated by placing them in a VanderLugt optical correlator and addressing them with the original scene used to make the filters. Results show that the spatial frequency content of a filter may be tailored so as to obtain good signal to noise correlations while maintaining high diffraction efficiency.
In this paper a compound rotation-invariant filtering technique is developed which leads to sharp cross-correlation peaks for desired targets, while suppressing random noise and unwanted target correlations. The basic compound filter structure consists of a weighted sum of the cross-correlations between several circular harmonic filters (CHF's) and the input image. In the design procedure, the weights are constrained so that the compound filter output has a peak value of 1, for the desired target. Measures of random noise, and deterministic noise due to correlations with unwanted targets, are minimized with respect to the weights. The signal-to-noise ratio (SNR) of the compound filter is therefore maximized. Some limits to the performance of the technique are discussed. The output magnitude squared cross-correlation functions form a non-orthogonal set of functions. Therefore, the SNR could be made arbitrarily large if the functions were linearly independent and a very large number of CHF's were available. Examples illustrating the filter performance are given. Extensions of this work are discussed.
Initial results of experiments with a new photopolymer developed by Polaroid have been very encouraging. The photopolymer, DMP-128, has exhibited diffraction efficiencies of greater than 70% for plane wave holograms. The results of an in-depth study into the properties of DMP-128 are presented here. These properties include diffraction efficiency as a function of reference beam angle, exposure energy, and incident angle of the reconstruction beam.