A new and efficient real-time technique to produce a string-code description of the contour of an object, such as an (angle, length) (θ, s) feature space for the arcs describing the contour, is detailed. We demonstrate the use of such a description in an aircraft identification case study. Our (θ, s) feature space is modified to include a length string code and a convexity string code. This feature space allows both global and local feature extraction. The local feature extraction follows human techniques and is thus quite suitable for a rule-based processor (as we discuss and demonstrate). Aircraft have generic parts and thus are quite suitable for model-based description.
A fully-engineered real-time (15 objects per second) optical Fourier transform feature space processor for product inspection is described. This unit is presently undergoing evaluation at several sites. This paper discusses the feature space techniques employed, the advantages of the Fourier transform reduced-dimensionality feature space used, and several of its properties. Emphasis is given to initial performance data obtained in many diverse applications.
Proc. SPIE 0848, The Orthonormal Fourier-Mellin Transform For Precision Scale Detection And Range Data Acquisition, 0000 (19 February 1988); doi: 10.1117/12.942716
The Mellin transform converts reference-frame scale information into phase information. This property makes the Mellin transform appropriate for determining the scale difference between two signals and for determining features which are invariant to scale change: partial shape recognition, recognition of objects at various distances, speech or brain wave processing, range mapping, and processing of Doppler-shifted signals. A recent study [1] reveals a generalized Mellin transform which, when indexed properly, produces an orthonormal version which is superior to the Mellin transform typically seen in the literature. The orthonormal Mellin transform, the Fourier-Mellin transform, and their properties are discussed. The performance of scale detection on digitized curves and its application to monocular range acquisition in a robot vision system are investigated. Key words: Mellin transform, Fourier-Mellin transform, Discrete Mellin Transform, scale detection, range acquisition
Proc. SPIE 0848, Object Identification And Orientation Estimation From Contours Based On An Affine Invariant Curvature, 0000 (19 February 1988); doi: 10.1117/12.942717
New methods of image matching and of calculation of the affine transformation relating two images of an object in different orientations are developed. These methods are applicable to contours extracted from images of planar-patch objects. The approach is based on a new definition of a curvature function (previously developed by the authors) which is invariant to all image distortions representable by affine transformations. The algorithm is implemented and tested using camera-acquired images of actual objects, and is seen to exhibit considerable robustness to real-world distortions in the imaging process, as well as to deviations from planarity of the objects.
Proc. SPIE 0848, Recognizing And Locating Objects Using Partial Object Description Generated By Feature Extraction By Demands, 0000 (19 February 1988); doi: 10.1117/12.942718
A method called Feature Extraction by Demands (FED) has been developed to generate object descriptions. Objects are described by surface adjacency graphs containing the surface class and the surface equation at each node. Due to occlusion and the use of 2½D range images, the generated object description is frequently partial. This paper describes a new method to generate object hypotheses and to recognize and locate viewed objects using partial descriptions of objects bounded by quadric surfaces. The method proceeds in two phases. In phase one, the object location is estimated from matched surface pairs (between an object description and an object model). Depending upon the surface type, each surface may provide partial or complete location information. As long as the location information calculated from matched surface pairs is consistent, that is, passes matching feasibility tests, the object model is a candidate for the viewed object. The consistent partial location information is combined sequentially into a more complete object location estimate. The order in which location information is used in object location estimation is decided by whether a more complete object location can be calculated. If a complete location can be calculated, or the object location estimation cannot be further refined, the hypothesis is verified by phase two. In phase two, each remaining surface which was not used for object location estimation is searched for a matching model surface and the neighboring relations between surfaces are verified. If the hypothesis passes phase two, the model is accepted as a matched model. If a complete object location can be calculated from the accepted hypothesis, an optimal object location is calculated.
In this paper we discuss the differences and similarities between the morphological skeleton and skeletons generated by other methods; shape representation by the morphological skeleton function; and the concept of a minimal skeleton. We then propose a fast shape recognition scheme based on morphological set operations and the skeleton function. This scheme measures the goodness-of-fit of prototype skeletons to the observed objects via morphological erosion, exploiting prototype skeletons as structuring elements. Since morphological set operations can be implemented in parallel, the recognition process can be executed at high speed. Further, the radius information in the morphological skeleton allows a coarse-to-fine recognition strategy that discovers poor matches relatively early in the process.
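A minimal Python/SciPy sketch of the erosion-based matching idea in the abstract above; the prototype skeleton points, their radii, and the hit-fraction score are illustrative stand-ins for the paper's exact goodness-of-fit measure.

    import numpy as np
    from scipy.ndimage import binary_erosion

    def disk(radius):
        # Disk-shaped structuring element of the given radius.
        y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
        return x * x + y * y <= radius * radius

    def skeleton_fit(observed, skeleton_pts, radii):
        """Fraction of prototype skeleton points whose maximal disk
        fits inside the observed binary object (erosion test)."""
        hits = 0
        for (r, c), radius in zip(skeleton_pts, radii):
            # A disk of this radius fits at (r, c) iff the pixel
            # survives erosion of the object by that disk.
            eroded = binary_erosion(observed, structure=disk(radius))
            hits += bool(eroded[r, c])
        return hits / len(radii)

Testing the largest radii first gives the coarse-to-fine behavior the abstract notes: a prototype whose large disks already fail to fit can be rejected early, before the small-radius tests run.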
Synthetic Discriminant Functions (SDF's) constitute an approach to distortion-invariant pattern recognition when sufficiently descriptive training images are available. Traditionally, the SDF's have been designed in the image space even though their eventual implementation in optical processors requires the fabrication of Computer Generated Holograms (CGH's) that can be placed in the frequency plane of optical correlators. With this consideration, we formulate the SDF problem in the frequency domain and characterize the set containing all the solutions. This conversion of the SDF problem from space domain to frequency domain requires that we define a "pseudo-DFT" operation. Relevant properties of this new operation are proved. This formal mathematical characterization of the frequency-domain SDF solutions allows us to select solutions with attractive features such as having unit magnitude (phase only) or only two amplitude levels (suitable for ON-OFF devices).
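For orientation, a minimal image-space SDF synthesis in Python — the classical equal-correlation-peak projection form, not the paper's frequency-domain formulation: given training images stacked as columns of X and desired correlation peaks u, one SDF is h = X(XᵀX)⁻¹u. Shapes and data are illustrative.

    import numpy as np

    def sdf_filter(train_images, peaks):
        """Classical image-space SDF: train_images is (n_pixels, n_train),
        peaks is the desired correlation value for each training image."""
        X = np.asarray(train_images, dtype=float)
        u = np.asarray(peaks, dtype=float)
        # Solve for h such that X.T @ h = u, with h in span(X).
        return X @ np.linalg.solve(X.T @ X, u)

    # Example: three training views of the true class, all forced to peak 1.0.
    rng = np.random.default_rng(0)
    X = rng.random((64 * 64, 3))
    h = sdf_filter(X, np.ones(3))
    print(X.T @ h)   # -> [1. 1. 1.] up to round-off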
The concept of an intelligent robot is an important topic combining sensors, manipulators, and artificial intelligence to design a useful machine. Vision systems, tactile sensors, proximity switches, and other sensors provide the elements necessary for simple game playing as well as industrial applications. These sensors permit adaptation to a changing environment. The AI techniques permit advanced forms of decision making, adaptive responses, and learning, while the manipulator provides the ability to perform various tasks. Computer languages such as LISP and OPS5 have been utilized to achieve expert systems approaches to solving real-world problems. The purpose of this paper is to describe several examples of visually guided intelligent robots, including both stationary and mobile robots. Demonstrations will be presented of a system for constructing and solving a popular peg game, a robot lawn mower, and a box-stacking robot. The experience gained from these and other systems provides insight into what may be realistically expected from the next generation of intelligent machines.
There are potential industrial applications for any methodology which inherently reduces processing time and cost and yet produces results sufficiently close to the result of full processing. It is for this reason that a morphological sampling theorem is important. The morphological sampling theorem described in this paper states: (1) how a digital image must be morphologically filtered before sampling in order to preserve the relevant information after sampling; (2) to what precision an appropriately morphologically filtered image can be reconstructed after sampling; and (3) the relationship between morphologically operating before sampling and the more computationally efficient scheme of morphologically operating on the sampled image with a sampled structuring element. The digital sampling theorem is developed first for the case of binary morphology and then it is extended to gray scale morphology through the use of the umbra homomorphism theorems.
In this paper we present new methods for computer-based symmetry identification that combine elements of group theory and pattern recognition. Detection of symmetry has diverse applications, including: the reduction of image data to a manageable subset with minimal information loss; the interpretation of sensor data [1], such as the x-ray diffraction patterns which sparked the recent discovery of a new "quasicrystal" phase of solid matter [2]; and music analysis and composition [3-5]. Our algorithms are expressed as parallel operations on the data using the matrix representation and manipulation features of the APL programming language. We demonstrate the operation of programs that characterize symmetric and nearly-symmetric patterns by determining the degree of invariance with respect to candidate symmetry transformations. The results are completely general; they may be applied to pattern data of arbitrary dimension and from any source.
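A small Python sketch of the core measurement above — the degree of invariance of a pattern under candidate symmetry transformations. NumPy stands in for the APL matrix operations, and the normalized-difference score is an illustrative choice, not the paper's exact metric.

    import numpy as np

    def invariance(pattern, transform):
        """1.0 for exact symmetry, smaller for near-symmetry."""
        p = np.asarray(pattern, dtype=float)
        t = transform(p)
        return 1.0 - np.abs(p - t).sum() / (np.abs(p).sum() + np.abs(t).sum())

    # Candidate transformations: two rotations of the cyclic group C4 and a mirror.
    candidates = {
        "rot90":  lambda p: np.rot90(p, 1),
        "rot180": lambda p: np.rot90(p, 2),
        "mirror": lambda p: np.fliplr(p),
    }
    pattern = np.array([[1, 2, 1],
                        [2, 5, 2],
                        [1, 2, 1]])
    for name, t in candidates.items():
        print(name, invariance(pattern, t))   # all 1.0: fully symmetric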
A parameter transform produces a density function on a parameter space. Ideally, each instance of a parametric shape in the input would contribute to the density with a delta function. Due to noise, these delta functions will be broadened. However, depending on the location and orientation of the parametric shapes in the input, differently shaped peaks will result. The reason for this is twofold: (1) in general, a parameter transform is a nonlinear operation; (2) a parameter transform may also be a function of the location of the parametric shape in the input. We present a general framework that deals with both of the problems mentioned above. By weighting the response of the transform by the determinant of a matrix, we obtain a more homogeneous response. This response preserves heights instead of volumes in the parameter space. We briefly touch upon the usefulness of these techniques for organizing the behavior of connectionist networks. Illustrative examples of parameter transform responses are given.
We present an algorithm that uses the zero-crossing information obtained from multiple-resolution Laplacian-of-Gaussian (∇²G) filtering to estimate the location, orientation, width (blur), and shape of the intensity changes in an image. Based on a ramp model of image edges, the algorithm uses the slope of the response at the zero-crossing to determine the width of a possible intensity change. It describes the intensity change in the region about the zero-crossing from the derivatives of the Gaussian-smoothed image at the filter scale corresponding to that width.
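A minimal Python sketch of ∇²G filtering and zero-crossing detection; the slope at the zero-crossing is the quantity the algorithm above relates to edge width, though the ramp-model mapping itself is omitted here.

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def zero_crossings(image, sigma):
        """Boolean mask of horizontal/vertical sign changes of the
        Laplacian-of-Gaussian response at scale sigma, plus the
        response gradient magnitude (zero-crossing slope)."""
        r = gaussian_laplace(image.astype(float), sigma)
        zc = np.zeros_like(r, dtype=bool)
        zc[:, :-1] |= np.signbit(r[:, :-1]) != np.signbit(r[:, 1:])
        zc[:-1, :] |= np.signbit(r[:-1, :]) != np.signbit(r[1:, :])
        # Under a ramp-edge model, the slope at a zero-crossing maps
        # to the width (blur) of the underlying intensity change.
        gy, gx = np.gradient(r)
        slope = np.hypot(gx, gy)
        return zc, slope

Running this for several sigma values gives the multiple-resolution responses the abstract refers to.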
Corner detection is often an important part of feature extraction and pattern recognition. For a given contour image, different sets of corners can be extracted depending on the scale adopted to examine the object. Existing algorithms do not emphasize the adjustability of the detection, and the effect of changing their parameters is hard to predict. In this paper, we propose a corner detection algorithm controlled by a single parameter. The tangent direction along the contour is evaluated as the Poisson-function-weighted average of the directions connecting the given point to its neighbours within a range specified by the parameter. The change in tangent direction is then smoothed and compared within the range to find the corners. Under our scheme, the number of corners decreases monotonically as the parameter value increases. The scaling effect of this simple parameter is easily predictable and similar to human visual perception. Some experimental results are shown in this article.
Proc. SPIE 0848, An Experimental System For The Integration Of Information From Stereo And Multiple Shape From Texture Algorithms, 0000 (19 February 1988); doi: 10.1117/12.942727
In numerous computer vision applications, there is both the need and the ability to access multiple types of information about the three dimensional aspects of objects or surfaces. When this information comes from different sources, the combination becomes non-trivial. This paper describes the present state of ongoing research in Columbia's Vision Laboratory on the integration of multiple visual sensing methodologies which yield three dimensional information; in particular, feature-based stereo and several shape-from-texture algorithms are already in operation, and multi-view shape-from-texture and shape-from-shading modules are expected to be incorporated. Unlike most systems for multi-sensor integration, which fuse all the information at one conceptual level, e.g., the surface level, the system under development uses two levels of data fusion: intra-process integration and inter-process integration. The paper discusses intra-process integration techniques for feature-based stereo and shape-from-texture algorithms. It also discusses an inter-process integration technique based on smooth models of surfaces. Examples are presented using camera-acquired images.
Texture, or the arrangement of surface markings, is an important cue that can be used to identify objects in an image. More often than not, object recognition requires estimating the surface orientation of the constituent surfaces. If texture is used to recover the surface orientation, then separating the surfaces to form objects will require discriminating the textured surfaces when the markings have undergone an oblique projection. However, many of the most widely used methods for discriminating textures are not applicable for discriminating textures distorted by oblique projection since they are all based on measurement of distances and angles. Prior work has focused on using the cross ratio of distances between four collinear points chosen appropriately. The results of the experiments with real textures indicate that although the cross ratio performed well, using other projective invariants should be investigated. Two ratios of distances between three points that are invariant under orthographic projection are considered. The two invariants are described first, followed by the results of using these invariants to discriminate natural textures.
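To make the invariants concrete, a small Python sketch: the cross ratio of four collinear points (invariant under full projective maps) and the simple ratio of three collinear points (an affine invariant, hence preserved under orthographic projection). The functions are illustrative, not the paper's exact measurement procedure.

    import numpy as np

    def cross_ratio(a, b, c, d):
        """Projective invariant of four collinear points (2-D coords)."""
        dist = lambda p, q: np.hypot(*(np.subtract(p, q)))
        return (dist(a, c) * dist(b, d)) / (dist(a, d) * dist(b, c))

    def affine_ratio(a, b, c):
        """Ratio |ab|/|bc| of three collinear points: affine invariant,
        so it survives orthographic (but not full perspective) projection."""
        dist = lambda p, q: np.hypot(*(np.subtract(p, q)))
        return dist(a, b) / dist(b, c)

    # Four points on the line y = 2x.
    p = [np.array([t, 2 * t]) for t in (0.0, 1.0, 3.0, 4.0)]
    print(cross_ratio(*p), affine_ratio(*p[:3]))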
In this paper, methods for supervised classification and unsupervised segmentation of textured images are presented. A class of two-dimensional, stochastic, non-causal, linear models known as Simultaneous Autoregressive (SAR) random field models is used to characterize texture in a local neighborhood N. The maximum likelihood estimates of the model parameters, denoted fN, are selected as textural features. An efficient method for selecting N (i.e., the order of the model) that produces powerful features is presented. It relies on visual examination and comparison of images synthesized using fN. A 98% correct classification rate is obtained in supervised experiments involving nine different types of natural textures and utilizing features selected by this technique. These features are also used for unsupervised texture segmentation, i.e., dividing an image into regions of similar texture when no a priori knowledge about the types and number of textures in the underlying image is available. Textural edges (borders between differently textured regions) are located where sudden changes in local textural features occur. The image is scanned by a small window and SAR features are extracted from the region encompassed by each window. Abrupt changes in the features of neighboring windows are detected and mapped back to the spatial domain to yield the sought-after textural edges. A method for automatic selection of the size of the scanning window is presented. Instead of one window, two windows whose sizes differ by a few pixels are utilized and the common resulting edges are used. Parallel implementation of the segmentation algorithm is discussed. The goodness of the technique is demonstrated through experimental studies.
The problem of locating an object in noisy optical sensor data arises in many applications. In the absence of noise, the object location normally coincides with the center of a two-dimensional blur. Thus one intuitively appealing estimate of the object location is the centroid of the area in which pixel intensities exceed a certain threshold. Before the centroid is computed, the intensity data have usually undergone a series of signal processing steps. Errors are introduced in signal processing through sampling, non-uniform responses among scanning detectors, read-out noise, and quantization. In this paper, we evaluate the accuracy of using the centroid to estimate the object position. We report simulated accuracy as a function of design parameters such as sampling rate, noise variance, quantizer resolution, and signal-to-noise ratio. We also report the derivation of the probability density function of the centroid assuming additive white Gaussian noise.
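A minimal Python sketch of the thresholded-centroid estimator discussed above; the threshold value, blur shape, and noise level are illustrative.

    import numpy as np

    def threshold_centroid(intensity, threshold):
        """Centroid (row, col) of pixels whose intensity exceeds threshold."""
        mask = intensity > threshold
        rows, cols = np.nonzero(mask)
        w = intensity[mask]                 # intensity-weighted centroid
        return (np.sum(rows * w) / w.sum(), np.sum(cols * w) / w.sum())

    # Synthetic blur centered at (20.0, 30.0) plus additive white Gaussian noise.
    rng = np.random.default_rng(1)
    r, c = np.mgrid[0:64, 0:64]
    img = np.exp(-((r - 20.0) ** 2 + (c - 30.0) ** 2) / 18.0)
    img += rng.normal(0.0, 0.02, img.shape)
    print(threshold_centroid(img, 0.1))     # close to (20.0, 30.0)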
In this paper, a model based upon automaton theory and used for pattern recognition is introduced. The main purpose of this model is to evaluate the performance of a pattern recognition computer and of an algorithm for pattern recognition. This model abstracts the main frame of new-generation pattern recognition computers and is defined as follows: M = (P1, W, B, P, P1′, S), where P1 is a set of input patterns; W is a set of strings representing a program or explanations for processing; B is a knowledge base storing rules and feature functions of fuzzy sets; P represents computer systems for pattern recognition; P1′ is a set of output patterns; and S is the processing results. By using this model, the performance of a pattern recognition computer and of an algorithm for pattern recognition can be easily evaluated.
In this paper, the usefulness of complex as well as ordinary moment features for recognizing similar objects with a tactile array sensor is explored. Some complex moment invariants are derived and implemented. With these moment invariants, the effects of lateral displacement and rotation can be eliminated from the tactile images. Through the generation of a decision tree and the use of the complex moment features, the shapes of similar objects sensed by the tactile array can be identified.
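As a concrete reference, a Python sketch of translation- and rotation-invariant moments of a tactile image: the first two classical Hu invariants built from normalized central moments, shown as a stand-in for the paper's complex moment invariants.

    import numpy as np

    def central_moment(img, p, q):
        r, c = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
        m00 = img.sum()
        rbar, cbar = (r * img).sum() / m00, (c * img).sum() / m00
        return ((r - rbar) ** p * (c - cbar) ** q * img).sum()

    def hu_first_two(img):
        """First two Hu invariants: unchanged by translation and rotation."""
        img = np.asarray(img, dtype=float)
        m00 = img.sum()
        eta = lambda p, q: central_moment(img, p, q) / m00 ** (1 + (p + q) / 2)
        phi1 = eta(2, 0) + eta(0, 2)
        phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
        return phi1, phi2

    # A binary "tactile image" of a rectangle, then the same shape rotated.
    img = np.zeros((32, 32)); img[10:20, 8:16] = 1.0
    print(hu_first_two(img))
    print(hu_first_two(np.rot90(img)))      # same values after rotation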
A smart rotational tactile sensor was designed and developed for use with a 2-axis robot gripper. The structure and performance of the tactile sensor/gripper is described. The tactile sensor forms one pad of the gripper and is free to rotate by virtue of using an optical technique for recording the tactile images. This arrangement permits monitoring of the workpiece position, orientation and possibly slippage when the workpiece is being rotated by the gripper pads. The application of a smart photodiode array and the method of SHADOWing to the processing of the tactile images is also considered.
A general approach to the calibration of sensor systems is presented. Calibration is defined as establishing a transformation from a set of sensor data in sensor coordinates to task data in a different coordinate system; this task data may be used to specify or correct robot motion, or to measure part features. A sensor system is defined as any collection of individual sensing elements. Typical elements include machine vision cameras, touch sensors, and proximity sensors. It is shown that combining many sensors into one system can increase sensing capability and accuracy. Calibration of such a system is generally more complex than that of a single sensing element. A graphical language, using dataflow diagrams, is used to define the calibration process. Using this language, a calibration diagram is constructed. This diagram identifies individual elements, clearly shows the data flow, and can be used to analyze the statistical properties of the calibration. Diagrams of standard constituent elements are presented, including camera and robot arm. This calibration methodology is derived from the experience of hundreds of working factory installations. Several examples of actual applications are presented, and different calibration methods for each are compared. The examples include robot guidance using fixed cameras and part location using proximity sensors.
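A minimal Python sketch of one common instance of such a calibration transformation: fitting an affine map from sensor (pixel) coordinates to task coordinates from corresponding point pairs by least squares. The affine model and the point data are illustrative assumptions.

    import numpy as np

    def fit_affine(sensor_pts, task_pts):
        """Least-squares affine map: task ≈ A @ sensor + b.
        Inputs are (n, 2) arrays of corresponding points."""
        S = np.hstack([sensor_pts, np.ones((len(sensor_pts), 1))])
        M, *_ = np.linalg.lstsq(S, task_pts, rcond=None)   # (3, 2)
        A, b = M[:2].T, M[2]
        return A, b

    # Calibration targets seen by a fixed camera vs. their robot coordinates.
    sensor = np.array([[10., 10.], [100., 12.], [50., 90.], [95., 85.]])
    task = sensor @ np.array([[0.5, 0.02], [-0.02, 0.5]]).T + [100., 200.]
    A, b = fit_affine(sensor, task)
    print(A, b)   # recovers the simulated camera-to-robot transform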
A robotic assembly cell is a hierarchically designed artifact for automatic assembly. Diagnosis of a robotic assembly cell requires failure recognition, cell representation, and reasoning. This paper describes robot cell representation and model-based causal reasoning for diagnosis. The robotic assembly cell is characterized by its assembly operations and physical environments. The assembly operations are modeled as asynchronous parallel processes and the physical environments are modeled as functional device units. A layered causal network is constructed to represent the causal relations of the robotic assembly cell, and model-based causal reasoning is performed for cell diagnosis with the aid of hierarchical reasoners. This approach can be applied easily to an existing robot cell and is not limited to any cell design architecture.
A model-based optical processor is introduced for the acquisition and tracking of a satellite in close proximity to an imaging sensor of a space robot. The type of satellite is known in advance, and a model of the satellite (which exists from its design) is used in this task. The model base is used to generate multiple smart filters of the various parts of the satellite, which are used in a symbolic multi-filter optical correlator. The output from the correlator is then treated as a symbolic description of the object, which is operated upon by an optical inference processor to determine the position and orientation of the satellite and to track it as a function of time. The knowledge and model base also serves to generate the rules used by the inference machine. The inference machine allows for feedback to optical correlators or feature extractors to locate the individual parts of the satellite and their orientations.
Integrating the modules of early vision, such as color, motion, texture, and stereo, is necessary to make a machine see. Parallel machines offer an opportunity to realize existing modules in a near real-time system; this makes system, and hence integration, issues crucial. Effective use of parallel machines requires analysis of control and communication patterns among modules. Integration combines the products of early vision modules into intermediate-level structures to generate semantically meaningful aggregates. Successful integration requires identifying critical linkages among modules and between stages. The Connection Machine is a fine-grained parallel machine, on which many early and middle vision algorithms have been implemented. Schemes for integrating vision modules on fine-grained machines are described. These techniques elucidate the critical information that must be communicated in early and middle vision to create a robust, integrated system.
This paper reports on a model-based object recognition system and its parallel implementation on the Connection Machine System. The goal is to be able to recognize a large number of partially occluded, two-dimensional objects in scenes of moderate complexity. In contrast to traditional approaches, the system described here uses a parallel hypothesize and test method that avoids serial search. The basis for hypothesis generation is provided by local boundary features (such as corners formed by intersecting line segments) that constrain an object's position and orientation. Once generated, hypothetical instances of models are either accepted or rejected by a verification process that computes each instance's overall confidence. Even on a massively parallel computer, however, the potential for combinatorial explosion of hypotheses is still of major concern when the number of objects and models becomes large. We control this explosion by accumulating weak evidence in the form of votes in position and orientation space cast by each hypothesis. The density of votes in parameter space is expected to be proportional to the degree to which hypotheses receive support from different local features. Thus, it becomes possible to rank hypotheses prior to verification and test more likely hypotheses first.
A control strategy for 2-D object recognition has been implemented on a hardware configuration which includes a Symbolics Lisp Machine (TM) as a front-end processor to a 16,384-processor Connection Machine (TM). The goal of this ongoing research program is to develop an image analysis system as an aid to human image interpretation experts. Our efforts have concentrated on 2-D object recognition in aerial imagery; specifically, the detection and identification of aircraft near the Danbury, CT airport. Image processing functions to label and extract image features are implemented on the Connection Machine for robust computation. A model matching function was also designed and implemented on the CM for object recognition. In this paper we report on the integration of these algorithms on the CM, with a hierarchical control strategy to focus and guide the object recognition task to particular objects and regions of interest in imagery. It will be shown that these techniques may be used to manipulate imagery on the order of 2k x 2k pixels in near-real-time.
Proc. SPIE 0848, Rapid Recognition Out Of A Large Model Base Using Prediction Hierarchies And Machine Parallelism, 0000 (19 February 1988); doi: 10.1117/12.942740
An object recognition system is presented to handle the computational complexity posed by a large model base, an unconstrained viewpoint, and the structural complexity and detail inherent in the projection of an object. The design is based on two ideas. The first is to compute descriptions of what the objects should look like in the image, called predictions, before the recognition task begins. This reduces actual recognition to a 2D matching process, speeding up recognition time for 3D objects. The second is to represent all the predictions by a single, combined IS-A and PART-OF hierarchy called a prediction hierarchy. The nodes in this hierarchy are partial descriptions that are common to views and hence constitute shared processing subgoals during matching. The recognition time and storage demands of large model bases and complex models are substantially reduced by subgoal sharing: projections with similarities explicitly share the recognition and representation of their common aspects. A prototype system for the automatic compilation of a prediction hierarchy from a 3D model base is demonstrated using a set of polyhedral objects and projections from an unconstrained range of viewpoints. In addition, the adaptation of prediction hierarchies for use on the UMass Image Understanding Architecture is considered. Object recognition using prediction hierarchies can naturally exploit the hierarchical parallelism of this machine.
We examine several image segmentation methods that are well-suited for implementation on SIMD computers. The pyramid segmentation algorithm of Burt, Hong, and Rosenfeld [1,2,3] was implemented in two different ways on the Connection Machine System. Timing results and comparisons of the methods are presented. Another algorithm, which makes better use of the data parallelism available on the CM, is discussed. This algorithm was implemented on an ordinary serial machine and its performance is compared with the pyramid algorithm of Cibulskis and Dyer [4] and with optimum thresholding.
An algorithm to perform automatic target detection has been implemented on the 16K processor Connection Machine at the Perkin-Elmer Advanced Development Center in Oakton, VA. The algorithm accepts as input a single black and white image together with the designation of a few training points from each of two categories termed interesting and uninteresting or target and background. Typically, the input image is an aerial view of vehicles on the ground with 64K pixels. The algorithm computes a five element feature vector at each pixel, and performs two-category classification at the first stage. The features employed are gray level, constant false alarm rate (CFAR) annulus sum, local average, Sobel edge operator, and the MAX-MIN texture measure. The classification process uses a Euclidean distance measure in five dimensional feature space. The second stage of processing uses a connected component algorithm to collect the interesting points into blobs. These blobs are then manipulated to eliminate isolated points. In the third and final stage, blob mensuration is performed to rule out blobs that are too large or too small. The algorithm executes on the CM 500 times faster than on a VAX 11/780.
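A compact Python sketch of the pipeline stages above under simplifying assumptions: per-pixel feature vectors classified by Euclidean distance to the two training-class means, then connected components collected and size-filtered. Computing the five features themselves (gray level, CFAR sum, local average, Sobel, MAX-MIN) is straightforward but omitted for brevity; the size thresholds are illustrative.

    import numpy as np
    from scipy.ndimage import label

    def classify_and_blob(features, target_mean, background_mean,
                          min_size=4, max_size=400):
        """features: (H, W, 5) array of per-pixel feature vectors."""
        # Stage 1: two-category Euclidean minimum-distance classification.
        d_t = np.linalg.norm(features - target_mean, axis=-1)
        d_b = np.linalg.norm(features - background_mean, axis=-1)
        interesting = d_t < d_b
        # Stage 2: collect interesting pixels into blobs.
        blobs, n = label(interesting)
        # Stage 3: mensuration — reject blobs that are too small or too large.
        keep = np.zeros_like(interesting)
        for i in range(1, n + 1):
            size = np.sum(blobs == i)
            if min_size <= size <= max_size:
                keep |= blobs == i
        return keep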
One approach to object recognition is the matching of two-dimensional contours, which are obtained from the projection of a three-dimensional model, with aggregates of lines extracted from an image. It is necessary to define geometric shape features which aid in the matching and can be used to compute a confidence measure for the match. Some of the standard features include curvature maxima and minima, points of inflection, trihedral vertices, and T-junctions. There has not been much evidence that global transforms such as Fourier series or symmetric axis transform make the solution any easier. What is needed is a hierarchical description which includes smooth curve segments and the types of junctions between them. A geometric grouping process is described which might be able to produce symbolic tokens in an image which could be matched hierarchically with a description from the model.
This paper describes the implementation of a hierarchical multi-resolution algorithm for the computation of dense displacement fields on the Connection Machine. The algorithm uses the pyramid representation of the images and a coarse-to-fine matching strategy. At each level of processing, a confidence measure is computed for the match of each pixel, and a smoothness constraint is used to propagate the reliable displacements to their less reliable neighbors. The focus of this implementation is the use of the Connection Machine for pyramid processing and the implementation of the coarse-to-fine matching strategy. It will be shown that this technique can be used to successfully match pairs of real images in near real-time.
Adaptive image processing schemes can be classified as open-loop, input sensing, invariant-expectation, and model reference systems. Two major adaptive image processing system mechanisms, processing status measurement and parameter adjustment, are described, and a multi-resolution approach is developed. The multi-resolution schemes allow efficient adaptive image processing implementation by enabling coarse-to-fine parameter (operation flow) adjustment in both image and parameter domains. The adaptability and robustness of these techniques are demonstrated on morphologically segmented objects from actual laser radar (range) data.
The development of an autonomous mobile platform vision system that can adapt to a variety of surroundings by modifying its current memory is an ambitious goal. We believe that to achieve such an ambitious goal it is necessary to look at areas that may seem unconventional to some researchers. Such an area is associative memory. For an autonomous robotic vision system to function adaptively it must be able to respond to a wide variety of visual stimuli, sort out what is new or different from previously stored information, and update its memory taking this new information into account. To compound the problem, this procedure should be invariant to the scale of objects within the scene and, to some degree, to rotations as well. With this in mind we can identify two main functions that are desirable in such a visual system: 1) the ability to identify novel items within a scene; and 2) the ability to adaptively update the system memory. The need for these functions has led to the investigation of a class of filters called novelty filters. By use of a coordinate transformation it is possible to specify novelty filters that are invariant to scale and rotational changes. Further, it is then possible to postulate an adaptive memory equation which reflects the adaptive novelty filter for a multiple-channel pattern recognition system. This paper, while not all-inclusive, is meant to stimulate further interest as well as report preliminary simulation and mathematical results.
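A minimal Python sketch of a linear novelty filter in the sense of Kohonen, offered as a plausible reference point rather than the paper's exact formulation: stored patterns span a subspace, the filter output is the component of a new input orthogonal to that subspace, and the memory update stores inputs whose novelty is large. The threshold rule is an illustrative choice.

    import numpy as np

    class NoveltyFilter:
        """Outputs the part of an input orthogonal to stored memory."""
        def __init__(self, n_dim):
            self.memory = np.empty((n_dim, 0))

        def novelty(self, x):
            M = self.memory
            if M.shape[1] == 0:
                return x
            # Orthogonal projection onto the complement of span(memory).
            return x - M @ np.linalg.pinv(M) @ x

        def update(self, x, threshold=0.1):
            nov = self.novelty(x)
            if np.linalg.norm(nov) > threshold * np.linalg.norm(x):
                # Adaptive memory update: store the novel component.
                self.memory = np.hstack([self.memory, nov[:, None]])
            return nov

    nf = NoveltyFilter(4)
    nf.update(np.array([1., 0., 0., 0.]))
    print(nf.novelty(np.array([2., 0., 1., 0.])))   # -> [0. 0. 1. 0.]

Scale and rotation invariance, as the abstract notes, would be obtained by applying a coordinate transformation to the input before filtering.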
Proc. SPIE 0848, ART 2: Self-Organization Of Stable Category Recognition Codes For Analog Input Patterns, 0000 (19 February 1988); doi: 10.1117/12.942747
Adaptive resonance architectures are neural networks that self-organize stable pattern recognition codes in real time in response to arbitrary sequences of input patterns. This article introduces ART 2, a class of adaptive resonance architectures which rapidly self-organize pattern recognition categories in response to arbitrary sequences of either analog or binary input patterns. In order to cope with arbitrary sequences of analog input patterns, ART 2 architectures embody solutions to a number of design principles, such as the stability-plasticity tradeoff, the search-direct access tradeoff, and the match-reset tradeoff. In these architectures, top-down learned expectation and matching mechanisms are critical in self-stabilizing the code learning process. A parallel search scheme updates itself adaptively as the learning process unfolds, and realizes a form of real-time hypothesis discovery, testing, learning, and recognition. After learning self-stabilizes, the search process is automatically disengaged. Thereafter input patterns directly access their recognition codes without any search. Thus recognition time for familiar inputs does not increase with the complexity of the learned code. A novel input pattern can directly access a category if it shares invariant properties with the set of familiar exemplars of that category. An attentional vigilance parameter determines how fine the categories will be. If vigilance increases (decreases) due to environmental feedback, then the system automatically searches for and learns finer (coarser) recognition categories. Gain control parameters enable the architecture to suppress noise up to a prescribed level. The architecture's global design enables it to learn effectively despite the high degree of nonlinearity of such mechanisms.
In this paper we introduce a new "neural" network for pattern recognition based on a gradient system. It does not, however, attempt to model any known behavior of biological neurons. This network stores any number of non-binary patterns (as its limit points) and retrieves them by associative recall. The network does not suffer from erroneous limit points. A realization of the network is given, which has heavily interconnected computing units. Finally, two network examples are discussed.
Proc. SPIE 0848, Teaching Artificial Neural Systems To Drive: Manual Training Techniques For Autonomous Systems, 0000 (19 February 1988); doi: 10.1117/12.942749
We have developed a methodology for manually training autonomous control systems based on artificial neural systems (ANS). In applications where the rule set governing an expert's decisions is difficult to formulate, ANS can be used to extract rules by associating the information an expert receives with the actions he takes. Properly constructed networks imitate rules of behavior that permit them to function autonomously when they are trained on a spanning set of possible situations. This training can be provided manually, either under the direct supervision of a system trainer, or indirectly using a background mode in which the network assimilates training data as the expert performs his day-to-day tasks. To demonstrate these methods we have trained an ANS network to drive a vehicle through simulated freeway traffic.
This paper discusses pattern recognition using a learning system which can learn an arbitrary function of the input and which has built-in generalization with the characteristic that similar inputs lead to similar outputs even for untrained inputs. The amount of similarity is controlled by a parameter of the program at compile time. Inputs and/or outputs may be vectors. The system is trained in a way similar to other pattern recognition systems using an LMS rule. Patterns in the input space are not separated by hyperplanes in the way they normally are using adaptive linear elements. As a result, linear separability is not the problem it is when using Perceptron or Adaline type elements. In fact, almost any shape category region is possible, and a region need not be simply connected nor convex. An example is given of geometric shape recognition using as features autoregressive model parameters representing the shape boundaries. These features are approximately independent of translation, rotation, and size of the shape. Results in the form of percent correct on test sets are given for eight different combinations of training and test sets derived from two groups of shapes.
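For reference, a minimal Python sketch of LMS (Widrow-Hoff) training of the kind referred to above, on a generic vector-input, vector-output mapping; the learning rate, epoch count, and single linear layer are illustrative simplifications of the paper's system.

    import numpy as np

    def lms_train(X, Y, lr=0.05, epochs=200):
        """LMS rule: W <- W + lr * outer(input, target - output)."""
        n_in, n_out = X.shape[1], Y.shape[1]
        W = np.zeros((n_in, n_out))
        for _ in range(epochs):
            for x, y in zip(X, Y):
                err = y - x @ W          # error for this training pair
                W += lr * np.outer(x, err)
        # Weights converge toward the least-mean-square solution.
        return W

    # Vector inputs and vector outputs, as the abstract allows.
    X = np.array([[1., 0.], [0., 1.], [1., 1.]])
    Y = np.array([[1., 0.], [0., 1.], [1., 1.]])
    W = lms_train(X, Y)
    print(X @ W)    # approaches Y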
Bus automata (BA's) are arrays of automata, each controlling a module of a global interconnection network, an automaton and its module constituting a cell. Connecting modules permits cells to become effectively nearest neighbors even when widely separated. This facilitates parallelism in computation far in excess of that allowed by the "bucket-brigade" communication bottleneck of traditional cellular automata (CA's). Distributed information storage via local automaton states permits complex parallel data processing for rapid pattern recognition, language parsing and other distributed computation at systolic array rates. Global BA architecture can be entirely changed in the time to make one cell state transition. The BA is thus a neural model (cells correspond to neurons) with network plasticity attractive for brain models. Planar (chip) BA's admitting optical input (phototransistors) become powerful retinal models. The distributed input pattern is optically fed directly to distributed local memory, ready for distributed processing, both "retinally" and cooperatively with other BA chips ("brain"). This composite BA can compute control signals for output organs, and sensory inputs other than visual can be utilized similarly. In the BA, the retina is essentially brain, as in mammals (retina and brain are embryologically the same). The BA can also model opto-motor response (frogs, insects) or sonar response (dolphins, bats), and is proposed as the model of choice for the brains of future intelligent robots and for computer eyes with local parallel image processing capability. Multidimensional formal languages are introduced, corresponding to BA's and patterns the way generative grammars correspond to sequential machines, and applied to fractals and their recognition by BA's.
An identification of the hidden variables of quantum mechanics [1] is made. A theory embodying a unitary description of mind and matter is sketched. A novel interpretation of neural network architecture and function is formulated.
The storage capacity, noise performance, and synthesis of associative memories for image analysis are considered. Associative memory synthesis is shown to be very similar to that of linear discriminant functions used in pattern recognition. These lead to new associative memories and new associative memory synthesis and recollection vector encodings. Heteroassociative memories are emphasized in this paper, rather than autoassociative memories, since heteroassociative memories provide scene analysis decisions, rather than merely enhanced output images. The analysis of heteroassociative memories has been given little attention. Heteroassociative memory performance and storage capacity are shown to be quite different from those of autoassociative memories, with much more dependence on the recollection vectors used and less dependence on M/N. This allows several different and preferable synthesis techniques to be considered for associative memories. These new associative memory synthesis techniques and new techniques to update associative memories are included. We also introduce a new SNR performance measure that is preferable to conventional noise standard deviation ratios.
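A minimal Python sketch of one standard heteroassociative synthesis consistent with the linear-discriminant analogy above: the pseudoinverse (least-squares) memory M = Y X⁺, which maps each stored key exactly to its recollection vector when the keys are linearly independent. The data and the binary recollection encoding are illustrative.

    import numpy as np

    def hetero_memory(keys, recollections):
        """Pseudoinverse associative memory: M @ key ≈ recollection.
        keys: (n, N) stored inputs; recollections: (n, M) outputs."""
        X = np.asarray(keys, dtype=float).T            # (N, n)
        Y = np.asarray(recollections, dtype=float).T   # (M, n)
        return Y @ np.linalg.pinv(X)

    # Three stored scenes mapped to 2-bit class decisions, not output images.
    rng = np.random.default_rng(2)
    keys = rng.normal(size=(3, 16))
    recollections = np.array([[1, 0], [0, 1], [1, 1]])
    M = hetero_memory(keys, recollections)
    noisy = keys[0] + rng.normal(0.0, 0.1, 16)
    print(np.round(M @ noisy))   # -> [1. 0.], the class decision

The output here is a decision vector rather than an enhanced image, which is the distinction the abstract draws between heteroassociative and autoassociative memories.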
A method for 3-D image understanding based on line-sequence matching is presented in this paper. It consists of four steps: (1) detecting edges with template matching operators, then thinning them with a tracing algorithm and fitting straight lines by minimum squared error; (2) computing the similarity of line sequences and of the intensity in the interval between paired edge lines using fuzzy algorithms, so that the matching of lines is optimized; (3) determining corresponding vertices in two images of an object with the pseudo-inverse method and the constraint of matched lines; (4) obtaining the 3-D coordinates of the vertices by means of geometric computations.
Many global shape recognition techniques, such as moments and Fourier Descriptors, are used almost exclusively with two-dimensional images. It would be desirable to extend these global shape recognition concepts to three dimensional images. Specifically, the concepts associated with Fourier Descriptors will be extended to both three dimensional object representation and recognition and the representation and recognition of objects which are described by depth data. With Fourier Descriptors, two dimensional shape boundaries are described in terms of a set of complex sinusoidal basis functions. Extending this concept to three dimensions, the surface of a shape will be described in terms of a set of three-dimensional basis functions. The basis functions which will be used are known as spherical harmonics. Spherical harmonics can be used to describe a function on the surface of the unit sphere. In this application, the function on the unit sphere will describe the shape to be represented. The representation presented here is restricted to the class of objects for which each ray from the origin intersects the surface of the object only once. Basic definitions and properties of spherical harmonics will be discussed. A distance measure for shape discrimination will be derived as a function of the spherical harmonic coefficients for two shapes. The question of representation of objects described by depth data will then be addressed. A functional description for the objects will be introduced, along with methods of normalizing the spherical harmonic coefficients for scale, translation, and orientation so that meaningful library comparisons might be possible. Classification results obtained with a set of simple objects will be discussed.
In this paper, we propose a method of recognition using depth map data directly. The method is particularly suitable for recognition of objects with irregular shapes. A 3-D object is represented by a number of surface patches called subtemplates. The surface patches are extracted directly from the depth maps of the object using a rotationally invariant spherical window of constant radius. To facilitate matching, a surface patch is represented by a number of closed contours formed by the intersections of concentric spheres of different radii with the patch. Experimental results are quite good and the method has been used successfully for the recognition of partially occluded 3-D objects.
Superquadrics are a volumetric primitive which can model many objects ranging from cubes to spheres to octahedrons to 8-pointed stars and anything in between. They can also be stretched, bent, tapered, and combined with boolean operations to model a wide range of objects. A restricted class of these has been used as the basic primitives of a volumetric modeling system developed at SRI. At Columbia, we are interested in using superquadrics as model primitives for computer vision applications because they are flexible enough to allow modeling of many objects, yet they can be described by a small (5-14) number of parameters. In this paper, we discuss our research into the recovery of superellipsoids (a restricted class of superquadrics) from 3-D information, in particular range data. We recall the formulation of superellipsoids in terms of their inside-out function, which divides 3-space into regions inside the volume, on the boundary, and outside the volume. Using this function, we employ a nonlinear least-squares minimization technique to recover the parameters. We discuss both the advantages of this technique and some of its major drawbacks. Examples are presented, using both synthetic and actual range data, where the system successfully recovers negative superquadrics, and superquadrics from sparse data, including synthetically generated sparse data from multiple viewpoints. While the system was successful in recovering the examples presented, there are some obvious problems. One of these is the relationship between the inside-out function and the true least-squares distance of the data from the recovered model. We discuss this relationship for three different functions based on the inside-out function.
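A short Python sketch of the superellipsoid inside-out function and the residual used for recovery. The axis-aligned five-parameter form (a1, a2, a3, e1, e2), the F − 1 cost, and the bounds are common simplifying choices; the pose parameters the paper also recovers are omitted.

    import numpy as np
    from scipy.optimize import least_squares

    def inside_out(params, pts):
        """F < 1 inside, F = 1 on the surface, F > 1 outside."""
        a1, a2, a3, e1, e2 = params
        x, y, z = np.abs(pts.T)   # axis-aligned superellipsoid assumed
        return ((x / a1) ** (2 / e2) + (y / a2) ** (2 / e2)) ** (e2 / e1) \
            + (z / a3) ** (2 / e1)

    def recover(pts):
        # Residual F - 1 is zero when a range point lies on the surface
        # (this is not the true Euclidean distance, as the paper notes).
        res = lambda p: inside_out(p, pts) - 1.0
        x0 = np.array([1.5, 0.8, 1.2, 1.2, 0.8])
        bounds = ([0.1] * 5, [10.0, 10.0, 10.0, 2.0, 2.0])
        return least_squares(res, x0, bounds=bounds).x

    # Points on a unit sphere should recover a1=a2=a3=1, e1=e2=1.
    rng = np.random.default_rng(3)
    v = rng.normal(size=(200, 3))
    pts = v / np.linalg.norm(v, axis=1, keepdims=True)
    print(recover(pts))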
This paper describes an approach to 3-D surface reconstruction using orientation map and sparse depth map information. The approach integrates the information provided by two different sources: stereo vision and local shading analysis. In our scheme the sparse depth map, obtained by a binocular stereo technique, provides an estimate of surface shape that can be refined by local shading information (an orientation map) extracted from one of the stereo pair's intensity images. The integration process consists of two phases. In the first, the scene is segmented into connected regions by means of the raw needle map. In the second, the surface interpolation is obtained using information extracted from the segmentation process and the sparse depth map. The result of the integrated approach is a good-quality dense depth map. The functionality of the whole approach has been tested on synthetic data. We are now analyzing its applicability to real data.
In this paper, we develop an orientation-independent identification technique for three-dimensional surface maps or range images. Given the range image of an object, it is decomposed into orientation-independent patches using the sign of Gaussian curvature. A relational graph is then set up such that a node represents a patch and an edge represents the adjacency of two patches. The identification of the object is achieved by matching its graph representation to a number of model graphs. The matching is performed by employing the best-first search strategy. Examples of real range images show the merit of our technique.
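A small Python sketch of the patch-label computation implied above: the sign of Gaussian curvature of a range image z(x, y) from its first and second partial derivatives (the Monge-patch formula). The smoothing, the np.gradient derivatives, and the flatness threshold are illustrative choices.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_curvature_sign(z, sigma=2.0):
        """Per-pixel sign of K: +1 elliptic, -1 hyperbolic, 0 flat/parabolic."""
        z = gaussian_filter(z.astype(float), sigma)   # suppress range noise
        zy, zx = np.gradient(z)
        zyy, zyx = np.gradient(zy)
        zxy, zxx = np.gradient(zx)
        # Monge patch: K = (zxx*zyy - zxy^2) / (1 + zx^2 + zy^2)^2
        K = (zxx * zyy - zxy ** 2) / (1 + zx ** 2 + zy ** 2) ** 2
        return np.sign(np.where(np.abs(K) < 1e-8, 0.0, K))

    # A bowl (elliptic, K > 0) vs. a saddle (hyperbolic, K < 0).
    y, x = np.mgrid[-1:1:64j, -1:1:64j]
    print(gaussian_curvature_sign(x * x + y * y)[32, 32])   # +1 (elliptic)
    print(gaussian_curvature_sign(x * x - y * y)[32, 32])   # -1 (hyperbolic)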
In this paper we present a method to segment a range image into regions which correspond to different object surfaces in the scene. We first obtain an equidistance contour map of the range image by slicing the range image at fixed distance increments. Pixels along a contour are all at about the same distance from the sensor. We have observed that whenever a contour crosses an object surface edge, we see a direction discontinuity, curvature discontinuity, curvature zero-crossing, or termination of the contour. We call these places the critical points of the contour. We divide a contour into segments at its critical points. Next, we find the two corresponding contour segments on two consecutive slices. Every pair of corresponding contour segments defines a small region in the range image. Thus, through registering contour segments in consecutive slices, we partition the range image into many small regions. Each region corresponds to a portion of an object surface. The last step is to merge these small regions into larger areas based on whether or not the corresponding scene surface segments of two adjacent regions have similar orientations in 3-D space. The range image segmentation process is completed when the merging process is done. This approach is fast because it analyzes only the pixels along the equidistance contours, and the entire process can be completed in just one pass.
A segmentation technique for range images based on the Fourier transform is presented. It allows the extraction of planar and quadric surfaces using a simple data coding. The method described is global (it does not require local operators for classification), robust to noise, and easy to implement. Recognition procedures are also discussed.
Stereo permits recovery of information about the three-dimensional location of objects which is not contained in any single image. Applications of techniques where stereo plays a primary or ancillary role include such areas as video display systems, human vision, computer vision, automatic tracking, and cartography. In this paper, we have selected the area of video display systems to provide an insight into the importance of stereo. The additional bibliography is intended to acquaint the non-specialist with this burgeoning field.
Developed herein is a formal theory of stereo vision which unifies existing stereo methods and predicts a large variety of stereo methods not yet explored. The notion of "stereo" is redefined using terms which are both general and precise, giving stereo vision a broader and more rigorous foundation. The variations in imaging geometry between successive images used in parallax stereo and conventional photometric stereo techniques are extended to stereo techniques which involve variations of arbitrary sets of physical imaging parameters. Physical measurement of visual object features is defined in terms of solution loci in feature space arising from constraint equations that model the physical laws that relate the object feature to specific image features. Ambiguity in physical measurement results from a solution locus which is a subset of feature space larger than a single measurement point. Stereo methods attempt to optimally reduce the ambiguity of physical measurement by intersecting solution loci obtained from successive images. A number of examples of generalized stereo techniques are presented. This new conception of stereo vision offers a new perspective on many areas of computer vision, including areas that have not been previously associated with stereo vision (e.g., color imagery). As the central focus of generalized stereo vision methods is on measurement ambiguity, mathematical developments are presented that characterize the "size" of measurement ambiguity as well as the conditions under which disambiguation of a solution locus takes place. The dimension of measurement ambiguity at a solution point is defined using the structure of a differentiable manifold, and an upper bound is established using the implicit function theorem. Inspired by the Erlanger program of F. Klein, generalized stereo methods are equivalently described by the algebraic interaction of the symmetry group of automorphisms (i.e., bijections) of feature space into itself leaving a measurement solution locus invariant, with the set of automorphisms of feature space induced by arbitrary variations of a set of physical parameters. A purely group-theoretic characterization of the conditions under which measurement disambiguation takes place is given.
Scene analysis requires surface information in the form of depth to be computed at all points in the image. Of the several cues available to compute depth, retinal disparity has proven to be the most reliable, and hence numerous stereo algorithms have been reported. A class of these algorithms, known as feature-based, computes the disparity only at the edge locations in the image. Because we need depth at all points in the image, this sparse data must be used to estimate depth everywhere. While this problem could be posed as a multivariate minimization problem, as Grimson suggested, the weighted-sum scheme proposed by Shepard to interpolate sparse data in the geophysical domain seems to be a more computationally affordable scheme. A few interesting niceties of this scheme are: (i) the interpolant is analytic everywhere except near the data points, where it is merely continuous (not even once differentiable); (ii) its similarity to familiar gravitational models; and (iii) its elegant biological feasibility. In addition, the derivative information obtained from other cues, such as shading, can be gracefully combined to present a unified percept of surface information. In this paper we discuss the use of a local version of this scheme to interpolate the stereo data.
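A minimal Python sketch of Shepard's inverse-distance-weighted interpolation of sparse disparity/depth samples; the power p = 2 and the global (all-samples) form are illustrative, whereas the paper uses a local version that restricts the sum to nearby data points.

    import numpy as np

    def shepard(xy_samples, z_samples, xy_query, p=2.0):
        """Inverse-distance-weighted (Shepard) interpolation.
        xy_samples: (n, 2) edge locations with known depths z_samples."""
        d = np.linalg.norm(xy_query[None, :] - xy_samples, axis=1)
        if np.any(d == 0):                  # exactly at a data point
            return z_samples[np.argmin(d)]
        w = 1.0 / d ** p                    # gravitational-style weights
        return np.sum(w * z_samples) / np.sum(w)

    # Sparse depths at edge pixels, queried at an interior grid point.
    xy = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
    z = np.array([1.0, 2.0, 2.0, 3.0])
    print(shepard(xy, z, np.array([5., 5.])))   # -> 2.0 by symmetry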
A feature-based stereo vision technique is described in this paper where curve-segments are used as the feature primitives in the matching process. The local characteristics of the curve-segments are extracted by the Generalized Hough Transform (R-table) representation of the curve-segment. The left image and the right image are first filtered by using several Laplacian-of-Gaussian (∇²G) operators of different widths. At each channel, the Generalized Hough Transform of each curve-segment in the left and the right image is evaluated. This is done by calculating the R-table representation of each curve-segment based upon the centroid of the curve-segment. The R-table, curve-length, and the average gradient of the curve are used as a local feature vector in representing the distinctive characteristics of the curve-segment. The feature vector of each curve-segment is used as a constraint to find an instance of the same curve-segment in the right image. The epipolar constraint on the centroids of the curve-segment is used to limit the searching space in the right image. A relational graph is formed from the left image by treating the centroids of the curve-segments as the nodes of the graph. The local features of the curve-segments are used to represent the local properties of the nodes, and the relationship between the nodes represents the structural properties of the object in the scene. A similar graph is also formed from the right image curve-segments. Sub-graph isomorphism is then established between the two graphs by using the epipolar constraint on the centroids, the local properties of the nodes (node assignment), and the structural relationship (compatibility) between the nodes.
In the present state of the art, the importance of artificial vision in robotics and industry, and the necessity of robot presence in hostile environments, no longer need to be proved. Existing vision systems are application dependent. There are three main classes of stereo vision systems. Laser imaging is potentially hazardous, has difficulty with shiny reflective metal surfaces, and is at present a more expensive depth-sensing technology than the other methods stated below. Photometric stereo puts great demands on the illumination of the scene and on properly understanding the reflectance properties of the object being viewed. Binocular stereo vision, however, can be used over a wide range of illuminations and object domains; it is a well-understood method, and its low cost motivates its use in a general robotics environment, despite the difficulties encountered in putting the two images of the stereo pair into correspondence. This paper presents a binocular stereo vision system, applied to polyhedral objects, which first performs feature extraction and determination of the 3-D coordinates of the vertices, and then performs recognition of objects that have already been modeled in a database and characterized in an appropriate knowledge base. The principal operations performed by our system are: image processing (segmentation, edge extraction and idealization, skeleton, ...); object location (vertex extraction and the determination of their 3-D coordinates, ...); and object recognition (pertinent feature extraction, knowledge base establishment, and object identification). Experimental results have already been obtained in our laboratory.
Proc. SPIE 0848, Estimating The Three-Dimensional Motion Of A Rigid Planar Patch In The Hough Parameter Space, 0000 (19 February 1988); doi: 10.1117/12.942779
We develop new methods for estimating the three-dimensional general motion (rotation and translation) parameters of a rigid planar patch from two-dimensional perspective views at two time instants. The proposed method requires line correspondences between images in the Hough parameter space. With the Hough transform, the extraction of line features from scenes is simple; and because the Hough transform is not severely affected by random noise, least-squares solutions may not be necessary, so the dimensionality of the system to be solved can be kept small. We find that in the case of pure translation, three line correspondences are necessary to yield unique, linear solutions. A relative depth map of the object space can also be obtained. For the case of pure rotation, three line correspondences are also necessary to yield unique, linear solutions, and the object lines do not have to lie in a planar patch. It is not possible, however, to obtain a relative depth map of the object space. For the general case, four correspondences are needed to solve for the motion parameters, and the solution is not linear. Moreover, a relative depth map of the object space can be obtained.
In the computer vision literature, the vision model used most frequently has incorporated Monge surfaces and either orthographic or planar perspective. In recent years, a vision model based on spherical surfaces and spherical perspective has arisen as an alternative that avoids the limitations of these standard models. In this paper we discuss the use of the spherical vision model in the study of optical flow for smooth surfaces.
By analyzing the evolution of an image sequence, some geometric properties are introduced. These properties lead to a prediction scheme for determining the correspondence of feature points in an image sequence.