An attractive feature space (chord distributions) for pattern recognition is discussed. New advancements presented are: extensions to 3-D in-plane and out-of-plane distortion-invariant object recognition; new techniques to allow estimation of in-plane distortion parameters; and a new technique to achieve class estimation in the presence of multiple distortions. Quantitative results are provided for a ship data base (for out-of-plane distortions) and for an aircraft data base (for in-plane distortions).
This paper deals with the issue of conjoint resolution as it pertains to joint spatial/spatial-frequency representations such as the Difference of Gaussians (DOG) and Gabor representations, as well as the Wigner distribution. We define what the conjoint resolution of a joint representation is, how one could measure it, why one would wish to use a representation that maximizes it, and finally, how high conjoint resolution is obtained, in practice, by selecting an appropriate joint image representation.
A novel feature extraction method, useful for 2-D shape description, is proposed. It is based on an optimal representation of a 1-D signal in the space-spatial frequency domain, the Wigner distribution. For shape classification, one of the many 1-D representations of the 2-D contours is employed. Boundary features, or shape descriptors, are obtained using singular value decomposition of the Wigner distribution (WD). Properties of WD singular values are presented and shown to encode certain shape features such as the space-bandwidth product, the shape complexity in terms of the number of components and their spacing, and the spatial frequency vs. space dependence. The singular values of the boundary Wigner distribution possess all the properties required of good shape descriptors. To illustrate the effectiveness of these descriptors in shape classification, a number of examples are presented. The proposed method is useful for robust classification of any 1-D patterns.
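The pipeline this abstract describes — form a Wigner distribution of a 1-D boundary signal, then take its singular values as descriptors — can be sketched as follows. The discrete pseudo-Wigner form, the toy contour signature, and the normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pseudo_wigner(x):
    """Discrete pseudo-Wigner distribution of a real 1-D signal (illustrative)."""
    N = len(x)
    W = np.zeros((N, N), dtype=complex)
    for n in range(N):
        # symmetric lag products x[n+m] * x[n-m], zero outside the signal
        r = np.zeros(N, dtype=complex)
        for m in range(-(N // 2), N // 2):
            if 0 <= n + m < N and 0 <= n - m < N:
                r[m % N] = x[n + m] * np.conj(x[n - m])
        W[n] = np.fft.fft(r)          # FFT over the lag variable
    return W.real                     # WD of a real signal is real

def shape_descriptors(boundary, k=5):
    """First k singular values of the WD, normalized, used as shape features."""
    W = pseudo_wigner(boundary)
    s = np.linalg.svd(W, compute_uv=False)
    return s[:k] / s[0]               # normalize for scale invariance

# a toy closed contour: radius-vs-angle signature of a rounded square
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
sig = 1.0 + 0.2 * np.cos(4 * t)
d = shape_descriptors(sig)
```

Since `np.linalg.svd` returns singular values in descending order, the descriptor vector is monotone non-increasing with leading value 1, a convenient canonical form for comparing boundaries of different energy.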
A set of 2-D multi-resolution, rotation-invariant operators is developed. These operators are based on 1-D projection functions which form a basis set for local image patterns. The operators are complex with the magnitude rotation-invariant and the phase carrying directional information. Convolution of an operator with an image yields a complex output image containing magnitude and phase information. Application of a set of such operators at different resolutions to an image yields a set of features which may then be used for classification and phase analysis. The operators are evaluated with respect to their ability to perform texture analysis. Texture classification of four structurally similar textures is performed with better than 90% accuracy of interior regions. Also demonstrated is the ability of the operators to provide orientation information about textures.
The exponential non-uniform to uniform hardwired spatial coordinate transformation inherently imbeds an algorithm in the hardware architecture and has thus been called an algotecture. It has been suggested that the algotecture described may be more sensitive to centroid pointing errors than conventional cartesian grid structures. Simulation results for crosscorrelation template matching in the algotecture space, as opposed to standard rectilinear coordinate space, are presented for images of annuli with various centroid mismatch. These simulations support the claim that the algotecture mapping is less sensitive to centroid mismatch. The use of template matching on an algotecture-mapped grey scale image shows the feasibility of using this technique on more complex images. Since crosscorrelation is a relatively time-consuming operation, a sliding window differencing similarity measure is proposed to accomplish fast detection in the algotecture-mapped space directly at the sensor level. Coupling this idea with the classification of objects via the formation of orthogonal feature vectors, contained in separate spatial frequency channels constrained by human visual system physiological data, provides a fast method of object classification that exploits the fact that different features occur in different spatial frequency bands. Finally, the use of a three spatial frequency bandpass feature extraction filter system useful for an intra-class, inter-class, and membership identification classification scheme is discussed.
This paper proposes an approach to waveform coding that generates a family of one dimensional waveforms from a two dimensional image, transform codes the waveforms by fitting a low rank approximation to them, and codes the approximation parameters into variable length binary codes that reflect the probabilistic structure of the image. We argue that contour lines are the most natural one dimensional lines to pass through an image. As readers of topographic maps know, a relatively small number of them may be used to transmit relevant information about an image. Furthermore, contours organise the image data into pixel classes that are characterised by smooth connecting lines. It is reasonable to assume that these contour lines can be represented by very low rank models. The distribution of the approximating parameters in the model is then used to derive a coding scheme. In this way first and second order information is used to compress the image data. The first order information, the data itself, is low rank approximated. The second order information, typically represented by the correlation structure of the image, is used to derive the probabilistic structure of the approximation parameters so that probable parameters may be coded into short words and improbable ones into long words. As contours tend to be shaped very much like adjacent contours, as in the contouring of a peak or valley on a topographic map, it is reasonable to assume that the correlation between contours can be exploited to reduce further the number of bits required to represent a scene.
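The low-rank step can be illustrated with a truncated SVD over a family of mutually similar 1-D waveforms; the rank choice and the synthetic "contours" below are assumptions for illustration, not the paper's coder.

```python
import numpy as np

def low_rank(waveforms, r):
    """Best rank-r approximation (in least squares) of the stacked waveforms."""
    U, s, Vt = np.linalg.svd(waveforms, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]       # truncated reconstruction

# adjacent "contour lines" that are smooth and mutually similar compress well:
# 20 phase-shifted sinusoids span an exactly two-dimensional space
t = np.linspace(0, 2 * np.pi, 100)
contours = np.array([np.sin(t + 0.1 * k) for k in range(20)])
approx = low_rank(contours, 2)
err = np.abs(contours - approx).max()        # near machine precision here
```

Only the rank-r factors need to be quantized and entropy coded, which is the sense in which a few smooth, mutually correlated contour lines carry the image cheaply.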
Two capabilities that future generation cruise missiles may have include retargeting and damage assessment. This paper discusses three key areas required to implement these capabilities (missile-borne sensors, software and communications). It addresses the impact of a fixed versus a movable or mobile target structure on the hardware and software needed, and provides a time estimate to implement these capabilities on cruise missiles. A brief discussion is also given on the extension of these techniques to (smaller) remotely piloted vehicles and ballistic missiles.
A 2D, binary vision algorithm for contour description is presented. The contour coding scheme is based on a "reduced generalized chain code". The most interesting feature of the resulting 4-bit code is its length preserving behaviour. Preliminary results for contour segmentation are presented, but this problem is still under investigation.
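The paper's 4-bit reduced generalized chain code is not reproduced in the abstract; for orientation, the conventional 8-direction Freeman chain code that such schemes build on can be sketched as follows.

```python
# Standard 8-direction Freeman chain code for an 8-connected contour.
# (The "reduced generalized chain code" itself is the paper's contribution
# and is not reconstructed here; this is the conventional baseline scheme.)
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(points):
    """Encode a list of adjacent contour pixels as Freeman direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRS.index((x1 - x0, y1 - y0)))
    return codes

def decode(start, codes):
    """Reconstruct the contour from a start pixel and its chain code."""
    pts = [start]
    for c in codes:
        dx, dy = DIRS[c]
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)        # one direction code per boundary step
```

Because each code represents a unit (or diagonal) step, the code sequence is exactly invertible given the start pixel, which is the kind of length-preserving behaviour the abstract highlights.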
An automated, or "smart", surveillance system must be sensitive to small object motion wherever it may occur within a large field of view. The system must also be capable of distinguishing changes of interest from other image activity or noise. Yet the data processing capabilities of practical systems are often quite limited. To achieve these performance objectives at a low data rate, a pyramid-based image preprocessor has been constructed that can compute frequency tuned "change energy" measures in real time. A microprocessor then examines a relatively small set of these measures and follows a foveal search strategy to isolate moving objects for tracking or for more detailed analysis.
In order to accurately describe and detect motion, a high sensitivity to temporal discontinuities in intensity must be maintained over a wide range of velocities. We present a highly parallel computing structure that optimizes the detection of these discontinuities. The architecture consists of three hierarchically organized, two-dimensional layers of hexagonal operators with overlapping receptive fields. Two types of INPUT operators generate responses of opposite polarity and different onset delay. The CONTEXT operators act to enhance and rectify this difference. Contextual information is used via lateral inhibition pathways to reset the thresholds for the INPUT and the OUTPUT operators, thus controlling the temporal sensitivity of the module. This facilitates the detection of relative motion. The actual output of our module is approximately a difference of Gaussians of the absolute value of the derivative of intensity with respect to time. Results of simulations show that uniform motion throughout the visual field produces a small response. When motion is limited to any small part of the visual field, the response at the output of the module is approximately a linear function of velocity.
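The stated output model — approximately a difference of Gaussians of |dI/dt| — can be sketched in one spatial dimension. The Gaussian widths and synthetic stimuli below are illustrative assumptions; they merely reproduce the qualitative behavior claimed (small response to uniform field-wide change, strong response to localized motion).

```python
import numpy as np

def gaussian_blur_1d(signal, sigma):
    """Normalized Gaussian smoothing by direct convolution (zero-padded ends)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    return np.convolve(signal, k, mode="same")

def motion_response(frames):
    """frames: (T, N) intensities; DoG (sigma 1 vs 3) of mean |dI/dt| per position."""
    d_abs = np.abs(np.diff(frames, axis=0)).mean(axis=0)
    return gaussian_blur_1d(d_abs, 1.0) - gaussian_blur_1d(d_abs, 3.0)

N, T = 64, 5
uniform = np.array([np.full(N, float(t)) for t in range(T)])  # whole field changes
local = np.zeros((T, N))
for t in range(T):
    local[t, 30 + t] = 1.0                                    # a small moving spot

r_uniform = motion_response(uniform)  # center and surround cancel in the interior
r_local = motion_response(local)      # localized change survives the DoG
```

The cancellation for the uniform stimulus is exactly the center-surround antagonism of the DoG: a spatially constant |dI/dt| map is removed, while a spatially localized one is passed.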
Firm analytical foundations are laid in this paper for quantifying the performance of a class of synthetic discriminant functions (SDFs) known as the equal correlation peak (ECP) filters. It is shown that while the conventional ECP filters are optimum for white input noise, they are not optimal for colored input noise. A modified ECP filter is proposed to optimize the performance for colored input noise. General expressions are also derived to quantify the loss of optimality of the conventional ECP filters.
Machine vision systems are being called upon to solve problems in inspection and control. The majority of such systems use a monochrome TV camera as the image sensor. However, such cameras cannot be applied where color plays an essential role. Color vision is particularly appropriate for dealing with colored objects or color-coded parts. The use of color permits part discrimination where gray-scale information is insufficient. A color detection system using three color filters and a monochrome TV camera is presented. Tests were made on identifying colored wires and values of color-coded resistors. The results of this first look into the use of color vision are promising.
Vision is interpreted as embedding an image in a causal framework. Specifically, vision is decomposed into image recognition and image understanding. A fuzzy cognitive map (FCM), a fuzzy causal graph, is selected as the causal framework in which vision occurs. Image recognition is then interpreted as activating causal concept nodes on a FCM. Image understanding is interpreted as the causal association, via FCM edge functions, induced by recognized/activated causal nodes. Vision therefore is represented as the spatiotemporal process of spreading activation and decaying oscillation or resonance on a FCM. An explicit dynamical model of vision is introduced and an optical implementation is described.
A solution is presented to one of the problems hampering the development of video tracking systems - that of discriminating a target in space when the Earth is in the background. The visual tracking system that is described uses the near infrared (IR) to take advantage of differences in spectral reflectance between the target and the Earth in the background to enhance amplitude differences, simplifying the detection process. The real-time visual tracking system operates under microprocessor control and is lightweight, adaptive, and low power. This paper describes the sensing hardware, tracking hardware, and system operation.
Assembly tasks typically involve tight fits and precise positioning beyond the capability of position-controlled robots. The motions required for such tasks, called fine-motions, may be performed by means other than pure position control. Fine-motions must overcome the inherent uncertainty in the robot's position relative to its environment. This uncertainty results from errors in sensing, modeling, and control. If bounds on these errors are known or can be estimated accurately, motions may be planned that will perform fine-motion tasks despite the uncertainty. Automatic planning of robot motions to perform these tasks avoids the tedious, error-prone process of constructing a plan for each task by hand. This paper presents an approach to fine-motion planning and an implementation of the algorithms in a planning system. A simple method is explored to plan and perform assembly tasks in the presence of uncertainty using a geometric model of the task, error bounds, and a model of compliant behavior. The principal ingredients of the method are algorithms that manipulate graph representations of the task to find, and choose from, a small number of alternative plans. The approach is implemented in a system that generates plans from task descriptions of two degree-of-freedom operations.
This paper deals with the acquisition of an object by a manipulator arm when there is significant uncertainty concerning the position and orientation of the object. The key operation is planning the grasp. Objects are modelled using constraint sets. A constraint manipulation system is employed to maintain a more efficient representation. These models combine all sources of uncertainty in the position, orientation and size of objects. Geometric reasoning is applied to the hand and object models to find a suitable grasp configuration. This involves predicting a probable configuration and checking it for feasibility. Trajectory planning, which enables the hand to achieve the grasp smoothly without collision with other objects, is also discussed.
In this paper an efficient approach to collision detection is proposed. Its incorporation in two approaches for the solution of the find-path problem is also presented. The collision detector is based on representing objects by a hierarchy of bounding spheres. Features of the collision detector are that computation time depends on the proximity of the objects and that the method concentrates its efforts on the parts of objects most likely to have geometric interference, as determined by collision information. The use of this representation for collision detection in two approaches for the solution of the find-path problem, namely a generate-and-test approach and a free space representation based approach, is presented. The paper is organized as follows. First, a review of the literature is given. Second, details of the collision detector and its incorporation in a solution for the find-path problem are presented. Third, the implementation of this approach on a LISP machine and the preliminary performance tests performed using a PUMA 560 robot are discussed. Finally, conclusions and potential extensions are outlined.
Intelligent robot control systems based on dynamic sensory feedback from visual eye-in-hand systems have attracted great research effort, because these systems have several advantages over the use of a fixed external video camera system: they are less affected by inadequate lighting and background contrast, optical shadows, and environmental lighting variations, and they offer increased maneuverability to access widely scattered or hidden parts. This paper presents an eye-in-hand system using a solid-state linear line scanner as the sensing element. The system currently has the ability to determine the pose and shape of objects, and is also capable of defining the proper grasping site prior to grasping. The paper also describes the system configuration and the proposed object learning and recognition procedure using line scan methods in detail. In this research, an experimental model is built which is used to simulate a fully opened parallel jaw gripper. A photodiode array is embedded in one side of the finger. A Compaq microcomputer based on the Intel 8088 microprocessor is used as the host. An interface between the linear array video scanning signals, the CPU, and video RAM is built and successfully tested.
A tactile sensor for use with robotic manipulators is described. The sensor has a spatial resolution of 8 x 8 points in a 1.27 x 1.27 cm square space, giving it a theoretical two-point disparity of 1.9 mm. Force is measured by determining the variation in distance between two parallel plates of a capacitor. An injection-molded silicone rubber material, placed between the upper and lower plates of the force sensing capacitors, is used as both the elastic and dielectric medium. A series of experiments were conducted to quantify the performance of the device; hysteresis was found to be low and spatial resolution approached its theoretical limit. Finally, the tactile force images of several small objects are shown.
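The force-to-capacitance relation behind each cell is the parallel-plate law C = εA/d, with the elastomer gap d shrinking under load. A minimal sketch, in which the permittivity, gap, and stiffness values are illustrative assumptions rather than the sensor's actual specifications:

```python
# Parallel-plate model for one tactile cell: C = eps * A / d.
# Force compresses the silicone dielectric, reducing the gap d and raising C.
# All numbers below are illustrative assumptions, not the sensor's real specs.
EPS0 = 8.854e-12          # F/m, vacuum permittivity
EPS_R = 3.0               # assumed relative permittivity of silicone rubber
AREA = (1.27e-2 / 8)**2   # one cell of the 8 x 8 array, m^2
D0 = 0.5e-3               # assumed unloaded plate gap, m
K = 2.0e4                 # assumed elastomer stiffness, N/m (linear spring model)

def capacitance(force):
    d = D0 - force / K                # gap shrinks under load
    return EPS0 * EPS_R * AREA / d

def force_from_capacitance(c):
    """Invert the model, as the readout electronics would, to recover force."""
    d = EPS0 * EPS_R * AREA / c
    return (D0 - d) * K

c = capacitance(1.5)                  # capacitance under a 1.5 N load
f_est = force_from_capacitance(c)     # recovers the applied force
```

A real elastomer is nonlinear and hysteretic (the abstract reports low hysteresis), so the linear spring here stands in for an empirically calibrated curve.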
The structure and hence geometry of present day robots is outside the actuator control loop. Accuracy under changing load conditions requires a rigid and therefore massive structure. The performance may be improved by use of lighter structures with sensing and feedback control of the elastic modes and damping. This paper describes a sensing method based on a system of internal light beams and photodetectors that provides end-point sensing and structural deformation information necessary for high performance control.
This paper describes current research which involves the development of a special end effector and the necessary algorithms to perform inspection and gaging tasks using tactile sensing. The special end effector, which has a mechanical probe, performs these operations by using adaptive contact and continuous measurement of forces to determine the locations and the dimensions of the holes to be inspected. This will provide a flexible and relatively inexpensive means of inspection when extremely high accuracies are not required.
Two mechanisms are presented which select viable candidates for a pattern matching process. This selection process is termed hypothesis generation. One mechanism is data-driven, using viewpoint dependent features to eliminate obviously poor choices and inhibit unlikely choices. The second mechanism is context-driven, using previously recognized objects as cues for generating future match candidates. These mechanisms have been incorporated into a theorem proving based pattern matching system and serve to constrain the space of possible matches. This isolates the system performance from the effects of the large search space that is necessary for a general-purpose vision system operating in an unconstrained environment.
In this paper a cylindrical multivalued logic transform is defined. The transform is defined in terms of cylindrical multivalued Walsh functions. It is shown that these functions form an orthogonal set. The number of independent constraints needed to define these functions is shown to be equal to (m+1)/2. The cylindrical Walsh functions form a complete set. Thus, the transform can be used to expand the two dimensional functions as a series of multivalued cylindrical functions.
The Hierarchical Region Structure (HRS) method successively clusters related image segments, and represents these clusters as a hierarchical tree-like structure. This approach organizes the segmented regions and enables improved interpretation and classification. One of the major benefits of this approach is that a cluster of regions may be more readily matched and interpreted than the individual segments which comprise the cluster. Region relationships (proximity, similarity, containment, and similar orientation) are used as the basis for selecting which regions will be merged. Hierarchical representation of this relationship information, as well as traditional feature attribute information, provides a novel approach to representing information about segmented image regions.
Fuzzy Vision is an architecture being considered at NASA for the interpretation of multiple successive images. Its goal is the development of a system that can accept a sequence of images from several sensors and update an image interpretation fast enough to direct an autonomous vehicle. Fuzzy Vision is relatively noise insensitive, and can be mapped directly onto parallel processing hardware using the Grundy parallel processing system. Fuzzy Vision breaks the image interpretation problem into two distinct subsystems that access a semantic net called the region structure. The sensors and a set of masks (arbitrary functions) comprise the first subsystem, called the region generator. The region generator updates the values in the semantic net. The interconnectivity of the region structure is fixed, but the information stored at the nodes and links is altered. The second subsystem is an expert system called the viewer, which produces the object list that is the system output. The two subsystems operate asynchronously. The Fuzzy Vision architecture accepts several technologies and component implementations. The region generator can comprise slow scan cameras, ordinary cameras, radars, operator input, and other sensor input. Likewise, the viewer consists of several cooperating expert systems operating asynchronously. If noise is added to the picture, only the data in the region structure is altered, the stability of the viewer is not affected, and noise insensitivity is achieved. Since the region structure is specified, the viewer can be simpler than the expert systems usually used. Local autonomous navigation (under 10 m) is too close for practical use of radar, and most laser techniques require cooperation from the target. Malfunctioning satellites or those built without reflectors cannot be easily serviced.
A vision system that can interpret raster images into a list of objects in real time would allow autonomous astrogation in close quarters, or perhaps on another satellite. A computer or remote operator can interrogate Fuzzy Vision, reducing bandwidth requirements by a factor of 1 million or more. There has been success with current vision understanding systems such as ACRONYM, but these systems require large periods of time (hours) to analyze a simple image, will not accept data from other sources such as radar, and are often unstable in the presence of noise.
A vision method called "Pattern Aggregation", intended for applications in robotic assembly, is described in this paper. This method, which uses a structural approach, draws on artificial intelligence, notably through the use of a rule-based system. The model adopted by this method represents the pertinent 2-D sub-patterns of a 3-D object. This model is described by means of a grammar called the "Pattern's Grammar" (PG). Recognition adopts an ascending-descending analysis strategy. This strategy resorts to a hypothesis "prediction-verification" scheme for the confirmation (positive information) or invalidation (negative information) of definite information in a set of images generated from a 3-D object.
Coherent optical processors capable of producing various feature spaces in parallel exist. Those feature spaces that can be optically generated include Fourier coefficients, space variant Mellin transforms and polar transform coefficients, moments, chord distributions and Hough/Radon transforms. Optical correlators capable of simultaneous identification and location of multiple objects are well-known and have reached significant levels of compact fabrication. New methods of matched spatial filter synthesis allow these correlators to achieve multi-class distortion-invariant object recognition. New optical computer architectures performing matrix-vector and linear algebra operations represent a class of general purpose systems similar to digital systolic and multiple processor systems. These various optical architectures will be reviewed and reference will be made to initial results obtained on these systems. In all instances, all architectures are hybrid optical/digital systems that utilize the best features of optical and digital processors. In all cases, attention is given to multi-class distortion-invariant pattern recognition.
In this paper, a new system for calculating the geometric moments of an input image is presented. The system is based on a mathematical relationship that relates the geometric moments of the input image to the intensity of the Fourier transform of the image. A digital post-processor performs a differentiation process on the sampled Fourier intensities to obtain a set of values which are combined in a pre-determined fashion to provide the geometric moments of the original input function.
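The mathematical relationship alluded to is the Fourier moment theorem: the p-th geometric moment is proportional to the p-th derivative of the transform at the origin. A 1-D numerical check of that identity (the paper itself works from sampled Fourier intensities; this sketch uses the complex transform for clarity):

```python
import numpy as np

# Moment theorem: for F(u) = sum_x f(x) exp(-2*pi*1j*u*x), the p-th geometric
# moment m_p = sum_x x^p f(x) equals (1 / (-2*pi*1j))^p * d^p F / du^p at u = 0.
# Verified here for p = 1 with a central finite difference (illustrative signal).
f = np.array([0.0, 1.0, 3.0, 2.0, 0.5])
x = np.arange(len(f))

def F(u):
    return np.sum(f * np.exp(-2j * np.pi * u * x))

h = 1e-5
dF = (F(h) - F(-h)) / (2 * h)             # numerical first derivative at u = 0
m1_fourier = (dF / (-2j * np.pi)).real    # moment theorem
m1_direct = float(np.sum(x * f))          # direct geometric moment
```

The differentiation-at-the-origin step is exactly what the digital post-processor performs on sampled transform values before combining them into the moments of the input.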
A number of machine vision systems have been invented, analyzed, and related to their ability to perform their intended tasks. The systems have usually been hybrid, with the sensor being the analog device, and the processing and articulation control being digital. Herein, a system is proposed in which the use of an optical correlator is extended to a vision system by using the parametric sensitivities of a matched filter as position or aspect determinants. The results of the analysis of the proposed system are presented and related to applications suitable for the techniques.
In this paper we investigate the possibilities associated with applying the theory of matched filtering to the problem in robot vision commonly referred to as the "bin picking problem." While the implementation of matched filtering theory to applications in pattern recognition or machine vision is ideally through the use of optics and optical correlators, both digital and optical considerations of such an application will be investigated.
The use of optical Fourier transform and computer generated hologram (CGH) techniques allows a high-dimensionality feature space to be produced in parallel. By the proper coordinate transformation CGH, a position, rotation and shift invariant feature space results. The use of synthetic discriminant function (SDF) and CGH techniques allows high-dimensionality optical linear discriminant functions (LDFs) to be produced. When these optical LDFs are designed by SDF techniques, 3-D distortion invariance results. Initial simulation results using a ship image data base are presented.
Low level processing or preprocessing is often the starting point in many computer vision systems. This stage is typically characterized as picture processing and uses gradients, edge templates, filters, thresholding, and other enhancement techniques. The next stage in computer vision commonly deals with the processed image, usually binary in nature, and extracts intrinsic features such as size, shape, boundary, orientation, moments, etc. A new class of multiple-valued integrated optical processors is proposed for the parallel implementation of selected preprocessing algorithms. The basic component of the multiple-valued processor is an integrated optical threshold gate (1-3). The parallel processing algorithms required for both smoothing and boundary determination with an integrated optical thresholding array are described. One possible architecture for an optical parallel digital image processor is presented.
The extraction of planar surfaces in range imagery is essential in many object recognition schemes where 3D objects are assumed to be composed mainly of such surfaces. In this paper a novel hierarchical technique is presented for the extraction of planar surfaces which has the potential of being both efficient and robust in the presence of noise. The planar surfaces are extracted by means of a hierarchical 3D Hough transform. The levels of quantization of the 3D Hough space vary from coarse to fine depending on their position in the hierarchy. The technique has many advantages in terms of efficiency, storage requirements, and immunity to noise.
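A coarse-to-fine 3-D Hough accumulator of the kind described can be sketched as follows; the (θ, φ, ρ) plane parameterization, bin counts, and refinement rule are illustrative assumptions rather than the paper's exact quantization scheme.

```python
import numpy as np

# Coarse-to-fine 3-D Hough transform for plane extraction (illustrative sketch).
# Plane model: n(theta, phi) . p = rho, with n a unit normal on the sphere.
def hough_planes(points, levels=3, bins=8):
    lo = np.array([0.0, 0.0, -2.0])            # theta, phi, rho lower bounds
    hi = np.array([np.pi, 2 * np.pi, 2.0])
    for _ in range(levels):
        edges = [np.linspace(lo[i], hi[i], bins + 1) for i in range(3)]
        centers = [(e[:-1] + e[1:]) / 2 for e in edges]
        acc = np.zeros((bins, bins, bins))
        for th_i, th in enumerate(centers[0]):
            for ph_i, ph in enumerate(centers[1]):
                n = np.array([np.sin(th) * np.cos(ph),
                              np.sin(th) * np.sin(ph), np.cos(th)])
                rho = points @ n               # each point votes for one rho bin
                idx = np.clip(np.searchsorted(edges[2], rho) - 1, 0, bins - 1)
                np.add.at(acc[th_i, ph_i], idx, 1)
        best = np.unravel_index(acc.argmax(), acc.shape)
        # shrink the search volume to the winning coarse cell, then re-vote
        lo = np.array([edges[i][best[i]] for i in range(3)])
        hi = np.array([edges[i][best[i] + 1] for i in range(3)])
    return (lo + hi) / 2                       # refined (theta, phi, rho)

# range points sampled from the plane z = 0.25 (normal +z: theta ~ 0, rho ~ 0.25)
g = np.linspace(-1, 1, 10)
pts = np.array([(u, v, 0.25) for u in g for v in g])
theta, phi, rho = hough_planes(pts)
```

Only one coarse accumulator is held at a time and re-quantized around the winning cell, which is the source of the storage and efficiency advantages the abstract claims over a single fine-grained 3-D accumulator.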
A computer simulation environment is being developed in which 3-D polyhedral models of robots and other objects are created and manipulated with nearly real time response. Color perspective images of scenes are generated with object faces displayed as shaded filled polygons illuminated from a point light source. An approximate but simple and fast hidden surface algorithm is used. The position and orientation of the virtual camera observing the scene can be changed interactively to focus on areas of particular interest. The system will eventually support development and initial debugging of robot application programs within the simulation environment, with subsequent automatic generation of programs for real target robots. It will also be used in research on model based intelligent robotic systems. This memo describes an initial set of capabilities that will be expanded in the future. A short videotape has been prepared that demonstrates many of the system's features.
The motivation for this development is a need to construct a robot simulation facility which will assist in the development of effective algorithms to plan and control robot movements. It was thought that a system that presented a robot's movements with an animated graphic display would be useful in the design and testing of robot planning and control programs. In this research a "programmer's apprentice", called the Graph Design Assistant (GDA), has been designed to help researchers construct robot simulation models on an Evans and Sutherland PS 300 graphic workstation. A programmer's apprentice is an expert system with knowledge about a programming task. It guides and advises a programmer as he interactively builds a program. This article describes how the system allows a researcher to develop a program on the PS 300 without having to learn the intricacies of the PS 300's dataflow language. In this system, plans are templates for portions of a graph in the functional graph network. The plans are similar to the macros used in the Evans and Sutherland Functional Graph Network Editor. The difference is that plans have slots which can be expanded to arbitrary structures. Some of the other aspects of the GDA are the development of an interactive interface, a library of plans, a user-controlled agenda, a consistency checker, and plan creation and editing capability.
Grundy, an architecture for parallel processing, facilitates the use of high-level languages. In Grundy, several thousand simple processors are dispersed throughout the address space, and the concept of machine state is replaced by an invocation frame: a data structure of local variables, a program counter, and pointers to superprocesses (parents), subprocesses (children), and concurrent processes (siblings). Each instruction execution consists of five phases: an instruction is fetched, the instruction is decoded, the sources are fetched, the operation is performed, and the destination is written. This breakdown of operations is easily pipelinable. The instruction format of Grundy is completely orthogonal, so Grundy machine code consists of a set of register transfer control bits. The process state pointers are used to collect unused resources such as processors and memory. Joseph Mahon found that as the degree of physical parallelism increases, throughput, including overhead, increases even if extra overhead is needed to split logical processes. As stack pointers, accumulators, and index registers facilitate using high-level languages on conventional computers, pointers to parents, children, and siblings simplify the use of a run-time operating system. The ability to ignore the physical structure of a large number of simple processors supports the use of structured programming. A very simple processor cell allows the replication of approximately 16 32-bit processors on a single Very Large Scale Integration chip (2M lambda). A bootstrapper and Input/Output channels can be hardwired (using ROM cells and pseudo-processor cells) into a 100 chip computer that is expected to have over 500 processors, 500K memory, and a network supporting up to 64 concurrent messages between 1000 nodes. These sizes are merely typical and not limits.
In this paper an approach is proposed for acquiring location data for robot programming without using the robot itself as a digitizer. It is based on using a non-contact opto-electronic position measurement sensor to register the location data of a suitable target. The target will be positioned appropriately relative to the object of interest by the programmer. This process yields a sequence of locations for the hand-held target from which the desired tool locations relative to the object could be computed. Furthermore, as the sensor used is very fast, it allows a sampling rate that is adequate for capturing data on the trajectory of the hand-held target for subsequent programming of the robot to produce that same trajectory. Finally, the same sensor could be used to monitor the actual robot locations and/or trajectory and to automatically modify the program data to produce the best performance with respect to the data captured. The paper is organized as follows. First, various approaches to robot programming are briefly discussed and their various advantages and disadvantages are highlighted. Second, the approach proposed is presented. Third, implementation details and preliminary experimental results of various tests using the Selspot II system and a PUMA-560 robot are given. Finally, conclusions and an outline of work in progress are given.
This paper describes an architecture for creating pyramid transforms of real-time video images. A powerful preprocessor can be designed with this architecture by representing the image data in a form most suitable for the application. The Burt pyramid algorithm, an efficient method for transforming video images into a hierarchical representation, is an example of an effective transform. A programmable version of such a preprocessor, occupying two Multibus boards, was built at low cost using available hardware. This unit can perform the basic pyramid transform on 256 x 240 images in real time. More complex and/or multiple transforms can be performed at a reduced data rate by passing data through the unit several times, or can be performed in real time by passing data through multiple units. Because the preprocessor is programmable, the system is easily configured to perform several different pyramid transforms, or the appropriate inverse transforms. Algorithms can be developed to reduce edge effects by modifying the edges of the image before applying the transform. The preprocessor can be programmed to insert time delays into the system, which is useful in the display of the results. Image processing system design can be simplified by using the preprocessor to significantly reduce the computational requirements of the main processor. The proposed architecture, suitable for system integration, could lead to the availability of low-cost, efficient image processing systems.
Array processors are being utilized in combination with microcomputers to provide high-performance, sophisticated processing for vision systems. Many manufacturers of vision systems have endeavored to build proprietary array processors because an off-the-shelf solution has not been available at a reasonable cost. The ZIP 3216 array processor has overcome the obstacles that have inhibited the widespread use of array processors in vision systems through hardware and software innovations. The architecture allows for easy integration of the array processor into almost any configuration, and the software environment allows for easy customization of algorithms and efficient programming. Details of the programming environment will be discussed, with emphasis on programming examples that eliminate the need for microprogramming the array processor. The features of the hardware design that allow for optimal speed and flexible integration and upgrading capabilities will also be discussed.
The design of an algorithmically specialized processing module capable of implementing arbitrary geometric transforms on discrete image data is described. Applications include geometric image correction, image mapping, and real-time computer graphics and display systems. Following a discussion of the inherent design complexities involved, several potential solutions and their relative merits and shortcomings are discussed. Of particular importance is the use of a binary or bit plane approach where k-bit pixel data is geometrically transformed one bit plane at a time. This is achieved via a massively parallel pixel destination address calculation and use of either an associative memory (AM) or a bit-plane AND-ing operation. The output bit-plane is formed using row and/or column decoders with OR-ed outputs. Future research directions are outlined.
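The bit-plane approach described above can be illustrated with a small sketch: a k-bit image is decomposed into k binary planes, each plane is moved by a per-pixel destination address calculation, and the planes are recombined. A 90-degree rotation stands in here for an arbitrary geometric transform, and all function names are illustrative, not the module's actual interface.

```python
def bit_planes(image, k):
    """Decompose a k-bit image (list of rows) into k binary bit planes."""
    return [[[(pix >> b) & 1 for pix in row] for row in image]
            for b in range(k)]

def transform_plane(plane):
    """Geometrically transform one bit plane via a destination address
    calculation; here the transform is a 90-degree clockwise rotation
    of an n x n plane: (r, c) -> (c, n-1-r)."""
    n = len(plane)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            out[c][n - 1 - r] = plane[r][c]
    return out

def transform_image(image, k):
    """Transform all k planes independently, then recombine them."""
    planes = [transform_plane(p) for p in bit_planes(image, k)]
    n = len(image)
    return [[sum(planes[b][r][c] << b for b in range(k)) for c in range(n)]
            for r in range(n)]

print(transform_image([[1, 2], [3, 4]], 3))  # [[3, 1], [4, 2]]
```

Because each plane is binary, the per-plane work reduces to address computation and bit moves, which is what makes the massively parallel AM or AND-ing implementations attractive.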
The CBAP (Cross-Bus Array Processor) is a versatile VLSI array processor which can be used for a variety of matrix operations. It is easy to operate and is more efficient than systolic arrays in many respects. In this paper, we explore the possibility of applying this VLSI array to image processing. For image feature extraction and pattern classification involving matrix operations, this array processor is simple and effective. With appropriate buffer provision, 2-D convolution can be executed at the raster-scan image acquisition rate. Several neighborhood operations, such as thinning, min/max filtering, and median filtering, can be performed as well. Other applications presented here include finding the largest/smallest k gray levels and multi-threshold region labeling.
The paper deals with an algorithm-driven architecture devoted to fast edge detection. The architecture has been specifically designed to process large convolution masks in a pyramidal (multiresolution) scheme. The basic element of the convolution board is a programmable VLSI component. Several identical components can be connected in a virtually systolic structure in order to achieve the desired throughput rate. A distinctive feature of the system is the multiple-resolution capability of the convolver board. The number of convolver boards hosted by a multiple-bus vision machine can be selected to achieve parallel multiple-resolution operation. The main application of the proposed architecture is fast edge detection based on the extraction of the zero-crossings of Gaussian-filtered images. The paper is divided into two sections: in the first, results of numerical simulations are presented, showing the accuracy of this edge detection technique applied to convolved images in a pyramidal structure; in the second, the systolic architecture implementing the algorithm is presented.
The ALV project is sponsored by the Defense Advanced Research Projects Agency (DARPA) as part of its Strategic Computing Program and contracted through the Army Engineer Topographic Laboratories (ETL) under contract DACA76-84-C-0005. The purpose of the Strategic Computing Program is to advance the state of the art in artificial intelligence, image understanding, and advanced computer architectures, and to demonstrate the applicability of these technologies to advanced military systems.
The ideas that follow in this paper relate to the central theme of autonomous mobile robot navigation. Particular attention is given to the issue of designing a capability for landmark recognition. The proposed scheme of landmark recognition is intended to correct for discrepancies between the robot's actual location and the robot's estimate of its location. Acknowledgements: My gratitude is extended to Tom Mowbray and John Bradstreet for their helpful discussions of this report topic. Keywords: mobile robot navigation, robot vision, dynamic modeling.
On March 22, 1983 at a press conference in Los Angeles, Odetics, Inc. introduced ODEX I, a tetherless, six-legged walking machine built to demonstrate the company's base technology for intelligent machine systems.
The Computer Vision Laboratory at the University of Maryland has been participating in DARPA's Strategic Computing Program for the past year. Specifically, we have been developing a computer vision system for autonomous ground navigation of roads and road networks. The complete system runs on a VAX 11/785, but certain parts of the system have been reimplemented on a VICOM image processing system for experimentation on an autonomous vehicle built for the Martin Marietta Corp., Aerospace Division in Denver, Colorado. We give a brief overview here of the principal software components of the system, and then describe the VICOM implementation in detail.
An algorithm has been developed to guide a robot by identifying the orientation of a randomly-acquired part held in the robot's gripper. A program implementing this algorithm is being used to demonstrate the feasibility of part-independent robotic bin picking*. The project task was to extract unmodified industrial parts from a compartmentalized tray and position them on a fixture. The parts are singulated in the compartments but are positionally and rotationally unconstrained. The part is acquired based upon three-dimensional image data which is processed by a 3D morphological algorithm described in . The vision algorithm discussed here inspects the parts, determines their orientation and calculates the robot trajectory to a keyed housing with which the part must be mated. When parts are extracted during a bin picking operation their position and orientation are affected by many factors, such as gripper insertion-induced motion, interference with container side walls during extraction, slippage due to gravity and vibration during robot motions. The loss of the known position and orientation of the part in the robot gripper makes accurate fixturing impossible. Our solution to this problem was to redetermine the orientation of the part after acquisition. This paper describes the application in detail and discusses the problems encountered in robot acquisition of unconstrained parts. Next, the physical setup and image acquisition system, including lighting and optical components, are discussed. The principles of morphological (shape-based) image processing are presented, followed by a description of the interactive algorithm development process which was used for this project. The algorithm is illustrated step by step with a series of diagrams showing the effects of the transformations applied to the data. 
The algorithms were run on ERIM's new fourth-generation hybrid image processing architecture, the Cyto-HSS, which is described in detail in , and the performance is compared to the same programs executed on a general-purpose mid-sized computer.
In the implementation of an autonomous mobile robot, the navigation system must be able to find an acceptable path through a region of multi-valued traversal costs (as opposed to a binary regime of obstacle avoidance). Information must be efficiently represented, with sufficient information density in the robot's immediate navigation domain, in a manner which facilitates a process of learning the terrain. This paper discusses a decision system built around a "Routing-Engine" employing a cellular-array processor to propagate a wave over a two-dimensional map in which the pointwise traversal costs are represented as pointwise refractive indices. The path returned is the locus of local normals to the wavefront of the first wave, originating at the robot's current location, to reach the goal. This routing-engine is run recursively on a hierarchical stack of maps arranged in linear-spatial registration, with the coarsest information resolution in the most global map. The central fovea of each map in the hierarchy is "blown up" to yield a map more local to the vehicle, with the lowest-level map possessing sufficient resolution to maneuver the robot. As the robot moves, its registration in the centre of each map in the stack is maintained by "scrolling" the maps over each other. As this is done, sensed information is propagated through the stack, updating the information stored at each level. The system has been implemented successfully in simulation.
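The wave-propagation routing step can be sketched with Dijkstra's algorithm as a discrete serial analogue: treating each cell's traversal cost as a refractive index means the first wave to reach the goal traces the minimum-cost path, which is recovered by backtracking along decreasing arrival time. This is an interpretation for illustration, not the authors' cellular-array implementation.

```python
import heapq

def route(cost, start, goal):
    """Propagate a 'wave' of arrival times over a 2-D cost map and
    return the minimum-cost 4-connected path from start to goal."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0}          # earliest arrival time at each cell
    prev = {}                  # cell the wavefront came from
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist.get((r, c), float("inf")):
            continue           # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]   # cost acts as refractive index
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Backtrack from the goal along the wavefront's arrival history
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# A high-cost "wall" in the middle column forces the path around it
grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
print(route(grid, (0, 0), (0, 2)))
```

On the hierarchical map stack, the same routine would be run per level, with each coarse path constraining the search window at the next finer level.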
An autonomous mobile robot that is capable of planning "intelligent" paths through its world must have a suitable representation, or model, of that world and must be able to interpret its sensory inputs in terms of that world model. Furthermore, the robot must be able to modify its planned movements (and world model) based on differences between current perceptions of the world and the existing model. This paper describes a two-part world model and a method for path planning that are particularly suitable for use in known, man-made environments. The basic model treats the robot's world as a hierarchical network of logical regions with arbitrary polygonal boundaries. Because the regions need not be convex, they may correspond one-to-one with intuitively partitioned, but irregularly shaped, real-world regions such as rooms, hallways, and open work areas. Obstructions within regions are also modeled by arbitrary polygons. A "complex configuration space" is derived from this basic model and is used to plan goal-oriented paths in a top-down fashion. Dynamic replanning is used to cope with changes in the environment.
Among the various schemes for navigating mobile robots, control of a robot vehicle under a supervisory vision system is simple and flexible. In this paper, a design for such a system is presented. The human operator can easily specify the planned route of the mobile robot on a monitor that shows the site image taken by the supervisory camera. Our design allows the user to specify the route, the dwell time at each stop along the route, and multi-route, multi-robot scheduling. In addition to the interactive human interface, several vision techniques used to achieve real-time mobile robot locating are presented. Together with the image-to-world coordinate transformation, the accuracy of this vision-based navigation system is analyzed.
Mobile robotic devices hold great promise for a variety of applications in industry. A key step in the design of a mobile robot is to determine the navigation method for mobility control. The purpose of this paper is to describe a new algorithm for omnidirectional vision navigation. A prototype omnidirectional vision system and the implementation of the navigation techniques using this modern sensor and an advanced automatic image processor are described. The significance of this work is in the development of a novel approach, dynamic omnidirectional vision, for mobile robots and autonomous guided vehicles.
Computing the intersection of convex objects is one of the most important problems for assembly robots and intelligent robots. This paper presents a collision discriminant and an algorithm for two moving robotic arms grasping complexly shaped objects in three-dimensional space. A discriminant criterion based on proximity space has been proposed for intersection detection. We also discuss how to reduce the required computation time. The method is expressive and straightforward, requiring less storage and computing time.
A multisensor, microprocessor-controlled robotic locating system (RLS) is being developed in our machine intelligence laboratory. A video camera, laser illuminator, optical angle sensors, and ultrasonics are the primary sensor components. Software is being developed on a personal computer that has been augmented with video acquisition and high-speed signal processing boards. Real-time (less than 1 second) algorithms are being implemented that exploit both natural and artificial patterns. Artificial patterns are introduced by planting light-emitting diodes (LEDs) and reflective materials in the environment. Projecting light patterns is also part of the design. Using control points to determine camera position and orientation is a particularly interesting problem because of its apparent simplicity. Several algorithms for calibrating a camera using control points are discussed in detail. The use of four control points in a rectangular configuration is especially detailed. A review of various approaches to camera calibration is included. The methods presented are from the literature and from work performed by the authors. These different approaches provide insight into the nature of the problems. Closed-form solutions are emphasized.
A value function structure is proposed to solve the problem of multiple sensor integration. The value of a sensor or a group of sensors will be a function of the number of possible object contenders under consideration and the number that can be rejected by using the information available. It will also depend on the current state of the environment and can be redefined to indicate changes in sampling frequency and/or resolution for the sensors. A theorem prover will be applied to the sensor information available to reject any contenders. The rules used by the theorem prover can be different for each sensor while the integration is provided by the common decision space. This overcomes any incompatibility between different sensors with respect to feature extraction and pattern recognition algorithms. A database will be used to store the values for different sensor groups and the best search paths, and can be adaptively updated, thus providing a training methodology for the implementation of this approach.
This paper is a synopsis of a study of sensor data fusion and artificial intelligence (AI) aspects of blackboard systems (or belief structure methodology). The study promotes the design of a knowledge base for robot motion control and a decision-making scheme for a robot working in an environment, using multiple sensors for guidance and control. Advanced manipulator control systems require processing the output from several sensors, fusion of the sensor data, and determination of the motions that best adjust the end-effector position for an optimal motion trajectory. A method to weight the data from the manipulator guidance sensors, and rules for sensor output processing, will be proposed.
A coherent automated manufacturing system needs to include CAD/CAM, computer vision, and object manipulation. Currently, most systems which support CAD/CAM do not provide for vision or manipulation, and similarly, vision and manipulation systems incorporate no explicit relation to CAD/CAM models. CAD/CAM systems have emerged which allow the designer to conceive and model an object and automatically manufacture the object to the prescribed specifications. If recognition or manipulation is to be performed, existing vision systems rely on models generated in an ad hoc manner for the vision or recognition process. Although both vision and CAD/CAM systems rely on models of the objects involved, different modeling schemes are used in each case. A more unified system will allow vision models to be generated from the CAD database. The model generation should be guided by the class of objects being constructed, the constraints of the vision algorithms used, and the constraints imposed by the robotic workcell environment (fixtures, sensors, manipulators, and effectors). We propose a framework in which objects are designed using an existing CAGD system and logical sensor specifications are automatically synthesized and used for visual recognition and manipulation.
The octree representation of three-dimensional objects is a generalization of the two-dimensional quadtree. It is a hierarchical representation based on the principle of recursive subdivision. The major features of the octree representation are that it is a hierarchical data structure, objects are kept in a spatially pre-sorted order at all times, and it has spatial addressability. Many operations performed on octrees can be easily implemented as tree traversals. These special features make the octree representation very attractive in many applications such as solid modeling, computer graphics, computer-aided design/manufacturing, computer vision, robotics, space planning, and medical imaging. This paper surveys the recent advances made in the construction, representation, and manipulation of the octree representation.
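The recursive-subdivision principle can be sketched in a few lines: a cubic region that is uniformly occupied or uniformly empty becomes a leaf; otherwise it is split into eight octants and the rule is applied again. The occupancy-function interface and the node labels below are assumptions for illustration.

```python
def build_octree(occupied, origin, size):
    """Build an octree over an axis-aligned cube of side `size` (a power
    of two) at `origin`, given a voxel test occupied(x, y, z) -> bool.
    Returns 'full', 'empty', or a dict of eight child octants keyed by
    octant offset, which keeps space in a pre-sorted order."""
    ox, oy, oz = origin
    voxels = [occupied(ox + x, oy + y, oz + z)
              for x in range(size) for y in range(size) for z in range(size)]
    if all(voxels):
        return "full"                 # homogeneous: stop subdividing
    if not any(voxels):
        return "empty"
    h = size // 2                     # mixed: recurse into 8 octants
    return {(dx, dy, dz): build_octree(occupied,
                                       (ox + dx * h, oy + dy * h, oz + dz * h),
                                       h)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)}

# A 2x2x2 region with a single occupied corner voxel: one 'full' leaf,
# seven 'empty' leaves.
tree = build_octree(lambda x, y, z: (x, y, z) == (0, 0, 0), (0, 0, 0), 2)
print(tree[(0, 0, 0)], tree[(1, 1, 1)])  # full empty
```

Operations such as volume computation or boolean combination then become tree traversals over these nodes, as the survey notes.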
The principles of stereo vision for three-dimensional data acquisition are well known and can be applied to the problem of an autonomous robot vehicle. Corresponding points in the two images are located, and then the location of each point in three-dimensional space can be calculated using the disparity between the points and knowledge of the camera positions and geometry. This research investigates the application of artificial intelligence knowledge representation techniques as a means to apply heuristics to relieve the computational intensity of the low-level image processing tasks. Specifically, a new technique for image feature extraction is presented. This technique, the Queen Victoria Algorithm, uses formal language productions to process the image and characterize its features. These characterized features are then used for stereo image feature registration to obtain the required ranging information. The results can be used by an autonomous robot vision system for environmental modeling and path finding.
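The range computation can be illustrated for the standard rectified two-camera geometry (parallel optical axes, baseline B, focal length f), where depth follows from disparity as Z = fB/d. This is the textbook formulation, assumed here for illustration; the paper's actual camera model may differ.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth of a matched feature under rectified stereo geometry:
    Z = f * B / d, with f in pixels, B in metres, and disparity
    d = x_left - x_right in pixels."""
    d = x_left - x_right
    if d <= 0:
        raise ValueError("matched feature must have positive disparity")
    return focal_px * baseline_m / d

# A feature at column 420 in the left image and 400 in the right,
# with an 800-pixel focal length and a 0.25 m baseline:
print(depth_from_disparity(420.0, 400.0, 800.0, 0.25))  # 10.0 (metres)
```

The expensive part in practice is finding the correspondences, which is exactly the step the feature-registration heuristics above aim to accelerate.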
A method is presented for locating an object which is introduced into a well modelled robot workspace. A sparse, accurate depth map of the object can be found by examining images of the workspace captured prior to introduction of the object, extracting visual edges from the old and new images, and suppressing old edges by a window-correlation technique. The resulting features can be matched within the stereo pair. This focus-of-attention strategy yields points on the object which are sufficiently numerous and accurate that the object may be acquired by a robot gripper.
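The edge-suppression step might be sketched as follows for binary edge maps: an edge pixel in the new image survives only if no old-image edge falls within a small window around it, so the surviving edges belong to the newly introduced object. The window-correlation test is simplified here to an any-edge-in-window check, which is an assumption about the authors' technique.

```python
def suppress_old_edges(new_edges, old_edges, w=1):
    """Keep only new-image edge pixels with no old-image edge within a
    (2w+1) x (2w+1) window; both inputs are 0/1 row lists of equal size."""
    rows, cols = len(new_edges), len(new_edges[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if new_edges[r][c]:
                window = [old_edges[rr][cc]
                          for rr in range(max(0, r - w), min(rows, r + w + 1))
                          for cc in range(max(0, c - w), min(cols, c + w + 1))]
                if not any(window):
                    out[r][c] = 1   # edge unique to the new image: the object
    return out

old = [[1, 0, 0, 0],
       [1, 0, 0, 0]]
new = [[1, 0, 0, 1],
       [1, 0, 0, 1]]
print(suppress_old_edges(new, old))  # only the right-hand column survives
```

Restricting the subsequent stereo matching to these surviving edge points is what makes the focus-of-attention strategy both sparse and accurate.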
This paper investigates the geometrical structure of parameters extracted from an optical flow. The transformation law is given with respect to the camera rotation around its focus by considering infinitesimal transformations. The parameter transformations form a representation of the 3D rotation group. The parameter space is decomposed into invariant subspaces of irreducible representations, and the optical flow is accordingly decomposed into two parts. The 3D reconstruction is described in terms of invariants extracted by this process. Their geometrical interpretation is also given.