We describe a real-time optical Hough transform (HT) inspection system and show quantitative inspection results for an industrial inspection application. The HT architecture uses an electronically addressed liquid crystal television (LCTV) as the real-time spatial light modulator, employs a novel selective edge-enhancement filtering technique, and realizes multiple slices of the HT with a computer-generated hologram. The industrial case study of the inspection of cigarette packages is used to benchmark the HT processor. A test set of 100 packages is presented to the processor to qualify its effectiveness. The statistical significance of these finite test set results is also examined.
The human ability to process faces is remarkable. We can identify perhaps thousands of faces learned throughout our lifetime and read facial expression to understand such subtle qualities as emotion. These skills are quite robust, despite sometimes large changes in the visual stimulus due to expression, aging, and distractions such as glasses or changes in hairstyle or facial hair. Computers which model and recognize faces will be useful in a variety of applications, including criminal identification, human-computer interface, and animation. We discuss models for representing faces and their applicability to the task of recognition, and present techniques for identifying faces and detecting eye blinks.
A fast algorithm is described which makes use of image signatures, or profiles, for automated object detection. Following segmentation, signatures are generated with respect to both axes and transition points are marked to separate each signature into bands. The intersections of the bands define rectangular regions (subimages) which may contain objects or groups of objects. Signature parsing is repeated for each subimage until single-band intersections are produced, at which point each object is naturally bounded by the band limits. Recursive decomposition of the image in this manner allows fast location of objects and calculation of object parameters while avoiding pixel-level processing. The output of the algorithm is a two-dimensional spatial object relationship tree (SORT) which contains a high-level hierarchical description of the spatial interrelationships between objects and groups of objects. The SORT provides a powerful tool for scene matching and can be used to distribute subimages (nodes) to multiple processors. The algorithm has been used for efficient detection of objects and their locations in picking and placing tasks, with a PC compute time of 1-2 seconds for a typical image.
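A minimal sketch of the signature-parsing recursion described above, assuming a pre-segmented binary image; the function names (find_bands, decompose) are illustrative and not the authors' code. Each returned rectangle corresponds to a leaf of the SORT.

```python
import numpy as np

def find_bands(profile):
    """Return (start, end) index pairs where a 1-D signature is non-zero."""
    nz = profile > 0
    d = np.diff(nz.astype(int))          # transition points of the signature
    starts = list(np.where(d == 1)[0] + 1)
    ends = list(np.where(d == -1)[0] + 1)
    if nz[0]:
        starts.insert(0, 0)
    if nz[-1]:
        ends.append(len(profile))
    return list(zip(starts, ends))

def decompose(img, r0=0, c0=0):
    """Recursively split a binary image into single-object bounding boxes."""
    rows = find_bands(img.sum(axis=1))   # signature with respect to the y axis
    cols = find_bands(img.sum(axis=0))   # signature with respect to the x axis
    if len(rows) == 1 and len(cols) == 1:
        (r1, r2), (c1, c2) = rows[0], cols[0]
        return [(r0 + r1, c0 + c1, r0 + r2, c0 + c2)]   # leaf node of the SORT
    boxes = []
    for r1, r2 in rows:                  # band intersections define subimages
        for c1, c2 in cols:
            sub = img[r1:r2, c1:c2]
            if sub.any():
                boxes.extend(decompose(sub, r0 + r1, c0 + c1))
    return boxes
```

Because every recursion step shrinks the subimage to its non-zero band extents, the parsing terminates without any pixel-level scan of the full image.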
This paper presents an application of an experimental robot vision system for flexible component handling. Operational details of the system components are described. Emphasis is placed on information processing, system integration and overall control. The functions of the image processing part are described briefly, including a summary of a new algorithm for outline approximations of component silhouettes. Advantages of this algorithm include its potential for recognizing overlapping, touching or partially visible objects. The overall system implementation is addressed in more detail, covering synchronization between sensor and robot, operating modes, and the control hierarchy.
The human visual system is usually able to recognize objects as well as their spatial relations without the support of depth information such as stereo vision. For this reason we can easily understand cartoons, photographs and movies. It is the aim of our current research to exploit this aspect of human perception in the context of computer vision. From a monocular TV image we obtain information about the type of an object observed in the scene and its position relative to the camera (viewpoint). This paper deals with the theory of human image understanding insofar as it is used in this system, and describes the realization of a vision system based on these principles.
After more than twenty years, the field of computer vision has still not produced any clear understanding of how fast and generic recognition of unexpected 3D objects from single 2D views is even possible. Recognition by components (RBC), a theory of human image understanding, has recently been proposed as a model of this complex high-level visual capability. However, no systematic computational evaluation of its many aspects has yet been reported. The PARVO system discussed in this paper is a first step towards this goal, since its design respects and makes explicit the main assumptions of the proposed theory. It analyses single-view 2D line drawings of 3D objects typical of those used in human image understanding studies. It is designed to handle partially occluded objects of different shapes and dimensions in various spatial orientations and locations in the image plane. PARVO is able to successfully compute generic descriptions and then recognize many common man-made objects. We present an overview of PARVO and original object recognition results.
Three-dimensional viewpoint invariance can be an important requirement on the representation of surfaces for recognition tasks. In general, the parameters in a parametric surface representation can be arbitrarily defined. A canonical, intrinsic parameterization provides a consistent, invariant form for describing surfaces. Our goal here is to define and construct such a parameterization. This paper presents an efficient technique for invariant surface reconstruction from a depth map. Our approach, which is based on a differential geometric analysis of surfaces, is computationally efficient compared to prior invariant representations. We present a two-stage technique for the construction of a canonical parameterization of the surface in terms of arc lengths along lines of curvature. The first stage requires the minimization of a parameterized functional, which leads to a locally coupled, sparse linear system solvable using efficient iterative methods. This minimization yields a surface that is invariant to 3D rigid motion. The second stage involves the computation of the canonical parameter curves, namely the lines of curvature, and the numerical generation of a new grid on the surface which incorporates these parameters. We present experimental results on synthetically generated sparse noisy data.
A new translation- and rotation-invariant algorithm to identify and locate occluded objects in an image is presented. The points of local maxima and minima of the curvature function, extracted from a digitized image of the object and smoothed by a Gaussian filter, are used as control points, and the object boundary is approximated by straight-line segments connecting these points. A two-pass boundary matching procedure is used to match the control points of the test shape to those of the object to be recognized. The matching is done from local to global, that is, from matching one segment pair to matching groups of segment pairs. The possible translational and rotational parameters (θ, Δx, Δy) between the two shapes are recorded, and a distance transformation is used to determine the set which yields the best match. The algorithm has been successfully used to locate a set of tools in occluded images.
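The control-point extraction step can be sketched as follows, assuming a closed contour supplied as sampled (x, y) coordinates; this illustrates the idea rather than reproducing the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def control_points(x, y, sigma=3.0):
    """Return indices of local curvature extrema on a closed contour."""
    # smooth the parametric boundary with a Gaussian (wrap = closed contour)
    xs = gaussian_filter1d(x, sigma, mode="wrap")
    ys = gaussian_filter1d(y, sigma, mode="wrap")
    dx, dy = np.gradient(xs), np.gradient(ys)       # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)     # second derivatives
    # signed curvature of the parametric curve
    k = (dx * ddy - dy * ddx) / ((dx**2 + dy**2) ** 1.5 + 1e-12)
    prev, nxt = np.roll(k, 1), np.roll(k, -1)
    extrema = ((k > prev) & (k > nxt)) | ((k < prev) & (k < nxt))
    return np.where(extrema)[0]   # connect these points by straight segments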
We discuss the problem of interpreting dense range images obtained from a scene consisting of a heap of man-made objects. We describe a range image interpretation system consisting of segmentation, modeling, verification, and classification procedures. First, the range image is segmented into regions and reasoning is done about the physical support of these regions. Second, for each region several possible 3-D interpretations are made based on various scenarios of the objects' physical support. Finally, each interpretation is tested against the data for its consistency. We have chosen the superquadric model, plus tapering deformations along the major axis, as our 3-D shape descriptor. Experimental results obtained from some complex range images of mail pieces are reported to demonstrate the soundness and the robustness of our approach.
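For reference, the superquadric shape descriptor with linear tapering along the major axis can be evaluated with the standard inside-outside function (after Barr); the parameter names here are illustrative, not the paper's.

```python
import numpy as np

def superquadric_F(x, y, z, a1, a2, a3, e1, e2, kx=0.0, ky=0.0):
    """Inside-outside function: F < 1 inside, F = 1 on surface, F > 1 outside."""
    # undo the linear tapering deformation before evaluating the superquadric
    fx = kx * z / a3 + 1.0       # taper factors; |kx|, |ky| < 1 keep them positive
    fy = ky * z / a3 + 1.0
    xu, yu = x / fx, y / fy
    # squaring before the fractional exponent handles negative coordinates
    f_xy = ((xu / a1) ** 2) ** (1.0 / e2) + ((yu / a2) ** 2) ** (1.0 / e2)
    return f_xy ** (e2 / e1) + ((z / a3) ** 2) ** (1.0 / e1)
```

Fitting such a model to a segmented region then amounts to minimizing a residual built from F over the region's range points.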
A segmentation approach that uses successive cost processing is introduced. By processing graytone information, the background is extracted adaptively throughout the image and pixels are classified by uncertainty and by brightness relative to the local background. The uncertainty pixels correspond to an estimate of the overlap between the graytone distributions of the background and dark objects, and of the background and bright objects. Uncertainty pixels identify image regions requiring further processing with local and spatial information rather than graytone information alone. One rapid method resolves uncertainty by selecting thresholds at the maxima of local normalized edge magnitude histograms. The statistical means of floating histograms and of the estimated local background are used to control the search for thresholds in the local normalized edge magnitude histograms. Floating histograms retain the local brightness relationship between non-background pixels and the background. This method can distinguish between seven or fewer distributions locally by determining additional thresholds in the normalized edge magnitude histograms of dark pixels and bright pixels. The graytone image is then mapped into a multiple-label image corresponding to object brightness relative to the local background. Another method erodes boundaries of uncertainty regions with a non-maxima edge magnitude suppression technique that ensures consistent gradient direction among adjacent edgels. Gradient direction space is partitioned into certainty and uncertainty arc zones. A certainty arc zone directs an edgel during non-maxima suppression to the 8-neighbor it bounds; an uncertainty zone, to the two 8-neighbors that bound the arc zone. Good-quality and rapid segmentations have been obtained in industrial scenes of modest complexity. Background-object and interior object boundaries are satisfactorily outlined and the enclosed regions are represented with the proper relative brightness label.
A segmentation method that refines the thresholds used to extract the local background is introduced. During the first phase, the method decomposes the image into rectangular regions and measures the left and right standard deviations and the graytone mean of each region. Background-homogeneous regions are detected using criteria based on statistical theory. The background of a homogeneous region is extracted by computing the left and right shoulder thresholds of its graytone distribution from the measured standard deviations and graytone mean of the region. The corresponding thresholds for a non-homogeneous region are computed from estimates of the standard deviations and mean of its background. Pixels are then classified as darker than background, background, or brighter than background. The second phase of the method focuses background extraction refinements on non-homogeneous regions. The statistics of the background-classified pixels in each non-homogeneous region are measured. If the set of background-classified pixels in a region exhibits homogeneity, background extraction is improved by computing thresholds from the new background statistics. The homogeneity criterion is tightened as the number of background-classified pixels in a region decreases. If the set of background-classified pixels in a region is non-homogeneous, this suggests pixel misclassifications. The final background statistics of the region are then estimated by comparing background statistics measured in two successive trials, and from whether the set of pixels is left or right non-homogeneous, or both. This heuristic approach was implemented by studying non-homogeneous regions in a set of industrial images of moderate complexity. Rules that predict the final background extraction were derived by observing the behavior of successive background statistical measurements in the regions in the presence of dark and/or bright object pixels. Results indicate a significant reduction of background clutter in the industrial scenes. Good results have also been obtained in outdoor scenes of moderate complexity.
Most successful image understanding systems work because their domain is sufficiently restricted. Although variables such as the image formation process, the objects in the domain and the interpretation tasks are explicitly constrained, there are many implicit constraints in the image processing steps. It can be difficult to generalize from these exemplary image understanding systems, since we do not know how general the choice of steps is or whether they were chosen simply because they "work the best." Generalizing to other image understanding tasks and domains requires an explicit understanding of why the operators were chosen and of their performance. Although a suitable segmentation is required for computing an accurate interpretation, no one set of low-level operators will work in all image domains. A more promising approach is to develop a segmentation plan generator which uses the application goal as well as intrinsic and domain-specific knowledge to guide the segmentation processes. This paper demonstrates an intermediate-level vision system which can generate an initial segmentation plan and then refine the plan to produce an optimal segmentation (for the desired goal) of the scene. The plan organizes the domain-independent and domain-dependent knowledge in a systematic way that makes the system suitable for a large and varied set of domains. This paper describes the planning approach to segmentation, an outline of the steps involved, and the results of the first experiments using the plan generator to segment panoramic dental radiographs.
This paper discusses a method for edge linking using an ellipsoidal clustering technique. The ellipsoidal clustering technique assumes that each data point is an ellipsoid with a mean and covariance matrix and generates a decision tree which partitions the sample ellipsoids into clusters. The problem of edge linking can be visualized as a clustering process. By taking the properties of each edge pixel as the components of a data vector, pixels having similar properties are clustered together, and pixels in the same cluster are linked together. The edge data is obtained using the facet model based edge detector, and the property vectors and covariance matrices of the edge pixels are also computed from the facet edge detector output. The performance of the clustering algorithm is evaluated by computing the average clustering error, and the relationships between the clustering threshold, the noise level and the clustering error are outlined.
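The paper's decision-tree clustering is not reproduced here; the following greedy sketch only conveys the underlying idea, linking edge pixels whose property vectors fall within a Mahalanobis-distance threshold of a cluster's running mean. A shared covariance is assumed for brevity.

```python
import numpy as np

def link_edges(props, cov, thresh=3.0):
    """props: (N, d) edge-pixel property vectors; cov: (d, d) shared covariance."""
    cov_inv = np.linalg.inv(cov)
    labels = -np.ones(len(props), dtype=int)
    means = []                                     # running cluster means
    for i, p in enumerate(props):
        best, best_d = -1, thresh
        for c, m in enumerate(means):
            d = np.sqrt((p - m) @ cov_inv @ (p - m))   # Mahalanobis distance
            if d < best_d:
                best, best_d = c, d
        if best < 0:
            means.append(p.copy())                 # start a new cluster
            labels[i] = len(means) - 1
        else:
            n = np.sum(labels == best)
            means[best] = (means[best] * n + p) / (n + 1)   # update the mean
            labels[i] = best
    return labels   # pixels sharing a label are linked into one edge
```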
We propose a uniform processing framework for low-level vision computing in which a bank of spatial filters maps the image intensity structure at each pixel into an abstract feature space. Some properties of the filters and the feature space will be described. Local orientation is measured by a vector sum in the feature space as follows: each filter's preferred orientation along with the strength of the filter's output determine the orientation and the length of a vector in the feature space; the vectors for all filters are summed to yield a resultant vector for a particular pixel and scale. The orientation of the resultant vector indicates the local orientation, and the magnitude of the vector indicates the strength of the local orientation preference. Limitations of the vector sum method will be discussed. Our investigations show that the processing framework provides a useful, redundant representation of image structure across orientation and scale.
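A hedged sketch of the vector-sum measure follows, using simple directional-derivative responses as the filter bank (an assumption; the abstract does not specify the filters). Doubling the angle is a common convention for orientations defined modulo pi, so that theta and theta+pi do not cancel; it is likewise not spelled out in the abstract.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_orientation(img, n_orient=8, sigma=2.0):
    """Dominant local (gradient) orientation, modulo pi, and its strength."""
    gy, gx = np.gradient(gaussian_filter(img.astype(float), sigma))
    vx = np.zeros_like(gx)
    vy = np.zeros_like(gy)
    for k in range(n_orient):
        theta = np.pi * k / n_orient            # filter's preferred orientation
        # response strength: magnitude of the directional derivative
        resp = np.abs(np.cos(theta) * gx + np.sin(theta) * gy)
        vx += resp * np.cos(2 * theta)          # vector in the feature space
        vy += resp * np.sin(2 * theta)
    strength = np.hypot(vx, vy)                 # length of the resultant vector
    orientation = 0.5 * np.arctan2(vy, vx)      # halve the doubled angle
    return orientation, strength
```

The magnitude of the resultant vector behaves as the abstract describes: it is large where one orientation dominates and small where responses are spread across orientations.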
Thinning is an image processing procedure that extracts the medial axes, or skeletons, of objects in a binary image. Because of the iterative pixel-removing strategy used, most existing thinning algorithms are either inefficient (sequential algorithms) or need special hardware (parallel algorithms). Furthermore, for line-shaped objects, the line intersections produced by these algorithms tend to be elongated. A new line thinning and intersection detection approach that deals with images in which the objects are lines (curves) is presented in this paper. It uses a run-length representation for the lines in the image. A histogram of run lengths is consulted to identify runs that correspond to line cross-sections. The mid-points of the selected runs are used to form the skeletons. Line intersections are detected at locations where sequences of runs merge or split. This approach is non-iterative, with a time complexity linear in the size of the image.
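A sketch of the run-length idea for a single horizontal pass, assuming a binary numpy image; the intersection detection (merging and splitting run sequences), the vertical pass, and the exact histogram thresholding rule are simplified assumptions.

```python
import numpy as np

def runs_per_row(img):
    """Yield (row, start, end) for each run of foreground pixels (end exclusive)."""
    for r, row in enumerate(img):
        d = np.diff(np.concatenate(([0], row.astype(int), [0])))
        for s, e in zip(np.where(d == 1)[0], np.where(d == -1)[0]):
            yield r, s, e

def skeleton_points(img):
    runs = list(runs_per_row(img))
    lengths = np.array([e - s for _, s, e in runs])
    hist, edges = np.histogram(lengths, bins=32)
    # assume the histogram's dominant bin corresponds to line cross-sections;
    # much longer runs are segments running along the scan direction
    peak = np.argmax(hist)
    lo, hi = edges[max(peak - 1, 0)], edges[min(peak + 2, len(edges) - 1)]
    return [(r, (s + e) // 2) for r, s, e in runs if lo <= e - s <= hi]
```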
A 2-D local operator is described for computing the local curvature of intensity isocontours in a digital image. The operator directly estimates the average local curvature of the isointensity contours and does not require the explicit detection of edges. In a manner similar to the Hueckel operator, a series of 2D basis functions defined over a circular local neighborhood extracts a set of coefficients from the image at each point of investigation. These coefficients describe an approximation to a circular arc assumed to pass through the neighborhood center, and the curvature is taken as the inverse of the estimated arc radius. The optimal set of basis functions for approximating this particular target pattern is shown to be the Fourier series. Discretization of the continuous basis functions can create anisotropy problems for the local operator; however, these problems can be overcome either by using a set of correction functions or by choosing a discrete function which closely approximates the circular neighborhood. The method is validated using known geometric shapes and is shown to be accurate in estimating both the curvature and the orientation of the isocontours. When applied to a test image, the curvature operator provides regional curvature measurements compatible with the visible edges in the image.
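The Fourier-basis operator itself is not reconstructed here; as a point of comparison, the standard derivative-based expression for isophote curvature, kappa = -(Iy^2 Ixx - 2 Ix Iy Ixy + Ix^2 Iyy) / (Ix^2 + Iy^2)^(3/2), can be sketched directly.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def isophote_curvature(img, sigma=2.0):
    """Derivative-based isocontour curvature (not the paper's basis-function operator)."""
    I = gaussian_filter(img.astype(float), sigma)
    Iy, Ix = np.gradient(I)          # np.gradient returns (d/drow, d/dcol)
    Iyy, Iyx = np.gradient(Iy)
    Ixy, Ixx = np.gradient(Ix)
    num = Iy**2 * Ixx - 2 * Ix * Iy * Ixy + Ix**2 * Iyy
    den = (Ix**2 + Iy**2) ** 1.5 + 1e-12       # avoid division by zero
    return -num / den
```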
In this paper we present a new framework for discrete black and white images that employs only integer arithmetic. This framework is shown to retain the essential characteristics of the framework for Euclidean images. We propose two norms and, based on them, define the permissible geometric operations on images. The basic invariants of our geometry are line images, the structure of an image and the corresponding local property of strong attachment of pixels. The permissible operations also preserve 3x3 neighborhoods, area, and perpendicularity. The structure, patterns, and inter-pattern gaps in a discrete image are shown to be conserved by the magnification and contraction process. Our notions of approximate congruence, similarity and symmetry are similar in character to the corresponding notions for Euclidean images. We mention two discrete pattern recognition algorithms that work purely with integers and fit into our framework. Their performance has been shown to be on a par with the performance of traditional geometric schemes. Also, all the undesired effects of finite-length registers in fixed-point arithmetic that plague traditional algorithms are non-existent in this family of algorithms.
The overall goal of our research is to build a vision learning system which can learn to classify objects from 2-D contour information. The visual representation method for such a vision learning system, called the hierarchical local symmetry (HLS), will be discussed in this paper. The definition and algorithms of the smoothed local symmetry (SLS), which was introduced by Brady as a method satisfying the stability-versus-sensitivity criteria of a visual representation method, are reviewed. In this paper, the HLS, a modified SLS, is formalized and a new algorithm to compute the HLS is described. The HLS eliminates some redundant information in the SLS and gives us hierarchical information. It also makes it possible to devise more efficient algorithms than those for the SLS. A normalized polar coordinate representation (NPCR) is used to store the computed HLS with translation, scale, and rotation invariance. Transforming the HLS into the NPCR, where the learning process can be performed, is also discussed.
An interactive system for computing terrain elevation maps and synthetic views of planetary scenes from a single panchromatic image is described. The system, implemented on an 8192-processor CM-2 Connection Machine, can generate an alternative view from an original (512 by 512) image in about 20 seconds. The system uses a shape-from-shading algorithm based on a numerical integration approach for computing relative elevations from the image, and an oblique parallel projection/hidden surface removal algorithm for generating synthetic renditions of the scene. Both of these algorithms are implemented using scans and execute in constant time for a given image size. Results for Mars using Viking Orbiter imagery are presented.
The interpretation of a 2D line drawing as a 3D scene is an important area of study within the fields of artificial intelligence and machine vision. In the area of CAD/CAM, research has focused on the reconstruction of a 3D solid from its engineering drawings, with either two or three views, or from its wireframe representation. We have been working on the problem of automatically reconstructing a 3D solid object's Constructive Solid Geometry (CSG) representation from a single 2D line drawing of the object. This paper describes our approach as well as some preliminary results. We validate our approach on a restricted set of objects consisting of simple rectilinear polyhedra. Using the Huffman-Clowes labeling scheme, we are able to successfully identify the primitive blocks necessary for the CSG tree generation, as well as the set operations that must be applied to them. Extension to general polyhedra is also discussed.
We describe a technique to classify surface types from range data using local derivative estimates. We propose an optical architecture using acousto-optic devices to efficiently compute these derivatives. The derivative estimates are combined into curvature functions which are scale-, translation-, and rotation-invariant, and the surface types are determined from these curvature features. Results are presented for the classification of test range surfaces.
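A rough sketch of the curvature-based classification, with the optical derivative estimation of the paper replaced by simple finite differences on a depth map; the signs of the mean (H) and Gaussian (K) curvatures yield the familiar surface-type labels (peak, pit, ridge, valley, saddle, plane, and so on).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def surface_types(z, sigma=2.0, eps=1e-4):
    """Label each pixel of a depth map z(x, y) by the signs of H and K."""
    z = gaussian_filter(z.astype(float), sigma)
    zy, zx = np.gradient(z)
    zyy, zyx = np.gradient(zy)
    zxy, zxx = np.gradient(zx)
    g = 1 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / g**2                       # Gaussian curvature
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zxy
         + (1 + zx**2) * zyy) / (2 * g**1.5)              # mean curvature
    sH = (np.sign(H) * (np.abs(H) > eps)).astype(int)     # -1, 0, +1
    sK = (np.sign(K) * (np.abs(K) > eps)).astype(int)
    return 3 * sH + sK   # 9 labels; e.g. sH=-1, sK=+1 -> peak; 0, 0 -> plane
```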
We have been developing a framework for the visual representation of three-dimensional free-form curved surfaces based on a special class of surface curves which we call surface structure curves. By analyzing their properties, we attempt to construct a basis for describing the topographical structures of curved surfaces that leads to a global description of the surface geometry. Surface structure curves are a set of surface curves defined using viewpoint-invariant features from differential geometry: the surface curvatures and their gradients and asymptotes. From the surface structure curves, surface sketches in terms of the topographical structures of ridge lines, valley lines, and the enclosing boundaries of bumps and dents can be inferred. In this paper, we propose a viewpoint-invariant representation scheme that provides a smooth surface sketch which can be used as a natural parameterization of free-form curved surfaces. We define three types of surface structure points and five types of surface structure curves in terms of zero-crossings, asymptotes and gradients of the Gaussian and mean curvatures. We discuss their properties and functions in edge-based segmentation and description of free-form curved surfaces. Some examples of surface sketches produced by the surface structure curves are shown.
The subject of our research is the 3D shape representation problem for a special class of range image, one where the natural mode of the acquired range data is in the form of equidistance contours, as exemplified by a moire interferometry range system. In this paper we present a novel surface curvature computation scheme that directly computes the surface curvatures (the principal curvatures, Gaussian curvature and mean curvature) from the equidistance contours without any explicit computations or implicit estimates of partial derivatives. We show how the special nature of the equidistance contours, specifically the dense information about the surface curves in the 2D contour plane, turns into an advantage for the computation of the surface curvatures. The approach is based on using simple geometric constructions to obtain the normal sections and the normal curvatures. This method is general and can be extended to any dense range image data. We show in detail how this computation is formulated and give an analysis of the error bounds of the computation steps, showing that the method is stable. Computation results on real equidistance range contours are also shown.
This paper presents a new, robust and massively parallel algorithm for determining the three-dimensional structure of a scene. It is based on analyzing the images that result from a straight-line camera motion. In order to simplify the computations, we use spherical angles to represent pixels in the image (in place of x-y coordinates). In this frame, every moving point in the image can easily be processed independently of any other point. We show how to reconstruct the 3D geometry by an integration operation (no differentiation operator is involved). Preliminary results for the case where the optical axis is parallel to the motion axis show an error of less than 0.6% in absolute distance.
Knowledge of depth aids greatly in the object recognition task. It is preferable to be able to calculate depth without requiring multiple views or special light sources. This paper presents a method for calculating depth from a single image in the case where the object is symmetrical about a plane. The algorithm requires the identification of symmetrical locations in the image; thus, for each point at which depth is to be computed, the corresponding symmetrical point must be visible in the image. The distance for which the camera is focused must also be known. The implementation of this algorithm is discussed and reconstructed shapes computed from real images are presented.
Consider a binocular stereo system observing a textured surface patch that is oriented in depth. Due to the geometry of the situation, a given texture element on the surface will appear with differing orientations in the binocular projections. The differential image orientation of corresponding elements in binocular projections is referred to as orientational disparity. An analysis of stereoscopic projection is presented that explicitly relates three-dimensional surface orientation to orientational disparity. This analysis allows for the specification of relations for recovering three-dimensional surface orientation from local measures of binocular stereo disparity.
Pose estimation is an important operation for many robotic tasks. In this paper, we propose a new pose estimation algorithm. The inputs to this algorithm are the six distances joining all feature pairs and the image coordinates of the quadrangular target. The outputs of this algorithm are (1) the effective focal length of the vision system, (2) the interior orientation parameters of the target, (3) the exterior orientation parameters of the camera with respect to an arbitrary coordinate system if the coordinates of the target are known in this frame, and (4) the final pose of the camera. The contribution of this method is the fast recovery of the vectors joining the effective focal point and each of the target points using an all-geometric, closed-form, unique solution. Taking advantage of all the geometric information inherent in the target and its image, each of these vectors is recovered in six different ways. This redundancy is exploited in order to minimize the effect of random errors in the target sizing or in the recovery of its image coordinates. Knowing the relative position of the vision system frame with respect to a fixed coordinate system, the exterior orientation parameters are recovered in the form of a matrix transformation relating the fixed coordinate system to the target coordinate system. The decomposition of the latter matrix transformation into a translation and three rotations about the major axes provides the final pose of the camera.
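The paper's all-geometric closed-form solution is not reproduced here; as a point of comparison, the conventional route to pose from a planar quadrangular target is to estimate the homography from the four point correspondences (DLT) and decompose it with known intrinsics K, sketched below.

```python
import numpy as np

def homography(obj, img):
    """obj, img: (4, 2) arrays of target-plane and image coordinates."""
    A = []
    for (X, Y), (u, v) in zip(obj, img):
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 3)          # defined up to scale and sign

def pose_from_quad(obj, img, K):
    H = np.linalg.inv(K) @ homography(obj, img)
    if H[2, 2] < 0:                      # heuristic sign fix so the target lies
        H = -H                           # in front of the camera (t_z > 0)
    s = 1.0 / np.linalg.norm(H[:, 0])    # scale from the unit rotation column
    r1, r2, t = s * H[:, 0], s * H[:, 1], s * H[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)          # project onto the nearest rotation
    return U @ Vt, t
```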
The "Tulip" is a modified Munsell Color Space in which equal hue spacing is converted to variable hue spacing, reflecting the differential sensitivity to hue as a function of value, for a fixed chroma. Number of discernible hues, when plotted on a hue-value plane, results in the proposed tulip shape, with curved lines delineating the boundaries between hues. By means of a signal detection experiment, the tulip for yellow-green and for blue is determined. It is shown that more distinct hues of yellow-green are discernible at a high value than at low value. Conversely, for blue, more distinct hues are discernible at low value than at high value.
An automated scheme for the detection of chromosome aberrations in color chromosome images is described. The analysis scheme consists of three steps: segmentation, clustering, and scene understanding. First, the target chromosome pixels are segmented via thresholding based on a chosen color measure. Then a clustering technique is applied to cluster the target chromosome pixels into groups in such a way that every group corresponds to a unique target chromosome domain. Finally, human chromosome aberrations are detected by calculating the geometrical properties of each detected group and counting the number of the confirmed target chromosomes. Experiments have been carried out to compare the effectiveness of several color measures for the purpose of the segmentation. Moreover, a novel self-tuning thresholding method has been developed to improve the robustness of the segmentation. With this method, chromosome aberrations can be identified even under different background brightness and chrominance distributions.
A method is proposed for recognizing and locating previously modelled objects from range images. The objective was to develop a simple, robust, working recognition method for scenes of low or moderate complexity. Such a method is needed in many applications, e.g. fully automatic shape inspection or robotic manipulation tasks. In the method proposed here the objects are described by simple 3D vector patterns formed using certain special points on 3D edge segments. Matching is based on a hypothesis and verification approach. The performance of the method is evaluated.
A three-dimensional reconstruction method for a straight homogeneous generalized cylinder model using "axis-based stereo", and a contour line segmentation method, are described. To achieve the axis-based stereo matching, stereo contour images are used. In each contour image, a pair of contour line segments is assumed to be the extremal contour of a cylinder and is interpreted as a "ribbon". A pair of ribbons over the stereo contour images is interpreted as a "cylinder". The cylinder's axis is determined by the stereo match of the two ribbons' axes in space. For the line segmentation, an interval tree structure is built, taking the curvature extrema found by scale-space analysis as feature points and splitting the feature-point intervals recursively to satisfy a line regularity criterion. As a result of the line segmentation and reconstruction, actual human arms are recovered and recognized by the prototype vision system.
A generalized version of the Hough transform, called the 3-D Generalized Hough Transform (3-D GHT), provides an effective technique for the identification and location determination of objects in three-dimensional Euclidean space. This technique is suitable for multiple-object identification with occlusion; the surfaces of the objects of interest may have arbitrary complexity. The 3-D GHT is defined with respect to a 2½-D image in which the surface normal has been determined for each pixel. As with the 2-D generalized Hough transform (2-D GHT), each active pixel is mapped into a set of locations in an accumulator array by means of a table defined for a given object. Peaks in this array correspond to possible locations for the transform-defining object. A figure of merit is computed for each significant peak based on its local neighborhood in accumulator space in order to normalize with respect to noise and interference from other objects. Peak merit values from different object arrays for a given location are compared to determine the object type. The 3-D GHT has been tested with a number of synthetic range images to which various amounts of Gaussian noise had been added. The results indicate that the 3-D GHT generates fewer false peaks than the 2-D GHT. Furthermore, the 3-D GHT can distinguish between some objects that appear identical to the 2-D GHT, owing to its use of range information. The 3-D GHT has also been shown to be effective in multiple-object images in which objects occlude each other.
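The R-table and accumulator mechanics can be made concrete with a 2-D GHT sketch restricted to translation; the 3-D GHT indexes on the surface normal from the 2½-D image rather than on 2-D gradient direction as done here.

```python
import numpy as np

def build_rtable(edge_pts, grads, ref, n_bins=36):
    """edge_pts: (N, 2) points; grads: (N,) gradient angles; ref: reference point."""
    table = [[] for _ in range(n_bins)]
    for (y, x), g in zip(edge_pts, grads):
        b = int(((g % (2 * np.pi)) / (2 * np.pi)) * n_bins) % n_bins
        table[b].append((ref[0] - y, ref[1] - x))   # displacement to reference
    return table

def ght_accumulate(edge_pts, grads, table, shape):
    acc = np.zeros(shape)
    n_bins = len(table)
    for (y, x), g in zip(edge_pts, grads):
        b = int(((g % (2 * np.pi)) / (2 * np.pi)) * n_bins) % n_bins
        for dy, dx in table[b]:
            ry, rx = y + dy, x + dx
            if 0 <= ry < shape[0] and 0 <= rx < shape[1]:
                acc[ry, rx] += 1                    # vote for a reference point
    return acc   # peaks mark candidate object locations
```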
The computation and interpretation of the visible parts of objects in a scene is one of the most important tasks in image analysis. Hidden part removal algorithms are often used to solve this problem in synthetic imagery, but these algorithms are not adapted to recognition tasks because they do not provide a representation of visibility and (its dual) occlusion. In this paper we propose to take occlusion into account through a high-level description of three-dimensional scenes. The idea is that knowledge of a viewpoint introduces occluding relations between the objects in the scene. We use a graph-based approach and define an "occluding graph" for a viewpoint position. The use of this graph simplifies applications such as hidden part removal, image interpretation, and obstacle avoidance.
In recent years, computer vision has developed algorithms for most early vision processes. It is commonly held that no single vision process by itself can supply a reliable description of the scene. In fact, one of the keys to the reliability and robustness of biological systems is their ability to integrate information from different early processes. The basic concept of our vision system is to integrate information from stereo and shading (Fig. 1). The results obtained from this scheme in previous works are very interesting and encourage us to continue with this methodology. In earlier work [1,2] the basic approach to the integration scheme was presented. The present work deals with general concepts and the main developments in the shading analysis, in terms of simplifications of the analysis and improved accuracy. The scheme was tested on both synthetic and real scenes.
In this paper, processing of images of projected structured light is applied to define a 3-D unstructured scene. The structured light, viewed by a stereoscopically located single video camera, results in an image in which the regional 3-D characteristics of the scene are represented by corresponding distortions of the structure of the projected light. Image points of the structured light can be mapped to the 3-D locations of their corresponding scene points. These points can then be processed to yield a planar fit of the region of interest in the scene. An illustrative example is presented. The scene definition capabilities of this method can provide essential input to an autonomous navigation system for an off-road or planetary exploration vehicle.
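Once image points have been mapped to 3-D scene points, the planar-fit step reduces to ordinary least squares; a minimal sketch:

```python
import numpy as np

def fit_plane(pts):
    """pts: (N, 3) array of 3-D points; fit z = a*x + b*y + c by least squares."""
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coef, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    rms = np.sqrt(np.mean((A @ coef - pts[:, 2]) ** 2))   # fit residual
    return coef, rms   # (a, b, c) and RMS deviation from the plane
```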
An adaptive stereoscopic vision system has been examined for the task of industrial inspection. With fixed optical setups, and working under large defocusing, the camera system is simple and robust in construction and reliable in working accuracy. Applied in combination with adaptive control of the industrial robot, parts of various shapes and sizes can be inspected to the demanded accuracy. As a basis, the characteristics of the camera system have been investigated with different optical setups and measuring distances. The calibrations show good conformance of the camera with its mathematical model over a measuring distance of 325 to 825 mm. The achievable accuracy of the stereoscopic vision system with different camera arrangements is better than 0.5% over a measuring distance of 353 to 1278 mm.
A three-unit artificial neural network (ANN) automatic target recognition (ATR) system is integrated within, and compared to, a conventional ATR system recently developed at AFIT. The integration of ANNs within this existing framework allows the determination of where the benefits of these biologically motivated processing techniques lie. The integration and testing of ANNs within each of the three units constitutes the major contribution of this research. The emphasis of this paper is on the effects of learning alternatives on ATR. Several alternative feedforward networks were compared in the classifier unit.
This paper reports on an object recognition system that combines a neural network global approach with assistance from local features. The Relevant Feature Technique uses a global classifier to determine a characteristic class and uses the local relevant features of that class to improve the recognition of the visual object. Predominantly local features are difficult to utilize in a neural network environment because they are local and may not be considered significant by the globally sensitive neural network. In the technique shown here, locally relevant features are used to influence and constrain the global recognition process.
A model-based object recognition technique is introduced in this paper to identify and locate an object in any position and orientation. The test scenes could consist of an isolated object or several partially overlapping objects. A cooperative feature matching technique is proposed which is implemented by a Hopfield neural network. The proposed matching technique uses the parallelism of the neural network to globally match all the objects (they may be overlapping or touching) in the input scene against all the object models in the model-database at the same time. For each model, distinct features such as curvature points (corners) are extracted and a graph consisting of a number of nodes connected by arcs is constructed. Each node in the graph represents a feature which has a numerical feature value and is connected to other nodes by an arc representing the relationship or compatibility between them. Object recognition is formulated as matching a global model graph, representing all the object models, with an input scene graph representing a single object or several overlapping objects. A 2-dimensional Hopfield binary neural network is implemented to perform a subgraph isomorphism to obtain the optimal compatible matching features between the two graphs. The synaptic interconnection weights between neurons are designed such that matched features belonging to the same model receive excitatory supports, and matched features belonging to different models receive an inhibitory support or a mutual support depending on whether the input scene is an isolated object or several overlapping objects. The coordinate transformation for mapping each pair of matched nodes from the model onto the input scene is calculated, followed by a simple clustering technique to eliminate any false matches. The orientation and the position of objects in the scene are then calculated by averaging the transformation of correct matched nodes. Some simulation results are shown to illustrate the performance of the system for scenes containing an isolated object or several overlapping objects. Finally the performance of the proposed technique is compared with that of a relaxation technique.
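A toy version of the Hopfield matching idea is sketched below: binary neurons v[i, k] hypothesize that model node i matches scene node k, pairwise compatibilities provide excitatory support, and multiple assignments of the same node are inhibited. The weight design and update schedule here are illustrative assumptions, not the paper's exact network.

```python
import numpy as np

def hopfield_match(C, iters=2000, inhibit=2.0, rng=np.random.default_rng(0)):
    """C[i, k, j, l]: compatibility of matching (i -> k) together with (j -> l)."""
    M, S = C.shape[0], C.shape[1]
    v = (rng.random((M, S)) > 0.5).astype(float)     # random initial state
    for _ in range(iters):
        i, k = rng.integers(M), rng.integers(S)      # asynchronous update
        support = np.einsum('jl,jl->', C[i, k], v)   # excitatory input
        penalty = inhibit * (v[i].sum() - v[i, k]         # one match per row...
                             + v[:, k].sum() - v[i, k])   # ...and per column
        v[i, k] = 1.0 if support - penalty > 0 else 0.0
    return v   # v[i, k] == 1 marks accepted feature matches
```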
A method for optical flow estimation from an image sequence using a neural network is presented. Under hypotheses of local rigidity, translational motion and smoothness constraints, a neural network is designed to estimate the optical flow. Experimental results using real-world IR images are presented to demonstrate the efficiency of this method compared to the Horn and Schunck algorithm.
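The Horn and Schunck baseline named in the abstract is compact enough to sketch (the neural-network estimator itself is not reproduced):

```python
import numpy as np
from scipy.ndimage import convolve

# Horn-Schunck neighborhood averaging kernel
AVG = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], float) / 12.0

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    I1, I2 = I1.astype(float), I2.astype(float)
    Iy, Ix = np.gradient((I1 + I2) / 2.0)        # spatial derivatives
    It = I2 - I1                                 # temporal derivative
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        ubar = convolve(u, AVG)                  # local flow averages
        vbar = convolve(v, AVG)
        num = Ix * ubar + Iy * vbar + It
        den = alpha**2 + Ix**2 + Iy**2
        u = ubar - Ix * num / den                # smoothness-regularized update
        v = vbar - Iy * num / den
    return u, v
```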
A data-based model of associative memory is described which uses statistical inference techniques to estimate an output response from a set of inputs and a database of previously stored patterns. The model is easily scaled in terms of the number of patterns that can be stored in the database as well as the number of fields in a pattern. Other features include the ability to change the input and output fields, to adjust the amount of generalization performed by the associative memory, and to control the size of the database by pruning redundant or conflicting patterns. Applications of associative memories to a wide variety of problems are illustrated to motivate their use as general system building blocks. Implementations in hardware and software are discussed.
The fundamental problem faced by all vision systems is the ambiguity created by the projection process. An object's projected shape in an image changes dramatically for small changes in the observer's viewpoint. This is the basic difficulty in creating a machine vision system that can respond robustly in an unconstrained 3D environment. Our approach to this problem enables the vision system to actively engage its interpretation of the surroundings using a distributed memory system modeled as visual potentials and made up of characteristic view(point)s (Koenderink and van Doorn, 1979).
In addition to invariance with respect to certain geometric transformations, there are two other key requirements for any shape recognition system. It should be flexible enough to adapt to a variety of sets of shapes with minimal training, and it should be capable of performing even in the presence of occlusion. This paper describes one such shape recognition system that is currently under development. The system is based on the redundant hashing scheme of Kohonen for recognizing and correcting misspelt words. The current version of the system is meant for 2-D shapes; however, the same approach is applicable to 3-D shape recognition. In the present implementation for 2-D shapes, a polygonal approximation of the given shape is encoded in the form of a string. The encoded string is then used to generate a very small set of shape hypotheses through the use of redundant hashing. The best hypothesis from the set of competing hypotheses is selected by a very simple matching scheme followed by a verification phase based on rotation transformation. The experiments thus far indicate that the system is capable of recognizing shapes in the presence of occlusion with about 5% error. The system has a significant ability to adapt to new sets of shapes; it does not require any training except the building of the hash index table and the corresponding shape dictionary.
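The redundant hashing idea can be illustrated with overlapping n-grams of the encoded boundary string serving as hash keys; an occluded or distorted shape still hits many keys, so voting yields a small hypothesis set. The string encoding itself is assumed given, and the n-gram form is a sketch of the scheme, not the authors' exact table.

```python
from collections import defaultdict, Counter

def build_index(shapes, n=3):
    """shapes: dict mapping shape name -> encoded boundary string."""
    index = defaultdict(set)
    for name, s in shapes.items():
        for i in range(len(s) - n + 1):
            index[s[i:i + n]].add(name)          # redundant hash entries
    return index

def hypotheses(index, test, n=3, top=3):
    """Vote for stored shapes sharing n-grams with the test string."""
    votes = Counter()
    for i in range(len(test) - n + 1):
        for name in index.get(test[i:i + n], ()):
            votes[name] += 1
    return votes.most_common(top)                # small set of shape hypotheses
```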
An algorithm has been developed which allows a simple optoelectronic architecture to match a test image (1-D or 2-D) with stored images. It is shown that if a test image and a complementary stored image, together with the complementary test image and the stored image, are compared optically, the emerging light intensity is minimum for the best match. The suggested architecture then carries out efficient parallel comparison and points out the best-matched test image using a TV screen and transparencies. A time-varying light intensity source (or a time-varying thresholding voltage) and a device which switches its state when the light input falls below a threshold value, like a night-light, are the two ingredients used to select the best match in parallel. The architecture is also ideally suited to finding the closeness of match of two images for quality-control type operations.
Multitarget tracking over consecutive pairs of time frames is accomplished with a neural net. This involves position and velocity measurements of the targets and a quadratic neural energy function. Simulation data are presented, and an optical implementation is discussed.
This work further develops a neural network model of motion segmentation by visual cortex that was outlined in Grossberg (1987). We illustrate the model's properties through computer simulations of data concerning group and element apparent motion, including the tendency for group motion to occur at longer ISIs and under conditions of short visual persistence. These phenomena challenge recent vision models because the switch between group and element motion is determined by changing temporal, but not spatial, display properties. The model clarifies the dependence of short-range and long-range motion on spatial scale. Its design specifies how oriented (x cell) and unoriented (y cell) detectors cooperate and compete in successive processing stages to generate motion signals that are sensitive to direction-of-motion, yet insensitive to direction-of-contrast. Ternus displays and Burt and Sperling displays generate appropriate motion signals in the circuit. Apparent motion and real motion generate testably different model properties. The model also clarifies how motion after-effects may be generated and how preprocessing of motion signals is joined to long-range cooperative motion mechanisms to control phenomena such as induced motion and motion capture. The total model system is a motion Boundary Contour System (BCS) that is computed in parallel with the static BCS of Grossberg and Mingolla before both systems cooperate to generate a boundary representation for 3-D visual form perception.
Using biology as a basis for the development of sensors, devices and computer vision systems is a challenge to systems and vision scientists. It is also a field of promising research for engineering applications. Biological sensory systems, such as vision, touch and hearing, sense different physical phenomena from our environment, yet they possess some common mathematical functions. These mathematical functions are cast into the neural layers which are distributed throughout our sensory regions, sensory information transmission channels and the cortex, the centre of perception. In this paper, we are concerned with the study of the biological vision system and the emulation of some of its mathematical functions, both retinal and visual cortex, for the development of a robust computer vision system. This field of research is not only intriguing, but offers a great challenge to systems scientists in the development of functional algorithms. These functional algorithms can be generalized for further studies in such fields as signal processing, control systems and image processing. Our studies are heavily dependent on the use of fuzzy-neural layers and generalized receptive fields. Building blocks of such neural layers and receptive fields may lead to the design of better sensors and better computer vision systems. It is hoped that these studies will lead to the development of better artificial vision systems with various applications to vision prosthesis for the blind, robotic vision, medical imaging, medical sensors, industrial automation, remote sensing, space stations and ocean exploration.
This paper presents a generalization of two earlier studies that treat efficient coding and redundancy reduction in the retina. The analysis combines the predictive coding method applied by Srinivasan, Laughlin and Dubs in the spatial domain with the transform coding method applied by Buchsbaum and Gottschalk in the color domain [1,2]. The analysis predicts a family of hybrid spatio-chromatic channels paralleling spatio-chromatic receptive fields in the visual system. These channels constitute the visual system's strategy for simultaneously and efficiently encoding the spatial and chromatic dimensions of color images.
Very few techniques have been presented in the literature for shape representation that have combined both region and boundary information at multiple scales. We introduce such a novel technique in this paper. Our algorithm is an implementation of a two-dimensional dynamic grassfire that relies on a distance surface on which elastic contours minimize an energy functional. A Euclidean distance transform combined with the active contour model, referred to as a "snake", is used for this minimization process. Boundary information is integrated into the model by the extraction of curvature extrema and arcs of constant curvature. We do so by introducing a new technique based on operations derived from the field of mathematical morphology. The principal advantages of our new method compared with previous algorithms for shape description based on skeletonization are: implicit connectivity of the skeleton, smooth and accurate results, integration of region and boundary information, multiscale description and hierarchical representation in terms of feature significance. Furthermore, new possibilities are offered within the context of our method. For the first time, dynamic (deformable) skeletons are naturally defined. Also, our method easily permits user interaction which can be used for the generation and comparison of different types of skeletons. Finally, the graph representation of the skeleton is straightforward to obtain, a fundamental step for shape analysis. Our new method for shape skeletonization is the first to address these issues which are fundamental to the description of natural forms.
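A greatly simplified sketch of the distance-surface ingredient: the Euclidean distance transform plays the role of the grassfire, and its ridge points approximate a skeleton. The elastic-contour minimization, curvature features and hierarchical graph of the paper are omitted.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, maximum_filter

def rough_skeleton(mask):
    """mask: boolean 2-D array, True inside the shape."""
    dist = distance_transform_edt(mask)            # grassfire arrival times
    # ridge points: local maxima of the distance surface inside the shape
    ridge = (dist == maximum_filter(dist, size=3)) & mask
    return ridge, dist
```

Unlike this local-maxima approximation, the snake-based formulation in the paper yields connected, smooth skeleton branches directly.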
Color and spatial opponent signals are essential properties of the visual system. These properties are formulated at a very early stage, in the retinal outer plexiform layer. To elucidate the underlying neural mechanisms, we constructed a model of the horizontal cell layer. The model consists of red- and green-sensitive cones and L- and R/G-type horizontal cells. The light-induced transmitter release from photoreceptors is mimicked by a third-order linear system. The horizontal cell is expressed by Hodgkin-Huxley type equations in terms of the relevant ionic currents. To form the horizontal cell layer, the horizontal cells are connected by gap-junctional linear conductances. The main input to the L- and R/G-type horizontal cells comes from red- and green-sensitive cones, respectively. The L-type horizontal cell has a negative feedback connection to the photoreceptors. Computer simulation of a stimulus displacement produced dynamic characteristics very similar to the experimental results in the L-type horizontal cell layer. However, the dynamic response of the R/G-type horizontal cell differed from the experimental data without a negative feedback from the R/G-type horizontal cell to the green-sensitive cone. This suggests that the R/G-type horizontal cell may also have a feedback synapse to the green-sensitive cone, although conclusive physiological evidence has not yet been found. We also simulated bipolar cell responses and confirmed that bipolar cells respond to local contrast changes by mediation of the surround effect from horizontal cells.
Biological mechanisms of nonuniform sampling and processing of visual signals, and of automatic gain control are discussed. It is argued that such mechanisms, which result in extensive data reduction and efficient allocation of limited computational resources, are attractive for wide field-of-view machine vision systems.
The proposed mechanism for designing a robust machine vision system is based on the dynamic activity generated by the various neural populations embedded in nervous tissue. It is postulated that a hierarchy of anatomically distinct tissue regions is involved in visual sensory information processing. Each region may be represented as a planar sheet of densely interconnected neural circuits. Spatially localized aggregates of these circuits represent collective neural assemblies. Four dynamically coupled neural populations are assumed to exist within each assembly. In this paper we present a state-variable model for a tissue sheet derived from empirical studies of population dynamics. Each population is modelled as a nonlinear second-order system. It is possible to emulate certain observed physiological and psychophysiological phenomena of biological vision by properly programming the interconnective gains. Important early visual phenomena such as temporal and spatial noise insensitivity, contrast sensitivity and edge enhancement will be discussed for a one-dimensional tissue model.
Fuzzy logic has gained increased attention as a methodology for managing uncertainty in a rule-based structure. In a fuzzy logic inference system, more rules can fire at any given time than in a crisp expert system, and since the propositions are modelled as possibility distributions, there is a considerable computational load on the inference engine. In this paper, two neural network structures are proposed as a means of performing fuzzy logic inference. In the first structure, the knowledge of the rules (i.e., the antecedent and consequent clauses) is explicitly encoded in the weights of the net, whereas the second network is trained by example. Both theoretical properties and simulation results of these structures are included.
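A minimal Mamdani-style sketch (triangular memberships, max-min aggregation, centroid defuzzification) makes concrete why many rules firing simultaneously is computationally heavy; it does not reproduce either of the paper's network encodings.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b (a < b < c)."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def infer(x, rules, y_grid):
    """rules: list of ((a,b,c)_antecedent, (a,b,c)_consequent) pairs."""
    agg = np.zeros_like(y_grid)
    for ant, con in rules:
        w = tri(np.array(x), *ant)                # degree to which the rule fires
        agg = np.maximum(agg, np.minimum(w, tri(y_grid, *con)))  # clip and join
    return (agg * y_grid).sum() / (agg.sum() + 1e-12)   # centroid defuzzification

# toy usage: two overlapping rules both fire for x = 4.0
y = np.linspace(0.0, 10.0, 101)
rules = [((0, 2, 5), (0, 3, 6)), ((3, 6, 9), (4, 7, 10))]
print(infer(4.0, rules, y))
```

Every rule contributes a membership evaluation and a clipped consequent over the whole output grid, which is exactly the per-inference cost the proposed networks are meant to amortize.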