In this paper, we discuss a statistical framework for multiscale signal and image processing based on a class of multiresolution stochastic models, which can be used to represent spatial random processes at a range of scales. The model class is quite rich, and in fact includes the class of Markov random fields. In addition, the models have a scale-recursive structure which naturally leads to efficient, scale-recursive algorithms for smoothing and likelihood calculation. We discuss an application of the framework to the problem of computing optical flow in image sequences, and demonstrate computational savings of one to two orders of magnitude over standard algorithms.
Adapted waveform analysis refers to a collection of FFT-like adapted transform algorithms. Given a signal or an image, these methods provide a special orthonormal basis relative to which the image is well represented. The selected basis functions are chosen from predefined libraries of oscillatory localized functions (waveforms) so as to minimize the number of parameters needed to describe the object. These algorithms are of complexity N log N, opening the door to a large range of applications in signal and image processing, such as compression, feature extraction, and enhancement. Our goal is to describe and relate traditional Fourier methods to wavelet and wavelet-packet based algorithms by making explicit their relative roles in analysis. Starting with a recent refinement of the windowed sine and cosine transforms, we will derive an adapted local sine transform, show its relation to wavelet and wavelet-packet analysis, and describe an analysis tool-kit illustrating the merits of different adaptive and non-adaptive schemes.
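As a rough illustration of the best-basis idea behind such adapted transforms, the following sketch (with an assumed piecewise-stationary test signal and a simplified per-block entropy cost) chooses between a global DCT and two half-length local DCTs:

```python
import numpy as np
from scipy.fft import dct

def entropy_cost(c):
    """Shannon entropy of normalized coefficient energies (lower = sparser)."""
    p = c**2 / np.sum(c**2)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

t = np.linspace(0, 1, 512, endpoint=False)
# A signal whose frequency jumps halfway through favors the split basis.
x = np.where(t < 0.5, np.cos(2*np.pi*8*t), np.cos(2*np.pi*64*t))

cost_global = entropy_cost(dct(x, norm='ortho'))
cost_split = sum(entropy_cost(dct(h, norm='ortho')) for h in np.split(x, 2))
print('global DCT cost:', cost_global, '  split cost:', cost_split)
print('selected basis:', 'split' if cost_split < cost_global else 'global')
```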
This paper presents a 3D multiresolution wavelet analysis that provides a tool for analyzing spatial details (e.g., horizontal and vertical edges) of moving objects contained in a sequence of images. Current multiresolution wavelet analysis theory is modified to create an orthonormal wavelet basis for L2(R3) by forming the tensor product of three non-identical multiresolution wavelet analyses on L2(R). An unconventional multiresolution decomposition and reconstruction algorithm is presented which provides a new tool for analyzing moving objects in a scene. Preliminary results demonstrate the new analysis technique's potential for segmenting key characteristics of an object moving against stationary or moving backgrounds.
We summarize the properties of the autocorrelation functions of compactly supported wavelets, their connection to iterative interpolation schemes, and the use of these functions for multiresolution analysis of signals. We briefly describe properties of representations using dilations and translations of these autocorrelation functions (the autocorrelation shell) which permit multiresolution analysis of signals.
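A quick numerical check of the key property behind the autocorrelation shell: for an orthonormal wavelet filter, the even-lag autocorrelation coefficients vanish, so the autocorrelation function interpolates. The db2 coefficients below are the standard Daubechies 4-tap values.

```python
import numpy as np

h = np.array([-0.12940952, 0.22414387, 0.83651630, 0.48296291])  # db2 lowpass
a = np.correlate(h, h, mode='full')          # filter autocorrelation
print(np.round(a, 4))
# -> [-0.0625  0.  0.5625  1.  0.5625  0.  -0.0625]: even lags vanish,
# and the mask is exactly the 4-point Deslauriers-Dubuc iterative
# interpolation filter, so Phi(k) = delta_k (an interpolating function).
```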
The ability to segment a textured image into separate regions (texture segmentation) continues to be a challenging problem in computer vision. Many texture-segmentation schemes are based on a filter-bank model, where the filters (henceforth referred to as Gabor filters) are derived from Gabor elementary functions. The goal of these methods is to transform texture differences into detectable filter-output discontinuities at texture boundaries. Then, one can segment the image into differently textured regions. Distinct discontinuities occur, however, only if the parameters defining the Gabor filters are suitably chosen. Some previous analysis has shown how to design appropriate filters for discriminating simple textures. Designing filters for more general textures, though, has largely been done ad hoc. We have devised a new, more effective, more rigorously based method for determining Gabor-filter parameters. The method is based on an exhaustive, but efficient, search of Gabor-filter parameter space and on a detection-theory formulation of a Gabor filter's output. We provide qualitative arguments and experimental results indicating that our new method is more effective than other methods in producing suitable filter parameters. We demonstrate that our model also gives good filter designs for a variety of texture types.
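For orientation, a minimal sketch of the underlying filter-bank step, with illustrative (not optimized) Gabor parameters and a crude synthetic two-texture image:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(F, theta, sigma, half=15):
    """Complex Gabor elementary function: Gaussian envelope times a
    complex exponential of frequency F along orientation theta."""
    y, x = np.mgrid[-half:half+1, -half:half+1]
    xr = x*np.cos(theta) + y*np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2*sigma**2))
    return env * np.exp(2j*np.pi*F*xr)

rng = np.random.default_rng(0)
img = rng.random((128, 128))                # stand-in "texture" image
img[:, 64:] = 0.3 * rng.random((128, 64))   # crude second texture region
resp = np.abs(fftconvolve(img, gabor_kernel(0.2, 0.0, 4.0), mode='same'))
# A well-chosen (F, theta, sigma) makes `resp` nearly piecewise-constant
# with a detectable discontinuity at the texture boundary (column 64).
```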
This paper presents the extraction of depth data from stereo image pairs using a nontraditional stereo algorithm taken from computational neuroscience. The technique is based on the workings of the mammalian visual system, using the Gabor representation of an image to mimic the filtering properties of simple and complex cells in the visual cortex. Gabor-transformed images afford an alternate stereo correlation method that, though computationally intensive, is well-suited for solution in parallel. This implementation computes the Gabor transform of input images by sampling at four distinct frequencies and computing correlation at each frequency. We consider four methods of combining the resulting four correlation measures and present results of testing the algorithm on random dot and real image stereograms.
We consider the implementation of high-capacity Ho-Kashyap (HK) associative processors (APs) on non-ideal optical and analog VLSI systems. Processor non-idealities considered include quantization, non-uniform beam illumination, and nonlinear device characteristics. New training-out techniques to overcome these non-idealities are advanced. We obtain optimal performance in the presence of stochastic noise by proper selection of the processor parameter sigma_syn. We derive important results that allow us to determine a priori the optimal value of sigma_syn and the expected recall accuracy P_c without having to simulate the specific processor. We present a new algorithm that allows us to achieve storage near the theoretical maximum capacity (2N, where N is the dimensionality of the input vector) with excellent recall accuracy. Optical laboratory results are included. We achieved storage of 1.5N with recall accuracy P_c >= 95% with input noise of standard deviation sigma_1 = 0.02 present and with optical analog components with 5-bit input accuracy and 8-bit memory matrix accuracy. With higher-accuracy analog VLSI components (10-bit input accuracy and 11-bit weight accuracy), we achieve storage of 1.75N with P_c = 96.43%.
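For context, a minimal sketch of the classical (idealized, software-only) Ho-Kashyap training loop that such processors implement, without the device non-idealities the paper addresses:

```python
import numpy as np

def ho_kashyap(Y, rho=0.5, iters=200):
    """Y: training patterns premultiplied by their +/-1 labels, one row
    each (augmented with a bias column). Solves Y w = b with b > 0."""
    Yp = np.linalg.pinv(Y)
    b = np.ones(Y.shape[0])
    w = Yp @ b
    for _ in range(iters):
        e = Y @ w - b
        b = b + rho * (e + np.abs(e))   # raise margins only (keeps b > 0)
        w = Yp @ b                      # least-squares weight update
    return w, b
```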
Neural network models of the Hopfield type have drawn intensive attention in recent years; however, the information capacity of the Hopfield model is limited. Theoretically, the number of arbitrary state vectors that can be made stable in a Hopfield network of N neurons is bounded above by N (the Abu-Mostafa and St. Jacques theorem). In this paper, we study third-order Hopfield neural networks.
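A hedged sketch of what a third-order Hopfield network looks like computationally: the outer-product storage rule generalized from T_ij to T_ijk, with synchronous sign-threshold recall. Pattern counts and sizes are illustrative.

```python
import numpy as np

def store(patterns):                    # patterns: (P, N) array of +/-1
    """Third-order outer-product storage rule, T_ijk."""
    return np.einsum('pi,pj,pk->ijk', patterns, patterns, patterns)

def recall(T, s, steps=10):
    """Synchronous recall with a quadratic local field."""
    for _ in range(steps):
        s = np.sign(np.einsum('ijk,j,k->i', T, s, s))
    return s

rng = np.random.default_rng(0)
X = np.sign(rng.standard_normal((5, 16)))        # 5 bipolar patterns, N=16
T = store(X)
noisy = X[0] * np.sign(rng.random(16) - 0.1)     # flip roughly 10% of bits
print(np.array_equal(recall(T, noisy), X[0]))    # typically True
```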
Accurate estimates of traffic characteristics are essential for effective highway planning and management. This paper briefly describes the current approach, based on FHWA recommendations, to estimating these traffic characteristics, and recommends an alternative approach that has the potential to improve the precision of the estimates and/or reduce the data collection efforts and their corresponding costs. The alternative approach is based on the concept of linear transfer functions with leading indicators to forecast the traffic characteristics.
Artificial neural networks (ANNs) are highly parallel, adaptive, and fault-tolerant dynamical systems modeled after their biological counterparts. Given a set of input-output patterns, ANNs can learn to classify these patterns by optimizing the weights connecting the nodes of the networks. The backpropagation (BP) algorithm, using the gradient search technique, has been a popular method for training artificial neural networks. However, the BP method, in which each step size is fixed at an arbitrary value, frequently experiences cycling and often falls into a local minimum instead of finding the desired global minimum of the error function. In this paper, an ANN utilizing an adaptive step-size algorithm based on random search techniques is proposed to solve the inverse kinematic problem in robotics. This procedure assures monotonic convergence by adjusting the step size at each step based on the net gradient and the direction of steepest descent. The results of the computer simulation for the improved adaptive method show a much better convergence rate and robustness than the BP method. This improvement can minimize the burden of real-time processing for robot control by reducing the costly derivation and computationally intense programming of the inverse kinematic algorithm.
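The core adaptive step-size idea can be sketched as follows (a toy quadratic error surface stands in for the network error; the growth and shrink factors are assumed values):

```python
import numpy as np

def adaptive_descent(f, grad, w, eta=0.1, up=1.2, down=0.5, iters=100):
    """Accept a step only if the error drops (monotonic convergence):
    grow the step after a success, shrink it after a failure."""
    best = f(w)
    for _ in range(iters):
        trial = w - eta * grad(w)
        if f(trial) < best:
            w, best, eta = trial, f(trial), eta * up
        else:
            eta *= down
    return w

f = lambda w: np.sum((w - 3.0)**2)      # stand-in for the network error
g = lambda w: 2.0 * (w - 3.0)
print(adaptive_descent(f, g, np.zeros(4)))   # -> approximately [3 3 3 3]
```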
In this paper we present a new architecture of neuron, called the dynamic neural unit (DNU). The topology of the proposed neuronal model embodies delay elements, feedforward and feedback signals weighted by the synaptic weights, and a time-varying nonlinear activation function, and is thus different from the conventionally assumed architecture of neurons. The learning algorithm for the proposed neuronal structure and the corresponding implementation scheme are presented. A multi-stage dynamic neural network is developed using the DNU as the basic processing element. The performance evaluation of the dynamic neural network is presented for nonlinear dynamic systems under various situations. The capabilities of the proposed neural network model not only account for the learning and control actions emulating some of the biological control functions, but also provide a promising parallel-distributed intelligent control scheme for large-scale complex dynamic systems.
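As a hedged sketch of the kind of dynamics such a unit exhibits (not the paper's exact DNU equations), consider a scalar neuron with weighted, delayed feedforward and feedback taps passed through a nonlinear activation:

```python
import numpy as np

def dnu(x, a, b, act=np.tanh):
    """y[n] = act( sum_k a[k]*x[n-k] + sum_k b[k]*y[n-k-1] )"""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        ff = sum(a[k] * x[n-k] for k in range(len(a)) if n - k >= 0)
        fb = sum(b[k] * y[n-k-1] for k in range(len(b)) if n - k - 1 >= 0)
        y[n] = act(ff + fb)
    return y

x = np.sin(np.linspace(0, 6*np.pi, 100))
print(dnu(x, a=[0.8, 0.2], b=[0.3])[:5])   # illustrative tap weights
```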
The superb aerial performance of flying insects is achieved with comparatively simple neural machinery. Insects react rapidly to changing visual images. The ability of insects to perform these computations in real time has already led to a successful prototype autonomous guided vehicle with a sensor and control structure modelled on the fly eye. Increasingly in visual neuroscience it is possible to isolate the critical image cues used by identified neurones to achieve a selective response to a feature or group of features within the changing visual image. In this paper we describe an artificial neural network based on the input organization of such an identified motion-detecting neurone, which responds selectively to the images of an object approaching on a collision course with the animal. We compare the response of the artificial neural network with that of the biological neural network to the same colliding stimulus. This approach led to a series of testable predictions about the organization of the biological neural network.
The bidirectional associative memory (BAM) has so far been limited to two input/output patterns per association. Recently, Humpert suggested generalizing the BAM to a bidirectional associative memory with several input/output patterns (BAMg). The generalization of the BAM to the BAMg raises several interesting questions regarding the interconnections and updating of neuron fields. Some of the possible configurations of the BAMg are investigated in the paper. The BAMg is very useful in many image processing applications wherein storage and retrieval of sets of images is required. Each set may contain two or more images. In this paper, we have developed a software simulation of the BAMg and have used it to store and retrieve three sets of images. Each set consists of three images. During retrieval, partial or noisy images are used as the stimulus vectors to retrieve the corresponding images in the other sets. The results are presented in the paper.
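For reference, a minimal sketch of the two-field BAM that the BAMg generalizes: Hebbian storage of bipolar pattern pairs and alternating threshold recall (pattern counts and sizes are illustrative):

```python
import numpy as np

def bam_store(X, Y):                    # rows are associated +/-1 pairs
    return X.T @ Y                      # Hebbian correlation matrix W

def bam_recall(W, x, steps=5):
    for _ in range(steps):              # alternate between the two fields
        y = np.sign(x @ W)
        x = np.sign(y @ W.T)
    return x, y

rng = np.random.default_rng(0)
X = np.sign(rng.standard_normal((3, 12)))        # three stored pairs
Y = np.sign(rng.standard_normal((3, 8)))
W = bam_store(X, Y)
x_noisy = X[1] * np.sign(rng.random(12) - 0.1)   # corrupted stimulus
print(np.array_equal(bam_recall(W, x_noisy)[1], Y[1]))  # ideally True
```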
The paper discusses results derived from the theory of invariant higher-order neural networks and uses them to design a system that produces an invariant classification solution for a particular pattern recognition problem. This is done by employing a back-propagation algorithm generalized to higher orders, with reduced network connectivity. In a special case, an optimal solution is obtained using a linear-equation technique. In both cases the volume of computation in the algorithm is much less than that of other methods. We demonstrate that the system can correctly classify shifted, rotated, scaled, and distorted patterns with a certain amount of noise.
Dynamic identification of temporally changing signals is a key issue in real-time signal processing and understanding. Such changing signals may arise from moving objects in visual images, spoken words, target trajectories, and other kinds of sensor data in a wide variety of applications. An Adaptive Time-Delay Neural Network (ATNN) is proposed which dynamically adapts its time delays as well as its synaptic weights. The resulting network is trained to distinguish the temporal properties and spatiotemporal correlations of various input patterns. In biological systems, the delays along axons or at the synapses may vary, as in the ATNN, due to factors such as the length of the axon, insulation (myelin), or the details of the biochemical processes. In this paper, an improved learning algorithm based on gradient descent is derived for both the adaptive time delays and the synaptic strengths. This adaptation paradigm offers more flexibility for the network to attain the optimal time delays and to achieve more accurate pattern mapping and classification than is the case with arbitrary fixed delays, as have been used previously. Noise tolerance was tested in a series of experiments, and the proposed ATNN was found to show clear advantages. Time series prediction was tested with the chaotic Mackey-Glass equation, and the ATNN performed better than training with fixed time delays. The ATNN is suitable for spatiotemporal signal recognition, prediction, and classification.
Different approaches to computational stereo for representing human stereo vision have been developed over the past two decades. The Marr-Poggio theory is probably the most widely accepted model of human stereo vision. However, recently developed motion stereo models, which use a sequence of images taken by either a moving camera or of a moving object, provide an alternative method of achieving multi-resolution matching without the use of Laplacian of Gaussian (LoG) operators. When using image sequences, the baseline between two camera positions for an image pair is changed for the subsequent image pair so as to achieve a different resolution for each image pair. Having different baselines also avoids the occlusion problem inherent in stereo vision models. The advantage of using multi-resolution images acquired by cameras positioned at different baselines over those acquired by LoG operators is that one does not encounter the spurious edges often created by zero-crossings in the LoG-operated images. Therefore, in designing a computer vision system, a motion stereo model is more appropriate than a stereo vision model. However, in some applications where only a stereo pair of images is available, recovery of 3D surfaces of natural scenes is possible in a computationally efficient manner by using cepstrum matching and regularization techniques. Section 2 of this paper describes a motion stereo model using multi-scale cepstrum matching for the detection of disparity between image pairs in a sequence of images and the subsequent recovery of 3D surfaces from the depth map obtained by a non-convergent triangulation technique. Section 3 presents a 3D surface recovery technique from a stereo pair using cepstrum matching for disparity detection and cubic B-splines for surface smoothing. Section 4 contains the results of 3D surface recovery using both of the techniques mentioned above. Section 5 discusses the merits of 2D cepstrum matching and cubic B-spline interpolation for 3D surface recovery by either a motion stereo model or a stereo vision model implemented in a machine vision system.
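A hedged 1-D illustration of the cepstrum matching step: splicing two relatively shifted windows produces an echo whose cepstral peak reveals the disparity. The window length and disparity are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(128)                 # stand-in scanline window
d = 9                                        # true disparity (assumed)
spliced = np.concatenate([s, np.roll(s, d)]) # left window + shifted right
log_spec = np.log(np.abs(np.fft.rfft(spliced))**2 + 1e-12)
cep = np.fft.irfft(log_spec)                 # power cepstrum
# The splice acts like an echo at delay len(s)+d, so the cepstrum peaks
# there; searching past the window length recovers the disparity.
print(np.argmax(cep[len(s):len(s) + 64]))    # -> approximately 9
```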
A recent paper by Sinha and Dougherty established a rigorous morphological framework for images with fuzzy-valued pixels. In particular, the erosion operation, which is fundamental to image filtration schemes, is cast in this setting. In this paper, a single instruction architecture will be presented for implementing the erosion operation for fuzzy-valued images. It will also be shown that this architecture is sufficient for implementing the other basic fuzzy morphological operations, namely dilation, opening, and closing.
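A software sketch of the erosion being implemented, following a Sinha-Dougherty-style definition eroded(z) = min over y of min(1, 1 + A(z+y) - B(y)), shown here for 1-D fuzzy signals with illustrative values:

```python
import numpy as np

def fuzzy_erode(A, B):
    """Erosion of fuzzy signal A by fuzzy structuring element B."""
    n, m = len(A), len(B)
    out = np.empty(n - m + 1)
    for z in range(n - m + 1):
        out[z] = np.min(np.minimum(1.0, 1.0 + A[z:z+m] - B))
    return out

A = np.array([0.1, 0.9, 1.0, 0.8, 0.2, 0.1])   # fuzzy-valued image row
B = np.array([0.8, 1.0, 0.8])                  # fuzzy structuring element
print(fuzzy_erode(A, B))   # high only where A locally "contains" B
```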
Populations of cortical nerve cells that selectively learn from external stimuli are described in this paper. Numerous neural populations are interconnected within a spatially distributed neural activity field. Each population is assumed to possess a multi-modal distribution of neural thresholds which enables it to exhibit one or more state attractors for any given stimulus input. Each stable attractor represents a potential memory. The memory function of the field corresponds to the numerous attractors, or potential memories, generated after the removal of the external stimulus pattern. Massive numbers of attractors are inherent in the field at the onset of learning. The selective learning process involves enlarging the basin around the attractor selected by a given stimulus. A computer simulation involving three sets of stimuli is used to illustrate some of these notions.
In this paper, the single instruction architecture is used to construct circuitry to perform dilation and erosion of gray valued images, where the gray values are discrete but limited only by the number of bits chosen for the binary encoding. In addition, methods for minimizing the number of cells needed, using basic digital techniques, are discussed. While others have constructed architectures for gray valued dilation and erosion, these are based on non-homogeneous circuits, and typically use Umbra transformations to handle the gray values, rather than binary encoding. Finally, it is shown that the half-adder elements used in the single instruction architecture can easily be replaced with uniform multiplexer cells in deference to the McCulloch-Pitts model of the neuron. This analogy between the single instruction architecture and the neuronal construction of the brain is intentional.
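For reference, the software semantics of the operations the circuitry computes: grayscale erosion and dilation as moving minima and maxima over a flat structuring element (the image and element sizes are illustrative):

```python
import numpy as np
from scipy.ndimage import grey_erosion, grey_dilation

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(np.uint8)
se = np.ones((3, 3), bool)                    # flat 3x3 structuring element
eroded  = grey_erosion(img,  footprint=se)    # moving minimum
dilated = grey_dilation(img, footprint=se)    # moving maximum
# Opening and closing follow by composition, as in the cascaded circuitry:
opened = grey_dilation(grey_erosion(img, footprint=se), footprint=se)
```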
In this paper, we present an unsupervised orthogonalization neural network which, based on principal component (PC) analysis, acts as an orthonormal feature detector and decorrelation network. As in PC analysis, this network extracts the most heavily information-loaded features contained in the set of input training patterns. The network self-organizes its weight vectors so that they converge to a set of orthonormal weight vectors that span the eigenspace of the correlation matrix of the input patterns. Therefore, the network is applicable to practical image transmission problems, exploiting the natural redundancy that exists in most images while preserving the quality of the compressed-decompressed image. We have applied the proposed neural model to the problem of image compression for visual communications. Simulation results have shown that the proposed neural model provides a high compression ratio, yields excellent perceptual visual quality of the reconstructed images, and gives a small mean square error. Generalization performance and convergence speed are also investigated.
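One well-known learning rule of this general type (not necessarily the paper's exact rule) is Sanger's generalized Hebbian algorithm; the hedged sketch below, with an assumed 2-D input covariance, shows weight vectors converging toward an orthonormal set spanning the leading eigenspace:

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0],
              [1.0, 2.0]])                    # assumed input covariance
X = rng.multivariate_normal([0, 0], C, size=5000)

W = 0.1 * rng.standard_normal((2, 2))         # two output units
lr = 1e-3
for x in X:
    y = W @ x
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)  # GHA update
print(W @ W.T)      # -> approximately the identity (orthonormal rows)
```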
This paper presents a novel approach for template recognition by signal expansion into a set of non-orthogonal, template-similar basis functions (wavelets). It is shown that expansion matching is a special case of the general non-orthogonal expansion, which is equivalent to 'restoration' of undegraded images. Expansion matching also maximizes a new and more practically defined discriminative signal-to-noise ratio (DSNR). It is proved that maximizing the DSNR is equivalent to minimum squared error restoration by Wiener filters. The widely used matched filtering (also known as correlation matching) maximizes the conventional SNR and generates broad peaks, since the SNR imposes no constraint on the sharpness of the filter response to the template itself. In comparison, maximizing our newly defined DSNR ensures that expansion matching yields much sharper peaks, the ideal response sought being a delta (impulse) function. Furthermore, it is demonstrated that expansion matching outperforms correlation matching by more than 20 dB in DSNR. This results in far fewer spurious responses and more robust performance under noise and severe occlusion. Since expansion matching is fundamentally a decomposition process, it is also quite suitable for the analysis of superimposed signals, such as sound and radar signals. The filters generated by expansion matching have a multi-pole structure with amplitudes approximately proportional to the curvature of edges in the template. Analytically, they confirm previous conjectures about shape perception.
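A frequency-domain sketch of an expansion-matching-style filter: the regularized inverse H = conj(T)/(|T|^2 + c), whose response to the template approaches an impulse, in contrast to the broad matched-filter peak. The toy template and the constant c are illustrative assumptions.

```python
import numpy as np

t = np.zeros(64)
t[20:30] = np.hanning(10)                     # toy 1-D template
T = np.fft.fft(t)
c = 1e-3                                      # noise-dependent constant
H = np.conj(T) / (np.abs(T)**2 + c)           # expansion-matching-style filter

signal = np.roll(t, 13)                       # template shifted by 13
resp_exp = np.real(np.fft.ifft(np.fft.fft(signal) * H))
resp_cor = np.real(np.fft.ifft(np.fft.fft(signal) * np.conj(T)))  # matched
# resp_exp is a near-delta at lag 13; resp_cor has a broad peak there.
print(np.argmax(resp_exp), np.max(resp_exp) / np.mean(np.abs(resp_exp)))
```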
A goal of computer vision is the construction of scene descriptions based on information extracted from one or more 2D images. A reconstruction strategy based on a three-level representational framework is proposed. The first representational level, the Primal Sketch, makes explicit important information about the two-dimensional image, primarily the intensity changes and their geometrical distribution and organization. The intensity changes appear at several spatial scales, and image analysis performed at multiple resolutions is therefore required. We propose a compact pyramidal neural network implementation of the multiresolution representation of the input images. Features of the scene are detected at each resolution level, and feedback interaction is built between pyramid levels in order to reinforce edges which correspond to physical features of the observed scene. The second representational level, the raw 2.5D Sketch, makes explicit the orientation and rough depth at the edge locations of the visible surfaces. A multiresolution neural network stereo algorithm is designed to compute the disparity at each pixel location and at all resolution levels. Matching is facilitated by a hierarchical focusing mechanism. The third representational level, the full 2.5D Sketch, makes explicit the orientation and depth estimate at all the visible surface coordinates. Depth information between the edges is computed with a local shape-from-shading algorithm.
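A minimal sketch of the multiresolution front end such a pyramid provides (Gaussian blur-and-subsample; the pyramid depth and smoothing scale are assumed values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pyramid(img, levels=4, sigma=1.0):
    """Gaussian pyramid: smooth, then subsample by 2 at each level."""
    out = [img]
    for _ in range(levels - 1):
        out.append(gaussian_filter(out[-1], sigma)[::2, ::2])
    return out

levels = pyramid(np.random.rand(128, 128))
print([l.shape for l in levels])   # -> decreasing resolutions
```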
The neural architecture, neurophysiology, and behavioral abilities of insect vision are described and compared with those of mammals. Insects have a hardwired neural architecture of highly differentiated neurons, quite different from the cerebral cortex, yet their behavioral abilities are in important respects similar to those of mammals. These observations challenge the view that the key to the power of biological neural computation is distributed processing by a plastic, highly interconnected network of individually undifferentiated and unreliable neurons, a view that has dominated the picture of biological computation since Pitts and McCulloch's seminal work in the 1940s.
This paper describes a method for the multiresolutional representation of gray-level images as hierarchical sets of strokes characterizing the forms of objects with different degrees of generalization depending on the context of the image. This method transforms the original image into a hierarchical graph which allows for efficient coding in order to store, retrieve, and recognize the image. The method described is based upon finding the resolution levels for each image that minimize the computations required. This becomes possible because of the use of a special image representation technique called Multiresolutional Attentional Representation for Recognition (MARR), based upon a feature which the authors call a stroke. This feature turns out to be efficient in the process of finding the appropriate system of resolutions and constructing the relational graph. MARR is formed by a multi-layer neural network with recurrent inhibitory connections between neurons, the receptive fields of which are selectively tuned to detect the orientation of local contrasts in parts of the image with the appropriate degree of generalization. This method simulates the 'coarse-to-fine' algorithm which an artist usually uses when making an attentional sketch of real images. The method, algorithms, and neural network architecture in this system can be used in many machine-vision systems with AI properties, in particular robotic vision. We expect that systems with MARR can become a component of intelligent control systems for autonomous robots. Their architectures are mostly multiresolutional and match well with the multiple resolutions of the MARR structure.
This paper discusses the relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. We also discuss the impact and interaction of these two families with Kohonen's self-organizing feature map (SOFM), which is not a clustering method, but which often lends itself to clustering algorithms. Then we present two generalizations of LVQ that are explicitly designed as clustering algorithms; we refer to these algorithms as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. We use Anderson's IRIS data to compare the performance of GLVQ/FLVQ with a standard version of LVQ. Experiments show that the final centroids produced by GLVQ are independent of node initialization and learning coefficients. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically.
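To make the contrast concrete, a hedged sketch of a winner-only LVQ/SHCM-style update versus an FCM-style all-nodes update in the spirit of FLVQ (the membership exponent m is an assumed value; this is not the paper's exact learning rule):

```python
import numpy as np

def lvq_step(V, x, lr=0.05):
    """Sequential hard update: only the winning prototype moves."""
    i = np.argmin(np.linalg.norm(V - x, axis=1))
    V[i] += lr * (x - V[i])
    return V

def flvq_like_step(V, x, lr=0.05, m=2.0):
    """Fuzzy update: every prototype moves, weighted by membership."""
    d = np.linalg.norm(V - x, axis=1) + 1e-12
    u = 1.0 / ((d[:, None] / d[None, :]) ** (2/(m-1))).sum(axis=1)
    V += lr * (u[:, None] ** m) * (x - V)
    return V
```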
Our innate ability to process and interpret large volumes of poorly defined visual data, in essence to perceive visual information, enables us to function effectively in a continually changing, complex world. As knowledge engineers, we would find it highly desirable to incorporate such flexibility into artificial systems. Fuzzy logic is a mathematical tool created to help synthesize complex systems and decision processes that must deal with imprecise or ambiguous information. In terms of vision, this ambiguity arises from the meanings attached to the sensor inputs and the rules used to describe the relationships between the various informative visual attributes. Notions that pertain to visual perception, such as fuzzy images, fuzzy mathematical operators, and fuzzy inference procedures, are outlined in this paper.
The segmentation and representation of complex features in imagery is of great importance for automatic target recognition and human perception of image features. Multiresolution image representation paradigms are currently being pursued to provide a hierarchical framework for efficient analysis and compression of image data for machine recognition and human perception of image features. Grayscale mathematical morphology is playing an emerging role as a multiscale image representation operator. In particular, grayscale morphology utilizes set operations, requiring only computer addition and comparison, with tractable shapes of varying size and orientation. For practical implementation, the nonlinear filters of mathematical morphology may be cascaded without precision loss by exploiting the finite characteristic of digital computers and avoiding the constricting and abstract Fourier-space requirements of linear filtering techniques, such as wavelet theory. This paper presents morphological filter operations which use oriented line shapes, or structuring elements, in comparison with image data to extract and analyze line and textural features within a hierarchical framework and representation paradigm. A median line filter is defined by the maximum of oriented 1-D erosions (median erosion) or the minimum of oriented 1-D dilations (median dilation). Cascades of these operations to form an image pyramid or a morphological skeleton are investigated as plausible representation paradigms. Smoothing and differencing operations are exploited to provide multiscale complementary feature representations similar to a wavelet basis.
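A small sketch of the oriented-line idea (only two orientations for brevity; a fuller bank would include diagonal lines):

```python
import numpy as np
from scipy.ndimage import grey_erosion

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
horiz = grey_erosion(img, footprint=np.ones((1, 7), bool))  # 0-degree line
vert  = grey_erosion(img, footprint=np.ones((7, 1), bool))  # 90-degree line
median_erosion = np.maximum(horiz, vert)    # maximum of oriented erosions
```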
Identification of outliers or noise in a real data set is often quite difficult. A recently developed adaptive fuzzy leader clustering (AFLC) algorithm has been modified to separate the outliers from real data sets while finding the clusters within the data sets. The capability of this modified AFLC algorithm to identify the outliers in a number of real data sets indicates the potential strength of this algorithm in the correct classification of noisy real data.
This paper presents an unsupervised fuzzy neural network which can be used for clustering and classification of complex data sets. The Integrated Adaptive Fuzzy Clustering (IAFC) architecture uses a control structure similar to that found in Adaptive Resonance Theory (ART-1), with a new learning rule and a new similarity measure. We compare IAFC with other fuzzy ART-type clustering algorithms. The critical parameters in the operation of the IAFC are discussed. Anderson's iris data are used to show the performance of the algorithm in comparison with other clustering algorithms.
This paper addresses a solution to the problem of scene estimation for motion video data in the fuzzy set-theoretic framework. Using fuzzy image feature extractors, a new algorithm is developed to compute the change of information between two successive frames in order to classify scenes. This classification of raw input visual data can be used to establish structure for correlation. The algorithm attempts to fulfill the need for non-linear, frame-accurate access to video data for applications such as video editing and visual document archival/retrieval systems in multimedia environments.
During the past several years, fuzzy control has emerged as a suitable control strategy for many complex and nonlinear control problems. The control provided by fuzzy logic is both smooth and accurate. Also, the 'if-then' rules of fuzzy control systems are easy to understand and relatively easy to develop. This paper presents a toolkit for the implementation of fuzzy control systems. The toolkit consists of a C++ class library which computes inferences in fuzzy logic. The toolkit is used to implement a fuzzy control system which controls the movement of a simulated mobile robot. The proposed architecture consists of several rulesets. Each ruleset specializes in some control task; for example, there are rulesets for going around an obstacle, avoiding a moving obstacle, going through a door, etc. The multiple-ruleset fuzzy control system is used to guide the simulated mobile robot to a given goal in an unknown environment. With the proposed multiple-ruleset architecture, complex control problems can be solved while single rulesets remain simple and efficient.
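A minimal fuzzy-inference sketch, in Python rather than the paper's C++ toolkit: triangular memberships, min for rule firing, max aggregation, and centroid defuzzification, for a hypothetical one-input/one-output obstacle rule pair (all membership shapes are assumed):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership with corners a, b, c (b may equal a or c)."""
    eps = 1e-12
    return np.maximum(np.minimum((x - a)/(b - a + eps),
                                 (c - x)/(c - b + eps)), 0.0)

dist = 0.8                                  # input: obstacle distance (m)
u = np.linspace(-1.0, 1.0, 201)             # output: steering command
# IF obstacle is near THEN steer hard; IF obstacle is far THEN go straight.
fire_near = tri(dist, 0.0, 0.0, 1.5)
fire_far  = tri(dist, 0.5, 2.0, 2.0)
agg = np.maximum(np.minimum(fire_near, tri(u, 0.3, 1.0, 1.0)),
                 np.minimum(fire_far,  tri(u, -0.3, 0.0, 0.3)))
steer = np.trapz(agg * u, u) / (np.trapz(agg, u) + 1e-12)   # centroid
print(steer)                                # positive: steer away
```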
Techniques are presented for automatically generating optimal vision programs from high-level task descriptions. Vision programs are the object models that describe strategies to recognize and locate objects in an image. The effectiveness of the program depends on the features used for recognition and the order in which the features are evaluated. We describe three probabilistic feature utility measures and a cost function based on program execution time that serve as the basis of our technique. Computation of such utility measures from a statistically representative sample of images has been demonstrated. Problems encountered in computing such measures from computer-generated images are described.
In this paper we present a probabilistic, prediction-based approach to CAD-based object recognition. Given a CAD model of an object, the PREMIO system combines techniques of analytic graphics and physical models of lights and sensors to predict how features of the object will appear in images. In nearly 4,000 experiments on analytically generated and real images, we show that in a semi-controlled environment, predicting the detectability of features of the image can successfully guide a search procedure to make informed choices of model and image features in its search for correspondences that can be used to hypothesize the pose of the object. Furthermore, we provide a rigorous experimental protocol that can be used to determine the optimal number of correspondences to seek so that the probabilities of failing to find a pose and of finding an inaccurate pose are minimized.
The selection and placement of cameras and light sources for a specific task (e.g., locating a part in a tray or inspecting an object) is one of the most important steps in creating a successful vision system, because obtaining high-quality images can greatly simplify the vision algorithms and improve their reliability. We will describe techniques that use a visual task description stated in terms of features to be detected, and derive a range of light-source locations that satisfy the task requirements. In particular, given a task description that specifies particular object edges to be detected with a given edge detector (e.g., a Sobel edge operator), our techniques determine the constraints on light-source location such that the edge is detected.
One of the purposes of computer vision is to reconstruct a three-dimensional description of the environment from multiple sensor images. Because no single image can show all salient features of a complex scene, an intelligently chosen set of images is needed to provide the complete description. Because the sensor data is generally incomplete and errorful, a priori knowledge about the scene and the sensors is used in the reconstruction process. This paper will describe methods for applying scene and sensor knowledge to the problem of three-dimensional reconstruction from multiple images. In particular, the geometric representation and reasoning techniques applied to 3D generic object recognition in the 3D FORM system [Walk88, Walk90] will be extended to reason about segmented 2D images. Knowledge about the sensors and objects in the scene will be represented as frames in the 3D FORM system. For each sensor, the geometric relationship between the sensor's pose, the image features and the world features is modeled, and for each object, the geometric relationships between the object and its parts are modeled. Three-dimensional reconstruction is performed by transforming each sensor image to a set of constraints on the world, and then combining the constraints from all sensors with constraints imposed by the object models to generate an interpretation that satisfies all constraints. The advantage of this method is that the resulting system is able to adjust itself to the available information without knowing in advance which constraints will be specified.
Hierarchical representation of three-dimensional (3D) object shape has been based on different levels of resolution. This paper introduces a representational hierarchy that is based on the connectedness and neighborliness of object shape, expressed through topologies on the bounding surface with increasing strength. The topology at the object-part level is weaker than at the level of simply connected elliptic, parabolic, planar, and hyperbolic regions, and the strongest topology is given by the classical topology for smooth surfaces. This provides a unified view of the representation of three-dimensional object shape for recognition. The open sets have a natural interpretation in the context of object recognition and relate to different types of recognition processes. More elaborate descriptions are naturally obtained by the introduction of additional structure, such as affine and metric structure. Qualitative shape features are defined at each level of the hierarchy, and their usefulness and limitations for shape discrimination are discussed. The possibility of deriving the topologies from ordinal structure is considered, and examples of object description are presented.
Technical illustrations (TIs) are one of the most effective ways to show how to assemble or disassemble a mechanical assembly. Information including the shape of a constituent part and the order of operations necessary for assembly/disassembly can be derived from a single TI. However, an additional TI is often required to supplement the insufficient information obtained from a single TI. This paper presents a solution to the problem that arises in augmenting a model description by unifying the results obtained from several TIs.
There are two kinds of depth perception for robot vision systems: quantitative and qualitative. The first can be used to reconstruct the visible surfaces numerically, while the second describes the visible surfaces qualitatively. In this paper, we present a qualitative vision system suitable for intelligent robots. The goal of such a system is to perceive depth information qualitatively using monocular 2-D images. We first establish a set of propositions relating depth information, such as 3-D orientation and distance, to the changes of an image region caused by camera motion. We then introduce an approximation-based visual tracking system. Given an object, the tracking system tracks its image while moving the camera in a way dependent upon the particular depth property to be perceived. Checking the data generated by the tracking system against our propositions provides us with the depth information about the object. The visual tracking system can track image regions in real time even when implemented on a PC AT clone, and mobile robots can naturally provide the inputs to our visual tracking system; therefore, we are able to construct a real-time, cost-effective, monocular, qualitative, 3-dimensional robot vision system. To verify our idea, we present examples of the perception of planar surface orientation, distance, size, dimensionality, and convexity/concavity.
In the photometric stereo method, the surface orientation of an object is determined by using multiple images (at least three). The multiple images are obtained by changing the position of a light source, and the surface orientation of the object can be determined only in the area of the object illuminated by all light sources. It is therefore desirable to reduce the number of light sources. We propose a method for determining surface orientation along with reflectance in which only two light sources are used. It is assumed that the object is convex and has a smooth Lambertian surface with locally constant reflectance. We have found the following: there exists a 'separation line' in the image, along which the surface normal is represented as a linear combination of two vectors pointing toward the directions of the light sources, and this separation line separates the surface into two regions. Using the property of the separation line, we can determine the surface orientation of the object not only when the reflectance is known but even when it is unknown. Here we use the constraint of convexity. Simulations carried out under various situations yielded satisfactory results.
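For background, standard three-light Lambertian photometric stereo can be sketched as below; the paper's contribution is reducing this to two lights via the separation-line property. The light directions, albedo, and normal are illustrative values.

```python
import numpy as np

L = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 1.0]])
L /= np.linalg.norm(L, axis=1, keepdims=True)   # unit light directions
n_true = np.array([0.2, -0.1, 1.0])
n_true /= np.linalg.norm(n_true)
rho = 0.7                                       # Lambertian albedo
I = rho * L @ n_true                            # the three pixel intensities

g = np.linalg.solve(L, I)                       # g = rho * n
rho_hat = np.linalg.norm(g)
n_hat = g / rho_hat
print(rho_hat, n_hat)                           # recovers rho and n
```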
We present a new edge-based matching algorithm that can be applied to unstructured stereo images. We first briefly review a major recent development in establishing stereo correspondences and show that a great deal of research in stereo is needed in order to apply stereo techniques to more real-world problems. We then discuss why we use edges (without simplification and without linking) as the primitives to be matched rather than other primitives (points, regions, ...) and which edge detector is adequate. We then argue that matching whole edges or parts of edges should be done rather than comparing straight line segments. We then discuss the occlusion problem and some defects of the edge detector that can cut edges into several pieces, and argue that partial matching of edges can be a solution to such problems. We then explain which matching criteria are used, comprising both shape and gray-level attributes. These criteria are chosen for their pertinence, their representational and computational simplicity, and their similarity to those probably used in the human visual system. Finally, complete quantitative experimental results are shown for various indoor and outdoor real-world scenes.
A field of great interest in computer vision is depth reconstruction from motion. The final goal is the computation of the visible surface structure in a 3D scene by analyzing a sequence of digital images acquired by moving a camera through the environment. This paper describes a method of depth reconstruction based on stochastic modeling of the motion, the image acquisition process, and the 3D-2D projection. The stochastic model is based on the well-known extended Kalman filter to derive an optimized depth estimate: it integrates successive views by using a pair of optical flow equations that we have adapted to a general pin-hole camera model (a linear transformation from 3D to 2D coordinates). In comparison with similar methods, we developed a reconstruction system that improves the speed and stability of the estimation process by means of a multi-scale approach, and we used a massively parallel MIMD machine to speed up the estimation process globally.
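A hedged scalar analogue of the recursive estimation step: a Kalman filter fusing a stream of noisy per-pixel depth measurements under a static-depth model (the noise levels and priors are assumed values, not the paper's full EKF):

```python
import numpy as np

def kalman_depth(z_meas, z0=1.0, p0=10.0, r=0.05, q=1e-4):
    """Scalar Kalman filter: state = depth at one pixel, static model."""
    z, p = z0, p0                      # estimate and its variance
    for zm in z_meas:
        p += q                         # predict (allow slow drift q)
        k = p / (p + r)                # Kalman gain
        z += k * (zm - z)              # correct with the innovation
        p *= (1.0 - k)
    return z, p

rng = np.random.default_rng(0)
meas = 3.0 + rng.normal(0.0, 0.22, size=50)   # true depth 3.0, noisy views
print(kalman_depth(meas))                      # -> close to (3.0, small p)
```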
State-of-the-art transform algorithms produce excellent reconstructed images efficiently in most applications, especially medical and industrial CT. Based on the Zero Cylinder Coordinate (ZCC) system presented in this paper, a new transform algorithm for image reconstruction in fan-beam industrial CT is proposed. It greatly reduces the amount of computation in the backprojection, which requires only two INC instructions to calculate the weighting factor and the subcoordinate. A new backprojector is designed whose pipeline (assembly-line) mechanism is simplified by the ZCC method. Finally, simulation results on a microcomputer are given, which show that this method is effective and practical.
The aim of this paper is to investigate the different components of a laser scanner for industrial applications, such as quality control, or future domestic applications, such as the front end of a videophone system. The scanner implemented in this paper is based on the laser range-finding principle.
A computer vision-based system for stop-and-go driving is presented. The horizontal edge characteristics of vehicles are extracted with a model-based filter. Symmetric groups of horizontal lines indicate the centerlines of vehicles in the scene. Discontinuities in the lines indicate the edges of the vehicles. The centerline and width of a vehicle give the bearing and approximate range to it. This approximate range is filtered and used to maintain a safe distance to the vehicle ahead.
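The range-from-width cue follows from the pinhole model; a tiny sketch with an assumed focal length and vehicle width:

```python
def range_from_width(w_pixels, f_pixels=800.0, W_meters=1.8):
    """Pinhole relation: range ~ focal_length * true_width / image_width.
    f_pixels and W_meters are illustrative assumptions."""
    return f_pixels * W_meters / w_pixels

print(range_from_width(60))   # -> 24.0 m under the assumed parameters
```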
Passively sensing three-dimensional structure by means of computational stereo has received a great deal of attention in the computer vision community as well as in the traditional photogrammetric and remote sensing communities. The first and most difficult step in recovering 3-D information from a pair of stereo images is that of matching points from one image of the pair to the corresponding points in the second image. In this paper we develop an edge-based, fast, and effective stereo matching technique characterized by two matching stages: initial matching and a consistency check. Several constraints (epipolar, uniqueness, disparity continuity, a stochastic constraint, and a disparity range constraint) are used to reduce the combinatorial search and the ambiguity of false targets. With this approach, we can obtain globally optimal matches. The algorithm has been experimentally evaluated using a set of real images. The implementation and results have shown the efficacy of the proposed stereo matching technique.
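A sketch of the initial-matching stage for one edge pixel: a SAD block search restricted to the same scanline (epipolar constraint) over a fixed disparity range. The window size and range are assumed values; a symmetric search back from the right image would implement the consistency-check stage.

```python
import numpy as np

def match_point(left, right, r, c, dmax=16, w=3):
    """SAD disparity for the edge pixel (r, c), scanning one scanline."""
    patch = left[r-w:r+w+1, c-w:c+w+1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(dmax + 1):                 # disparity range constraint
        if c - d - w < 0:
            break
        cand = right[r-w:r+w+1, c-d-w:c-d+w+1].astype(float)
        cost = np.abs(patch - cand).sum()     # SAD along the epipolar line
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```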
This paper describes the acquisition of 3-D information from the locations of corresponding lines, from plane to space, and reconstruction from three views. For this purpose, a series of spatial geometric calibrations must be performed, and a set of coordinate systems must be established. The intersections of the corresponding points in space can then be computed to acquire the 3-D depth information.
In this paper, we discuss improved techniques for surface interpolation, clipping, and hidden-surface elimination that solve some problems in generating perspective views of digital terrain model (DTM) data based on the matching of a stereo image pair.
The proposed parallel clustering technique performs several clustering processes (on the same data set) in parallel, using different sets of initial cluster centers. Each clustering process consists of a sequence of iterations, and the clustering processes are iterated in parallel within each parallel step. At the end of each parallel step, the clustering parameters are evaluated according to prespecified criteria. 'Non-promising' cluster center sets are discarded, and new cluster center sets are formed using 'promising' cluster centers. The illustrative examples presented indicate a reduction of 7% to 30% in the number of iterations required for convergence.
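A sequential sketch of the control flow (one k-means iteration per candidate seed set per "parallel step", followed by pruning; the keep-the-better-half rule is an assumed stand-in for the paper's criteria):

```python
import numpy as np

def kmeans_step(X, C):
    """One iteration: assign points, recompute centers, report SSE."""
    lab = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    C = np.array([X[lab == k].mean(0) if np.any(lab == k) else C[k]
                  for k in range(len(C))])
    sse = ((X - C[lab]) ** 2).sum()
    return C, sse

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in (0.0, 3.0, 6.0)])
seeds = [X[rng.choice(len(X), 3, replace=False)] for _ in range(8)]
for step in range(5):                         # "parallel steps"
    results = [kmeans_step(X, C) for C in seeds]
    results.sort(key=lambda cs: cs[1])        # evaluate by SSE
    seeds = [C for C, _ in results[:max(1, len(results) // 2)]]  # prune
print(results[0][1])                          # best surviving SSE
```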