A simple vision-based system for contactless coupling, in place of a hardware hook, is suggested. The method facilitates automatic convoy formation and management. The vision technique uses three landmarks and a CCD camera. Using a geometric approach that capitalizes on the excellent angular resolution of CCD cameras, the position and orientation of the camera are estimated with respect to the landmarks. Experimental results show the reliability of the technique in operating contexts.
Simple region features consist of the salient parts of regions bounded by the zero-crossings of the Laplacian of Gaussian operator. Simple region features have two fundamental advantages over existing features, such as edges. First, no threshold is involved in the simple region feature extraction process; thus, the process is not sensitive to the specification of a threshold. Second, the features consist of regions, rather than points. Consequently, they have geometric attributes, such as area, shape, and orientation, that can be exploited by subsequent processes. In this paper, we use simple region features to estimate camera motion and depth over multiple frames. A tracking algorithm computes the correspondence of the features across the image sequence; recursive estimates are obtained for the optical flow of each of the features. The geometric properties of the features are used to determine a measure of the reliability of the correspondence mapping. A weighted least-squares error estimate is obtained for the camera motion and the depth of each feature; the weights for each error term are derived from the reliability measure of the correspondence mappings.
This paper addresses the problem of recovering time-to-contact by actively tracking motion boundaries. Unlike previous approaches which use image features, we use the camera's own motion to both detect and track object boundaries. First we develop a framework in which the boundaries of objects are automatically detected using the motion parallax caused by the motion of an active camera. We use a correlation-based method to locate motion boundaries and our work has focused on detecting the motion boundaries early and robustly. A confidence field, which expresses the likelihood that a point lies on a motion boundary, is constructed from the shape of the correlation surface. Spatial coherence of object boundaries is modeled with dynamic contours which are automatically initialized using an attentional mechanism. Then, as the camera moves, the shapes of the dynamic contours are held fixed and they are tracked under an assumption of affine deformation. The affine parameters are recovered from the active tracking over time and are used to compute time-to-contact. We illustrate the behavior of this active approach with experiments on both synthetic and real image sequences.
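The link from tracked affine deformation to time-to-contact can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an approximately fronto-parallel surface approached along the optical axis, for which the divergence of a fitted affine flow field gives time-to-contact as tau = 2/divergence. The function names and synthetic data are illustrative.

```python
import numpy as np

def fit_affine_flow(pts, flow):
    # Fit u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y by least squares.
    # pts: (N,2) image coordinates, flow: (N,2) flow vectors.
    A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])
    a, *_ = np.linalg.lstsq(A, flow[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(A, flow[:, 1], rcond=None)
    return a, b

def time_to_contact(pts, flow):
    a, b = fit_affine_flow(pts, flow)
    div = a[1] + b[2]        # divergence of the affine flow field
    return 2.0 / div         # fronto-parallel approximation: tau = 2 / div

# Synthetic check: closing on a fronto-parallel plane with tau = 10 frames,
# the flow is u = x/tau, v = y/tau, so div = 2/tau.
tau = 10.0
pts = np.random.default_rng(0).uniform(-1, 1, size=(50, 2))
flow = pts / tau
print(time_to_contact(pts, flow))   # recovers tau = 10 (up to numerical error)
```

The divergence is the only affine parameter needed here; the remaining parameters capture shear and rotation of the contour and do not enter the fronto-parallel time-to-contact estimate.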
A collision is an event where the robot path intersects with an object in the environment. Collisions can be desired if the object is a goal, or undesired if the object is an obstacle. We call the place of intersection a collision point. Prediction of collision points relies on a continuity assumption of the robot motion such as constant velocity. The robot is equipped with monocular vision to sense its environment. Motion of the robot results in motion of the environment in the sensory domain. The optic flow equals the projection of the environment motion on the image plane. We show that under the continuity assumption described above, the collision points can be computed from the optic flow without deriving a model of the environment. We mainly consider a mobile robot. We derive the collision points by introducing an invariant, the curvature scaled depth. This invariant couples the rotational velocity of the robot to its translational velocity and is closely related to the curvature of the mobile robot's path. We show that the spatial derivatives of the curvature scaled depth give the object surface orientation.
This paper addresses the object tracking problem in an image sequence using an active contour model, called a `snake', based on energy-minimizing curves with global constraints. The robustness of these models is improved within the framework of robust estimation theory, leading to a new cost function (rho) in a new definition of the energy; the resulting model is thus called the (rho)-snake. Furthermore, we formulate a temporal continuity constraint in the energy definition: the resulting incremental active contour model is suitable for object tracking, as it includes prediction. Experimental results comparing the (rho)-snake to the classical snake are given on both synthetic and real IR image sequences.
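The abstract does not give the specific rho function used; as a hedged illustration of why a robust cost improves on the classical quadratic snake energy, the Geman-McClure estimator (one common choice, assumed here) can be compared with least squares on residuals containing an outlier:

```python
import numpy as np

def rho_quadratic(r):
    # Classical least-squares cost: unbounded, so a single outlier dominates
    return r ** 2

def rho_geman_mcclure(r, sigma=1.0):
    # Geman-McClure robust cost: saturates toward 1 for large residuals,
    # bounding the influence of outliers on the energy
    return r ** 2 / (sigma ** 2 + r ** 2)

residuals = np.array([0.1, 0.2, 0.15, 10.0])   # last residual is an outlier
print(rho_quadratic(residuals).sum())          # > 100, dominated by outlier
print(rho_geman_mcclure(residuals).sum())      # each term bounded by 1
```

Substituting such a bounded rho for the quadratic term in the snake energy is what makes the contour tolerant of spurious edge responses during minimization.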
Using computer vision for mobile robot navigation has been of interest since the 1960s. This interest is evident in even the earliest robot projects: at SRI International (`Shakey') and at Stanford University (`Stanford Cart'). These pioneering projects provided a foundation for later work but fell far short of providing real-time solutions. Since the mid 1980s, the ARPA-sponsored ALV and UGV projects have established a need for real-time navigation. To achieve the necessary speed, some researchers have focused on building faster hardware; others have turned to the use of new computational architectures, such as neural nets. The work described in this paper uses another approach that has become known as `perceptual servoing.' Previously reported results show that perceptual servoing is both fast and accurate when used to steer vehicles equipped with precise odometers. When the instrumentation on the vehicle does not give precise measurements of distance traveled, as could be the case for a vehicle traveling on ice or mud, new techniques are required to accommodate the reduced ability to make accurate predictions about motion and control. This paper presents a method that computes estimates of distance traveled using landmarks and path information. The new method continues to perform in real time using modest computational facilities, and results demonstrate the effects of the new implementation on steering accuracy.
This paper deals with a visual-motion fixation invariant. We show that during fixation there is a measurable nonlinear function of optical flow that produces the same value for all points of a stationary environment regardless of the 3-D shape of the environment. During fixated camera motion relative to a rigid object, e.g., a stationary environment, the projection of the fixated point remains (by definition) at the same location in the image, and all other points located on the 3-D rigid object can only rotate relative to the 3-D fixation point. This rotation rate of the points is invariant for all points that lie on the particular environment, and it is measurable from a sequence of images. This new invariant is obtained from a set of monocular images, and is expressed explicitly as a closed form solution. In this paper we show how to extract this invariant analytically from a sequence of images using optical flow information, and we present results obtained from real data experiments.
In this research we propose solutions to the problems involved in gaze stabilization of a binocular active vision system, i.e., vergence error extraction and vergence servo control. Gazing is realized by decreasing the disparity, which represents the vergence error. A Fourier-transform-based approach that robustly and efficiently estimates vergence disparity is developed for holding gaze on a selected visual target. It is shown that this method has certain advantages over existing approaches. Our work also points out that a vision-sensor-based vergence control system is a dual-sampling-rate system. Feedback information prediction and a dynamic vision-based self-tuning control strategy are investigated to implement vergence control. Experiments on gaze stabilization using the techniques developed in this paper are performed.
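A common Fourier-based way to estimate the disparity between the two views is phase correlation, sketched below under assumed conditions (a pure integer translation between left and right image patches; function names are illustrative, and this is not necessarily the paper's exact formulation):

```python
import numpy as np

def phase_correlation_shift(a, b):
    # Estimate the translation of b relative to a via the Fourier shift
    # theorem: the normalized cross-power spectrum keeps only the phase,
    # whose inverse transform is a peak at the shift.
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = np.conj(A) * B
    cross /= np.maximum(np.abs(cross), 1e-12)   # phase-only normalization
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map the peak location to a signed shift (account for wrap-around)
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(1)
left = rng.random((64, 64))
right = np.roll(left, shift=5, axis=1)   # horizontal disparity of 5 pixels
print(phase_correlation_shift(left, right))   # (0, 5)
```

The sharp, normalized correlation peak is what gives Fourier methods their robustness to illumination differences between the two cameras, compared with raw spatial correlation.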
Spatial quantization error and displacement error are inherent in automated visual inspection. These errors introduce significant distortion and dimensional uncertainty in the inspection of a part; for example, the centroid, area, perimeter, length, and orientation of parts are measured by the vision inspection system. This paper discusses the effect of spatial quantization error and displacement error on the precision dimensional measurement of an edge segment of a 3D model. A probabilistic analysis in terms of image resolution is developed for one-dimensional and two-dimensional quantization error. The mean and variance of these errors are derived. The position and orientation errors (displacement error) of the active vision sensor are assumed to be normally distributed. The probabilistic analysis utilizes these errors and the angle of the line projected on the image. Using this analysis, one can determine whether a given set of sensor setting parameters in an active system is suitable to obtain a desired accuracy for specific line-segment dimensional measurements. In addition, based on this approach, one can determine sensor positions and viewing directions which meet the necessary range for tolerance and accuracy of inspection. These mechanisms are helpful for achieving effective, economic, and accurate active inspection.
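For the one-dimensional case, the standard result (consistent with, though not necessarily identical to, the paper's derivation) is that rounding a coordinate to a pixel grid of step delta produces an error uniform on [-delta/2, delta/2], with mean 0 and variance delta^2/12. A quick simulation sketch:

```python
import numpy as np

def quantize(x, delta):
    # Round each coordinate to the nearest grid position of step delta
    return delta * np.round(x / delta)

delta = 1.0                                   # one pixel
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=1_000_000)       # true sub-pixel positions
err = quantize(x, delta) - x

print(err.mean())   # close to 0
print(err.var())    # close to delta**2 / 12 ≈ 0.0833
```

The delta^2/12 variance is what the paper's higher-level quantities (line length, orientation) inherit through propagation of error, scaled by the projection angle of the measured edge.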
Active vision systems faced with the problem of searching for targets in natural scenes are finding space variant sensors increasingly important. A popular class of space variant sensors are those based on the structure of the primate retina: they have a small area near the optical axis of greatly increased resolution (the fovea) coupled with a gradual falloff in resolution as one moves towards the periphery. In such systems, a primary requirement is an efficient mechanism for targeting the optical axis to different points of interest in the visual world. In this paper, we describe a robust and efficient paradigm for achieving foveal targeting by making use of iconic scene descriptions comprised of the responses of a bank of steerable filters at multiple scales and orientations. The filter bank description is rotation and scale invariant, occlusion tolerant, and view-insensitive to a considerable extent. We describe procedures for robustly matching vectors of a previously foveated point to instances of the point in other possibly transformed images obtained after camera motion. In such situations, the multiscale structure of the filter bank lends itself naturally to an efficient targeting mechanism for the space variant sensor.
The application of robots to automating assembly tasks has been hampered by an inability to overcome the problems of operating robots within an unstructured and unconstrained environment at acceptable cost and performance. Such operations necessitate the integration of high-level sensory capabilities -- and in particular vision sensing -- into the overall control system at various levels of the control hierarchy. The paper summarizes past and present approaches to robot guidance using vision from the perspective of the classification due to Sanderson and Weiss. A two-pronged approach is suggested as a promising strategy for integrating vision into a robot for visual servoing: directly feed back vision-derived spatial information (optimized for robot position control) into a robot controller cast in task or operational space, and use vision as the primary sensing medium for all critical spatial measurement tasks in the workspace. This offers the potential for low-cost, high-performance visual servoing through a greatly reduced reliance on the fidelity of the individual components. The paper presents this new strategy for visual guidance, supported by experimental results; it is based on a new model of natural vision systems and on a strategy that addresses the identified performance needs of the diverse vision tasks within a vision-oriented robotic workcell. A practical implementation of this generalized approach has been developed which offers sub-pixel resolution operation at low cost. This system has, in turn, been integrated into a direct vision feedback control application (involving a two-link rig and 2 degrees of freedom of a PUMA560 arm) and has been used to demonstrate the execution of pick-and-place tasks. Hence end-effector placement has been demonstrated with the positioning accuracy limit imposed by the camera, and significant robustness to kinematic-model and vision-calibration errors has been observed.
Overall, therefore, it is shown that low-cost direct vision feedback is possible and that it can offer significant improvements over existing strategies.
The purpose of this paper is to describe exploratory research on omnidirectional vision for the recognition and control of a mobile vehicle. The omnidirectional vision control technique described offers the advantage of an extremely wide-angle field of view. This may be translated in practice to a machine which will not get lost when following a path, to a target locating system which can see both forward and backward, and generally helps the robot survive as prey rather than as a predator. The wide angle of view permits a mobile robot to follow a curved path even around sharp corners, hairpin turns or other complicated curves. The disadvantage of the omnidirectional view is geometric distortion. This geometric distortion may be easily corrected after calibration to determine the important parameters. An object recognition method was used that detects the largest target in a selected region of the field of view and computes the centroid of this target. When two target points are detected, the algorithm calculates a projected three-dimensional path for the robot. The distance and angle from this ideal path are then used to provide steering control for a mobile robot. The current application for this technique is a generic intelligent control device that is transportable from one mobile vehicle to another with changes only in system parameters rather than control architecture. The significance of this research is in showing how the geometric distortion can be compensated to permit an omnidirectional vision navigation control system for a mobile robot.
In this work we consider algorithmic approaches to the placement problem for surface mounted components and present theoretical results on the minimum number of measurements that are needed to reconstruct images of the components and circuit boards. Initial placement of the chip over the mount area may be slightly incorrect. Machine vision has been used to achieve high accuracy in the placement by providing feedback to the controller on the position of the component's leads relative to the soldering pads. Many traditional 2D vision techniques for image registration have been used to address this problem. Here we take a different approach and consider the problem of reconstructing the shape of the component and the circuit board from a set of projections. We assume that the image consists of overlapping rectangles (the leads and pads) that are iso-oriented and provide theoretical results on the minimum number of measurements that are needed to recover the shape. Such theoretical bounds can help in designing efficient methods to solve the problem of aligning the chip leads to the solder pads.
In this paper we present a self-training high-speed inspection system which can be used in a variety of manufacturing applications for the detection of flaws in the overall appearance of products without requiring explicit specification of what defects are expected and in what regions. During training, the system is presented with a number of good parts and learns and remembers two main aspects of a product: where edges are and where they are not. After being shown a few images of the product, it expects areas of the product that had edge information to continue to have edge information, and it expects `quiet areas' to remain quiet. During run-time, if the system detects discrepancies in either category, it rejects the product. What sets this inspection system apart from traditional inspection machines is that it also has the ability to adapt to normal manufacturing realities. The system forgives changes in the product which occur during fabrication as long as they fall within an acceptable range. This range is defined by the manufacturing process itself during system training.
This paper describes a laser triangulation system that has been developed by the authors and integrated into a machine that manufactures cork stoppers automatically. The system is based on a solid state laser that projects a set of parallel lines onto the surface of the raw cork material. The three-dimensional shape of the surface is extracted through the analysis of an image obtained from a standard monochrome video camera of the lines of laser light. Special algorithms were developed to overcome image degradation caused by the uneven illumination intensity of the laser lines and by irregularities and roughness of the surface itself. The complete vision system incorporating this development is presented. It is responsible for the inspection of the cork surface, detection of relevant defects and the generation of a punching pattern to produce the stoppers. This pattern is then transmitted to the punch mechanism. The machine allows objective quality control constraints to be applied and decreases the amount of waste produced when compared with the manual method of producing cork stoppers. A fully functioning prototype of the machine has been built and successfully tested and other applications for the laser triangulation system technique are being considered.
Ultrasonic non-destructive testing is used both in manufacturing and in maintenance to ensure quality. In ultrasonic testing, a scanning probe transmits ultrasound pulses and the signal scattered back is detected by a receiver. The time-of-flight information is often called the A-scan. The A-scans form two-dimensional images (B-, C-, and D-scans) corresponding to different projection planes. Scanning over a surface provides information about both the location and the size of the defects and produces huge data files. Therefore, A-scans are often reduced to a single C-scan. In this set-up, the probe angle is zero degrees and the inspection plane is near the focus plane. The inspector uses C-scans, as well as A- and B-scans if they are available, in defect assessment. An approach that combines the reduction of memory space with the categorization of defects is proposed. Each A-scan is clustered in an unsupervised manner into a number of classes using a self-organizing feature map. The self-organizing process produces a feature map where similar A-scans are close to one another. The classes are visualized by assigning a color to each neurone, so that similar A-scans get similar colors. In the classified C-scan, different defects can be easily distinguished by their color. The proposed method supersedes typical C-scan methods through its ability to classify defects using the characteristic features of the A-scans.
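The clustering step can be sketched with a minimal one-dimensional self-organizing feature map, trained on synthetic A-scans; the network size, schedule, and data here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def train_som(data, n_nodes=8, epochs=20, seed=0):
    # 1-D self-organizing map: each node holds a prototype A-scan.
    # After training, similar A-scans map to nearby nodes, so a color per
    # node yields similar colors for similar defects in the C-scan.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n_nodes, data.shape[1]))
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                   # decaying rate
        radius = max(1.0, n_nodes / 2 * (1 - epoch / epochs))
        for x in rng.permutation(data):
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))  # best match
            dist = np.abs(np.arange(n_nodes) - bmu)
            h = np.exp(-dist ** 2 / (2 * radius ** 2))      # neighborhood
            w += lr * h[:, None] * (x - w)
    return w

def classify(w, x):
    # Node index = class label (and hence display color) of an A-scan
    return int(np.argmin(np.linalg.norm(w - x, axis=1)))

# Synthetic A-scans: two defect types with echoes at different depths
rng = np.random.default_rng(1)
type_a = np.tile(np.eye(32)[5], (20, 1)) + 0.05 * rng.normal(size=(20, 32))
type_b = np.tile(np.eye(32)[25], (20, 1)) + 0.05 * rng.normal(size=(20, 32))
w = train_som(np.vstack([type_a, type_b]))
print(classify(w, type_a[0]), classify(w, type_b[0]))   # distinct classes
```

Because the map is topology-preserving, assigning a smooth color ramp over node indices automatically gives similar A-scans similar colors, which is the visualization property the abstract relies on.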
The visual inspection of many industrial products is based on contour analysis. The main task is the detection of defects like cracks and chips. The automation of this visual inspection procedure requires robust and flexible systems and algorithms. This paper describes new contour analysis algorithms for defect detection. Innovative techniques for contour filtering and analysis (e.g., scale-space analysis) are presented. The performance of different approaches is shown by several industrial examples and applications.
Our recent research results indicate that very good texture discrimination can be obtained by using simple texture measures -- based, for example, on gray-level differences or local binary patterns -- with a classification principle based on comparing distributions of feature values. In this paper, two case studies are considered, dealing with the problems of determining the composition of mixtures of materials and of metal strip inspection.
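The basic 8-neighbour local binary pattern and the distribution it produces can be sketched as follows (a minimal illustration of the measure, not the paper's exact variant or classifier):

```python
import numpy as np

def lbp_image(img):
    # Basic 8-neighbour LBP: threshold each pixel's neighbours at the
    # centre value and pack the results into one byte per pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (nb >= centre).astype(np.uint8) << bit
    return out

def lbp_histogram(img):
    # Normalized distribution of patterns -- the feature whose
    # distributions are compared between textures
    h = np.bincount(lbp_image(img).ravel(), minlength=256)
    return h / h.sum()

rng = np.random.default_rng(0)
flat = np.full((32, 32), 128, dtype=np.uint8)           # uniform region
noisy = rng.integers(0, 256, (32, 32), dtype=np.uint8)  # random texture
print(lbp_histogram(flat).argmax())                     # 255: all-ones pattern
d = np.abs(lbp_histogram(flat) - lbp_histogram(noisy)).sum()
print(d)   # large L1 distance between unlike textures
```

Classification then reduces to comparing such histograms (here with a simple L1 distance; log-likelihood or chi-square comparisons are common alternatives) against reference distributions for each texture class.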
This paper describes a proposal to develop a flexible automation system for sample preparation and analysis in a chemistry laboratory without human assistance. The key to such automation is a robot arm, centrally placed with respect to a series of work stations containing balances, mixers, dispensers, centrifuges and analytical instruments. Object handling at each station and sample movement from one station to another is performed by the robot arm according to user-programmed procedures. The research emphasizes the analysis and modular decomposition of chemistry procedures, modeling the procedures in a computer system and integrating this model with robot arm and other instrumentation hardware involved in a complete automation of a chemistry laboratory.
This paper presents an overview of the development of vision algorithms for a flexible inspection system. This system is being designed for the inspection of surface mounted devices. The system identifies missing components and quantifies the position and rotation of the components present. Two types of approaches for the identification of missing components are presented and contrasted. The first approach involves conventional image processing techniques that are actually used in an operative prototype under test in an industrial environment. The second approach is a study of the use of backpropagation neural networks as an alternative inspection method, for component detection only.
Classical nonlinear methods for finding the relative orientation between a camera and an object suffer from considerable computation time and requirement for a good initial estimate, while linear methods, which do not need an initial estimate, are sensitive to noise and outliers. In this paper, we study the performances of different solution methods in the presence of noise and outliers. We also present a new iterative algorithm which is efficient, globally convergent, and relatively robust to noise and outliers.
A technology is described which allows the application of real-time imaging in combination with mega-pixel CCDs. This technology is based on the following characteristics: high-speed transport of the video information through the parallel CCDs in the imaging section, very high-speed transport of the charge packets through the serial section of the devices, and high-speed conversion of the electrons to a measurable voltage by the output amplifier. Key competencies to comply with these requirements are: low-resistive CCD gates, low-capacitance CCD gates, and high-bandwidth, low-noise-floor output stages.
This paper studies algorithms for fast and reliable extraction of range discontinuities in dynamic scenes. The application is to control the motion of a robot using a range scanning sensor. When estimating the pose of the objects in a scene, it is obvious that range discontinuities and flat surfaces have the largest information content. The concept studied consists of a smart camera chip together with a scanning illuminating laser. Feedback loops are closed between the chip and the scanning laser so as to follow along different types of range discontinuities in the scene. More explicitly: (1) two types of feedback laws are outlined to track along range discontinuities both with and without occlusion; (2) the laser can also track along a `generalized cylinder,' say, a cable free in space or lying on an uneven surface; (3) the tracking accuracy is estimated as the laser follows along the `curve of discontinuity.' These results are preliminary and are not included in this paper. In an earlier study, the Hough transform was found to be very robust in extracting the coordinates of planar surfaces. The edge parameters in this study are thus complementary to those surface parameters. Compared with complete range scanning of the entire scene, it seems possible to gain at least one order of magnitude in speed. This is important since these extracted range features are inside the feedback loop of the robot.
This paper presents a method for calibration of a sheet of light range camera using robot motion information from odometers instead of an absolute reference object in a static scene. The calibration is integrated with the robot position estimation, and performed during normal operation of the robot. This makes it possible to detect when parameters change during operation, and to recalibrate automatically. To estimate the calibration parameters as well as the robot position relative to the surface a Kalman filter is used. The system is highly nonlinear, and a normal extended Kalman filter will often fail to converge to the correct value. This is solved by changing the coordinate frame in which the filtering occurs, taking into account the resulting correlations between the estimate and measurement. The calibration assumes that initial estimates of the calibration parameters are available.
A modular approach to high-resolution imaging is discussed. The functionality of this module, based on a 1k X 1k real-time CCD image sensor, requires new approaches in signal processing, working modes, and interfacing. After a general discussion of image-capture modules, the main characteristics of the 1k X 1k CCD image sensor are given. Aspects of video processing and specific functions, such as alternative timings and interfacing, are discussed. The paper concludes by presenting the implementation of an actual module for high-resolution imaging.
The purpose of this paper is to elaborate a novel method to determine the absolute geometry of a biplane x-ray imaging system, using angiograms acquired daily by the clinician, without any special calibration procedure during the x-ray examination. The approach is based on the minimization of the mean square distance between observed and predicted projections of a set of reference points identified by the clinician on a simultaneous pair of images. The method employed is iterative and needs two views of at least 12 reference points of unknown 3-D positions to converge to the correct answer. Our approach should be particularly useful in clinical applications since it needs very little intervention from the clinician.
This paper presents a heuristic edge detector for extracting wireframe representations of objects from noisy range data. Jump and roof edges were detected successfully from range images containing additive white Gaussian noise with a standard deviation equal to as high as 1.2% of the measured range values. This represents an appreciable amount of noise since approximately 5% of the errors are greater than 12 cm and 32% of errors are greater than 6 cm at a distance of 5 meters. The noise insensitive characteristic of the heuristic edge detector enables low cost range scanners to be used for practical industrial applications. The availability of low cost active vision systems greatly broadens the horizon of integrating robotics vision systems to manufacturing automation.
The majority of vision-based recognition systems currently employ geometric methods for matching extracted image primitives with a three-dimensional representation of an object, e.g. a CAD model. Such systems typically assume that each object will be discernibly different, hence ensuring unique recognition. However, in many situations, the geometry of the objects may be identical (e.g. drinking mugs on a shelf, packets on a supermarket shelf), but a particular object may only be distinguished by the patterns or markings on its surface. Such surface markings may be described, for example, by simple vector-based graphical primitives (lines, curves, text) or a pixel-based image representation. These descriptions are considered to be `painted' onto the model object surface. This paper considers several algorithms that may be used to extract and match surface-based primitives to model objects stored in a database; how the different surface descriptions may be represented in a model-based system; and the integration of these into a system to match appropriate descriptions of surface markings in order to achieve recognition. Results of applying these algorithms to a set of exemplar images are presented.
In this paper we propose a solution for the location of a mobile robot. The approach is based on visual sensing of artificial landmarks scattered across the workspace. The location problem is divided into two stages: identification and positioning. The identification of the shape consists of its description in terms of statistical invariants, the comparison of these descriptors with known classes, and its assignment to the nearest known pattern. The positioning is solved in three steps: first, the transformation of the shape in the image plane is recovered using the change in the statistical moments; second, the 3D orientation of the landmark is inferred by calculating the parameters of the plane containing the mark; and finally, the position of the landmark with respect to the robot is computed. Throughout the location process, statistical moments are the underlying technique used. We present experimental evidence to analyze the error in the estimation of the 3D structure of some shapes for both a monocular and a two-camera vision system. Our experiments show that the approach presented is suitable for robot location.
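The moment machinery underlying both stages can be sketched for the simplest case: recovering a binary landmark's image-plane centroid and principal-axis orientation from first- and second-order moments (an illustrative fragment; the paper's full invariants and 3D inference are not reproduced):

```python
import numpy as np

def moments_pose(mask):
    # Centroid and orientation of a binary shape from its image moments:
    # centroid from first-order moments, orientation from the second-order
    # central moments via the principal-axis formula.
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return (cx, cy), theta

# Elongated bar along the x-axis, centred at (20, 10)
mask = np.zeros((20, 40), dtype=bool)
mask[8:13, 5:36] = True
(cx, cy), theta = moments_pose(mask)
print(round(cx, 1), round(cy, 1), round(theta, 3))   # 20.0 10.0 0.0
```

Tracking how these moments change between the stored landmark model and its observed image is what recovers the image-plane transformation in the first positioning step.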
Using computer vision to recognize 3-D objects is complicated by the fact that geometric features vary with view orientation. The key in designing recognition algorithms is therefore based on understanding and quantifying the variation of certain cardinal features. The features selected for study in the research reported in this paper are the angles between landmarks in a scene. The spatial arrangement of landmarks on an object may constitute a unique characteristic of that object. As an example the angles between the wing tips and the nose cone of an aircraft may be adequate in distinguishing amongst a given class of aircraft. In a class of polyhedral objects the angles at certain vertices may form a distinct and characteristic alignment of faces. For many other classes of objects it may be possible to identify distinctive spatial arrangements of some readily identifiable landmarks. In this paper we derive the two dimensional joint density function of two angles in a scene given an isotropic view orientation and an orthographic projection. This analytic expression is useful in deriving likelihood functions which may be used to obtain measures of the likelihood of angle combinations in images of known objects or scenes. These likelihood functions allow us to establish statistical decision schemes to recognize objects. Experiments have been conducted to evaluate the usefulness of the proposed methods.
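The paper's contribution is the analytic joint density; as a hedged numerical counterpart, the distribution of a single projected angle under isotropic view orientation and orthographic projection can be estimated by Monte Carlo (an illustrative check of the setup, not the paper's derivation; all names are assumptions):

```python
import numpy as np

def random_rotation(rng):
    # Uniform (Haar) random rotation via QR of a Gaussian matrix,
    # with signs fixed so the result lies in SO(3)
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def projected_angle(landmarks, R):
    # Orthographic projection: rotate the landmarks, drop the depth axis,
    # then measure the angle at the first landmark
    p = (R @ landmarks.T).T[:, :2]
    u, v = p[1] - p[0], p[2] - p[0]
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

# Three landmarks forming a right angle at the first point
landmarks = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
rng = np.random.default_rng(0)
angles = np.array([projected_angle(landmarks, random_rotation(rng))
                   for _ in range(20000)])
hist, _ = np.histogram(angles, bins=18, range=(0, np.pi), density=True)
print(angles.mean())   # symmetric about pi/2 for an orthogonal pair
```

Such sampled histograms provide a direct empirical check on the analytic density, and by symmetry the mean projected angle of an orthogonal landmark pair stays at pi/2 even though individual views distort it.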
The work presented here proposes an iterative refinement algorithm to build a sequence of convex functionals, based essentially on a weak thin-plate-under-tension model for smoothing. This algorithm is applied to structure estimation and edge detection problems. The sequence of functionals is obtained by continuously and iteratively modifying continuous, non-binary line processes which control the regularity of the surface. This is done by comparing the smoothed estimate with the initial data (sparse or dense), that is, by evaluating signal-to-noise characteristics.
Geometric features of an object in an image vary with the view angle of the camera or with the orientation of the object. The variation in measured features is often expressed using probability density functions. The probabilistic basis of this approach arises from assumptions concerning the unconstrained pose of the object and the consequence that the two orientation angles are random variables with a known joint density. In this paper we concentrate on recognizing the faces of polyhedral surfaces. We start by quantifying the minimal features of a face that are scale invariant and rotation invariant (about the optical axis). Two features we found to be analytically tractable were the normalized area between two edges and the normalized inner product of two edge vectors. We refer to these features as quadrature line ratios. The joint density of these measured features in orthographic images has been derived. The variation of these features in images is analyzed and plotted. Likelihood functions based on this density have been developed and used in distinguishing and recognizing faces of polyhedra. Experiments with real and simulated data have been conducted to verify the efficacy of the proposed schemes, and the results show that the method is promising.
This paper presents a method for detecting corner points on 3D space curves. It extends previous work on distance accumulation for detecting corner points on 2D planar curves [6, 7]. A quantitative measure is proposed to define the "cornerness" of a 3D curve at a curve point; it has the advantage of being more stable and causing much less shape distortion than traditional smoothing methods. In particular, the measure is invariant to scale, an attractive property for corner detection. Strategies are presented for reliable maximum selection. Experimental results with simulated data show the robustness and accuracy of the cornerness method.
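The core of the distance-accumulation idea can be sketched as follows; this simplified variant, which accumulates point-to-chord distances over a symmetric window rather than reproducing the paper's exact moving-chord scheme, is an illustrative assumption:

```python
import numpy as np

def cornerness(points, k=5):
    """Simplified distance-accumulation cornerness for a 3D curve.

    For each interior point p_i, accumulate its distance to the 3D
    chords joining p_{i-j} and p_{i+j} for j = 1..k.  On straight
    sections the chords pass through p_i and contribute nothing; at a
    corner the accumulated distance peaks."""
    pts = np.asarray(points, float)
    n = len(pts)
    c = np.zeros(n)
    for i in range(k, n - k):
        for j in range(1, k + 1):
            a, b = pts[i - j], pts[i + j]
            d = b - a
            # distance from pts[i] to the 3D line through a and b
            c[i] += np.linalg.norm(np.cross(pts[i] - a, d)) / np.linalg.norm(d)
    return c
```

On an L-shaped 3D polyline, the maximum of this measure lands at the bend, while straight sections score zero.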
Keywords: corner detection, curvature, distance accumulation, maxima selection, curve representation, 3D curves.
This paper examines recovering the depth of an object from two stereo images by correlating matching feature points. The principal concern of this paper is to demonstrate one method of feature point extraction. The orientation of the two viewers with respect to each other is fixed, which allows us to make use of the epipolar constraint for depth recovery. The methodology for extracting feature points from each image is: perform edge detection on each original image; use the Hough transform to identify the lines that define the image; label feature points as the intersections of lines where the intersection is a valid point in the edge-detected image. This method was very accurate with respect to recovering the depth of input images in certain orientations. However, the algorithm was too sensitive to error in the Hough transform space to allow consistent evaluation of depth across all orientations.
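The intersection step can be sketched directly from the Hough parameterization; the (rho, theta) normal form below is the standard one, and the check that the intersection is a valid edge pixel would be applied afterwards:

```python
import numpy as np

def line_intersection(l1, l2):
    """Intersect two lines given in Hough (rho, theta) normal form,
    x*cos(theta) + y*sin(theta) = rho.
    Returns the (x, y) intersection, or None for near-parallel lines."""
    (r1, t1), (r2, t2) = l1, l2
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    return np.linalg.solve(A, np.array([r1, r2], float))

# A vertical line x = 3 (theta = 0, rho = 3) and a horizontal line
# y = 2 (theta = pi/2, rho = 2) intersect at (3, 2).
p = line_intersection((3, 0.0), (2, np.pi / 2))
```

The near-parallel rejection also hints at the sensitivity the abstract reports: small errors in theta move shallow-angle intersections a long way.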
Depth map recovery is one of the central tasks in active vision systems. In many applications, such as path planning and collision avoidance, there is a clear need for obtaining a coarse depth map of the environment in a reasonable amount of time. Traditionally, stereo vision techniques have been used for depth map recovery. Such methods require that features first be found and then correctly corresponded between two images. However, on real images, stereo vision techniques are not only computationally time-consuming but also suffer from errors in feature detection and correspondence. This paper describes the theory and implementation issues of depth map recovery from a sequence of two monocular images without prior knowledge of the motion involved. Our technique uses neither optical flow nor feature correspondence. Instead, the spatio-temporal gradients of the input intensity images are used directly. The experimental results of implementing this method on real images are presented. Furthermore, important implementation issues such as detecting and correcting depth map flaws are discussed, and techniques for overcoming such practical problems are described and tested. We also investigate the influence of subsampling on the quality of the recovered depth maps and introduce some more sophisticated techniques.
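For the special case of a purely translating camera, the correspondence-free idea can be sketched per pixel by combining brightness constancy with the translational flow field; the sign conventions and the pure-translation restriction here are assumptions for illustration, not the paper's full formulation:

```python
import numpy as np

def depth_from_gradients(Ix, Iy, It, x, y, T, f=1.0, eps=1e-9):
    """Direct per-pixel depth for a camera translating with known
    velocity T = (Tx, Ty, Tz), focal length f.

    Brightness constancy, Ix*u + Iy*v + It = 0, combined with the
    translational flow field u = (x*Tz - f*Tx)/Z, v = (y*Tz - f*Ty)/Z,
    gives  Z = -(Ix*(x*Tz - f*Tx) + Iy*(y*Tz - f*Ty)) / It.
    No optical flow or correspondence is computed; only the
    spatio-temporal gradients (Ix, Iy, It) are used."""
    Tx, Ty, Tz = T
    num = Ix * (x * Tz - f * Tx) + Iy * (y * Tz - f * Ty)
    return -num / (It + eps)
```

The division by It also shows why flaw detection matters in practice: wherever the temporal gradient is small, the estimate is unreliable and must be flagged or corrected.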
It is well known that the human visual system is a multi-resolution, spatially shift-variant system. The edge filters on the retina are small and roughly constant in size within the foveal area, and their size increases linearly with eccentricity outside the fovea. This mechanism allows the human visual system to perceive a detailed description of the target surface within the foveal area, and to obtain a global description of the scene in the peripheral area. This paper describes a stereo vision system that simulates this mechanism of human vision. The system uses a pair of crossed-looking cameras as sensors. The fixation point is used as the geometric center of the 3D space. With this perception geometry, the vision system perceives depth variation of the target surface, rather than the absolute distance from the target surface to the sensors. This property allows stereo correspondence to be achieved through a fusion process similar to the optical fusion of two diffraction patterns. The fusion process obtains disparity information for the entire image in one convolution operation. The volume of computation, and therefore the processing time needed for depth perception, is greatly reduced. The mechanism of spatially shift-variant processing is implemented by applying a logarithmic conformal mapping to the images. As a result, the sensitivity of depth perception decreases exponentially from the center to the peripheral area of the image. This allows the vision system to obtain depth information about the scene within a broad field of view.
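The shift-variant sampling can be sketched with a logarithmic (log-polar) conformal mapping; the grid sizes and the nearest-neighbour sampling below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def log_polar_coords(h, w, n_rho=64, n_theta=128, r_min=1.0):
    """Sample coordinates for a log-polar mapping centred on the image:
    radii grow geometrically, so resolution is finest near the centre
    ('fovea') and falls off exponentially towards the periphery."""
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    rho = r_min * (r_max / r_min) ** (np.arange(n_rho) / (n_rho - 1))
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(rho, theta, indexing="ij")
    ys, xs = cy + R * np.sin(T), cx + R * np.cos(T)
    return ys, xs

def to_log_polar(img):
    """Resample a grayscale image onto the log-polar grid
    (nearest-neighbour, for simplicity)."""
    ys, xs = log_polar_coords(*img.shape)
    return img[np.clip(np.round(ys).astype(int), 0, img.shape[0] - 1),
               np.clip(np.round(xs).astype(int), 0, img.shape[1] - 1)]
```

In the transformed image, uniform processing corresponds to shift-variant processing in the original: one filter size in log-polar space covers small receptive fields at the fovea and large ones in the periphery.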
This paper describes a pyramidal, robust algorithm dedicated to the depth-from-motion problem (computation of the third dimension in the case of a moving monocular camera). We propose a direct method that does not require the tedious computation of optical flow. Depth is obtained from the kinematic parameters and the gradients of image brightness. As with many early vision problems, this one is ill-posed, and regularization techniques are used to recover a unique, stable solution. To cope with noisy image acquisition and motion discontinuities, the estimation problem is reformulated in a robust estimation framework. To handle large motions, we use a multi-resolution scheme. Experimental results are shown on both simulated and real images.
A new method for automatically reconstructing a three-dimensional object from serial cross sections is presented in this paper. The method combines the contour-chaining technique with a sampling method to construct the object. In the proposed method, the initial description of the object is formed by a series of continuous contours. First, we sample each cross section to determine the nodes that constitute the vertices of the triangles used for the reconstruction; we then determine the tangent at these nodes and assign each of them a code depending on the tangent direction. For triangulation, it then only remains to connect the nodes of two adjacent contours whose codes are identical.
Surface shape description from 3-D measurements is an important problem since many applications, including recognition and scene interpretation, rely on this initial description. In order to be reliable, the recovered description must not be sensitive to the acquisition conditions: (1) it must be robust in the presence of spurious measurements, (2) it must be viewpoint invariant, and (3) the density of measurements must be sufficient for stability. Very few reconstruction approaches based on regularization, segmentation, or geometric primitive extraction respect one or even two of these three conditions. A new reconstruction approach is presented for which a polynomial-based description of surface sections is provided to higher level applications as reliable hypotheses. The first two conditions are met by including a measurement error model as an integral part of the recovery procedure. To meet the third condition, a test of the stability of the hypothesized model is performed in the measurement space and states whether the hypothesis is data-dependent or not by searching for the redundancy in the data supporting the model. Along with the set of descriptive parameters for each polynomial section, a figure of merit based on the covariance matrix of the parameters is computed. The validity of the reconstruction approach is demonstrated by extracting planar and quadric sections from range data through an implementation on a massively parallel SIMD processor architecture.
This paper introduces a new biologically inspired technique that retrieves scene surface information. The human eye provides the inspiration for this technique. Human vision essentially comprises peripheral and foveal vision. Information processing under peripheral vision is very fast since fewer features are sensed. Peripheral vision provides guiding information for the subsequent foveal analysis. Foveal vision involves a high resolution examination of a particular region of the scene. The biologically inspired technique involves a pyramid hierarchy that provides the platform for three-dimensional reconstruction at different resolution levels. Three-dimensional reconstruction is based on a cue combination process that involves feature-based stereo, occlusion, and visible surface reconstruction. The technique enables fine resolution surface information to be obtained for a specified region of interest which is surrounded by coarse resolution surface information. Experimental results are presented to illustrate the performance of this technique.
Many objects in the real world are tubular in shape and this paper is about understanding this object class, which we refer to in the paper as generalized tubes (hereafter GTs). Intuitively, a GT is constructed by sweeping some planar closed curve (the GT cross-section) along a 3D space curve (the GT axis). First, we examine the GT class as a whole and identify two important GT subclasses where the parametric curves and the set of intrinsic directions are related: (1) GTs with circular cross-sections (hereafter CGTs) and (2) GTs with zero-torsion axes (hereafter ZGTs). Then, these two classes of generalized tube are analyzed with respect to their surface and projective properties. For example, CGT occluding edges are shown to project to parallel contours, and the contour image of a CGT is shown to have at least one degree of freedom. An algorithm is then given that uses both image contour and reflectance to recover CGT shape parameters modulo scale.
In this paper, we present a new intensity-based technique for recovering depth information from two or more images. Our method uses planar patches to approximate 3-D surfaces. To recover the depth information, a view-line constraint and the imaging geometry are introduced. The view-line constraint restricts the position of a planar patch; from the constraint we obtain candidates for the corresponding planar patch. The imaging geometry then helps us find the best estimate among the candidates. A hypothesize-and-verify optimization procedure is used: we project each candidate planar patch perspectively onto the observed images and calculate the intensity difference between the projected image and the observed images. The candidate that best fits the observed data is selected as the solution. This method differs from traditional stereo algorithms because it does not require solving the correspondence problem.
In many industrial applications, non-contact and non-destructive surface profile measurements are frequently required. The projection grid method is one of the available methods that fulfills these needs. It is based on the principle of triangulation, and is hence computationally simple, and it has the advantage of full-field measurement without any scanning optics. To increase the sensitivity, phase-shifting techniques are incorporated, improving the sensitivity to 1/100th of a fringe. However, an accurate positioning device is required to shift the grating. Traditional systems use mechanical devices, for example a translation stage, to perform the shifting. Due to hysteresis, backlash, and wear, inaccurate shifts, such as unequal or incomplete shifts, often result, leading to undesirable errors. This paper describes a phase-shifting digital projection system that solves this problem. The conventional physical grating is replaced by a computer-generated grating, which is projected onto the test object via an LCD projector. Complete shifting of the grating is performed in computer software. Besides, the system is very flexible: both the type and the pitch of the grating are variable. In this paper, the system hardware is described in detail, followed by a performance analysis and experiments.
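The software shifting and the phase recovery can be sketched with the standard four-step phase-shifting algorithm; the cosine grating and the pi/2 step size below are the usual choices and are assumed here rather than taken from the paper:

```python
import numpy as np

def grating(shape, pitch, phase):
    """Computer-generated sinusoidal grating (the pattern that would be
    sent to the LCD projector); pitch in pixels, phase in radians.
    Shifting the grating is just a change of this phase argument."""
    x = np.arange(shape[1])
    row = 0.5 + 0.5 * np.cos(2 * np.pi * x / pitch + phase)
    return np.tile(row, (shape[0], 1))

def four_step_phase(I0, I1, I2, I3):
    """Standard four-step phase-shifting formula for shifts of
    0, pi/2, pi, 3pi/2: wrapped phase = atan2(I3 - I1, I0 - I2)."""
    return np.arctan2(I3 - I1, I0 - I2)
```

Because the shift is applied in software, each step is exact by construction, which is precisely how the system avoids the unequal or incomplete shifts of mechanical stages. The recovered phase is wrapped to (-pi, pi] and still needs unwrapping before conversion to height.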
This paper is written to further understanding of the basic limitations of eye-in-hand range cameras for the handling of specular and transparent objects. The basic underlying assumption for a range camera is a single diffuse reflection. Specular and transparent objects usually give multiple reflections, interpreted as different types of `ghosts' in the range images. These `ghosts' are likely to cause serious errors during gripping operations. As the robot moves, some of these `ghosts' move inconsistently with the true motion. In this paper we study, experimentally and theoretically, how the range measurements can be integrated in a consistent way during the motion of the robot. The paper is experimental, with emphasis on parts with `optical complications' including multiple scattering. Occlusion is not studied in this paper. Some of our findings include: (1) For scenes with one plane mirror, there is a complete understanding of the disambiguation by motion. Moreover, the coordinates of the mirror can be estimated without a single direct observation of the mirror itself; the other objects in the scene are assumed not to be `mirror like.' (2) For polished steel cylinders, the inclination and radius can be estimated from the curved ray-traces on plane matte surfaces.
In this paper the active stereo vision system KASTOR of the IPR and its calibration technique are described. KASTOR is designed to serve as a flexible sensing device mounted on a mobile robot platform. It is used for collision avoidance, navigation based on natural landmarks, and object recognition. KASTOR has eight motorized optical and mechanical degrees of freedom. The article describes the design of KASTOR as well as the real-time vision system it is connected to. A new camera calibration technique for an active vision system is presented. The DLT (direct linear transformation) matrices are computed directly from the observation of a reference object in a scene. A solution is presented to the problem that reference points can normally be detected in the images only with low accuracy: with the presented method, reference points can be located with sub-pixel accuracy. Whenever KASTOR alters any of its degrees of freedom, the calibration has to be updated. However, it is not possible to track a reference object when the head is mounted on a mobile robot. The new solution is to compute all possible DLT matrices in advance and store them in memory. During autonomous operation, the appropriate matrices are selected according to the head configuration. Experimental results prove that the accuracy of the new calibration technique is fully sufficient to use KASTOR as a sensor for the navigation of a mobile robot. This is shown in an example where the task is to navigate the mobile robot through a narrow door based only on active stereo vision sensing.
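The DLT computation itself can be sketched as a homogeneous least-squares problem solved by SVD; six or more well-distributed, non-coplanar reference points are assumed, and the sketch below is the textbook procedure rather than KASTOR's exact implementation:

```python
import numpy as np

def dlt(X, x):
    """Direct linear transformation: estimate the 3x4 projection matrix
    P from n >= 6 correspondences between 3D reference points X (n,3)
    and their image observations x (n,2).  Each correspondence yields
    two linear constraints on the 12 entries of P; the solution is the
    right singular vector of the stacked system."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = np.array([Xw, Yw, Zw, 1.0])
        A.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        A.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

def project(P, Xw):
    """Apply a DLT matrix to a 3D point, returning pixel coordinates."""
    p = P @ np.append(Xw, 1.0)
    return p[:2] / p[2]
```

Precomputing one such matrix per head configuration and selecting it at runtime, as the paper proposes, replaces online tracking of the reference object with a table lookup.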
Discontinuity-preserving smoothing is very important for the reliable extraction of features such as edge maps and surface patches from range images. In this paper, we propose a modification of Perona and Malik's anisotropic diffusion algorithm for range image surface characterization. Perona and Malik's scheme has several practical and theoretical difficulties. The main difficulty arises when the signal is noisy, since noise introduces, in theory, unbounded oscillations of the gradient, so the conditional smoothing introduced by the model does not help. We propose a method that avoids this difficulty and use an efficient implicit scheme to solve the resulting partial differential equation. The proposed algorithm is robust in the presence of noise and is very useful for segmenting range images into surface patches and edge maps. A number of experimental results obtained by applying the algorithm to synthetic and real images are presented.
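The baseline scheme being modified is the classic explicit Perona-Malik update; a minimal sketch is given below (the paper's contribution, a regularized variant solved with an implicit scheme, is not reproduced here):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=20.0, lam=0.2):
    """Classic explicit Perona-Malik diffusion with the exponential
    conductance g(|dI|) = exp(-(|dI|/kappa)^2): smoothing is strong
    where neighbour differences are small (noise) and suppressed where
    they are large (discontinuities).  Four-neighbour update; np.roll
    gives periodic boundaries, which is adequate for a sketch."""
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)
    for _ in range(n_iter):
        dN = np.roll(u, 1, 0) - u
        dS = np.roll(u, -1, 0) - u
        dW = np.roll(u, 1, 1) - u
        dE = np.roll(u, -1, 1) - u
        u += lam * (g(dN) * dN + g(dS) * dS + g(dW) * dW + g(dE) * dE)
    return u
```

On a noisy step image this reduces the noise within each flat region while leaving the step largely intact; the difficulty the paper addresses is that with heavier noise the gradient estimates themselves oscillate, defeating the conductance term.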
One of the main problems in 3D computer-aided reconstruction is that the result obtained depends on how the cross sections were acquired. If the interslice distance between successive contours is greater than the in-plane resolution, only a coarse representation of the physically scanned object can be generated. In this paper, we present an elastic contour interpolation scheme to refine 3D object reconstruction from serial cross sections.
Using known camera motion to estimate the 3D structure of a scene from image sequences is an important task in robot manipulation and navigation. This paper presents a new way to integrate the 3D structure of a scene by tracking and fusing 2D line segment measurements over image sequences. The system is based on a cyclic process: the model structure undergoes a cycle of prediction, matching, and updating, carried out within a Kalman filtering framework. In this work no constraint on camera motion is used, and segment tracking is based on the estimated structure rather than on traditional feature tracking based on image-motion heuristics. This approach provides reliability, accuracy, and computational advantages. Experimental results from a camera mounted on a robot arm are presented to illustrate the reliability and accuracy of the approach in integrating the 3D structure of a scene.
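The prediction-matching-updating cycle can be sketched for a single scalar structure parameter; the constant-state model, the noise values, and the chi-square gating threshold below are illustrative assumptions (the paper filters full 3D line segments, not scalars):

```python
import numpy as np

class KalmanScalar:
    """Minimal predict/match/update cycle for one structure parameter,
    e.g. the depth of a segment endpoint.  Q is the process noise added
    at each prediction, R the measurement noise variance."""

    def __init__(self, x0, P0, Q=1e-4, R=0.25):
        self.x, self.P, self.Q, self.R = x0, P0, Q, R

    def predict(self):
        # static structure: the state is unchanged, uncertainty grows
        self.P += self.Q
        return self.x, self.P

    def update(self, z, gate=9.0):
        # matching step: a chi-square gate rejects outlier measurements
        S = self.P + self.R
        if (z - self.x) ** 2 / S > gate:
            return False
        K = self.P / S                 # Kalman gain
        self.x += K * (z - self.x)
        self.P *= (1 - K)
        return True
```

Fusing measurements over the sequence drives the variance P down, which is what makes structure-based tracking progressively more reliable than per-frame image-motion heuristics.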