We present a new class of quadratic filters capable of creating spherical, elliptical, hyperbolic and linear decision surfaces, which result in better detection and classification capabilities than the linear decision surfaces obtained from correlation filters. Each filter comprises a number of separately designed linear basis filters. These basis filters are linearly combined into several macro filters; the outputs of these macro filters are passed through a magnitude-square operation and are then linearly combined using real weights to achieve the quadratic decision surface. This nonlinear fusion algorithm is called the extended piecewise quadratic neural network (E-PQNN). For detection, the creation of macro filters allows a substantial computational saving by reducing the number of correlation operations required. In this work, we consider the use of Gabor basis filters; the Gabor filter parameters are separately optimized, and the fusion parameters for combining the Gabor filter outputs are optimized using the conjugate gradient method; both are included, along with the nonlinear combination of filter outputs, in our E-PQNN algorithm. We demonstrate methods for selecting the number of macro Gabor filters, the filter parameters, and the linear and nonlinear combination coefficients. We prove that our simple E-PQNN architecture is able to generate arbitrary piecewise quadratic decision surfaces. We present preliminary results obtained for an IR vehicle detection problem.
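The quadratic surface construction above can be illustrated with a minimal NumPy sketch (not the authors' E-PQNN; the filter values and weights are arbitrary): linear macro-filter outputs are magnitude-squared and combined with real weights, so the resulting score is quadratic in the input.

```python
import numpy as np

def quadratic_score(x, macro_filters, weights, bias=0.0):
    """Quadratic discriminant from linear filters: each macro filter
    h_i produces a linear output h_i . x, the outputs are
    magnitude-squared, and real weights combine them."""
    outputs = macro_filters @ x             # linear correlations
    return weights @ (outputs ** 2) + bias  # quadratic in x

# toy example: two macro filters in R^2; a difference of squares
# gives a hyperbolic decision surface
H = np.array([[1.0, 0.0],
              [0.0, 1.0]])
w = np.array([1.0, -1.0])
x = np.array([2.0, 1.0])
s = quadratic_score(x, H, w)
```

Spherical, elliptical or linear surfaces follow from other sign patterns of the weights (all positive, mixed magnitudes, or a single unsquared term).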
A simple method for visualizing, understanding, interpreting, and recognizing 3D objects from 2D images is presented. It extends the linear combination method, uses parallel pattern matching, and can handle 3D rigid concave objects as well as convex objects, yet needs only a very small number of learning samples. Results on real images are illustrated, and future research is discussed, including more complicated images such as 3D concave and articulated objects.
A new approach to object recognition is proposed. The main concern is irregular objects, which are hard to recognize even for a human. The recognition is based on the contour of an object. The contour is obtained with morphological operators and described with a Freeman chain code. The chain code histogram (CCH) is calculated from the chain code of the contour of an object. For an eight-connected chain code, an eight-dimensional histogram showing the probability of each direction is obtained. The CCH is a translation- and scale-invariant shape measure. The CCH gives only an approximation of the object's shape, so that similar objects can be grouped together. The discriminatory power of the CCH is demonstrated on machine-printed text and on true irregular objects; in both cases similar objects are grouped together by the proposed method. However, the sensitivity to small rotations limits the generality of the method.
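The CCH computation described above reduces to a normalized direction histogram; a minimal sketch (the function and the example contour are illustrative, not from the paper):

```python
import numpy as np

def chain_code_histogram(chain):
    """Normalized 8-bin histogram of an 8-connected Freeman chain
    code.  Bin d holds the relative frequency of direction d (0..7),
    which makes the descriptor translation and scale invariant."""
    chain = np.asarray(chain)
    counts = np.bincount(chain, minlength=8)[:8]
    return counts / counts.sum()

# a closed square contour traced with Freeman directions
# (0 = east, 2 = north, 4 = west, 6 = south, two steps per side)
square = [0, 0, 6, 6, 4, 4, 2, 2]
cch = chain_code_histogram(square)
```

Rotating the object permutes (and, for small rotations, smears) the bins, which is exactly the sensitivity the abstract notes.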
The focus of this paper is the nature and computation of features used in 3D shape representations, within the context of recent research on bottom-up and top-down approaches to object recognition. Bottom-up approaches compute rich low-level representations and then proceed to derive higher-level features, typically using grouping heuristics and without high-level knowledge. Representative of this category are computations at the primal-sketch level and the 'shape from' processes. Top-down approaches specify a prototypical object shape representation and search for and verify its presence in the early representation. Representative of this category are descriptions in terms of part configurations or deformable prototypes. The representations and associated features are discussed in view of their usefulness for object recognition as well as for reasoning about object function. The paper compares and contrasts these approaches within the framework of a hierarchical shape representation based on a surface decomposition into the largest convex patches. It is shown that grouping processes in bottom-up approaches relate directly to high-order descriptors.
This paper describes a simple scheme for distinguishing between two very similar watermilfoils, Eurasian (Myriophyllum spicatum L.) and Northern (Myriophyllum exalbescens). Leaf images were isolated from underwater images of the plants. Characteristic features, consisting of the ratio of black to white pixels within the convex hull of an edge-mapped leaf, the eccentricity of the ellipse surrounding the leaf, and a spatial-dependency analysis measuring the frequency of change of pixel intensity of an edge-mapped leaf, were combined to provide a measure that could be used to determine whether a leaf was Northern or Eurasian.
An important aspect of 2D object recognition is matching. Based on a generalized distance transform method, Borgefors introduced the concept of the Chamfer scheme to 2D object matching. In Chamfer matching, a successful match is obtained when a scene in a reference image is correctly matched to the corresponding scene in the target image; the successful match corresponds to the lowest matching measure computed from the sum of Chamfer distances. However, if the target image has not been correctly rectified for distortions, there may not be an ideal best fit between the reference and target images. Instead, matching measures corresponding to local minima exist, and it is difficult to choose the matching measure that corresponds to the best fit. The match corresponding to the global minimum measure may not necessarily be the best fit, as there are many sub-optimal measures that could provide equally good fits between the target and reference images. This paper advocates that, aside from the Chamfer matching measure, additional considerations need to be factored in to find the true global minimum measure corresponding to the optimal fit. In this paper, a new method has been designed to find the optimal fit between poorly rectified target and reference images. The new approach uses simulated annealing, a powerful stochastic optimization technique for nonlinear problems, to improve the matching results. In our technique, the energy function for simulated annealing comprises two terms: a smoothness-constraint term and the Chamfer matching measure term. The high computational burden of simulated annealing is reduced by using edge information for the matching process. Results are presented to illustrate the new matching technique.
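The annealing search over a multi-minimum matching measure can be sketched generically; here a toy one-dimensional energy with a local and a global minimum stands in for the paper's smoothness-plus-Chamfer energy, and all parameters are illustrative:

```python
import math
import random

def anneal(energy, start, neighbor, t0=50.0, cooling=0.995,
           iters=2000, seed=0):
    """Generic simulated annealing: uphill moves are accepted with
    probability exp(-dE/T), so the search can climb out of the
    local minima that trap a greedy Chamfer match."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best, best_e = x, e
    t = t0
    for _ in range(iters):
        cand = neighbor(x, rng)
        de = energy(cand) - e
        if de < 0 or rng.random() < math.exp(-de / t):
            x, e = cand, e + de
            if e < best_e:
                best, best_e = x, e
        t *= cooling                     # cooling schedule
    return best, best_e

# toy energy: local minimum at x = -5 (value 4), global at x = 7
def energy(x):
    return min((x + 5) ** 2 + 4, (x - 7) ** 2)

best, best_e = anneal(energy, start=-5,
                      neighbor=lambda x, r: x + r.choice([-1, 1]))
```

A greedy descent started at -5 would stay there; annealing at least matches that and usually escapes to the global minimum.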
A new image filtering algorithm useful for line detection and for aerial image matching with multiple target occurrences is presented. The filtering algorithm is based on the concepts of similar pixels and pixel significance. A matching method of complexity O(n^2), where n is the number of potential candidates, is addressed. A temporal evaluation of the corresponding sequential algorithm on a DEC ALPHA 4/166 is provided.
This paper investigates the recognition performance of a geometric matching approach to the recognition of free-form objects obtained from range images. The heart of this approach is a closest point matching algorithm which, starting from an initial configuration of two rigid objects, iteratively finds their best correspondence. Since the effective performance of this algorithm is known to depend largely on the chosen set of initial configurations, we investigate the quantitative nature of this dependence. In essence, we experimentally measure the range of successful configurations for a set of test objects and derive quantitative rules for the recognition strategy. These results show the conditions under which the closest point matching algorithm can be successfully applied to free-form 3D object recognition and help to design a reliable and cost-effective recognition system.
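The closest-point matching step at the heart of this approach can be sketched as one iteration of ICP-style alignment (brute-force nearest neighbours plus SVD-based rigid fitting; a generic sketch, not the authors' implementation):

```python
import numpy as np

def icp_step(src, dst):
    """One closest-point iteration: match each source point to its
    nearest destination point, then apply the rigid transform (R, t)
    that best aligns the matched pairs (Kabsch / SVD method)."""
    # brute-force nearest neighbours
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    # rigid alignment of src onto its matched points
    cs, cm = src.mean(0), matched.mean(0)
    H = (src - cs).T @ (matched - cm)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cm - R @ cs
    return src @ R.T + t

# toy case: a translated copy of a point set snaps back onto it
dst = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
src = dst + np.array([0.3, -0.2])
for _ in range(5):
    src = icp_step(src, dst)
```

Whether the loop converges to the correct pose depends entirely on the starting configuration, which is exactly the dependence the paper quantifies.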
A parallel algorithm for affine matching of aerial images, using dynamic programming as a means of image comparison under real-time constraints, is addressed. It includes not only a data-flow approach to speed up execution, but also a new cost function allowing very precise evaluation of image resemblance. Temporal evaluations of the corresponding sequential algorithm, and of the proposed parallel algorithm on a bi-SPARC 20/60 workstation, are provided.
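The dynamic-programming comparison can be illustrated with a classic alignment recurrence over 1-D feature sequences (a generic sketch with a simple absolute-difference cost; the paper's actual cost function is more elaborate):

```python
def dp_match(a, b, gap=1.0):
    """Dynamic-programming alignment cost of two 1-D feature
    sequences: substitutions cost |a_i - b_j|, insertions and
    deletions cost `gap`.  The DP table makes the comparison
    O(len(a) * len(b))."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),
                          D[i - 1][j] + gap,      # deletion
                          D[i][j - 1] + gap)      # insertion
    return D[n][m]

# identical feature strips align at zero cost
cost = dp_match([10, 12, 11], [10, 12, 11])
```

The row-by-row data dependence of the table is also what makes the computation amenable to the data-flow parallelization the abstract mentions.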
A prototype thermography-based cutaneous blood flow monitoring system for physiological studies and microsurgery was constructed. The prototype has already been used to analyze blood flow during several operations and physiological experiments. The system filters raw frames from a far-IR camera and compresses them, so that images can be stored in a small fraction of their original size for filing and later inspection by our system or other suitable Windows programs. Multicolor display combined with image-enhancement filtering substantially helps clinical personnel in the interpretation of thermal image information.
Face recognition, one of the most important abilities of intelligent vision, is discussed in this paper. A new idea of model-based processing is presented. Algorithms for face modeling, global matching, fine matching, and feature-point extraction are given. Finally, a rapid and robust scheme for human face recognition results.
Tracking and recognition of human body postures in image sequences is a challenging task in computer vision and pattern recognition, and is far from solved. Most previous research in this area utilizes edge information. However, many edge-based methods are unstable because image edges are very sensitive to noise and usually cannot be completely and/or precisely extracted. To overcome these difficulties, we developed a region-based method for recognizing and tracking postures. We also suggest a 'virtual double camera' configuration to take images from two different view angles with respect to the human body. Using fused information from the different images to direct the posture recognition process, we achieved quite robust results. The principles of our method and examples of experimental results are presented in this paper.
This paper proposes an optimized model of a laser ranging system in the case of colored noise, based on the theory of maximum-likelihood estimation. With this model, the ranging error is analyzed in detail for colored noise in different cases in which the optical receiver is approximated as a first-, second- and third-order low-pass filter, respectively. The theoretical results are further verified by means of more detailed numerical simulations, in which characterizing parameters such as the signal-to-noise ratio, the bandwidth of the optical receiver and the laser power are taken into account. Based on the theoretical and simulated results, a ranging system with a novel delay-locked loop is experimentally realized and investigated. This paper also presents measurement results to demonstrate the practical feasibility of the optimized laser ranging system.
We present an analog optical method for measuring the distance to an object by using its stochastic structure. This is achieved by dividing the lens pupil into two halves. The light from each half of the pupil is correlated separately by a rectangular grating and integrated by a photodiode. The correlation is implemented by an oscillating array of prisms. Depending on the position of the focus, there will be a phase difference between the two time signals, which allows calculation of the distance to the object. By using an array of lenses and an array of photodiodes, the solid angle seen by the lens is divided into several elements. This makes it possible to determine the average distance in each solid-angle element and ensures the extraction of multiple distance data. The main advantages of this method are: 1) no structured illumination is required, i.e. it is a passive method; 2) it is a real-time method; and 3) the average distance to all objects in one solid-angle element is calculated.
A neural network approach that automatically maps measured 2D image coordinates to 3D object coordinates for shape reconstruction is described. The appropriately trained radial-basis-function network eliminates the need for rigorous calibration procedures. The training and test data are obtained by capturing successive images of the intersection points between a projected light line and horizontal strips on a calibration bar. Once trained, the neural network determines the 3D object-space coordinates that correspond to an illuminated pixel in the image plane. In addition, the generalization capabilities of the neural network enable intermediate points to be interpolated. An experimental study is presented to demonstrate the effectiveness of this approach to 3D measurement and reconstruction.
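The role of the radial-basis-function network can be sketched as a Gaussian-RBF least-squares fit mapping 2D coordinates to 3D targets (synthetic data, not the paper's calibration setup; centers, sigma and the ground-truth mapping are invented for illustration):

```python
import numpy as np

def fit_rbf(X, Y, centers, sigma):
    """Fit a Gaussian radial-basis-function network by linear least
    squares: solve Phi(X) W = Y with
    Phi_ij = exp(-|x_i - c_j|^2 / (2 sigma^2)).
    Once trained, it maps 2D image coordinates to 3D coordinates
    without an explicit calibration model."""
    def phi(Xq):
        d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    W, *_ = np.linalg.lstsq(phi(X), Y, rcond=None)
    return lambda Xq: phi(Xq) @ W

# toy calibration data: a known smooth pixel -> 3D mapping
rng = np.random.RandomState(0)
px = rng.rand(40, 2)                              # "image" points
xyz = np.c_[px, 0.5 * px.sum(1, keepdims=True)]   # synthetic 3D targets
predict = fit_rbf(px, xyz, centers=px, sigma=0.3)
```

Because the Gaussian basis is smooth, the fitted network also interpolates sensibly between calibration points, which is the generalization property the abstract relies on.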
This paper considers the work done by Thomas Marill in a series of papers on the recognition of two-dimensional wire-frame figures as 3D objects without the use of models. Marill discovered that if one minimizes the standard deviation of the angles found at each vertex of the figure, the likelihood of the computer interpretation of the figure matching the human interpretation is much higher than might be expected a priori. Here it is observed that the human mind's tendency to simplify inputs and find patterns even where there are none might be at least partly responsible for the observed phenomenon. It is conjectured that if this is indeed the case, it should be possible to get similar behavior by minimizing the standard deviation of other features. In particular, segment length presents itself as an excellent choice of test feature: it is very different from angles and is less computationally intensive. Thus another approach is considered: minimum standard deviation of segment magnitudes is explored in lieu of minimum standard deviation of angles. Marill's original experiment is then carefully repeated with several additional figures that were deliberately chosen not to have all equal angles. The experiment is described in detail, and all the failures of Marill's algorithm are carefully studied and explained. The problem of straight angles is touched upon, and the difficulties of solving it as a special case are briefly discussed. A new program is then written to minimize the standard deviation of segment magnitudes instead of the standard deviation of angles. This program is run on the same test figures as the original algorithm. Its successes and failures are noted and explained, and its behaviors are studied. The results of the two algorithms are then compared and the differences noted.
It is of particular interest that the two algorithms have different areas of failure, suggesting that a combined algorithm should be able to produce better results than either one alone. This and some other avenues of future work are suggested. Finally, some comments about the basic behaviors of both algorithms are made.
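Marill's objective is simple to state in code: lift the 2D vertices to 3D with trial depths and measure the spread of the vertex angles (a generic sketch; the segment-length variant would replace the angle list with edge lengths):

```python
import numpy as np

def angle_stddev(xy, z, edges):
    """Marill-style objective: lift 2D vertices xy to 3D with depths
    z, compute the angle at every vertex where two edges meet, and
    return the standard deviation of those angles.  Minimizing this
    over z recovers a 3D interpretation of the wire frame."""
    P = np.c_[xy, z]
    angles = []
    for (a, b), (c, d) in [(e1, e2) for e1 in edges
                           for e2 in edges if e1 < e2]:
        common = {a, b} & {c, d}       # edges sharing one vertex
        if len(common) != 1:
            continue
        v = common.pop()
        u1 = P[b if a == v else a] - P[v]
        u2 = P[d if c == v else c] - P[v]
        cosang = u1 @ u2 / (np.linalg.norm(u1) * np.linalg.norm(u2))
        angles.append(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return np.std(angles)

# a flat right-angled "L" of three vertices: one angle, spread zero
xy = np.array([[0., 0.], [1., 0.], [1., 1.]])
edges = [(0, 1), (1, 2)]
flat = angle_stddev(xy, np.zeros(3), edges)
```

Feeding this objective (or its segment-magnitude counterpart, `np.std` of the edge lengths) to any general-purpose minimizer over the depth vector z reproduces the experimental setup discussed above.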
We discuss the uniqueness of 3D shape recovery of a polyhedron from a single shading image, and propose an approach to uniquely determine the concave shape solution by using interreflections as a constraint. We show that if the interreflection distribution is not considered, multiple convex shape solutions usually exist for a pyramid with three or more visible facets. However, if the interreflection distribution is used as a constraint to limit the shape of the polyhedron, the polyhedral shape can be uniquely determined. Interreflections, which were considered deleterious in conventional approaches, thus serve as an important constraint for shape-from-shading.
Knowledge of the bidirectional optical scatter function of surfaces comprising a 2D scene is used to predict and optimize characteristics of a digitally captured image of the scene. Mathematical expressions that describe certain image properties, including contrast, brightness and glare, are developed from the scatter functions. These expressions are maximized or minimized with respect to a set of coordinates describing the orientation of a collimated light source and imaging system relative to the scene. In this way, it is shown how illumination can be analytically prescribed to attenuate or accentuate certain properties of digital images without the trial-and-error procedure currently in practice.
Current developments in the field of automated assembly systems show an increasing interest in systems that are flexible in both CAD-based product design and CAD-based assembly. For the application addressed in this paper, coupling the vision system to a CAD database is of prime importance in order to achieve the required automatic reconfiguration of the assembly cell when new parts are defined. This paper presents a 3D CAD-based vision system for obtaining 3D data about the scene. After the images are acquired, edge detection is performed and the detected edges are stored as chain codes. A stereo vision algorithm is then applied to find the recognition features. The outputs are lists of features that are combined into a 3D wireframe representing the scene. The recognition algorithm takes the observed wireframe output from the stereo vision system and compares it with a set of model wireframes in order to select the best match; the models used for recognition are derived from a product data model (PDM). The PDM is an interface between the CAD database and the recognition system, which allows the automatic generation of new models when new parts are introduced into the system. The vision system described in this paper is part of an intelligent robotic assembly cell, the aim being a flexible intelligent robotic assembly cell in which robots can automatically assemble a random variety of small-batch products.
Robust range estimation is one of the most important tasks in mobile robotics. This paper presents a new optical arrangement for utilizing the previously known 'depth from defocus' principle. The arrangement makes it possible to apply standard video lenses and camera modules for making a compact range camera system. Real-time processing is made possible with a single-board DSP card.
We consider two problems: first, the detection of objects in images of 3D planetary terrain; second, finding corresponding points for stereo matching of this type of imagery. We propose an approach that is simultaneously applicable to both problem areas. The approach uses a bank of filters based on different 2D Gabor functions. By detection we mean locating multiple classes of targets with distortions present and in a cluttered background; it is desirable to minimize false alarms due to clutter, image noise, and the presence of other objects. In the stereo matching scenario, the pixel location where we search for the corresponding point is the target, while all ambiguous matches are non-targets. In this work, we use Gabor filter banks in two versions. First, for fast detection of targets, the single-filter outputs of the bank are fused by linear combination. Second, for stereo matching, the outputs of the filters form a feature vector used to find the best match. We refer to both types of filters as a macro Gabor filter. In the linear combination case, the filter bank forms a single filter. This filter is correlated with an input image, followed by local maximum detection and thresholding to yield the final detected targets. The new aspects are: combining real and imaginary parts of GFs into one filter using centered or off-center GFs, separately optimizing the fusion coefficients of the GFs by controlling the shape of the correlation outputs of each filter alone, and the application to two different scenarios.
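The macro Gabor filter idea (fusing several basis filters into one by a weighted sum, so that detection needs only a single correlation) can be sketched with real-valued filters and illustrative parameters, not the optimized ones from the paper:

```python
import numpy as np

def gabor(size, sigma, theta, freq, phase=0.0):
    """Real 2D Gabor filter: an oriented sinusoid under an isotropic
    Gaussian envelope, sampled on a size x size grid."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    xr = X * np.cos(theta) + Y * np.sin(theta)
    env = np.exp(-(X ** 2 + Y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr + phase)

def macro_filter(params, weights):
    """Fuse several Gabor basis filters into one macro filter by a
    weighted sum; correlating an image with this single kernel
    replaces one correlation per basis filter."""
    return sum(w * gabor(**p) for w, p in zip(weights, params))

bank = [dict(size=21, sigma=3.0, theta=0.0, freq=0.15),
        dict(size=21, sigma=3.0, theta=np.pi / 2, freq=0.15)]
mf = macro_filter(bank, weights=[0.5, 0.5])
```

In the stereo-matching version, the same bank's individual responses would instead be stacked into a per-pixel feature vector.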
Our goal is to match primitives of a pair of images, thereby solving the correspondence problem, in order to estimate depths of 3D scene points from the relative distance between matched features. We propose a feature-based approach to solve the correspondence problem by minimizing an appropriate energy function where constraints on radiometric similarity and projective geometric invariance of coplanar points are defined. The method can be seen as a correlation based approach which takes into account the projective invariance of coplanar points in computing the optimal matches.
3D reconstruction of highly textured surfaces on unvegetated terrain is of major interest for stereo-vision-based mapping applications. We describe a prototype system for automatic modeling of such scenes. It is based on two frame CCD cameras, tightly attached to each other to ensure constant relative orientation. One camera is used to acquire known reference points to obtain the exterior orientation of the system; the other records the surface images. The system is portable to keep image acquisition as short as possible. Automatic calibration using the images acquired by the calibration camera permits the computation of the exterior orientation parameters of the surface camera via a transformation matrix. A robust matching method providing dense disparities, together with a flexible reconstruction algorithm, renders an accurate grid of 3D points on arbitrarily shaped surfaces. The results of several stereo reconstructions are merged. Projection onto the global shape allows easy evaluation of volumes and thematic mapping with respect to the desired surface geometry in construction processes. We report on accuracy and emphasize practical usage. It is shown that the prototype system is able to generate a set of surface descriptions accurate and dense enough to serve as a basis for documentation, planning and accounting.
Based on the concept of object- and behavior-oriented stereo vision, a method is introduced which enables a robot manipulator to handle two distinct types of objects. It uses an uncalibrated stereo vision system and allows a direct transition from image coordinates to motion-control commands of a robot. An object can be placed anywhere in the part of the robot's 3D workspace that is in the field of view of both cameras. The objects to be manipulated can be either of flat cylindrical or of elongated shape. Results gained from real-world experiments are discussed.
This paper presents a hierarchical method, based on a deterministic variant of the self-organizing map, that provides an elegant solution for automated surface processing, e.g. for robot painting and sand-blasting. Given a set of data points from the object surface in arbitrary order, the proposed method is able to generate a path where the robot hand position and its direction are optimized using separate criteria, and the tool path is smooth and covers the object uniformly. Input data may come from a laser measurement system, a CAD model, a digital camera, or a human-assisted object digitizing system. The algorithm is reliable, easy to implement, and a good alternative to costly manual training of a robot.
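The path-generation idea can be sketched with a conventional one-dimensional self-organizing map over unordered surface points (a stochastic toy version, not the paper's deterministic variant with separate position and direction criteria; all parameters are illustrative):

```python
import numpy as np

def som_path(points, n_nodes=10, iters=100, lr=0.5, sigma0=3.0):
    """1D self-organizing map over unordered surface points: a chain
    of nodes orders itself along the data, giving a smooth, roughly
    uniform tool path."""
    rng = np.random.RandomState(0)
    nodes = points[rng.choice(len(points), n_nodes)].astype(float)
    idx = np.arange(n_nodes)
    for t in range(iters):
        sigma = sigma0 * (1 - t / iters) + 0.5  # shrinking neighbourhood
        for p in points:
            win = np.argmin(((nodes - p) ** 2).sum(1))
            infl = np.exp(-((idx - win) ** 2) / (2 * sigma ** 2))
            nodes += lr * infl[:, None] * (p - nodes)
    return nodes

# toy "surface": sample points along a straight seam
pts = np.c_[np.linspace(0, 1, 20), np.zeros(20)]
path = som_path(pts)
```

The chain topology of the node indices is what turns an unordered point cloud into an ordered traversal; the paper's hierarchical, deterministic formulation refines this basic mechanism.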
One of the most important properties of neural networks is generality, as the same network can be trained to solve rather different tasks, depending on the training data. This is also one of the most prominent problems when practical real-world problems are solved by neural networks, as existing domain knowledge is difficult to incorporate into the models. In this contribution we present methods for adding prior knowledge to neural network modeling. The approach is based on training the knowledge into the network rather than hard-coding it in advance into the connections or weights. The knowledge is specified as target values or constraints for partial derivatives of different orders of the network mapping. This approach can be viewed as a flexible regularization method that directly controls the characteristics of the resulting mapping. The proposed algorithms have been implemented in a neural network modeling tool that supports modular network design and domain knowledge representation with fuzzy-like terms. In this paper we present examples of the effect of incorporating different degrees of information about the modular structure and the functional behavior of the target processes into model building and training.
We propose a novel method for improving the association abilities of an optical associative memory by using the wavelet transform. On the basis of the edge-enhancement and multiscaling properties of the wavelet transform, a sharp correlation peak between a partial input and a stored datum can be produced. Therefore, higher association abilities can be achieved.
A biologically plausible model of a system with adaptive behavior in an a priori unknown environment and resistance to impairment has been developed. The system consists of input, learning, and output subsystems. The first subsystem classifies input patterns, presented as n-dimensional vectors, in accordance with some associative rule. The second, a neural network, determines adaptive responses of the system to input patterns; arranged neural groups coding possible input patterns and appropriate output responses are formed during learning by means of negative reinforcement. The output subsystem maps the neural network activity into the system's behavior in the environment. The system has been studied by computer simulation imitating collision-free motion of a mobile robot. After some learning period the system 'moves' along a road without collisions. It is shown that, in spite of impairment of some neural network elements, the system functions reliably after relearning. A foveal visual preprocessor model developed earlier has been tested as a form of visual input to the system.
The challenging task of automated handling of variable objects necessitates a combination of innovative engineering and advanced information technology. This paper describes the application of a recently developed control strategy to overcome some limitations of robot handling, particularly when dealing with variable objects. The paper focuses on a novel approach to accommodate the need for sensing and actuation in controlling the pickup procedure. An experimental robot-based system for the handling of soft parts, ranging from artificial components to natural objects such as fruit and meat pieces, was developed. The configuration comprises a modular gripper subsystem and an industrial robot as part of a distributed control system. The gripper subsystem features manually configurable fingers with integrated sensing capabilities. The control architecture is based on a concept of decentralized control differentiating between positioning and gripping procedures; in this way, the robot and gripper systems are treated as performing individual handling operations. This concept allows very short set-up times for future changes involving one or more subsystems.
In this paper, we present a low-cost vision based system for handling of hazardous waste in an unstructured environment. The prototype system described shows the feasibility of sorting and removal of randomly mixed and oriented flexible objects from a bin using inexpensive and widely available hardware. Our experiments show that the utilization of even the simplest constructs is sufficient for the manipulation of flexible objects.
As one realization of the class of behavior-based robot architectures, a specific concept of situation-oriented behavior-based navigation has been proposed. Its main characteristic is that the selection of the behaviors to be executed at each moment is based on continuous recognition and evaluation of the dynamically changing situation in which the robot finds itself. An important prerequisite for such an approach is timely and comprehensive perception of the robot's dynamically changing environment. Object-oriented vision, as proposed and successfully applied, e.g., in freeway traffic scenes, is a particularly well-suited sensing modality for robot control. Our work concentrated on modeling the physical objects relevant for indoor navigation, i.e. walls, intersections of corridors, and landmarks. In the interest of efficiency, these models include only those features necessary for allowing the robot to reliably recognize different situations in real time. According to the concept of object-oriented vision, recognizing such objects is largely reduced to a knowledge-based verification of objects or features that may be expected to be visible in the current situation. The following results have been achieved: 1) Using its vision system and a knowledge base in the form of an attributed topological map, the robot could orient itself and navigate autonomously in a known environment. 2) In an unknown environment, the robot was able to build, by means of supervised learning, an attributed topological map as a basis for subsequent autonomous navigation. 3) The experiments could be performed both under unmodified artificial light and under natural light shining through the glass walls of the building.
Automated unmanned guided vehicles have many potential applications in manufacturing, medicine, space and defense. A mobile robot was designed for the 1996 Automated Unmanned Vehicle Society competition, held in Orlando, Florida on July 15, 1996. The competition required the vehicle to follow solid and dashed lines around an approximately 800 ft path while avoiding obstacles, overcoming terrain changes such as inclines and sand traps, and attempting to maximize speed. The purpose of this paper is to describe the algorithm developed for line following. The algorithm images two windows and locates their centroids; with the knowledge that these points lie on the ground plane, a mathematical and geometric relationship between the image coordinates of the points and their corresponding ground coordinates is established. The angle of the line and its minimum distance from the robot centroid are then calculated and used in the steering control. Two cameras are mounted on the robot, one on each side. One camera guides the robot, and when it loses track of the line on its side, the robot control system automatically switches to the other camera. The test-bed system has provided an educational experience for all involved and permits understanding and extending the state of the art in autonomous vehicle design.
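The image-to-ground relationship for points known to lie on the ground plane can be sketched with a simple tilted pinhole model (illustrative geometry, not the competition robot's calibration):

```python
import math

def image_to_ground(u, v, f, h, tilt):
    """Back-project pixel (u, v) (origin at the principal point, v
    positive downward, focal length f in pixels) onto a flat ground
    plane, for a camera at height h tilted down by `tilt` radians.
    Returns (X, Y): lateral offset and forward ground distance."""
    # components of the unnormalized viewing ray (u, v, f),
    # rotated by the tilt angle about the camera's horizontal axis
    down = f * math.sin(tilt) + v * math.cos(tilt)  # drop per unit ray
    fwd = f * math.cos(tilt) - v * math.sin(tilt)   # advance per unit ray
    lam = h / down          # scale at which the ray meets the ground
    return lam * u, lam * fwd

# camera 1 m high, tilted 45 degrees down: the principal ray
# hits the ground 1 m ahead of the camera
X, Y = image_to_ground(0.0, 0.0, f=500.0, h=1.0, tilt=math.pi / 4)
```

From two such ground points (the two window centroids), the line's heading angle and its minimum distance from the robot follow by elementary geometry.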
This paper proposes a new localization method for indoor mobile robots. Using two cameras and one laser range finder on board a TRC mobile robot, the initial position and pose of the robot can be obtained by multisensor fusion and scene matching based on geometric hashing. No correspondence calculation or special pattern recognition is needed during the scene matching. The localization method is implemented in five stages: 1) Model the indoor environment: selected indoor environment features are first modeled off-line into hashing tables. 2) Perform system calibration and information fusion from the two cameras and the range finder. 3) Extract the vertical edge points corresponding to the horizontal scanning plane of the 2D laser range finder from the scene images and transform them into geometric invariants. 4) Perform scene matching and matching verification by geometric hashing and a model back-projection method, respectively. 5) Perform position and pose estimation by a least-squares fit. Experimental results show that the accuracy and reliability of this localization method are quite high.
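Stages 3 and 4 rest on coordinates that do not change with the sensor's pose; a minimal 2D geometric-hashing-style sketch (illustrative, not the paper's implementation):

```python
import math
import numpy as np

def hash_invariants(points):
    """Geometric-hashing-style invariants: express every point in the
    frame defined by each ordered basis pair (p_i, p_j).  The
    resulting (u, v) coordinates are invariant to translation,
    rotation and scale, so they can index a hash table of model
    features independently of the robot's pose."""
    pts = np.asarray(points, float)
    entries = {}
    for i in range(len(pts)):
        for j in range(len(pts)):
            if i == j:
                continue
            o, e1 = pts[i], pts[j] - pts[i]
            e2 = np.array([-e1[1], e1[0]])        # perpendicular axis
            B = np.linalg.inv(np.c_[e1, e2])      # world -> basis frame
            for k in range(len(pts)):
                if k in (i, j):
                    continue
                u, v = B @ (pts[k] - o)
                entries[(i, j, k)] = (round(u, 6), round(v, 6))
    return entries

# a similarity transform (rotate, scale, translate) leaves the
# invariants unchanged
tri = np.array([[0., 0.], [1., 0.], [0., 1.]])
th = 0.7
R = 2.0 * np.array([[math.cos(th), -math.sin(th)],
                    [math.sin(th),  math.cos(th)]])
tri2 = tri @ R.T + np.array([3.0, 4.0])
```

At recognition time, scene invariants vote into the precomputed model table, which is why no explicit correspondence search is needed.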
Fuzzy logic has recently been promoted by many researchers for the design of navigational algorithms for mobile robots. The approach fits in well with a behavior-based autonomous systems framework, where common-sense rules can naturally be formulated to create rule-based navigational algorithms, and conflicts between behaviors may be resolved by assigning weights to different rules in the rule base. The applicability of these techniques has been demonstrated for robots using sensor devices such as ultrasonic and infrared detectors. However, the implementation issues relating to the development of vision-based fuzzy-logic navigation algorithms do not appear, as yet, to have been fully explored. The salient features that need to be extracted from an image for recognition or collision-avoidance purposes are very much application dependent, yet the needs of an autonomous mobile vehicle cannot be fully known a priori. Similarly, issues relating to the understanding of a vision-generated image based on geometric models of the observed objects have an important role to play; however, these issues have not as yet been addressed or incorporated into the fuzzy-logic-based algorithms that have been proposed for navigational control. This paper attempts to address these issues and to develop a suitable framework to clarify the implementation of navigation algorithms for mobile robots that use vision sensors and fuzzy logic for map building, target location, and collision avoidance. The scope for application of this approach is demonstrated.
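A minimal example of the rule-based scheme described above, with rule weights resolving conflicts between avoidance behaviors (the membership functions and steering gains are invented for illustration):

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def steer(obstacle_dir):
    """Tiny fuzzy rule base for collision avoidance: obstacle bearing
    in degrees (negative = left) maps to a steering command by the
    weighted average of rule outputs; rule weights resolve conflicts
    between behaviors."""
    rules = [
        # (membership, steering output in degrees, rule weight)
        (tri(obstacle_dir, -90, -45, 0), +30.0, 1.0),  # LEFT  -> right
        (tri(obstacle_dir, 0, 45, 90), -30.0, 1.0),    # RIGHT -> left
        (tri(obstacle_dir, -20, 0, 20), +45.0, 2.0),   # AHEAD -> hard right
    ]
    num = sum(m * w * out for m, out, w in rules)
    den = sum(m * w for m, out, w in rules)
    return num / den if den else 0.0

angle = steer(-45.0)   # obstacle squarely to the left
```

In a vision-based version, the obstacle bearing would come from image features rather than an ultrasonic or infrared reading, which is precisely the extension the paper examines.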
Today's computer vision applications often have to deal with multiple, uncertain, and incomplete visual information. In this paper, we apply a new method, termed 'active fusion', to the problem of generic object recognition. Active fusion provides a common framework for the active selection and combination of information from multiple sources in order to arrive at a reliable result at reasonable cost. In our experimental setup we use a camera mounted on a 2 m by 1.5 m x/z-table observing objects placed on a rotating table. Zoom, pan, tilt, and aperture setting of the camera can be controlled by the system. We follow a part-based approach, decomposing objects into parts that are modeled as geons. The active fusion system starts from an initial view of the objects placed on the table and continuously tries to refine its current object hypotheses by requesting additional views. The implementation of active fusion on the basis of probability theory, the Dempster-Shafer theory of evidence and fuzzy set theory is discussed. First results demonstrating segmentation improvements by active fusion are presented.
This paper addresses the problem of local navigation for an autonomous guided vehicle (AGV) in a structured environment that contains static and dynamic obstacles. Information about the environment is obtained via a CCD camera. The problem is formulated as a dynamic feedback control problem in which speed and steering decisions are made on the fly while the AGV is moving. A decision element (DE) that uses local information is proposed. The DE guides the vehicle in the environment by producing appropriate navigation decisions. Dynamic models of a three-wheeled vehicle for driving and steering mechanisms are derived. The interaction between them is performed via the local feedback DE. A controller, based on fuzzy logic, is designed to drive the vehicle safely in an intelligent and human-like manner. The effectiveness of the navigation and control strategies in driving the AGV is illustrated and evaluated.
If the nuclear retinal layers of the human eye are interpreted as 3D phase gratings, the aperture effects in human vision, namely the Stiles-Crawford effects I and II and trichromatic vision, can be explained in terms of interference optics. A multilayer grating situated in the image plane of the eye fixes the direction of the diffraction orders through its 3D geometry. The ratio between λ′ or ν′ of a light cone incident at an angle and λ and ν of the cone incident at 0 degrees can thus be differentiated as a brightness, hue and saturation shift in 3 chromatic RGB diffraction orders in the near field behind the grating, thus providing information on the relative position, distance, 3D shape and movement of objects in 3D space. The direction cosine of the light cones in the von Laue equation means that lateral distances and movements relative to the visual axis and longitudinal movements relative to the focused distance give rise to the aperture effects and a space-time microrelief of the 3D world. This is regarded as an optical basis for monocular spatial vision and motion detection. Temporal patterns in human vision therefore produce spatial patterns and movement information and vice versa. Through the intrinsic oscillations of the nuclear layers, transformations of the otherwise constant interference-optical object representations become possible. The interference-optical local lateral connections and the retinal feedback neural-network structure allow the possibility of parallel-optical image correlation in real time. The psychophysical transformations in the retinal 3D grating correspond to an image transformation into a reciprocal grating; the retinal clock is set adaptively by means of λmax of the 111 diffraction order and via the trichromatic white standard.
This paper focuses on simulating a model of a 3D-color vision system based on synthetic nonlinear modulation. The model is set up to recover 3D and color properties of a colored object by evaluating several rf-interferograms sampled by a black-white CCD camera. Colorizing a black-white CCD camera in a 3D-vision system implies high resolution. The synthetic nonlinear modulation differs from that of other 3D-color vision systems: differently colored lights are synchronously modulated with characteristic rf-frequencies to detect a 3D object, and recovering colors is treated in the same way as recovering 3D information. Optical filters are not used; instead, a suitable algorithm is adopted for recovering color and 3D information. Since a modulated optical rf-signal is used as a detecting probe rather than an unmodulated optical wave, higher-order harmonic signals may be caused by electrical or optical components. Although linear matching techniques are adopted to mitigate the problem, it is necessary to simulate the vision system to predict its performance. An 8-bit black-white CCD camera with different signal-to-noise ratios is taken as an example in the simulation. 3D color properties are evaluated for the system in the presence of nonlinearity and noise. An optimized result is obtained for realizing this vision system.
This paper describes the use of color segmentation to assist the detection of blemishes and other defects on fruit. It discusses the advantages and disadvantages of different color spaces, including RGB and HSI, and of different supervised learning techniques, including maximum likelihood, nearest neighbor and neural networks. It then compares the performance of various combinations of these on the same training and test sets. A selection of images segmented by the best combination is presented and conclusions are drawn.
The ability to assess the severity of dermatoses by measuring the area of involvement is important in both clinical practice and research, but it has been shown that physicians, nurses and other groups are unable to do this accurately. A common practice in current use is the 'rule of nines' method, but wide variations have been found between observers' estimates. The purpose of this work was to test and demonstrate the feasibility of a computer vision technique for measuring the area of involvement in skin diseases by developing a system for psoriasis area assessment from slides, which can be operated in an image processing environment. The exact percentage of the slide area involved varied from 1 percent to 59 percent, thus providing realistic material for the system. The system proved sufficiently accurate, and the techniques evidently have the potential for inclusion in a more accurate and rapid method for area measurement in the case of skin diseases.
A new feature space trajectory (FST) description of 3D distorted views of an object is advanced for active vision applications. In an FST, different distorted object views are vertices in feature space. A new eigen-feature space and Fourier transform features are used. Vertices for adjacent distorted views are connected by straight lines, so that an FST is created as the viewpoint changes. Each different object is represented by a distinct FST. An object to be recognized is represented as a point in feature space; the closest FST denotes the class of the object, and the closest line segment on that FST indicates its pose. A new neural network is used to efficiently calculate distances. We discuss the FST's uses in active vision: besides providing an initial estimate of object class and pose, the FST processor can specify where to move the sensor to confirm class and pose, to grasp the object, or to focus on a specific object part for assembly or inspection. We advance initial remarks on the number of aspect views needed and which aspect views are needed to represent an object. We note the superiority of our eigenspace for discrimination, how it can provide shift invariance, and how the FST overcomes problems associated with other classifiers.
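The nearest-FST decision rule (closest trajectory gives the class, closest segment gives the pose) reduces to point-to-segment distances in feature space. The paper computes these with a neural network; the brute-force NumPy sketch below, with invented names and a 2D feature space, only illustrates the geometry:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Distance from feature point p to the segment joining FST vertices a, b,
    plus the segment parameter t of the closest point (a pose interpolant)."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab)), t

def classify(p, fsts):
    """fsts: {class_name: array of vertices, ordered by aspect view}.
    Returns (distance, class, segment index, parameter) of the closest FST."""
    best = None
    for name, verts in fsts.items():
        for i in range(len(verts) - 1):
            d, t = point_segment_dist(p, verts[i], verts[i + 1])
            if best is None or d < best[0]:
                best = (d, name, i, t)
    return best
```

The segment index localizes the pose between two stored aspect views, which is what lets the active-vision loop decide where to move the sensor next.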
Visual tracking is a vital task in active vision research, traffic surveillance, face following, robotics, and many other applications. This paper investigates the principles of finding optimal tracking performance depending on image tessellation and window size. Square windows reach best performance when image sampling time equals image processing time. This holds in all cases where the algorithm examines each pixel in the window, and for tracking with either a fixed or a steered camera. Linear windows can improve tracking performance, though their performance is limited, too. Space-variant image tessellations yield the best performance: image pyramids and log-polar sampled images show steadily increasing tracking performance with increasing sensor size, because the resolution drops as sensor size increases.
Active vision is identified by a closed loop linking sensing with acting. Thus, an active vision system's behavior is directly determined by what it senses. To date however, the responses produced by active vision systems have tended to be relatively low-level, generally designed to facilitate improved sensing, by enhancing the duration or speed of object tracking, for example, or optimizing the focused application of more intensive image processing. This is probably adequate if the active vision system is designed as a front end to other processes or to specialized application systems, or if it is a demonstration in support of a theoretical vision model. However, this leaves unanswered the problems of i) how to select an appropriate action when many different alternatives are available, and ii) how best to modify the behavioral repertoire of the system. These problems are especially important in two situations: firstly, when an autonomous system faces a novel situation and must respond adaptively without the benefit of a priori knowledge, and secondly, when systems attempt higher levels of perception and response, and links between the absolute properties of the incoming image data and the actual objects of perception become increasingly attenuated. This paper discusses methods for linking learning with active vision so that the behavior of the system is optimized over time for the achievement of goals. We argue the necessity of system goals in learning vision systems, and discuss methods for propagating goals through all levels of loose hierarchies. In the last section we outline an architecture in which high and low level perception operate interactively and in parallel.
This paper describes an ultrasonic spatial localization system for a sonometric probe, used to build 3D images of a fetus. The main objective of such a system is to aid medical diagnosis. A method to improve the accuracy of the ultrasonic telemeter is developed and gives encouraging results: we measure distance with an accuracy of about 0.6 mm over a 1.5 m range. To reach a localization accuracy of better than 0.1 mm, we are improving the system with techniques such as programmable analog devices and digital systems.
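The telemeter ranges by time of flight, and at the millimetre accuracies quoted the temperature dependence of the speed of sound matters. The following sketch is generic ultrasonic-ranging arithmetic, not the paper's method; the linear sound-speed approximation is a common textbook formula:

```python
def tof_distance(t_flight_s, temp_c=20.0):
    """Ultrasonic time-of-flight range (metres), using the usual linear
    approximation c ≈ 331.3 + 0.606·T m/s for the speed of sound in air."""
    c = 331.3 + 0.606 * temp_c
    return c * t_flight_s
```

At 20 °C a 1 °C temperature error alone shifts a 1.5 m reading by roughly 2.6 mm, which is why sub-millimetre systems must calibrate or compensate the sound speed.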
The term 'active vision' was first used by Bajcsy at a NATO workshop in 1982 to describe an emerging field of robot vision that departed sharply from traditional paradigms of image understanding and machine vision. The new approach embeds a moving camera platform as an in-the-loop component of robotic navigation or hand-eye coordination. Visually servoed steering of the focus of attention supersedes the traditional functions of recognition and gaging. Custom active vision platforms soon proliferated in research laboratories in Europe and North America. In 1990 the National Science Foundation funded the design of a common platform to promote cooperation and reduce cost in active vision research. This paper describes the resulting platform. The design was driven by payload requirements for binocular motorized C-mount lenses on a platform whose performance and articulation emulate those of the human eye-head system. The result is a 4-DOF mechanism driven by servo-controlled DC brush motors. A crossbeam supports two independent worm-gear-driven camera vergence mounts operating at speeds up to 1,000 degrees per second over a range of +/- 90 degrees from dead ahead. This crossbeam is supported by a pan-tilt mount whose horizontal axis intersects the vergence axes for translation-free camera rotation about these axes at speeds up to 500 degrees per second.
A special case of civilian active vision is investigated here, namely the vision system formed by a car's anti-fog headlamps. A method to estimate the light-engineering criteria for headlamp performance and to simulate the operation of the system through a turbid medium, such as fog, is developed on the basis of the analytical procedures of radiative transfer theory. The features of this method include the spaced light source and receiver of a driver's active vision system, the complicated azimuth-nonsymmetrical emissive pattern of the headlamps, and the fine angular dependence of the fog phase function near the backscattering direction. The final formulas are derived in analytical form, providing additional convenience and simplicity for the computations. The image contrast of a road object with arbitrary orientation, dimensions, and shape, and its limiting visibility range, are studied as functions of the meteorological visibility range in fog as well as of various emissive pattern, mounting, and adjustment parameters of the headlamps. Optimization of both the light-engineering and geometrical characteristics of the headlamps is shown to be possible, offering the opportunity to enhance the visibility range and, hence, traffic safety.
Fault diagnostics of rotating machines requires the concept of novelty. For a set of similar new machines coming from the assembly line, the typical features of vibration differ from one machine to another. Consequently, one must build a specific model for every machine and test whether new, possibly harmful, vibrations occur during the use of the machine. The classification system must discriminate between familiar and unfamiliar patterns, with an inclination to reject unseen patterns rather than accept badly distorted familiar ones. In this paper we define the problem and present a solution based on a self-organizing map. It allows us to cluster different normal runtime characteristics of machines and classify new measurements. Novelty is detected by examining the difference between class features of old and new observations.
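SOM-based novelty detection of the kind described can be sketched as: train a map only on normal-condition vibration features, then flag a new measurement whose quantization error (distance to its best-matching unit) exceeds a threshold. Grid size, schedules, and the threshold below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(5, 5), iters=2000, lr0=0.5, sigma0=2.0):
    """Train a small self-organizing map on normal-condition feature vectors."""
    h, w = grid
    W = rng.normal(size=(h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((W - x) ** 2).sum(1))        # best-matching unit
        frac = t / iters                               # decaying schedules
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 0.5
        g = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
        W += lr * g[:, None] * (x - W)                 # pull neighbourhood toward x
    return W

def novelty(x, W, threshold):
    """Quantization error of x; flag as novel if it exceeds the threshold."""
    err = np.sqrt(((W - x) ** 2).sum(1).min())
    return err, err > threshold
```

The threshold can be set per machine from the distribution of quantization errors on its own normal runs, matching the one-model-per-machine requirement in the abstract.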
Rapid detection of independently moving objects by a moving camera system is essential for automatic target recognition (ATR). The image analysis performed by the ATR system must be able to clearly distinguish between the image flow generated by the changing position of the camera and the movement of potential targets. In this paper, a qualitative motion detection algorithm that can deal with imprecise knowledge of camera movement is described. This algorithm is based on the notion that the true velocity at any point on an image, arising from a camera moving through a rigid environment, will lie on a 1D locus in the (v_x, v_y) velocity space. Each point on this line maps to a constraint circle that represents all components of the true velocity that are parallel to the direction of the spatial gray-scale gradient. If the camera motion is known, then an independently moving target can be detected because the corresponding gradient-parallel components of velocity are unlikely to fall in the constraint region arising from the union of all the circles generated by the points along the 1D locus. The algorithm is made more robust by modeling the projected camera velocities as radial fuzzy sets with supports in the 2D velocity space. Approximate knowledge of the translational and rotational components of camera motion can be used to define the parameters of the corresponding fuzzy constraint region. To detect independently moving targets, the algorithm tags the gradient-parallel velocity vectors that violate this fuzzy constraint on camera motion. An estimate of the true velocity is computed only at the pixel locations that violate the constraint. To illustrate this approach, a simulation study involving a translating camera system and an independently moving target is presented.
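The constraint circle for a single candidate camera velocity follows from elementary geometry: the projections of a fixed vector v onto all unit directions trace the circle with diameter from the origin to v (Thales' theorem). The sketch below checks a measured gradient-parallel (normal-flow) vector against one such circle; it is a geometric illustration, not the paper's fuzzy-region implementation:

```python
import numpy as np

def normal_flow(v, grad):
    """Component of a true velocity v parallel to the grey-level gradient."""
    g = grad / np.linalg.norm(grad)
    return (v @ g) * g

def on_constraint_circle(vn, v_cam, tol=1e-6):
    """A gradient-parallel velocity vn consistent with camera-induced
    velocity v_cam lies on the circle of diameter (origin, v_cam):
    |vn - v_cam/2| = |v_cam|/2."""
    c = v_cam / 2.0
    return abs(np.linalg.norm(vn - c) - np.linalg.norm(c)) <= tol
```

The full algorithm unions these circles over the 1D locus of candidate camera velocities (fuzzified for imprecise camera knowledge) and tags normal-flow vectors falling outside the union as independent motion.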
We address the problem of segmenting single images into parts corresponding to those intuitively provided by human perception. To this effect a resistive network analogue of the edge image is used, in which electric resistances correspond to edge segments. Compact contours including given segments can then be found by introducing current sources in these segments and following the path of largest current. In order to overcome the artifacts of edge finders and to apply to partially occluded contours, the method requires the detection of gaps in L-junctions and collinearities, and the introduction of virtual resistances at these locations. Since contours must be found serially, the segmentation can be guided by a knowledge-based attentional mechanism, as seems to happen in human perception; such a mechanism can seek a contour containing a given seed segment, or one entering into perceptually significant relationships with the seed segment, such as symmetry, skew symmetry or parallelism. The method also offers a natural framework for fusing information from various image understanding mechanisms. The electric circuit part of the method can be implemented as a very simple neural network, which raises intriguing questions about the existence of such a structure in the human visual system.
Feature point tracking from an image sequence is an important step in many methods of image understanding, including shape from motion and mobile robot navigation. Assuming an affine camera model, this paper proposes a new tracking method using affine invariance. Any 3D feature point has unique coordinates with reference to an affine basis, and these affine coordinates are invariant to affine transformations: camera rotations and translations. The images of a set of 4 control points defining an affine basis are tracked through the image sequence using a conventional method. Under the affine camera assumption, given a feature point in any image, its locus in the first image is a straight line; the straight lines of the corresponding feature from successive images intersect at a point, the corresponding feature point, in the first image. A Hough transform technique is designed to detect this intersection point and thereby track the corresponding feature points through the image sequence. The technique is suitable for tracking a large number of feature points, and its performance is practically unaffected by missing features in some images and by large motion steps. Accurate and reliable results have been obtained in real experiments using the method.
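The core geometric step is finding the common intersection of one straight-line locus per image. The paper votes for it with a Hough transform; as a stand-in that shows the same geometry, the least-squares intersection of noisy lines can be computed in closed form (the formulation below is an assumption, not the paper's accumulator):

```python
import numpy as np

def intersect_lines(points, directions):
    """Least-squares intersection of 2D lines p_i + t·d_i: minimize the sum of
    squared perpendicular distances from the answer to every line."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in zip(points, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(2) - np.outer(d, d)   # projector onto the line's normal
        A += P
        b += P @ p
    return np.linalg.solve(A, b)         # normal equations of the LS problem
```

Like the Hough vote, this degrades gracefully: a missing locus in one image simply removes one term from the sums.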
Specifications and design are provided for a low-cost scatterometer built to study the relationship between optical scatter and image formation. Design principles and considerations are discussed as they relate to digital imaging. Specifications and instrument limitations, both mechanical and optical, are examined, and certain design choices are explained. The process of calibrating the instrument is considered, and methods of noise suppression, both electronic and optical, are discussed. Finally, some measured data are presented.
This paper develops a hybrid robot arm for propeller blade grinding. The grinding work requires a high-stiffness robot arm to reduce deformation and vibration. The hybrid robot arm is obtained by combining a parallel and a serial mechanism. The parallel mechanism supports a moving platform by three active legs and one passive leg, so it provides high stiffness but a small workspace. To compensate for the small workspace, the serial mechanism, which forms the wrist of the robot arm, is mounted on the moving platform. The robot arm therefore has a large workspace as well as high stiffness.
A camera system to be used in a tactile vision aid for blind persons has been built and tested. The camera is based on individual adaptive photoreceptors modelled after the biological example and realized in standard CMOS technology. The system exhibits a large dynamic range of approximately 7 orders of magnitude in incident light intensity and a pronounced capability to detect moving objects. It is planned to connect such a camera to a set of mechanical actuators which will transmit processed information about the image to the skin of a person. This paper describes simulations and measurements carried out with single adaptive pixels as well as results obtained with two complete prototype camera systems.
Augmented reality is a term used to describe systems in which computer-generated information is superimposed on top of the real world, for example through the use of a see-through head-mounted display. A human user of such a system can still see and interact with the real world, but has valuable additional information, such as descriptions of important features or instructions for performing physical tasks, superimposed on the world. For example, the computer could identify objects in the scene and overlay them with graphic outlines, labels, and schematics. The graphics are registered to the real-world objects and appear to be 'painted' onto those objects. Augmented reality systems can be used as productivity aids for tasks such as inspection, manufacturing, and navigation. One of the most critical requirements for augmented reality is to recognize and locate real-world objects with respect to the person's head; accurate registration is necessary in order to overlay graphics accurately on top of the real-world objects. At the Colorado School of Mines, we have developed a prototype augmented reality system that uses head-mounted cameras and computer vision techniques to accurately register the head to the scene. The current system locates and tracks a set of pre-placed passive fiducial targets on the real-world objects, computes the pose of the objects, and displays graphics overlays using a see-through head-mounted display. This paper describes the architecture of the system and outlines the computer vision techniques used.