A machine-learning technique is used to enable a robot, consisting of a vision system and a manipulator arm, to learn by observation, and then perform, a variety of block-stacking assembly operations. The robot can perform these tasks in spite of variations in the initial locations of the blocks, imprecision in its movements, and initial uncertainty about its camera location and arm geometry. Important aspects of this work are how continuous phenomena are perceived and mapped into discrete representations, and how the arm and camera are coordinated with each other.
This paper describes ongoing research into methods to allow a mobile robot to effectively function in a manufacturing environment; specifically, generation of the ballistic motion phase of the docking behavior. This overall docking behavior causes the robot to move to a workstation and park in an appropriate position. The docking behavior consists of two distinct types of motion. Ballistic motion rapidly moves the robot to an area near the dock, where recognition of the dock triggers the slower, more accurate orienting motion for the final positioning. The ballistic motion is supported by two simple low-level behaviors: a phototropic (light-seeking) behavior and a temporal (motion) detection behavior. Either perceptual strategy, phototropic or temporal activity, maneuvers the vehicle toward the bright light or the abundant motion usually associated with workstations. These vision algorithms were selected because strict knowledge of the initial position of the dock is not needed and each requires limited computational resources. This system has been implemented and shown to successfully generate ballistic motion in support of docking in typical manufacturing environments.
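As a rough illustration of the phototropic behavior, the sketch below steers toward the brightest image column; the image geometry, gain, and frame contents are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a phototropic steering behavior, assuming a
# grayscale camera image as a NumPy array.
import numpy as np

def phototropic_steering(image: np.ndarray, gain: float = 1.0) -> float:
    """Return a steering command toward the brightest image region.

    Positive values steer right, negative steer left, zero goes straight.
    """
    column_brightness = image.sum(axis=0)          # total brightness per column
    brightest = int(np.argmax(column_brightness))  # column of peak brightness
    center = image.shape[1] / 2.0
    offset = (brightest - center) / center         # normalized to [-1, 1]
    return gain * offset

# Example: a bright light on the right half of the frame
frame = np.zeros((120, 160))
frame[:, 130:140] = 255.0
print(phototropic_steering(frame))   # positive -> steer right, toward the light
```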
As mobile robots are used in more uncertain and dangerous environments, it will become important to design them so that they can survive falls. In this paper, we examine a number of mechanisms and strategies that animals use to withstand these potentially catastrophic events and extend them to the design of robots. A brief survey of how cats survive falls provides an understanding of the issues involved in preventing traumatic injury during a fall. After outlining situations in which robots might fall, a number of factors affecting their survival are described. From this background, several robot design guidelines are derived. These include recommendations for the physical structure of the robot as well as requirements for the robot control architecture. A control architecture based on reactive control techniques and action-oriented perception is proposed that is geared to support this form of survival behavior.
Multiple partial representations greatly facilitate the construction of complex autonomous robots. Such representations are desirable since they are easy to develop and quick to compute. We describe how to select various representations, how to string them together to perform specific tasks, and how to provide the resulting distributed system with an overall goal. As with the original Subsumption Architecture, our approach allows the incremental development of a system by the accretion of successive performance layers. The application of these techniques is illustrated by the analysis of an existing mobile robot which collects soda cans in an unstructured office environment.
The problem of tracking and capturing a moving obstacle in two dimensions by a mobile robot equipped with a wide-beam sonar system is discussed. In a practical system the range and azimuth measurements contain random errors and are available over a limited region. This pursuer/prey problem is treated as a feedback control system for which the pursuer action is dependent on the observed measurements. Two measures of performance are considered: the capture probability and the mean capture time when capture occurs. The lower bound for the mean capture time is determined from game theory, which assumes complete information about the prey. Strategies employing either qualitative (prey is to the left or right) or quantitative (range and azimuth to prey) information are implemented and compared. It is found that qualitative information is sufficient for prey capture, although quantitative information allows more efficient prey capture.
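The contrast between the two strategies can be sketched in a toy simulation; the speeds, turn rate, capture radius, and fleeing-prey model below are assumptions for illustration only, and noiseless measurements are used.

```python
# Toy comparison of qualitative (left/right only) vs. quantitative
# (range and azimuth) pursuit strategies.
import numpy as np

def step_pursuer(p, heading, prey, mode, turn=0.3, speed=1.0):
    bearing = np.arctan2(prey[1] - p[1], prey[0] - p[0]) - heading
    bearing = (bearing + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
    if mode == "qualitative":          # only "prey is to the left or right"
        heading += turn * np.sign(bearing)
    else:                              # quantitative: full azimuth available
        heading += bearing             # turn directly toward the prey
    p = p + speed * np.array([np.cos(heading), np.sin(heading)])
    return p, heading

def capture_time(mode, prey_speed=0.5, limit=500):
    p, heading = np.array([0.0, 0.0]), 0.0
    prey = np.array([20.0, 10.0])
    for t in range(limit):
        prey = prey + prey_speed * np.array([0.0, 1.0])  # prey flees north
        p, heading = step_pursuer(p, heading, prey, mode)
        if np.linalg.norm(prey - p) < 1.0:               # capture radius
            return t
    return None                                          # no capture

print(capture_time("qualitative"), capture_time("quantitative"))
```

Both strategies capture the prey in this toy setting, with the quantitative strategy doing so sooner, mirroring the paper's conclusion that qualitative information suffices but quantitative information is more efficient.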
Self-awareness is the ability of a mobile robot to detect and deal with operational abnormalities on its own. Such an ability is needed for the robot to operate robustly in an unpredictable environment. For the robot to be self-aware it must be capable of sensing its own internal state (such as detecting hardware failures or a drop in battery charge level), reacting quickly to such internal inputs, and inferring the implications of the resulting actions. At the lowest level, the ability to detect hardware failures enables a reactive robot to substitute functionally equivalent behaviors for those that no longer work because of the failures. If the failures are serious, the robot should be able to abort the current task/mission and initiate a new one to correct the problem. At the highest level, self-awareness would give the robot a sense of its `well-being,' limitations, capabilities, and needs. This paper describes how the self-awareness ability is being implemented on a mobile robot, called SmartyCat, that uses high-level reasoning to coordinate and specialize its low-level reactive behaviors to the mission goal. The multilevel mechanisms, ranging from behavior substitution to mission replanning, needed for self-awareness are discussed.
This paper describes a method to automatically generate optimal, collision-free maneuvers for a vehicle to follow. The technique requires information about the vehicle dimensions, wheel layout and turning radius, so that the vehicle's kinematic capabilities can be computed automatically. Locations of obstacles in the maneuvering environment must also be given. The maneuver computed is kinematically feasible, and is given in terms of the control parameters of the vehicle, namely steering angle and forward/backward motion. The method includes techniques for transforming obstacles, performing cost wave propagation, and using kinematically correct neighborhoods to generate an augmented configuration space specific to a vehicle. With this structure, the vehicle can move from any starting position, correct for run-time deviations, and drive to the goal position. A 1/10 scale radio-controlled testbed is used to verify that the theoretical path can indeed be carried out in practice.
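Cost wave propagation in its simplest form is a wavefront expansion from the goal; the sketch below uses plain 4-connected grid cells rather than the paper's kinematically correct neighborhoods and transformed obstacles, which is an intentional simplification.

```python
# A minimal sketch of cost wave propagation (uniform-cost wavefront
# expansion) over a grid.
from collections import deque

def cost_wave(grid, goal):
    """Propagate integer costs outward from the goal through free cells.

    grid[r][c] is True for an obstacle; returns a cost-to-goal table.
    """
    rows, cols = len(grid), len(grid[0])
    cost = [[None] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and cost[nr][nc] is None:
                cost[nr][nc] = cost[r][c] + 1
                frontier.append((nr, nc))
    return cost

# Any start cell can then descend the cost table to reach the goal, which
# also corrects for run-time deviations: recovery is just a fresh descent
# from wherever the vehicle actually is.
grid = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
print(cost_wave(grid, (0, 0)))
```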
There are two themes addressed in this paper: (1) the contrast between hierarchical and feedback-based approaches to designing robot systems, and (2) the need for an efficient human-machine interface. In the rapid pursuit of automation it is sometimes overlooked that an elaborate human-machine interface is quite necessary, despite the fact that a fully automated system, by definition, would not require much human interaction. In the future, real-time sensing, intelligent processors, and dextrous manipulators will become more viable, but until then it is necessary for human operators to provide the intelligent control functions. It is not obvious, however, how automated subsystems could account for human intervention, especially if a philosophy of pure automation dominates the design process. Teleoperation, on the other hand, emphasizes the creation of fast hardware pathways (e.g., hand controllers, exoskeletons) to communicate low-level control data to various mechanisms, while providing sensory feedback in a format suitable for human consumption (e.g., stereo display, force reflection), leaving the `intelligent' control functions to the human. These differences make it difficult to tie automation together with teleoperation while providing for graceful transitions at the appropriate times. The analysis of these issues has lately come under the title of supervisory control. Supervisory control systems must deal with various types of intelligence distributed throughout the layers of control. Typical layers are real-time servo control, off-line planning and reasoning subsystems, and finally the human operator. Design methodologies must account for the fact that the majority of the intelligence will reside with the human operator. This paper focuses on hierarchical decompositions and feedback loops as conceptual building blocks that provide a common ground for man-machine interaction. Several examples and candidate computer architectures are also discussed.
We have given the motion planning strategy for certain prototypical situations arising out of an isothetic workspace in 2-D with a rectangle as the moving object. We have suggested three possible variations of rotation and translation of the moving object to negotiate the L-type paths, and we have shown that the same can be extended to the other prototypical situations as well. We have also given simulation results to compare the three cases of proposed motion.
The ability to detect junctions and make choices among the possible pathways is important for autonomous navigation. In our script-based navigation approach, where a journey is specified as a script of high-level instructions, actions are frequently referenced to junctions, e.g., `turn left at the intersection.' In order for the robot to carry out these kinds of instructions, it must be able (1) to detect a junction (i.e., an intersection of pathways), (2) to know that there are several possible pathways it can take, and (3) to pick the pathway consistent with the high-level instruction. In this paper we describe our implementation of the ability to detect junctions in an indoor environment, such as corners, T-junctions and intersections, using sonar. Our approach uses a combination of a partial scan of the local environment and recognition of sonar signatures of certain features of the junctions. In the case where the environment is known, we use additional sensor information (such as compass bearings) to help recognize the specific junction. In general, once a junction is detected and its type known, the number of possible pathways can be deduced and the correct pathway selected. Then the appropriate behavior for negotiating the junction is activated.
In this paper, a method for mobile robot positioning using range data is investigated. A ranging sensor is used to collect range data within an environment modeled as a closed polygon, and a compass is available to provide the ranging sensor with direction. The problem of position uncertainty is defined as the existence of multiple position solutions when angle readings are noiseless. When the compass is not ideal, the problem is further complicated by now having imprecise multiple solutions. Both cases are solved by introducing a function, defined as the region entropy, which provides the ranging angle directions that minimize the uncertainty in position estimation. A least-squares position estimation method is then proposed. Finally, an example is used to illustrate the technique described above.
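To make the least-squares step concrete, the hedged sketch below assumes each compass-directed range beam is already associated with a known wall line of the polygonal model; each beam then yields one linear equation in the robot position. The room geometry and beam directions are illustrative.

```python
# Least-squares position from compass-directed range readings, assuming
# each beam endpoint lies on a known wall line n . x = c, which gives
# n . p = c - r * (n . u) as a linear equation in the position p.
import numpy as np

def estimate_position(walls, angles, ranges):
    """walls:  list of (n, c) with unit normal n and offset c (n.x = c).
    angles: compass directions of the range beams (radians).
    ranges: measured distances along those beams."""
    A, b = [], []
    for (n, c), theta, r in zip(walls, angles, ranges):
        u = np.array([np.cos(theta), np.sin(theta)])   # beam direction
        A.append(n)
        b.append(c - r * np.dot(n, u))
    p, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return p

# Unit-square room, robot at (0.3, 0.4), beams pointing east and north:
walls = [(np.array([1.0, 0.0]), 1.0),   # right wall x = 1
         (np.array([0.0, 1.0]), 1.0)]   # top wall   y = 1
print(estimate_position(walls, [0.0, np.pi / 2], [0.7, 0.6]))  # ~(0.3, 0.4)
```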
The problem of determining the position and orientation of a mobile robot has been addressed by several researchers using sensors of different modalities, including video cameras. Invariably, the vision-based approaches to robot localization assume that the camera is mounted on the robot and that the robot's working environment contains prominent landmarks at known locations. In this paper we propose a robot localization scheme where the robot itself serves as the landmark for cameras that are positioned in the environment to cover the entire work area of the robot. Although the proposed approach is applicable to robots of any regular shape, we develop the solution to the localization problem by assuming a cylindrical shape for the robot. A complete mathematical analysis of the localization problem is given by extending the three-dimensional structure-from-rotational-motion approach to the present task. We also examine the implementation issues of the proposed approach and present experimental results to show its effectiveness.
We describe a parallel-distributed approach to simultaneously solving the correspondence problem and the pose estimation problem. When the correspondence is known our method reduces to a generalized Hopfield network with a node for each image point that has an activation that converges to the ray-length through the image point to the object point. Inexact knowledge of the correspondence between object points and image points is represented as an assignment array that encodes the degree of confidence in each of the possible correspondences, similar to the traditional `assignment problem' of optimization theory. Two neural networks, one for pose estimation and one for solving the assignment problem, interact via the introduction of `pseudo image points,' created from the inexact correspondences. These `pseudo image points' reduce to the actual image points when the correspondence is known.
Several methods for planning paths in a terrain represented by a digital map under different strategies and constraints for an autonomous vehicle are presented. The planning task is formulated as a variational problem in a space with a non-Euclidean metric depending on the applied strategy. The resulting nonlinear Hamilton-Jacobi equation can be approximated by a linear Fokker-Planck equation suitable for analytical calculations. Furthermore, an efficient dynamic programming algorithm is presented with complexity linear in the number of points of the discretized space. Because of the digitization bias, a second calculation step based on direct optimization can be appended. The method is applied to plan optimal flight paths in three dimensions.
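A minimal dynamic-programming illustration: the sketch below propagates cost-to-goal over a digitized terrain map with alternating sweeps, each sweep linear in the number of grid points. The per-cell cost model is an assumption, and the paper's own algorithm may differ in detail.

```python
# Dynamic-programming cost propagation over a digitized terrain map,
# assuming a traversal cost per cell (np.inf = blocked).
import numpy as np

def dp_cost_to_goal(terrain, goal, sweeps=8):
    """Return an approximate cost-to-goal field after alternating sweeps."""
    rows, cols = terrain.shape
    cost = np.full((rows, cols), np.inf)
    cost[goal] = 0.0
    for s in range(sweeps):
        # alternate sweep directions so information flows every way
        rs = range(rows) if s % 2 == 0 else range(rows - 1, -1, -1)
        cs = range(cols) if s % 2 == 0 else range(cols - 1, -1, -1)
        for r in rs:
            for c in cs:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        step = terrain[r, c]   # cost of entering (r, c)
                        cost[r, c] = min(cost[r, c], cost[nr, nc] + step)
    return cost

terrain = np.ones((4, 4))
terrain[1:3, 1] = 5.0          # a costly ridge to route around
print(dp_cost_to_goal(terrain, (0, 0)))
```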
Mobile robots require knowledge of the environment to plan movements and accomplish tasks. Most path planning algorithms assume complete knowledge of the robot environment. But many situations exist where environment maps are not available to the robot, thereby making it impossible to implement and execute planned tasks. Such situations require that the robot either construct a map of the environment or operate in a local sensing mode with its concomitant absence of planning. This paper addresses the problem of map building. Map building involves robot motion, environment sensing, and sensor data integration. Most mapping algorithms described in the literature are based upon an environment in which objects and boundaries are made up of flat walls (polygons), and a sensor model that does not take into account the finite range and distortion encountered in real sensors. In this paper we present a mapping algorithm that imposes fewer restrictions on the environment and sensor. The algorithm described here uses a sensor with a limited sensing range to map an environment populated by objects of any shape and size. The mapping area can be controlled by defining an imaginary boundary or envelope around the region that is to be mapped. The algorithm proceeds by defining bounded regions enclosed by peripheral curves which subsequently become trajectories for further exploration of the environment, and includes procedures for circumnavigating objects using primitive robot motion and sensing operators such as MOVE, ROTATE, and SCAN. The algorithm has been tested in a simulated environment and the results of some of the mapping operations are described.
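The sensor-integration side of this can be sketched as an occupancy-grid update from a single finite-range SCAN; the grid resolution, beam model, and cell labels below are assumptions standing in for the paper's representation.

```python
# Integrating one limited-range scan into an occupancy grid: cells along
# each beam are marked free, and the endpoint occupied unless the beam
# ran out at the maximum range without hitting anything.
import numpy as np

MAX_RANGE = 5.0      # finite sensor range, as the paper requires
CELL = 0.5           # grid resolution (metres per cell)

def integrate_scan(grid, pose, bearings, ranges):
    x, y, heading = pose
    for b, r in zip(bearings, ranges):
        hit = r < MAX_RANGE
        for d in np.arange(0.0, min(r, MAX_RANGE), CELL / 2):
            cx = int((x + d * np.cos(heading + b)) / CELL)
            cy = int((y + d * np.sin(heading + b)) / CELL)
            if 0 <= cx < grid.shape[0] and 0 <= cy < grid.shape[1]:
                grid[cx, cy] = max(grid[cx, cy], 0)       # free (0)
        if hit:
            cx = int((x + r * np.cos(heading + b)) / CELL)
            cy = int((y + r * np.sin(heading + b)) / CELL)
            if 0 <= cx < grid.shape[0] and 0 <= cy < grid.shape[1]:
                grid[cx, cy] = 1                          # occupied
    return grid

grid = np.full((20, 20), -1)   # -1 = unexplored
bearings = np.linspace(-np.pi / 4, np.pi / 4, 9)
ranges = np.full(9, 3.0)       # a wall three metres ahead
print(integrate_scan(grid, (1.0, 5.0, 0.0), bearings, ranges))
```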
This paper describes a system developed for the robotic processing of naturally variable products. In order to plan the robot motion path it was necessary to use a sensor system, in this case a machine vision system, to observe the variations occurring in workpieces and interpret them with a knowledge-based expert system. The knowledge base was acquired by carrying out an in-depth study of the product, using examination procedures not available in the robotic workplace, and relates the nature of the required path to the information obtainable from the machine vision system. The practical application of this system to the processing of fish fillets is described and used to illustrate the techniques.
The accuracy and speed of motion determination are critical factors for vision-based environment modeling of autonomous moving machines. In this paper two motion estimation algorithms for this purpose are compared using simulations. The algorithms, based on extended Kalman filtering (EKF) and Gauss-Newton minimization, determine the translations and rotations from corresponding 3-D point pairs measured from consecutive stereo images. Both solutions take the stereo measurement uncertainties into account. The simulation results show that with the same input data the accuracy of the EKF is slightly lower than that of the Gauss-Newton minimization approach, but its computational cost is substantially smaller.
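For reference, a hedged sketch of the Gauss-Newton variant on synthetic data follows; it uses an unweighted least-squares objective and so omits the stereo measurement covariances that the paper folds into the cost.

```python
# Gauss-Newton estimation of a rigid 3-D motion (R, t) minimizing
# sum ||R p_i + t - q_i||^2 over corresponding point pairs.
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rodrigues(w):
    """Rotation matrix for an axis-angle vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1 - np.cos(theta)) * k @ k

def gauss_newton_motion(P, Q, iters=20):
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        J, r = [], []
        for p, q in zip(P, Q):
            pred = R @ p + t
            # derivative of R(dw) pred + dt with respect to (dw, dt):
            J.append(np.hstack([-skew(pred), np.eye(3)]))
            r.append(q - pred)
        delta, *_ = np.linalg.lstsq(np.vstack(J), np.hstack(r), rcond=None)
        R = rodrigues(delta[:3]) @ R          # compose the rotation update
        t = t + delta[3:]
    return R, t

# Recover a known motion from noiseless pairs:
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
R_true = rodrigues(np.array([0.1, -0.2, 0.3]))
Q = P @ R_true.T + np.array([0.5, -1.0, 2.0])
R, t = gauss_newton_motion(P, Q)
print(np.allclose(R, R_true, atol=1e-6), t)
```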
In order to recover an accurate representation of a scene containing multiple moving objects, one must use estimation methods that can recover both model parameters and segmentation at the same time. We introduce a new layered model of scene segmentation based on explicitly representing the support of a homogeneous region. Our model employs parallel robust estimation techniques, and uses a minimal-covering optimization to estimate the number of objects in the scene. Using a simple direct motion model of translating objects, we successfully segment real image sequences containing multiple motions.
The effects of regularization techniques applied to the estimation of visual motion are investigated. A new method to compute the optical flow from a sequence of time-varying images is used. It is based on second-order derivatives of the image brightness and is applied to evaluate 3-D motion parameters, such as the time-to-impact from translational motion. In particular, quantitative results are presented on the influence that spatio-temporal filtering of real image sequences with 2-D and 3-D (where the third dimension is time) smoothing operators has on the estimation of such parameters. The experiments show that, by performing a three-dimensional filtering of the image sequence, remarkable accuracy can be reached, even using a small spatial kernel. Moreover, computing the optical flow only on intensity edges gives the same 3-D motion parameter estimates as considering the whole field, provided that a small spatial kernel is used. The experiments presented in this paper demonstrate the applicability of the methods to a large number of computer vision applications.
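A small numerical illustration of the time-to-impact cue: for pure translation toward a frontoparallel surface the flow field is radial, u = x/tau and v = y/tau, so its divergence is 2/tau and tau can be read off a smoothed flow estimate. The synthetic field and noise level below are assumptions.

```python
# Estimating time-to-impact from the divergence of a translational
# optical flow field.
import numpy as np

tau_true = 40.0                          # frames until impact
x, y = np.meshgrid(np.arange(-32, 32), np.arange(-32, 32))
u = x / tau_true + np.random.default_rng(1).normal(0, 0.01, x.shape)
v = y / tau_true + np.random.default_rng(2).normal(0, 0.01, y.shape)

# Divergence via finite differences; averaging over the field plays the
# role of the smoothing studied in the paper.
du_dx = np.gradient(u, axis=1)
dv_dy = np.gradient(v, axis=0)
divergence = (du_dx + dv_dy).mean()

print("estimated time-to-impact:", 2.0 / divergence)   # ~ tau_true
```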
A number of optical flow techniques exist for recovering the relative sensor-scene motion from image sequences, for subsequent use in vehicle guidance and navigation tasks. However, the problems of finding depths and motion, as well as various differential and kinematic approximations within the algorithms themselves, often lead to difficulties in the presence of noisy data. A re-examination of the basic geometry of the problem when posed as a set of matched points in two or more frames, each separated by an arbitrary motion, has produced a revised error metric for use with iterative solution methods. In experiments with small motions the method has been demonstrated to be less susceptible to noise than the standard linear formulation from which it is derived. Improved performance has also been observed in the recovery of rotational motion.
In this paper we describe an efficient system for recovering 3-D motion and structure from an evolving image sequence. The technique utilizes the image flow velocities in order to recover the 3-D parameters. We develop a real-time algorithm for recovering the 2-D flow vectors on the image plane with sub-pixel accuracy. The flow estimates are then examined for possible errors, mistakes, and uncertainties in the visual system. Uncertainty models are developed for the sensor and for the image processing techniques being used. A filtering and rejection mechanism is then developed to discard unrealistic flow estimates. The 2-D flow models are then converted into 3-D uncertainty models for motion and structure. We further develop an algorithm that iteratively improves the 3-D solution given two successive image frames, and then discuss a multiframe algorithm that improves the solution progressively by using a large number of image frames.
Many current optical flow algorithms are not suited for practical implementations such as tracking because they either require massively parallel supercomputers, specialized hardware, or up to several hours on a scientific workstation. One particular reason for this is the quadratic nature of the search algorithms used in these problems. We present two modifications to these types of algorithms which can convert quadratic-time optical flow algorithms into linear-time ones. The first uses a variable image sampling rate which trades space for time and yields an algorithm that is at worst linear, and at best constant, in the speed of the moving objects in the image. This technique finds the fastest motion in an image and is ideal for tracking, since the fastest moving objects in a robot's environment are generally the most interesting. The second modification extends this approach to create a multiple-speed optical flow field by transforming quadratic searches over space into linear searches in time. This space-time inversion has the effect of searching for faster moving objects in each image earlier than for slower moving ones, with additional effort being exerted to search for slower objects only when desired. A system of velocity masking allows a tradeoff of angular resolution (but not magnitude resolution) for an optical flow algorithm only linear in the range of velocities present.
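The space-time inversion can be sketched as follows: with a fixed spatial search window of W pixels, matching frames k steps apart detects speeds near W/k, so scanning k = 1, 2, 3, ... finds the fastest motion first and spends effort on slower motion only when desired. The 1-D frames and matching score below are illustrative assumptions, not the paper's algorithm.

```python
# Linear search in time instead of quadratic search in space: a fixed
# +/- W pixel window is matched across increasing frame gaps k.
import numpy as np

W = 4   # fixed spatial search radius (pixels)

def best_shift(a, b, radius=W):
    """Displacement d (pixels) such that a shifted by d best matches b."""
    errors = {d: np.mean((np.roll(a, d) - b) ** 2)
              for d in range(-radius, radius + 1)}
    return min(errors, key=errors.get)

def detect_speed(frames):
    """Return the first (fastest) consistent speed found, in px/frame."""
    for k in range(1, len(frames)):           # linear search in time
        d = best_shift(frames[0], frames[k])
        if d != 0:
            return d / k                      # displacement over k frames
    return 0.0

# A 1-D "object" drifting right at 2 px/frame:
base = np.zeros(64); base[20:24] = 1.0
frames = [np.roll(base, 2 * t) for t in range(6)]
print(detect_speed(frames))   # ~2.0, found at the smallest frame gap
```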
This paper presents an alternative approach to building representations of the environment of a mobile robot. This approach is based on recording the geometrical relationships between the observed features rather than their absolute positions with respect to an arbitrary coordinate frame of reference. The resulting representation takes the form of a graph where the nodes represent the observed features and the edges represent the relationships between the features. This representation is particularly well suited for recognition tasks, which can be reformulated as graph matching problems. Unlike Cartesian maps, relational maps can be built and maintained without any estimate of the robot's position, which means that the errors in this representation are independent of any errors in the robot position estimates.
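A minimal sketch of the idea: store only frame-invariant relations (here, inter-feature distances) on the graph edges, so the same place yields the same relational map from two different, unknown robot poses. The feature names and the single edge attribute are assumptions for illustration.

```python
# Building a relational map whose edges depend only on the features
# themselves, not on any global frame or robot pose estimate.
import itertools
import math

def relational_map(observations):
    """observations: dict name -> (x, y) in the *sensor* frame of one scan.
    Returns edges keyed by feature pair; distances are frame-invariant."""
    edges = {}
    for (a, pa), (b, pb) in itertools.combinations(observations.items(), 2):
        edges[frozenset((a, b))] = math.dist(pa, pb)
    return edges

# Two scans of the same corner from different (unknown) robot poses:
scan1 = {"door": (1.0, 0.0), "corner": (3.0, 0.0), "pillar": (3.0, 2.0)}
scan2 = {"door": (0.0, 1.0), "corner": (0.0, 3.0), "pillar": (-2.0, 3.0)}
print(relational_map(scan1) == relational_map(scan2))   # True: same place
```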
Curved road tracking is one of the problems in active vision for autonomous vehicles. In search of a solution to this problem, we started with circular road following. In this paper we report on a new vision-based method, and its real-time implementation, for achieving the task of circular road following. Unique features of this method are: (1) it does not require the velocity information of the vehicle, and (2) it is very simple.
In this paper, we address the problem of object recognition in a complex 3-D scene for autonomous vehicle driving by detecting the 2-D object projection on the image plane; in particular, the problems of road detection and obstacle avoidance in natural road scenes are investigated. A new implementation of the Hough Transform (HT), called the Labeled Hough Transform (LHT), to extract and group symbolic features is presented here; the novelty of this method, with respect to the traditional approach, is its capability of splitting a maximum in the parameter space into noncontiguous segments while performing voting. Results are presented on a road image containing obstacles, which show the efficiency, good quality, and time performance of the algorithm.
Road segmentation is the critical step in a vision system for outdoor road following. This paper presents a new method for color road segmentation. The basic idea of the method is as follows: each pixel of a color image consists of red, green, and blue intensity values; it can be considered as a feature vector in 3-D RGB space, and road segmentation can then be cast as a statistical classification problem. First, a color normalization transform is used to suppress the shadows in a color image; then, the optimal discriminant plane technique is used for feature decorrelation; finally, road segmentation is completed by a minimum distance classifier designed on the optimal discriminant plane. Extensive experiments show that the method is effective.
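A hedged sketch of the classification stage follows; a PCA-style plane over the pooled training pixels stands in for the paper's optimal discriminant plane, and the normalization step is a crude brightness removal rather than the paper's exact transform.

```python
# Pixels as RGB vectors, projected onto a 2-D plane and labeled by a
# minimum-distance classifier.
import numpy as np

def normalize(rgb):
    """Crude shadow suppression: keep chromaticity, drop brightness."""
    s = rgb.sum(axis=-1, keepdims=True) + 1e-9
    return rgb / s

def fit(road_pixels, nonroad_pixels):
    X = normalize(np.vstack([road_pixels, nonroad_pixels]))
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0))
    plane = Vt[:2]                       # 2-D projection of 3-D RGB
    means = [normalize(c).mean(axis=0) @ plane.T
             for c in (road_pixels, nonroad_pixels)]
    return plane, means

def classify(pixels, plane, means):
    proj = normalize(pixels) @ plane.T
    d = [np.linalg.norm(proj - m, axis=-1) for m in means]
    return np.argmin(d, axis=0)          # 0 = road, 1 = not road

road = np.array([[90, 90, 95], [100, 100, 110]], dtype=float)
grass = np.array([[40, 120, 30], [60, 140, 50]], dtype=float)
plane, means = fit(road, grass)
print(classify(np.array([[95.0, 95.0, 100.0], [50.0, 130.0, 40.0]]),
               plane, means))            # -> [0 1]
```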
How to extract a set of significant pixels in an image in order to represent the visual information, and eventually to derive 3-D information about the scene, is the subject of this paper. First we describe a relaxation method to detect contrast points in images; then we propose a maximum-pattern-resemblance approach, based on the characteristics of these points, to find point matches between pairs of images as well as matches among images in a sequence. After a few iterations of the detection process, a `stable' pattern of discrete points is obtained, and `good' corresponding points are then clustered successfully.
We discuss the application of nonlinear median, alpha-trimmed mean, and moving average filters to images from an intensified CCD camera mounted inside a vehicle on the dashboard. Discrete filtering strategies use the alpha-trimmed mean as a prefilter to prepare an image for the automatic moving average. In effect, the smoothing characteristics of the order statistics filters and the integrated value of the moving average are combined to produce an image that is pleasing to the human eye. In some cases, an object's shape is not completely defined in the new image after the filters have been applied. In these cases, resulting from poor acquisition due to underexposure or intensive shading, a histogram modification is applied. The intensified camera system performs at light levels as low as 0.0001 lux, equivalent to an overcast moonless night. Regular CCD cameras operate at light levels of around 0.1 lux, a scene illumination at least equivalent to the light of a full moonlit night. We show, through a series of examples, that these filters can effectively be used to partially eliminate the veiling glare effect and the brightness fluctuation over time, which comprise the most significant noise in an intensifier tube application.
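The alpha-trimmed mean itself is easy to state: sort each window, drop the alpha smallest and alpha largest samples, and average the rest, interpolating between the moving average (alpha = 0) and the median (maximal alpha). A direct sketch, with the window size as an assumption:

```python
# Alpha-trimmed mean filter over a 2-D grayscale image.
import numpy as np

def alpha_trimmed_mean(image, size=3, alpha=2):
    """Filter with a size x size window, trimming `alpha` samples
    from each end of the sorted window before averaging."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = np.sort(padded[r:r + size, c:c + size].ravel())
            kept = window[alpha:len(window) - alpha]
            out[r, c] = kept.mean()
    return out

# An impulse ("glare speckle") on a flat background is rejected:
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0
print(alpha_trimmed_mean(img)[2, 2])   # ~10, the outlier is trimmed away
```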
This paper describes how fusion between thermal and range imaging allows us to discriminate different types of materials in outdoor scenes. First, we analyze how pure vision segmentation algorithms applied to thermal images allow discriminating materials such as rock and sand. Second, we show how combining thermal and range information allows us to better discriminate rocks from sand. Third, as an application, we examine how an autonomous legged robot can use these techniques to explore other planets.
This paper discusses how the operation of a mobile robot in a real environment can be decomposed into three concurrent modes -- the reflexive, deliberate action, and self-awareness modes. In the reflexive mode, the robot reacts quickly to changes in the environment (e.g., the robot pauses when objects cross its path). When the robot needs to carry out more deliberate actions, such as moving between rooms of a large building or within a space colony structure, it has to invoke the deliberate action mode in order to be able to plan and reason about what it wants to do and how to go about doing it. Both these modes use the robot's ability to sense and interact with its external environment. The robot must also have a sense of what it can and cannot do given its current internal conditions. Such an ability is provided for by the self-awareness mode in which the robot's internal states (e.g., power pack energy level) are continually sensed and monitored to determine if the current mission or task should be completed, suspended, or aborted (e.g., suspend the current task and replace/recharge the power pack when the energy level is too low).
Current segmentation and classification algorithms are not sufficiently robust to provide reliable real-time sensor-based vehicle guidance for Intelligent Vehicle Highway Systems. An alternative technique based on neural networks was developed for scene classification, as these algorithms can handle missing and fuzzy data. The back-propagation algorithm was successfully used, in conjunction with new activation functions, for classification of roads, lane markers, shadows, grass, and edge transitions in real highway scenes. From 15 training images, 131 subregions of size 3 x 3 with known classes (roads, lane markers, shadows, grass, and low- and high-edge transitions) were extracted. Two different neural network architectures based on image and edge data were defined. The neural network was then trained to learn the characteristics of the desired classes. After the training phase converged, the test images were correctly classified into the desired classes. The training time of 2.14 hours was significantly lower than the days to a week reported by other researchers for similar applications. The processing speed and classification accuracy based on visual criteria were excellent for the test images.
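A compact sketch of the training setup on synthetic data follows: 3 x 3 subregions flattened to 9-vectors and a one-hidden-layer network trained by back-propagation. The toy patches, layer sizes, and plain sigmoid activations are assumptions, since the paper uses its own activation functions and real highway data.

```python
# A tiny back-propagation classifier for 3 x 3 image subregions.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Two toy classes: flat "road" patches vs. high-contrast "edge" patches
road = rng.normal(0.5, 0.02, (60, 9))
edges = np.hstack([np.zeros((60, 5)), np.ones((60, 4))])
X = np.vstack([road, edges])
y = np.hstack([np.zeros(60), np.ones(60)])[:, None]

W1 = rng.normal(0, 0.5, (9, 6)); b1 = np.zeros(6)    # hidden layer
W2 = rng.normal(0, 0.5, (6, 1)); b2 = np.zeros(1)    # output layer
lr = 0.5
for epoch in range(2000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # output delta
    d_h = (d_out @ W2.T) * h * (1 - h)       # back-propagated delta
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X); b1 -= lr * d_h.mean(axis=0)

print("train accuracy:", ((out > 0.5) == y).mean())
```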
Recently there has been an increasing interest in the development of a robot system capable of moving inside buildings. The problem of continuously establishing the position of a vehicle is fundamental in goal-oriented navigation. In this work we propose a vision system for an autonomous vehicle capable of determining its real position with respect to the planned one. A set of control points is given in a fixed coordinate system. If these points are identified in the image plane, the location from which the image has been obtained, and therefore the vehicle position, can be determined.
The benefits of sensor-based computer control in the international mining industry are many. Consequently, the industry expends considerable effort researching and developing automated systems. These efforts span the total industry, including exploration, extraction, processing, and refining. All of the major segments of the mining industry (underground and surface; industrial minerals, metal, non-metal, and coal) have participated in these developments. The result is a large body of information that is impossible to cover completely in a conference paper. To provide interesting information to the conference audience, this paper focuses on efforts to automate mobile production equipment. Most of the efforts apply to international underground metal and coal mines. Mobile mining and construction equipment presently utilize mechanization and remote control. For example, remote-controlled equipment, where an operator is stationed some distance from the machine but within line-of-sight, is found on many continuous mining machines in underground coal mines and LHDs in underground metal mines. Teleoperated equipment, where operators can be stationed further away, beyond line-of-sight, is currently being tried at mines in the USA, Canada, Australia, and Sweden. Research projects currently focus on intelligent analysis of data from sensors to produce control algorithms, which will be termed "mining robotics" to distinguish it from simpler closed-loop control termed "automation." The reason current research focuses on intelligent analysis of sensor data is that mining takes place in the geological environment, where conditions are highly variable and unpredictable. As a result, mining systems must have substantial cognitive abilities to recognize and deal with these unpredictable variations. Robotics has been described as the intelligent connection between perception and action. It may provide the greatest benefits in productivity and safety since it incorporates artificial-intelligence-based algorithms that show promise for adapting to the dynamic mining environment. Many mines have purchased the latest available technology from equipment manufacturers. These large, expensive, advanced machines have high production capabilities and are very reliable. For these machines to provide the highest cost effectiveness, they should be fully utilized; however, this is not always the case. For example, underground coal continuous miners are only utilized 17% of the shift (Suboleski and King, 1990, and King and Suboleski, 1991). These problems occur in surface and underground mines, but underground mining has the most opportunity for improvement. In addition, maintaining an artificial environment in an underground mine that is conducive to optimal work performance is expensive. By its nature, underground mining can be hot and humid; in addition, the equipment produces dust, noise, fumes, and other hazards. Explosive or toxic gases and the constant danger of rock falls add to the increasing list of health and safety concerns that have initiated volumes of federal and state regulations for maintaining safe work places for humans in underground mines. These regulations cause tremendous capital and operating expenditures that prevent many mineral deposits from being mined profitably.
For example, in both underground coal and metal mines, miners excavate many more entries or drifts than necessary to produce and remove the ore or coal in order to provide a safe environment for humans. These difficulties will increase as mines deplete near-surface reserves that have the best working conditions. As a result, some areas are not mined because of the expense of providing an adequate environment for human operators. A good example is narrow ore veins in metal mines that would normally require cut-and-fill stoping methods to provide adequate ground support. With teleoperated or robotic equipment, less expensive open stoping practices can be used. Another example is automated longwall systems that can be operated more freely in a bi-directional cutting mode because they eliminate respirable dust exposure on the face.
This paper presents eight visual cues to be used as part of a control loop for autonomous landing. The idea is based on fixating the camera at the vanishing point of the projection of the runway in the image. Two-dimensional geometrical cues are used to derive the visual cues. The visual cues that are extracted are the location of the camera relative to the runway in terms of the runway width, the orientation of the camera, and relevant angles such as the glide slope angle.
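The fixation target can be computed directly: the vanishing point is the intersection of the two projected runway edge lines, conveniently done with homogeneous coordinates. The edge-line endpoints below are illustrative.

```python
# Vanishing point of the runway as the intersection of its two
# projected edge lines, via homogeneous-coordinate cross products.
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(edge1, edge2):
    """Intersect two runway edge lines; returns (x, y) in the image."""
    vp = np.cross(line_through(*edge1), line_through(*edge2))
    return vp[:2] / vp[2]

# Converging runway edges in a 640 x 480 image:
left_edge = ((100.0, 480.0), (280.0, 240.0))
right_edge = ((540.0, 480.0), (360.0, 240.0))
print(vanishing_point(left_edge, right_edge))   # -> [320. 186.67]
```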
To use a world model, a mobile robot must be able to determine its own position in the world. To support truly autonomous navigation, I present MARVEL, a system that builds and maintains its own models of world locations and uses these models to recognize its world position from stereo vision input. MARVEL is designed to be robust with respect to input errors and to respond to a gradually changing world by updating its world location models. In over 1000 recognition tests using real-world data, MARVEL yielded a false negative rate under 10% with zero false positives. MARVEL also fits into a world modeling system under development.