Mobile robot designers frequently look to computer vision to solve navigation, obstacle avoidance, and object detection problems such as those encountered in parking lot surveillance. Stereo reconstruction is a useful technique in this domain and can be done in two ways. The first requires a fixed stereo camera rig to provide two side-by-side images; the second uses a single camera in motion to provide the images. While stereo rigs can be accurately calibrated in advance, they rely on a fixed baseline distance between the two cameras. The advantage of a single-camera method is the flexibility to change the baseline distance to best match each scenario. This directly increases the robustness of the stereo algorithm and increases the effective range of the system. The challenge comes from accurately rectifying the images into an ideal stereo pair. Structure from motion (SFM) can be used to compute the camera motion between the two images, but its accuracy is limited and small errors can cause rectified images to be misaligned. We present a single-camera stereo system that incorporates a Levenberg-Marquardt minimization of rectification parameters to bring the rectified images into alignment.
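The alignment step described in the abstract above can be illustrated with a minimal Levenberg-Marquardt sketch. The two-parameter misalignment model used here (a small in-plane rotation `theta` and a vertical offset `ty` applied to the right image) is an assumption chosen for brevity, not the rectification parameterization the authors used. The function names and the layout of `matches` are likewise hypothetical.

```python
import math

def residuals(params, matches):
    """Row misalignment of each match after correcting the right image.

    matches: list of ((xl, yl), (xr, yr)) point pairs (assumed layout).
    In an ideal rectified pair, matched points share the same row.
    """
    theta, ty = params
    s, c = math.sin(theta), math.cos(theta)
    return [s * xr + c * yr + ty - yl for (_xl, yl), (xr, yr) in matches]

def levenberg_marquardt(matches, params=(0.0, 0.0), lam=1e-3, max_iters=50):
    """Minimize the sum of squared row misalignments over (theta, ty)."""
    theta, ty = params
    r = residuals((theta, ty), matches)
    cost = sum(v * v for v in r)
    for _ in range(max_iters):
        s, c = math.sin(theta), math.cos(theta)
        # Jacobian row per match: (d r / d theta, d r / d ty).
        jac = [(c * xr - s * yr, 1.0) for _left, (xr, yr) in matches]
        # Entries of J^T J and the gradient J^T r.
        a = sum(j0 * j0 for j0, _ in jac)
        b = sum(j0 * j1 for j0, j1 in jac)
        d = sum(j1 * j1 for _, j1 in jac)
        g0 = sum(j[0] * v for j, v in zip(jac, r))
        g1 = sum(j[1] * v for j, v in zip(jac, r))
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r.
        a_d, d_d = a * (1.0 + lam), d * (1.0 + lam)
        det = a_d * d_d - b * b
        if det == 0.0:
            break
        step_theta = (b * g1 - d_d * g0) / det
        step_ty = (b * g0 - a_d * g1) / det
        if abs(step_theta) + abs(step_ty) < 1e-14:
            break
        trial = (theta + step_theta, ty + step_ty)
        trial_r = residuals(trial, matches)
        trial_cost = sum(v * v for v in trial_r)
        if trial_cost < cost:
            # Accept the step and trust the linearization more.
            theta, ty = trial
            r, cost = trial_r, trial_cost
            lam *= 0.1
        else:
            # Reject the step and increase the damping.
            lam *= 10.0
    return theta, ty, cost
```

In this sketch the damping factor shrinks after each accepted step and grows after each rejected one, which is the usual way Levenberg-Marquardt moves between Gauss-Newton and gradient-descent behavior. A real system would optimize a fuller set of rectification parameters over many feature matches.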
Most goal-oriented mobile robot tasks involve navigation to one or more known locations. This is generally done using GPS coordinates and landmarks outdoors, or wall-following and fiducial marks indoors. Such approaches ignore the rich source of navigation information that is already in place for human navigation in all man-made environments: signs. A mobile robot capable of detecting and reading arbitrary signs could be tasked using directions that are intuitive to humans, and it could report its location relative to intuitive landmarks (a street corner, a person's office, etc.). Such ability would not require active marking of the environment and would be functional in the absence of GPS. In this paper we present an updated version of a system we call Sign Understanding in Support of Autonomous Navigation (SUSAN). This system relies on cues common to most signs: the presence of text, vivid color, and compact shape. By not relying on templates, SUSAN can detect a wide variety of signs: traffic signs, street signs, store-name signs, building directories, room signs, etc. In this paper we focus on the text detection capability. We present results summarizing probability of detection and false alarm rate across many scenes containing signs of very different designs and in a variety of lighting conditions.
Model-based Automatic Target Recognition (ATR) algorithms are adept at recognizing targets in high-fidelity 3D LADAR imagery. Most current approaches involve a matching component where a hypothesized target and target pose are iteratively aligned to pre-segmented range data. Once the model-to-data alignment has converged, a match score is generated indicating the quality of match. This score is then used to rank one model hypothesis over another. This approach has two main drawbacks. First, to ensure the correct target is recognized, a large number of model hypotheses must be considered. Even with a highly accurate indexing algorithm, the number of target types and variants that need to be explored is prohibitive for real-time operation. Second, the iterative matching step must consider a variety of target poses to ensure that the correct alignment is recovered. Inaccurate alignments produce erroneous match scores and thus errors when ranking one target hypothesis over another. To compensate for such drawbacks, we explore the use of situational awareness information already available to an image analyst. Examples of such information include knowledge of the surrounding terrain (to assess potential occlusion levels) and targets of interest (to account for target variants).
Mobile robot designers frequently look to computer vision to solve navigation, obstacle avoidance, and object detection problems. Potential solutions using low-cost video cameras are particularly alluring. Recent results in 3D scene reconstruction from a single moving camera seem particularly relevant, but robot designers who attempt to use such 3D techniques have uncovered a variety of practical concerns. We present lessons learned from developing a single-camera 3D scene reconstruction system that provides both a real-time camera motion estimate and a rough model of major 3D structures in the robot’s vicinity. Our objective is to use the motion estimate to supplement GPS (indoors in particular) and to use the model to provide guidance for further vision processing (look for signs on walls, obstacles on the ground, etc.). The computational geometry involved is closely related to traditional two-camera stereo; however, a number of degenerate cases exist. We also demonstrate how SFM can be used to improve the performance of two specific robot navigation tasks.
Mobile robots currently cannot detect and read arbitrary signs. This is a major hindrance to mobile robot usability, since they cannot be tasked using directions that are intuitive to humans. It also limits their ability to report their position relative to intuitive landmarks. Other researchers have demonstrated some success on traffic sign recognition, but using template-based methods limits the set of recognizable signs. There is a clear need for a sign detection and recognition system that can process a much wider variety of signs: traffic signs, street signs, store-name signs, building directories, room signs, etc. We are developing a system for Sign Understanding in Support of Autonomous Navigation (SUSAN) that detects signs from various cues common to most signs: vivid colors, compact shape, and text. We have demonstrated the feasibility of our approach on a variety of signs in both indoor and outdoor locations.
The success of any potential application for mobile robots depends largely on the specific environment where the application takes place. Practical applications are rarely found in highly structured environments, but unstructured environments (such as natural terrain) pose major challenges to any mobile robot. We believe that semi-structured environments, such as parking lots, provide a good opportunity for successful mobile robot applications. Parking lots tend to be flat and smooth, and cars can be uniquely identified by their license plates. Our scenario is a parking lot where only known vehicles are supposed to park. The robot looks for vehicles that do not belong in the parking lot. It checks both license plates and vehicle types, in case the plate is stolen from an approved vehicle. It operates autonomously, but reports back to a guard who verifies its performance. Our interest is in developing the robot's vision system, which we call Scene Estimation & Situational Awareness Mapping Engine (SESAME). In this paper, we present initial results from the development of two SESAME subsystems, the ego-location and license plate detection systems. While their ultimate goals are obviously quite different, our design demonstrates that by sharing intermediate results, both tasks can be significantly simplified. The inspiration for this design approach comes from the basic tenets of Situational Awareness (SA), where the benefits of holistic perception are clearly demonstrated over the more typical designs that attempt to solve each sensing/perception problem in isolation.
Situational Awareness (SA) is a critical component of effective autonomous vehicles, reducing operator workload and allowing an operator to command multiple vehicles or simultaneously perform other tasks. Our Scene Estimation & Situational Awareness Mapping Engine (SESAME) provides SA for mobile robots in semi-structured scenes, such as parking lots and city streets. SESAME autonomously builds volumetric models for scene analysis. For example, a SESAME-equipped robot can build a low-resolution 3-D model of a row of cars, then approach a specific car and build a high-resolution model from a few stereo snapshots. The model can be used onboard to determine the type of car and locate its license plate, or the model can be segmented out and sent back to an operator who can view it from different viewpoints. As new views of the scene are obtained, the model is updated and changes are tracked (such as cars arriving or departing). Since the robot's position must be accurately known, SESAME also has automated techniques for determining the position and orientation of the camera (and hence, robot) with respect to existing maps. This paper presents an overview of the SESAME architecture and algorithms, including our model generation algorithm.
One of NASA's goals for the Mars Rover missions of 2003 and 2005 is to have a distributed team of mission scientists. Since these scientists are not experts on rover mobility, we have developed the Rover Obstacle Visualizer and Navigability Expert (ROVANE). ROVANE is a combined obstacle detection and path planning software suite, to assist in distributed mission planning. ROVANE uses terrain data, in the form of panoramic stereo images captured by the rover, to detect obstacles in the rover's vicinity. These obstacles are combined into a traversability map which is used to provide path planning assistance for mission scientists. A corresponding visual representation is also generated, allowing human operators to easily identify hazardous regions and to understand ROVANE's path selection. Since the terrain data often contains uncertain regions, the ROVANE obstacle detector generates a probability distribution describing the likely cost of a given obstacle or region. ROVANE then allows the user to plan for best-, worst-, and intermediate-case scenarios. ROVANE thus allows non-experts to examine scenarios and plan missions which have a high probability of success. ROVANE is capable of stand-alone operation, but it is designed to work with JPL's Web Interface for Telescience, an Internet-based tool for collaborative command sequence generation.
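The best-, worst-, and intermediate-case planning described above can be pictured as reading one cost out of each region's probability distribution. The sketch below is only an illustration under stated assumptions: the discrete `(cost, probability)` representation and the function name `cost_at_quantile` are hypothetical, and the abstract does not say how ROVANE represents or queries its distributions.

```python
def cost_at_quantile(dist, q):
    """Return the smallest cost whose cumulative probability reaches q.

    dist: list of (cost, probability) pairs whose probabilities sum to 1
          (an assumed, simplified representation of an uncertain region).
    q:    0.0 selects the best case, 1.0 the worst case, and values in
          between select intermediate cases.
    """
    cumulative = 0.0
    for cost, prob in sorted(dist):
        cumulative += prob
        # A small tolerance guards against floating-point round-off at q = 1.
        if cumulative >= q - 1e-12:
            return cost
    return max(cost for cost, _ in dist)

# Hypothetical region: probably easy terrain, possibly a serious hazard.
region = [(1.0, 0.6), (4.0, 0.3), (10.0, 0.1)]
best_case = cost_at_quantile(region, 0.0)
intermediate_case = cost_at_quantile(region, 0.8)
worst_case = cost_at_quantile(region, 1.0)
```

Collapsing every region to a single cost at a chosen quantile would let the same path planner be run once per scenario, which is one plausible reading of planning for different cases.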
The primary data used in ground-based, global path planning for NASA's Planetary Rovers are stereo images down-linked from the rover and range data derived from those images. The range data are often incomplete: the sensors are inherently noisy and sections of the landscape are blocked. This missing data complicates the path planning process and necessitates the help of human experts. We present the Rover Obstacle Visualizer and Navigability Evaluator (ROVANE), which assists these human experts and allows non-experts to plan missions without expert help. ROVANE generates a hazard map identifying slow, impassable, or dangerous regions with varying degrees of certainty. This map is used to create possible paths, which are assigned variable costs based on possible hazards. A hazard visualization is also produced, allowing the user to visually identify hazards and understand the system's path selection. As target locations are entered by the user, the system finds appropriate paths using a variation of the A* algorithm. A found path can be further modified by the user and output in a format suitable for commanding an actual rover. The system is capable of stand-alone operation, but is designed to be integrated into the Jet Propulsion Laboratory’s Web Interface for Telescience.
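As a rough illustration of path finding over a hazard map, the following sketch runs textbook A* on a small grid. It is not the variation of A* used in ROVANE, which the abstract does not detail. The grid encoding (a per-cell traversal cost of at least 1, with `None` marking impassable cells), the 4-connected moves, and the function name `astar` are all assumptions made for the example.

```python
import heapq

def astar(hazard, start, goal):
    """Find a least-cost 4-connected path on a grid of cell costs.

    hazard[r][c] is the cost of entering a cell (assumed >= 1), or None
    if the cell is impassable. Returns (path, cost), or (None, inf) when
    the goal cannot be reached.
    """
    rows, cols = len(hazard), len(hazard[0])

    def heuristic(cell):
        # Manhattan distance never overestimates when every cost is >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], g
        if g > best[cell]:
            continue  # stale entry; a cheaper route to this cell was found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and hazard[nr][nc] is not None:
                new_g = g + hazard[nr][nc]
                if new_g < best.get(nxt, float("inf")):
                    best[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None, float("inf")

# Hypothetical hazard map: one impassable cell and one slow (cost 5) cell.
hazard_map = [
    [1, 1, 1, 1],
    [1, None, 5, 1],
    [1, 1, 1, 1],
]
path, cost = astar(hazard_map, (0, 0), (2, 3))
```

Because slow regions only raise the cost of a cell while impassable regions remove it entirely, the search avoids the cost-5 cell whenever a cheaper detour exists.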
Planning paths for omni-directional vehicles (ODVs) can be computationally infeasible because of the large space of possible paths. This paper presents an approach that avoids this problem through the use of abstraction in characterizing the possible maneuvers of the ODV as a grammar of parameterized mobility behaviors and describing the terrain as a covering of object-oriented functional terrain features. The terrain features contain knowledge on how best to create mobility paths (sequences of mobility behaviors) through the object and around obstacles. Given an approximate map of the environment, the approach constructs a graph of mobility paths that link the location of the vehicle with the goals. Which of these paths the vehicle actually follows is determined by an A* search through the graph. The effectiveness of the strategy is demonstrated in actual tests with a real robotic vehicle.
Planning paths for omni-directional vehicles (ODVs) can be computationally infeasible because of the large space of possible paths. This paper presents an approach that avoids this problem through the use of abstraction in characterizing the possible maneuvers of the ODV as a grammar of parameterized mobility behaviors and describing the terrain as a covering of object-oriented functional terrain features. The terrain features contain knowledge on how best to create mobility paths -- sequences of mobility behaviors -- through the object. Given an approximate map of the environment, the approach constructs a graph of mobility paths that link the location of the vehicle with the goals. The actual paths followed by the vehicle are determined by an A* search through the graph. The effectiveness of the strategy is demonstrated in actual tests with a real robotic vehicle.