At least three of the five senses must be fully addressed in a successful virtual reality (VR) system. Sight, sound, and touch are the most critical elements for creating the illusion of presence. Since humans depend so heavily on sight to collect information about their environment, this area has been the focus of much of the prior art in virtual reality. It is also crucial, however, to provide facilities for force, torque, and touch reflection, as well as for sound replay and 3-D localization. In this paper we present a sampling of hardware and software in the virtual environment maker's `toolbox' which can support the rapid assembly of customized VR systems. We provide demonstrative examples of how some of the tools work, and we speculate about VR applications and future technology needs.
A high-speed image transformation engine (ITE) was designed, and a prototype was built for use in a generic electronic light table and in image perspective transformation application code. The ITE takes any linear transformation, breaks it into two passes, and resamples the image appropriately for each pass. System performance is achieved by driving the engine with a set of lookup tables, computed at start-up time, for calculating pixel output contributions. Anti-aliasing is performed automatically in the image resampling process. Operations such as multiplications and trigonometric functions are minimized. This algorithm can be used for texture mapping, image perspective transformation, electronic light tables, and virtual reality.
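The two-pass decomposition can be illustrated in software. The sketch below is a simplification, not the ITE itself: it uses nearest-neighbour sampling in place of the engine's LUT-driven, anti-aliased resampling, and rotation as the example transformation, split into a horizontal pass followed by a vertical pass, each resampling along one axis only.

```python
import numpy as np

def two_pass_rotate(img, theta):
    """Rotate img about its centre via two 1-D resampling passes
    (a Catmull-Smith-style decomposition), nearest-neighbour sampling."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c, s, t = np.cos(theta), np.sin(theta), np.tan(theta)

    # Pass 1 (horizontal): map each row so that u = x*cos - y*sin, y unchanged.
    tmp = np.zeros_like(img)
    for y in range(h):
        ys = y - cy
        for u in range(w):
            us = u - cx
            x = (us + ys * s) / c            # invert u = x*c - y*s
            xi = int(round(x + cx))
            if 0 <= xi < w:
                tmp[y, u] = img[y, xi]

    # Pass 2 (vertical): map each column so that Y = u*tan + v/cos.
    out = np.zeros_like(img)
    for u in range(w):
        us = u - cx
        for Y in range(h):
            Ys = Y - cy
            v = (Ys - us * t) * c            # invert Y = u*t + v/c
            vi = int(round(v + cy))
            if 0 <= vi < h:
                out[Y, u] = tmp[vi, u]
    return out
```

Because each pass resamples along a single axis, the inner loop per output pixel reduces to a 1-D lookup, which is what makes a table-driven hardware realization attractive.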
The increased importance and sophistication of modern computer modeling capabilities have led to a need for high-quality databases to represent complex scenes. Such databases must contain complete and accurate three-dimensional descriptions of selected areas, along with material classifications and other relevant information. Typically, no single sensor input can provide such complete information; data from multiple sources must be integrated to construct the database. This paper surveys key issues in scene representation. In particular, it outlines methods to construct a database from different image and cartographic sources, describes the required database architecture and content, and presents an object-oriented representation approach which we call object hierarchy.
Most knowledge-directed vision systems are tailored to recognize a fixed set of objects within a known context. Generally, the programmer or knowledge engineer who constructs them begins with an intuitive notion of how each object might be recognized, a notion which is refined by trial-and-error. Eventually the programmer finds a combination of features (e.g., color, shape, or context) and methods (e.g., geometric model matching, minimum-distance classification, or generalized Hough transforms) that allow each object to be reliably identified within the domain. Unfortunately, human engineering is not cost-effective for many real-world applications. Moreover, there is no way to ensure the validity of hand-crafted systems: their performance (in terms of accuracy and reliability) is unknown, as is their efficiency in comparison to other strategies. Worst of all, when the domain is changed, the systems often have to be rebuilt from scratch. The schema learning system (SLS) automates the construction of knowledge-directed recognition strategies. Starting with a set of potential actions called `knowledge sources,' SLS builds a strategy capable of recognizing the first training instance. With each successive training image, SLS generalizes the strategy (if necessary) to account for any new examples, while keeping the strategy as efficient as possible. The final result is a strategy capable of recognizing every instance in a training set at a minimal cost.
Advances in the field of machine learning technology have yielded learning techniques with solid theoretical foundations that are applicable to the problems being encountered by object recognition systems. At Honeywell an object recognition system that works with high-level, symbolic, object features is under development. This system, named object recognition accomplished through combined learning expertise (ORACLE), employs both an inductive learning technique (i.e., conceptual clustering, CC) and a deductive technique (i.e., explanation-based learning, EBL) that are combined in a synergistic manner. This paper provides an overview of the ORACLE system, describes the machine learning mechanisms (EBL and CC) that it employs, and provides example results of system operation. The paper emphasizes the beneficial effect of integrating machine learning into object recognition systems.
This paper describes an investigation into the use of genetic algorithm techniques for selecting optimal feature sets in order to discriminate large sets of Arabic characters. Human experts defined a set of over 900 features from many different classes which could be used to help discriminate different characters from the Arabic character set. Each of the features was assigned a cost, based on the average amount of CPU time necessary to compute it for a typical character. The goal of the optimization was to find the subset of features which produced the best trade-off between recognition accuracy and computational cost. Using all of the features, or particular subsets, we obtained high recognition rates on machine-printed Arabic characters. Application of the genetic algorithm to selected subsets of characters and features demonstrates the ability of the method to significantly reduce the computational cost of the classification system and maintain or increase the recognition rate obtained with the complete set of features.
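The genetic-algorithm search described above can be sketched in a few lines. The version below is a toy stand-in, not the paper's system: the fitness function uses a hypothetical set of `useful` feature indices as a proxy for measured recognition accuracy, with a per-feature cost penalty playing the role of CPU time.

```python
import random

def ga_select(n_feats, useful, costs, pop_size=30, gens=40, seed=0):
    """Toy genetic algorithm for feature-subset selection.
    Individuals are bitstrings over the feature set; fitness rewards
    'useful' features and penalises per-feature cost. A real system
    would substitute measured recognition accuracy for the reward term."""
    rng = random.Random(seed)

    def fitness(bits):
        acc = sum(1 for i in useful if bits[i])                  # accuracy proxy
        cost = sum(costs[i] for i, b in enumerate(bits) if b)    # CPU-time proxy
        return acc - 0.5 * cost

    pop = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            # tournament selection of two parents
            a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n_feats)                      # 1-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_feats):                             # bit-flip mutation
                if rng.random() < 0.02:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

The weight on the cost term sets the accuracy/computation trade-off that the optimization explores.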
Mobile robot architectures have been based on many different design principles: AI, control theory, hierarchical organization, etc. Brooks argues for a `subsumption' approach, based on layers of very simple, real-time computations. The CMU Navlab project takes a more pragmatic approach. The bottom layer is real-time, based on local coordinates, with no high-level models or central data structures to be bottlenecks. But the architectural tools developed for the Navlab also provide hooks for a higher level, based in world coordinates and using AI planning, to control the lower layer.
Least-squares and robust methods were presented for determining the location and orientation of a mobile robot from visual measurements of modeled 3-D landmarks. However, building the 3-D landmark models is a time-consuming and tedious process. For landmark-based navigation methods to be widely applicable, automatic methods have to be developed to build new 3-D models and enhance the existing ones. Ideally, a robot would continuously build and update its world model as it explores the environment. This paper presents techniques to determine the 3-D location of image features from a sequence of monocular 2-D images captured by a camera mounted on the robot. The approach adopted here is to first build a partial model (possibly noisy) either manually, by stereo, or by tracking and reconstructing shallow structures over a sequence of images using the constraint of affine trackability. This model is subsequently used to compute the pose that relates the model coordinate system and the camera coordinate system of the image frames in the sequence. The unmodeled 3-D features (those not already in the model) are tracked over the image sequence and their 3-D locations recovered by a pseudo-triangulation process, a form of `induced stereo.' The triangulation process is also used to make new 3-D measurements of the initial model points. These measurements are then fused with the previous estimates to refine the set of initial model points.
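The triangulation step can be sketched in its simplest linear form. Assuming the pose computation has already supplied two 3x4 camera matrices (the hypothetical `P1`, `P2` below), a tracked feature's 3-D location follows from a small homogeneous least-squares problem; the paper's pseudo-triangulation over many frames generalizes this two-view case.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear two-view triangulation: recover the 3-D point X from its
    image projections x1, x2 under known 3x4 camera matrices P1, P2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],   # u1 * (row 3) - (row 1) = 0
        x1[1] * P1[2] - P1[1],   # v1 * (row 3) - (row 2) = 0
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)  # null vector of A is X in homogeneous form
    X = Vt[-1]
    return X[:3] / X[3]

def proj(P, X):
    """Pinhole projection of a 3-D point under camera matrix P."""
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]
```

With noisy tracks, the same system is solved in a least-squares sense, and repeated measurements can be fused to refine the estimate, as the abstract describes.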
We believe that the field of autonomous mobile robots is reaching a sufficient level of maturity that robust fieldable systems are technically possible. Such systems will draw upon both the ideas of deliberative planning and situated reaction (or situated skills). Situated skills will provide the system with the local competence required to handle the world's immediate uncertainties and carry out simple tasks. Deliberative planning, on the other hand, will serve to orchestrate the activation of situated skills in order to lead the system towards globally defined objectives. This paper outlines the mobile robot navigation problem and describes what is meant both by situated skills and deliberative planning. Finally, an outline of MITRE's current effort in developing a software architecture for autonomous mobile robots is presented.
A novel fast magnetic resonance imaging (MRI) technique is proposed. The technique is based on a proper design of the excitation profile. In particular, the excitation profile is chosen to be highly regular along its boundaries. It is shown that the free induction decay (FID) signals that result from such excitations decay much faster than those produced by traditional MRI approaches, as long as the underlying magnetization and a number of its derivatives are continuous. The proposed technique exploits this fact together with an exact model of the slowly decaying components of the FID signal (those produced by discontinuities in the magnetization or its derivatives) to produce fast, high-quality reconstructions of the underlying magnetization. Experimental results obtained with a 4 Tesla whole body scanner are included to demonstrate the viability of this approach.
The delineation of brain lesion boundaries in computerized tomography (CT) or magnetic resonance imaging (MRI) sequences is important in many medical research environments and clinical applications. For example, computer-aided neurosurgery requires the extraction of boundaries of lesions in a series of CT or MRI images in order to design the surgical trajectory and complete the surgical planning. Currently, in many clinical applications, the boundaries of lesions are traced manually. Manual methods are not only tedious but also subjective, leading to substantial inter- and intraobserver variability, and confusions between lesions and coexisting normal structures pose serious problems. Automatic detection of lesions is a nontrivial problem. Because of the low resolution, the border regions between lesions and normal tissues are typically of single-pixel width in CT images, and the intensity gradient at the lesion boundary varies considerably. These characteristics of lesions within CT images, in conjunction with the generally low signal-to-noise ratio of CT images, render simple boundary detection techniques inadequate. Recent work in the field of computer vision has shown multiscale analysis of objects in gray scale images to be effective in many applications. This paper describes and illustrates the application of multiscale morphological techniques to the delineation of brain tumors.
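The multiscale morphological idea can be illustrated with a minimal grayscale opening top-hat, using square structuring elements of increasing size. This is a generic sketch; the paper's actual operators and structuring-element shapes may differ.

```python
import numpy as np

def min_filt(img, r):
    """Grayscale erosion with a (2r+1) x (2r+1) square structuring element."""
    p = np.pad(img, r, mode='edge')
    h, w = img.shape
    return np.min([p[dy:dy + h, dx:dx + w]
                   for dy in range(2 * r + 1) for dx in range(2 * r + 1)], axis=0)

def max_filt(img, r):
    """Grayscale dilation with the same square element."""
    p = np.pad(img, r, mode='edge')
    h, w = img.shape
    return np.max([p[dy:dy + h, dx:dx + w]
                   for dy in range(2 * r + 1) for dx in range(2 * r + 1)], axis=0)

def top_hat(img, r):
    """Opening top-hat: bright structures that do not survive an opening
    at scale r, i.e. features narrower than the (2r+1)-pixel element."""
    return img - max_filt(min_filt(img, r), r)
```

Sweeping `r` yields a scale decomposition: a bright region first appears in the top-hat at the scale just exceeding its width, which is the handle multiscale analysis uses to separate lesions from coexisting structures of different sizes.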
A comparative analysis of the performance of five classifier types combined with four feature extraction techniques is presented for the automatic recognition of land use/cover categories from aerial imagery through texture analysis. The classification accuracies of the linear, Bayes quadratic, k-nearest neighbor, Parzen, and backpropagation-trained multi-layer perceptron classifiers are evaluated in combination with the following texture measures: spatial gray-level co-occurrence matrix, Laws, Liu-Jernigan, and Fourier domain rings and wedges. Examples of four land use/cover classes -- urban, fields, trees, and water -- are manually delineated from commercial aerial survey panchromatic images per the U.S. Geological Survey Land Use/Land Cover Classification System. Through leave-one-scene-out sampling, each classifier type is trained and tested using feature vectors generated by each feature extraction technique. The mean classification error and an 80% confidence interval are determined for each combination of classifier and feature extraction method. Error overlap is analyzed to assess the improvement in performance obtained by fusing the results from two or more classifier-feature set combinations. The significance of this work lies both in the results of the comparative analysis and in its adherence to formal experimental methodology. We anticipate that these results will be applicable to a wide variety of image recognition problems where texture is a principal discriminant, including medical screening, remote sensing, and materials identification.
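The experimental loop can be shown in miniature. The sketch below substitutes plain leave-one-sample-out for the paper's leave-one-scene-out protocol and synthetic Gaussian feature vectors for real texture measures; it compares a minimum-distance-to-class-mean classifier (a stand-in for the linear case) against k-NN.

```python
import numpy as np

def loo_error(X, y, classify):
    """Leave-one-out error rate for a classifier function
    classify(train_X, train_y, test_x) -> predicted label."""
    errs = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        if classify(X[mask], y[mask], X[i]) != y[i]:
            errs += 1
    return errs / len(X)

def nearest_mean(Xtr, ytr, x):
    """Minimum-distance-to-class-mean classifier."""
    labels = np.unique(ytr)
    means = np.array([Xtr[ytr == c].mean(axis=0) for c in labels])
    return labels[np.argmin(((means - x) ** 2).sum(axis=1))]

def knn(Xtr, ytr, x, k=3):
    """k-nearest-neighbour majority vote."""
    idx = np.argsort(((Xtr - x) ** 2).sum(axis=1))[:k]
    vals, counts = np.unique(ytr[idx], return_counts=True)
    return vals[np.argmax(counts)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 4)),       # class 0 'texture' features
               rng.normal(3, 1, (20, 4))])      # class 1, well separated
y = np.repeat([0, 1], 20)
e_nm, e_knn = loo_error(X, y, nearest_mean), loo_error(X, y, knn)
```

Repeating this loop over every classifier/feature-set pair, then comparing the per-pair error distributions, is the structure of the comparison reported above.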
We have developed a binary pattern matching system for the recognition of segmented logos and icons corrupted by random and correlated noise. It is based on the POLCAR transform. The POLCAR transform is invariant to pattern translation and converts size and orientation changes into translations. It has a high pattern discrimination ability, requires no training, and should be capable of running in real-time. The initial tests of this system are very encouraging.
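The abstract does not spell out the POLCAR transform's construction, but its stated property -- size and orientation changes becoming translations -- is the defining property of the standard log-polar mapping. A nearest-neighbour sketch of that mapping (with hypothetical grid parameters, not the actual POLCAR implementation):

```python
import numpy as np

def log_polar(img, n_rho=32, n_theta=32):
    """Resample a square image onto a log-polar grid about its centre.
    Rotation of the input becomes a circular shift along the theta axis;
    uniform scaling becomes a shift along the rho axis."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = min(cy, cx)
    out = np.zeros((n_rho, n_theta))
    for i in range(n_rho):
        r = r_max ** ((i + 1) / n_rho)          # logarithmic radius spacing
        for j in range(n_theta):
            a = 2 * np.pi * j / n_theta
            y = int(round(cy + r * np.sin(a)))  # nearest-neighbour sample
            x = int(round(cx + r * np.cos(a)))
            if 0 <= y < h and 0 <= x < w:
                out[i, j] = img[y, x]
    return out
```

A translation-invariant matcher (e.g. correlation) applied to this representation therefore tolerates rotation and scale changes of the logo or icon.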
The fundamental theorem of projective geometry states that the transformation mapping coplanar points from an object onto an image plane can be determined given the correspondence between four or more object points and their projections in the image. This theorem can be used to predict where other features in the object plane will appear in the image, and conversely, to project new image features back onto the object plane. In this paper we examine what happens when these mathematical results are applied to real-world data containing random sensor noise and deviations from coplanarity.
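The four-point construction is the classical direct linear transform (DLT) for a plane-to-plane projective map. A minimal sketch, with an illustrative `H_true` chosen only for the usage example:

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform: estimate the 3x3 projective map H with
    dst ~ H @ src from four or more (x, y) point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence contributes two linear constraints on H
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A, float))
    return Vt[-1].reshape(3, 3)          # null vector of A, up to scale

def apply_h(H, pt):
    """Map a 2-D point through H with the projective division."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With exact coplanar data, four correspondences determine H up to scale; with noisy or non-coplanar data the same system is solved in a least-squares sense, and how the solution degrades is precisely the question the paper examines.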
We are designing and building a prototype infrared tracking system using commercially available components and development platforms. This system has increased capabilities over past infrared trackers. Its advantage will be its ability to be configured for a variety of image processing and detection techniques on the same development platform, allowing easier system upgrades as new components become available.
A simple algorithm for the detection of a cluster of faint point signals in noise is described. The algorithm, which uses the properties of the amplitude statistics of the data, appears sufficiently robust so as to be applicable to a wide range of cases. Because the algorithm is not computationally intensive, it can be implemented with a small amount of hardwired logic and/or a simple microprocessor. It can also be used to supplement other more conventional cluster algorithms. Examples are shown using 15 to 50 points within a cluster area of 400 pixels. SNR for the individual points ranges between 1.0 and 2.5. For these examples, the algorithm improves the probabilities of detection and false detection and is equivalent to an increase in point signal SNR by a factor of 2 to 3.
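A toy sketch of this style of detector (not the paper's algorithm, with which it shares only the idea of exploiting amplitude statistics): count threshold exceedances per tile and flag tiles whose count is improbably large under a noise-only binomial model.

```python
import numpy as np

def detect_clusters(img, win=20, p_tail=0.05, z_min=3.5):
    """Flag win x win tiles whose count of high-amplitude pixels is
    improbably large under noise-only amplitude statistics. Exceedance
    counts in a tile are ~ Binomial(win*win, p_tail); a tile is flagged
    when its count sits z_min standard deviations above that mean."""
    thr = np.quantile(img, 1.0 - p_tail)     # amplitude threshold from the data
    hot = img >= thr
    n = win * win
    mu = n * p_tail
    sd = (n * p_tail * (1.0 - p_tail)) ** 0.5
    hits = []
    h, w = img.shape
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            k = hot[y:y + win, x:x + win].sum()
            if (k - mu) / sd > z_min:
                hits.append((y, x))
    return hits
```

Because the test pools many faint exceedances per tile, points individually near the noise floor can still produce a detectable cluster, which is the effective SNR gain the abstract reports.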