Mesoscale oceanographic features are important aspects of ocean circulation. The high volume of satellite-derived oceanographic data, coupled with the level of human skill required to detect oceanographic features in the data, has made it necessary to automate the interpretation process. Morphological edge detectors produce better results than conventional template- and differentiation-based edge detectors. A grayscale morphological edge-detection algorithm is developed for automatic delineation of mesoscale structure in digital satellite IR images of the ocean. We compare the performance of three morphological edge detectors on sea surface temperature fields and provide experimental results on images from the North Atlantic under various image settings.
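To make the technique concrete, the following is a minimal sketch of a grayscale morphological edge detector (the Beucher gradient, dilation minus erosion); the three detectors compared in the paper are not specified here, and the structuring-element size is an assumption.

```python
# A minimal sketch of a grayscale morphological edge detector (the
# Beucher gradient: dilation minus erosion). The 3x3 structuring
# element is an assumption, not the paper's configuration.
from scipy import ndimage

def morphological_gradient(sst, size=3):
    """Edge strength for a 2D sea-surface-temperature field `sst`."""
    dilated = ndimage.grey_dilation(sst, size=(size, size))
    eroded = ndimage.grey_erosion(sst, size=(size, size))
    return dilated - eroded  # large values mark sharp thermal fronts
```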
We have developed a method that uses genetic algorithms (GAs) to optimize rules for categorizing the terrain in Landsat data. A rule has two parts: a left side (the `if' clause) and a right side (the `then' clause). When the `if' clause is true, the functions in the `then' clause are executed to process the Landsat data. Examples of such functions include pixel-by-pixel thresholding and a linear combination of six bands. The optimized rules are used to identify different terrain categories within Landsat data. Optimization is performed by comparing the results of the rules with ground truth using an objective function that minimizes the number of false positives and false negatives. Rules that generate results close to the ground truth (i.e., that return few false positive and false negative pixel identifications) are highly rewarded and are used to create the next generation of rules. United States Geological Survey Land Use Land Cover data and an analyst's interpretation of the Landsat image were used as ground truth. The GA produced promising results for terrain categorization, and further work is planned to build on them.
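The objective function described above can be sketched as follows; the boolean-mask encoding of a rule's output and the sign convention are illustrative assumptions.

```python
# Sketch of the objective function described above: a candidate rule is
# scored by counting false-positive and false-negative pixels against
# ground truth. Boolean-mask encoding is an illustrative assumption.
import numpy as np

def fitness(rule_output, ground_truth):
    """Both arguments are boolean masks over the Landsat scene."""
    false_pos = np.sum(rule_output & ~ground_truth)
    false_neg = np.sum(~rule_output & ground_truth)
    return -(false_pos + false_neg)  # fewer errors -> higher reward
```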
The paper describes how, given a ridge-and-valley image of a fingerprint, a flow map can be extracted. The first step uses a unique grayscale algorithm based on 3D mathematical morphology to extract the ridge lines under a wide range of conditions. The paper then compares the results of finding the ridge flow using micropatterns, extended templates (on binary and grayscale images), and several well-known, traditional techniques. The question of spatial resolution is also addressed.
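For orientation, here is a sketch of one of the well-known, traditional techniques the paper compares against (per-block least-squares gradient orientation); this is not the paper's morphology-based method, and the block size is an assumption.

```python
# Sketch of one of the "well-known, traditional techniques" for ridge
# flow: per-block least-squares gradient orientation. This is not the
# paper's 3D-morphology method; block size is an assumption.
import numpy as np

def ridge_flow(img, block=16):
    gy, gx = np.gradient(img.astype(float))
    rows, cols = img.shape[0] // block, img.shape[1] // block
    flow = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sx = gx[i*block:(i+1)*block, j*block:(j+1)*block]
            sy = gy[i*block:(i+1)*block, j*block:(j+1)*block]
            # doubled-angle average of gradient direction; ridges run
            # perpendicular to the dominant gradient, hence + pi/2
            theta = 0.5 * np.arctan2(2 * (sx * sy).sum(),
                                     ((sx**2) - (sy**2)).sum())
            flow[i, j] = theta + np.pi / 2
    return flow
```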
Computer equipment originally designed to age-progress photographs of missing persons is also being employed in cases requiring facial reproduction from skeletal remains. The process increases interaction and communication between the artist and the anthropologist, producing realistic facial images rapidly and permitting alternative images or modifications. Other uses, such as photographic superimposition, have been successful in comparing antemortem photographs with recovered crania and mandibles. Composite drawings from witness descriptions are also prepared or modified on the computer rapidly and accurately, depending on the recall of the witness and the skill of the artist-interviewer.
This paper discusses the use of neural networks to locate regions of interest for fingerprint classification using feature-encoded fingerprint images. The target areas are those useful for the classification of fingerprints: whorls, loops, arches, and deltas. Our approach is to limit the amount of data a classification algorithm must consider by determining, with high accuracy, the areas most likely to contain features effective for classification. Several feature sets were analyzed, and successful preliminary results are summarized. Five feature sets were tested: (1) grayscale data, (2) binary ridges, (3) binary projection, and (4 & 5) 4- and 8-way directional convolutions. Four-way directional convolution produced accurate results with a minimal number of false alarms. All work was conducted using fingerprint data from NIST Special Database 4. The approach discussed here is also applicable to other general computer vision problems; in addition to fingerprint classification, an example of face recognition is provided to illustrate the generality of the algorithmic approach.
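The following sketch suggests what a 4-way directional convolution feature set might look like; the kernel length and normalization are assumptions, not the tested feature set's exact definition.

```python
# Sketch of 4-way directional convolution features: one response map
# per direction. Kernel length and normalization are illustrative
# assumptions.
import numpy as np
from scipy.ndimage import convolve

def directional_features(img):
    line = np.ones((1, 7)) / 7.0
    kernels = [line,                        # horizontal
               line.T,                      # vertical
               np.eye(7) / 7.0,             # one diagonal
               np.fliplr(np.eye(7)) / 7.0]  # the other diagonal
    return [convolve(img.astype(float), k) for k in kernels]
```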
The combination of hyperspectral imaging systems and neural networks is changing the approach to the challenging problem of automatic target recognition. This paper summarizes a research effort to demonstrate the usefulness of neural networks in processing hyperspectral imagery for target detection and segmentation.
Successful object classification is highly dependent upon initial segmentation of an object from its background. For complex, real-world imaging applications, this task is extremely challenging and critical to the success of the recognition system. Traditional object segmentation techniques often rely heavily upon noise removal during preprocessing and subsequently employ image-level segmentation strategies. Because effective noise-removal strategies are often difficult to develop for real-world imagery, alternative methods are required for object segmentation. One such approach is to determine the target/nontarget status of image regions at the pixel level; in this manner, noise removal and object segmentation are performed in a single process. The approach takes advantage of the large amount of information contained in present-day multispectral imagery. The key issues are a robust representation of pixel information and an information fusion algorithm to process that pixel-level information.
A hierarchical recognition methodology using abductive networks at several levels of object recognition is presented. Abductive networks--an innovative numeric modeling technology using networks of polynomial nodes--result from nearly three decades of application research and development in areas including statistical modeling, uncertainty management, genetic algorithms, and traditional neural networks. The system uses pixel-registered multisensor target imagery provided by the Tri-Service Laser Radar sensor. Recognition is performed at several levels--detection, classification, and identification--each providing more detailed object information. Advanced feature extraction algorithms are applied at each recognition level for target characterization. Abductive polynomial networks process feature information and situational data at each recognition level, providing input for the next level of processing. An expert system coordinates the activities of the individual recognition modules and enables the use of heuristic knowledge to overcome the limitations of a purely numeric processing approach. The approach can potentially overcome limitations of current systems, such as catastrophic degradation under unanticipated operating conditions, while meeting strict processing requirements. These benefits result from the implementation of robust feature extraction algorithms that do not take explicit advantage of peculiar characteristics of the sensor imagery, and from the compact, real-time processing capability provided by abductive polynomial networks.
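For readers unfamiliar with abductive networks, the following is a generic sketch of a single GMDH-style polynomial node; the actual node forms and network synthesis procedure of the system described above are not reproduced here.

```python
# Generic sketch of a single abductive (GMDH-style) polynomial node:
# a quadratic function of two inputs fit by least squares. The actual
# node forms and network synthesis procedure are not reproduced here.
import numpy as np

def fit_poly_node(x1, x2, y):
    """Fit y ~ w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1**2 + w5*x2**2."""
    A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w  # node output on new data is A_new @ w
```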
This work concerns computationally efficient computer vision methods for searching for and identifying small objects in large images. The approach combines neural network pattern recognition with pyramid-based coarse-to-fine search in a way that eliminates the drawbacks of each method used by itself and, in addition, improves object identification by learning and exploiting the low-resolution image context associated with the objects. The presentation describes the system architecture and its performance on illustrative problems.
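A minimal coarse-to-fine sketch follows; `score_fn` stands in for the neural network (an assumption), and the decimation factor, window size, and threshold are placeholders.

```python
# Minimal coarse-to-fine sketch: scan a decimated pyramid level with a
# cheap scoring function, then re-examine only promising locations at
# full resolution. `score_fn` stands in for the neural network (an
# assumption) and is assumed to accept patches at either resolution.
def coarse_to_fine(img, score_fn, factor=4, thresh=0.5, win=16):
    coarse = img[::factor, ::factor]  # img: 2D numpy array
    cw = win // factor
    hits = []
    for i in range(coarse.shape[0] - cw):
        for j in range(coarse.shape[1] - cw):
            if score_fn(coarse[i:i + cw, j:j + cw]) > thresh:
                y, x = i * factor, j * factor  # refine at full resolution
                hits.append((y, x, score_fn(img[y:y + win, x:x + win])))
    return hits
```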
Automated intensified cameras for low-light-level imaging are conventionally operated to maximize the large-area signal-to-noise ratio of the output; in part, this stems from the desired ruggedness of the cameras and from historical technical limitations. A review of the current status of automated intensified camera technology suggests that alternative methods of automated operation are both feasible and desirable: maximizing the signal-to-noise ratio across a range of low light levels requires compromising other parameters that influence the quality of the final output image, and the resulting subjective image quality may not be the best attainable. The two opposing factors in low-light-level imaging are noise and image blur, determined respectively by the photon statistics plus image-intensifier gain and by the optical transfer function (OTF) of the camera. The results of psychophysical techniques used to determine subjective detection thresholds in images containing known amounts of noise and blur facilitate the optimization of image quality in an intensified camera, through correlation with the OTF and noise power spectra measured for low-light-level operation of the camera.
Polarization vision has recently been shown to simplify some important image understanding tasks that can be very difficult to perform with intensity vision. This, together with the more general capabilities of polarization vision for image understanding, motivates the building of camera sensors that automatically sense and process polarization information. This paper describes the design of a liquid crystal polarization camera sensor that has been built to automatically sense partially linearly polarized light and to process the sensed polarization information at pixel resolution, producing a visualization of reflected polarization from a scene and/or a visualization of physical information in the scene directly related to the sensed polarization. Because the sensory input to polarization camera sensors subsumes that of standard intensity cameras, they can significantly expand the application potential of computer vision for object detection. A number of images taken with polarization cameras are presented, showing potential applications to image understanding, object recognition, circuit board inspection, and marine biology.
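As background, partial linear polarization can be recovered from three intensity measurements through a polarizer at 0, 45, and 90 degrees using the standard Stokes-parameter relations; the camera's actual liquid crystal readout sequence may differ from this sketch.

```python
# Sketch of recovering partial linear polarization from three intensity
# images taken through a polarizer at 0, 45, and 90 degrees (standard
# Stokes-parameter relations); the camera's actual liquid crystal
# readout sequence may differ.
import numpy as np

def linear_polarization(i0, i45, i90):
    s0 = i0 + i90                 # total intensity
    s1 = i0 - i90
    s2 = 2 * i45 - s0
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-9)  # degree of polarization
    aop = 0.5 * np.arctan2(s2, s1)                        # angle of polarization
    return dolp, aop
```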
This paper presents a brief overview of our research in the development of an OCR system for the recognition of machine-printed text in languages that use the Arabic alphabet. The cursive nature of machine-printed Arabic makes the segmentation of words into letters a challenging problem. In our approach, a novel preliminary segmentation technique breaks a word into pieces that, in general, may not individually represent valid letters. Neural networks trained on a sample set of about 500 Arabic text images are used to recognize these pieces. The rules governing the alphabet and character-level contextual information are used to recombine the pieces into valid letters. Higher-level contextual analysis schemes, including the use of an Arabic lexicon and n-grams, are also under development and are expected to improve word recognition accuracy. The segmentation, recognition, and contextual analysis processes are closely integrated through a feedback scheme. The details of the preparation of the training set and some recent results on training the networks are presented.
Segmentation is a key step in current OCR systems; it has been estimated that half the errors in character recognition are due to segmentation. We have developed a novel approach that performs OCR without the segmentation step. The approach starts by extracting significant geometric features from the input document image of the page. Each feature then `votes' for the character that could have generated it. Thus, even if some of the features are occluded or lost to degradation, the remaining features can successfully identify the character. In extreme cases, the degradation may be severe enough to prevent recognition of some of the characters in a word; in such cases, we use a lexicon-based word recognition technique to resolve the ambiguity. The inexact matching and probabilistic evaluation used in the technique allow us to identify the correct word from a partial set of detected characters. This paper first presents an overview of our segmentation-free OCR system and then focuses on the word-recognition technique. Preliminary experimental results show that this is a very promising approach.
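The voting step can be sketched as follows; the feature extraction and the feature-to-character table are assumed, not taken from the paper.

```python
# Sketch of the feature-voting idea: each extracted feature votes for
# every character that could have generated it, so surviving features
# can still identify a partially degraded character. The feature
# extraction and the feature-to-character table are assumed.
from collections import Counter

def classify(features, feature_to_chars):
    votes = Counter()
    for f in features:
        for ch in feature_to_chars.get(f, ()):  # characters consistent with f
            votes[ch] += 1
    return votes.most_common(1)[0][0] if votes else None
```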
One high-payoff application of image character recognition (ICR) technology is the automatic processing of business transaction documents. For remittance types of transactions, these documents consist of checks and invoices. While the general layout of a check is consistent with respect to the types of data fields present, there is variability in the locations of the data fields printed on the document. Furthermore, checks are printed using multiple fonts, character sizes, and line spacings on a single document. In contrast, invoice contents and layouts vary widely from business to business. The high degree of layout variability in business documents poses significant problems for data field location and extraction in preparation for ICR. This paper describes an image understanding approach for locating and extracting data fields from various business documents. The approach is shown to be tolerant of image rotation, multiple fonts, multiple character sizes, and varied line spacing. Actual checks and invoices are processed using the described algorithms, and the resulting field isolation and extraction performance, together with the resulting ICR read rates, are statistically analyzed and presented.
Document database preparation is a very time-consuming job that usually requires the involvement of many people, and any database is prone to errors however carefully it is constructed. To assure the high quality of a document image database, a carefully planned implementation methodology is essential. In this paper, the implementation methodology we employed to produce the UW English Document Image Database I is described. The paper also discusses how to estimate the distribution of errors contained in a database based on a double-data-entry/double-verification procedure.
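One common estimator for residual errors under double entry is capture-recapture (Lincoln-Petersen); whether the UW database effort used exactly this estimator is an assumption, and the sketch below is illustrative only.

```python
# One common estimator for residual errors under double entry:
# capture-recapture (Lincoln-Petersen). Whether the UW database used
# exactly this estimator is an assumption.
def estimated_errors_remaining(n_a, n_b, n_both):
    """n_a, n_b: errors found by each pass; n_both: found by both."""
    total = n_a * n_b / max(n_both, 1)  # estimated total error population
    found = n_a + n_b - n_both          # errors actually caught
    return total - found                # errors likely still undetected
```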
This paper describes the technologies and strategies underlying a state-of-the-art system for automatic handwritten address interpretation. The system is capable of interpreting both street addresses and post office box addresses. The input to the system is a grayscale image of a handwritten address, and the goal is to determine the ZIP+4 code corresponding to the destination address on the mail piece. Processing is accomplished through an integrated series of steps involving preprocessing, numeral field recognition (ZIP codes, street numbers, post office box numbers), national postal database retrieval, word and phrase recognition, database record matching, and a decision strategy. In a formal test, this system encoded 38.7 percent of the mail pieces, with an encode error rate of 8.4 percent. Adjusting system parameters designed to trade off encode rate against error rate produces an encode rate of 33.8 percent with a 3.9 percent encode error rate.
This research explores the interaction of textual and photographic information in document understanding. Specifically, it presents a computational model whereby textual captions are used as collateral information in the interpretation of the corresponding photographs. The final understanding of the picture and caption reflects a consolidation of the information obtained from the two sources and can thus be used in intelligent information retrieval tasks. The problem of performing general-purpose vision without a priori knowledge is very difficult at best. The concept of using collateral information in scene understanding has been explored in systems that use general scene context in the task of object identification; the work described here extends this notion by incorporating picture-specific information. A multistage system, PICTION, which uses captions to identify humans in an accompanying photograph, is described. It provides a computationally less expensive alternative to traditional methods of face recognition, since it does not require a prestored database of face models for all the people to be identified. A key component of the system is its use of spatial and characteristic constraints (derived from the caption) in labeling face candidates (generated by a face locator).
Sensor-based vehicle guidance systems are attracting rapidly increasing interest because of their potential for increasing safety, convenience, environmental friendliness, and traffic efficiency. Examples of applications include intelligent cruise control, lane following, collision warning, and collision avoidance. This paper reviews research trends in vision-based vehicle guidance, with an emphasis on VLSI chip implementations of the vision systems. As an example of VLSI chips for vision-based vehicle guidance, a stereo vision system is described in detail.
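For context, the core computation in such a stereo system is illustrated below with sum-of-absolute-differences block matching; real VLSI implementations pipeline this heavily, and the window and search range here are assumptions.

```python
# Minimal sketch of the core computation in a stereo vision system:
# sum-of-absolute-differences (SAD) block matching along a scanline.
# Window size and search range are illustrative assumptions.
import numpy as np

def disparity_at(left, right, y, x, win=4, max_d=32):
    ref = left[y - win:y + win + 1, x - win:x + win + 1].astype(int)
    best_d, best_cost = 0, np.inf
    for d in range(min(max_d, x - win) + 1):
        cand = right[y - win:y + win + 1,
                     x - d - win:x - d + win + 1].astype(int)
        cost = np.abs(ref - cand).sum()   # SAD matching cost
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d  # larger disparity = nearer object
```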
We have developed a radial basis function network (RBFN) for visual autonomous road following at the University of Maryland Computer Vision Laboratory. Preliminary testing of the RBFN was done using a driving simulator, and the network was then installed on an actual vehicle at Carnegie-Mellon University for testing in a real road-following application. The RBFN had some success, but it exhibited significant problems such as jittery control and driving failures. Several improvements have been made to the original RBFN architecture to overcome these problems, and they are described in this paper.
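A generic RBFN forward pass is sketched below; the Maryland architecture's unit count, training rule, and input encoding are not reproduced here.

```python
# Generic radial basis function network sketch: Gaussian hidden units
# and a linear output layer. The Maryland architecture's unit count,
# training rule, and input encoding are not reproduced here.
import numpy as np

def rbfn_output(x, centers, widths, weights, bias=0.0):
    """x: input feature vector; one Gaussian unit per row of `centers`."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    phi = np.exp(-d2 / (2 * widths ** 2))  # hidden-unit activations
    return phi @ weights + bias            # e.g., a steering command
```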
The traditional approach to visual road following involves reconstructing a 3D model of the road. The model is in a world- or vehicle-centered coordinate system, and it is symbolic, iconic, or a combination of both; road-following commands (as well as other commands, e.g., for obstacle avoidance) are then generated from this 3D model. Here we discuss an alternative approach in which a minimal road model is generated. The model contains only task-relevant information, and a minimum of vision processing is performed to extract this information in the form of visual cues represented in the 2D image coordinate system. This approach leads to rapid and continuous updating of the road model from the visual data, and it results in inexpensive, fast, and robust computations. Road following is achieved by servoing on the visual cues in the 2D model, yielding a tight coupling of perception and action. Two specific examples of road following that use this approach are presented. In the first, we show that road-following commands can be generated from visual cues consisting of the projection into the image of the tangent point on the edge of the road, along with the optical flow of this point; using this cue, the resulting servo loop is very simple and fast. In the second, we show that lane markings can be robustly tracked in real time while confining all processing to the 2D image plane. Neither knowledge of vehicle motion nor a calibrated camera is required. This system has been used to drive a vehicle at speeds up to 80 km/h under various road conditions, with the algorithm running at a 15 Hz update rate.
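An illustrative image-plane servoing law for the lane-tracking example might look like the following; the gains and the exact control law used in the paper are assumptions.

```python
# Illustrative 2D image-plane servoing law for lane keeping: steer
# proportionally to the tracked marking's lateral offset and slope in
# the image. Gains are placeholders, not the paper's control law.
def steering_command(lane_x, lane_slope, image_center_x, kp=0.01, kd=0.5):
    offset = lane_x - image_center_x  # pixels off the desired position
    return -kp * offset - kd * lane_slope
```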
Many digital imaging applications require the detection of subtle localized changes in a sequence of background scenes. Often the principal limitation to the process is uncontrollable pointing changes in an electro-optic sensor, which result in apparent image displacements in the sequence. Interpolation of one of the images followed by subtraction from another has served as a mainstay for change detection, but this method is extremely suboptimal within the general context of linear filtering. Conventional registration/subtraction is replaced in this report by dual difference filtering (DDF), a symmetric method that generalizes the concept of interpolation. Over a wide range of images, DDFs have been shown to increase the signal-to-clutter ratio for small moving targets by an average of 31 dB compared to older, interpolative methods. A fundamental optimization equation for DDFs is derived, and a solution is presented for a spatial spectrum typical of imagery. DDFs are shown to permit the identification of subtle differences in image sequences that were not detectable with previous methods. It is also shown that, in principle, all first-order aliasing errors can be eliminated by using DDFs. Applications include medical imaging, autonomous and cued surveillance, remote sensing, and astronomy.
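For contrast, the conventional interpolate-and-subtract baseline that DDF replaces can be sketched as follows; bilinear interpolation via scipy is used here purely for illustration, and DDF itself is not reproduced.

```python
# The conventional baseline that DDF replaces: shift one frame onto the
# other by interpolation, then subtract. Bilinear interpolation via
# scipy is used here purely for illustration.
from scipy.ndimage import shift

def interp_subtract(frame_a, frame_b, dy, dx):
    """Registers frame_b to frame_a by (dy, dx), then differences."""
    registered = shift(frame_b.astype(float), (dy, dx), order=1)
    return frame_a.astype(float) - registered  # residual change map
```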
Automatic object recognition is a difficult and as yet unsolved problem. There exist, however, a number of benign circumstances in which a great deal is known about all aspects of the problem--enough, in principle, to predict exactly the appearance of the image given a small amount of information about the few objects present and their locations. In such cases, it is rather straightforward to reverse the process and infer from the image the identity and locations of the objects.