There are now a wide variety of image segmentation techniques, some considered general purpose and some designed for specific classes of images. These techniques can be classified as: measurement space guided spatial clustering, single linkage region growing schemes, hybrid linkage region growing schemes, centroid linkage region growing schemes, spatial clustering schemes, and split-and-merge schemes. In this paper, we define each of the major classes of image segmentation techniques and describe several specific examples of each class of algorithm. We illustrate some of the techniques with examples of segmentations performed on real images.
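As an illustration of one of the classes named above, the following is a minimal sketch of single linkage region growing, assuming a gray-level image stored as a list of rows, 4-connectivity, and a simple absolute-difference linking test. The function name `single_linkage_regions` and the parameter `max_diff` are hypothetical and are not taken from the paper.

```python
from collections import deque


def single_linkage_regions(image, max_diff):
    """Label regions by linking 4-connected neighbors whose gray levels
    differ by at most max_diff (hypothetical linking criterion)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labeled"
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue
            # Start a new region at the first unlabeled pixel in scan order.
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and abs(image[nr][nc] - image[r][c]) <= max_diff):
                        # The link depends only on the two adjacent pixels.
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels
```

Because each link is decided from a single pair of neighboring pixels, a slow gradient can chain very different gray levels into one region; a one-row image such as `[[0, 1, 2, 3, 4]]` with `max_diff=1` comes out as a single region.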
In this paper we consider the analysis of complex scenes such as those that commonly appear in aerial scenes of urban areas. In developing a computer vision system for these scenes we have had to address several problem areas. The first problem area is to structure a vision system so that all possible information is utilized in the analysis of the scene. Examples are spatial and hierarchical relationships that can occur in the scenes. Another problem is to develop operators that can interrogate the scene data and derive useful information about the presence of objects that may appear in the scene. When one considers the great variety of patterns that may occur in aerial scenes, it is apparent that this is a formidable problem. It seems unlikely that traditional methods of edge detection or spectral classification will suffice. It is imperative that one have appropriate operators since no vision system will perform properly without relevant information returned from the image data. Another problem area is devising methods for integrating the information returned from the operators to obtain the best possible interpretation of the scene. In this paper we discuss our work on the first two of these problems.
The SLICE segmentation system identifies image regions that differ in gray-level distribution, color, spatial texture, or some other local property. This report concentrates on textured-image segmentation using local texture-energy measures and user delimited training regions. Knowledge of target textures or signatures is combined with knowledge of background textures by using histogram-similarity transforms to identify regions that are similar to a target texture and dissimilar to other textures.
Current edge detection techniques are often insensitive to some perceptually significant edges in an image. Similarly, segmentation techniques do not always segment an image into perceptually meaningful regions. Edge detection and segmentation algorithms can be improved by incorporating the natural constraints used by humans in the perception of line drawings. To uncover some of these natural constraints, psychophysical experiments were performed. The results show that the perceptibility of individual lines in a drawing depends upon the presence of particular local features. First, short lines appear to have lower contrast than long lines. Second, there is a hierarchy of three types of connections between the ends of line segments, and the perceived contrast of a line increases by different amounts, depending upon which types of end connections are present. The natural constraints uncovered in the psychophysical experiments were incorporated into a computer vision module which selectively enhances and segments line drawings. The input to the module is a drawing containing straight and/or curved lines, and the output is an enhanced and segmented version of the line drawing. The computations in the module are local and can be performed in parallel. They are efficient because of the simplicity of the operations involved, and because they are performed just once; no iterative relaxation is required. The use of natural constraints results in lines being enhanced in accordance with their perceptual significance. For example, lines which are part of the outer contours of objects are enhanced more than other lines, while lines that are part of object edges are enhanced more than lines which form part of textures or noise. In addition, when one object occludes another, the object in the foreground is enhanced relative to the occluded object. When objects are accidentally aligned, the segmentation computed by the module agrees with the segmentation used by humans.
By using the natural constraints, many tasks which previously had been thought to require domain-dependent knowledge can be performed using data-driven, bottom-up processing.
A forward-chaining production system has been constructed to analyze sequences of forward-looking infrared (FLIR) images. The system exploits relationships between various scene elements to aid in the identification of small objects. The system also integrates semantic considerations into the process of object/background segmentation. Parameters are first set in a standard edge-based segmenter so as to produce an over-segmentation of the image. A set of production rules is used to merge segments and adjust parameters depending on the initial interpretations of the segments, relationships between segments, and on interpretations of the final segments. Results compare favorably with those obtained from the same standard segmenter used with parameters optimized for interpretation-blind operation.
This paper discusses an ongoing development effort to utilize knowledge based techniques for analyzing terrain. The application focuses on analyzing terrain to determine the likely location of various military forces and to determine the movement (traversability) and likely classifications of those forces. The effort has involved (i) the development of representations of terrain objects that facilitate automated reasoning and (ii) the development of a rule based system that reasons about the terrain objects and develops hypotheses about terrain characteristics. The system is largely goal directed and can answer questions such as: Is this a good location for a headquarters unit? Is there a traversable path (by, say, a tank) between points A and B? Where are likely locations for artillery sites? The rules from the system were obtained from domain experts on military tactics. The terrain representation is semantically oriented (rather than grid oriented). The scene is represented as a series of "trees" (using largely quadtree and k-d tree formulations) with each tree representing a terrain feature. Trees are logically "combined" to fire more complex rules. The data in the system can be derived from DMA data bases or other sources.
This paper presents a computer vision system for identification of overlapping workpieces. The system consists of preprocessing, model training and workpiece separation. To perform workpiece separation, structure information of both boundary and shadow is utilized. The proposed separation procedure consists of two steps. The first step locates the position and the orientation of the workpieces and groups the segments belonging to the same workpiece together for further use. The second step identifies the top workpiece through a hypothesis and verification procedure. The hypothesis about the top workpiece is based on the information from the first step. The inside edge, which contains partial boundary of the top workpiece, is used to verify the hypothesis. A database is established for representing the structure and geometric properties of the workpieces. Three frames represent the outer boundary, the inside edge and a single workpiece model, respectively, and they compose a three-frame system. Experimental results are presented for the scenes with two or three overlapping workpieces.
A vision system is described for depalletizing steel cylindrical billets for a forging application. An algorithm for accurately locating and measuring the billets is described in some detail. Highlights of this discussion include an algorithm for adaptive threshold selection to accommodate changing image brightnesses and a special robot calibration procedure which enables inference of depth using only a single camera view combined with prior knowledge about the scene. Experimental results are presented which show that the system provides accurate measurements in spite of poor, inconsistent contrast.
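The abstract does not say which adaptive threshold selection algorithm is used. As a hedged illustration of the general idea only, the sketch below uses a well-known iterative mean-split scheme (often attributed to Ridler and Calvard) that re-estimates the threshold from the image itself, so the result follows changes in overall brightness. The function name `iterative_threshold` and the parameters `tol` and `max_iter` are assumptions made for this example.

```python
def iterative_threshold(pixels, tol=0.5, max_iter=100):
    """Pick a gray-level threshold halfway between the mean of the dark
    pixels and the mean of the bright pixels, repeated until stable.
    This is a swapped-in textbook method, not the paper's algorithm.
    Assumes pixels is a non-empty sequence of numbers."""
    t = sum(pixels) / len(pixels)  # start from the global mean
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t  # only one class present; nothing to separate
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t
```

Calling the function on a bright frame and again on a darker frame of the same scene yields a different threshold for each, which is the property an adaptive scheme needs when image brightness drifts.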
Industrial robots are now proven technology in a variety of applications including welding, materials handling, spray painting, machine loading and assembly. However, to fully realize the potential of these universal manipulators, "intelligence" needs to be added to the industrial robot. This involves adding sensory capability and machine intelligence to the controls. The "intelligence" may be added externally or as integral components of the robot. These new "intelligent robots" promise to greatly enhance the versatility of the robot for factory applications. The purpose of this paper is to present a brief review of the techniques and applications of intelligent robots for factory automation and to suggest possible designs for the intelligent robot of the future.
Determining the planar orientation of the ground surface provides useful additional cues for enhanced performance in various image processing applications. For example, it can provide conflict resolution in target aspect and thus improvement in target classification. It can help in ground vehicle navigation where extremely high slopes or banks would reduce the traversability of the terrain for the vehicle. Similarly, it can also be useful in nap-of-the-earth air vehicle navigation through terrain following and terrain avoidance. This paper discusses a passive technique to estimate ground plane orientations for scene interpretation. The scene registration problem is handled through a two-dimensional segment-free differential operator. Ranges to points in the scene are computed via an optical flow generation technique and the scene reconstruction is accomplished by a combination of first and second order gradients of the range map.
This paper presents a total system approach to image exploitation. An ensemble of automatic object recognizers (AORs) is described, each with a unique partition of the image being exploited. These AORs are individually tailored to a specific task, thereby limiting their computational requirements. Understanding the contextual content of the image is an important feature of this strategy. The extent of this contextual knowledge is used to partition the image, thereby restricting the range of each AOR. An autonomous navigation rule system is involved to register the image with a prior scene model knowledge base. This strategy uses a knowledge of sensor geometry and geographic area to determine an estimate of the absolute scene location. Image registration is refined by deriving a scene model of the image under exploitation. Performing a minimum distance graph measure determines the difference between the derived key scene features and the stored feature set in the knowledge base. An absolute scene partition that subdivides the terrain into regions used to direct the AORs is recorded along with the matching scene graph in the knowledge base. This strategy permits exploitation of imagery from various sensors regardless of the type of sensor used to build the knowledge base.
The acronym DIMAPS stands for the group of experimental Digital Image Manipulation, Analysis and Processing Systems developed at the IBM Scientific Center in Palo Alto, California. These are FORTRAN-based, dialog-driven, fully interactive programs for the IBM 4341 (or equivalent) computer running under VM/CMS or MVS/TSO. The work station consists of three screens (alphanumeric, high-resolution vector graphics, and high-resolution color display), plus a digitizing graphics tablet, cursor controllers, keyboards, and hard copy devices. The DIMAPS software is 98% FORTRAN, thus facilitating maintenance, system growth, and transportability. The original DIMAPS and its modified versions contain functions for the generation, display and comparison of multiband images, and for the quantitative as well as graphic display of data in a selected section of the image under study. Several functions for performing data modification and/or analysis tasks are also included. Some high-level image processing and analysis functions such as the generation of shaded-relief images, unsupervised multispectral classification, scene-to-scene or map-to-scene registration of multiband digital data, extraction of texture information using a two-dimensional Fourier transform of the band data, and reduction of random noise from multiband data using phase agreement among their Fourier coefficients, were developed as adjuncts to DIMAPS.
Thinning algorithms, such as the prairie fire or Medial Axis Transformation (MAT) algorithm, are used to extract structure-preserving networks or skeletons from segmented imagery. This is a useful function in image understanding applications where syntactical representation of object shapes is desired. The MAT has several shortcomings, however. The MAT skeletons thinned from two similar shapes may be structurally different due to the introduction of random noise into the segmentation process. This noise may exist as random "holes" within the segmented shape, as minor contour variations, or as spatial quantization effects. This problem is often solved by filtering the image prior to segmentation or thinning, but fine detail may be lost as a result. A syntactical method of removing these noise artifacts from image skeletons and of inferring a unique structure is demonstrated. The algorithms for performing this syntactical processing are coded in LISP. Conditions under which image processing functions are served best by the LISP environment are discussed. Image enhancement and noise are discussed in terms that embrace statistical and syntactical methods of image processing.
The paper describes a pattern recognition method based on syntactic image analysis applicable in autonomous systems of robot vision for the purpose of pattern detection or classification. The discrimination of syntactic elements is realized by polygonal approximation of contours employing a very fast algorithm based upon coding, local pixel logic and methods of choice instead of numerical methods. Semantic information is derived from attributes calculated from the filtered shape vector. No a priori information on image objects is required, and the choice of starting point is determined by finding the significant directions on the shape vector. The radius of the recognition sphere is the minimum Euclidean distance, i.e. maximum similarity, between the unknown model and each individual grammar created in the learning phase. By keeping information on derivations of individual syntactic elements, an alternative of parsing recognition is left. The analysis is very flexible, and permits the recognition of highly distorted or even partially visible objects. The output from the syntactic analyzer is the measure of irregularity, and the method is thus applicable in any application where sample deformation is being examined.
A structure for a visual tracking system is presented which relies on information developed from previous tracking scenarios stored in a knowledge base to enhance tracking performance. The system is comprised of a centroid tracker front end which supplies segmented image features to a data reduction algorithm which holds the reduced data in a temporary data base relation. This relation is then classified via two separate modes, learn and track. Under learn mode, an external teacher-director operator provides identification and weighting cues for membership in a long-term storage relation within a knowledge base. Track mode operates autonomously from the learn mode where the system determines feature validity by applying fuzzy set membership criteria to previously stored track information in the database. Results determined from the classification generate tracker directives which either enhance or permit current tracking to continue or cause the tracker to search for alternate targets based upon analysis of a global target tracking list. The classification algorithm is based on correlative analysis of the tracker's segmented output presentation after low pass filtering derives lower order harmonics of the feature. The fuzzy set membership criteria are based on size, rotation, frame location, and past history of the feature. The first three factors are linear operations on the spectra, while the last is generated as a context relation in the knowledge base. The context relation interlinks data between features to facilitate tracker operation during feature occlusion or presence of countermeasures.
The advent of advanced computer architectures for parallel and symbolic processing has evolved to the point where the technology currently exists for the development of prototype autonomous vehicles. Control of such devices will require communication between knowledge-based subsystems in charge of the vision, planning, and conflict resolution aspects necessary to make autonomous vehicles functional in a real world environment. This paper describes a heuristic route planning system capable of forming the planning foundation of an autonomous ground vehicle. The route planner described herein is applicable to a variety of natural terrain ground systems such as autonomous tactical vehicles and mobile robot sentries. The route planner discussed consists of four processing stages: (1) scene recognition, (2) route planning, (3) scene matching, and (4) knowledge-based validation. Each of these stages is discussed and examples are provided where appropriate.
Human photo-interpreters use expert knowledge and contextual information to help them analyze a scene. We have experimented with the Lockheed Expert System (LES) to see if contextual information can be useful in interpreting aerial photographs. First, the grey-scale image is segmented into uniform or slowly-varying intensity regions or contiguous textured regions using an edge-based segmentation technique. Next, the system computes a set of attributes for each region. Some of these attributes are based on local properties of that region only (e.g., area, average-intensity, texture-strength, etc.), while others are based on contextual or global information (e.g., adjacent-regions and nearby-regions). Finally, LES is given the task of classifying all the regions using the attribute values. It makes use of multiple goals and multiple rule sets to determine the best classification; regions that do not satisfy any of the rules are left unclassified. Unlike programs which use statistical methods, LES uses contextual information such as the fact that cars are likely to be adjacent to roads, which significantly improves its performance on regions which are difficult to classify.
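The two kinds of attributes described above (local and contextual) can be sketched as a tiny two-pass rule classifier. This is only an illustrative sketch and not the LES rule base: the function name `classify_regions`, the attribute keys, and every threshold below are hypothetical values chosen for the example.

```python
def classify_regions(regions):
    """Classify regions with one local rule and one contextual rule.

    regions maps a region name to a dict with hypothetical keys:
    'area', 'intensity', 'elongation', and 'adjacent' (neighbor names).
    Regions that satisfy no rule are simply absent from the result.
    """
    labels = {}
    # Pass 1: local rule, using only the region's own attributes.
    for name, r in regions.items():
        if r["elongation"] > 5 and r["intensity"] < 100:
            labels[name] = "road"
    # Pass 2: contextual rule, using labels of adjacent regions.
    for name, r in regions.items():
        if name in labels:
            continue
        touches_road = any(labels.get(n) == "road" for n in r["adjacent"])
        if r["area"] < 50 and touches_road:
            labels[name] = "car"
    return labels
```

Two small bright regions with identical local attributes receive different outcomes here: the one adjacent to a road is labeled a car, and the isolated one is left unclassified, which is the effect of contextual information that the abstract describes.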
Industry today has a severe problem in the automatic testing of analog cards. At the Air Force Institute of Technology, we are developing an Expert System based on the structure and function of an analog circuit card to drive automatic test equipment. This system uses the information contained in the schematic diagram of the circuit as well as fundamental knowledge of electronics and past experience in maintaining the card. One of the most important aspects of this system is its ability to reason about possible faults based upon the function of the subsections of the circuit. This task is accomplished using the type of "second principles" which an electronic engineer would use. It generates the tests that the test equipment will conduct and, based upon the results of these tests, determines the best test to perform next. In this paper, the basic reasoning mechanism for these systems is discussed.
This paper describes and discusses an airborne mission/route planner system currently under development at Lockheed-Georgia Co. Some of the tasks performed by this system require reasoning symbolically with subjective, incomplete information. Other tasks require precise, synchronized processing of aircraft control parameters. Still other tasks are a blend of these two extremes. This paper presents the design and implementation approach which is being followed to develop this system.
This paper concerns search as an aspect of Artificial Intelligence (AI). In the paper, five topics are called out as necessities to develop AI. The one that we are discussing appears there as the first of the five, and that is for good reason: the role of search is fundamental.
Given a set of image-derived vehicle detections and/or recognized military vehicles, SIGINT cues and a priori analysis of terrain, the force structure analysis (FSA) problem is to utilize knowledge of tactical doctrine and spatial deployment information to infer the existence of military forces such as batteries, companies, battalions, regiments, divisions, etc. A model-based system for FSA has been developed. It performs symbolic reasoning about force structures represented as geometric models. The FSA system is a stand-alone module which has also been developed as part of a larger system, the Advanced Digital Radar Image Exploitation System (ADRIES) for automated SAR image exploitation. The models recursively encode the component military units of a force structure, their expected spatial deployment, search priorities for model components, prior match probabilities, and type hierarchies for uncertain recognition. Partial and uncertain matching of models against data is the basic tool for building up hypotheses of the existence of force structures. Hypothesis management includes the functions of matching models against data, predicting the existence and location of unobserved force components, localization of search areas and resolution of conflicts between competing hypotheses. A subjective Bayesian inference calculus is used to accrue certainty of force structure hypotheses and resolve conflicts. Reasoning from uncertain vehicle level data, the system has successfully inferred the correct locations and components of force structures up to the battalion level. Key words: Force structure analysis, SAR, model-based reasoning, hypothesis management, search, matching, conflict resolution, Bayesian inference, uncertainty.
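The abstract does not give the details of its subjective Bayesian calculus. As a hedged sketch of how certainty can be accrued from several pieces of evidence, the example below uses the standard odds-likelihood form of Bayes' rule, assuming the pieces of evidence are conditionally independent. The function name `accrue_certainty` and the example numbers are hypothetical.

```python
def accrue_certainty(prior, likelihood_ratios):
    """Update a hypothesis probability with a sequence of likelihood ratios.

    prior must lie strictly between 0 and 1. Each ratio is
    P(evidence | hypothesis) / P(evidence | not hypothesis): values above 1
    support the hypothesis and values below 1 count against it.
    Assumes conditionally independent evidence (an assumption of this sketch).
    """
    odds = prior / (1.0 - prior)       # convert probability to odds
    for ratio in likelihood_ratios:
        odds *= ratio                  # one multiplicative update per item
    return odds / (1.0 + odds)         # convert odds back to probability
```

For instance, a hypothesis with prior 0.5 and a single supporting observation with ratio 3 ends at 0.75, while conflicting observations partly cancel because their ratios multiply.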
In this paper we present some of the insight we have gained regarding the development of a flexible knowledge-intensive ship message interpreter. In retrospect, it seems as if the following four factors operated to our advantage and contributed to the successful completion of the system: (a) The available natural language parsing technology provided good developmental tools. (b) The knowledge domain was well defined. (c) The university environment in which the research was carried out provided a supportive intellectual context necessary for the development of new ideas. (d) The productive collaboration with an AI company enabled us, at the last stage of the project, to transform the computer program into a maintainable product.
The extraction and classification of significant points along a contour is fundamental to many image processing tasks. In this paper, we present a simple process for extracting such points with several appealing properties: the operation is developed in terms of contours which are represented discretely; it is completely local and hence suitable for real time operation in vector or parallel processors; and it is tunable to extract significant points at different resolutions of orientation change along a contour. We also describe its use in linear feature extraction and processing restricted cases of environmental motion where the interest operator associates parameterized attributes with extracted image points. Matching features using these attributes allows for significant computational reductions over schemes based upon correlation matching without any loss of robustness, especially for such cases of restricted motion.
A computer system is described for unsupervised analysis of five sets of ultrasound images of the heart. Each set consists of 24 frames taken at 33 millisecond intervals. The images are acquired in real time with computer control of the ultrasound apparatus. After acquisition the images are segmented by a sequence of image-processing programs; features are extracted and stored in a version of the Carnegie-Mellon Blackboard. Region classification is accomplished by a fuzzy logic expert system, FLOPS, based on OPS5. Preliminary results are given.
The aim of the experiment described in this paper is to investigate the possibility of describing the motion of a two-dimensional object represented by a sequence of snapshots. The interest of this experiment resides in the fact that the motion patterns we are dealing with are the irregular ones generated when the shape of an object changes the structure of its boundary slightly from frame to frame. The main motivation for considering this kind of motion is given by problems arising in biomedicine. Presently, many diagnostic results are represented by dynamic imagery where a two-dimensional display of an organ must be tracked in a time interval. The experiment is carried out by decomposing the lines representing the contour of the shapes; then a correspondence process is generated to relate segments from frame to frame and a set of low level descriptors is extracted by comparing related segments. Indications concerning a methodology for constructing higher level descriptors are given.
An intelligent system is one that has the inherent capability to achieve specified ends in the face of variations, complexities and uncertainties posed by its task environment. Consequently, an intelligent system must be able to integrate information from a variety of sources and, based on that information, plan and execute a course of action. The focus of this paper is on real-time planning for the class of intelligent systems which includes decision-support systems for piloted vehicles and completely autonomous vehicles.
We describe the development and implementation of DIID, a data-independent natural language interface (NLI) from the perspective of Artificial Intelligence research. In general, we discuss the implications and applications of AI epistemology, methodology and technology for developing state-of-the-art software products such as our DIID system. In particular, we discuss how a unified knowledge representation coupled with appropriate heuristics is necessary to build a powerful, robust, and extensible NLI.
The use of unmanned air vehicles to perform tactical operations is an increasing factor in battlefield strategy. Such systems can inexpensively satisfy a number of military missions in highly contested scenarios without hazard to an aircrew. Computer controlled and piloted preplanned missions can be accomplished by autonomous air vehicles. However, such systems are lacking in flexibility to a degree that fails to respond to the fluidic processes of a modern battlefield. The incorporation of piloting capabilities into the unmanned air vehicles greatly increases their flexibility, enabling a wider range of mission capabilities, a higher success ratio and a greater survivability. New technological developments, taking advantage of quick response possibilities, allow real-time operation under battlefield conditions. In future warfare, due to interdiction of operational airfields, unmanned vehicles are likely to be the major source of the exercise of tactical air power. This paper discusses the piloting requirements for unmanned air vehicles as imposed by command and control sequences, visual display, communications and the design and operation of remote control stations.
ERIK (Evaluating Reports using Integrated Knowledge) is a working system that was developed for the U.S. Coast Guard to parse ship messages. ERIK is capable of parsing at an impressive rate of 1000 to 2000 messages per day, sent from merchant vessels in all parts of the world. Since these reports contain vital information it is important that the system can parse and correct them quickly and accurately, and furthermore know when it has failed to do so. This paper will focus on the following three algorithms: The Integrated Speller/Recognizer integrates the tasks of recognizing items on the input stream and spelling correction. Traditionally these tasks were separated, with items that could not be recognized passed on to a separate speller. We will describe the process that allows fast expectation-based spelling correction and recognition in one unit. The Interpreter is a general control structure that allows parsing of the various fields even when the reports fail to follow a fixed format or contain various types of ambiguities (both structural and conceptual), and can handle the intrusion of noisy and irrelevant information. Changing Contexts without laking has the ability to recover from wrong assumptions due to erroneous information and correct the previously parsed structures without the need to reparse what has already been processed. These three algorithms provided the core of the ERIK system, allowing it to accurately parse and correct ship messages with confidence in a real time, real world situation.
Automated software generation has been proposed as the solution to the current software crisis. According to this idea, the end user provides a high level specification of his or her needs to a machine and a working implementation is synthesized from this specification. Formal approaches have been found impractical for handling large applications and in most cases heuristics and domain-specific knowledge are used, trading off accuracy and optimality for decreased complexity in execution. The impact of these tradeoffs in the reliability of the final product is investigated and a knowledge organization scheme is presented and used as a framework to describe the architecture of an automatic program synthesis system.
Realistic practical problem domains (such as robotics, process control, and certain kinds of signal processing) stand to benefit greatly from the application of artificial intelligence techniques. These problem domains are of special interest because they are typified by complex dynamic environments in which the ability to select and initiate a proper response to environmental events in real time is a strict prerequisite to effective environmental interaction. Artificial intelligence systems developed to date have been sheltered from this real-time requirement, however, largely by virtue of their use of simplified problem domains or problem representations. The plethora of colloquial and (in general) mutually inconsistent interpretations of the term "real-time" employed by workers in each of these domains further exacerbates the difficulties in effectively applying state-of-the-art problem solving techniques to time-critical problems. Indeed, the intellectual waters are by now sufficiently muddied that the pursuit of a rigorous treatment of intelligent real-time performance mandates the redevelopment of proper problem perspective on what "real-time" means, starting from first principles. We present a simple but nonetheless formal definition of real-time performance. We then undertake an analysis of both conventional techniques and AI technology with respect to their ability to meet substantive real-time performance criteria. This analysis provides a basis for specification of problem-independent design requirements for systems that would claim real-time performance. Finally, we discuss the application of these design principles to a pragmatic problem in real-time signal understanding.
In this paper, the problem of detecting unique objects in high resolution multispectral imagery is addressed. The approach is to effectively utilize information extracted from images together with available knowledge about various attributes characterizing objects. The analysis involves processing of both spectral and spatial domain information. Feasibility of the approach is verified by performing experiments using high resolution multispectral images. Results are promising and it is believed that computational requirements for the approach can meet specifications of an operational system.
In many fields of pattern recognition it is important to define the pattern-resolution limit of the recognizer. In this paper a preliminary study to determine the power of pattern-resolution in the human vision system is presented. Some experiments have been carried out taking into account hand-written numerals.