All automatic target recognizers (ATRs) either explicitly or implicitly use models and compute features, many use feedback, and all rely on some form of data for development and test. This paper compares three popular paradigms used for ATR: the prescreen, segment, classify (PSC) paradigm (commonly used when applying statistical pattern recognition techniques to image sensors), the matched filter (MF) paradigm, and the model-based vision (MBV) paradigm. This comparison is performed initially by considering how each contends with the ATR model space, which is introduced to aid the paradigm comparison. These paradigms are then compared in detail by examining how each uses models, features, feedback, and data to perform target recognition. Based on these discussions, three ATR sensors are examined in terms of the ATR model space to analyze how each sensor's imaging properties either help or hinder the solution of the ATR problem. The ATR model space concept is then used to motivate model-based vision solutions to the multi-sensor fusion problem and to suggest novel sensor combinations that could be used synergistically to attack the ATR problem.
A layered object recognition paradigm is described in this paper. The lower layers of the proposed system extract rich feature information in the sense of a primal sketch, including oriented edges, blobs, corners, and texture primitives, from a raw image. The middle layers of the system extract object parts such as faces and sides, together with the adjacency relationships between them. The highest layers of the system use the information obtained beneath them to recognize the objects. The system consists of a combination of different types of neural networks, making appropriate use of their different capabilities. That is, a collection of unsupervised neural networks is employed for generic feature extraction, while a similar collection of supervised networks is employed for learning object-specific shape information. We present some results of a partial implementation of this system.
The matching component of a model-based vision system hypothesizes one-to-one correspondences between 2D image features and locations on the 3D model. As part of Wright Laboratory's ARAGTAP program [a synthetic aperture radar (SAR) object recognition program], we developed a matcher that searches for feature matches based on the hypothesized object type and aspect angle. Search is constrained by the presumed accuracy of the hypothesized aspect angle and scale. These constraints reduce the search space for matches, thus improving match performance and quality. The algorithm is presented and compared with a matcher based on geometric hashing. Parallel implementations on commercially available shared memory MIMD machines, distributed memory MIMD machines, and SIMD machines are presented and contrasted.
This paper is concerned with the problem of searching large model databases. To date, most object recognition systems have concentrated on the problem of matching, using simple searching algorithms. This is quite acceptable when the number of object models is small. In the future, however, general-purpose computer vision systems will be required to recognize hundreds or perhaps thousands of objects, and in such circumstances efficient searching algorithms will be needed if these systems are to be at all effective. In this paper we present a method we call data-driven feature-indexed hypothesis generation as one solution to the problem of searching large model databases.
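The feature-indexed idea can be conveyed with a minimal sketch: an inverted index maps quantized feature values to the models that contain them, so the features observed in an image retrieve and rank candidate models directly, rather than being matched against every database entry. The model names and feature labels below are purely hypothetical, and the abstract's actual indexing scheme may differ.

```python
from collections import defaultdict

def build_index(models):
    """models: dict mapping model name -> set of quantized feature values.
    Returns an inverted index from feature value -> models containing it."""
    index = defaultdict(set)
    for name, features in models.items():
        for f in features:
            index[f].add(name)
    return index

def generate_hypotheses(index, observed_features):
    """Vote for models indexed by the observed features; return them
    ranked by vote count (most supported hypotheses first)."""
    votes = defaultdict(int)
    for f in observed_features:
        for name in index.get(f, ()):
            votes[name] += 1
    return sorted(votes, key=votes.get, reverse=True)

index = build_index({
    "truck": {"corner", "long_edge", "wheel_arc"},
    "tank":  {"corner", "turret_blob", "long_edge"},
    "jeep":  {"wheel_arc", "short_edge"},
})
hyps = generate_hypotheses(index, {"corner", "long_edge"})
# "truck" and "tank" each receive two votes; "jeep" receives none
```

Hypothesis generation then costs time proportional to the observed features and their index entries, independent of the number of models that share no features with the image.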
In this research, three parallel thinning algorithms and their implementation on a hypercube parallel computer are studied. The algorithms are analyzed and compared in terms of topological properties, the shapes of the resulting skeletons, processing time, and load balance. A special labeling technique is used which both saves memory and improves processing time when implementing parallel algorithms on parallel computers.
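As a concrete reference for what such algorithms compute, the following is a plain serial sketch of one classical parallel thinning scheme (Zhang-Suen); it is not the hypercube implementation studied in the abstract, and the labeling technique is not reproduced here.

```python
def thin(img):
    """Zhang-Suen thinning of a binary image (list of lists of 0/1).
    Repeatedly peels boundary pixels in two subiterations until stable."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] == 0:
                        continue
                    # 8-neighbors p2..p9, clockwise from north
                    p2 = img[y-1][x]; p3 = img[y-1][x+1]; p4 = img[y][x+1]
                    p5 = img[y+1][x+1]; p6 = img[y+1][x]; p7 = img[y+1][x-1]
                    p8 = img[y][x-1]; p9 = img[y-1][x-1]
                    nbrs = [p2, p3, p4, p5, p6, p7, p8, p9]
                    b = sum(nbrs)                       # neighbor count B(p)
                    a = sum(nbrs[i] == 0 and nbrs[(i + 1) % 8] == 1
                            for i in range(8))          # 0->1 transitions A(p)
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0 and p2*p4*p6 == 0 and p4*p6*p8 == 0:
                        to_clear.append((y, x))
                    elif step == 1 and p2*p4*p8 == 0 and p2*p6*p8 == 0:
                        to_clear.append((y, x))
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img

bar = [[0] * 9 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 7):
        bar[y][x] = 1
skeleton = thin(bar)  # the 3-pixel-thick bar reduces to a 1-pixel-wide line
```

Each subiteration examines every pixel against a snapshot of the image and only then applies the deletions, which is exactly what makes the scheme directly parallelizable.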
Recent advances in smart filters make correlator architectures (optical or digital) attractive for automatic target recognition (ATR). They allow one to significantly reduce the search space typically required in matched filter template matching. Use of a hierarchy of correlation filters allows one to address all levels of model based vision including detection (prescreening), segmentation (image enhancement and edge detection), recognition (reduction of false alarms, pose, and size estimation, etc.), and classification (identification of the object type). This significantly reduces the object geometry and contrast search space required.
In order to achieve successful recognition in practical model-based vision applications, many diverse sources of knowledge need to be exploited. By incorporating such information sources as sensor phenomenology, viewing geometry, model part geometry, scene environment, and external situational constraints, we can better recognize parameterized and obscured objects, resolve conflicting hypotheses, and derive the scene interpretation in an efficient manner. A Bayesian modeling and reasoning approach is well-suited to this problem because it satisfies these goals: integration of diverse evidence sources in an order-independent manner; efficient manipulation of the system for opportunistic control; rational reasoning behavior, in terms of graceful degradation as uncertainty increases; and explanation of results in terms of domain properties. We have developed a general Bayesian modeling and reasoning paradigm for model-based recognition and tested it by applying it to two model-based recognition problems: military battlefield scene analysis and vehicle classification. The capabilities of this modeling representation include: hierarchical reasoning (coarse to fine, sub-part to whole) for favorable combinatorics; multiple belief propagation strategies for adapting to complex feature interactions; incorporation of static and dynamic constraints; and probabilistic reasoning over both discrete and continuous variables.
The problem of region-based segmentation is examined and a new algorithm for MAP segmentation is introduced. The observed image is modeled as a composite of two processes: a high-level process that describes the various regions in the image and a low-level process that describes each particular region. A Gibbs-Markov random field model is used to describe the high-level process and a simultaneous autoregressive random field model is used to describe the low-level process. The MAP segmentation algorithm is formulated from the two models and a recursive implementation of the algorithm is presented. Results of the algorithm on various synthetic and natural textures clearly indicate the effectiveness of the approach to texture segmentation.
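The flavor of such a two-level MAP formulation can be conveyed by a deliberately simplified sketch: here the low-level process is reduced to independent Gaussian noise around a per-region mean (rather than the simultaneous autoregressive model of the abstract), the high-level process is a Potts-type Gibbs prior, and the posterior is maximized by iterated conditional modes instead of the recursive implementation described. All parameter values are illustrative.

```python
import numpy as np

def icm_segment(img, means, sigma=1.0, beta=1.5, iters=10):
    """MAP-style segmentation by iterated conditional modes (ICM):
    Gibbs/Potts prior on the label field, Gaussian likelihood per class."""
    h, w = img.shape
    # initialize each pixel with the nearest class mean
    labels = np.abs(img[..., None] - np.array(means)).argmin(-1)
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                best, best_e = labels[y, x], np.inf
                for k in range(len(means)):
                    # negative log-likelihood of the observed intensity
                    e = (img[y, x] - means[k]) ** 2 / (2 * sigma ** 2)
                    # Gibbs prior: penalize disagreement with 4-neighbors
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] != k:
                            e += beta
                    if e < best_e:
                        best, best_e = k, e
                labels[y, x] = best
    return labels

rng = np.random.default_rng(0)
noisy = np.zeros((16, 16))
noisy[:, 8:] = 4.0                            # two regions with different means
noisy += rng.normal(0.0, 0.8, noisy.shape)    # low-level noise process
seg = icm_segment(noisy, means=[0.0, 4.0])
```

The smoothing strength `beta` plays the role of the Gibbs clique potential: larger values favor larger homogeneous regions at the cost of fine detail.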
Adaptive-topology surfaces can be used to cast the segmentation problem as one of fitting such a surface (modeled by tensor-product splines) to the edges of an object detected with low-level operators. The advantage of this method is that continuity degrees give an appropriate treatment of missing data, and the surface parametrization ensures global coherence of the detected edges. Moreover, the method is adaptive by nature: it is easy to control the number of B-splines used to represent the boundary, which makes it possible to refine the segmentation result progressively. A priori knowledge can easily be taken into account if provided in the form of a CAD model of the object to be segmented. Some topological problems may appear: surfaces topologically equivalent to spheres may have to be transformed into surfaces of more complex topology (a torus, for instance). We propose a method to solve that problem.
The typical process of statistical pattern classification is first to extract features from an object presented in an input image and then, using the Bayesian decision rule, to compute the a posteriori probabilities that the object will be recognized by the system. When a recursive Bayesian decision rule is used, the feature-extraction phase can be interleaved with the classification phase so that the a posteriori probabilities after adding each feature are computed one by one. There are two reasons to consider which feature should be extracted first and which should come next. First, feature extraction is usually very time-consuming: extracting any global feature from an object takes time at least on the order of the size of the object. Second, we often do not need to use all features to obtain a final classification; the a posteriori probabilities of some models become zero after only a few features have been used. The problem is how to order the feature-extraction operations so that a minimum number of operations suffices for a correct classification. This paper presents two information-theoretic heuristics for predicting the performance of feature-extraction operations; the prediction is then used to order these operations. The first heuristic is the power of discrimination of each operation. The second is the power of justification of each operation, used in the special case where some points in the feature space do not belong to any model. Both heuristics are computed from the distributions of the models. Experimental results and a comparison with our previous work are presented.
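A discrimination-power heuristic of this general kind can be illustrated, under the simplifying assumptions of discrete feature values and per-model likelihood tables, as the expected entropy reduction of the model posterior. The models and feature tables below are invented for illustration and do not reproduce the paper's exact heuristics.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(prior, likelihood, feature):
    """Expected entropy reduction of the model posterior from extracting
    `feature`; likelihood[m][feature] maps feature value -> P(value | m)."""
    h0 = entropy(prior.values())
    values = set()
    for m in prior:
        values |= set(likelihood[m][feature])
    expected_h = 0.0
    for v in values:
        pv = sum(prior[m] * likelihood[m][feature].get(v, 0.0) for m in prior)
        if pv == 0.0:
            continue
        posterior = [prior[m] * likelihood[m][feature].get(v, 0.0) / pv
                     for m in prior]
        expected_h += pv * entropy(posterior)
    return h0 - expected_h

def order_features(prior, likelihood, features):
    """Rank features so the most discriminating one is extracted first."""
    return sorted(features,
                  key=lambda f: information_gain(prior, likelihood, f),
                  reverse=True)

prior = {"tank": 0.5, "truck": 0.5}
likelihood = {
    "tank":  {"shape": {"angular": 1.0}, "color": {"green": 1.0}},
    "truck": {"shape": {"rounded": 1.0}, "color": {"green": 1.0}},
}
ordering = order_features(prior, likelihood, ["color", "shape"])
# "shape" separates the two models (one full bit of gain);
# "color" is identical across models and contributes nothing
```

Extracting in this order lets the recursive posterior update drive some model probabilities to zero as early as possible, so later (expensive) extractions can often be skipped.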
This paper describes a powerful inexact matching algorithm that has been applied with success to high-level 3D object representations in a 3D object recognition system. The algorithm combines in a promising way several approaches proposed in the past couple of years: an extension of backtrack strategies for inexact matching of attributed relational subgraphs, error-correcting isomorphism, determination of local attribute similarity, and global transformation fitting, features which are used efficiently for search-tree pruning. The algorithm was tested successfully in a series of experiments involving scenes with single and multiple objects.
A computer-integrated manufacturing software architecture based on solid modeling techniques and an underlying distributed database is proposed. Development of solid-modeling-based applications using a tool-box concept is suggested; these tools can be used to help applications function efficiently in this environment. A case study of mechanical assembly simulation is discussed.
A model-based method is developed with the overall objective of estimating multiple object motion in 3-D from one or more image sequences. Object 3-D positions and orientations are obtained by recognizing 3-D objects from an object library and determining the object viewpoint for each image frame. Viewpoint determination has two stages. First, the relational structures in an edge-line image are matched with the relational structures in 2-D projections of 3-D model lines to select candidate object models and associated viewpoints from the model library. In the second stage, the selected object model is verified using a viewpoint consistency constraint. Thereafter, motion information may be derived. Stereo sensors, where available, provide additional and redundant information. A comprehensive example is shown.
We describe a model-based tracking algorithm designed to follow bright, slowly deforming filaments through a series of two-dimensional, gray-scale images. The images can have both structured and unstructured background noise. Tens of filaments can exist in each image; the filaments can cross each other and need not be uniformly bright along their lengths. The algorithm has three basic steps. First, a line-segment detector (AVS) is applied to find parts of the filaments. Second, these parts are matched to a model of the filaments in the image; the match uses an interpretation tree to find the globally best assignment of parts to filaments. Third, a new model is produced by choosing a subset of the parts matched to the filaments and interpolating between neighboring parts on the same filament. The algorithm is applied to moving, fluorescently labelled cells.
This paper presents a new method for model-based object recognition and orientation determination which uses a single, comprehensive analytic object model representing the entirety of a suite of images of the object. In this way, object orientation and identity can be directly established from arbitrary views, even though these views are not related by any geometric image transformation. The approach is also applicable to other real and complex-sensed data, such as radar and thermal signatures. The object model is formed from 2-D Hermite function decompositions of an object image expanded about the angles of object rotation by Fourier series. A measure of error between the model and the acquired view is derived as an exact analytic expression, and is minimized over all values of the viewing angle by evaluation of a polynomial system of equations. The roots of this system are obtained via homotopy techniques, and directly provide object identity and orientation information. Results are given which illustrate the performance of this method for noisy real-world images acquired over a single viewing angle variation.
A key capability for an intelligent machine vision system is the ability to autonomously acquire new information about an environment. This is especially true in model-based 3-D object recognition, where a bottleneck exists in the generation of geometric models and the selection of suitable sets of features for recognizing each model. We describe a new model representation, based on the random graph, for use in 3-D object recognition. The random graph is a probabilistic representation of an ensemble of attributed graphs which can describe variations in both the structure and attribute values of structural patterns. The random graph is well-suited to accommodate the uncertain and incomplete nature of real-world data and is able to meet the information requirements for object recognition through the representation of feature visibility, detectability, and variability. In the random graph object model, vertices represent geometric features, such as points, edges, and planar surfaces, and arcs represent topological relations. Uncertainty in geometric feature attributes and in model structure is described by attaching probability distributions to model vertex and arc attribute values. A specific example, the point feature random graph model, has been implemented and is described in greater detail.
A model-based computer vision system that can recognize three-dimensional objects from unknown viewpoints in single gray-scale images is presented. The system is based on an off-line model preprocessing stage in which a 3D recognition-oriented model and a strategy hierarchy are automatically generated. This strategy hierarchy represents the associations between features detected bottom-up and the database of object models, enabling the on-line recognition algorithm to be particularly efficient by reducing recognition to a 2D matching process. To index the model database (the base level of the hierarchy) efficiently, feature groupings based on the phenomenon of Perceptual Organization are used; such groupings and structures in the image are likely to be invariant over a wide range of viewpoints. Once an initial estimate for the object and its viewpoint is found, a process of spatial correspondence is performed. This process brings the projections of 3D models into direct correspondence with the 2D image, solving for the unknown viewpoint and model parameters.
The aspect graph is essentially a multiple-view representation of a designated 3-D object. In this paper, we present a complete algorithm for constructing the aspect graph of non-convex polyhedral objects under perspective projection.
This paper presents a view-independent relational model (VIRM) for use in a vision system designed to recognize known 3D objects from single monochromatic images within unknown environments. The aim is to establish a model of an object suitable for its automatic recognition without invoking pose information. To generate the VIRM, the system projects a wireframe model of the object from a number of different viewpoints and performs a statistical inference to select relatively view-independent relationships among component parts of the object. These relations are stored as a relational model of the object, represented as a hypergraph associated with procedural constraints. Three-dimensional component parts (model features) of the object, which can be associated with extended image features defined by simple 2D geometrical attributes, are used as nodes of the hypergraph. Co-visibility of model features is represented by arcs of the hypergraph. Other pairwise view-independent relations are used as procedural constraints associated with arcs of the hypergraph.
In this paper, we describe an algorithm for estimating the camera position and orientation (pose) from which a rigid, 3D solid model is viewed. The algorithm consists of two stages. In the hypothesis stage, using a bounded error criterion, maximal sets of corresponding model and image points are found that are consistent with a camera pose. In the verification stage, the local noise statistics and the probability that a visible model point may not be detected are used in pooling responses from individual image regions to decide whether or not to accept the match.
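The hypothesis stage can be illustrated with a drastically simplified sketch: under a translation-only camera model, two point correspondences are consistent with a common pose when their implied translations agree within a bounded error `eps`. This 2-D, translation-only restriction, and all names and coordinates below, are illustrative assumptions, not the authors' full 3-D formulation.

```python
def consistent_sets(model_pts, image_pts, pairs, eps=1.0):
    """Group hypothesized (model_idx, image_idx) pairs into the largest set
    whose implied translations all lie within eps of one pair's translation."""
    shifts = [(image_pts[i][0] - model_pts[m][0],
               image_pts[i][1] - model_pts[m][1]) for m, i in pairs]
    best = []
    for sx, sy in shifts:
        # collect every pair whose translation agrees with this one
        group = [pairs[j] for j, (tx, ty) in enumerate(shifts)
                 if abs(tx - sx) <= eps and abs(ty - sy) <= eps]
        if len(group) > len(best):
            best = group
    return best

model_pts = [(0, 0), (1, 0), (0, 1), (1, 1)]   # unit square model
image_pts = [(5, 3), (6, 3), (5, 4), (9, 9)]   # last image point is an outlier
pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]       # hypothesized correspondences
match = consistent_sets(model_pts, image_pts, pairs, eps=0.5)
```

The bounded-error criterion rejects the outlier pair because its implied translation disagrees with the consensus; a verification stage would then score the surviving set against noise statistics.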
Edge detection is a two-stage process: edge enhancement followed by edge linking. In this paper we develop a linking algorithm for the combination of edge elements enhanced by an optimal filter. The linking algorithm is based on sequential search. From a starting node, transitions are made to the goal nodes based on a maximum likelihood metric. Results of our search algorithm are compared to the nonmaximal suppression approach, as well as to the sequential edge linking algorithm. It is shown that the metric described here is very easy to implement and provides more accurate results.
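A stripped-down version of sequential linking, with the maximum-likelihood transition metric replaced by raw edge strength for brevity, looks like this; the search here is purely greedy, without the backtracking a full sequential search over goal nodes would allow.

```python
def link_edge(strength, start, steps):
    """Greedy sequential linking: from `start`, repeatedly move to the
    unvisited 8-neighbor with the highest edge strength (a stand-in for
    a maximum-likelihood transition metric)."""
    path, seen = [start], {start}
    y, x = start
    h, w = len(strength), len(strength[0])
    for _ in range(steps):
        cands = [(strength[ny][nx], (ny, nx))
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy or dx)
                 for ny, nx in [(y + dy, x + dx)]
                 if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen]
        if not cands:
            break
        _, (y, x) = max(cands)          # strongest unvisited neighbor
        path.append((y, x))
        seen.add((y, x))
    return path

# a strong horizontal ridge in an otherwise weak enhancement map
ridge = [[0.1] * 5, [0.9] * 5, [0.1] * 5]
path = link_edge(ridge, start=(1, 0), steps=4)
# path follows the strong row: [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]
```

Replacing the `max` step with an ordered expansion of alternative transitions (and their accumulated metric) recovers the sequential-search behavior the abstract describes.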
Three-dimensional object recognition is an essential capability for any advanced machine vision system. We present a new technique for the recognition of 3-D objects on the basis of comparisons between 3-D models. Secondary representations of the models, which may be considered complex scalar transform descriptors, are employed. The use of these representations overcomes the usual dependence on matching individual model primitives (such as edges or surfaces). The secondary representations used are one-dimensional histograms of components of the visible orientations, depth maps, and needle diagrams. Matching is achieved using template matching and normalized correlation techniques between the secondary representations. We demonstrate the power of this new technique with several examples of object recognition on models derived from actively sensed range data.
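For 1-D secondary representations such as the orientation histograms mentioned above, a normalized correlation score can be computed as follows. This is a generic Pearson-style formulation, not necessarily the authors' exact implementation, and the histogram values are invented for illustration.

```python
import math

def normalized_correlation(h1, h2):
    """Normalized (mean-removed) correlation between two 1-D secondary
    representations, e.g. orientation histograms; 1.0 for identical shape."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    d1 = math.sqrt(sum((a - m1) ** 2 for a in h1))
    d2 = math.sqrt(sum((b - m2) ** 2 for b in h2))
    return num / (d1 * d2) if d1 and d2 else 0.0

h_model = [1, 4, 2, 7, 3]
h_scene = [2, 8, 4, 14, 6]   # same shape up to a gain factor
score = normalized_correlation(h_model, h_scene)   # ~1.0: gain is factored out
```

Because the score is invariant to overall gain and offset, it compares the shape of the two representations rather than their absolute magnitudes, which is what makes it suitable for matching descriptors derived from different sensing conditions.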
Images acquired by means of coherent radiation have a speckle-like appearance caused by the random intensity distribution that is formed when the incident waves are either reflected from a rough surface or propagate through a transmission medium with random refractive-index fluctuations. Although we are concerned with the speckle fields generated when laser light is scattered, closely related phenomena arise in other regions of the electromagnetic spectrum: synthetic aperture radar imagery, scattering of x rays by liquids, and scattering of particles, such as electron scattering by amorphous carbon films. We should not omit the occurrence of the phenomenon in ultrasound imagery. Speckle in images was at first regarded as an adverse side effect of coherency, but it was later found that the speckle process itself carries useful information. Contributions to the study of speckle patterns now fall into the following main areas: fundamental statistical properties; speckle reduction in radar, optical, and holographic systems; measurement of surface roughness; applications in information processing using random carriers; applications in metrology; and stellar speckle interferometry. Readers interested in a far deeper mathematical treatment of the properties of electromagnetic fields scattered by rough surfaces are referred to the standard reference in this field, Beckman and Spizzichino. This paper presents a qualitative model-based approach to the study of coherent-light imaging systems, relying on Fourier analysis applied to complex speckle fields. The model includes the effects of the object roughness, free-space propagation, the optical system, and the image acquisition and processing system, and generates synthetic images simulating the behavior of the real system.
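A minimal numerical sketch of the model's first two ingredients (object roughness and free-space propagation) can be written with a Fourier transform: a rough surface imparts a uniform random phase to the field inside an aperture, propagation is modeled by an FFT, and the sensor records intensity. This is only the textbook fully-developed-speckle simulation, not the authors' ESPI model, and all sizes are illustrative.

```python
import numpy as np

def speckle_image(n=128, aperture_radius=16, seed=0):
    """Simulate fully developed speckle: uniform random phase inside a
    circular aperture (object roughness), FFT for free-space propagation,
    squared magnitude for the recorded intensity."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    aperture = (x ** 2 + y ** 2) <= aperture_radius ** 2
    field = aperture * np.exp(2j * np.pi * rng.random((n, n)))
    return np.abs(np.fft.fft2(field)) ** 2

img = speckle_image()
contrast = img.std() / img.mean()   # ~1 for fully developed speckle
```

The unit speckle contrast (standard deviation equal to the mean, i.e. exponentially distributed intensity) is the fundamental statistical property against which such synthetic images are usually checked.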
The model was used to describe a time-averaging electronic speckle pattern interferometer (ESPI) system for mechanical vibrations analysis, developed by the authors at The Polytechnic Institute of Bucharest in collaboration with The Laboratory for Laser Developments, The Atomic Physics Institute of Bucharest.