Due to the high availability of aerial images, automated interpretation has become an increasing need. One of the application fields is the interpretation of aerial images of cities. The interpretation allows for a better understanding of the traffic patterns. We have chosen this application because it is a challenging task involving complex images. This paper describes the need for techniques beyond the pure numerical and statistical methods. It describes how the use of symbolic oriented rules combined with humanlike reasoning mechanisms can drastically improve results.
A technique for detecting clusters of objects in noisy, cluttered, moderate resolution imagery is discussed. The algorithm is demonstrated on synthetic aperture radar (SAR) data. The approach is based on the use of a nonlinear spatial highpass or 'antimedian' filter, the complement of the median filter. The filter is coarsely tuned to produce maximum response for structures the size of or smaller than the expected object size. The filter is followed by histogram thresholding and connected region processing. Knowledge about the object's shape and the cluster deployment patterns is then used to eliminate false detections. This detection technique is suitable for any imagery where the objects of interest produce sensor responses that form contiguous regions. False clusters due to edge leakage are discussed and a solution is formulated.
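The filtering and thresholding steps of this abstract can be illustrated with a minimal sketch. It assumes that the 'antimedian' response is the pixel value minus the median of its neighbourhood, which is one reading of "the complement of the median filter". The function names, the square window, and the fixed threshold are hypothetical choices for illustration, and the connected region and shape-based stages are omitted.

```python
from statistics import median


def antimedian(image, radius=1):
    """Pixel value minus the local median: an assumed form of the highpass filter."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Square window clipped at the image border.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = image[r][c] - median(window)
    return out


def detect(image, radius=1, threshold=5):
    """Threshold the filter response into a binary detection mask."""
    response = antimedian(image, radius)
    return [[1 if v > threshold else 0 for v in row] for row in response]
```

With a window larger than the object, a small bright structure survives the filter while a uniformly bright area gives no response, which matches the tuning described in the abstract.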
Autonomous target recognition can be assisted by using CO2 laser radar data which contains 3-D information of the scene viewed from the sensor. Using efficient image processing algorithms such as the Hough transform, the orientations and dimensions of the target can be calculated. This information then can be used by a model-based recognition system to identify the target. The identification is based on an inference procedure which tests hypotheses using the available evidence from the sensory data.
This paper introduces the Smart Sensor project at the Electro-Optics Technology Center at Tufts University and describes one Smart Sensor under development. The overall intent of the project is to develop technologies for electronic "eyes" capable of identifying the shape, location and size of any closed contour (object) in the field of view. The design described herein employs algorithms simulated and proven in software. Results from an incomplete hardware realization using discrete circuit elements are presented.
This paper describes a new technique for estimating the attitude and location of rigid body objects from a single image of the object. This technique has been implemented in software on workstation platforms utilized by TAU for image sequence analysis.
A basic assumption underlying this paper is that the construction of a high-speed, high-accuracy, and general-purpose artificial recognition system might profit from theories and data about the principles of the human visual system. More specifically, it is argued that perceptual organization in general and symmetry detection in particular are interesting processes required between the initial stage of edge detection and the final stage of object recognition. Therefore, theories and data about detection of regularities such as symmetry in human vision are summarized. Furthermore, a general scheme is proposed that might enable the detection of image regularities in a way that is in good agreement with these experimental data and theoretical models. In addition, some of our own experiments about human detection of bilateral and skewed symmetry in dot-patterns are presented to show the plausibility of the scheme. Although the results are promising, a lot of work remains to be done, both with respect to the empirical foundation of the model and with respect to its mathematical and computational specification.
Object detection is recognized as an important component of a computer vision system. In this paper, we discuss the problem of object detection in high resolution aerial images. We note that the object detection task requires analysis of two types of features: intrinsic and relational. The paper presents techniques for texture operator based intrinsic feature analysis and linear structure extraction.
A common and mainly unsolved problem in image processing is occlusion. Occlusion occurs when one or more objects obstruct the sensor's view. In this paper, three methods (a neural network, a superresolving non-parametric predictor, and an Extended-Post Context-free Grammar syntactic pattern recognizer) are used to generate the missing data. To illustrate these methods, their application to the reconstruction of obscured Roman characters is presented.
A practical approach to tracking aircraft with similar looking silhouettes while maintaining identification is proposed and simulated. Preprocessing is performed using moment invariants that are shown to be almost constant for images in which the object is translated in position, rotated, or changed in scale. These moment invariants are used to train a neural network to identify the aircraft for different aspect angles. Consequently, tracking of a specific aircraft is achieved no matter what the aspect angle, the position, rotation, or scale.
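The near-constancy of moment invariants under translation, rotation, and scale can be checked with a small sketch. The abstract does not say which invariants are used; the example below assumes the first Hu invariant, computed from normalised central moments of a binary silhouette given as a list of pixel coordinates. The function name `hu_phi1` is hypothetical.

```python
def hu_phi1(pixels):
    """First Hu moment invariant of a binary silhouette given as (x, y) pixels."""
    n = len(pixels)  # zeroth moment m00 of a binary image
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments (translation invariant).
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    # Normalising by m00 ** 2 gives eta20 + eta02 (scale invariant);
    # the sum of the two is unchanged by rotation.
    return (mu20 + mu02) / n ** 2
```

On a pixel grid the value is exact under translation and 90-degree rotation but only approximately constant under scaling, which is consistent with the "almost constant" wording above.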
Due to the resolution of current satellite imagery (e.g. SPOT), the extraction of roads and linear networks from satellite data has become a feasible - although labour-intensive - task for a human expert. This interpretation problem relies on structural image recognition as well as on expertise in combining data sources external to the image data (e.g. topography, landcover classification). In this paper different knowledge sources employed by human interpreters are discussed. Ways to implement these sources using current knowledge-based tools are suggested. A practical case study of knowledge integration is described.
In this paper we describe a new pattern recognition method which will allow for a synthesis of approaches based on prior analyses and contextual information with those based on Artificial Neural Networks. We develop a new iterative neural network framework based on a fully parallel probabilistic feedback dynamics. The method allows knowledge about the problem to be built into the network structure. In addition, heuristic search techniques can be incorporated by modifying the probabilities. We illustrate this method with a pattern recognition problem on an infrared image. The performance is better than that of competing methods.
We present a method to detect cylinders in a range image taken from an airplane by a laser ranger scanning the ground at a low grazing angle. First we attempt to detect the outer edges of the cylinder which appear as parallel lines of discontinuity in the range image. The radius of the cylinder is computed from the distance between the two lines of discontinuity. The cylinder rulings appear parallel to the lines of discontinuity in the projection. We find three cylinder rulings and then use two different methods based on geometrical considerations to find the axis of the cylinder given these cylinder rulings. The distance between the axes found by the two methods is used as a criterion for distinguishing between a cylinder and any other geometrical shape.
This paper describes a new technique for the analysis of a sequence of time-varying images taken by a stationary camera. The proposed algorithm is divided into three stages: (i) motion detection, (ii) object location, and (iii) trajectory tracing. In the first stage, two consecutive images are compared and a difference image is formed to detect the presence of any motion. A subimage is then defined in the difference image as the active region in which the motion is analyzed. In the second stage, the subimage is compared with both previous and next frames. The local windows enclosing the object are determined in each respective image. These windows are then used by the last stage for tracing the trajectory of the moving object. Their centers of mass are computed and plotted on a 2-D plane. The motion trajectory is obtained simply by joining these points. The presented approach is tested for both fast and slow motion cases using real-world images of moving objects in complex scenes.
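A simplified sketch of the difference-image and trajectory idea follows. It is not the authors' algorithm: the object-location stage is skipped, and the centre of mass is taken directly over the changed pixels of each consecutive frame pair. All function names and the threshold value are hypothetical.

```python
def changed_pixels(prev, curr, threshold=10):
    """Motion detection: coordinates where two consecutive frames differ."""
    return [
        (r, c)
        for r, (row_p, row_c) in enumerate(zip(prev, curr))
        for c, (a, b) in enumerate(zip(row_p, row_c))
        if abs(a - b) > threshold
    ]


def bounding_window(points):
    """Active region enclosing the changed pixels: (top, left, bottom, right)."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return (min(rows), min(cols), max(rows), max(cols))


def centroid(points):
    """Centre of mass of a set of pixel coordinates."""
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def trace(frames, threshold=10):
    """Trajectory: one centroid per frame pair in which motion is detected."""
    path = []
    for prev, curr in zip(frames, frames[1:]):
        points = changed_pixels(prev, curr, threshold)
        if points:
            path.append(centroid(points))
    return path
```

Joining the points returned by `trace` in order gives the 2-D motion trajectory.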
Technology has evolved to the point that image sequences can be captured, compressed, transmitted, decompressed, and displayed in real time with a system consisting of predominantly off-the-shelf components. The system we have developed is able to decode and display approximately 10 frames/sec (128x128 images) using four 16 MHz Motorola 68020s controlled by an IBM PC/AT. The 68020 processors are connected to a Multibus 1®. Via a BIT3® interface, the PC/AT sees the Multibus as an extension of its bus. We are presently optimizing the four processor system to handle 8-15 frames/sec.
Differential pulse code modulation (DPCM) is a predictive coding scheme which has been studied extensively. This paper describes an adaptive DPCM compression scheme which is based on an algorithm used in National Image Transmission Format (NITF). This scheme estimates the pixel values using linear and bilinear interpolations. It allows more bits to be assigned to pixels in the noisy areas of the image and fewer bits to pixels in the homogeneous areas. Therefore the prediction is more reliable and the bit-rate reduction is more effective. Furthermore, in this paper, a method is proposed which optimizes the quantization of prediction residuals in the sense that the decision levels and output levels are chosen to minimize the mean square quantization error. Results of applying this technique to a gray-scale image are presented. Performance measures are also given. In addition, comparisons are made between this scheme and the algorithm used in NITF.
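For readers unfamiliar with DPCM, the basic predict, quantize, and reconstruct loop can be sketched as follows. This is a deliberately plain version and not the scheme of the paper: it assumes a previous-sample predictor and a fixed uniform quantizer, whereas the abstract describes interpolative prediction, adaptive bit allocation, and a quantizer optimized for mean square error. The function names and the `step` parameter are hypothetical.

```python
def dpcm_encode(samples, step=4):
    """Quantize the prediction residual of each sample (closed-loop DPCM)."""
    codes = []
    prediction = 0
    for s in samples:
        residual = s - prediction
        code = round(residual / step)  # uniform quantizer index
        codes.append(code)
        # Predict from the reconstructed value so that the encoder
        # and decoder stay in step and errors do not accumulate.
        prediction += code * step
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the dequantized residuals."""
    out = []
    value = 0
    for code in codes:
        value += code * step
        out.append(value)
    return out
```

Because the encoder predicts from reconstructed values, the reconstruction error of every sample stays within half a quantizer step.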
Two new linear integer transforms, called the Newton and the Stirling transforms, are introduced; some of their properties are explored, and their possible application to image data compression is discussed. Computer simulation results for digitally filtered images are presented.
Variable rate image coding schemes are an efficient way to achieve low bit rates while maintaining acceptable image quality. This paper describes several ways to design variable rate product vector quantizers (VQ) which use a quad-tree data structure to communicate the VQ's block size. The first is a direct encoding method which uses VQs having previously specified rates. The second uses a threshold decision rule together with a method to compute the threshold to keep average distortion below a given level. This computation is based on the relationship between the quantizer performance function and the source variance. The third design uses a new algorithm to determine stepwise optimum VQ codebook rates to minimize rate while limiting distortion. Quad-trees are used in all cases to communicate block sizes to the receiver. Simulations show that these variable rate VQs encode over 70 percent of the Lena image at a very low rate while maintaining good fidelity. The proposed schemes also preserve edge fidelity, even at low rates.
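The role of the quad-tree in signalling block sizes can be sketched with a simple variance-threshold split. This is an illustration under assumptions, not one of the three designs in the paper: the split rule, the function name `quadtree`, and the parameters `threshold` and `min_size` are hypothetical, and no vector quantizer is attached to the leaves.

```python
def quadtree(image, r, c, size, threshold, min_size=2):
    """Return leaf blocks (row, col, size) of a variance-driven quad-tree."""
    block = [image[rr][cc]
             for rr in range(r, r + size)
             for cc in range(c, c + size)]
    mean = sum(block) / len(block)
    var = sum((v - mean) ** 2 for v in block) / len(block)
    # Smooth blocks (or blocks at the minimum size) are coded whole.
    if size <= min_size or var <= threshold:
        return [(r, c, size)]
    # Busy blocks are split into four quadrants and examined again.
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += quadtree(image, r + dr, c + dc, half, threshold, min_size)
    return leaves
```

Large leaves cover homogeneous areas and can be coded at a low rate, while small leaves concentrate bits on detailed regions such as edges.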
A new tree-searched VQ scheme called delayed-decision binary tree-searched VQ is proposed in this paper. To alleviate the sub-optimal solution problem of binary tree-searched algorithms, it uses multipath search to find the best matching codevector in a binary-tree codebook. At each tree node, it examines the path error of the 2*M branches extended from M saved nodes, and only the best M of these branches are saved for the next step. This procedure continues until the end of the tree is reached, and then the codevector of the best matched node among the final M saved nodes is used. In simulations, the delayed-decision algorithm is incorporated in a mean/residue binary tree-searched VQ. It is shown that, on the average, a 20% reduction of mean-square error is obtained when M=8. Therefore, the performance is much improved by better searching the same codebook at the expense of computational costs. Most of all, the image quality is improved without increasing the bit rate.
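The multipath search can be sketched as follows, under stated assumptions. The codebook is assumed to be a complete binary tree stored in heap order (children of node i at 2i+1 and 2i+2), and the "path error" is taken to be the squared error between the input and the node vector at the current depth; the paper may define it differently. The names `sq_error` and `tree_search` are hypothetical.

```python
def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tree_search(tree, depth, vector, m=1):
    """Return the index of the leaf found by keeping the best m nodes per level."""
    survivors = [0]  # start at the root
    for _ in range(depth):
        # Extend every saved node by its two branches (at most 2*m candidates).
        children = [c for node in survivors for c in (2 * node + 1, 2 * node + 2)]
        children.sort(key=lambda i: sq_error(tree[i], vector))
        survivors = children[:m]  # keep only the best m
    return survivors[0]
```

With m=1 this reduces to the ordinary greedy binary tree search. In the small tree below, the greedy search commits to the wrong branch at the first level, while m=2 recovers the closer codevector:

```python
tree = [[5], [0], [10], [-8], [8], [9], [11]]
greedy = tree_search(tree, depth=2, vector=[6], m=1)   # leaf index 5, codevector [9]
delayed = tree_search(tree, depth=2, vector=[6], m=2)  # leaf index 4, codevector [8]
```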
An intraframe encoding system is described that uses trellis coded vector quantization in both open- and closed-loop configurations. In the open-loop case, a source (image) is blocked into vectors and encoded using an efficient trellis search. In the closed-loop configuration, the difference signal between a source (image) vector and a vector prediction is encoded. An algorithm is described for the design of codebooks for use in trellis coded vector quantization. The performance is evaluated by simulation for a variety of encoding rates and vector codebook dimensions.
In this paper, an architecture suitable for real-time image coding using vector quantization is presented. This architecture is based on the concept of content-addressable memory (CAM) where the data is accessed simultaneously and in parallel on the basis of its content. In vector quantization (VQ), a set of representative vectors (codebook) is generated from a training set of vectors. The input vectors to be coded are quantized to the closest codeword of the codebook and the corresponding index (label) of the codeword is transmitted. Thus, VQ essentially involves a search operation to obtain the best match. Traditionally, the search mechanism is implemented sequentially, where each vector is compared with the codewords one at a time. For K input vectors of dimension L, and a codebook of size N, the search complexity is of order K*L*N, which is highly compute-intensive and makes real-time implementation of the VQ algorithm difficult. The architectures reported thus far employ parallelism in the directions of vector dimension L and codebook size N. However, as K>>N for image coding, a greater degree of parallelism can be obtained by employing parallelism in the directions of L and K. This means that matching must be performed from the perspective of the codewords; namely, for a given codeword, all input vectors are evaluated in parallel. A speedup of order K*L results if a content-addressable memory based implementation is employed. This speedup, coupled with the gains in the execution time for the basic distortion operation, implies that codebook generation and encoding are possible in real time (< 15 milliseconds). The regular and iterable architecture is particularly well suited for VLSI implementation.
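The codeword-centred ordering of the search can be made concrete with a sequential sketch. It only shows the loop structure: the outer loop runs over the N codewords, and the inner loops over the K input vectors and their L components are the ones a CAM-based design would evaluate in parallel. The function name `encode_codeword_major` and the squared-error distortion are assumptions for illustration.

```python
def encode_codeword_major(vectors, codebook):
    """Full-search VQ with the codeword loop outermost; returns one index per vector."""
    best_err = [float("inf")] * len(vectors)
    best_idx = [-1] * len(vectors)
    for n, codeword in enumerate(codebook):
        # For a given codeword, every input vector is compared against it.
        # In the architecture described above this would happen for all
        # K vectors (and L components) at once; here it is a plain loop.
        for k, v in enumerate(vectors):
            err = sum((a - b) ** 2 for a, b in zip(v, codeword))
            if err < best_err[k]:
                best_err[k] = err
                best_idx[k] = n
    return best_idx
```

Run sequentially, this still costs on the order of K*L*N operations; the speedup claimed in the abstract comes from collapsing the two inner loops in hardware, leaving N steps.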
In several papers, the basic concept of a 64 kbit/s video-codec was presented with different points of emphasis. Based on the system presented in , several methods are investigated to enhance the reconstructed picture quality of this codec. The basic coding concept consists of a reduction of the temporal and the spatial resolution, a block oriented motion compensated DPCM, and a DCT-coding. The investigations include time-filtering, dual spatial resolution, an improved motion estimation, and a modified coder and buffer control.
Pyramid image transforms have proven useful in image coding and pattern recognition. The Hexagonal orthogonal Oriented quadrature image Pyramid (HOP), transforms an image into a set of orthogonal, oriented, odd and even bandpass sub-images. It operates on a hexagonal input lattice, and employs seven kernels, each of which occupies a neighborhood consisting of a point and a hexagon of six nearest neighbors. The kernels consist of one lowpass and six bandpass kernels that are orthogonal, self-similar, and localized in space, spatial frequency, orientation, and phase. The kernels are first applied to the image samples to create the first level of the pyramid, then to the lowpass coefficients to create the next level. The resulting pyramid is a compact, efficient image code. Here we describe a recursive, in-place algorithm for computation of the HOP transform. The transform may be regarded as a depth-first traversal of a tree structure. We show that the algorithm requires a number of operations that is on the order of the number of pixels.
The reduced generalized chain (RGC) code was originally introduced for arc length measurements on digitized contours. For this purpose, it seemed a good alternative for the popular Freeman code. In this paper, it is investigated how closely RGC encoded straight lines fit their originals. The followed approach is to group code elements into specific patterns and thereby reduce the problem to a Freeman-like situation, for which evaluation techniques have already been proposed.
This paper discusses a proposed new parallel architecture, the Torus Integrated Machine (TIM), which is designed to incorporate physical compactness, dedicated analog hardware and programmability. Since any typical short-range relaxation algorithm maps into the proposed hardware, the machine could be an ideal testbed for early-vision algorithms (e.g. edge detection, binocular stereo, motion, color, structure from motion). The toroidal topology enables us to integrate, on a single panel, an entire set of dedicated algorithms, assembled into a self-feeding pipeline and executed in parallel. The result is a low-volume, portable vision machine thousands of times faster than current supercomputers.
A model is developed to simulate variations which may be introduced when printing images with a laser writing device. The following variations are discussed: beam profile, random intensity variation of the laser, random fluctuations in positioning of beam, and vibrational (sinusoidal) variations in beam positioning.
A general representation approach is described which employs a hierarchy of holes and notches. A matching procedure is also described which allows non-ideal image hierarchies to be matched to class representations. The representation and matching methods are demonstrated on a set of handgun photographs. Examples of handguns which are different in detail are shown to exhibit the same class characteristics, while other similarly shaped objects are correctly distinguished from the handgun class.
The use of human visibility functions in image coding serves two purposes. First of all, they indicate the importance of certain pixel values to the human observer. The compression scheme can use this knowledge and allow a larger error for irrelevant features, resulting in a higher compression ratio of the image without visual image quality degradation. The use of the visibility functions is demonstrated in segmentation coding schemes, since these schemes allow a local (in the space domain) modification of the compression parameters. Furthermore, properties resulting from the shape of the visibility function are hardwired into a segmentation compression scheme. The second purpose of the visibility functions lies in the evaluation of the compressed image. The functions can be incorporated into a quality measure, describing the visual difference between the original image and the compressed one. The results of some experiments in using the visibility functions are described.
A highly structured model, the vector Gibbs random field model, is presented for color textures. The model gives a precise mapping from a color texture onto a small number of parameters. Also, the model and its parameter values prescribe a texture for synthesis. A brief review of the black-and-white textures and scalar Gibbs random fields is also included.