We present several procedures intended as part of an overall scheme for rapid pattern recognition of three-dimensional objects using parallel processing on a hexagonal grid. The basic steps of the scheme are edge detection based on gray-level differences within a seven-pixel neighborhood, local edge thinning, local feature detection, and feature correction based on comparison of features within a 61-pixel superneighborhood. Experimental tests of these procedures, by serial computer algorithms and by hand simulation on real digital images of simple objects such as a cube, a wedge, and a cylinder, have shown some success at recognizing key features of the object skeletons. The results indicate that implementing and further developing these procedures on a real hexagonal grid of photodetectors with attached parallel-processing units would be very fruitful.
We propose fast algorithms for a class of trigonometric transforms with a unified structure and a simple, constant-geometry data exchange isomorphic to that of the Cooley-Tukey FFT algorithm. Many parallel FFT approaches extend easily to these algorithms. The idea of the method is to localize the nonregularities in the nodes of a Cooley-Tukey-type computational graph: only the basic operation performed in the nodes differs from transform to transform. A simple programmable processor element that executes the node function can therefore serve as the basis for parallel implementations.
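The Cooley-Tukey computational graph that this abstract refers to can be illustrated with a minimal radix-2 FFT sketch in Python; the butterfly inside the loop is the per-node operation that, in the proposed unified scheme, would be swapped out for each trigonometric transform. This is an illustration of the graph structure, not the paper's algorithm:

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The butterfly in the loop below is the 'node operation' that a
    unified trigonometric-transform algorithm would replace per transform."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w           # butterfly: the node operation
        out[k + n // 2] = even[k] - w
    return out
```

The data exchange (the even/odd split and the two writes per butterfly) is identical at every node; only the arithmetic inside the loop would change for a different transform.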
Texas Instruments' Digital Light Processing™ (DLP™) technology provides all-digital projection displays that offer superior picture quality in terms of resolution, brightness, contrast, and color fidelity. This paper provides an overview of the digital video processing solutions that have been developed by Texas Instruments for the all-digital display. The video processing solutions include: progressive scan conversion, digital video resampling, picture enhancements, color processing, and gamma processing. The real-time implementation of the digital video processing is also discussed, highlighting the use of the scanline video processor (SVP) and the development of custom ASIC solutions.
We developed a new video image processing system consisting of a laser videodisc recorder (LVR) controlled by a personal computer over an RS-232C interface. With this system we can handle video images without the degradation caused by videotape damage, and multiframe image processing becomes easy. We wrote a computer-controlled image processing procedure for video images and performed experiments on footage recorded at night with an ordinary 8 mm video camera. The results show that the new system and procedures are effective for improving the signal-to-noise ratio and reducing line-to-line jitter. Processed results for car license plate images recorded on videotape are shown, including actual cases in which the characters on the plate become readable after processing.
In this paper, we introduce a new class of discrete parametric trigonometric transforms. We establish the conditions of unitarity of the discrete parametric trigonometric transform matrices, by which we construct a wide range of orthonormal transform matrices. Analysis of proposed unitary trigonometric (cosine, sine and combined sine-cosine) transforms is performed, and efficient algorithms for their computation are developed.
In image rotation, a two-pass algorithm has many advantages over a one-pass algorithm for high-speed computation. The previously reported two-pass algorithm suffers serious performance degradation in high-frequency areas at large rotation angles (30 to 45 degrees). This paper presents a new two-pass algorithm that overcomes the limitations of previously reported approaches at large rotation angles. The hardware structure for the two-pass algorithm needs only four additional counters. We have also developed a novel three-dimensional Fourier-theoretical basis that includes the effect of interpolation. A brief comparison of existing techniques and the newly suggested two-pass algorithm is presented. At large rotation angles, the suggested algorithm performs almost as well as the one-pass algorithm and much better than the existing two-pass algorithm.
We are developing a new statistical method based on the two-dimensional discrete cosine transform (2D DCT), which is the focal point of the segmentation process discussed here. The basic operations are: (1) the image is transformed from the spatial domain into the frequency domain; (2) the image is treated block by block. The approach aims to give a detailed description of the image to ease its subsequent interpretation; it is guided by local indices and applies adapted treatments selected by criteria of homogeneity and coherence. The method follows a hierarchy of three levels. The first level extracts, in a global way, the locations of the edges of the image by eliminating the homogeneous blocks; this treatment is guided by local indices such as luminance and energy. At the second level, the blocks selected in the first phase are classified according to the similarity of their indices (mean, variance, entropy, etc.), and aggregation of these blocks creates the areas from which we extract the fine details; their characterization is made at a very local level. An effective way of characterizing the areas is to focus on attributes that describe the texture, since the study of texture provides very rich additional information about the image. This information allows selecting the best treatment (edge extraction operator) according to its efficiency, the quality of its detection, and the fineness of its results. Once all the finer characteristics have been extracted, we merge the results from each area.
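As an illustration of the block-level first stage, the sketch below (not the paper's exact indices) computes an orthonormal 2D DCT of a square block in pure Python and flags the block as homogeneous when its AC energy falls below a threshold; the threshold and the choice of AC energy as the local index are assumptions:

```python
import math

def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block (list of lists of floats)."""
    n = len(block)
    def a(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = a(u) * a(v) * s
    return out

def is_homogeneous(block, threshold):
    """Flag a block as homogeneous when the energy of all coefficients
    except the DC term is below a threshold -- one possible local index
    in the spirit of the paper's first level."""
    c = dct2(block)
    ac = sum(c[u][v] ** 2 for u in range(len(c)) for v in range(len(c))
             if (u, v) != (0, 0))
    return ac < threshold
```

A flat block concentrates all its energy in the DC coefficient and is eliminated; a block containing an edge keeps substantial AC energy and survives to the second level.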
This report proposes a new algorithm for approximate Karhunen-Loeve (KL) expansion and applies it to the problem of texture analysis. The main idea is to substitute for the true two-dimensional correlation function of the image ensemble an approximate correlation function of factorable form, so that the problem of KL basis construction can be solved and the resulting complete basis used for image processing. We have compared the efficiency of the proposed procedure with the fast wavelet-based approximate KL algorithm and with an earlier algorithm that diagonalizes an experimental estimate of the correlation matrix. For textures with well-defined translational symmetry in the horizontal or vertical direction, the proposed algorithm may give the best results at essentially equal computational complexity.
We present a class of PDE-based algorithms suitable for a wide range of image processing applications. The techniques are applicable both to salt-and-pepper gray-scale noise and to full-image continuous noise present in black-and-white images, gray-scale images, texture images, and color images. At the core, the techniques rely on a level set formulation of evolving curves and surfaces and on viscosity solutions of the profile evolution. Essentially, the method consists of moving the isointensity contours in an image under curvature-dependent speed laws to achieve enhancement. Compared to existing techniques, our approach has several distinct advantages. First, it contains only one enhancement parameter, which in most cases is chosen automatically. Second, the scheme automatically stops smoothing at some optimal point; continued application of the scheme produces no further change. Third, the method is one of the fastest possible schemes based on a curvature-controlled approach.
Charge-coupled devices (CCDs) are commonly used in image capture devices to measure color information. A common inexpensive imaging device will use a single-chip CCD array, in which each CCD element is coupled with a filter measuring the red, green, or blue color content at a particular point. Ideally it would be desirable to obtain full RGB color information at each point, but this is not possible, as elements with different spectral filters cannot occupy the same spatial location. As a result, color edges may appear at different spatial locations in the individual color planes, causing artifacts such as blurry edges and false coloring. This paper proposes an algorithm for enhancing color image data captured with typical single-chip CCD arrays. The algorithm is based on stochastic regularization using a Gaussian image model with a deterministic line process to realign the edge information. This image model is used in a maximum a posteriori estimation technique, resulting in a constrained convex optimization problem, which is solved with an iterative constrained gradient descent algorithm. Results show that the algorithm works well to reduce, and often eliminate, the visible artifacts of this type of color image capture device.
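For context, a minimal baseline that the described MAP estimator would improve upon is plain bilinear interpolation of the missing samples. The sketch below is an assumption for illustration, not the paper's method: it fills in the green channel of an RGGB Bayer mosaic, the layout in which green sites sit where row + column is odd:

```python
def demosaic_green(mosaic):
    """Bilinear interpolation of the green channel from a Bayer mosaic
    (2-D list of raw sensor values). Baseline only: a MAP estimator with
    a line process, as in the paper, would refine such an estimate to
    realign edges. Assumes an RGGB layout where green sits at positions
    with (row + col) odd; border pixels are left untouched."""
    h, w = len(mosaic), len(mosaic[0])
    green = [row[:] for row in mosaic]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if (i + j) % 2 == 0:  # red or blue site: average the 4 green neighbours
                green[i][j] = (mosaic[i - 1][j] + mosaic[i + 1][j]
                               + mosaic[i][j - 1] + mosaic[i][j + 1]) / 4.0
    return green
```

Averaging across an edge is exactly what produces the blurred edges and false coloring described above, which is what motivates the edge-preserving regularization.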
When an interlaced image sequence is viewed at the rate of sixty frames per second, the human visual system interpolates the data so that the missing fields are not noticeable. However, if frames are viewed individually, interlacing artifacts are quite prominent. This paper addresses the problem of deinterlacing image sequences for the purposes of analyzing video stills and generating high-resolution hardcopy of individual frames. Multiple interlaced frames are temporally integrated to estimate a single progressively-scanned still image, with motion compensation used between frames. A video observation model is defined which incorporates temporal information via estimated interframe motion vectors. The resulting ill-posed inverse problem is regularized through Bayesian maximum a posteriori (MAP) estimation, utilizing a discontinuity-preserving prior model for the spatial data. Progressively-scanned estimates computed from interlaced image sequences are shown at several spatial interpolation factors, since the multiframe Bayesian scan conversion algorithm is capable of simultaneously deinterlacing the data and enhancing spatial resolution. Problems encountered in the estimation of motion vectors from interlaced frames are addressed.
While high compression ratios have been achieved with recently developed image coding algorithms, restoration of the coded images is considered an important subject. We have attempted to restore images coded by discrete cosine transform (DCT) based schemes such as JPEG and MPEG. Some conventional adaptive filters are designed to seek a homogeneous region among predetermined polygonal subregions and then apply a smoothing operation within the selected subregion. It should be noted, however, that the predetermined subregions may sometimes be heterogeneous. This fact leads us to a novel idea: instead of examining predetermined regions, define a far more flexible region that is likely to be homogeneous. To achieve this, we introduce a binary index: each pixel is classified into either a lower-intensity or a higher-intensity group based on local statistics. A smoothing operation is then applied over the pixels having the same group index as the pixel being processed. Our scheme can thus seek an appropriate homogeneous region. Another advantage is that it can be realized with significantly low computation. It is shown that this approach suppresses visible artifacts while retaining fine details such as edges and texture.
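The binary-index smoothing idea can be sketched directly. The window size and the use of the window mean as the low/high threshold are illustrative choices, since the abstract says only that the classification is based on local statistics:

```python
def binary_index_smooth(img, radius=1):
    """Adaptive smoothing in the spirit of the binary index: each pixel
    in the local window is classified as 'low' or 'high' relative to the
    window mean, and the output is the average of the window pixels that
    share the centre pixel's index. Border pixels are left untouched."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            window = [img[y][x]
                      for y in range(i - radius, i + radius + 1)
                      for x in range(j - radius, j + radius + 1)]
            mean = sum(window) / len(window)
            centre_high = img[i][j] >= mean
            group = [v for v in window if (v >= mean) == centre_high]
            out[i][j] = sum(group) / len(group)
    return out
```

Because averaging never crosses the low/high boundary, a step edge passes through unchanged while blocking artifacts within either group are smoothed away.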
Image resampling is used for several purposes, such as picture enlargement, image reconstruction, correcting geometrical distortions, and obtaining sub-pixel accuracy. Most of these uses are invaluable in medical, defense, and other applications. Most of the resampling and interpolation methods documented in the literature fall into one of two categories: conventional or adaptive. In conventional methods an interpolation function is applied indiscriminately to the whole image; no matter how complex the chosen function is, the resulting image in general suffers from aliasing, edge blurring, and other artifacts. Adaptive methods, on the other hand, aim at avoiding these problems by analyzing the local structure of the source image and using different interpolation functions and different areas of support. In this paper we present an adaptive algorithm for image resampling, mainly for zooming up. The algorithm is based on segmenting the image dynamically into homogeneous areas while preserving edge locations and their contrast. Based on the location of the pixel to be computed (within a homogeneous area, on its edge, or at an isolated feature), interpolation, extrapolation, or pixel replication is chosen. Algorithm performance (quality and computational complexity) is compared with different methods and functions previously reported in the literature and used in most commercial packages. The advantage of the method is quite apparent at edges and for large zooming factors.
We propose two key modifications to a recent motion segmentation algorithm developed by Wang and Adelson, which greatly improve its performance. They are: (i) the adaptive k-means clustering step is replaced by a merging step, whereby the hypothesis (affine parameters of a block) which has the smallest representation error, rather than the respective cluster center, is used to represent each layer, and (ii) we implement it in multiple stages, where pixels belonging to a single motion model are labeled at each stage. Performance improvement due to the proposed modifications is demonstrated on real video clips.
Motion estimation is an important part of image sequence analysis and of image sequence coding. Due to its simplicity and robustness, the currently preferred motion estimation technique is block matching. The approach described in this paper eliminates the systematic limitations of conventional block matching. The algorithm is based on a resolution pyramid that is treated as a tree with one-to-one relationships (links) between pixels on adjacent levels. Within the pyramid, motion estimation is done top-down by matching each pixel, which results in a multiresolution motion vector field. The links are iteratively refined according to a homogeneity criterion incorporating the intensity as well as the motion vectors. Hence a segmentation of the image is obtained, and motion estimation is performed on regions instead of blocks. Motion estimation and segmentation are performed iteratively in alternating steps. The final segmentation is characterized by homogeneous motion as well as homogeneous intensity.
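As a point of reference for the conventional block matching that the paper improves upon, an exhaustive sum-of-absolute-differences (SAD) search looks like this (a generic sketch, not the paper's pyramid-linked algorithm):

```python
def block_match(prev, curr, bi, bj, bsize, search):
    """Exhaustive block matching: find the displacement (dy, dx) within
    +/- search pixels that minimises the SAD between the bsize x bsize
    block at (bi, bj) in `curr` and the displaced block in `prev`."""
    h, w = len(prev), len(prev[0])
    best = (0, 0)
    best_sad = float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = bi + dy, bj + dx
            if y < 0 or x < 0 or y + bsize > h or x + bsize > w:
                continue  # displaced block would fall outside the frame
            sad = sum(abs(curr[bi + r][bj + c] - prev[y + r][x + c])
                      for r in range(bsize) for c in range(bsize))
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best
```

The systematic limitation this exhibits, and which the pyramid-with-links approach addresses, is that a single vector is assigned to a whole block regardless of whether the block straddles a motion boundary.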
The proposed system STABIL combines four levels of abstraction. In the first level the foreground is extracted using a Kalman filter technique. The second level uses the foreground regions to search for parts of human skin; the three-channel color signal is transformed into a 2D color space that best represents skin color. The Kalman filtering speeds up the classification in the case of a stationary camera; with a moving camera the classification is applied directly to the sequence. The skin regions serve as input for the third level, which estimates the position of the person in 3D space relative to the camera. The fourth level maintains a model of a human based on statistical data on body size. The model is adapted to a person in an iterative process that takes into account the limits of the person's movements and the restrictions on the model, in order to match the skin regions in the image to the correct person. The result of processing the n-th image of a sequence is a scaled model projected and superimposed on the (n-1)-th image, showing the correct estimate of the person's position.
We have developed algorithms for automatic character segmentation in motion pictures which automatically and reliably extract the text in pre-title sequences, credit titles, and closing sequences with title and credits. The algorithms we propose exploit typical characteristics of text in videos to enhance segmentation and, consequently, recognition performance. As a result, we obtain segmented characters from video pictures which can be processed by any OCR software. The recognition results for multiple instances of the same character throughout subsequent frames are combined to enhance recognition and to compute the final output. We have tested our segmentation algorithms in a series of experiments on video clips recorded from television and achieved good segmentation results.
The domain of image sequence analysis for trajectory extraction is expanding rapidly, and the systems designed are usually closely tied to their application. Indeed, depending on the constraints imposed by the application, more or less information can be extracted, which makes the algorithms more or less complicated. All of them, however, depend on the study of temporal events, and the trajectory extraction process is driven by introducing constraints and hypotheses about the objects and then verifying them. This paper presents a technique that can fit most applications by adapting the spatio-temporal association criteria, and shows how it applies to human motion analysis. After human motion analysis is presented in the introduction, its problems are listed in part one. Part two describes the tracking targets, used to materialize the pertinent parts of the body (to make their extraction easy), and the image sequence analysis system. The automatic target tracking algorithm is presented in the third part; it allows the user to reduce significantly the number of operator interventions during tracking. We then show that 3D tracking is practically implicit once 2D tracking is done. Part four gives results of 2D and 3D tracking and describes, with examples, how the target loss problem is solved. The paper ends with some perspectives for improving the tracking algorithm.
We present a method for automatic image retrieval based on query-by-example (QBE). The proposed method consists of two parts: region selection followed by shape matching. In the first part, the image is partitioned into disjoint, connected regions with more-or-less uniform color, whose boundaries coincide with spatial edge locations. The number and boundaries of resulting regions are adaptively determined by a new fusion technique for combined color segmentation and edge linking. Each region or combinations of neighboring regions constitute 'potential objects.' In the second part, the shape of each potential object is tested to determine whether it matches one from a set of given templates. To this effect, the boundary of each potential object, as well as of each template, is represented by a B-spline. We then proceed to identify correspondences between the joint points of the B-splines of potential objects and templates, respectively, by using a modal shape description. These correspondences are used to estimate the parameters of an affine mapping to register the object with the template. A proximity measure is then computed between the two contours based on the Hausdorff distance. We demonstrate the performance of the proposed method on a variety of images.
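The Hausdorff-based proximity measure mentioned above can be sketched for sampled contours as follows (the plain symmetric Hausdorff distance on point sets; the paper may use a variant, and the contours are assumed to be already registered by the affine mapping):

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets a and b,
    given as lists of (x, y) tuples: the largest distance from any point
    in one set to its nearest point in the other set."""
    def directed(p, q):
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))
```

A small Hausdorff distance after registration indicates that the potential object's boundary matches the template everywhere, not just on average.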
We process multispectral satellite imagery for loading into our environmental database on the UCSC/NPS/MBARI-REINAS project. We have developed methods for segmenting GOES (Geostationary Operational Environmental Satellite) images that take advantage of the available multispectral data. Our algorithm classifies different types of clouds and characterizes cloud elevations. The resulting information is used to incorporate the texture-mapped satellite imagery into a combined model/measurement visualization; the approximate cloud elevations, types, and opacities are used to build a three-dimensional cloud model for visualization. Discrete Karhunen-Loeve transformations, or Hotelling transformations, are used to calculate the principal components of the multispectral data. The accurate segmentation and feature extraction of the clouds assists in the validation and evaluation of atmospheric prediction models against true remotely sensed data. We demonstrate the integrated measurement/model visualization with an OpenGL application using texture mapping. The spectral data are also used to control the free parameters in the texture mapping of the modeled clouds. We are working on further improvements to develop novel compression techniques combining the KLT with segmentation and feature extraction, and we also hope to develop algorithms that visualize the compressed imagery directly.
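A discrete Karhunen-Loeve (Hotelling) transformation of multispectral samples reduces to eigenanalysis of their covariance matrix. The sketch below recovers the first principal component by power iteration (a generic illustration, not the project's implementation; each sample is one spectral-band vector per pixel):

```python
def principal_component(samples, iters=200):
    """First principal component (Hotelling / discrete KL transform) of
    multichannel samples, via the mean-centred covariance matrix and
    power iteration. `samples` is a list of equal-length channel
    vectors, e.g. one tuple of spectral-band values per pixel."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    v = [1.0] * d                     # initial direction for power iteration
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]     # converges to the dominant eigenvector
    return v
```

Projecting each pixel's band vector onto the leading eigenvectors concentrates most of the spectral variance into a few channels, which is what makes the KLT attractive both for segmentation features and for the compression work mentioned above.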
Due to its portability and lack of ionizing radiation, three-dimensional ultrasound imaging is emerging as an important diagnostic tool in medicine. However, ultrasound images are usually noisy because of scattering and other complicated interactions between ultrasonic pulses and human tissue. This makes it difficult to automatically segment the images. Methods such as edge finding and region growing, which are primarily driven by the image data, can be easily diverted by speckle and broken contours. We have developed a new, model-driven segmentation method that automatically finds closed contours in noisy images using a global search. In particular, we have concentrated our research on horizontal image slices of the lower leg. The segmentation for such an image consists of three contours: the outer skin surface and two bones. Using a genetic algorithm, our method searches through the space of all possible contours, where contours are modeled as closed cubic splines. Candidate contours are blurred with a point spread function that approximates the point spread function of the image. These model contours are then compared to the actual image using correlation. We demonstrate the method on ultrasound images of human legs. Because this new algorithm avoids the diversions of noise, it works well in spite of the vagaries of ultrasound images, and it does so with a minimum of tunable parameters.
A dynamic programming edge following procedure is applied to ultrasound images of the carotid artery. The objective is to automatically determine the 'far wall' interfaces of the common carotid artery. The far wall interfaces are then used to estimate the far wall thickness, which is an important metric for disease diagnosis and treatment evaluation. A current system uses human readers to determine the carotid artery interfaces using digitized images on a computer display. This process is time-consuming and difficult to control, since readers tend to vary over time in the way in which they identify interfaces. In addition, different readers tend to identify interfaces in slightly different ways. The edge following procedure is designed to apply a consistent and objective criterion to all images in order to reduce the variability in far wall thickness estimates. The edge following procedure works by joining local peaks in the image gradient. The gradient is estimated by a Sobel operator, and dynamic programming is used to join the peaks into a smooth edge. The dynamic programming is necessary to combat the effects of noise and speckle in the ultrasound images. The paper describes the dynamic programming cost function formulation and discusses the algorithm performance.
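The peak-joining step can be sketched as a shortest-path problem: one edge row per image column, with a penalty on row jumps standing in for the paper's smoothness term. The jump limit and penalty weight below are illustrative, since the abstract does not give the actual cost function:

```python
def dp_edge(grad, smooth=1.0, max_jump=1):
    """Dynamic-programming edge following over a gradient-magnitude
    image `grad` (rows x cols): choose one row per column so that the
    summed negative gradient plus a smoothness penalty on row-to-row
    jumps is minimal. Returns the list of edge rows, one per column."""
    rows, cols = len(grad), len(grad[0])
    cost = [[-grad[r][0] for r in range(rows)]]  # favour strong gradients
    back = []
    for c in range(1, cols):
        col_cost, col_back = [], []
        for r in range(rows):
            best, arg = float("inf"), -1
            for p in range(max(0, r - max_jump), min(rows, r + max_jump + 1)):
                cand = cost[-1][p] + smooth * abs(r - p)
                if cand < best:
                    best, arg = cand, p
            col_cost.append(best - grad[r][c])
            col_back.append(arg)
        cost.append(col_cost)
        back.append(col_back)
    r = min(range(rows), key=lambda i: cost[-1][i])
    path = [r]
    for c in range(cols - 2, -1, -1):  # backtrack from the last column
        r = back[c][r]
        path.append(r)
    path.reverse()
    return path
```

Because the optimum is global over the whole column sequence, an isolated speckle peak cannot divert the edge the way a greedy local follower would be diverted.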
We have developed a robust method for image segmentation based on a local multiscale texture description. We first apply a set of 4 x 4 complex Gabor filters (four scales by four orientations), plus a low-pass residual (LPR), producing a log-polar sampling of the frequency domain. Unlike other analysis methods, our Gabor scheme produces a visually complete, multipurpose representation of the image, so it can also be applied to coding, synthesis, etc. Our sixteen texture features consist of local contrast descriptors, obtained by dividing the modulus of the response of each complex Gabor filter by that of the LPR at each location. Contrast descriptors are essentially independent of slow variations in intensity, which increases the robustness and invariance of the representation. Before applying the segmentation algorithm, we equalize the number of samples of the four layers in the resulting pyramid of local contrast descriptors. This method has been applied to the segmentation of electron microscopy images, obtaining very good results in this real case, where robustness is a basic requirement because intensity, texture, and other factors are not completely homogeneous.
In this paper, we propose an algorithm which detects the boundaries of objects in a color image. The result of this method is a binary image in which only the boundary points are represented. Many methods of edge detection in color images have been developed; one of the most efficient is based on vectorial computations of the tristimuli R, G, B. But in the case of complex color images it is difficult to determine automatically a global threshold for finding the boundary pixels. For this reason, we suggest a local thresholding algorithm that uses a cooperative relaxation process to enhance edge probabilities. The labeling relaxation algorithm processes probabilities that result from applying a gradient to the different features of the color image, so the algorithm is able to detect edges from a slope of intensity, saturation, or hue. The relaxation algorithm considers four classes of pixels. Three classes represent the gradient filter output of the three color image features, R, G, B: at each pixel, the higher the response for a feature, the higher the probability that the pixel belongs to the class corresponding to that feature. The last label represents the no-edge pixel, whose probability is computed from the probabilities of the three other classes. For each pixel, the sum of the four probabilities must equal 1. The process is iterated until the probability of each pixel belonging to the no-edge class is near 0 or 1; a pixel with a low probability of belonging to the no-edge class is considered an edge pixel. The efficacy of the relaxation algorithm depends on the choice of the compatibility coefficients. We propose to compute these coefficients from the initial probabilities of the pixels belonging to the four classes, by evaluating the neighboring mutual information between two classes.
The definition of the compatibility coefficients is based on the mutual information of the classes at neighboring points. The presented method has been tested successfully on complex color images and compared with classic edge detection methods. We show that this local segmentation method is better than a global one.
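A single-pixel update of a standard Rosenfeld-style probabilistic relaxation scheme, which the described algorithm resembles, can be sketched as follows (generic form; in the paper the compatibility coefficients are derived from neighboring mutual information, whereas here they are simply given as an input matrix):

```python
def relax_step(p, neighbours, r):
    """One relaxation-labeling update for a pixel's label probabilities
    `p` (list over classes), given the probability vectors of its
    neighbours and compatibility coefficients r[l][m] in [-1, 1].
    The support q for each label aggregates compatible evidence from
    the neighbourhood; probabilities are then renormalised to sum to 1."""
    k = len(p)
    q = [sum(r[l][m] * nb[m] for nb in neighbours for m in range(k))
         / len(neighbours) for l in range(k)]
    new = [p[l] * (1.0 + q[l]) for l in range(k)]
    z = sum(new)
    return [x / z for x in new]
```

Iterating this update drives ambiguous pixels toward probability 0 or 1 for the no-edge class, which is exactly the stopping condition described above.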
We present a segmentation algorithm for forward-looking infrared (FLIR) images, which combines model-based segmentation of FLIR images with multiresolution image processing techniques. The algorithm obtains a precise and accurate segmentation of a target while accelerating convergence and greatly reducing the influence of the background. Simulation experiments with FLIR tank images prove the effectiveness of the algorithm.
When an aircraft is flying in conditions of low level mist or cloud, visibility of terrain features from the cockpit may be low. However, if image enhancement techniques are applied to a sequence of images (captured at 25 Hz with a cockpit-mounted camera), the effective visibility of terrain features can be increased. The main sources of image degradation are sensor noise and the scattering and attenuation of light by haze and fog. We present a two-stage algorithm that reduces such degradation through temporal processing. The first stage involves motion-compensated temporal averaging of a set of consecutive images. The frame-to-frame visual motion is calculated using navigational information, a model of the camera, and a database of terrain elevations. Since this motion is predicted independently of image content, it is unaffected by the degradation. The temporal averaging of a sequence of images produces an averaged image with a decreased level of sensor noise. The second stage reverses the loss of contrast caused by the atmosphere. The total light detected by the camera is the sum of the light scattered from the terrain and that scattered from the atmospheric particles, both of which are functions of the distance from the camera to the terrain. By considering the relationship between depth and brightness, a parametric model for the total light detected is proposed. The parameters of this model provide the means to subtract the component of light due to atmospheric scattering and to then scale the result to compensate for the attenuation of the light reflected from the terrain. The algorithm has been applied to forward-looking image sequences captured in both good and poor visibility conditions. The results show a considerable increase in effective visibility over the unprocessed images.
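The second stage can be illustrated with the standard parametric scattering model I = J*t + A*(1 - t), with transmission t = exp(-beta*d); this is an assumption standing in for the paper's own parametric model. Inverting it subtracts the airlight component and rescales to undo the attenuation:

```python
import math

def dehaze(intensity, depth, beta, airlight):
    """Invert a standard atmospheric-scattering model (an illustrative
    assumption, not the paper's fitted model): detected light
    I = J * t + A * (1 - t), where J is the terrain radiance, A is the
    light scattered in by atmospheric particles, and t = exp(-beta * d)
    is the transmission over camera-to-terrain distance d. Subtract the
    airlight term, then rescale to compensate for the attenuation."""
    t = math.exp(-beta * depth)
    return (intensity - airlight * (1.0 - t)) / t
```

Here the depth d per pixel would come from the same terrain-elevation database and camera model used for motion prediction in the first stage, and beta and A are the fitted model parameters.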