After a short review of IIR-based subband coders, we describe the features of a software implementation of an efficient image subband coder (SBC). The efficiency of an SBC is heavily dependent upon the filter bank implementation. Experience has shown that at least 90% of the execution time of SBCs is spent in the filter bank. First, we present our implementation based on extremely simple multiplier-free IIR filters. Subsequently, we discuss the implementation task from a programmer's point of view, emphasizing the algorithm implementation using only integer arithmetic. The experiments show that our IIR SBC runs ten times faster than an efficient SBC based on FIR filter banks. Furthermore, our code is twice as fast as the July 91 release of the Free Software Foundation's JPEG implementation, while maintaining comparable performance; our SBC-coded images had a somewhat higher signal-to-noise ratio than those coded with JPEG at the same bit rate. The subjective quality was found to be comparable.
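A multiplier-free IIR filter of the kind the abstract alludes to can be built entirely from integer additions and bit shifts. The sketch below is a generic first-order low-pass under that constraint, purely illustrative and not the paper's actual filter bank:

```python
def iir_lowpass_shift(x):
    """First-order multiplier-free IIR low-pass:
        y[n] = y[n-1] + ((x[n] - y[n-1]) >> 1)
    The coefficient 1/2 is realized as an arithmetic right shift,
    so only integer adds and shifts are used (no multiplications)."""
    y = []
    state = 0
    for sample in x:
        state = state + ((sample - state) >> 1)  # integer-only update
        y.append(state)
    return y
```

For a constant input the state converges geometrically toward the input value, which is the behavior one would exploit when cascading such sections into an analysis bank.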
Separable subband coding is an increasingly popular method of image compression. This technique divides an image into spatial frequency bands by convolving it with one-dimensional low- and high-pass filters and then decimating the results. Reconstruction is performed by interpolating the subbands using a mathematically related set of filters. Since the analysis filters are not ideal, their spectral overlap results in aliasing from the subsampling step. However, certain restrictions can be made on the form of the analysis and synthesis filters which will cancel the aliasing when the images are reconstructed. Filter banks so designed are said to have the perfect reconstruction property. This property is lost when not all the subbands are used in the reconstruction process. Sometimes, lack of channel bandwidth or decoder processing power makes it necessary to discard some of the higher frequency bands and reconstruct only the lower frequencies. In this case, the aliasing is no longer cancelled, so the resulting picture is not just a low-passed version of the original and its appearance is noticeably degraded. In order to improve the picture quality, a technique is proposed which minimizes the amount of aliasing during the reconstruction process when the high subbands are not present. This method models the spectrum of the image and employs an optimal Wiener filter instead of the normal subband low-pass synthesis filter. Although no additional computation is required beyond that for a normal reconstruction, experiments show that the images produced are improved in both a visual and a signal-to-noise ratio sense.
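The perfect reconstruction property can be demonstrated with the simplest possible filter pair, the Haar bank; the paper's filters are more general, so this is only a minimal sketch of the analysis/decimation and interpolation/synthesis round trip:

```python
def haar_analysis(x):
    """Two-band Haar analysis: low band holds pairwise averages,
    high band holds pairwise half-differences; decimation by 2 is implicit."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high

def haar_synthesis(low, high):
    """Haar synthesis: recombines the subbands so that the original
    samples are recovered exactly (perfect reconstruction)."""
    x = []
    for l, h in zip(low, high):
        x.extend([l + h, l - h])
    return x
```

Discarding the high band here (i.e., synthesizing with `high` set to zeros) yields exactly the uncancelled-aliasing situation the abstract addresses.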
The purpose of this paper is to propose a new scheme for image coding from the point of view of inverse problems. The goal is to find an approximation of the image that preserves edges for a given bit rate. In order to achieve better visual quality and to save computation time, the image is first decomposed using biorthogonal wavelets. We assume that wavelet coefficient sub-images can be modeled by Markov random fields (MRF) with a line process. The sub-images are then approximated so that their entropy decreases and edges are preserved. Thus, the visual quality of the reconstructed image is controlled. We also look at the MRF model and a monoresolution image approximation method, along with a short overview of wavelet-based multiresolution analysis. Finally, we describe the multiresolution coding scheme and give some results.
This paper describes a study in which several progressive image transmission and reconstruction (PITAR) methods were used to evaluate the Logical Bit-Slice technique, which was developed during this study. As defined in this paper, a method identifies which pixels are to be manipulated and a technique defines how those pixels are to be manipulated. The objective was to determine an efficient technique of extracting partial image information to improve image reconstruction. The measure was to identify which method resulted in the most recognizable image sequence with the least number of transmitted bits. The Logical Bit-Slice technique was incorporated into classical methods, e.g., the Zig-Zag method, to create respective modified methods, e.g., the Zig-Bit method. A feature of the Logical Bit-Slice technique is that it enables the signal-to-noise ratio of the reconstructed image to be adaptively set to a user-determined value. The principal conclusion was that incorporation of the Logical Bit-Slice technique into existing methods improved PITAR performance with minimal additional processing and complexity. Furthermore, the modified Block method, i.e., the Block-Bit method, provided superior visual reconstruction of high-contrast images.
We consider an image coder that uses a multiscale decomposition. An image is first whitened, and then decomposed into subimages using a wavelet decomposition. A bit allocation algorithm is employed that assigns various rates to the subimages according to the power spectral density of the original image. Based on the bit assignments, scalar quantizers are used for encoding the coefficients. To improve the performance of this coder, we consider a bit allocation procedure that takes the response of the human visual system into account. Finally, we introduce a spatial partition on top of the multiscale decomposition, resulting in a substantial improvement to the image quality.
Representations of color images are discussed with regard to the problem of color image coding. Standard color image representations are seen to have components with considerable redundancy and, accordingly, are ill-suited for coding using standard gray-scale image coders. The notion of an uncorrelated or orthogonal representation, in which component images are independent in an L2 sense, is introduced and is shown to have features desirable as a preprocessor to a color image coder. Experiments using both spatial-domain and frequency-domain coders show that the orthogonal representation leads to a 20% - 70% compression ratio improvement over that of RGB or YIQ representations, with less visually objectionable artifacts at low peak signal-to-noise ratios.
A new algorithm for multirate vector quantization is used for coding image pyramids. The algorithm, called alphabet- and entropy-constrained pairwise-nearest-neighbor (AECPNN), designs codebooks by choosing sub-codebooks from a large generic codebook. The algorithm is the natural extension of the ECPNN design technique with a constrained alphabet. Results obtained with the present algorithm on coded image pyramids are comparable to those of the ECPNN design technique.
A set of still image compression algorithms developed by the Joint Photographic Experts Group (JPEG) is becoming an international standard. Here we apply a methodology to study the compression obtained by each step of the three-step baseline sequential algorithm. We present results, observations, and analysis on simulating the JPEG sequential baseline system. The primary compression gain comes from run-length coding of zero coefficients. Based on our simulator, a comparison of Huffman coding, WNC arithmetic coding, and the LZW algorithm is also included.
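The run-length coding of zero coefficients that the abstract identifies as the primary compression gain can be sketched as a JPEG-style encoding of a quantized coefficient sequence into (zero-run, value) pairs. This is an illustrative sketch, not the simulator's code:

```python
def run_length_zeros(coeffs):
    """Encode a 1-D list of quantized transform coefficients as
    (zero_run, value) pairs: each nonzero value is preceded by the
    count of zeros before it, JPEG-baseline style."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append((0, 'EOB'))  # end-of-block marker (covers any trailing zeros)
    return pairs
```

Because quantized high-frequency coefficients are mostly zero, long runs collapse into single pairs, which is where the bulk of the compression comes from.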
General purpose image compression algorithms do not fully exploit the redundancy of color graphical images because the statistics of graphics differ substantially from those of other types of images, such as natural scenes or medical images. This paper reports the results of a study of lossless predictive coding techniques specifically optimized for the compression of computer generated color graphics. In order to determine the most suitable color representation space for coding purposes, the Karhunen-Loeve (KL) transform was calculated for a set of test images and its energy compaction ability was compared with those of other color spaces, e.g., the RGB or the YUV signal spaces. The KL transform completely decorrelates the input color data for a given image and provides a lower bound on the color entropy. Based on the color statistics measured on a corpus of test images, a set of optimal spatial predictive coders was designed. These schemes process each component channel independently. The prediction error signal was compressed by both lossless textual substitutional codes and statistical codes to achieve distortionless reproduction. The performance of the developed schemes is compared with that of the lossless function of the JPEG standard.
Image transmission is a very effective method of conveying information for a large number of applications. Vector quantization (VQ) is a computationally demanding technique that uses a finite set of vectors as its mapping space. It has been shown that VQ is capable of producing good reconstructed image quality; however, it suffers from high computational complexity in the codebook creation stage. We have found that neural networks offer a fast alternative approach to creating the codebooks, and they appear to be particularly well-suited for VQ applications. The neural network approach uses parallel computing structures, and most neural network learning algorithms are adaptive and can be used to produce an effective scheme for training the vector quantizer. A new method for designing the vector quantizer, called Concentric-Shell Partition Vector Quantization, is introduced. It first partitions the image vector space into concentric shells and then searches for the smallest possible codebook to represent the image vector space, while adhering to visual perceptive qualities such as edges and textures in the image representation. In this paper, we present neural networks using the frequency-sensitive learning algorithm and the concentric-shell partitioning approach for VQ. This new technique shows the simplicity of the neural network model while retaining its computational advantages.
We show that in a standard transform coding scheme of images or video, the decoder can be implemented by a table lookup technique without the explicit use of inverse transformation. In this new decoding method, each received code index of a transform coefficient addresses a particular codebook to fetch a component code vector that resembles the basis vector of the linear transformation. The output image is then reconstructed by summing a small number of non-zero component code vectors. With a set of well designed codebooks, this new decoder can exploit the correlation among the quantized transform coefficients to achieve better rate-distortion performance than the conventional decoding method. An iterative algorithm for designing a set of locally optimal codebooks from a training set of images is presented. We demonstrate that this new idea can be applied to decode improved quality pictures from the bit stream generated from a standard encoding scheme of still images or video, while the complexity is low enough to justify practical implementation.
Recently, vector quantization (VQ) has received considerable attention and become an effective tool for image compression. It provides high compression ratios and simple decoding processes. However, studies on practical implementation of VQ have revealed some major difficulties such as edge integrity and codebook design efficiency. Over the past few years, a new wave of research in neural networks has emerged. Neural network models have provided an effective alternative for solving computationally intensive problems. In this paper, we propose to implement VQ for image compression based on neural networks. Separate codebooks for edge and background blocks are designed using Kohonen self-organizing feature maps to preserve edge integrity and improve the efficiency of codebook design. Improved image quality has been achieved, and the comparability of the new approach with existing VQ approaches has been demonstrated with experimental results.
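A Kohonen self-organizing feature map designs a codebook by repeatedly pulling the best-matching codeword, and its immediate neighbours, toward each training vector. The minimal 1-D sketch below uses illustrative learning-rate and neighbourhood choices and is not the paper's training procedure:

```python
def som_train(vectors, codebook, epochs=20, lr=0.3):
    """Minimal 1-D Kohonen update for VQ codebook design.
    Each training vector attracts its best-matching unit (BMU) with
    strength lr, and the BMU's neighbours with half that strength."""
    cb = [list(c) for c in codebook]
    for _ in range(epochs):
        for v in vectors:
            # best-matching unit by squared Euclidean distance
            bmu = min(range(len(cb)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(cb[k], v)))
            for k in (bmu - 1, bmu, bmu + 1):  # update BMU and neighbours
                if 0 <= k < len(cb):
                    g = 1.0 if k == bmu else 0.5
                    cb[k] = [c + lr * g * (a - c) for c, a in zip(cb[k], v)]
    return cb
```

The neighbourhood coupling is what preserves the topological ordering of the codewords, which in turn helps edge codebooks stay coherent.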
The goal of this compression algorithm is to exploit the i750™ video processor to achieve VCR quality video at CD-ROM data rates. The algorithm uses motion compensation with fractional-pixel displacements to produce good predicted images. The residual errors are then encoded using pyramid and vector-quantization techniques. A binary tree is used to efficiently encode large areas of the image with the same displacement vector, and large areas where the residual errors quantize to zero. The remaining quantities are statistically encoded for the programmable statistical decoder of the i750 video processor. This second-generation algorithm has achieved its goal and is now incorporated in current DVI products.
An image compression algorithm is described in this research paper. The algorithm is an extension of the run-length image compression algorithm, and its implementation is relatively easy. This algorithm was implemented and compared with other existing popular compression algorithms. As we show in this paper, sometimes our algorithm is best in terms of saving memory space, and sometimes one of the competing algorithms is best. Once the data is in real memory, a relatively simple and fast transformation is applied to uncompress the file. In image processing and computer graphics, a scene drawn on the screen requires a considerable amount of RAM space. This space in today's computers is on the order of 1 Mb. To create animation it is often required that we calculate several scenes, some of them stored on disk and a considerable number of them stored in real memory. Our algorithm compresses images that ordinarily take on the order of 1 Mb of memory down to a few tens of Kb, thus enabling us to store many images in real memory in a relatively small space.
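The run-length idea that the algorithm extends can be sketched in a few lines; this is the classical base scheme, not the paper's extension:

```python
def rle_encode(pixels):
    """Classical run-length encoding: collapse runs of identical
    values into (value, count) pairs."""
    if not pixels:
        return []
    runs = [[pixels[0], 1]]
    for p in pixels[1:]:
        if p == runs[-1][0]:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Inverse transformation: expand (value, count) pairs back to pixels."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out
```

Decoding is a simple expansion, which matches the abstract's point that a fast transformation suffices to uncompress an image once it is in memory.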
This paper presents an efficient lossless data compression scheme for various kinds of binary images, such as line drawings and half-tone pictures. We studied a combination of preprocessing and Lempel-Ziv universal coding. To improve the compression ratio, we modified Lempel-Ziv universal coding by dividing its dictionary into classified sub-dictionaries. We obtained improved compression ratios in computer simulation on composite images consisting of mixed text and images.
We propose a highly efficient arithmetic coding method for document images. In arithmetic coding, the series of symbols to be coded is mapped onto a coordinate of a numerical line segment which corresponds to a coding space. By using a priori knowledge and statistical knowledge, we can improve coding efficiency in that mapping procedure. We describe a highly efficient arithmetic coding algorithm and simulation results for document images including characters, figures, and pseudo gray-scale images; these document images are dominant parts of facsimile documents. Restrictions have been placed on the procedure of arithmetic coding. Simulation results using CCITT standard documents and pseudo gray-scale images show that our coder achieves about 10% - 23% reduction in the total amount of code data in comparison with conventional arithmetic coders.
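The interval-narrowing step at the heart of arithmetic coding, mapping a symbol sequence onto a subinterval of the unit line segment, can be sketched as follows; the probability model and any a priori restrictions the paper applies are omitted:

```python
def arithmetic_interval(symbols, model):
    """Map a symbol sequence onto a subinterval of [0, 1): each symbol
    narrows the current interval in proportion to its probability.
    `model` maps each symbol to (cumulative_probability, probability)."""
    low, width = 0.0, 1.0
    for s in symbols:
        cum, p = model[s]
        low += width * cum   # shift to the symbol's sub-segment
        width *= p           # shrink by the symbol's probability
    return low, low + width
```

Any number inside the final interval identifies the whole sequence; its length equals the product of the symbol probabilities, which is why better models yield shorter codes.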
Huffman coding is an optimal coding method in information theory, but it requires the statistical properties of the coded data and cannot be realized in hardware in real time. In this paper, a new data coding method, base-bit+overflowing-bit, is proposed. This coding method does not require the statistical properties of the coded data and can be realized in real time. Experiments show that a compression ratio of 2 - 3 can be obtained after coding source data with this method. The hardware implementation is also described in the paper.
The regularization of the least-squares criterion has been established as an effective approach in linear image restoration. However, the quadratic smoothing functions employed in this approach degrade the detailed structure of the estimate. This paper introduces the concept of robust estimation in regularized image restoration, and addresses its potential in preserving the detailed structure. Robust estimation schemes have been used in the suppression of artifacts created by long-tailed noise processes. In this paper it is demonstrated that a robust objective function allows the existence of sharp signal transitions in the estimate, by alleviating the penalty on the error associated with such transitions. The optimization approach introduced modifies the stabilizing term of the regularized criterion according to the notion of M-estimation. Thus, an influence function is employed to restrain the contribution of large estimate-deviations in the optimization criterion. Moreover, the utilization of a new entropic criterion as the stabilizing functional is explored. The general structure of this criterion shares many common characteristics with the functions employed in robust M-estimation. The robust criteria provide nonlinear estimation schemes which efficiently preserve the detailed structure, even in the case of an over-estimated regularization parameter.
Point spread function (PSF) models derived from physical optics provide more accurate representation of real blurs compared to the simpler models based on geometrical optics. However, the restorations obtained using the physical PSF models are not always significantly better than the restorations which employ the geometrical PSF models. The insensitivity of the restoration to the accuracy of the PSF representation is attributed to the coarse sampling of the recording device and insufficiently high signal-to-noise (SNR) levels. Low recording resolutions result in aliasing errors in the PSF and suboptimal restorations. In this work, a high resolution representation of the PSF where aliasing errors are minimized is used to obtain improved restorations. Our results indicate that the SNR is the parameter which ultimately limits the restoration quality and the need for an accurate PSF model. As a rule of thumb, the geometrical PSF may be used in place of the physical PSF without significant loss in the restoration quality when the SNR is less than 30 dB.
Iterative techniques for image restoration are flexible and easy to implement. The major drawback of iterative image restoration is that the algorithms are often slow in converging to a solution, and the convergence point is not always the best estimate of the original image. Ideally, the restoration process should stop when the restored image is as close to the original image as possible. Unfortunately, the original image is unknown, and therefore no explicit fidelity criterion can be computed. The generalized cross-validation (GCV) criterion performs well as a regularization parameter estimator, and stopping an iterative restoration algorithm before convergence can be viewed as a form of regularization. Therefore, we have applied GCV to the problem of determining the optimal stopping point in iterative restoration. Unfortunately, evaluation of the GCV criterion is computationally expensive. Thus, we use a computationally efficient estimate of the GCV criterion after each iteration as a measure of the progress of the restoration. Our experiments indicate that this estimate of the GCV criterion works well as a stopping rule for iterative image restoration.
An algorithm for motion-adaptive, temporal filtering of noisy image sequences is proposed. The algorithm is applied in the temporal domain along motion trajectories that are determined using a robust motion estimation algorithm. Filtering is performed by computing weighted averages of image values over estimated motion trajectories. The weights are determined by optimizing a well-defined mathematical criterion so that they vary with the accuracy of motion estimation, which gives the algorithm its adaptivity. Our results suggest that the proposed algorithm is very effective in suppressing noise without over-smoothing the image detail. Further, the proposed algorithm is particularly well-suited for filtering sequences that contain segments with changing scene content due to factors such as rapid zooming and changes in the view of the camera, in scene illumination, or in the place and time of image recording. Existing algorithms, in general, perform poorly in such cases.
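The motion-adaptive weighted average can be sketched for a single trajectory. The weighting rule below, decaying with the motion-estimation error, is an illustrative choice, not the paper's optimized criterion:

```python
def filter_along_trajectory(values, errors, alpha=1.0):
    """Weighted temporal average of pixel values sampled along one motion
    trajectory. `errors` are per-sample motion-estimation residuals; the
    weight of a sample falls off with its error, so poorly matched frames
    contribute less to the filtered value."""
    weights = [1.0 / (1.0 + alpha * e * e) for e in errors]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

With equal errors this reduces to a plain temporal mean; a badly matched sample (large error) is effectively excluded, which is the adaptivity the abstract describes.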
This paper describes the issues associated with using a laser scanner for visual sensing and the methods developed by the authors to address them. A laser scanner is a device that controls the direction of a laser beam by deflecting it through a pair of orthogonal mirrors, the orientations of which are specified by a computer. If a calibrated laser scanner is combined with a calibrated camera, it is possible to perform three dimensional sensing by directing the laser at objects within the field of view of the camera. There are several issues associated with using a laser scanner for three dimensional visual sensing that must be addressed in order to use the laser scanner effectively. First, methods are needed to calibrate the laser scanner and estimate three dimensional points. Second, methods are required for locating the laser spot in a cluttered image. Third, mathematical models that predict the laser scanner's performance are necessary in order to enhance three dimensional data. The authors have developed several methods to address each of these and have evaluated these methods to determine how and when they should be applied. The theoretical development, implementation, and preliminary results obtained with a dual-arm, eighteen-degree-of-freedom robotic system for space assembly are described.
In 1987, Franke introduced a new algorithm for the extrapolation of discrete signals. It allows the reconstruction and the extrapolation of texture segments on supports where the contour information is no longer included. This method, called selective deconvolution, works as an iterative process using the Fourier transform and its inverse. In this paper, we study the integration of the selective deconvolution method into a multiresolution scheme. In a first algorithm, we propose to apply the selective deconvolution introduced by Franke to each of the subband signals obtained at the output of an analysis stage. The goal of this integration is both to gain computation time and to obtain a multiresolution representation of the texture segment. Franke proposes the D.F.T. as the spectral operator in his algorithm; in fact, more general spectral operators can also be considered. Therefore, in the second part of this paper, we study an iterative selective deconvolution method based on a subband decomposition rather than on the D.F.T.
We propose a series of procedures to construct an image as similar as possible to that detected in good illumination conditions (a standard image), starting from a low light level (L3) image. In L3 conditions, only a small number of photopulses are detected in the whole image area. An image taken in these conditions appears as a few isolated light points over a dark background, which makes it nearly impossible to recognize an object represented in it. We have developed a method based on the L3 image statistics in order to estimate the intensity received by each pixel. This method consists of a spatial average performed by a photon counting mask and can be used to construct a standard image from only one L3 image. As a second step, we have studied some histogram operations to eliminate the heavy statistical dependence that remains in the post-mask image. The best results correspond to histogram specification, but to perform it, it would be necessary to know the standard image histogram. The last step of our work is the development of a fitting method to obtain this standard image histogram. This fitting is based on the statistical behavior of the L3 image and can be done using only a post-mask histogram as data.
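The photon counting mask amounts to a local average of photopulse counts around each pixel. A minimal sketch with a square neighbourhood follows; the mask shape and size are assumptions for illustration:

```python
def photon_count_estimate(counts, radius=1):
    """Estimate intensity at each pixel of a low-light (L3) image by
    averaging photopulse counts over a (2*radius+1)^2 neighbourhood,
    a simple stand-in for a photon counting mask.
    `counts` is a 2-D list of non-negative photopulse counts."""
    h, w = len(counts), len(counts[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc, n = 0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        acc += counts[i + di][j + dj]
                        n += 1
            out[i][j] = acc / n  # mean count = intensity estimate
    return out
```

Averaging over the mask trades spatial resolution for a usable intensity estimate when each pixel alone sees only a handful of photopulses.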
The problem of phase unwrapping in elevation estimation using interferometric synthetic aperture radar images is approached through the detection of fringe line positions. The position of each fringe line is estimated by fitting the corresponding enhanced phase transition region with a series of basis functions, the coefficients of which are obtained through weighted least squares estimation. The algorithm is applied to a pair of SEASAT SAR images over mountainous terrain. Cosine functions and linear splines are used as the basis functions, and results show the proposed phase unwrapping algorithm can successfully eliminate global errors and reduce local errors.
Accurate interpretation of digitized X-ray mammograms has been limited by the lack of a specialized image acquisition system. We have developed a novel image acquisition system based on an area scanning scientific grade CCD array. The distinguishing features of our system are: (1) a fast method of digitizing mammograms and (2) high spatial and photometric resolution. The system is capable of acquiring 6 frames per second, where each frame consists of over 1.3 million pixels digitized to 10 bits per pixel, 8 of which are displayed at any one time on a high resolution monitor. The fixed pattern and the random noise (of optical and electronic origin) are minimized using background subtraction and signal averaging techniques. The resulting image is equal to or better than that obtained by a drum laser-scanning microdensitometer. In order to restore the image from the degrading effects of the system blur and noise, Wiener filtering is used. The modulation transfer function of the system is measured using a bar pattern test object and also the classical knife edge technique. In one filter implementation the ensemble power spectrum of the mammograms is estimated from the degraded images. The noise is assumed to be independent of the signal and its power spectrum is estimated from selected smooth regions of the noisy and blurred images. The spectral energy of the relatively low-contrast soft-tissue images on a mammogram is generally concentrated near zero frequency. In an alternative implementation the noise-to-signal power ratio is assumed constant. The results show a marked improvement in the detectability of the smallest particles of microcalcifications when judged by a human observer. The impact of our image restoration on the clinical detection of microcalcifications by a radiologist will be tested.
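The constant noise-to-signal-ratio variant of the Wiener filter, the alternative implementation mentioned above, has a closed form per frequency sample: W = H* / (|H|² + NSR), where H is the blur transfer function. A minimal sketch over a list of frequency samples:

```python
def wiener_filter(H, nsr):
    """Frequency response of a Wiener deconvolution filter with a constant
    noise-to-signal power ratio `nsr`:
        W(f) = conj(H(f)) / (|H(f)|^2 + NSR)
    `H` is a list of complex samples of the blur transfer function."""
    return [h.conjugate() / (abs(h) ** 2 + nsr) for h in H]
```

With `nsr = 0` this reduces to the plain inverse filter 1/H; a positive `nsr` damps frequencies where the blur response is weak, which is what keeps noise from being amplified.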
The concept of an error image is defined for the error diffusion algorithm. Ordinarily hidden from view, the error image is a visual representation of the internally generated errors from which the algorithm derives its name. In this paper, it is shown that the error image contains a linear component of the input image, which induces edge enhancement in the output error diffusion image. Examples are shown for three different error weight distributions: a 1-D one-ahead distribution, the standard 4-element distribution, and a 12-element error distribution. The amount of edge enhancement in the corresponding algorithm is shown to vary with the amount of input image information present in the error image.
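The 1-D one-ahead distribution is the simplest of the three weight distributions examined, and it makes the error image easy to expose: the entire quantization error of each pixel is passed to its right neighbour. A sketch for one scan line (threshold value is an illustrative choice):

```python
def error_diffuse_1d(row, threshold=128):
    """1-D 'one-ahead' error diffusion on one scan line of 8-bit pixels.
    The full quantization error of each pixel is diffused to the next
    pixel. Returns the binary output and the error image (the internal
    error signal that is ordinarily hidden from view)."""
    out, err_img = [], []
    carry = 0.0
    for p in row:
        v = p + carry                    # pixel plus diffused error
        q = 255 if v >= threshold else 0  # binarize
        carry = v - q                    # error sent one pixel ahead
        out.append(q)
        err_img.append(carry)
    return out, err_img
```

Even on a flat mid-gray line the error signal oscillates, and on edges it carries a scaled copy of the input variation, which is the linear component the paper analyzes.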
By defining a performance metric in terms of a model of the human visual system, we can develop a criterion for the accuracy of a halftoning algorithm. Directly optimizing this criterion leads to an algorithm based on the Gibbs sampler. The resulting algorithm is more computationally expensive than error diffusion; however, it is local, highly parallel, and has a number of other desirable properties. For high quality halftoning, it is necessary to take into account the physical characteristics of the display device. The proposed algorithm can do this in a direct manner. For black and white monitors, this leads to an adjustment in the calculated brightness of the first pixel turned on in raster order; this brightness adjustment is significant, as the first pixel in such a run has been measured at only 70% of the brightness of succeeding pixels. In printed halftoned images, ink overlap leads to a decrease in the perceived darkness of adjacent dark pixels; again, the difference is significant, having been measured at 16% effective overlap. On color patterns, it is possible to directly measure the colors produced by the printer and use them in formulating the error measure for halftoning. Adjusting for these effects, which cannot be done with error diffusion algorithms, leads to a perceivable increase in halftoned image quality.
We investigate an efficient color image quantization technique that is based upon an existing binary splitting algorithm. The algorithm sequentially splits the color space into polytopal regions and picks a palette color from each region. At each step, the region with the largest squared error is split along the direction of maximum color variation. The complexity of this algorithm is a function of the image size. We introduce a fast histogramming step so that the algorithm complexity will depend only on the number of distinct image colors, which is typically much smaller than the image size. To keep a full histogram at moderate memory cost, we use direct indexing to store two of the color coordinates while employing binary search to store the third coordinate. In addition, we apply a prequantization step to further reduce the number of initial image colors. In order to account for the high sensitivity of the human observer to quantization errors in smooth image regions, we introduce a spatial activity measure to weight the splitting criterion. High image quality is maintained with this technique, while the computation time is less than half of that of the original binary splitting algorithm.
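The fast histogramming step and the choice of splitting direction can be sketched as follows; a plain dictionary stands in for the paper's direct-indexing-plus-binary-search histogram structure:

```python
def color_histogram(pixels):
    """Histogram of distinct RGB colors, so that subsequent splitting work
    scales with the number of distinct colors rather than the image size."""
    hist = {}
    for rgb in pixels:
        hist[rgb] = hist.get(rgb, 0) + 1
    return hist

def max_variance_axis(hist):
    """Choose the splitting direction as the color coordinate
    (0=R, 1=G, 2=B) with the largest weighted variance over the histogram,
    i.e., the direction of maximum color variation."""
    best_axis, best_var = 0, -1.0
    total = sum(hist.values())
    for axis in range(3):
        mean = sum(c[axis] * n for c, n in hist.items()) / total
        var = sum(n * (c[axis] - mean) ** 2 for c, n in hist.items()) / total
        if var > best_var:
            best_axis, best_var = axis, var
    return best_axis
```

Since typical images contain far fewer distinct colors than pixels, every split after histogramming touches only the histogram entries, which is where the claimed speedup comes from.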
The quality of color correction is dependent upon the filters used to scan the image. This paper introduces a method of selecting the scanning filters using a priori information about the viewing illumination. Experimental results are presented. The addition of a fourth filter produces significantly improved color correction over that obtained by three filters.
The use of mathematical morphology in low and mid-level image processing and computer vision applications has allowed the development of a class of techniques for analyzing shape information in monochromatic images. In this paper, we extend some of these techniques to color images. We have investigated the application of various methods for 'color morphology'. We present results of our empirical study for three different applications: noise suppression, multiscale smoothing, and edge detection.
Two issues are involved in color image quantization: color palette selection and color mapping. A common practice for color palette selection is to minimize the color distortion for each pixel (the median-cut, the variance-based, and the k-means algorithms). After the color palette has been chosen, a quantized image may be generated by mapping the original color of each pixel onto its nearest color in the palette. Such an approach can usually produce quantized images of high quality with 128 or more colors. For 32 - 64 colors, the quality of the quantized images is often acceptable with the aid of dithering techniques in the color mapping process. For 8 - 16 colors, however, the above statistical method for color selection is no longer suitable because of the great reduction of the color gamut. In order to preserve the color gamut of the original image, one may want to select the colors in such a way that the convex hull formed by these colors in the RGB color space encloses most colors of the original image. Quantized images generated in such a geometrical way usually preserve a lot of image detail, but may contain too much high-frequency noise. This paper presents an effective algorithm for the selection of a very small color palette by combining the strengths of the above statistical and geometrical approaches. We demonstrate that with the new method, images of high quality can be produced using only 4 to 8 colors.
High frequency digital magnification is an application of two dimensional convolute integer (TDCI) technology. The results are images magnified by frequency-sensitive convolution operators for replacement and interstitial point generation. The images to be presented are magnified from times two to times thirty-two with resolution enhancement. The artifacts associated with classical digital magnification (stair-stepping diagonals, blocky gray areas, and blur) are absent. The frequency sensitivity of the operators will be illustrated with the use of mask sizes from 2 X 2 to 8 X 8 on a sample image. The use of resolution enhancement will be illustrated to show how the frequency response of an image is increased. The symmetry properties of TDCI technology, which minimize the number of multiplications per convolution and enhance the overall system's throughput, will be discussed, as well as execution times. This application of TDCI technology is coded in 'C' as digital image processing software.
Image interpolation systems are used to render a high resolution version of an image from a lower resolution representation. Conventional interpolation systems such as bilinear interpolation and nearest neighbor interpolation often perform poorly (in a subjective sense) when acting on a spatial region of an image which has an oriented structure such as an edge, line, or corner. Recently, systems based on directional interpolation have been presented which yield improved performance on these oriented structures. However, separate models are used for the detection of edges, lines, and corners. In this work, we combine simple building blocks (a Sobel edge detector, directional interpolation, and the directional filter bank) to form a system which exploits the orientational tuning and the spatial-frequency-variant sensitivity of the human visual system. The new system handles all of the oriented features in the same manner. First, the image to be processed is split into its directional components. These directional components are then individually interpolated using the directional interpolation system. Since orthogonal (or nearly orthogonal) components are contained in different directional components, corners do not exist in the directional components. Furthermore, since the directional filter bank is an exactly reconstructing structure, the representation of the image in terms of its directional components is complete, and the decomposition is therefore invertible. In the absence of an oriented component, the directional interpolation system reverts to bilinear interpolation (though any other interpolant could be used).
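The Sobel edge detector named as a building block in this abstract is standard; the following minimal sketch (not the paper's implementation) computes the two Sobel responses at a single interior pixel, from which gradient magnitude and orientation follow:

```python
import math

SOBEL_X = [(-1, 0, 1), (-2, 0, 2), (-1, 0, 1)]
SOBEL_Y = [(-1, -2, -1), (0, 0, 0), (1, 2, 1)]

def sobel(img, y, x):
    """Return (gx, gy) Sobel responses at interior pixel (y, x)."""
    gx = gy = 0
    for dy in range(3):
        for dx in range(3):
            v = img[y + dy - 1][x + dx - 1]
            gx += SOBEL_X[dy][dx] * v
            gy += SOBEL_Y[dy][dx] * v
    return gx, gy

# A vertical step edge: strong horizontal gradient, zero vertical gradient.
img = [[0, 0, 255, 255] for _ in range(4)]
gx, gy = sobel(img, 1, 1)
angle = math.atan2(gy, gx)  # orientation of the edge normal
print(gx, gy)  # 1020 0
```

In the paper's system, the orientation estimate would steer which directional component a feature falls into.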
This paper describes an algorithm for enhancing very low-contrast and degraded images. This problem is solved by applying an adaptive image enhancement algorithm to improve the quality and readability of degraded documents. The algorithm is a combination of an adaptive spatial filter and an adaptive gray-scale quantizer. The adaptive, spatially variant filter sharpens the image information while suppressing the background noise. It does this in two steps: first, it classifies each image pixel as either signal, background, or noise by using moment estimates in windows of various sizes; second, it uses this classification to choose either a high-pass or a low-pass filter for the image region. The adaptive quantizer is implemented by estimating the average background and foreground levels of the signal and adjusting the quantizer appropriately. The algorithm has been simulated on a wide variety of scanned imagery with significant improvements in image sharpness, contrast, and text legibility. Examples are included.
We propose a new high speed template matching algorithm named edge point template matching (EPTM), which can match one gray image to another closely similar image and detect small differences between them. This method uses the location, strength, and direction of contours in the template image, stored in a one-dimensional array. This reduced template makes the computational cost lower than that of previous methods, which use a two-dimensional template. Generally, this kind of template reduction causes mismatches when the image is disturbed; contour dilation of the target image improves this situation. By applying the coarse-fine algorithm and the sequential similarity detection algorithm, our method is approximately 300 times faster than the well known cross-correlation technique. A simple hardware architecture is enough to implement the algorithm, and it is possible to match a 400 X 400 template against a 512 X 512 target image within 200 msec.
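The sequential similarity detection algorithm (SSDA) mentioned above can be sketched minimally: accumulate absolute error pixel by pixel and abandon a candidate position as soon as the running sum exceeds a threshold. This is a generic illustration of SSDA, not the EPTM method itself, and the tiny arrays are purely illustrative:

```python
def ssda_match(target, template, threshold):
    """Sequential similarity detection: scan all offsets, abandoning a
    position once the accumulated absolute error exceeds the threshold."""
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for oy in range(len(target) - th + 1):
        for ox in range(len(target[0]) - tw + 1):
            err = 0
            for y in range(th):
                for x in range(tw):
                    err += abs(target[oy + y][ox + x] - template[y][x])
                    if err > threshold:
                        break          # early abandonment
                else:
                    continue
                break
            if err <= threshold and (best is None or err < best):
                best, best_pos = err, (oy, ox)
    return best_pos, best

target = [[0, 0, 0, 0],
          [0, 9, 8, 0],
          [0, 7, 9, 0],
          [0, 0, 0, 0]]
template = [[9, 8], [7, 9]]
print(ssda_match(target, template, threshold=5))  # ((1, 1), 0): exact match
```

The speedup in the paper comes from combining this early abandonment with the reduced one-dimensional edge point template and a coarse-fine search.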
The moments of local morphological granulometric pattern spectra are employed to classify texture images, the novelty being the use of maximum-likelihood techniques to design the classifier. Classification is adapted to the presence of noise and minimal feature sets are obtained. Using a database of ten textures, it is seen that a small number of granulometric moments from among the mean, variance, and skewness (resulting from a small set of structuring primitives) is sufficient to achieve very high accuracy for independent data in the absence of noise, and to maintain high accuracy in the face of some commonplace noise types so long as good noise estimates are available.
We present a new edge detection algorithm that is specially suited for stereopsis. We first define the quality criteria to be met and the properties required: structural stability, accurate localization, detailed representation, safe extraction, and fast computation. Then we describe the classic methods, explain their defects, and derive our algorithm. The main idea of this method is to use a nonlinear process that removes noise-induced edges, rather than filtering the image through a strong linear filter which would generally impair the edges and remove detail. For a given resolution and noise, our method provides a better result according to the stated criteria. Additionally, only three parameters have to be set and the behavior of this system is stable with respect to them. We think learning methods can be used to automatically compute such parameters. Finally, we show some experimental results with different types of images.
This report proposes a new hierarchical Hough transform method with a multiresolution cell. The local Hough transform is applied to each cell constructed from the neighbors of each pixel of an image. The main idea is that cell size is controlled adaptively until the maximum peak value can be achieved. This is made possible by using a multiresolution image pyramid. All peak values of the local Hough transform are then integrated into the final Hough transform value. Using this method, precise lines can be extracted simultaneously from characters and symbols of various sizes and line widths. This method was applied to line extraction from hand-written characters and symbols. The first example is fine-to-coarse processing applied to characters with wide lines, and the second is coarse-to-fine processing applied to small symbols. Finally, this method is applied to extract strokes of characters by a pyramid linking method.
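The underlying (non-hierarchical) Hough transform that this method builds on can be sketched as a vote accumulation in (rho, theta) space. This is the textbook transform only, not the paper's multiresolution-cell scheme, and the coarse 4-bin angle quantization is chosen purely to keep the example small (real use would take, e.g., 180 bins):

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for a set of edge points
    and return the most-voted cell."""
    acc = {}
    for (x, y) in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return max(acc, key=acc.get)

# Points on the horizontal line y = 3: theta index 2 of 4 (90 degrees), rho = 3.
pts = [(x, 3) for x in range(10)]
rho, t = hough_lines(pts, 10, 10, n_theta=4)
print(rho, t)  # 3 2
```

The hierarchical method applies this transform locally per cell and adapts the cell size over an image pyramid before integrating the peaks.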
The basic idea of edge detection is to locate positions where changes of image values (e.g., gray levels) are large. Many edge detection algorithms based on this idea compute the derivative of the image function and locate edges at local derivative maxima. One problem is that the local derivative maximum may not be at the precise edge location, because it ignores the 'contribution' to the edge from the surrounding pixels where the derivatives are non-maximum. We present in this paper a new edge detection method that locates edges at the probability expectation of the first-order derivative in a neighborhood of an edge. This approach enables us to achieve subpixel precision (edges need not be at sample pixels). In addition, the use of expectation has a noise-reduction effect. Preliminary experiments show that the proposed method produces good results.
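The expectation idea can be illustrated in one dimension: treat the normalized derivative magnitudes in a neighborhood as a probability distribution over positions and take the expected position. This is a sketch of the principle under that reading of the abstract, not the paper's exact estimator; the intensity profile is hypothetical:

```python
def subpixel_edge(profile):
    """Locate an edge along a 1-D intensity profile at the expectation of
    the first-derivative magnitude, giving subpixel precision."""
    deriv = [abs(profile[i + 1] - profile[i]) for i in range(len(profile) - 1)]
    total = sum(deriv)
    # Each derivative sample sits between pixels i and i+1, i.e. at i + 0.5.
    return sum((i + 0.5) * d for i, d in enumerate(deriv)) / total

# A ramp edge spread over several pixels: the expectation falls between samples.
print(subpixel_edge([0, 0, 10, 30, 40, 40]))  # 2.5
```

Because every nonzero derivative in the neighborhood contributes, isolated noise spikes are averaged down rather than picked as the maximum.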
Binary image processing techniques are often used to reduce the computational burden associated with image processing algorithms, thereby reducing costs and increasing throughput. When applicable, this approach is particularly useful for object detection, for which grey-level processing is quite slow. In this paper, a novel algorithm for detecting objects in binary images is presented. The fundamental advantage of this algorithm is the use of binary connectivity edge maps, which represent the edge data in the binary image by detecting eight-connected neighborhood bit (intensity) patterns. Appropriate use of these connectivity edge maps yields object detection which is faster and more robust than traditional methods (image subtraction, image correlation). The efficacy of this algorithm is demonstrated by applying it to the automatic detection and location of alignment marks for semiconductor wafer alignment and directly comparing its performance against those of traditional approaches (image subtraction, image correlation) in terms of accuracy, invariance to object rotation, and algorithm speed.
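The binary connectivity edge map described above can be sketched directly: pack each foreground pixel's eight neighbors into one byte and mark the pixel as an edge pixel when any neighbor bit is background. This is a minimal reading of the idea, not the paper's full detection algorithm:

```python
def connectivity_edge_map(img):
    """For each foreground pixel, pack its 8-connected neighborhood into one
    byte; the pixel is an edge pixel if any neighbor bit is 0 (background)."""
    h, w = len(img), len(img[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    edges = []
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            bits = 0
            for i, (dy, dx) in enumerate(offsets):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                    bits |= 1 << i
            if bits != 0xFF:       # at least one background neighbor
                edges.append((y, x))
    return edges

# A 4x4 solid square: every pixel except the 2x2 interior is an edge pixel.
img = [[1] * 4 for _ in range(4)]
edges = connectivity_edge_map(img)
print(len(edges))  # 12
```

The byte-valued neighborhood patterns could additionally serve as lookup-table indices, which is what makes such maps fast in practice.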
This paper describes a reconstructible thinning process which is based on a one-pass parallel thinning and the morphological skeleton transformation. It reduces a binary digital pattern into a unit-width, connected skeleton to which labels are assigned enabling perfect reconstruction of the original pattern. The process uses thinning templates to iteratively remove boundary pixels and structuring templates of the morphological skeleton transformation to retain critical feature pixels for reconstruction. The thinning templates together with the extracted feature pixels ensure skeletal connectivity, unit width, and reconstructability. These essential properties are guaranteed regardless of the chosen structuring templates used in the morphological skeleton transformation. The thinning process has been analyzed and results are presented. A number of implementation issues such as the choice of structuring templates and noise filtering have also been addressed.
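The paper's thinning templates and morphological structuring templates are not reproduced here; as a stand-in, the classic Zhang-Suen parallel thinning scheme illustrates the general idea of iteratively removing boundary pixels in parallel passes until a connected, unit-width skeleton remains (note it lacks the labeling that makes the paper's skeleton reconstructible):

```python
def thin(img):
    """Iterative parallel thinning (the classic Zhang-Suen scheme, used here
    as a stand-in for the paper's thinning templates)."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])

    def neighbors(y, x):
        # P2..P9 in Zhang-Suen order: N, NE, E, SE, S, SW, W, NW.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    n = neighbors(y, x)
                    b = sum(n)                                   # neighbor count
                    a = sum(n[i] == 0 and n[(i + 1) % 8] == 1    # 0->1 transitions
                            for i in range(8))
                    if step == 0:
                        cond = n[0] * n[2] * n[4] == 0 and n[2] * n[4] * n[6] == 0
                    else:
                        cond = n[0] * n[2] * n[6] == 0 and n[0] * n[4] * n[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for (y, x) in to_delete:                             # parallel removal
                img[y][x] = 0
                changed = True
    return img

# A 3x5 solid bar inside a zero border thins to a short medial segment.
img = [[0] * 7 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 6):
        img[y][x] = 1
out = thin(img)
```

The paper's contribution replaces such fixed removal rules with thinning templates plus structuring templates that additionally retain the feature pixels needed for perfect reconstruction.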
As a wide range of segmentation techniques has been developed over the last two decades, the evaluation and comparison of segmentation techniques has become indispensable. In this paper, after a thorough review of previous work, we present a general approach for the evaluation and comparison of segmentation techniques. More specifically, under this general framework, we propose to use the ultimate measurement accuracy to assess the performance of different algorithms. In image analysis, the ultimate goals of segmentation and other processing are often to obtain measurements of the object features in the image. Therefore, the accuracy of those ultimate measurements over segmented images is a good index of the performance of segmentation techniques. We feel this measure is of much greater importance than, e.g., error probabilities on pixel labeling, or even specially developed figures of merit. There exist many features describing the properties of the objects in the image. Some of them are discussed here, and their applicability and performance in the context of segmentation evaluation are studied. Based on experimental results, we provide some useful guidelines for choosing specific measurements for different evaluation situations and for selecting adequate techniques in particular segmentation applications.
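The ultimate-measurement-accuracy idea can be sketched with area as the example feature: measure the same object feature on the segmented image and on a reference, and score the segmentation by the relative measurement error. The choice of area and the tiny masks are illustrative assumptions, not the paper's feature set:

```python
def area(mask):
    """Object area (foreground pixel count) measured from a binary mask."""
    return sum(sum(row) for row in mask)

def ultimate_measurement_accuracy(truth, segmented):
    """Relative error of a feature measurement (here: area) taken from the
    segmented image against the same measurement on the reference image."""
    return abs(area(segmented) - area(truth)) / area(truth)

truth     = [[0, 1, 1, 0],
             [0, 1, 1, 0]]
segmented = [[0, 1, 1, 1],
             [0, 1, 1, 0]]
print(ultimate_measurement_accuracy(truth, segmented))  # 0.25
```

Note that two segmentations with equal pixel-labeling error can yield very different errors in a derived measurement, which is the paper's argument for evaluating at the measurement level.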
There has been tremendous progress in the areas of image processing (input: images, output: images) and computer graphics (input: numbers, output: images). Unfortunately, progress in image analysis (input: images, output: numbers) has been much slower. In this paper, we first briefly introduce the ideas of image analysis using class 2 dynamical systems and image analysis using class 3 dynamical systems. Then we compare these two approaches. The similarities of the two schemes are: (1) both methods use the idea of storing information in stable configurations of dynamical systems and compress a huge image into a tiny vector that captures the characteristics of the image efficiently; both methods are very general and can be generalized to any type of image. (2) The mapping from an image to numbers is determined by the mapping from a specification of a dynamical system to the corresponding attractor it contains. (3) All image analysis algorithms (quasi-enumerative search, random enumerative search, local search, simulated annealing, and greedy) are similar. (4) Technically, local-minima approaches (deterministic or probabilistic) remain the best available approaches; all their limitations, including convergence to wrong minima, which increases the error of analysis, apply to both approaches. (5) Theoretically, the unsolved problems (information capacity) are similar.
There has been tremendous progress in the areas of image processing (input: images, output: images) and computer graphics (input: numbers, output: images). Unfortunately, progress in image analysis (input: images, output: numbers) has been much slower. In this paper, we introduce the ideas of image analysis using Hilbert space, which encodes an image as a small vector. An image can be interpreted as a representation of a vector in a Hilbert space. It is well known that if the eigenvalues of a Hermitian operator are lower-bounded but not upper-bounded, the set of eigenvectors of the operator is complete and spans a Hilbert space. Sturm-Liouville operators with periodic boundary conditions and the first, second, and third classes of boundary conditions are special examples. Any vector in a Hilbert space can be expanded. If a vector happens to lie in a subspace of a Hilbert space whose dimension L is low (on the order of 10), the vector can be specified by its norm, an L-vector, and the Hermitian operator which spans the Hilbert space. This establishes a mapping from an image to a set of numbers. This mapping converts an input image to a 4-tuple P = (norm, T, N, L-vector), where T is a point in an operator parameter space and N is an integer which specifies the boundary condition. Unfortunately, the best algorithm for this scheme at this point is a local search, which has high time complexity. The search is first conducted for an operator in a parameter space of operators. Then an error function δ(t) is computed. The algorithm stops at a local minimum of δ(t).
Although humans live in a 3D world, their immediate perception is 2½D, i.e., a 2D description of each surface and its (relative) distance from the viewer. One of the best-known processes which compute such information is binocular stereopsis. However, depth information obtained from stereoscopic images is sparse, available mainly at the edges of objects. For such images with no texture, it has been shown that humans employ a linear interpolating mechanism to recover depth information at other points in the image. Displaying such an object poses two problems. The first is concerned with deriving a suitable representation of each surface in view. This problem is compounded by the fact that the distance information obtained is inaccurate. The second is concerned with deciding which surface a point belongs to. This problem is simple for a convex polygon but proves to be difficult for concave or ambiguous polygons. Using the argument principle in an analytically continued region, an algorithm is derived which can determine a point's location by accumulating the angular increment along the boundary of the polygon. If the sum is ±2π, then the point is inside the polygon; if it is 0, the point is outside.
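The angular-increment test described above can be sketched directly as a winding-number computation: sum the signed angle subtended by each boundary segment as seen from the query point. The L-shaped polygon below is a hypothetical example of the concave case the abstract highlights:

```python
import math

def inside(point, polygon):
    """Accumulate the angular increment along the polygon boundary;
    a total of +/-2*pi means inside, (near) 0 means outside."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        d = math.atan2(y2 - py, x2 - px) - math.atan2(y1 - py, x1 - px)
        # Wrap each increment into (-pi, pi].
        while d > math.pi:
            d -= 2 * math.pi
        while d <= -math.pi:
            d += 2 * math.pi
        total += d
    return abs(total) > math.pi   # ~2*pi inside, ~0 outside

# A concave (L-shaped) polygon, where simple convexity-based tests fail.
poly = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(inside((1, 3), poly), inside((3, 3), poly))  # True False
```

Unlike a convexity test, this accumulation handles concave and self-overlapping boundaries uniformly, which is what makes it suitable for the ambiguous polygons mentioned above.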
Image models are a fundamental component of mathematically founded image processing algorithms. Nonlinear, adaptive, local image models have a significant similarity with methods for interpolating data in many-dimensional spaces. The applicability of these methods will be demonstrated by mathematical analysis and by experimental application to natural color scenes. In particular, a modification of the method of radial basis functions will be evaluated, and various techniques for determining radial basis centers will be compared: random choice, optimal k-means, and two iterative constructive techniques.
This paper describes how motion segmentation can be achieved by analyzing a single static image that is created from a series of picture frames. The key idea is motion imaging; in other words, motion is expressed in static images by integrating, frame after frame, the spatiotemporal fluctuations of the gray level gradient at each local area. This tends to create blurred or attached line images (images with lines that show the path of movement of an object through space) on moving objects. We call this 'motion texture'. We computed motion texture images based on the animation of a natural scene and on a number of computer-synthesized animations containing groups of moving objects (random dots). Moreover, we applied two different texture analyses to the motion textured images for segmentation: a texture analysis based on the local homogeneity of gray level gradation in similarly textured regions, and another based on the structural features of gray level gradation in motion texture. Experiments showed that subjective visual impressions of segmentation were quite different for these animations. The texture segmentation described here successfully grouped moving objects in a way coincident with subjective impressions. In our random dot animations, the density of the basic motion vectors extracted from each pair of successive frames was held constant to compensate for the dot grouping effect based on vector density. The dot appearance period (lifetime) was varied across the animations. In a long-lifetime random dot animation, region boundaries can be perceived more clearly than in a short one. The different impressions may be explained by analyzing the motion texture elements, but cannot always be represented successfully using the motion vectors between two successive frames, whose density is held constant between animations with different lifetimes.
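The integration step behind motion texture can be sketched in its simplest form: accumulate the absolute temporal fluctuation of gray level at every pixel over the frame sequence, so that moving regions build up high values. This is a minimal reading of the idea (absolute frame differences rather than the paper's gradient-based measure); the 1x4 strip is hypothetical:

```python
def motion_texture(frames):
    """Integrate, frame after frame, the absolute temporal fluctuation of
    gray level at every pixel; moving regions accumulate high values."""
    h, w = len(frames[0]), len(frames[0][0])
    acc = [[0] * w for _ in range(h)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                acc[y][x] += abs(curr[y][x] - prev[y][x])
    return acc

# A bright dot moving right across a dark 1x4 strip over three frames:
# the traversed pixels accumulate energy, tracing the motion path.
frames = [[[9, 0, 0, 0]], [[0, 9, 0, 0]], [[0, 0, 9, 0]]]
print(motion_texture(frames))  # [[9, 18, 9, 0]]
```

The resulting static image is then a candidate for ordinary texture segmentation, which is the paper's central move.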
The proposed artificial intelligence-based vision model incorporates natural recognition processes depicted as a visual pyramid and a hierarchical representation of objects in the database. The visual pyramid, with base and apex representing pixels and the image, respectively, is used as an analogy for a vision system. This paper provides an overview of recognition activities and states in the framework of an inductive model. Also, it presents a natural vision system and a counterpart expert system model that incorporates the described operations.
Most of the information related to the morphology of galaxies relies on the presence, shape, and texture of patterns such as bars, rings, and spiral arms. Automatic classification of astronomical images therefore requires that objects represented in the image be indexed by the recognition of such morphological structures. We describe a methodology to segment well-known images of galaxies into their morphological features in an automated fashion. The detection, description, and classification of these features is achieved through the use of attributed rewriting systems that allow the identification of structures according to rules formalizing the interpretation strategy followed by astronomers in the visual classification of galaxies. A priori knowledge about the physical characteristics of the objects in the image, as well as about the image acquisition process, controls the interpretation activity of the automated system, allowing the design of flexible algorithms that can be tailored to a wide range of images differing in resolution, noise characteristics, and object orientation. Some results obtained by applying the system to CCD images of spiral galaxies are presented.
The DE-1 satellite has gathered over 500,000 images of the Earth's aurora using ultraviolet and visible light photometers. The extraction of the boundaries delimiting the auroral oval allows the computation of important parameters for the geophysical study of the phenomenon such as total area and total integrated magnetic field. This paper describes an unsupervised technique that we call 'minimization-pruning' that finds the boundaries of the auroral oval. The technique is based on concepts that are relevant to a wide range of applications having characteristics similar to this application, namely images with variable background, high noise levels and missing data. Among the advantages of the new technique are the ability to find the object of interest even with intense interfering background noise, and the ability to find the outline of an object even if only a section of it is visible. The technique is based on the assumption that certain regions of the object are less obscured by the background, and hence the information provided by these regions is more important for finding the boundaries. The implementation of the technique consists of an iterative minimization-pruning algorithm, in which a fundamental part is a measure of the quality of the data for different regions along the boundary. Calculation of this measure is simplified by transforming the input image into polar coordinates. The technique has been applied to a set of more than 100 images of the aurora with good results. We also show examples of extraction of the inner and outer boundaries starting from the elliptical approximation and analyzing the image locally around that solution.
In order to automatically analyze electron immunomicroscopy images, we have developed a computerized scheme for the detection and characterization of nanometer-size metal particles. This scheme consists of a preprocessing step aimed at enhancing image features and of an extraction step which performs the analysis of the particles. The method is designed to work irrespective of background and scene illumination. Results are presented for the analysis of immunogold labelling of muscle tissues.
Modern photographic films typically use "T-grain" silver halide crystals. These crystals are "plate-like" in physical shape and are particularly useful because of their high surface area to volume ratio, which is desirable because it allows a relatively large amount of spectral sensitizer to be adsorbed to the crystal surfaces. They have been studied for many years but have not been used until relatively recently. Consequently, it is of interest to study the processes involved in making such crystals, and to control and optimize the manufacturing procedure. Unfortunately, almost every variation in technique gives rise to a heterogeneous population; hence any statistically significant variation in the heterogeneity is almost impossible to detect by visual examination of electron micrographs. Therefore a more automated technique for such an examination is very desirable. This paper is an abbreviated version of reference 3.
We show how several basic image compression methods (predictive coding, transform coding, and pyramid coding) are based on self-similarity and a 1/f² power law. Phase transitions often show self-similarity, which is characterized by a spectral power law. Natural images often show a self-similarity which is likewise characterized by a power-law spectrum near 1/f². Exploring physical analogs leads to greater unity among current methods of compression and may perhaps lead to improved techniques.
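Of the methods named above, predictive coding shows the connection most directly: a 1/f²-like spectrum means neighboring samples are highly correlated, so transmitting only the prediction residual concentrates the signal into small values. The sketch below is a generic first-order DPCM round trip (previous sample as predictor), not any method from the paper, and the signal is hypothetical:

```python
def dpcm_encode(samples):
    """Predictive coding: transmit only the prediction residual, using the
    previous sample as the predictor; correlated (1/f^2-like) signals
    yield small residuals that are cheap to entropy-code."""
    prev, residuals = 0, []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    """Invert the predictor by accumulating the residuals."""
    prev, samples = 0, []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples

signal = [100, 102, 103, 103, 101, 98]
res = dpcm_encode(signal)
print(res)  # [100, 2, 1, 0, -2, -3]: small after the first sample
assert dpcm_decode(res) == signal
```

Transform and pyramid coding exploit the same spectral decay differently, by packing most of the energy into a few low-frequency coefficients or pyramid levels.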