This paper describes three trends affecting "intelligent image processing". The first is the increasing coupling of algorithms and computer system architectures. The second is the technology merger of computer graphics and image processing. The third is the growing industry capability to build increasingly functional systems that apply intelligent image processing. These trends are illustrated by numerous current examples from military programs and earth-science applications. Technology projections are made based on the current state of the art and likely new applications.
We present an iterative algorithm for computing the motion parameters that characterize motion in the image plane, given a sequence of contours which change over time due to motion in space. Our method is based on the observation, due to Waxman and Ullman (1985), that a second-order polynomial approximation of the optical flow in image coordinates provides sufficient information for 3-D motion computation. The use of an explicit flow model enables us to improve normal flow estimates through an iterative process. The algorithm has been tested on synthetic time-varying images. The optical flows recovered by this scheme are accurate enough to be used for 3-D structure and motion computation. We also discuss the (in)variance with respect to motion of the zero-crossings of the Laplacian applied to the image intensities. The change of perspective view due to relative motion results in certain zero-crossings not being preserved as the image evolves, thereby degrading the accuracy of 2-D motion obtained by analyzing such zero-crossing contours. We present a theorem which relates the true local measurement of zero-crossings to the tolerance of the optical flow vector. This analysis suggests an effective way to prevent "unstable" zero-crossings from being used in the matching stage of motion computation.
This paper is concerned with the use of the circular harmonic expansion for rotation-invariant recognition, the Mellin transform for scale-invariant recognition, and the combination of the two for rotation- and scale-invariant recognition. Distance measures between the reference image, or reference vector, and the test image are defined and compared with a threshold for object recognition. In addition to providing the desired mathematical properties, the class of invariant recognition algorithms examined successfully detected the objects (targets) under different rotations and scale changes.
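As a toy illustration (not the paper's algorithm), the rotation invariance of circular-harmonic magnitudes can be seen in one dimension: rotating an object circularly shifts its intensity samples around a ring, and the magnitudes of the Fourier (circular-harmonic) coefficients are unchanged by such a shift. The ring sampling and shift amount below are assumptions made for illustration:

```python
import numpy as np

# Intensity samples taken around one ring of an image (fixed radius assumed).
rng = np.random.default_rng(3)
ring = rng.standard_normal(64)

# A rotation of the object corresponds to a circular shift of the ring samples.
rotated = np.roll(ring, 11)

# Circular-harmonic magnitudes: |FFT| of the angular samples.
# These are identical for the original and the rotated object.
sig = np.abs(np.fft.fft(ring))
sig_rot = np.abs(np.fft.fft(rotated))
```

The Mellin transform plays the same role for scale: resampling the radial axis logarithmically turns a scale change into a shift, whose transform magnitude is likewise invariant.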
In this work a discrete fractional Brownian motion (FBM) model is applied to X-ray images as a measure of regional texture. FBM is a generalization of ordinary Wiener-Lévy Brownian motion; a parameter H is introduced which describes the roughness of its realizations. Using generated realizations, a Cramér-Rao bound for the variance of an estimate of H was evaluated using asymptotic statistics. The results show that the accuracy of the estimate is independent of the true H. A maximum likelihood estimator is derived for H and applied to the data sets; the results were close to the Cramér-Rao bound. The MLE is then applied to sequences of digital coronary angiograms. The results show that the H parameter is a useful index by which to segment vessels from background noise.
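The paper's maximum-likelihood estimator is not reproduced here, but what H measures can be sketched with a simpler, assumed variance-scaling estimator: for fBm the increment variance grows as Var[x(t+τ) − x(t)] ∝ τ^(2H), so H is half the slope of a log-log fit. Ordinary Brownian motion (H = 0.5) serves as the test signal:

```python
import numpy as np

def estimate_hurst(x, lags=(1, 2, 4, 8, 16)):
    """Estimate H from the scaling law Var[x(t+tau) - x(t)] ~ tau**(2H)."""
    log_var = [np.log(np.var(x[lag:] - x[:-lag])) for lag in lags]
    slope = np.polyfit(np.log(lags), log_var, 1)[0]
    return slope / 2.0

# Ordinary Brownian motion is the H = 0.5 special case of fBm.
rng = np.random.default_rng(0)
bm = np.cumsum(rng.standard_normal(100_000))
h = estimate_hurst(bm)   # close to 0.5
```

Larger H corresponds to smoother realizations, smaller H to rougher ones, which is why H can serve as a regional texture index.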
A model-constrained rule-based pattern matching system is considered in this paper. The emphasis is on view-angle-independent object recognition. Line drawings of objects are used as the input to the system. The system also handles incomplete and ambiguous line drawings, and assigns a certainty indicator to each object recognized. A two-stage approach is adopted in which the first stage involves feature extraction and the second deals with pattern matching.
An algorithm for preprocessing scanner input data is proposed to achieve high quality bilevelization by locally adaptive pixel density enhancement. Symbol recognition based on Segment Pair Matching (SPM) and interactive processing for creating a drawing database are also discussed. The SPM can detect symbols of different sizes simultaneously and their interior patterns are recognized by using distance transformation. Further, a structure model of a drawing is presented. Finally, there is a discussion of how the elements of each model category can be interactively processed by superimposing recognized data on the enhanced image data on a CRT screen.
Computer simulation results of sub-band coding of digital images using Quadrature Mirror Filtering (QMF) techniques are reported. Conditions specifying two-dimensional QMF design constraints are derived. The spectrum of the input signal is then decomposed non-uniformly into sub-band images. In the coding process, the lowest band is DPCM coded using a two-dimensional prediction scheme. The remaining bands are PCM coded by a quantizer which minimizes the perceived distortion of the reconstructed signal. The results are then presented in terms of subjective quality of the reconstructed pictures, SNR, and the resulting bit rate.
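The paper's filters and 2-D decomposition are not given in the abstract; as a minimal sketch, the two-band split can be shown with the Haar pair, the shortest filter pair satisfying the QMF alias-cancellation condition H1(z) = H0(−z):

```python
import numpy as np

def analyze(x):
    """Two-band Haar QMF analysis: low and high sub-bands at half rate."""
    x = np.asarray(x, float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass filter + decimate
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass filter + decimate
    return lo, hi

def synthesize(lo, hi):
    """Perfect-reconstruction synthesis bank for the Haar pair."""
    y = np.empty(2 * len(lo))
    y[0::2] = (lo + hi) / np.sqrt(2.0)
    y[1::2] = (lo - hi) / np.sqrt(2.0)
    return y

x = np.arange(16.0)
lo, hi = analyze(x)
rec = synthesize(lo, hi)   # reconstructs x exactly
```

In a coder along the paper's lines, the low band would be recursively split again (non-uniform decomposition) and DPCM coded, while the high bands are PCM coded.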
This paper reviews some recent advances in the theory and applications of morphological image analysis. On the applications side, we show how morphological filters can provide simple and systematic algorithms for image processing and analysis tasks as diverse as nonlinear image filtering, noise suppression, edge detection, region filling, skeletonization, coding, and shape representation, smoothing, and recognition. On the theory side, we summarize the representation of a large class of translation-invariant nonlinear filters (including morphological, median, order-statistic, and shape recognition filters) as a minimal combination of morphological erosions or dilations; these results provide new realizations of these filters and lead to a unified image algebra.
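A minimal numpy sketch of the two primitive operators, erosion and dilation, and of the opening built from them (the 3×3 structuring element and the toy image are choices made for illustration, not taken from the paper):

```python
import numpy as np

def erode(img, se=np.ones((3, 3), bool)):
    """Binary erosion: a pixel stays set iff the structuring element,
    centred there, fits entirely inside the foreground."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img.astype(bool), ((ph, ph), (pw, pw)))
    out = np.ones(img.shape, bool)
    for i in range(h):
        for j in range(w):
            if se[i, j]:
                out &= padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def dilate(img, se=np.ones((3, 3), bool)):
    """Binary dilation: a pixel is set iff the structuring element,
    centred there, hits the foreground anywhere."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img.astype(bool), ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, bool)
    for i in range(h):
        for j in range(w):
            if se[i, j]:
                out |= padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def opening(img, se=np.ones((3, 3), bool)):
    """Opening = erosion then dilation; removes features smaller than se."""
    return dilate(erode(img, se), se)

# A 5x5 square plus an isolated noise pixel: opening removes the pixel
# and restores the square -- morphological noise suppression in miniature.
img = np.zeros((12, 12), bool)
img[3:8, 3:8] = True
img[0, 0] = True
opened = opening(img)
```

The dual composition, dilation followed by erosion (closing), fills small holes instead; the paper's filter representations are built from minimal combinations of exactly these two primitives.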
A feasibility study has been completed on a narrow-band video conferencing system. Gray-scale images are converted to bi-level images and further processed to remove inconsistencies within the mappings. Ordered run-length coding of difference frames is used to update first the areas of the image that are most likely to change and to be perceptually important. Sequences of bi-level (128 × 120) frames at 15 frames/s have been successfully transmitted at rates hard-limited to 19.2 kbits/s.
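The essence of run-length coding of difference frames can be sketched in a few lines (this is plain RLE on a toy scan line, not the paper's ordered variant; the frames and sizes are assumed for illustration):

```python
def rle_encode(bits):
    """Run-length encode a bit list as (first_bit, list of run lengths)."""
    if not bits:
        return 0, []
    runs, count, prev = [], 1, bits[0]
    for b in bits[1:]:
        if b == prev:
            count += 1
        else:
            runs.append(count)
            count, prev = 1, b
    runs.append(count)
    return bits[0], runs

def rle_decode(first, runs):
    out, bit = [], first
    for r in runs:
        out.extend([bit] * r)
        bit ^= 1
    return out

# The difference frame between two bi-level frames is mostly zeros,
# so its run-length code is short.
prev_frame = [0, 0, 1, 1, 0, 0, 0, 0]
curr_frame = [0, 0, 1, 0, 0, 0, 0, 0]
diff = [a ^ b for a, b in zip(prev_frame, curr_frame)]
first, runs = rle_encode(diff)
```

Because successive bi-level frames differ in few places, the difference frame compresses far better than either frame alone.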
At least 500,000 profoundly deaf persons in the United States communicate primarily by American Sign Language (ASL), a language quite distinct from English and not well suited to writing. Currently, telephone communication for an ASL user is basically limited to use of a teletype machine, which requires both typing skills and proficiency in English. This paper reviews recent research relevant to the development of techniques which would allow manual communication across existing telephone channels using video imagery. Two possibilities for such manual communication are ASL and cued speech. The latter technique uses hand signals to aid lip reading. In either case, conventional television video transmission would require a bandwidth many times that available on a telephone channel. The achievement of visual communication using sign language or cued speech at data rates below 10 kbps, low enough to be transmitted over a public telephone line, will require the development of new data reducing algorithms. Avenues for future research toward this goal are presented.
A differential pulse code modulation (DPCM) coding technique called "block" DPCM is proposed. It attempts to achieve a bit rate below 2.0 bits/pixel without using adaptive methods. Two slightly different schemes, called "maximum" block DPCM and "sequential" block DPCM, are introduced. Linear interpolation is used to fill in untransmitted samples, and filtering is applied to improve image quality. The performance of the two schemes is evaluated by simulation.
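A sketch, under assumed parameters, of the two ingredients the abstract names: closed-loop DPCM of the transmitted samples and linear interpolation of the untransmitted ones (the quantizer step, subsampling pattern, and scan line are illustrative, not the paper's):

```python
import numpy as np

def dpcm_encode(samples, step=4.0):
    """Closed-loop DPCM: quantize the error against the previous
    *reconstructed* sample so encoder and decoder stay in lockstep."""
    codes, pred = [], 0.0
    for s in samples:
        q = int(round((s - pred) / step))   # uniform quantizer index
        codes.append(q)
        pred += q * step                    # decoder-side reconstruction
    return codes

def dpcm_decode(codes, step=4.0):
    out, pred = [], 0.0
    for q in codes:
        pred += q * step
        out.append(pred)
    return np.array(out)

# Transmit every second pixel of a scan line; linear interpolation
# fills in the untransmitted samples at the decoder.
x = np.array([10, 12, 14, 18, 22, 20, 16, 12], float)
sent = x[::2]
rec_sent = dpcm_decode(dpcm_encode(sent))
rec = np.interp(np.arange(len(x)), np.arange(0, len(x), 2), rec_sent)
```

Sending only the quantizer indices for half the pixels is what pushes the rate below that of plain per-pixel DPCM.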
Image coding based on projection-onto-convex-sets (POCS) iteration, in which efficiently encodable sets are used to describe an image, is a recent approach to image compression. The technique allows a variety of sets to be used to encode an image. The focus of this paper, however, is on two particular sets: the set of images whose cosine transform is known at certain frequencies, and the set of images which are nonzero only over a specified region. These sets can be used to encode interframe difference pictures of video teleconferencing images. A drawback of this new type of codec is its computational complexity. In this paper, an architecture to overcome this drawback is discussed, along with other implementation issues such as quantization effects.
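A 1-D sketch of the POCS iteration with assumed sets: set A fixes the DCT coefficients that were transmitted, set B forces the signal to be zero outside a support region (as with an interframe difference that changed in only one area). Alternating the two projections drives the estimate toward both sets; the sizes, seed, and index choices are illustrative:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    j = np.arange(n)
    m = np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

n = 16
C = dct_matrix(n)
support = np.zeros(n, bool)
support[4:12] = True                      # the difference signal lives here
rng = np.random.default_rng(1)
x_true = np.where(support, rng.standard_normal(n), 0.0)
known = np.arange(6)                      # indices of transmitted coefficients
coeffs = (C @ x_true)[known]

x = np.zeros(n)
for _ in range(100):
    y = C @ x
    y[known] = coeffs                     # project onto set A (known coefficients)
    x = C.T @ y
    x[~support] = 0.0                     # project onto set B (known support)
```

Both sets are convex and contain the true signal, so each projection can only shrink the distance to it; the iterate always satisfies the support constraint exactly after the final projection.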
Achieving ubiquitous visual communication based on telecommunication networks, computers and display terminals is made difficult by the large number of visual data formats used within systems and communication protocols. Format independent visual communication addresses this problem by resolving format incompatibility and making the diversity of data formats transparent to applications, systems and users. This solution implies a powerful visual information format and an environment that supports format conversion of pictures, text and graphics. A set of Format Independent Visual Exchange (FIVE) tools and visual data types are described that may be added to existing systems to achieve format independent presentation of visual information. The FIVE data types include compressed image data and permit the creation of mixed format information. A high degree of transparency to formats is achieved by providing automatic conversion of the visual information to target formats.
At the present time there are three major video standards in the world. With the emergence of extended quality and high-definition TV systems, the video environment is becoming more and more complicated. In a video research laboratory, it is desirable to have a simulation tool that can perform the following functions: (1) acquire video data in real time, (2) allow for computer access and processing, and (3) play back the processed data in real time. To investigate the visual effects of various video formats and format conversions, it is necessary to design the simulation system such that it can accommodate different video formats at both the input and output stages. In this paper a versatile video sequence processing simulation system developed at Bell Communications Research is presented. Video signals can be captured either in the single frame mode or in the sequence mode. Both the capture and display formats are flexible. Dynamic random access memory (DRAM) chips are used to construct the frame store. Its random access feature allows video data to be read out and played back palindromically. After more than one year of use, the validity and flexibility of the system have been verified.
Theoretical analysis of an ILAN model of MAGNET, an integrated network testbed developed at Columbia University, shows that the bandwidth freed up by video and voice calls during periods of little movement in the images and silence periods in the speech signals could be utilized efficiently for graphics and data transmission. Based on these investigations, an architecture supporting adaptive protocols that are dynamically controlled by the requirements of a fluctuating load and changing user environment has been advanced. To further analyze the behavior of the network, a real-time packetized video system has been implemented. This system is embedded in the real-time multimedia workstation EDDY, which integrates video, voice and data traffic flows. Protocols supporting variable-bandwidth, constant-quality packetized video transport are described in detail.
Recently, approaches to a new generation of image-coding techniques have been proposed. The incorporation of human visual system (HVS) models and of image analysis tools, such as segmentation, are two defining features of these techniques. In this paper, an application of the new approach to classical linear predictive coding (LPC) of images and an HVS-based segmentation technique for second-generation coders are discussed. In the case of LPC, the error image is encoded using an image decomposition approach and binary image coding; this improves the compression ratio while keeping quality nearly the same. The new segmentation technique can be used in single-frame image coding applications to obtain acceptable images at extremely high compression rates.
An improved algorithm is presented which is capable of transforming thick objects in a discrete binary image into thinner representations called skeletons. The skeletal shapes produced are shown to be more isotropic than those produced using other algorithms. The algorithm uses a non-iterative procedure based on the 4-distance ("city block") transform to produce connected reversible skeletons. The types and properties of 4-distance neighborhoods, which are used in skeletal pixel selection, are developed. Local-maxima are included in the skeleton, allowing reversibility using a reverse distance transform. Improved isotropy is achieved by defining pixels with certain types of neighborhoods to be interesting. It is shown that these isotropy-improving pixels may be added to the skeletons produced by any 4-distance-based skeletonizing algorithm that retains all local-maxima without affecting connectedness.
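The 4-distance transform at the heart of the algorithm can be sketched with the classic two-pass chamfer sweep (the toy image is an assumption; the paper's skeletal-pixel selection rules are not reproduced):

```python
import numpy as np

def city_block_dt(fg):
    """Two-pass 4-distance ("city block") transform: each foreground
    pixel receives its city-block distance to the nearest background."""
    h, w = fg.shape
    d = np.where(fg, h + w, 0)
    for i in range(h):                      # forward raster sweep
        for j in range(w):
            if d[i, j]:
                if i > 0:
                    d[i, j] = min(d[i, j], d[i - 1, j] + 1)
                if j > 0:
                    d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    for i in range(h - 1, -1, -1):          # backward raster sweep
        for j in range(w - 1, -1, -1):
            if d[i, j]:
                if i < h - 1:
                    d[i, j] = min(d[i, j], d[i + 1, j] + 1)
                if j < w - 1:
                    d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d

# A 5x5 square: distances ramp toward the middle. The local maxima of d
# are the pixels a reversible skeleton must retain.
fg = np.zeros((7, 7), bool)
fg[1:6, 1:6] = True
d = city_block_dt(fg)
```

Keeping all local maxima of d is what makes the skeleton reversible: a reverse distance transform seeded with those values regrows the original object.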
Computer image enhancement of digitized radiographic and conventional photographs has taken advantage of several unique state-of-the-art developments to reveal anomalies in aerospace hardware. Signal processing of such imagery at TASC includes multi-frame averaging to increase signal-to-noise ratios, specially developed filters that sharpen details without sacrificing image information, and local contrast stretching and histogram equalization to display structure in low-contrast areas. Edge detection, normally complicated in radiographic images by low contrast, poor spatial resolution, and noise, is performed as a post-processing operation using a difference-of-Gaussians method and a least-squares fitting procedure. With these software tools, multi-image signal processing allows precise measurement (to within ±0.02 inches, rms) of structural motion within a rocket motor during a static test firing, as well as identification of stress conditions in turbine blades and matrix anomalies in composite materials. These and other image enhancement examples of aerospace hardware analysis are detailed in the presentation.
A method is proposed to reconstruct a one-octave signal from its real zero-crossings. The method is based on the minimization of a Hilbert form, and therefore on an eigenvalue problem for an integral equation. Some experiments with an approximate form of the method are discussed.
This paper describes a PC-based workstation for the acquisition, enhancement, transmission and reception of image and text data. The system was designed to provide both ease of use and reasonable performance using IBM PC technology with a 512 × 512 monochrome frame grabber board (PCVISION). The system allows the user to annotate an image with text legends and graphics. The system utilizes some unique methods for achieving eye-pleasing two-pixel graphics. An image may also be enhanced in two ways: globally with look-up table alterations, and locally with contrast and edge enhancing operations. Performance enhancement is achieved by implementing primarily integer operations, thereby eliminating floating point operations and radically increasing the apparent processor speed. We show that designing a system for ease of use may also significantly enhance its responsiveness and speed. For example, the edge enhancement technique only allows selection of low, medium, and high levels of enhancement, simplifying user choices and speeding up processing. In addition, all look-up table enhancements are precalculated and stored, allowing the user to alter the contrast with keystroke rapidity without being concerned about the applicability of the algorithm to a particular image.
An adaptive cosine transform coding scheme capable of real-time operation is described. It incorporates human visual system properties into the coding scheme. Results showed that the subjective quality of the processed images is significantly improved even at a low bit rate of 0.2 bit/pixel. With the adaptive scheme, an average of 0.26 bit/pixel can be achieved with very little perceivable degradation.
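The paper's adaptive, HVS-weighted bit allocation is beyond an abstract-sized sketch, but the core of any cosine transform coder, energy compaction followed by discarding high-frequency coefficients, can be shown on one 8×8 block (block content and the zonal mask are illustrative assumptions):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    j = np.arange(n)
    m = np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

C = dct_matrix()
block = np.add.outer(np.arange(8.0), np.arange(8.0))   # smooth ramp block
coef = C @ block @ C.T                                 # 2-D DCT
keep = np.add.outer(np.arange(8), np.arange(8)) < 4    # zonal mask: 10 of 64 coeffs
rec = C.T @ (coef * keep) @ C                          # decode from the kept coeffs
```

For smooth blocks almost all the energy sits in the low-frequency corner, which is why rates well below 1 bit/pixel remain visually acceptable; the adaptive scheme varies how many coefficients each block keeps.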
Spatial and temporal information in sequences of images is often represented in different formats and treated separately. Yet careful analysis of retinal organization indicates that it may simultaneously code the spatial and temporal organization of a scene in a fairly simple manner. In this paper we consider the structure and function of the retina and develop a new theory that describes its role in early vision. We also compare the results of this theory with some previously documented neurological data and show that the two are similar. Finally, we propose a parallel implementation that is similar in structure and function to the cellular components of the retina. The result is a convolutional space-time.
With the advancement of computational speed, transform coding has become a promising technique for image data compression. Traditionally, an image is divided into exclusive rectangular blocks, or subimages. Each subimage is a partial scene of the original image, and the subimages are processed independently. In low-bit-rate applications, block boundary artifacts can develop due to discontinuities between the subimages. A two-stage transform coding scheme to reduce this effect is proposed. The first-stage transformation is applied to subimages, each being a reduced, under-sampled image of the original. The second-stage transformation is applied to the transform coefficients obtained at the first stage. A simple coder with the discrete Walsh-Hadamard transform and uniform quantization is used to compare preliminary simulation results from the proposed scheme and the traditional method.
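A minimal sketch of the first-stage building blocks named in the abstract, a Walsh-Hadamard transform followed by uniform quantization (the block content and step size are assumptions; the second-stage transform over the coefficients is not reproduced):

```python
import numpy as np

def hadamard(n):
    """Sylvester-ordered Hadamard matrix, scaled orthonormal (n a power of 2)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

H = hadamard(8)
block = np.add.outer(np.arange(8.0), np.arange(8.0))
coef = H @ block @ H.T                     # 2-D Walsh-Hadamard transform
step = 2.0
q = np.round(coef / step) * step           # uniform quantizer
rec = H.T @ q @ H                          # reconstruction
```

Because H is orthonormal, the spatial-domain error equals the coefficient quantization error, so the reconstruction error is bounded by the quantizer step; in the proposed scheme the same machinery is applied a second time across the coefficients of the under-sampled subimages.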
Fitting an image block with ordinary discrete orthogonal polynomials results in large errors at and around the block center. However, using discrete orthogonal polynomials generalized with a normal weight, we obtain smaller errors at and around the center. In this paper, we present a technique for estimating block motion in two successive image frames using the generalized polynomials. The technique fits blocks of the two image frames with the polynomials and derives constraints from which the translational motion components can be determined.
A variety of techniques are available for the restoration of images; one of the most powerful is Wiener filtering. Performing Fourier transforms on an image processor for large multi-spectral images involves enormous computational effort. The problem considered here is the design of small spatial filters that approximate their frequency-domain counterparts so that they can be implemented easily. Such a filter is very useful, particularly if the point spread function affecting different images, or different bands of a multi-spectral image, can be considered the same. It is also useful if a filter designed for one section of a large image can be applied to the entire image, which makes the method particularly valuable in a production environment; an example is the removal of atmospheric effects from satellite images. The objective of this paper is to show that such spatial filters can be very effective, to present a way to design them, and to describe their implementation on a commercially available image processor. The limitations of the technique are also discussed, and examples are shown of the restoration of atmospherically degraded Landsat Thematic Mapper images.
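One standard way to obtain such a small spatial filter, possibly different from the paper's design procedure, is frequency sampling: take the inverse FFT of the desired response, truncate it to a few taps, and renormalize the DC gain. The Lorentzian target response below is an assumption standing in for a measured Wiener response:

```python
import numpy as np

# Assumed 1-D Wiener-like target response (low-pass, cutoff ~0.1 cycles/sample);
# a real design would come from the measured signal and noise spectra.
n = 64
f = np.fft.fftfreq(n)
target = 1.0 / (1.0 + (f / 0.1) ** 2)

h_full = np.real(np.fft.ifft(target))      # ideal length-n impulse response
k = 9                                      # small spatial filter: 9 taps
taps = np.roll(h_full, k // 2)[:k]         # centre and truncate to k taps
taps *= target[0] / taps.sum()             # match the DC gain exactly

approx = np.abs(np.fft.fft(taps, n))       # response of the small filter
err = np.max(np.abs(approx - target))      # worst-case response deviation
```

Because the 9-tap kernel is applied by direct convolution, it runs on image-processor hardware without any Fourier transform of the image itself.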
A general problem with statistically based estimators for images degraded by additive noise is their dependence on average quantities when image intensities vary rapidly and widely. The Wiener estimator, for example, given the stationary power spectra of the object image and the noise, is known to produce a noisy effect in flat intensity regions and a blurring or fuzzy effect in edge regions of the restored image. The power spectra are usually estimated over regions containing both edges and flat areas and therefore are not truly representative of either regional type. In this work, we adopt a nonstationary image model and utilize a novel adaptive windowing technique in conjunction with a nonlinear estimator to overcome the cited defects of other estimators. The technique is applied successively to simulated noisy one-dimensional feature waveforms, an arbitrarily selected noisy image scan line, noisy images with one-dimensional windowing, and noisy images with two-dimensional windowing. In each case, features of both the edge and flat regions are faithfully reconstructed. In fact, the restored images are remarkably sharp and clean, and appear far superior to comparable Wiener restorations even though their mean-squared error is about the same or slightly larger.
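The paper's adaptive-windowing nonlinear estimator is not specified in the abstract, but the underlying nonstationary idea can be sketched with a classical local-statistics (Lee-type) filter, in which the estimator's gain is recomputed from each pixel's neighborhood rather than from global averages (window size and noise level are assumptions):

```python
import numpy as np

def local_stats_filter(img, noise_var, win=5):
    """Nonstationary estimator in the spirit of the Lee filter: the gain
    falls to 0 in flat regions (output -> local mean, strong smoothing)
    and rises toward 1 at edges (output -> observed pixel, no blurring)."""
    p = win // 2
    padded = np.pad(img, p, mode="reflect")
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + win, j:j + win]
            m, v = patch.mean(), patch.var()
            gain = max(0.0, v - noise_var) / max(v, 1e-12)
            out[i, j] = m + gain * (img[i, j] - m)
    return out

# Flat region plus additive noise: the filter removes most of the noise
# because the local variance never much exceeds the noise variance.
rng = np.random.default_rng(2)
img = 100.0 + rng.standard_normal((32, 32))
out = local_stats_filter(img, noise_var=1.0)
```

A stationary Wiener filter would apply one compromise gain everywhere; letting the gain track local statistics is what avoids the noisy-flat-region / blurred-edge trade-off the abstract describes.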