A novel approach for developing symmetry-based shape representations is presented. Annular operators are used to identify symmetric relationships between sets of edge elements. The operators are applied at multiple scales to edge data extracted at multiple scales from a grey-level image. The resulting symmetric structures make it possible to locate objects within an image and may be used as a basis for constructing coarse descriptors of the projected shape of the objects in the scene. The advantages of this method over previous ones are: (1) it can be applied directly to images with no prior segmentation of objects from background, (2) it allows image structure to be isolated and interpreted by scale, and (3) it is an entirely parallel process. Preliminary results are demonstrated for two grey-level images.
We propose parametric geons as a volumetric description of object components for qualitative object recognition. Parametric geons are seven qualitative shape types defined by parameterized equations which control the size and degree of tapering and bending. The models provide global shape constraints which make model recovery procedures robust against noise and minor variations in object shape. The surface characteristics of parametric geons are discussed. The properties of parametric geons and conventional geon models are compared. Experiments fitting parametric geons to multiview data using stochastic optimization were performed. Results show that unique descriptions of single-part objects with minor shape variations can be obtained with the parametric geon models.
An approach for the recognition of 3D objects from single 2D views is presented. Using perceptual organization, hierarchies of features based on parallelism, collinearity, and intersection are generated. Our local grouping algorithm is particularly inspired by the formalism defined by Etemadi et al., which is concerned with the formation of self-consistent groupings of straight lines from which all higher level groupings may be derived. Our approach extends this formalism to circular arcs. Surfaces in the scene are then extracted based on the perceptual laws of symmetry and closure. The recognition process uses relational graphs of surfaces constructed by establishing the proximity, adjacency, and inclusion relations that exist between the surfaces. We identify closures which can be interpreted as the borders of the visible surfaces of objects and can also be used to describe the 2D shape of the surfaces. We show that a graph can be constructed from the relations between these closures and that similarities can be extracted from two different graphs obtained by analyzing two views of the same scene. Typical results obtained for complex indoor scenes are presented.
We review the traditional approaches to texture modeling schemes and discuss a novel approach based on chaotic dynamics. We utilize 2D chaotic processes to construct textured gray-level patterns (icons), the detailed structures of which can be matched to textured regions in the image of interest. The base equations are relatively simple and by adjustment of very few parameters, a wide range and gradation of textures can be generated reproducibly. Here, we develop a `partial-icon' approach, in which the texture templates are derived from sampled windows within the field of a single icon. A database of such textures can then be created, in which few parameters are required to reference or invoke a texture template. A hybrid descriptor-based scheme is also described so that textured regions in natural scene images can be classified effectively in terms of the contents of a texture database.
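The `partial-icon' idea described above can be sketched in code. The paper's base equations are not reproduced here, so the logistic map serves as a hypothetical stand-in for the 2D chaotic process; the names `chaotic_icon` and `partial_icon` are illustrative, not from the paper.

```python
import numpy as np

def chaotic_icon(size=64, r=3.9, x0=0.3):
    """Generate a gray-level texture 'icon' by raster-scanning a chaotic map.

    The logistic map x -> r*x*(1-x) is a stand-in for the (unspecified)
    chaotic processes in the paper; the single parameter r controls the
    texture character, and the same (r, x0) always reproduces the same icon.
    """
    n = size * size
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    # map chaotic states in [0, 1] to 8-bit gray levels
    return (255 * x).astype(np.uint8).reshape(size, size)

def partial_icon(icon, top, left, h=16, w=16):
    """Sample a window within a single icon to obtain a texture template."""
    return icon[top:top + h, left:left + w]
```

Because the texture is fully determined by a few map parameters, a database entry need only store `(size, r, x0)` plus the window coordinates to invoke a template.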
Most sampled imaging systems produce aliasing. That is, the sampling process causes spatial frequencies beyond the system's sampling passband to fold into spatial frequencies within the sampling passband. In this way, sampling can produce potentially significant image artifacts, particularly if digital filtering is used to restore (`deconvolve,' high-boost filter) the aliased, sampled image data prior to reconstruction. In this paper we use a model-based computational simulation to process natural scenes in a way that enables the restoration-enhanced `aliased component' of the reconstructed image to be isolated and displayed unambiguously.
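The isolation of the aliased component can be illustrated in 1D: sample a scene with and without an ideal anti-alias pre-filter and difference the two results. This is a minimal sketch, not the paper's model-based simulation; the function names are illustrative.

```python
import numpy as np

def downsample(signal, factor, prefilter=False):
    """Downsample a 1D 'scene' with or without an ideal anti-alias filter."""
    if prefilter:
        spec = np.fft.rfft(signal)
        cutoff = len(signal) // (2 * factor)   # new Nyquist bin after sampling
        spec[cutoff:] = 0                      # ideal low-pass: no folding possible
        signal = np.fft.irfft(spec, n=len(signal))
    return signal[::factor]

def aliased_component(signal, factor):
    """Isolate the alias by differencing unfiltered and filtered samplings."""
    return downsample(signal, factor) - downsample(signal, factor, prefilter=True)
```

A tone below the new Nyquist frequency yields a near-zero aliased component; a tone above it folds into the passband and the difference exposes exactly that folded energy.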
We propose an approach to the integration of image analysis, computer vision, and computer graphics. After motivating our work, we examine techniques from these areas, focusing on illumination, texture, geometry, and motion, which play an important role in each of them. The performance and restrictions of the models are evaluated. As a result, we are able to set up a framework for a common model that integrates the requirements of the different areas; in fact, this represents a first step toward a common symbolic image representation. Finally, an integrated system is outlined based on these examinations. First results gathered with a prototype system for the acquisition of 3D structure and texture from image sequences, and their transfer to an animation and rendering system, are presented.
This study developed texture extraction techniques for classifying natural background scenes using singular value features. Singular values (obtained using singular value decomposition) were used to produce a reduced one-dimensional feature space of texture attributes of natural scene regions. Scenes with tree, grass, and water regions were taken from FLIR imagery. Classification error was determined using a Bayes error estimate, and the Bhattacharyya distance was used to quantify the separation of features between regions. Although there were variations within regional texture samples, good classification results were obtained using the singular value features.
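The feature pipeline above can be sketched as follows: the singular values of a gray-level patch form a short 1D feature vector, and the closed-form Bhattacharyya distance between Gaussian feature distributions quantifies region separability. This is an illustrative sketch, not the study's exact procedure; patch size and feature length are assumptions.

```python
import numpy as np

def singular_value_features(patch, k=8):
    """Leading singular values of a gray-level patch as a 1D texture feature."""
    s = np.linalg.svd(patch.astype(float), compute_uv=False)
    return s[:k]   # SVD returns singular values sorted in decreasing order

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian feature distributions."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2
```

Larger distances between, say, tree and water feature distributions predict lower Bayes classification error between those regions.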
Certain visual functions, such as reading, depend on the high-resolution capability of the central visual field. When that area becomes dysfunctional, these tasks become difficult or impossible. We have proposed an image warping prosthesis, in which the structure of the image that would otherwise go unseen owing to the scotoma is moved outward onto portions of the retina that still function. Previously we used normally sighted volunteers with fixated foveation, synthetic scotomas, a limited form of image warping, and externally controlled reading saccades; their reading rate improved in a significant number of instances. In the next stage, we are prepared to use volunteers with actual, rather than synthesized, scotomas, and the results will be used to design realistic prostheses. Different warpings may better serve other visual tasks such as facial recognition. Some of the image warpings designed for reading are shown here, and our rationale for considering them is given.
Visual communication can be regarded as efficient only if the amount of information conveyed from the scene to the observer approaches the maximum possible and the associated cost approaches the minimum possible. To deal with this problem, Fales and Huck have integrated the critical limiting factors that constrain image gathering into classical concepts of communication theory. This paper uses that approach to assess the electro-optical design of the image gathering device. Design variables include the f-number and apodization of the objective lens, the aperture size and sampling geometry of the photodetection mechanism, and lateral inhibition and nonlinear radiance-to-signal conversion akin to the retinal processing in the human eye. An agreeable consequence of this approach is that an image gathering device designed along the guidelines developed from communication theory behaves very much like the human eye. Its performance approaches the maximum possible in terms of the information content of the acquired data, and thereby the fidelity, sharpness, and clarity with which fine detail can be restored; the efficiency with which the visual information can be transmitted in the form of decorrelated data; and the robustness of these two attributes to temporal and spatial variations in scene illumination.
The dynamic range of modern image sensors can exceed the range of a video monitor by an order of magnitude or more. This means that fast and intelligent range compression and image enhancement must be interposed between the sensor and display for effective visual communication. Digital hardware can enhance an image in real time, but the common method for range compression on digital hardware, linear filtering, can severely distort the image. Nonlinear methods must be used to prevent this distortion. In this paper, dynamic range compression of video images, at video rates, is demonstrated. An analog ASIC performs all of the necessary nonlinear image filtering at low power.
At the retinal level, the strategies utilized by biological visual systems allow them to outperform machine vision systems, motivating the design of electronic or `smart' sensors based on similar principles. Designing such sensors in silicon first requires a model of retinal information processing that captures the essential features exhibited by biological retinas. In this paper, a simple retinal model is presented which qualitatively accounts for the achromatic information processing in the primate cone system. The model exhibits many of the properties found in biological retinas, such as data reduction through nonuniform sampling, adaptation to a large dynamic range of illumination levels, variation of visual acuity with illumination level, and enhancement of spatiotemporal contrast information. The model is validated by replicating experiments commonly performed by electrophysiologists on biological retinas and comparing the response of the computer retina to data from experiments in monkeys. In addition, the response of the model to synthetic images is shown. The experiments demonstrate that the model behaves in a manner qualitatively similar to biological retinas and thus may serve as a basis for the development of an `artificial retina.'
Most imaging systems fail when subjected to a scene containing areas of both high and low intensity. The human visual system is remarkable in its ability to detect objects in deep shadow even in the presence of intensely illuminated areas. This paper describes a nonlinear theory, intensity-dependent summation (IDS), which optimizes the information in a scene independent of the intensity and variation of the illumination. The IDS model is a bandpass filter that is locally adaptive and robust to signal noise. For each input pixel, a spread function is generated whose height and area vary with the input pixel intensity; the output pixel intensity is the sum of all overlapping spread functions. This paper describes a large-window convolver whose coefficients are a nonlinear function of the individual pixel intensities. The convolver implements the IDS model as well as more conventional linear filters, and the adaptive imager described produces a 16-bit output image at RS-170 video rate.
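The spread-function summation can be sketched in 1D. This is a minimal illustration assuming Gaussian spread functions of constant volume whose width shrinks with intensity, which is one common way the IDS idea is realized; it is not the convolver hardware described above.

```python
import numpy as np

def ids_filter_1d(signal, base_sigma=4.0):
    """1D sketch of intensity-dependent summation (IDS).

    Each input sample contributes a Gaussian spread function whose width
    shrinks as intensity grows while its volume is held constant, so bright
    regions are rendered with finer spatial detail.  The output at each
    position is the sum of all overlapping spread functions.
    """
    n = len(signal)
    x = np.arange(n, dtype=float)
    out = np.zeros(n)
    for i, v in enumerate(signal):
        sigma = base_sigma / max(v, 1e-6)          # width ~ 1 / intensity
        g = np.exp(-0.5 * ((x - i) / sigma) ** 2)
        out += g / (g.sum() + 1e-12)               # constant unit volume
    return out
```

A hallmark of this construction is illumination invariance: a uniform field maps to a uniform response of 1 regardless of its absolute level, so only local intensity variation survives.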
We desire to have a joint transform correlator track features in the image of a human retina. Previous binarized digital methods showed unacceptable limitations in tracking through torsional motions of the eye. To extend the range of response to eyeball rotation, we tried several methods of processing the reference image and compared laboratory measurements with digital simulations. Based on limited statistics and our noiseless models, the results disagree: the digital method has less range, while the optical method has sufficient range (±5°) for our purpose.
This paper proposes a massively parallel line feature extraction technique for 2D images. The new scheme uses a modified Hough transform, implemented in a massively parallel fashion, to extract the line features in an input image. The algorithm is based on a recursive decomposition technique: a parallel Hough transform detects line segments in subimages of the input image, and a bottom-up approach then merges these segments into longer lines. A pointerless tree structure stores feature information at the various levels of the merging process, so that merging line segments is equivalent to climbing the tree representing the line features in the entire image. Techniques for line feature merging and balancing, tradeoffs between the determination of line properties and computation, and algorithmic complexity are addressed in detail.
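The key property that makes the bottom-up merge work can be illustrated simply: Hough accumulators computed over subimages add linearly, so votes gathered in quadrants combine into the global transform at the parent level. This sketch uses a plain (rho, theta) accumulator and is not the paper's pointerless-tree implementation.

```python
import numpy as np

def hough_accumulator(points, shape, n_theta=180):
    """Vote in (theta, rho) space for a set of (y, x) edge points."""
    h, w = shape
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((n_theta, 2 * diag + 1), dtype=int)
    thetas = np.deg2rad(np.arange(n_theta))
    for y, x in points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(n_theta), rhos + diag] += 1    # offset so rho >= 0 indexes
    return acc

def merge_subimages(sub_points, shape):
    """Bottom-up merge: votes from subimages simply add, so segments found
    in child subimages combine into longer lines at the parent level."""
    return sum(hough_accumulator(p, shape) for p in sub_points)
```

Two halves of a horizontal line, processed separately and merged, produce exactly the peak the whole line would: additivity is what lets each tree level reuse its children's work.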
A three-stage color video compression system capable of 1,000-to-1 compression has been implemented in real time on a VME-based prototype system. Stage 1 digitizes the image at CCD camera resolution and splits the data into color and contrast channels, in conformance with human perceptual channels. Stage 2 maps coordinates of color imagery to mimic the geometry of the human retinotopic mapping from retina to brain. In parallel, stage 2 also applies spatial frequency masks to the contrast channel, extracting texture patches resembling the patterns found in so-called simple cells of the visual cortex. Stage 2 therefore corresponds to a geometric `impedance match' to human visual perceptual channels, achieving compression by discarding information which cannot be perceived. Stage 3 consists of conventional numeric data compression. The system has been prototyped and demonstrated on real-time color imagery.
In order to assist the National Imagery Transmission Format Standard (NITFS) Technical Board (NTB) in selecting new bandwidth compression (BWC) algorithm(s), evaluations of candidate image compression algorithms were performed on the basis of objective and subjective image quality performance, bit rate control, susceptibility to channel errors, and complexity of implementation. Based on these evaluations, which were conducted under the guidance of the NTB, it was concluded that the ISO/JPEG DCT compression algorithm was the most suitable for the NITFS purpose, even though two proprietary sub-band coding techniques generally performed better in subjective image quality. Moreover, it was decided that three algorithms would be further evaluated at very low bit rates, where the ISO/JPEG DCT does not perform optimally.
The wavelet transform has recently become popular as a tool for multiresolution image decomposition. Simultaneous localization of spatial and spectral information makes the wavelet transform an excellent candidate for the decorrelation stage of an image compression algorithm. This paper describes a simple but effective method of grayscale image compression that uses the wavelet transform. Nonuniform quantizers are applied to the detail subimages of the transform. Detail subimages are then grouped by decomposition level and entropy coded. The lowest resolution image, which results from the all-low-pass path through the decomposition, is uniformly quantized and coded directly. Reductions in output bit rate to the rates of interest in the NITF low-bit-rate evaluation are achieved through simple scaling of the quantizers and the inclusion of run-length coding.
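The decompose-then-quantize pipeline can be sketched with a one-level Haar transform. A deadzone quantizer stands in for the paper's nonuniform quantizers; scaling its step is the bit-rate control knob described above. This is an illustrative sketch, not the paper's codec.

```python
import numpy as np

def haar2d(img):
    """One level of a 2D Haar wavelet decomposition -> (LL, LH, HL, HH)."""
    a = img.astype(float)
    # filter along rows: averages (low-pass) and differences (high-pass)
    lo = (a[:, ::2] + a[:, 1::2]) / 2
    hi = (a[:, ::2] - a[:, 1::2]) / 2
    # then along columns, yielding the four subimages
    ll = (lo[::2] + lo[1::2]) / 2
    lh = (lo[::2] - lo[1::2]) / 2
    hl = (hi[::2] + hi[1::2]) / 2
    hh = (hi[::2] - hi[1::2]) / 2
    return ll, lh, hl, hh

def quantize_details(subband, step):
    """Deadzone quantizer for detail subimages: small coefficients are
    zeroed, which is what makes the runs of zeros compress well."""
    return (np.sign(subband) * np.floor(np.abs(subband) / step)).astype(int)
```

A flat region produces all-zero detail subbands, so after quantization those regions cost almost nothing once run-length and entropy coded.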
This paper describes the methodology of fractal coding and some new fractal compression results for gray-scale images. Image compression is discussed from a systems point of view. Current measures of compression performance are not necessarily correlated with image system performance measures. For example, at a fixed `quality' level, bits-per-pixel measures correlate with the time to transmit a single image under ideal conditions; however, such measures do not necessarily capture interaction effects between image coding and systems issues such as communication channel errors and channel loading. It is proposed that a systems-level performance measure of image coding techniques be developed.
This paper proposes use of the hybrid JPEG/recursive block coding (JPEG/RBC) algorithm in low bit rate image coding and presents a quantization matrix (QM) design for the DST blocks that can be used for a wide range of low DST bit rates. The data rate optimization problem encountered in the JPEG/RBC algorithm is discussed and an empirical ratio of the bit rates for the DCT and the DST blocks is obtained for low bit rate image coding. Subjective evaluation of images coded at low bit rates placed JPEG/RBC in the top group of three algorithms among those submitted for possible inclusion as the next generation bandwidth compression algorithm in the National Imagery Transmission Format Standard.
The task of image coding is to improve the efficiency of visual communication channels. This entails minimizing the amount of data required to transmit the information about the radiance field. We assess this task in the context of visual communication channel design including image gathering, coding, and Wiener restoration which results in channel designs with significantly improved performance. Conventional assessments are limited to the digital transmission channel beginning at the output of the image-gathering device and ending at the input to the image-display device. Our end-to-end assessment, in addition, incorporates these two devices. This assessment combines Shannon's communication theory with Wiener's restoration filter and with the critical design factors of the image gathering and display devices. This provides the metrics needed to quantify and optimize the end-to-end performance of the visual communication channel. The results are described.
In this paper, rank filters of adaptive length are presented. Their objective is to suppress impulsive noise while preserving as much image detail as possible and maintaining high computational efficiency. In the proposed scheme, the window size and shape are adaptively adjusted based on the amount of impulsive noise in an initial starting window; using smaller windows where possible increases computational efficiency. To improve efficiency further, median data blocks are used. The loss of image detail is prevented by an adaptive algorithm. Filtering examples demonstrating noise suppression, detail preservation, and computational efficiency are presented.
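The window-growing idea can be sketched with a standard adaptive-median variant, shown here in 1D: the window grows from its small starting size only while its own median looks impulsive, and clean samples are passed through untouched, which is what preserves detail. This is an illustrative stand-in, not necessarily the authors' exact scheme or their median-data-block optimization.

```python
import numpy as np

def adaptive_median_1d(signal, w_max=7):
    """Adaptive-length rank (median) filter, a standard variant.

    The window starts at 3 samples and grows only while its median equals a
    window extreme (i.e. the neighborhood looks impulsive), up to w_max.
    Samples that are not themselves impulses pass through unchanged.
    """
    s = np.asarray(signal, dtype=float)
    out = s.copy()
    for i in range(len(s)):
        r = 1                                        # start with a 3-sample window
        while 2 * r + 1 <= w_max:
            win = s[max(0, i - r): i + r + 1]
            med = np.median(win)
            if win.min() < med < win.max():          # median is not an impulse
                # keep the original sample unless it is itself an impulse
                out[i] = s[i] if win.min() < s[i] < win.max() else med
                break
            r += 1                                   # impulsive neighborhood: grow
        else:
            out[i] = med                             # fall back to the last median
    return out
```

On an intensity ramp with one isolated impulse, the impulse is replaced by a local median while every clean interior sample is left exactly as it was.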
The results of experimental observation in the field show that the luminance contrast threshold (ε) and the color difference threshold (ΔE*) tend toward constants of 0.05 and 0.8 U*V*W* units of color difference, respectively, when the angular subtense (θ) of the sample is larger than 30 minutes of arc. Under small angular subtense, both ε and ΔE* are functions of θ. For a black-white sample, the visual threshold equals the contrast threshold ε, which increases exponentially as θ decreases. For a color sample, the visual threshold of color difference is determined not only by contrast but also by chromatic difference; the effect of θ on the color difference is reflected in the weighted components of U*, V*, and W* in the CIE 1964 U*V*W* uniform color space. Nine color samples were used to carry out the experiment and calculation, and the results are described quantitatively in this paper. These principles can be applied to image generation, especially perspective drawing on computers.
The first and most important link of the crop shear optimization system, profile tracking and recognition (PTR) for hot bar ends, is introduced in this paper. Several features of the system architecture are described. In the software design, several ways are proposed to speed up processing, using a unique set of real-time image data processing algorithms written in C and assembly language. Different thresholding strategies at different positions allow the system to operate in a hostile environment and with scale on the bar. Experimental results show that the system has high reliability and good anti-interference performance.