The physical model for the spatial contrast sensitivity of the eye that we presented in a previous paper is extended to the temporal domain by making some assumptions about the temporal behavior of the photoreceptors and the lateral inhibition in the ganglion cells. In this way a complete spatio-temporal model is obtained that gives a simple explanation of the complicated spatio-temporal behavior of the eye. Spatio-temporal and temporal contrast sensitivity curves calculated with the model are compared with measurements published in the literature.
The Institute for Telecommunication Sciences (ITS) has developed an objective video quality assessment system that emulates human perception. The perception-based system was developed and tested for a broad range of scenes and video technologies. The 36 test scenes contained widely varying amounts of spatial and temporal information. The 27 impairments included digital video compression systems operating at line rates from 56 kbits/sec to 45 Mbits/sec with controlled error rates, NTSC encode/decode cycles, VHS and S-VHS record/play cycles, and VHF transmission. Subjective viewer ratings of the video quality were gathered in the ITS subjective viewing laboratory that conforms to CCIR Recommendation 500-3. Objective measures of video quality were extracted from the digitally sampled video. These objective measurements are designed to quantify the spatial and temporal distortions perceived by the viewer. This paper presents the following: a detailed description of several of the best ITS objective measurements, a perception-based model that predicts subjective ratings from these objective measurements, and a demonstration of the correlation between the model's predictions and viewer panel ratings.
We are developing an automatic `image quality meter' for assessing the degree of impairment of broadcast TV images. The meter incorporates a model of the human visual system derived from psychophysical and neurophysiological studies. Early visual processing is assumed to consist of a set of spatially parallel, largely independent functional modules; but later stages are more heavily resource limited and constrained by limitations on attention and memory capacity. In line with CCIR recommendations, image evaluation can focus either on detection of the impairment itself (typically, superimposed lines or noise, or color dropout) or on assessment of the perceptible quality of the depicted scene. The observer may choose to attend to either aspect. Experimental studies of human subjects suggest that these two processes are largely independent of each other and subject to voluntary control. The meter captures images directly from TV via a CCD camera and digital sampling hardware. Early visual processes are emulated in software as a bank of spatial and temporal filters and higher level processes by a 3-layer neural network. Preliminary trials of the meter verify that it can produce quantitative CCIR gradings that match those made by an `expert' human assessor and it does so better than other electronic systems that do not incorporate the model of early human vision.
The two major international test methods for evaluation of the image quality of video display terminals are the ISO 9241-3 international standard and the MPR test. In this paper we make an attempt to compare the visual relevance of these two test methods.
We studied the spatio-temporal shape of `receptive fields' of simple cells in the monkey visual cortex. Receptive fields are maps of the regions in space and time that affect a cell's electrical responses. Fields with no change in shape over time responded to all directions of motion; fields with changing shape over time responded to only some directions of motion. A Gaussian Derivative (GD) model fit these fields well, in a transformed variable space that aligned the centers and principal axes of the field and model in space-time. The model accounts for fields that vary in orientation, location, spatial scale, motion properties, and number of lobes. The model requires only ten parameters (the minimum possible) to describe fields in two dimensions of space and one of time. A difference-of-offset-Gaussians (DOOG) provides a plausible physiological means to form GD model fields. Because of its simplicity, the GD model improves the efficiency of machine vision systems for analyzing motion. An implementation produced robust local estimates of the direction and speed of moving objects in real scenes.
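The kind of space-time-oriented field the GD model describes can be sketched numerically. The following is a minimal illustration, not the authors' implementation: a first-order Gaussian derivative whose principal axis is rotated in the x-t plane, so that a nonzero rotation produces the space-time-inseparable (direction-selective) profile described above. All parameter names and values are assumptions.

```python
import numpy as np

def gaussian_derivative_field(sigma_u, sigma_v, theta, size=33):
    """First-order Gaussian derivative receptive field in one spatial
    dimension (x) and time (t), with principal axes rotated by theta
    in the x-t plane. theta = 0 gives a separable field (no preferred
    direction of motion); theta != 0 gives a space-time-oriented,
    direction-selective field."""
    ax = np.arange(size) - size // 2
    x, t = np.meshgrid(ax, ax)                   # x along columns, t along rows
    u = x * np.cos(theta) + t * np.sin(theta)    # principal axis
    v = -x * np.sin(theta) + t * np.cos(theta)   # orthogonal axis
    g = np.exp(-(u**2) / (2 * sigma_u**2) - (v**2) / (2 * sigma_v**2))
    # d/du of the Gaussian: -u / sigma_u^2 times the Gaussian envelope
    return (-u / sigma_u**2) * g
```

With theta = 0 the field is odd-symmetric in x at every time slice; rotating theta tilts the lobes through space-time, the signature of direction selectivity noted in the abstract.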
The objectives of the research were: (1) to investigate the dynamics of neuron responses and orientation selectivity in the primary visual cortex; (2) to find a possible source of bifurcation of visual information into `what' and `where' processing pathways; (3) to apply the obtained results to visual image processing. To achieve the objectives, a model of the iso-orientation domain (orientation column) of the visual cortex has been developed. The model is based on neurophysiological data and on the idea that orientation selectivity results from a spatial anisotropy of reciprocal lateral inhibition in the domain. Temporal dynamics of neural responses to oriented stimuli were studied with the model. It was shown that the later phase of neuron response had a much sharper orientation tuning than the initial one. The results of modeling were confirmed by neurophysiological experiments on the visual cortex. The findings allow us to suggest that the initial phase of neural response encodes the location of the visual stimulus, whereas the later phase encodes its orientation. Temporal division of information about object features and their locations at the neuronal level of the primary visual cortex may be considered a source for bifurcation of the visual processing into `what' and `where' pathways and may be used for parallel-sequential attentional image processing. A model of a neural network system for image processing based on the iso-orientation domain model and the above idea is proposed. An example of test image processing is presented.
We have investigated whether reading is adversely affected by the flicker of VDTs. We use 60 Hz flicker for our low frequency because it is the standard used in most computer monitors, and 500 Hz for our upper boundary because it provides nearly constant presentation. Sixty hertz flicker results in 16.67 ms of dead time after every screen write. Therefore, there is no usable information at the fixation point for an average of 8.33 ms following a saccadic eye movement. We hypothesized that the visual system might have to wait for text to become available after each saccade, slowing reading speed. The 500 Hz condition allows 2 ms of dead time with an average wait of 1 ms.
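The timing argument above reduces to simple arithmetic, sketched here under the abstract's implicit assumption of negligible phosphor persistence:

```python
def dead_time_ms(refresh_hz):
    """Interval between successive screen writes, in milliseconds."""
    return 1000.0 / refresh_hz

def average_wait_ms(refresh_hz):
    """Expected wait for the next screen write after a saccade landing
    at a random phase of the refresh cycle: half the refresh interval."""
    return dead_time_ms(refresh_hz) / 2.0
```

At 60 Hz this gives the 16.67 ms dead time and 8.33 ms average wait cited above; at 500 Hz, 2 ms and 1 ms.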
It is well known that increasing the temporal sampling rate can improve the perceived quality of displayed images. The present research measured the ability of the human visual system to discriminate among different temporal sampling rates. The results of the research are intended to support the development of improved image coding and display systems. The ideal sampling rate for a moving stimulus was determined in Experiment 1. This rate is defined as the rate beyond which further increases in sampling rate have no added beneficial effect. The results showed that the ideal sampling rate depended on stimulus velocity; ideal sampling rates for velocities of 8 deg/s and 16 deg/s were about 100 Hz and 170 Hz, respectively.
We have conducted a systematic study of the effectiveness of stereo imaging in aiding the detection of abnormalities in soft tissue geometry and density. Experiments have been designed whereby subjects are shown computer generated stereo mammograms and are asked to determine if there is an abnormal object (brighter or regularly arranged) in the mammogram. The experimental data have been analyzed using ROC approaches to determine which types of soft tissue abnormalities may be easier to detect using stereo viewing. Preliminary results suggest that stereo imaging may be more effective than current single view or biplane mammography for detecting abnormal densities and arrangements in soft tissue elements.
The problem of image compression is to achieve a low bit rate in the digital representation of an input image or video signal with minimum perceived loss of picture quality. Since the ultimate criterion of quality is that judged or measured by the human receiver, it is important that the compression (or coding) algorithm minimizes perceptually meaningful measures of signal distortion, rather than more traditional and tractable criteria such as the mean squared difference between the waveform at the input and output of the coding system. This paper develops the notion of perceptual coding based on the concept of distortion-masking by the signal being compressed, and describes how the field has progressed as a result of advances in classical coding theory, modelling of human vision, and digital signal processing. We propose that fundamental limits in the science can be expressed by the semi-quantitative concepts of perceptual entropy and the perceptual distortion-rate function, and we examine current compression technology with respect to that framework. We conclude with a summary of future challenges and research directions.
Two experiments for evaluating psychophysical distortion metrics for JPEG-encoded images are described. The first is a threshold experiment, in which subjects determined the bit rate or level of distortion at which distortion was just noticeable. The second is a suprathreshold experiment in which subjects ranked image blocks according to perceived distortion. The results of these experiments were used to determine the predictive value of a number of computed image distortion metrics. It was found that mean-square-error is not a good predictor of distortion thresholds or suprathreshold perceived distortion. Some simple point- wise measures were in good agreement with psychophysical data; other more computationally intensive metrics involving spatial properties of the human visual system gave mixed results. It was determined that mean intensity, which is not accounted for in the JPEG algorithm, plays a significant role in perceived distortion.
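For concreteness, the two simplest quantities compared in such studies can be sketched as follows; this is an illustrative minimum, not the experiments' full metric set:

```python
import numpy as np

def mean_square_error(original, coded):
    """Mean squared difference -- the traditional metric the experiments
    found to be a poor predictor of perceived distortion."""
    return float(np.mean((original.astype(float) - coded.astype(float)) ** 2))

def mean_intensity_shift(original, coded):
    """Change in mean intensity, a simple point-wise quantity of the kind
    found to play a significant role in perceived distortion."""
    return float(coded.astype(float).mean() - original.astype(float).mean())
```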
A detection model is developed to predict visibility thresholds for discrete cosine transform coefficient quantization error, based on the luminance and chrominance of the error. The model is an extension of a previously proposed luminance-based model, and is based on new experimental data. In addition to the luminance-only predictions of the previous model, the new model predicts the detectability of quantization error in color space directions in which chrominance error plays a major role. This more complete model allows DCT coefficient quantization matrices to be designed for display conditions other than those of the experimental measurements: other display luminances, other veiling luminances, other spatial frequencies (different pixel sizes, viewing distances, and aspect ratios), and other color directions.
Several image compression standards (JPEG, MPEG, H.261) are based on the Discrete Cosine Transform (DCT). These standards do not specify the actual DCT quantization matrix. Ahumada & Peterson and Peterson, Ahumada & Watson provide mathematical formulae to compute a perceptually lossless quantization matrix. Here I show how to compute a matrix that is optimized for a particular image. The method treats each DCT coefficient as an approximation to the local response of a visual `channel.' For a given quantization matrix, the DCT quantization errors are adjusted by contrast sensitivity, light adaptation, and contrast masking, and are pooled non-linearly over the blocks of the image. This yields an 8 X 8 `perceptual error matrix.' A second non-linear pooling over the perceptual error matrix yields total perceptual error. With this model we may estimate the quantization matrix for a particular image that yields minimum bit rate for a given total perceptual error, or minimum perceptual error for a given bit rate. Custom matrices for a number of images show clear improvement over image-independent matrices. Custom matrices are compatible with the JPEG standard, which requires transmission of the quantization matrix.
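The two-stage pooling described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the Minkowski exponents, the use of a mean rather than a sum inside the pooling, and the assumption that contrast sensitivity, light adaptation, and masking are already folded into the thresholds are all simplifications.

```python
import numpy as np

def perceptual_error_matrix(dct_errors, thresholds, beta_spatial=4.0):
    """dct_errors: (n_blocks, 8, 8) quantization errors per DCT coefficient.
    thresholds: (8, 8) visibility thresholds for each coefficient.
    Expresses each error in threshold units (JNDs) and pools nonlinearly
    over the blocks of the image, yielding an 8 x 8 perceptual error matrix."""
    jnds = np.abs(dct_errors) / thresholds
    return (np.mean(jnds ** beta_spatial, axis=0)) ** (1.0 / beta_spatial)

def total_perceptual_error(pem, beta_freq=4.0):
    """Second nonlinear pooling, over the perceptual error matrix."""
    return float((np.mean(pem ** beta_freq)) ** (1.0 / beta_freq))
```

A quantization matrix could then be searched for that minimizes bit rate subject to a bound on the total perceptual error, or vice versa, as the abstract describes.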
Selecting an appropriate quantization table for Joint Photographic Experts Group (JPEG) data compression of a class of images can be an arduous task. We have designed a graphical user interface to study the effects of quantization on compression ratio and the resulting image quality. The tool calculates several measures of the difference between the original and lossy compressed image. Some of these measures are entropy, mean square error, and normalized mean square error. These measures aid the user in selecting the optimal quantization values with respect to image fidelity and compression ratio for a particular class of images.
The discrete cosine transform (DCT) can be used to transform two images into a space where it is easy to obtain an estimate of their perceptual distance. We used this method to find the closest fit of the ASCII symbols (which includes the English alphabet, numbers, punctuation, and common symbols) to rectangular segments of a gray-scale image. Each segment was converted into a DCT coefficient matrix which was compared to the coefficient matrix of each ASCII symbol. The image segment was replaced with the symbol that had the least weighted Euclidean distance. Thus, a page of text was generated that resembled the original image. The text image format has the advantage that it can be displayed on a non-graphic terminal or printer. It can also be sent via electronic mail without requiring further processing by the receiver. The processing scheme can also be used to preview stored images when transmission bandwidth is limited or a graphic output device is unavailable.
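The matching step can be sketched as follows, assuming square grayscale glyph bitmaps and a uniform weighting for the Euclidean distance (the paper's actual weights are not specified here):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows indexed by frequency)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def dct2(block):
    """2-D DCT of a square block via the separable matrix form."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

def best_symbol(segment, glyphs, weights=None):
    """Return the symbol whose glyph's DCT coefficient matrix has the
    least (weighted) Euclidean distance to the segment's coefficients.
    glyphs: dict mapping characters to grayscale bitmaps."""
    if weights is None:
        weights = np.ones_like(segment, dtype=float)
    seg_coef = dct2(segment.astype(float))
    best, best_d = None, np.inf
    for ch, glyph in glyphs.items():
        d = np.sum(weights * (seg_coef - dct2(glyph.astype(float))) ** 2)
        if d < best_d:
            best, best_d = ch, d
    return best
```

Running this over every segment of an image and emitting the winning characters row by row would produce the page of text described above.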
The human visual system's response to luminance discontinuities on continuous image gradients is measured. It is argued that since quantization artifacts often appear on smooth gradients, the versatility of sinewave gratings in controlling gradient properties can be used to quantify worst-case situations. The threshold spatial dimensions of discrete steps, such as would appear in a staircase approximation to the grating, are determined under a variety of experimental conditions. The smallest threshold step size defines a critical sample size, which is used to calculate limits for artifact-free reconstruction of sinusoidal luminance gratings. Rather than employ ad hoc collections of test images, the methods and results described suggest a more general approach to achieving visually optimal allocation of imaging resources.
Reading rate legibility tests were conducted to provide data relating human performance to changes in the complex spatial frequencies of displayed alphanumerics resulting from changes in display focus or modulation transfer function. Human performance is thus quantified both with measured display parameters and the corresponding behavior of spatial frequency distributions of the displayed characters. A Sony Trinitron CRT monitor was used, which exhibits aliasing in the same spatial frequency domain used to represent the alphanumerics. For alphanumerics subtending 0.3 degrees, the reading rate data exhibited asymptotic behavior at about 190 characters per minute, thus indicating a critical bandwidth for legibility of about 2 cycles per character. Also, characteristic spatial frequency distributions were found to occur at this critical bandwidth. Despite the aliasing due to the Trinitron's aperture grille, the results were similar to those found with a monochrome CRT. These observations have profound implications for specifying resolution requirements for displaying text on CRT displays.
Many times in electronic imaging systems it is necessary to reduce the precision of the digital data for the display, storage, manipulation, or transformation of an image. For example, it may be necessary to reduce 12 bits/channel RGB data to 8 bits/channel due to storage requirements. To accomplish this reduction in the number of digital levels, a series of input levels must be grouped together for each output level. Since this process involves the quantization of previously quantized data, it is sometimes referred to as secondary quantization. The secondary quantization process necessarily results in artifacts, such as contouring in the image, where many colors have been mapped to a single color. Conventional methods, such as linear or power-law resampling, are suboptimal and do not consider intercolor effects. This paper describes a method for determining the quantization functions that will minimize the observable image artifacts generated by the secondary quantization process. The basic approach involves the use of nonlinear optimization techniques to minimize a cost function that provides a measure of the visible color error. Examples are presented that compare the optimized quantization process to conventional techniques.
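As a baseline, the conventional linear resampling that the optimized method improves on is trivial to state; a minimal sketch, assuming unsigned integer data:

```python
import numpy as np

def requantize(values, in_bits=12, out_bits=8):
    """Naive linear secondary quantization: group 2**(in_bits - out_bits)
    consecutive input levels into each output level. This is the kind of
    fixed, intercolor-blind mapping that produces the contouring artifacts
    the optimization-based method is designed to minimize."""
    step = 2 ** (in_bits - out_bits)
    return values // step
```

The optimized approach replaces this fixed step with quantization boundaries chosen by nonlinear minimization of a visible-color-error cost, as described above.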
In this paper, we propose a new technique for halftoning color images. Our technique parallels recent work in model-based halftoning for both monochrome and color images; we incorporate a human visual model that accounts for the difference in the responses of the human viewer to luminance and chrominance information. Thus, the RGB color space must be transformed to a luminance/chrominance based color space. The color transformation we use is a linearization of the uniform color space L*a*b* which also decouples changes between the luminance and chrominance components. After deriving a tractable expression for total-squared perceived error, we then apply the method of Iterated Conditional Modes (ICM) to iteratively toggle halftone values and exploit several degrees of freedom in reducing the perceived error as predicted by the model.
There has been a tremendous amount of research in the area of image halftoning, where the goal is to find the most visually accurate representation given a limited palette of gray levels (often just two, black and white). This paper focuses on the inverse problem: finding efficient techniques for reconstructing high-quality continuous-tone images from their halftoned versions. The proposed algorithms are based on a maximum a posteriori (MAP) estimation criterion using a Markov random field model for the prior image distribution. Image estimates obtained with the proposed model accurately reconstruct both the smooth regions of the image and the discontinuities along the edges. Algorithms are developed, and example gray-level reconstructions are presented, generated from both dithered and error-diffused halftone originals.
Many digital display systems economize by rendering color images with the use of a limited palette. Palettized images differ from continuous-tone images in two important ways: they are less continuous due to their use of lookup table indices instead of physical intensity values, and pixel values may be dithered for better color rendition. These image characteristics reduce the spatial continuity of the image, leading to high bit rates and low image quality when compressing these images using a conventional lossy coder. We present an algorithm that uses a debinarization technique to approximate the original continuous-tone image, before palettization. The color components of the reconstructed image are then compressed using standard lossy compression techniques. The decoded images must be color quantized to obtain a palettized image. We compare our results with a second algorithm that applies a combination of lossy and lossless compression directly to the color quantized image in order to avoid color quantization after decoding.
Screening is a nonlinear operation where the forward mapping is a deterministic process but the mathematical properties of the inverse solution can only be estimated. The principal motivations in the printing industry for `inverse' halftoning, or descreening, are the storage costs associated with halftone film and the fact that digital image manipulations are not possible with images in a binary format. These include size change and rotation, two important processes for the printing industry. Clearly, it would be advantageous to be able to recapture the original gray scale from the halftone film, store it in a less expensive and easy to duplicate digital format, and perform image processing operations on the data. Initially, we introduce a metric to compare the coarseness of the screen to the image bandwidth and demonstrate how to use this metric as a predictor of the ability to descreen the screened image. Transform domain representations of the screened image are discussed as well as a sampling theory similarity in the screening process. Descreening is achieved through linear filtering and adaptations of two iterative techniques. This paper concludes that under the right conditions it is possible to recover a visually close approximation of the original image from the screened image and that the iterative techniques are robust and provide objective and subjective performance improvements compared with linear filtering.
We consider the problem of reconstructing a continuous-tone (contone) image from its halftoned version, where the halftoning process is done by error diffusion. We present an iterative nonlinear decoding algorithm for halftone-to-contone conversion, and show simulation results that compare the performance of the algorithm to that of conventional linear lowpass filtering. We find that the new technique results in subjectively superior reconstruction. As there is a natural relationship between error diffusion and sigma-delta modulation, our reconstruction algorithm can also be applied to the decoding problem for sigma-delta modulators.
Error diffusion has proven to be a very powerful method of producing binary images that are visually similar to the original grayscale images. It has become so popular that many attempts have been made to improve it. In this paper, two modifications to error diffusion, the serpentine raster and threshold modulation, are analyzed from a theoretical viewpoint. The two analyses reveal the origins of the image quality improvements of the modifications and quantify their benefits.
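For reference, standard Floyd-Steinberg error diffusion with an optional serpentine raster (one of the two modifications analyzed above) can be sketched as follows; boundary handling is simplified, with error falling outside the image simply dropped:

```python
import numpy as np

def error_diffuse(image, serpentine=False):
    """Floyd-Steinberg error diffusion for a grayscale image in [0, 1].
    serpentine=True reverses the scan direction on alternate rows and
    mirrors the error weights accordingly."""
    img = image.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        reverse = serpentine and (y % 2 == 1)
        xs = range(w - 1, -1, -1) if reverse else range(w)
        s = -1 if reverse else 1            # scan direction
        for x in xs:
            new = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - new
            out[y, x] = new
            # distribute the quantization error: 7/16 ahead, 3/16, 5/16,
            # 1/16 to the row below (mirrored when scanning in reverse)
            if 0 <= x + s < w:
                img[y, x + s] += err * 7 / 16
            if y + 1 < h:
                if 0 <= x - s < w:
                    img[y + 1, x - s] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if 0 <= x + s < w:
                    img[y + 1, x + s] += err * 1 / 16
    return out
```

Because the diffused error is (almost) fully redistributed, the binary output approximately preserves the local mean of the input, which is what makes the result visually similar to the original grayscale image.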
Halftoning to two or more levels by means of ordered dither has always been attractive because of its speed and simplicity. However, the so-called recursive tessellation arrays in wide use suffer from strong periodic structure that imparts an unnatural appearance to resulting images. A new method for generating homogeneous ordered dither arrays is presented. A dither array is built by looking for voids and clusters in the intermediate patterns and relaxing them to optimize isotropy. While the method can be used for strikingly high quality artifact-free dithering with relatively small arrays, it is quite general; with different initial conditions the familiar recursive tessellation arrays can be built. This paper presents the algorithm for generating such arrays. Example images are compared with other ordered dither and error diffusion-based techniques.
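The recursive tessellation arrays that the void-and-cluster method is contrasted with are easy to generate; a minimal sketch:

```python
import numpy as np

def bayer(n):
    """Recursive tessellation (Bayer) dither array of size 2**n x 2**n.
    Each recursion step tiles the previous array into four quadrants with
    offsets 0, 2, 3, 1 -- the construction whose strong periodic structure
    the void-and-cluster method is designed to avoid."""
    if n == 0:
        return np.array([[0]])
    b = bayer(n - 1)
    return np.block([[4 * b + 0, 4 * b + 2],
                     [4 * b + 3, 4 * b + 1]])
```

To dither, the array is tiled over the image and each pixel is compared against its threshold, e.g. output = image > (B + 0.5) / B.size for an array B.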
This paper presents a method of dithering which attempts to exploit the inevitable textures generated by all dithering schemes. We concentrate on rendering continuous tone (monochrome or color) images on a CRT display with a small number (on the order of 16 - 256) of distinct colors. Monochrome (especially bi-level) dithering techniques are well studied. We have previously demonstrated that texture introduced by the dithering process can significantly affect the appearance of the image. We then developed a scheme by which the user had some control over these texture effects and could then choose (locally) different ordered-dither matrices based on measured properties of the original image (e.g., the local gradient). In this paper, we exploit texture as an alternate channel of information. The key idea is to choose the previously described matrices based on properties not in the original image.
We present a new approach for estimating printer model parameters that can be applied to a wide variety of laser printers. Recently developed `model-based' digital halftoning techniques depend on accurate printer models to produce high quality images using standard laser printers (typically 300 dpi). Since printer characteristics vary considerably, e.g. write-black vs. write-white laser printers, the model parameters must be adapted to each individual printer. Previous approaches for estimating the printer model parameters are based on a physical understanding of the printing mechanism. One such approach uses the `circular dot-overlap model,' which assumes that the laser printer produces circularly shaped dots of ink. The `circular dot-overlap model' is an accurate model for many printers but cannot describe the behavior of all printers. The new approach is based on measurements of the gray level produced by various test patterns, and makes very few assumptions about the laser printer. We use a reflectance densitometer to measure the average brightness of the test patterns, and then solve a constrained optimization problem to obtain the printer model parameters. To demonstrate the effectiveness of the approach, the model parameters of two laser printers with very different characteristics were estimated. The estimated printer models were then used with both the modified error diffusion and the least-squares model-based approach to produce printed images with the correct gray-scale rendition. We also derived an iterative version of the modified error diffusion algorithm that improves its performance.
In multilevel halftoning, the appearance of intermediate shades of gray is created by the spatial modulation of more than two tones, i.e., black, white, and one or more gray tones. Periodic multilevel halftoning can be implemented similarly to bitonal halftoning by using N-1 identically sized threshold matrices for N available output levels. The amount of modulation in the output image is dependent on both the number of output levels and the spatial arrangement of threshold values. A method is presented for assessing the modulation resulting from a periodic multilevel halftone algorithm. The method is based on the constraint that the digital output of the halftone process be mean-preserving with respect to the input. This constraint is applied to the tone transfer functions of each pixel in the halftone cell, producing the result that the sum of the derivatives of all the unquantized tone transfer functions must equal the number of pixels in the halftone cell for all input values. This rule leads to a simple graphical technique for evaluating the modulation in a halftone algorithm as well as suggesting an infinite number of ways to vary the modulation. The application of this method to traditional as well as alternate halftone architectures is discussed.
The perceived quality of a printed image depends on the halftone algorithm and the printing process. This paper proposes a new method of analyzing halftone image quality in the frequency domain based on a human vision model. First, the Fourier transform characteristics of a dithered image are reviewed. Several commonly used dither algorithms, including clustered-dot dither and dispersed-dot dither, are evaluated based on their Fourier transform characteristics. Next, images halftoned with the dither algorithms and the Floyd-Steinberg error diffusion algorithm are compared in the frequency domain. Factors affecting printed image quality in a printing process are also discussed. Finally, a perception-based halftone image distortion measure is proposed. This measure reflects the quality of a halftone image printed on an ideal bi-level device and viewed at a particular distance. The halftone algorithms are ranked according to the proposed distortion measure. The effects of using human visual models with different peak sensitivity frequencies are examined.
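A contrast-sensitivity-weighted frequency-domain distortion of the kind proposed can be sketched as follows. The Mannos-Sakrison CSF form and the pixel-to-cycles-per-degree mapping via a viewing-distance-dependent `pixels_per_degree` are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def csf_mannos_sakrison(f):
    """Contrast sensitivity as a function of spatial frequency f in
    cycles/degree (Mannos & Sakrison 1974 approximation)."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def halftone_distortion(original, halftone, pixels_per_degree=60.0):
    """CSF-weighted frequency-domain error between a continuous-tone
    image and its halftone. pixels_per_degree encodes the viewing
    distance at which the halftone is assessed."""
    h, w = original.shape
    fy = np.fft.fftfreq(h)[:, None]          # cycles per pixel, vertical
    fx = np.fft.fftfreq(w)[None, :]          # cycles per pixel, horizontal
    f = np.sqrt(fx**2 + fy**2) * pixels_per_degree
    E = np.fft.fft2(original.astype(float) - halftone.astype(float))
    W = csf_mannos_sakrison(f)
    return float(np.sqrt(np.sum((W * np.abs(E)) ** 2)) / (h * w))
```

Ranking halftone algorithms by such a measure at a fixed viewing distance, and varying the CSF's peak sensitivity frequency, is the kind of study the abstract describes.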
The recent proliferation of digital binary output devices, such as laser printers and facsimile machines, has brought increased attention to high quality halftone reproduction. A question that often arises in halftone research is how to evaluate the quality of halftone images using a quantitative quality metric. Such metrics would allow objective evaluation, could be incorporated in halftoning algorithms, and would be independently reproducible. Given that halftoning introduces unique types of distortions and the display medium (a hardcopy device) is different than that of conventional image processing applications (a CRT display), it becomes necessary to develop visual models specifically for halftoning applications. In this paper, frequency-domain visual models are investigated as to their suitability for the formulation of quantitative quality metrics specifically for halftoning applications. Two types of frequency-domain models are investigated: models that utilize a simple contrast-sensitivity function and models that utilize multiple independent narrowband channels. The quantitative quality metric for both types of models is formulated as a weighted frequency domain error. Since the ultimate judges of image quality are human viewers, the success of the quantitative measures is assessed by comparing their results with the results of a psychovisual test performed on halftone images.
The quality of color correction is dependent upon the filters used to record the image. The problem of estimating the CIE tristimulus vectors of the original image under several illuminants, from data obtained with a single imaging illuminant, is discussed. Optimal filters are derived which minimize the total color correction mean square error for several viewing illuminants. The sensitivity of the filters to perturbations is investigated. Simulations are performed using real reflectance spectra.
Electronic graphic arts scanners are analogous to the photographic separation methods they replace in that they measure the density of colorants in the red, green, and blue parts of the visible spectrum. Color correction setups to specify conversion to print data are traditionally created by skilled operators using trial and error. With increasing demand for device independent color processing, conversion of scanner densities to colorimetric quantities is needed. We describe a method for scanned transparencies that uses only the spectral characterization of the scanner channels and of the colorants being scanned; the scanner itself need not be modified. The basic idea is simple. First, convert scanner densities to colorant amounts. Scaling the characteristic spectral density curves by these amounts and summing gives a reconstruction of the full color spectrum of the pixel. Any colorimetric quantity can then be calculated. The key aspect of the method, calculating colorant amounts, is accomplished with an iterative loop where estimated amounts are processed with the colorant and scanner characterizations to simulate scanner densities. The errors between these and the actual scanner densities provide corrections to improve the estimates. The iteration converges quickly to the true colorant amounts. This technique is accurate and works well with lookup table methods with negligible loss of accuracy.
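The iterative loop described above can be sketched generically. The scanner model here is a hypothetical stand-in (a fixed linear mixing matrix); the paper's model is built from the characteristic spectral density curves of the actual colorants and scanner channels:

```python
import numpy as np

def estimate_colorant_amounts(measured_density, scanner_model,
                              gain=0.5, iters=200):
    """Fixed-point iteration: guess colorant amounts, simulate the scanner
    densities they would produce, and correct the guess by the density
    error. Converges when the simulated scanner response is sufficiently
    close to an identity-plus-small-coupling map, as assumed here."""
    a = np.zeros_like(measured_density, dtype=float)
    for _ in range(iters):
        err = measured_density - scanner_model(a)
        a = a + gain * err
    return a
```

For example, with a mildly cross-coupled linear scanner response the loop recovers the true amounts from the measured densities.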
The PostScript language has long been known for very precise, device independent page descriptions of potentially very complex pages consisting of text, graphics, and images. However, in the past the description of color in PostScript has not done justice to the term device independence, since the exact meaning of the colorants in a color space was not explicitly specified, but instead was left to the interpretation of each device receiving the given page description. Device independent color is the ability to describe color in a manner independent of the color forming properties of any particular device. Instead, color is described based on the CIE system of color specification. PostScript has been extended to afford this manner of color specification.
In 1989, IBM began a project devoted to designing and building a small computer system to serve the needs of the staff of the painter Andrew Wyeth. Over the three-year duration of this project, many of the system's requirements were refined. However, a fundamental requirement from the outset was that the system be able to capture and produce digital images suitable for publication. In 1992, an experiment was performed by Wyeth's staff, IBM, and R. R. Donnelley and Sons Company. The objective of this experiment was to demonstrate that one could indeed capture images with the IBM system installed at Wyeth's offices and transfer them as digital images to a printer (R. R. Donnelley), where they could be successfully printed. This paper describes that experiment. It describes the methodology IBM used for capturing, preserving, and communicating the color content and the methodology Donnelley used for interpreting the color content and proofing the images. Finally, it discusses the practical problems encountered in communicating the images' color content -- what worked well and what proved difficult.
An improved version of an earlier unified model for human color perception and visual adaptation is described. It allows superior predictions of color discriminations and color appearances under varying adaptation conditions. Accordingly, the model provides not only color equations for small-step and large-step color differences, but also equations for predicting many other visual responses including actual perceptions of hue, saturation, and brightness.
A vision model is presented that combines a model for human color perception and visual adaptation with a model for the achromatic and chromaticity modulation transfer functions of the visual system. The combined models produce a uniform color space that can be described as a function of spatial frequency. This model may be useful in developing and optimizing image compression algorithms to reduce bit rate and to increase the quality of color images.
Chromatic-achromatic demultiplexing is the only model that merges three neurophysiological characteristics found only beyond the precortical levels of visual processing in primates: (1) orientation selectivity, (2) interaction of on and off cells, and (3) color decoding. For example, a demultiplexing cortical unit is selective to purely achromatic changes only when they take place at its preferred orientation, but its signal is chromatic-achromatic ambiguous for any other angle; this hypothetical unit is fed with outputs from alternate rows of on and off color-opponent neurons of the lateral geniculate nucleus (LGN). It has a spatial sensitivity profile well described by difference-of-Gaussian models, Gabor-like models, or n-derivative-of-Gaussian models that include orientation tuning. In consequence, current models of spatial filtering and orientation tuning of cortical neurons can be consistently connected with the chromatic-achromatic dimensions through the multiplexing model.
Von Kries adaptation has long been considered a reasonable vehicle for color constancy. Since the color constancy performance attainable via the von Kries rule strongly depends on the spectral response characteristics of the human cones, we consider the possibility of enhancing von Kries performance by constructing new `sensors' as linear combinations of the fixed cone sensitivity functions. We show that if surface reflectances are well-modeled by 3 basis functions and illuminants by 2 basis functions then there exists a set of new sensors for which von Kries adaptation can yield perfect color constancy. These new sensors can (like the cones) be described as long-, medium-, and short-wave sensitive; however, both the new long- and medium-wave sensors have sharpened sensitivities -- their support is more concentrated. The new short-wave sensor remains relatively unchanged. A similar sharpening of cone sensitivities has previously been observed in test and field spectral sensitivities measured for the human eye. We present simulation results demonstrating improved von Kries performance using the new sensors even when the restrictions on the illumination and reflectance are relaxed.
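The mechanics of von Kries adaptation in a transformed sensor space can be sketched as below. The Gaussian cone fundamentals, illuminant spectra, and the matrix `T` are illustrative assumptions only; `T` is not the optimal sharpening transform the paper derives.

```python
import numpy as np

# Hypothetical cone fundamentals as Gaussians (assumption for illustration).
wl = np.linspace(400, 700, 61)
cones = np.stack([np.exp(-((wl - c) / w) ** 2)
                  for c, w in ((565, 50), (540, 45), (445, 30))])

def responses(illum, refl):
    """Sensor responses to a reflectance viewed under an illuminant."""
    return cones @ (illum * refl)

def von_kries(resp, white_resp, T=np.eye(3)):
    """Von Kries rule: divide each channel by the response to white,
    optionally after mapping into a transformed ('sharpened') sensor
    space via the matrix T."""
    return (T @ resp) / (T @ white_resp)

# Two hypothetical illuminants, a white, and a test reflectance.
illum_a = 1.0 + 0.5 * (wl - 400) / 300           # more long-wave energy
illum_b = 1.5 - 0.5 * (wl - 400) / 300           # more short-wave energy
white = np.ones_like(wl)
refl = 0.3 + 0.4 * np.exp(-((wl - 600) / 60) ** 2)

# An illustrative sharpening matrix: long- and medium-wave sensors become
# more concentrated, the short-wave sensor is left nearly unchanged.
T = np.array([[ 2.0, -1.1,  0.1],
              [-0.8,  1.9, -0.1],
              [ 0.0,  0.0,  1.0]])

# Constancy error: spread of the adapted descriptor across illuminants.
for M, label in ((np.eye(3), "cones"), (T, "sharpened")):
    da = von_kries(responses(illum_a, refl), responses(illum_a, white), M)
    db = von_kries(responses(illum_b, refl), responses(illum_b, white), M)
    print(label, np.abs(da - db).max())
```

By construction the white surface maps to the unit descriptor under any illuminant, which is the sense in which the diagonal rule "discounts" the illuminant; the paper's contribution is choosing `T` so this discounting becomes exact for all surfaces under its basis-function restrictions.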
Three-dimensional objects in an image, which appear with shading and cast shadows, can be difficult to recognize as single entities, and there can also be problems recognizing the colors of the objects independent of the spectrum of illumination. The removal of shading and cast shadows has often been done in remote sensing by the band-ratio algorithm. A ratio of red to green bands cancels variations of incident light intensity between different points on the same matte object. Finlayson et al. showed that, for physically reasonable illuminant and reflectance spectra, von Kries adaptation gives exact color constancy if a particular linear transformation on the color-matching functions is performed prior to adaptation. The present paper extends this approach to band ratios, and also to the related color-constancy model of Judd (which subtracts the white-reflectance chromaticity instead of dividing by the white-reflectance tristimulus values as von Kries adaptation does). In both cases, invariance requires the illuminant basis functions to be metameric (up to a scale factor) -- with respect to the reference white in the case of Judd adaptation, and with respect to all reflectances in the case of band ratios. The von Kries theory thus seems unique among the simple processing methods.
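The band-ratio cancellation the abstract starts from is easy to verify numerically; the reflectance values and shading factors below are made up for illustration.

```python
import numpy as np

# On a matte surface, a change of incident light intensity multiplies
# every band by the same per-pixel shading factor, so the red/green
# ratio is unchanged -- the band-ratio algorithm's invariance.
red_band   = np.array([0.40, 0.80, 0.10])   # red response per pixel
green_band = np.array([0.20, 0.40, 0.30])   # green response per pixel
shading    = np.array([1.00, 0.30, 0.75])   # intensity factor per pixel

ratio_lit    = red_band / green_band
ratio_shaded = (shading * red_band) / (shading * green_band)
print(np.allclose(ratio_lit, ratio_shaded))   # True: shading cancels
```

The paper's question is when this invariance survives a change in the illuminant's *spectrum* rather than just its intensity, which is where the metamerism condition on the illuminant basis functions enters.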
We develop a physical model that characterizes the appearance of textured color surfaces in three dimensions. The model is derived from properties of surfaces and the physics of image formation. The color texture model describes the dependence of the spatial correlations within and between bands of a color image on surface reflectance, illumination, and the scene geometry. We show that there are important advantages in using color information for texture analysis. From our model, we derive an algorithm for recognizing instances of color textures independent of scene geometry. This algorithm is useful for the recognition of three dimensional objects and the segmentation of color images of three dimensional scenes. Experimental results are provided to confirm the model and to illustrate the performance of the algorithm.
A visual feature such as an edge may be signaled by any number of visual cues, such as modulation of brightness, color, stereo disparity, motion, texture, and so on. Here, a theory is reviewed which suggests that when more than one cue is available, separate location estimates are made using each cue, and the resulting estimates are averaged. Cues which are more reliable are given a correspondingly larger weight in the average. A psychophysical paradigm (perturbation analysis) is described for testing whether the theory applies and estimating its parameters. Preliminary results are described for localization of texture edges signaled by changes in contrast, local texture orientation, and local texture scale. The data are consistent with the theory: individual cues appear to be averaged, and more reliable cues are given more weight.
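The averaging rule described above can be sketched in a few lines, taking the common assumption that each cue's weight is proportional to its inverse variance; the numbers are illustrative, not data from the study.

```python
import numpy as np

def combine_cues(estimates, sigmas):
    """Reliability-weighted average: cue i contributes location estimate
    estimates[i] with weight proportional to 1 / sigmas[i]**2."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    weights /= weights.sum()                 # normalize to sum to 1
    return weights @ estimates, weights

# An edge located at 10.0 by a precise contrast cue and at 12.0 by a
# noisier texture-orientation cue: the fused estimate sits nearer the
# reliable cue.
loc, w = combine_cues([10.0, 12.0], sigmas=[1.0, 2.0])
print(loc)   # ~10.4 (weights 0.8 and 0.2)
```

Perturbation analysis, in this framing, displaces one cue slightly and reads the weights off the resulting shift in the combined location estimate.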
Many of the natural-world depth specifiers are either not available in flat-screen electronic images or, as with `brightness,' conflict with other information in such images that does specify depth. Depicted distance changes within the electronic scene do not affect the illuminance of the retinal image as would a change of distance in the natural world. Studies to be reported suggest that luminance differences between objects can significantly modulate the information from linear perspective, and hence the electronic image itself may contain the potential for conflict between `depth cues.' Subjects were required to judge relative distances between pairs or triplets of objects with either consistent or inconsistent luminance gradients, in scenes containing varying amounts of depth information such as perspective and the presence or absence of a horizon. The direction of the gradients significantly affected judgments in all cases. Further studies required subjects to move an object in the simulator display and align it with static objects in the display. The results were consistent with the static-scene data. Explanations considered are the Irradiation Phenomenon, as applied to both humans and electronic displays, and luminance-gradient or contrast-gradient effects. In addition to the luminance effects, there were also individual differences in accuracy of judgments related to a measure of spatial ability (the Shapes Analysis Test). The practical implications are discussed.
A comparison and analysis of the perceived and objective color appearance of distant objects in Grand Canyon National Park reveals significant differences between the two, attributable to the effects of perceptual transparency, a phenomenon that has received relatively little attention in the past.
An investigation to quantify the experience of color was performed. It demonstrated that the experience of color can be described objectively, so that predictable visual sensations can be elicited by adjusting the relationships between colors. Central to this investigation is the formulation of a model of color experience that describes color relationships based on the types of interactions between colors. The model adjusts formal compositional attributes such as hue, value, chroma, and their contrasts, as well as size and proportion. The investigation is developed within the context of the computer screen, where graphical elements such as random matrices and text serve as the color stimuli. Relative scaling experiments resulted in guidelines for adjusting the experience of color. The findings of this investigation have application in many areas, including color selection and color reproduction.
In the processes of visual perception and recognition, the human eyes actively select essential information by way of successive fixations at the most informative points of the image. A behavioral program defining a scanpath of the image is formed at the stage of learning (object memorizing) and consists of sequential motor actions, which are shifts of attention from one point of fixation to another, and the sensory signals expected to arrive in response to each shift of attention. In the modern view of the problem, invariant object recognition is provided by the following: (1) separate processing of `what' (object features) and `where' (spatial features) information at high levels of the visual system; (2) mechanisms of visual attention using `where' information; (3) representation of `what' information in an object-based frame of reference (OFR). However, most recent models of vision based on an OFR have demonstrated the ability of invariant recognition only for simple objects like letters or binary objects without background, i.e. objects to which a frame of reference is easily attached. In contrast, we use not an OFR but a feature-based frame of reference (FFR), connected with the basic feature (edge) at the fixation point. This gives our model the ability to represent complex objects in gray-level images invariantly, but demands realization of the behavioral aspects of vision described above. The developed model contains a neural-network subsystem of low-level vision, which extracts a set of primary features (edges) in each fixation, and a high-level subsystem consisting of `what' (Sensory Memory) and `where' (Motor Memory) modules. The resolution of primary feature extraction decreases with distance from the point of fixation. The FFR provides both the invariant representation of object features in Sensory Memory and the shifts of attention in Motor Memory.
Object recognition consists of successively recalling (from Motor Memory) and executing shifts of attention, and successively verifying the expected sets of features (stored in Sensory Memory). The model demonstrates recognition of complex objects (such as faces) in gray-level images, invariant with respect to shift, rotation, and scale.
The great challenge in analyzing multiparameter images is to detect and analyze the patterns formed by the parameters in combination, and this requires that the images be in close registration. If they are misaligned non-linearly, as is often the case, particularly in medical diagnostic imaging, registration is a very difficult problem. In this paper we discuss various approaches to the assessment and correction of misregistrations among multiparameter images. We propose a technique that assesses misregistrations using correlation computation and visualizes them using color and arrow displays. We demonstrate the technique with a medical case of CT and MR images; our technique reveals misregistrations between images that were previously assumed to be in registration. The proposed technique has great potential for driving improved registration procedures, particularly in finding and correcting nonlinear effects.
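A correlation-based misregistration assessment of the kind described can be sketched as follows; this is our own minimal version with made-up patch and search sizes, not the authors' implementation. For each patch of one image it searches a small window of shifts in the other image and reports the shift with the highest normalized correlation; the arrow display would then draw one such shift vector per patch.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def best_shift(img_a, img_b, y, x, size=16, search=4):
    """Shift (dy, dx) of img_b's content that best matches the
    size-by-size patch of img_a at (y, x), by exhaustive search."""
    patch = img_a[y:y + size, x:x + size]
    best, best_dyx = -2.0, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = img_b[y + dy:y + dy + size, x + dx:x + dx + size]
            score = ncc(patch, cand)
            if score > best:
                best, best_dyx = score, (dy, dx)
    return best_dyx

# Synthetic check: img_b is img_a displaced by (2, -1).
rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = np.roll(img_a, shift=(2, -1), axis=(0, 1))
print(best_shift(img_a, img_b, 24, 24))   # (2, -1)
```

Repeating this over a grid of patches yields the per-location shift field whose magnitudes can be color-coded and whose directions drawn as arrows, in the spirit of the displays described above.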
This paper addresses the effective display of noisy vector-valued image data. A new game-theoretic model of a human observer in a visualization task is developed and used to derive optimal algorithms for the display of a maximally informative fused monochrome image.
A brief description of the parallel coordinate system is provided, followed by an application to exploratory data analysis of certain multivariate data. The calculation of the minimum (L2) distance between trajectories (timed paths) is considered next, and close bounds via the L1 distance are obtained, a simplification of importance in Air Traffic Control. Line neighborhoods are defined as the totality of lines satisfying certain parameters. The point region defined by the set of points belonging to these sets of lines in general leads to ambiguities as to which lines actually belong to the line neighborhood. This is of importance in applications involving line detection. Eickemeyer's representation of p-flats in N-space is applied to the representation of polytopes. It is shown that this representation permits analysis and display of the convexity or non-convexity of the polytope being represented. Finally, the representation of surfaces in three dimensions is exhibited. In general such surfaces are represented by two regions, together with a point-by-point association of points in the two regions. For developable surfaces the regions are replaced by curves, the point-by-point association being of a simple nature.
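The basic parallel-coordinate construction on which these results build can be sketched in a few lines: a point in N-space becomes a polyline crossing N parallel vertical axes, with the i-th vertex at height x_i on axis i. The unit axis spacing below is our choice for the sketch.

```python
import numpy as np

def to_polyline(point, axis_spacing=1.0):
    """Map a point in N-space to the (x, y) vertices of its
    parallel-coordinate polyline: vertex i sits on axis i at height x_i."""
    point = np.asarray(point, dtype=float)
    xs = axis_spacing * np.arange(point.size)   # axis positions 0, 1, ...
    return xs, point

xs, ys = to_polyline([0.2, 0.9, 0.5, 0.7])
print(list(zip(xs, ys)))   # one vertex per axis, heights are coordinates
```

Higher-dimensional objects (trajectories, p-flats, polytopes, surfaces) are then represented by families or envelopes of such polylines, which is where the constructions summarized above take over.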
In this paper, we propose a visualization architecture which explicitly incorporates guidance based on principles of human perception, cognition, and color theory. These principles are incorporated in rules which the user can select during the visualization process. Depending on the higher-level characteristics of the data, the rule constrains the way in which the data are mapped onto visual dimensions. The purpose of these rules is to help ensure that structures in the data are faithfully represented in the image, and that perceptual artifacts are not erroneously interpreted as data features.
The scanpath theory suggests that a top-down internal cognitive model of what we `see' controls not only our vision, but also drives the sequences of rapid eye movements and fixations, or glances, that so efficiently travel over a scene or picture of interest.
The NASA Ames Virtual Planetary Exploration (VPE) Testbed is developing methods for visualizing large planetary terrains in an interactive, immersive virtual environment system using a head-mounted display. Our data is the surface of Mars, modeled with a polygon mesh that typically contains 10^5 or more polygons. The goal of our work is to present terrain views with both high detail and frame update rates of 10 Hz or greater. We do this with extended level of detail (LOD) management. In VPE we include three LOD criteria: (1) distance from the viewpoint, (2) distance from the center of the field of view, and (3) a metric based upon user-defined regions of interest. Motivations for these are: (1) all objects, independent of position, need only be displayed at a minimum visually perceptible resolution, (2) interest is focused on the center of the field in a head-directed display, and (3) a feature's level of detail should relate to its importance to the application task. Our method uses analysis functions for each criterion that compute normalized scale factors. Factors are combined with user-specified weights. At every frame update each region of the scene is analyzed, and its resulting scale factor determines which model to render. Parameters for each criterion may be interactively set by the user or automatically set by the system to meet performance criteria (e.g., frame update rate).
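The weighted combination of per-criterion scale factors can be sketched as below; the analysis-function shapes, weights, and thresholds are illustrative assumptions, not the VPE system's actual parameters.

```python
# Sketch of the weighted LOD-selection scheme: each criterion yields a
# normalized factor in [0, 1], the factors are combined with user weights,
# and the combined scale factor picks which terrain model to render.
def lod_scale(distance, angle_from_center, roi_importance,
              weights=(0.5, 0.3, 0.2)):
    """Combine normalized per-criterion factors into one scale factor."""
    f_dist = min(1.0, 10.0 / max(distance, 1e-6))      # far -> smaller
    f_view = max(0.0, 1.0 - angle_from_center / 90.0)  # off-center -> smaller
    f_roi = min(1.0, max(0.0, roi_importance))         # user-defined interest
    w1, w2, w3 = weights
    return w1 * f_dist + w2 * f_view + w3 * f_roi

def pick_model(scale, thresholds=(0.75, 0.4)):
    """Map a combined scale factor to one of three terrain models."""
    if scale >= thresholds[0]:
        return "high-detail"
    if scale >= thresholds[1]:
        return "medium"
    return "low-detail"

# A nearby region at screen center inside a region of interest:
s = lod_scale(distance=5.0, angle_from_center=10.0, roi_importance=1.0)
print(pick_model(s))   # high-detail
```

Running this per region at each frame update, with weights or thresholds adjusted by the system, mirrors the adaptive behavior described above for meeting a target frame rate.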