Electro-optical displays challenge color appearance systems based on the study of surface colors because these displays provide complex arrays of additive color. CRTs already enable us to present some colors beyond the gamut of surface colors at both high and low lightnesses. Laser-based devices will carry this potential much farther, resulting in dark colors as well as light ones with a colorfulness that in some cases lies even beyond the maximum theoretically achievable with illuminated objects. The work of Evans, who regarded surface colors as a special case of aperture colors, deserves renewed attention for its applicability to additive color displays. Users of self-luminous displays need to be aware that brightness is not adequately measured by photopic light meters and that the lightness and chroma of display elements will be affected by their context, including not only the near background but also the far surround.
A computer model of human spatiochromatic vision, based on the scheme proposed by De Valois and De Valois, has been developed. The implementation of the model enables true-color 2-D images to be processed. The input consists of cone signals at each pixel. Subsequent levels of the model are represented by arrays of activity corresponding to the equivalent neural activity. The implementation allows the behavior of different stages of the model -- retinal and cortical -- to be studied with different varieties of spatial and chromatic stimuli of any complexity. In addition, the model is extensible, allowing different types of neural mechanisms and cortical demultiplexing processes to be incorporated. As well as providing qualitative insight into the operation of the different stages of the model, the implementation also permits quantitative predictions to be made. Both increment-threshold and hue-naming results are predicted by the model, but the accuracy of these predictions is contingent upon an appropriate choice of adaptation state at the retinal cone and ganglion cell level.
We investigated sustained hue and brightness perception in a uniformly illuminated sphere (Ganzfeld) covering the entire visual field. Under these conditions the perceptions of both brightness and hue do not remain constant, but fade with time due to local adaptation (Troxler effect). Our aim was to quantify the magnitude and time course of hue and brightness fading in physical units. In a first experiment, subjects used magnitude estimation to rate perceived hue and brightness of the sphere when illuminated by constant amounts of red (674 nm), yellow-green (547 nm), or blue (431 nm) light. Within 2 - 7 minutes the perception of hue was found to become desaturated and replaced by a sensation of gray before brightness fading leveled off as well. This final plateau was reached after 5.5 - 7.5 min, depending on wavelength. It was higher for short-wavelength light and lower for long-wavelength light. However, in each case, it was above the intrinsic dark light or Eigengrau. In a second experiment, Ganzfeld luminance was logarithmically reduced with time to obtain correlated brightness estimates in the absence of fading. When the results from experiments 1 and 2 were combined in a Crawford-like transformation, the total perceived brightness loss was found to be equivalent to a luminance decrease of about 1.5 log units and 1.3 log units for Ganzfeld luminances of 0.1 cd/m2 and 1.0 cd/m2, respectively, and for all wavelengths tested. In comparison, the time at which the hue disappeared corresponded to a decrease of luminance ranging from 0.4 to 1.2 log units depending on wavelength.
A new class of color image enhancement algorithms rests on the observation that the saturation component of color images often contains what appears to be valid image structure depicting the underlying scene. In this work we present the findings of a study of the structural correspondence between the saturation and luminance components of a large database of color images. Various statistical relationships are identified. The correspondence of edges at different scales, in the sense of Marr's theory of vision, is also observed. Several new color image enhancement algorithms that exploit these characteristics are described.
Human visual comfort with colored images of natural scenes presented on a CRT display was investigated by two psychophysical methods. The first was to measure the number of colors contained in an image using the categorical color-naming technique and to correlate it with the subjective estimation of visual comfort. The second was to find an optimum percent chroma for a whole image by continuously adjusting the chroma value of all the pixels in the image from zero (achromatic image) to 100% (original chroma), and likewise to correlate it with the comfort estimates. Experimental results show that both variables correlate strongly with the subjective estimation of comfort: there is a negative correlation between comfort and the number of categorical colors (the more colors an image contains, the less comfortable it is felt to be), and a positive correlation between comfort and the optimum percent chroma (patterns judged less comfortable tend to be preferred at reduced chroma, while comfortable patterns are preferred at their original chroma). These findings suggest that visual comfort can be evaluated from the number of categorical colors in an image and the relative amount of chroma of the whole image.
The systematic color vision model is a comprehensive model based on both the zone theory and the retinex theory. This model merges these two theories by a hypothesis of systematic negative feedback control (SNFC) for the human visual system. The SNFC includes two loops: an absolute negative feedback loop which controls the absolute light sensitivities of the cone photoreceptors for achromatic adaptation, and a relative negative feedback loop which controls the relative light sensitivities for chromatic adaptation. Under SNFC the three types of cone photoreceptors have independent light sensitivities. They function as the retinexes in the retinex theory. The color vision signals are processed zone by zone as assumed by the zone theory. This model also provides a color calculation algorithm and a visual processing framework for the first two zones. This algorithm is based on the von Kries coefficient law and the spectral sensitivities of the three types of cone photoreceptors. Since this algorithm omits the transformation from RGB to XYZ, color modeling for an electronic imaging system becomes easy and accurate. Potential errors of color calculation caused by using the CIE color-matching functions, such as abnormal hue-angle change, can also be avoided.
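The von Kries coefficient law on which the algorithm is based can be sketched directly in cone space. The cone white points and stimulus values below are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Hypothetical cone excitations (L, M, S) of the scene white point
# under the test illuminant and under the reference illuminant.
cone_white_test = np.array([0.9, 1.1, 1.4])
cone_white_ref = np.array([1.0, 1.0, 1.0])

def von_kries_adapt(cone_signal, white_test, white_ref):
    """Scale each cone channel independently so the test white maps to the
    reference white -- the von Kries coefficient law."""
    gains = white_ref / white_test
    return cone_signal * gains

stimulus = np.array([0.45, 0.55, 0.70])   # arbitrary cone excitation
adapted = von_kries_adapt(stimulus, cone_white_test, cone_white_ref)
```

Because the scaling acts per cone channel, no intermediate RGB-to-XYZ transformation is needed, which is the simplification the abstract highlights.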
An exploration of emotion in color communication is presented in this paper. It begins with an outline of a proposed theory of emotion and a hypothesis of how color may induce emotion. A discussion follows that details what is essential in a color message to predict emotional responses. Experiments are described that might assist in validating the theory put forth in this paper.
Receptive field profiles of simple cells in the visual cortex have been shown to resemble even-symmetric or odd-symmetric Gabor filters. Computational models employed in the analysis of textures have been motivated by two-dimensional Gabor functions arranged in a multi-channel architecture. More recently, wavelets have emerged as a powerful tool for non-stationary signal analysis capable of encoding scale-space information efficiently. A multi-resolution implementation in the form of a dyadic decomposition of the signal of interest has been popularized by many researchers. In this paper, a Gabor wavelet configured in a 'rosette' fashion is used as a multi-channel filter-bank feature extractor for texture classification. The 'rosette' spans 360 degrees of orientation and covers frequencies from dc upward. In the proposed algorithm, the texture images are decomposed by the Gabor wavelet configuration and the feature vectors, corresponding to the means of the outputs of the multi-channel filters, are extracted. A minimum distance classifier is used in the classification procedure. As a comparison, the Gabor filter has been used to classify the same texture images from the Brodatz album, and the results indicate the superior discriminatory characteristics of the Gabor wavelet. With the test images used it can be concluded that the Gabor wavelet model is a better approximation of the cortical cell receptive field profiles.
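The filter-bank-plus-minimum-distance pipeline can be sketched as follows. The kernel parameters, image sizes, and the two synthetic stripe textures are illustrative assumptions, not the paper's 'rosette' configuration or the Brodatz data:

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Even-symmetric 2-D Gabor kernel at the given frequency and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

def gabor_features(img, kernels):
    """Feature vector: mean absolute output of each channel (FFT convolution)."""
    feats = []
    for k in kernels:
        kp = np.zeros_like(img, dtype=float)
        kp[:k.shape[0], :k.shape[1]] = k
        resp = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kp)).real
        feats.append(np.mean(np.abs(resp)))
    return np.array(feats)

# A small bank spanning several orientations at one frequency
kernels = [gabor_kernel(0.2, t) for t in np.linspace(0, np.pi, 4, endpoint=False)]

# Two synthetic textures standing in for texture classes
x = np.arange(64)
vert = np.tile(np.sin(2 * np.pi * 0.2 * x), (64, 1))   # vertical stripes
horz = vert.T                                          # horizontal stripes

train = {"vertical": gabor_features(vert, kernels),
         "horizontal": gabor_features(horz, kernels)}

def classify(img):
    """Minimum distance classifier over the channel-mean feature vectors."""
    f = gabor_features(img, kernels)
    return min(train, key=lambda c: np.linalg.norm(f - train[c]))

rng = np.random.default_rng(0)
noisy_vert = vert + 0.1 * rng.standard_normal(vert.shape)
```

The oriented channel matched to each texture dominates its feature vector, so nearest-prototype matching suffices even with added noise.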
We propose a new texture synthesis-by-analysis method inspired by current models of biological early vision and based on a multiscale Gabor scheme. The analysis stage starts with a log-polar sampling of the estimated power spectral density of the texture by a set of 4 by 4 Gabor filters, plus a low-pass residual (LPR). Then, for each channel, we compute its energy and its two (X, Y) bandwidths. The LPR is coded by five parameters. In addition, the density function of the original texture is also estimated and compressed to sixteen values. Therefore, a texture is coded by only 69 parameters. The synthesis method consists of generating a set of 4 by 4 synthetic channels (Gabor-filtered noise signals). Their energies and bandwidths are corrected to match the original features. These bandpass-filtered noise signals are mixed into a single image. Finally, the histogram and LPR frequencies of the resulting texture are modified to fit the original values. We have obtained very satisfactory results both with highly random textures and with some quasi-periodic textures. Compared to previous methods, ours has other important advantages: high robustness (stable, non-iterative, and fully automatic), high compactness of the coding, and computational efficiency.
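The per-channel energy correction at the heart of the synthesis stage can be sketched in a few lines. The 1-D signals here are stand-ins for one Gabor-filtered channel, not the paper's full 4 x 4 bank:

```python
import numpy as np

rng = np.random.default_rng(1)

def channel_energy(signal):
    """Total energy of one band-pass channel."""
    return np.sum(signal**2)

# "Analysis": record the energy of a channel of the original texture
# (a random signal stands in for a Gabor-filtered texture channel here).
original_channel = rng.standard_normal(256)
target_energy = channel_energy(original_channel)

# "Synthesis": generate fresh noise for the same channel, then rescale it
# so its energy matches the analyzed value.
synthetic = rng.standard_normal(256)
synthetic *= np.sqrt(target_energy / channel_energy(synthetic))
```

In the full method this correction is applied to every synthetic channel (along with bandwidth and histogram matching) before the channels are mixed into one image.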
Texture analysis plays an important role in automatic image segmentation and object recognition. Objects and regions in an image can be distinguished by their texture, where the distinction arises from the different physical surface properties of the objects represented. To a human observer the different textures in an image are usually very apparent, but the verbal description of the visual properties of these patterns is a difficult and ambiguous task. In computer vision, theoretical and experimental comparisons of different methods have shown that the co-occurrence matrix is well suited to texture analysis. Therefore, in this approach the co-occurrence matrix is used as a mathematical model for natural textures. We propose a promising improvement for texture classification and description in the context of natural textures. After developing a new abstract language for describing visual properties of natural textures, we establish a relation between the visual properties used by a human observer and statistical textural features computed from the digital image data. Our experiments indicate that some statistical features are more significant for classifying natural textures than others. Finally, we apply our new approach to landscape scenes: we show how the new language is used for defining texture classes.
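A gray-level co-occurrence matrix, together with two of the statistical features commonly derived from it (contrast and energy), can be sketched as follows; the two toy textures are assumptions for illustration:

```python
import numpy as np

def cooccurrence(img, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix for displacement (dx, dy),
    normalized to a joint probability distribution."""
    C = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            C[img[y, x], img[y + dy, x + dx]] += 1
    return C / C.sum()

def contrast(C):
    """Weights co-occurring pairs by the squared gray-level difference."""
    i, j = np.indices(C.shape)
    return np.sum(C * (i - j)**2)

def energy(C):
    """Sum of squared matrix entries; maximal for a uniform patch."""
    return np.sum(C**2)

# A perfectly uniform patch: all co-occurring pairs are identical.
flat = np.zeros((8, 8), dtype=int)
# A fine checkerboard: horizontally adjacent pixels always differ.
check = np.indices((8, 8)).sum(axis=0) % 2

C_flat = cooccurrence(flat)
C_check = cooccurrence(check)
```

Statistical features of this kind are the candidates the paper correlates with its verbal texture descriptors.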
A number of recent efforts have been made to account for the response properties of the cells in the visual pathway by considering the statistical structure of the natural environment. It has previously been proposed that the wavelet-like properties of cells in primary visual cortex provide an efficient representation of the structure in natural scenes captured by the phase spectrum. In this paper, we take a closer look at the amplitude spectra of natural scenes and their role in understanding visual coding. We propose that one of the principal insights to be gained from the amplitude spectra is an understanding of the relative sensitivity of cells tuned to different frequencies. It is suggested that the response magnitude of cells tuned to different frequencies increases with frequency out to about 20 cycles/deg. The result is a code in which the response to natural scenes with a 1/f falloff is approximately flat out to 20 cycles/deg. The variability in the amplitude spectra of natural scenes is also investigated. Using a measure called the 'thresholded contrast spectrum' (TCS), it is demonstrated that a good proportion of the variability in the spectra is due to the relative sparseness of structure at different frequencies. The slope of the TCS was found to provide a reasonable prediction of blur across a variety of scenes in spite of the variability in their amplitude spectra.
In recent years a number of researchers have found scaling power spectra in natural images (i.e., the spectrum takes the form of a power-law). This result is surprisingly robust given the variety in each team's choice of image calibration and subject matter. We propose that the salient universal structure present in natural images is their composition of independent occluding objects. In such a world the correlation function, and thus the power spectrum, is generated by two underlying causes: the distribution of object-to-object transitions, and the correlations present within objects. We show that a power-law distribution of apparent object sizes combined with strongly correlated intra-object structure gives rise to the ubiquitous power-law spectrum in natural scenes. By generating images from occluding square objects we can show definitively that it is not the 1/k^2 spectrum of individual edges but rather the distribution of object sizes which causes the scaling in natural images. We demonstrate also that recent measurements of spatio-temporal natural image spectra can be reproduced by such a segmentation of images into independent moving objects.
An algorithm is described which allows for the learning of sparse, overcomplete image representations. Images are modeled as a linear superposition of basis functions, and a set of basis functions is sought which maximizes the sparseness of the representation (fewest number of active units per image). When applied to natural scenes, the basis functions converge to localized, oriented, bandpass functions that bear a strong resemblance to the receptive fields of neurons in the primate striate cortex. Importantly, the code can be made overcomplete, which allows for an increased degree of sparseness in which the basis functions can become more specialized. The learned basis functions constitute an efficient representation of natural images because sparseness forces a form of reduced entropy representation that minimizes statistical dependencies among outputs.
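The sparse inference step of such a model can be illustrated with a fixed, random overcomplete dictionary and iterative shrinkage-thresholding (a standard stand-in for the gradient-based inference the abstract describes; the learned, localized basis functions are not reproduced here). All sizes and the sparsity weight are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overcomplete dictionary: 8-pixel "images", 16 basis functions.
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0)            # unit-norm basis functions

def ista(image, D, lam=0.05, steps=500):
    """Infer coefficients a minimizing |image - D a|^2 + lam * |a|_1
    by iterative shrinkage-thresholding: few units stay active."""
    L = np.linalg.norm(D, 2)**2           # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ a - image)
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

# An image built from 2 active basis functions: inference should stay sparse.
truth = np.zeros(16)
truth[3], truth[11] = 1.0, -0.8
img = D @ truth
a = ista(img, D)
```

In the full algorithm the dictionary itself is then updated to increase sparseness further, which is where the localized, oriented, bandpass functions emerge.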
A major problem a visual system faces is how to fit the large intensity variation of natural image streams into the limited dynamic range of its neurons. One means of accomplishing this is fast light adaptation of the photoreceptors. To investigate this, we measured, first, time series of natural intensities and, second, the responses of fly photoreceptors to these time series. Time series representative of what each photoreceptor of a real visual system would normally receive were measured with an optical system recording the light intensity of a spot comparable to the field of view of a single human foveal cone. This system was worn on a head-band by a freely walking person. The resulting time series have a high rms contrast, on the order of 1, and power spectra behaving approximately as 1/f (f: temporal frequency). The measured time series were subsequently presented to fly photoreceptors by playing them back on an LED. The results show that fast light adaptation indeed keeps the response within the dynamic range of the cells and that a large part of this range is actually needed for packing the information in natural time series.
Color perception depends profoundly on adaptation processes that adjust sensitivity in response to the prevailing pattern of stimulation. We examined how color sensitivity and appearance might be influenced by adaptation to color distributions that are characteristic of natural images. Color distributions were measured for natural scenes by successively recording each scene with a digital camera through 31 interference filters, or by sampling an array of locations within each scene with a spectroradiometer. The images were used to reconstruct the L, M, and S cone excitation at each spatial location, and the contrasts along three post-receptoral axes [L+M, L-M, or S-(L+M)]. Chromatic contrasts varied principally along a bluish-yellowish axis along which L-M and S-(L+M) signals were highly correlated, with weaker correlations between luminance and chromaticity. We use a two-stage model (von Kries scaling followed by decorrelation) to show how adaptation might influence color appearance by selectively reducing sensitivity to the principal axes of the color distributions, and compare these predictions to empirical measurements of asymmetric color matches obtained after adaptation to successive random samples drawn from natural color distributions.
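The two-stage scheme can be sketched numerically: a von Kries-style gain stage equalizes each channel's spread, and a second stage rotates to the principal axes of the adapting distribution and equalizes variance along them. The synthetic correlated color distribution below is an assumption standing in for measured natural-scene statistics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "natural" color distribution: two post-receptoral channels
# (e.g. L-M and S-(L+M)) strongly correlated along one oblique axis.
n = 2000
axis = np.array([1.0, 1.0]) / np.sqrt(2)
t = rng.standard_normal(n)
colors = np.outer(t, axis) + 0.1 * rng.standard_normal((n, 2))

# Stage 1: von Kries-style gain control -- each channel scaled by its own spread.
stage1 = colors / colors.std(axis=0)

# Stage 2: decorrelation -- rotate to the principal axes of the adapting
# distribution and equalize the variance along them.
cov = np.cov(stage1.T)
evals, evecs = np.linalg.eigh(cov)
stage2 = (stage1 @ evecs) / np.sqrt(evals)
```

Stage 1 alone leaves the oblique (bluish-yellowish) correlation intact; only the second, decorrelating stage removes it, which is why both stages are needed to model the measured appearance changes.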
The psychophysical task of discriminating small changes in the slopes of the amplitude spectra of complex images (such as digitized photographs of natural scenes) has been used to examine whether the human visual system is optimized for coding the information in natural images. The discrimination thresholds are highest when the test stimuli have amplitude spectra similar in form to those of truly natural images, and are lower when the spectra are steeper or shallower than 'normal.' The magnitudes of the thresholds differ markedly between stimuli derived from different photographs. We describe a model that explains the variety of threshold magnitudes; we suppose that the observer is detecting small changes in image contrast estimated within limited spatial-frequency bands of about 1 octave bandwidth. At threshold, the contrast change in only one frequency band will generally match the observer's JND for simple sinusoidal gratings. The success of this band-limited contrast model is shown further in experiments where the slopes of the amplitude spectra of the stimuli are changed within restricted frequency bands. If the slope is changed only within the limited frequency band implicated by the contrast model, the observer's thresholds are unchanged, but they are elevated if the slope changes are made only outside of the implicated band.
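The basic stimulus manipulation, changing the slope of an image's amplitude spectrum while leaving the phase spectrum intact, can be sketched as below. The image here is random noise standing in for a calibrated photograph:

```python
import numpy as np

def change_slope(image, delta):
    """Multiply the amplitude spectrum by f**delta, leaving phase intact.
    delta > 0 makes the spectrum shallower; delta < 0 makes it steeper."""
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    f = np.hypot(fy, fx)                  # radial spatial frequency
    f[0, 0] = 1.0                         # leave the DC (mean) term unscaled
    return np.fft.ifft2(F * f**delta).real

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
shallower = change_slope(img, 1.0)        # e.g. whitens a 1/f-like spectrum
```

Restricting `f**delta` to a band of radial frequencies would give the band-limited slope changes used in the final experiments.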
The local contrast in an image may be approximated by the contrast of a Gabor patch of varying phase and bandwidth. In a search for a metric for such local contrast, perceived (apparent) contrast, as indicated by matching of such patterns, was compared here to the physical contrast calculated by a number of methods. The 2 cycles/deg, 1-octave Gabor patch stimuli of different phases were presented side by side, separated by 4 degrees. During each session the subjects (n = 5) were adapted to the average luminance, and four different contrast levels (0.1, 0.3, 0.6, and 0.8) were randomly interleaved. The task was repeated at four mean luminance levels between 0.75 and 37.5 cd/m2. The subject's task was to indicate which of the two patterns was lower in contrast. Equal apparent contrast was determined by fitting a psychometric function to the data from 40 to 70 presentations. There was no effect of mean luminance on the subjects' settings. The matching results rejected the hypothesis that either the Michelson formula or the King-Smith & Kulikowski contrast (C_KK = (Lmax - Laverage)/Laverage) was used by the subjects to set the match. The use of the Nominal contrast (the Michelson contrast of the underlying sinusoid) as an estimate of apparent contrast could not be rejected. In a second experiment the apparent contrast of a 1-octave Gabor patch was matched to the apparent contrast of a 2-octave Gabor patch (of Nominal contrast 0.1, 0.3, 0.6, or 0.8) using the method of adjustment. The result of this experiment rejected the prediction of the Nominal contrast definition. The local band-limited contrast measure (Peli, 1990), when used with the modifications suggested by Lubin (1995), as an estimate of apparent contrast could not be rejected by the results of either experiment. These results suggest that a computational contrast measure based on multiscale bandpass filtering is a better estimate of apparent perceived contrast than any of the other measures tested.
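Two of the candidate metrics, and the distinction between a Gabor patch and its underlying sinusoid, can be made concrete in a few lines. The luminance and contrast values are arbitrary assumptions, not the experimental settings:

```python
import numpy as np

def michelson(L):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (L.max() - L.min()) / (L.max() + L.min())

def king_smith_kulikowski(L):
    """King-Smith & Kulikowski contrast: (Lmax - Lmean) / Lmean."""
    return (L.max() - L.mean()) / L.mean()

# A 1-D Gabor patch: carrier of Nominal contrast C inside a Gaussian
# envelope, riding on mean luminance L0 (values chosen for illustration).
x = np.linspace(-3, 3, 1201)
L0, C = 10.0, 0.5
carrier = np.cos(2 * np.pi * 2 * x)
gabor = L0 * (1 + C * np.exp(-x**2 / 2) * carrier)
grating = L0 * (1 + C * carrier)          # the underlying unwindowed sinusoid
```

For the unwindowed grating the Michelson contrast equals the Nominal contrast C, but the Gaussian envelope raises the troughs of the Gabor patch, so its Michelson contrast falls below C while its King-Smith & Kulikowski contrast stays near C, which is exactly why the metrics make different predictions for these stimuli.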
Orientation-tuned spatial filters in visual cortex are widely held to act as 'orientation detectors,' but our experiments on the perception of stationary 2-D plaids require a new view. When two sinusoidal gratings at different orientations (say 1 c/deg, plus or minus 45 deg from vertical) are superimposed to form a standard plaid they do not in general look like two sets of oblique contours (diamonds) but more like a blurred checkerboard (squares) with vertical and horizontal edges, although the Fourier components are oblique. The pattern of edges seen in this plaid and others corresponds to the zero-crossings (ZCs) in the output of a circular filter, but adaptation and masking experiments suggest that oriented filters are being summed to emulate circular filtering, before ZC analysis. At low contrasts or after adaptation to an intermediate orientation, the combination can fail or be 'broken,' and the diamond structure of the components is seen instead. Adding a low contrast 3rd harmonic to one or both components in square-wave phase also changed the plaid's appearance from squares to diamonds, but adapting to the 3rd harmonic enhanced the square appearance. Filters can evidently switch from combining across orientation to combining across spatial frequency, perhaps reflecting a preference for sharp edges. The combination stage of edge detection may involve variably weighted summing of oriented filters in monocular pathways, and the idea of 'orientation detectors' is no longer useful for understanding the perceived spatial structure of these 2-D plaid images.
Observers viewed a simulated airport runway landing scene with an obstructing aircraft on the runway and rated the visibility of the obstructing object in varying levels of white fixed-pattern noise. The effect of the noise was compared with the predictions of single and multiple channel discrimination models. Without a contrast masking correction, both models predict almost no effect of the fixed-pattern noise. A global contrast masking correction improves both models' predictions, but the predictions are best when the masking correction is based only on the noise contrast (does not include the background image contrast).
Visual performance models have, in the past, typically been empirical, relying on the user to supply numerical values such as target contrast and background luminance to describe the performance of the visual system when undertaking a specified task. However, it is becoming increasingly easy to obtain computer images using, for example, digital cameras, scanners, imaging photometers, and radiometers. We have therefore been examining the possibility of producing a quantitative model of human vision that is capable of directly processing images in order to provide predictions of performance. We are particularly interested in being able to process images of 'real' scenes. The model is inspired by human vision and its components have analogies with parts of the human visual system, but their properties are governed primarily by existing psychophysical data. The first stage of the model generates a multiscale, difference-of-Gaussians (DoG) representation of the image (Burton, Haig and Moorhead), with a central foveal region of high resolution, and with a resolution that declines with eccentricity as the scale of the filter increases. Incorporated into this stage is a gain control process which ensures that the contrast sensitivity is consistent with the psychophysical data of van Nes and Bouman. The second stage incorporates a model of perceived contrast proposed by Cannon and Fullenkamp. Their model assumes the image is analyzed by oriented (Gabor) filters and produces a representation of the image in terms of perceived contrast.
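The first stage, a multiscale DoG decomposition, can be sketched as below. The kernel sizes and scale set are illustrative, and the foveal/eccentricity variation and gain control described in the abstract are omitted:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflecting boundaries."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()                          # unit-sum kernel: preserves the mean
    pad = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, rows)

def dog_pyramid(img, sigmas=(1.0, 2.0, 4.0)):
    """Band-pass representation: difference of Gaussians at each scale."""
    return [gaussian_blur(img, s) - gaussian_blur(img, 2 * s) for s in sigmas]

rng = np.random.default_rng(0)
scene = rng.standard_normal((32, 32))     # stand-in for a captured scene image
bands = dog_pyramid(scene)
```

Each band responds only to structure near its scale; a uniform field produces zero output at every scale, as a band-pass front end should.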
Contrast thresholds for sine-wave gratings are increased when the gratings are compressively sampled into a set of narrow bright bars on a dark background, even though the mean luminance and contrast of the grating are unchanged by this sampling method. Burr, Ross and Morrone, who first demonstrated this phenomenon, suggested this was due to local luminance adaptation to the sample bars, whose average peak luminance necessarily increases with the degree of compressive sampling. However, the results could also be explained on the basis of either a luminance compressive non-linearity or a local contrast-based non-linearity. Previously we reported results with decrement compressively sampled gratings (CSGs), which consist of dark sample bars on a bright background, which favored the local luminance adaptation hypothesis. However, here we show that this hypothesis is untenable. Using increment CSGs (bright bars on a dark background) we found that raising background luminance while holding average peak sample-bar luminance constant reduced thresholds by as much as a factor of ten. This demonstrates that it is the contrast of the bars, rather than their peak luminance, which is the important feature determining thresholds, at least with increment CSGs. We also provide evidence for the involvement of a gain control mechanism which serves to partially reduce the deleterious effects of the contrast-based non-linearity on CSG thresholds. Finally, we show that CSG thresholds can be reduced by the presence of a low-contrast unsampled grating mask. This suggests that although local contrast processing is initially involved in CSG detection, the cortical mechanisms which ultimately detect CSGs are the same as those which detect the unsampled gratings from which they are derived.
An essential task of the human visual system is to detect the presence of objects embedded within spatial backgrounds of the kind found in common visual scenes. Object backgrounds can range from regions of effectively uniform luminance to areas of almost arbitrarily complex spatial structure. Detection on uniform backgrounds classically shows Weber behavior, but the form taken by the threshold function in the general case of spatially variable backgrounds is not known. We propose that a general expression of Weber's law applies, one that accounts for the local contrast contributions of both the background and the feature. The new general law states that threshold is reached when feature contrast exceeds background contrast by an amount equal to a typical Weber constant. To define contrast on a spatially variable background, an adaptable contrast metric is used with variable position and spatial scale terms. It is shown that the detection thresholds of transient (triangle-profile) pulses at any phase on a sinusoidal background follow the general, but not the familiar, expression of Weber's law. The results also demonstrate that for highly localized features local contrast can be computed at scales as fine as 0.6 arcmin. Further implications, including the possible neural locus of the implied adaptation to local contrast, are discussed.
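The general law can be sketched numerically: a feature reaches threshold when its local contrast exceeds the background's local contrast by a Weber constant. The Gaussian window, the scale, and the value W = 0.02 are illustrative assumptions, not the paper's fitted metric:

```python
import numpy as np

W = 0.02                                  # assumed Weber constant

def local_rms_contrast(L, center, scale, x):
    """RMS contrast of a 1-D luminance profile in a Gaussian window."""
    w = np.exp(-(x - center)**2 / (2 * scale**2))
    w /= w.sum()
    mean = np.sum(w * L)
    return np.sqrt(np.sum(w * (L - mean)**2)) / mean

def exceeds_threshold(L_bg, L_feat, center, scale, x):
    """General Weber law: detection once the feature's local contrast
    exceeds the background's local contrast by W."""
    return (local_rms_contrast(L_feat, center, scale, x)
            - local_rms_contrast(L_bg, center, scale, x)) >= W

x = np.linspace(0.0, 1.0, 501)
uniform = np.full_like(x, 100.0)          # uniform background (contrast 0)
triangle = np.maximum(0.0, 1.0 - np.abs(x - 0.5) / 0.02)  # triangle-profile pulse
```

On a uniform background the background term is zero and the rule reduces to classical Weber behavior; on a structured background the same rule demands extra feature contrast.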
User requirements and system cognitive quality are considered in relation to the integration of new technology, in particular for aiding cognitive functions. Intuitive interfaces and display design matching user mental models and memory schema are identified as human-centered design strategies. Situational awareness is considered in terms of schema theory and perceptual control. A new method for measuring cognitive compatibility is described, and linked to the SRK taxonomy of human performance, in order to provide a framework for analyzing and specifying user cognitive requirements.
Humans are endowed with a highly developed visual system, and the technical aids for creating imagery are now available too. Hypermedia systems provide the technology by which concrete and abstract projections of human mental images can be produced easily and naturally by computer. A delivery system and visual interface for visual media, capable of supporting interactivity while at the same time facilitating the interconnectivity of images and symbols, has the potential to become an extremely powerful virtual-board presentation tool. The paper describes an interactive on-screen display system serving as a virtual-board visual interface. Its design enables viewers to orient themselves easily on the screen (virtual board), with full personal interactivity for every member of the group auditorium. We describe a system that improves the interaction among many subjects in front of the visual interface. It is open to use for collaborative scientific and educational visualization.
With the rapid expansion in imaging technologies, retrieval of images from digital collections is a subject of major interest. However, the large amount of research devoted to understanding mechanisms and products of human visual perception has, by and large, not produced information which is directly applicable to the problems of image retrieval, and systems designers still need basic data concerning which image attributes are most typically noticed by humans when viewing images. The goal of this research was to fill this gap in our knowledge, by investigating attributes reported by participants in several describing tasks with pictorial images. Content analysis of word and phrase data revealed forty-two image attributes and ten higher level classes of attributes. Participants most typically describe the perceptual attributes such as the literal object content of images and the human form, as well as the attribute of color. Location appears to be important and needs to be accounted for, as does a group of interpretive attributes labeled CONTENT/STORY. The research results suggest that term variability in the image descriptions is less than previous research might indicate, and communicative constraints operating on visually perceived data may aid in simplifying some of the approaches necessary to accomplish automated indexing. The initial analysis suggests several conceptual frameworks, such as basic level objects, figure-ground, and the use of schemas, as fruitful approaches to image indexing and retrieval.
We introduce a framework for locating facial features that is robust to varying conditions in lighting, scale, position, and orientation. This facial feature extraction algorithm could be useful as a front end for a face recognition system, either to normalize the data or to provide the critical features for classification. We are currently developing a face recognition algorithm which will incorporate the facial feature location algorithm described here. The algorithm is based on a general template which outlines different regions of the face. The template is matched to the particular image location where a set of a priori constraints is best met. The constraints are chosen to be invariant over a wide set of facial characteristics and external conditions. They include ratios of average intensity values, average chrominance values, and average smoothness values. The idea of constructing a general template and a set of a priori constraints could easily be extended to other objects.
DEVise is a data visualization and exploration system capable of handling large data sets using off-the-shelf hardware with minimal memory requirements. Data can be large in volume, complex in structure (multi-dimensional and/or hierarchical), and may be imported from different sources such as database servers, external programs, and World Wide Web resources. Commercial and scientific databases can also be linked to DEVise to allow the user to visualize and analyze related information from heterogeneous sources. Associations between data sources are developed interactively as the user gains more knowledge of the data being explored. To assist in handling large data sets, DEVise allows a user to logically split the data into more manageable units at different levels. The user selects a data source, a data stream within a data source (e.g. a time series), attributes of a stream, and a mapping of attributes to graphical objects. At each step, the selections made by the user reduce the data volume. DEVise takes advantage of this form of 'data compression' to optimize its caching strategies and to minimize the accesses needed to fetch data from tertiary storage, for example. DEVise supports users with different expertise levels by automating most tasks performed by a novice user and by also providing a programming interface that allows new data sources to be defined, new graphical objects to be used, and custom storage policies to be employed.
This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.
We estimated the proportion of people who have defective stereo vision and are unable to utilize stereo disparity information to perceive depth. Previous estimates have ranged from as low as 6% to as high as 30%. Our goal was to understand the basis for this wide range of estimates. To do this, we administered two psychophysical tests to a sample of 100 young adults. Visual stimuli consisted of dynamic random-dot stereograms presented using a fast-decay, time-sequential display device. The stimuli covered a range of disparities between 0 and .38 degrees (both crossed and uncrossed). A forced-choice methodology was used to determine whether subjects could perceive depth based on horizontal disparity. It was found that display duration was a key variable determining the number of viewers classified as stereo-anomalous. The relatively high incidence of stereo-anomalous viewers in previous research is explained by the short display durations (80 ms) used in those studies. With longer durations of about 1 sec, we found that only about 5% of viewers had defective stereo vision.
Eye strain is often experienced when viewing a stereoscopic image pair on a flat display device (e.g., a computer monitor). Two violated relationships contribute to this eye strain: (1) the breakdown of the normal accommodation/convergence linkage and (2) the conflict between interposition and disparity depth cues. We describe a simple algorithm that reduces eye strain through horizontal image translation and corresponding image cropping, based on a statistical description of the estimated disparity within a stereoscopic image pair. The required amount of translation is derived from the given stereoscopic image pair itself, and therefore requires no user intervention. In this paper, we first develop a statistical model of the estimated disparity that incorporates the possibility of erroneous estimates. An estimate of the actual disparity range is obtained by thresholding the disparity histogram to avoid the contribution of false disparity values. Based on the estimated disparity range, the image pair is translated to force all points to lie on, or behind, the screen surface. This algorithm has been applied to diverse real stereoscopic images and sequences. Stereoscopic image pairs that were often characterized as producing eye strain and confusion produced comfortable stereoscopy after the automated translation.
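The histogram-thresholding and translation step described above can be sketched as follows. This is our own minimal illustration, not the paper's code: the function names are hypothetical, and we assume crossed (in-front-of-screen) disparities are positive.

```python
# Hedged sketch: estimate the usable disparity range by discarding
# low-count histogram bins (likely false matches), then compute the
# horizontal shift that places every reliable point on or behind the
# screen plane.

from collections import Counter

def estimate_disparity_range(disparities, count_threshold=5):
    """Ignore histogram bins with too few votes (likely false matches)."""
    hist = Counter(disparities)
    valid = [d for d, n in hist.items() if n >= count_threshold]
    return min(valid), max(valid)

def translation_for_comfort(disparities, count_threshold=5):
    """Shift (in pixels) that maps the nearest reliable point onto the
    screen surface, i.e. makes all reliable disparities non-positive
    (crossed disparity is taken as positive here, by assumption)."""
    _, d_max = estimate_disparity_range(disparities, count_threshold)
    return max(0, d_max)

# Toy usage: pixel disparities from a hypothetical block matcher; the
# two votes at 40 are outliers and are suppressed by the threshold.
sample = [2] * 50 + [5] * 40 + [1] * 30 + [40] * 2
print(translation_for_comfort(sample))   # -> 5, the largest reliable disparity
```

Thresholding the histogram before taking the extremes is what keeps a handful of false matches from forcing an unnecessarily large translation.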
This paper contains a general description of the RaPID (rapid perceptual image description) method and the results of a series of experiments on the attribute sharpness. The method aims at a rapid and perceptually meaningful description and quantification of the primary factors of image quality. The purpose of the project is primarily to identify the main dimensions of image quality, but also to determine how they combine into overall image quality and to develop perceptual models for the main dimensions. The result of the project will be a technique that can be used to evoke, quantify, analyze, and interpret subjective reactions to the characteristics of imaging systems. A preliminary experiment was conducted on sharpness-related attributes of images. The results showed that the five TV sets under study had to be evaluated on 11 different terms, which combine into two independent attribute dimensions for natural or test images; if both types of images are considered simultaneously, the 'sharpness' space consists of at least 3 dimensions. Physical measurements were used to calculate sharpness-related measures (MTFs, step responses, line width), and these were correlated with the subjective results.
Early vision processes, based on human visual system (HVS) performance, provide insufficient information for modeling our assimilation of image sequences (e.g. video). We advance the use of a visual attention paradigm for modeling viewer response over time. An 'importance map' of the scene can be constructed using both spatial and temporal information. The image quality of an individual frame can be degraded significantly by using the importance map to predict typical foci of attention. Knowledge of the whole scene can be built up over many frames by accumulating details represented at low quality in areas identified by the importance map as warranting less visual attention. We conjecture limits on the achievable image quality and provide synthesized examples of scenes coded using this model.
One of the possible models of the human visual system (HVS) in the computer vision literature has a high resolution fovea and a periphery of exponentially decreasing resolution. The high resolution fovea is used to extract the information necessary to solve a vision task, while the periphery may be used to detect motion. To obtain the desired information, eye movements guided by scene content and other knowledge position the fovea over areas of interest; these movements are called saccades and corrective saccades. A two stage process has been implemented as a mechanism for changing foveation in log polar space. Initially, the open loop stage roughly foveates on the best interest feature; the closed loop stage is then invoked to converge iteratively and accurately onto the foveation point. The open loop stage developed for the foveation algorithm is applied to saccadic eye movements and a tracking system. Log polar space is preferred over Cartesian space because: (1) it simultaneously provides high resolution and a wide viewing angle; and (2) feature invariance occurs in the fovea, which simplifies the foveation process.
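The log polar mapping underlying this foveation scheme can be sketched as follows. The ring/wedge counts and radii below are illustrative assumptions, not the authors' parameters.

```python
# Minimal sketch of a log-polar mapping: each Cartesian pixel offset maps
# to (ring, wedge) coordinates, giving high resolution near the fovea and
# coarse resolution in the periphery.

import math

def to_log_polar(x, y, rings=32, wedges=64, r_max=128.0, r_min=1.0):
    """Map a Cartesian offset from the fovea center to (ring, wedge)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    if r < r_min:
        ring = 0                       # inside the uniform foveal disc
    else:
        ring = int(rings * math.log(r / r_min) / math.log(r_max / r_min))
    wedge = int(wedges * theta / (2 * math.pi))
    return min(ring, rings - 1), min(wedge, wedges - 1)

# Pixels near the center land in low rings; distant pixels compress into
# the outer rings, illustrating the wide-field / high-acuity trade-off.
print(to_log_polar(2, 0))     # near the fovea: low ring index
print(to_log_polar(100, 0))   # periphery: high ring index
```

Because rings are spaced logarithmically in radius, a fixed number of rings spans a wide field of view while still sampling the fovea densely, which is exactly the property the abstract exploits.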
We have developed a preliminary version of a foveated imaging system, implemented on a general purpose computer, which greatly reduces the transmission bandwidth of images. The system is based on the fact that the spatial resolution of the human eye is space variant, decreasing with increasing eccentricity from the point of gaze. By taking advantage of this fact, it is possible to create an image that is almost perceptually indistinguishable from a constant resolution image, but requires substantially less information to code it. This is accomplished by degrading the resolution of the image so that it matches the space-variant degradation in the resolution of the human eye. Eye movements are recorded so that the high resolution region of the image can be kept aligned with the high resolution region of the human visual system. This system has demonstrated that significant reductions in bandwidth can be achieved while still maintaining access to high detail at any point in an image. The system has been tested using 256 by 256 8-bit gray scale images with a 20 degree field-of-view and eye-movement update rates of 30 Hz (display refresh was 60 Hz). Users of the system have reported minimal perceptual artifacts at bandwidth reductions of up to 94.7% (a factor of 18.8). Bandwidth reduction factors of over 100 are expected once lossless compression techniques are added to the system.
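The space-variant resolution idea can be illustrated with a simple falloff model. The reciprocal-eccentricity form and the half-resolution constant e2 below are common assumptions in the foveated-imaging literature, not necessarily this system's exact function, and the resulting saving is only a back-of-the-envelope figure.

```python
# Hedged sketch: relative resolution declines with eccentricity roughly as
# e2 / (e2 + e), where e2 is the eccentricity at which resolution halves.
# Integrating the squared resolution fraction over the field gives a rough
# pixel-budget (bandwidth) saving for a foveated image.

def relative_resolution(ecc_deg, e2=2.3):
    """Fraction of foveal resolution needed at a given eccentricity."""
    return e2 / (e2 + ecc_deg)

def bandwidth_saving(field_deg=20.0, e2=2.3, steps=1000):
    """Approximate fraction of pixels saved versus a uniform image,
    integrating radially with annulus-area weighting (resolution acts
    on both image axes, hence the square)."""
    total, uniform = 0.0, 0.0
    for i in range(steps):
        ecc = (i + 0.5) * (field_deg / 2) / steps   # radial midpoint samples
        ring_weight = ecc                           # ~ annulus area
        total += (relative_resolution(ecc, e2) ** 2) * ring_weight
        uniform += ring_weight
    return 1.0 - total / uniform

print(f"approx. bandwidth saved over a 20 deg field: {bandwidth_saving():.1%}")
```

Even this crude model lands in the same regime as the reported 94.7% reduction, which is why matching the eye's falloff is such an effective coding strategy.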
In this paper, we introduce a new image compression and decompression technique for searching a target image based on its gazing area. Many data compression methods have been proposed; in particular, the JPEG compression technique is widely used as a standard. However, this method is not always effective for searching target images in an image filing system. In a previous paper, using eye movement analysis, we found that images have a particular gazing area. Since the gazing area is the most important region of the image, we use this information to compress and transmit the image. A method named fixation-based progressive image transmission is introduced to transmit the image efficiently: after the gazing area is estimated, that area is transmitted first and the other regions follow. If the first transmitted portion is not of interest, the viewer can move on to other images, so the target image can be found in the filing system efficiently. We compare the search time of the proposed method with that of the conventional method; the results show that the proposed method finds the target image faster.
We introduce a new model that can be used in the perceptual optimization of standard color image coding algorithms (JPEG/MPEG). The human visual system model is based on a set of oriented filters and incorporates background luminance dependencies, luminance and chrominance frequency sensitivities, and luminance and chrominance masking effects. The main problem in using oriented filter-based models for the optimization of coding algorithms is the mismatch between the orientation of the filters in the model domain and the DCT block transform used in the coding domain. We propose a general method to combine these domains by calculating a local sensitivity for each DCT (color) block. This leads to a perceptual weighting factor for each DCT coefficient in each block. We show how these weighting factors allow us to use advanced techniques for optimal bit allocation in JPEG (e.g. custom quantization matrix design and adaptive thresholding). With the proposed model it is possible to calculate a perceptually weighted mean squared error (WMSE) directly in the DCT color domain, even though the model itself is based on a directional frequency band decomposition.
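The weighted error measure can be sketched per 8x8 block as follows. This is our own illustration of the general form of a WMSE; the weights here are placeholders for the model-derived sensitivities, which the paper computes from the oriented-filter model.

```python
# Hedged sketch of a perceptually weighted MSE over one 8x8 DCT block:
# each coefficient error is scaled by a per-coefficient visual sensitivity
# weight before averaging.

def weighted_mse(orig_coeffs, coded_coeffs, weights):
    """WMSE over one 8x8 DCT block; all inputs are 8x8 nested lists."""
    total = 0.0
    for u in range(8):
        for v in range(8):
            err = orig_coeffs[u][v] - coded_coeffs[u][v]
            total += weights[u][v] * err * err
    return total / 64.0

# Toy example: with uniform weights the WMSE reduces to the ordinary MSE.
orig = [[float(u + v) for v in range(8)] for u in range(8)]
coded = [[orig[u][v] + 1.0 for v in range(8)] for u in range(8)]
flat = [[1.0] * 8 for _ in range(8)]
print(weighted_mse(orig, coded, flat))   # every error is 1 -> 1.0
```

The practical appeal is that the weights fold the perceptual model into a quadratic distortion measure, so standard rate-distortion machinery for bit allocation applies unchanged.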
The discrete wavelet transform (DWT) decomposes an image into bands that vary in spatial frequency and orientation. It is widely used for image compression. Measures of the visibility of DWT quantization errors are required to achieve optimal compression. Uniform quantization of a single band of coefficients results in an artifact that is the sum of a lattice of random amplitude basis functions of the corresponding DWT synthesis filter, which we call DWT uniform quantization noise. We measured visual detection thresholds for samples of DWT uniform quantization noise in Y, Cb, and Cr color channels. The spatial frequency of a wavelet is r·2^-L, where r is display visual resolution in pixels/degree, and L is the wavelet level. Amplitude thresholds increase rapidly with spatial frequency. Thresholds also increase from Y to Cr to Cb, and with orientation from low-pass to horizontal/vertical to diagonal. We propose a mathematical model for DWT noise detection thresholds that is a function of level, orientation, and display visual resolution. This allows calculation of a 'perceptually lossless' quantization matrix for which all errors are in theory below the visual threshold. The model may also be used as the basis for adaptive quantization schemes.
One area of applied research in which vision scientists can have a significant impact is improving image compression technologies by developing a model of human vision that can be used as an image fidelity metric. Scene cuts and other transient events in a video sequence have a significant impact on digital video transmission bandwidth. We have therefore been studying masking at transient edge boundaries, where bit rate savings might be achieved. Using Crawford temporal and Westheimer spatial masking techniques, we find unexpected stimulus-polarity-dependent effects. At normal video luminance levels there is a greater than fourfold increase in narrow line detection thresholds near the temporal onset of luminance pedestals. The largest elevations occur for pedestal widths in the range of 2 - 10 min. When the luminance polarity of the test line matches the pedestal polarity, the masking is much greater than when the test and pedestal have opposite polarities. We believe at least two masking processes are involved: (1) a rapid response saturation in on- or off-center visual mechanisms and (2) a process based on stimulus ambiguity when the test and pedestal are about the same size. The fact that masking is greatest for local spatial configurations gives one hope for its practical implementation in compression algorithms.
Many image and video compression schemes perform the discrete cosine transform (DCT) to represent image data in frequency space. An analysis of a broad suite of images confirms previous findings that a Laplacian distribution can be used to model the luminance AC coefficients. This model is expanded and applied to color space (Cr/Cb) coefficients. In MPEG, the DCT is used to code interframe prediction error terms, and the distribution of these coefficients is explored. Finally, the distribution model is applied to improve dynamic generation of quantization matrices.
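Fitting the Laplacian model is straightforward; this sketch (our own illustration, not the paper's code) uses the maximum-likelihood estimate for a zero-mean Laplacian, which is simply the mean absolute coefficient value.

```python
# Hedged sketch: ML fit of a zero-mean Laplacian to AC coefficients, with
# a sanity check on synthetic Laplacian data of known scale. A fitted
# density of this kind can then seed quantization-matrix design.

import math
import random

def laplacian_scale(coeffs):
    """ML estimate of the Laplacian scale b for zero-mean data."""
    return sum(abs(c) for c in coeffs) / len(coeffs)

def laplacian_pdf(x, b):
    return math.exp(-abs(x) / b) / (2.0 * b)

# Synthetic Laplacian samples with known scale b = 4: an exponential
# magnitude with a random sign.
random.seed(0)
samples = [random.expovariate(1 / 4.0) * random.choice([-1, 1])
           for _ in range(20000)]
print(round(laplacian_scale(samples), 2))   # close to 4.0
```

The heavy-tailed, sharply peaked shape of the Laplacian is what makes it a good match for DCT AC coefficients: most are near zero, with occasional large values at edges.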
Rendering of halftone images on scattering media such as paper is subject to dot gain. Physical dot gain is caused by the mechanical and chemical properties of the paper, the ink and the printing device, whereas optical dot gain depends on the optical properties of the medium. Media such as semitransparent film and paper show a pronounced internal scattering. Light incident on the surface of such a medium is in part reflected at the surface, but most of it enters the medium. That light is diffused by multiple internal scattering before it either re-emerges at the surface, penetrates through to the opposite side of the medium, or is absorbed. Because of this diffusion, a narrow beam of light entering the surface will be reflected with a 'halo' around the point of entry, which is precisely what causes the optical dot gain. In this paper we present a model for the optical dot gain on scattering media that enables us to predict color shifts in the rendering as caused by different raster technologies. The model is based on a non-linear application of a point spread function to the flux of light falling upon the medium surface. The point spread function has been established using a supercomputer simulation. The model has been successfully used to predict dot gain in monochromatic halftone prints and may be used for color prints as well. Simulations show that in color printing, dot gain can actually increase the color gamut. The results have obvious implications for the modeling of color printing and the strategies for device independent color management for color hardcopy, and may also be used in conjunction with segmentation of colors on scattering media.
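The nonlinear point-spread-function idea can be illustrated in one dimension. This is our own toy version: the Gaussian PSF and its width are assumptions (the paper's PSF was established by a supercomputer simulation), and ink is idealized as fully opaque.

```python
# Hedged sketch of optical dot gain: light passes the ink layer, scatters
# inside the medium under a PSF, and passes the ink layer again on exit,
# so reflectance is T(x) * (PSF convolved with T)(x) rather than T(x)**2.

import math

def gaussian_psf(radius=6, sigma=2.0):
    taps = [math.exp(-(i * i) / (2 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    s = sum(taps)
    return [t / s for t in taps]       # normalized to unit total flux

def reflectance(transmittance, psf):
    """Nonlinear PSF application: enter, scatter, exit through the ink."""
    r = len(psf) // 2
    n = len(transmittance)
    out = []
    for x in range(n):
        scattered = sum(psf[k + r] * transmittance[(x + k) % n]
                        for k in range(-r, r + 1))
        out.append(transmittance[x] * scattered)
    return out

# 50% halftone: with no scattering the mean reflectance would be 0.5;
# scattering darkens it, i.e. optical dot gain.
pattern = ([1.0] * 10 + [0.0] * 10) * 4      # 1 = bare paper, 0 = ink
refl = reflectance(pattern, gaussian_psf())
print(round(sum(refl) / len(refl), 3))       # below the nominal 0.5
```

Paper pixels near an ink edge receive scattered light that was partly absorbed by neighboring dots, so the print reads darker than its nominal coverage, reproducing the 'halo' effect described above.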
Halftone calibration of a black and white printer is a known process that involves printing and measuring patches for many different halftone levels. It is a tedious process that has to be repeated for every halftone dot or algorithm to be used. A new calibration procedure is described that uses a halftone-independent characterization of the printer and a pixel overlap model to predict the tone response of any halftone algorithm. This enables all halftone dots and algorithms to be calibrated with only one set of printer measurements.
An optimal pixel assignment model for objective quality measure of halftone images has been developed. The model is based on the smallest units for tone reproduction and edge reproduction. The correlation of the reproduction error between an original gray-scale image and the units-based images has been adopted in the model. Theoretical analysis and experimental results have shown that the model is efficient for the evaluation of halftone image quality.
We describe a new halftoning algorithm that uses a multiscale dot distribution procedure over a rotated quad-tree structure. The algorithm ensures that the average graylevel of the grayscale image over any node of the rotated quad-tree is equal to that of the halftone at the same node. This is achieved by first deciding the total number of black and white dots to be placed over the entire halftone, and then distributing the dots recursively to the subregions represented by the nodes of the rotated quad-tree.
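The recursive dot-distribution idea can be sketched as follows. This is our own simplification: it uses an axis-aligned rather than rotated quad-tree, and largest-remainder rounding is an assumed tie-breaking rule the abstract does not specify.

```python
# Hedged sketch: decide the total number of black dots from the image's
# overall darkness, then recursively split that count among quadrants in
# proportion to each quadrant's darkness, so every node's average gray
# level is preserved in the halftone.

def halftone_quadtree(gray, x0, y0, size, ndots, out):
    """Distribute `ndots` black dots over the square region (size = 2**k)."""
    if size == 1:
        out[y0][x0] = 1 if ndots >= 1 else 0
        return
    h = size // 2
    quads = [(x0, y0), (x0 + h, y0), (x0, y0 + h), (x0 + h, y0 + h)]
    dark = [sum(1.0 - gray[y][x]
                for y in range(qy, qy + h)
                for x in range(qx, qx + h))
            for qx, qy in quads]
    total = sum(dark) or 1.0
    # Largest-remainder split of ndots proportional to quadrant darkness.
    shares = [d * ndots / total for d in dark]
    counts = [int(s) for s in shares]
    order = sorted(range(4), key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[: ndots - sum(counts)]:
        counts[i] += 1
    for (qx, qy), c in zip(quads, counts):
        halftone_quadtree(gray, qx, qy, h, c, out)

# Toy usage: an 8x8 horizontal ramp, dark on the left (gray 0 = black).
n = 8
gray = [[x / 7 for x in range(n)] for _ in range(n)]
out = [[0] * n for _ in range(n)]
total = round(sum(1.0 - g for row in gray for g in row))
halftone_quadtree(gray, 0, 0, n, total, out)
print(sum(map(sum, out)), "black dots placed")
```

Because the dot count is fixed before recursion and split exactly at each level, the average gray level over any node of the tree matches the halftone's, which is the invariant the abstract states.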
An iterative halftoning method with a space-variant, input-dependent control of the binarization noise is presented. The algorithm is based on experiences with the iterative Fourier transform algorithm. It uses a convolution with an appropriately chosen finite impulse response (FIR) filter and a spatial operation instead of the Fourier transform and spectral operation. An adaptation of the spectral properties to the local grayvalue of the input image is demonstrated. Besides the local grayvalue the local image gradient is considered. Incorporation of the information about the direction of the local gradient is possible by using a non-symmetric impulse response. The idea is to preserve the spatial frequency content of the image along the gradient at the expense of the content perpendicular to it. The control of the various free parameters of this halftoning method is discussed. Experiments with a gradient-adaptive, space-variant impulse response according to a local transfer function covering elliptic areas show an improved reproduction of high frequencies in the binary image.
A novel error diffusion technique with vivid color enhancement and noise reduction has been developed to achieve high quality, high resolution color ink jet printing. Conventional error diffusion produces halftones with worm-like and salt-and-pepper-like noise. Each halftoned pixel may take one of 8 colors: RGBCMYKW. Depending on the printing device characteristics (including printer dot size, registration, resolution, ink, and media), color interference between the 8 possible colors may occur; this interference produces visible salt-and-pepper noise. To remove the noise, we classify the 8 colors into three clusters, each containing a set of harmonic colors, and prevent non-harmonic colors from being mixed in the halftone. Each smooth color area in an input image is rendered using a single cluster (i.e., a set of colors), which makes the halftone color more vivid and minimizes non-harmonic color interference. In the error diffusion processing, inter-plane information among the R, G, and B planes is used to enforce the color cluster rules, yielding halftones free of salt-and-pepper noise and more vivid in color. Experimental results are provided to show the effectiveness of the technique.
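The starting point of this technique, conventional error diffusion, can be sketched per channel; the color-cluster rules themselves are not reproduced here. A minimal single-channel Floyd-Steinberg pass (our own baseline sketch):

```python
# Hedged sketch of conventional error diffusion: each pixel is quantized
# to 0 or 1 and the quantization error is pushed to not-yet-processed
# neighbors with the classic Floyd-Steinberg weights.

def floyd_steinberg(image):
    """Binary error diffusion of one channel; `image` is a list of rows
    of floats in [0, 1], modified in place into 0.0/1.0 halftone values."""
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            old = image[y][x]
            new = 1.0 if old >= 0.5 else 0.0
            image[y][x] = new
            err = old - new
            # Classic weights 7/16, 3/16, 5/16, 1/16 to unprocessed pixels.
            if x + 1 < w:
                image[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    image[y + 1][x - 1] += err * 3 / 16
                image[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    image[y + 1][x + 1] += err * 1 / 16
    return image

# A flat 25% field should yield roughly 25% 'on' pixels.
out = floyd_steinberg([[0.25] * 16 for _ in range(16)])
ones = sum(map(sum, out))
print(int(ones), "of 256 pixels on")
```

The worm-like artifacts mentioned above arise from the fixed scan order and weights of exactly this kind of pass; the paper's contribution constrains which of the 8 printable colors may coexist locally on top of such a process.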
A new approach for anisotropic diffusion processing of color images is proposed. The main idea of the algorithm is to facilitate diffusion of the image in the direction parallel to color edges. The direction of maximal and minimal color change at each point is computed using the first fundamental form of the image in L*a*b* color space. The image Φ evolves according to an anisotropic diffusion flow given by ∂Φ/∂t = g(λ+, λ−) ∂²Φ/∂ξ², where ξ is the direction of minimal color change. The diffusion coefficient g(λ+, λ−) is a function of the eigenvalues of the first fundamental form, which represent the maximal and minimal rates of color change. Examples for real color images are presented.
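The per-pixel quantities in this flow can be sketched directly from per-channel gradients. This is our own illustration; the particular edge-stopping function g below is an assumption, not necessarily the authors' choice.

```python
# Hedged sketch: the first fundamental form at one pixel is built from the
# per-channel gradients; its eigenvalues lam_plus / lam_minus are the
# maximal / minimal squared rates of color change, and their gap flags an
# edge, where diffusion should slow.

import math

def fundamental_form_eigs(grads):
    """grads: per-channel (dI/dx, dI/dy) tuples, e.g. for L*, a*, b*."""
    g11 = sum(gx * gx for gx, _ in grads)
    g12 = sum(gx * gy for gx, gy in grads)
    g22 = sum(gy * gy for _, gy in grads)
    root = math.sqrt((g11 - g22) ** 2 + 4 * g12 * g12)
    return (g11 + g22 + root) / 2, (g11 + g22 - root) / 2

def diffusion_coeff(lam_plus, lam_minus):
    """Assumed edge-stopping function: small where color changes sharply
    in one direction (a color edge), near 1 in flat regions."""
    return math.exp(-(lam_plus - lam_minus))

flat = fundamental_form_eigs([(0.0, 0.0)] * 3)    # uniform region
edge = fundamental_form_eigs([(2.0, 0.0)] * 3)    # strong x-gradient
print(diffusion_coeff(*flat))    # 1.0: full diffusion
print(diffusion_coeff(*edge))    # near 0: diffusion halted at the edge
```

Summing the gradient outer products over all three channels is what lets a purely chromatic edge (invisible in any single channel's magnitude) still produce a large λ+ and stop the diffusion.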
A simple model for quantifying dither screen performance is proposed. Using this model, inherent limitations on the performance of dither screen halftoning are proved. An efficient method for generating quality dither screens, with good performance under our model, is presented.
Digital color images are represented by 3 color values for each picture element (pixel). The most common color coordinates for capture and display are RGB, but they are transformed into YUV (or another luminance-chrominance type of space) for image processing and compression. The conventional wisdom is that while luminance data are processed at full spatial and temporal resolution, chrominance information can be considerably subsampled without significant losses in perceived image quality. Also, the 3-fold complexity of RGB 3-channel processing can usually be reduced to 2-fold: one channel for luminance and another for chrominance. We have been studying the advantages of a different image representation, one suggested by biological systems. It requires only one color value at each pixel location, reducing the total data load to one third, and it can be processed, compressed, and transmitted with the simplicity of a monochrome image (i.e., one-channel complexity) by adequate handling of entropy increments. At the receiver, this representation can be decoded into a full color picture of the same perceived quality and resolution as a picture represented and processed in conventional ways. Technical developments related to this representation have been presented here before, and specific technical advantages have been registered. This technology eliminates the burden of front-end color space transformation and removes the complexity of separate computations for chromatic and achromatic processing. Specific applications comprise four areas: (1) Simplification of processing, storage, compression, and transmission of digital color images. (2) Economical full-color upgrading of black and white image capturing systems. (3) An increase of up to 4x in spatial resolution for high-quality digital image capturing systems currently designed for triplane color capture (three separate CCDs). 
(4) Extension of the three areas above to dynamic sequences (digital video and motion pictures). [truncated]