The classical approach in vision research - the derivation of basically linear filter models from experiments with simple artificial test stimuli - is currently undergoing a major revision. Instead of trying to keep the dirty environment out of our clean labs, we now put it right into the focus of scientific exploration. The new approach has a close relation to basic engineering strategies for electronic image processing, since its major concept is the exploitation of the statistical redundancies of the environment by appropriate neural transformations. The standard engineering methods are not sufficient, however. Even a basic biological feature like orientation selectivity requires the consideration of higher-order statistics, like cumulants or polyspectra. Furthermore, there exists an abundance of nonlinear phenomena in biological vision, for example the phase invariance of complex cells, cortical gain control, or end-stopping, which make it necessary to consider unconventional modeling approaches like differential geometry or Volterra-Wiener systems. By use of such methods we can not only gain a deeper understanding of the adaptation of the visual system to the complex natural environment, but we can also make the biological system an inspiring source for the design of novel strategies in electronic image processing.
In our natural environment, we simultaneously receive information through various sensory modalities. The properties of these stimuli are coupled by physical laws, so that, e.g., auditory and visual stimuli caused by the same event have a fixed temporal relation when reaching the observer. In speech, for example, visible lip movements and audible utterances occur in close synchrony, which contributes to the improvement of speech intelligibility under adverse acoustic conditions. Research into multisensory perception is currently being performed in a great variety of experimental contexts. This paper attempts to give an overview of the typical research areas dealing with audio-visual interaction and integration, bridging the range from cognitive psychology to applied research for multimedia applications. Issues of interest are the sensitivity to asynchrony between audio and video signals, the interaction between audio-visual stimuli with discrepant spatial and temporal rate information, crossmodal effects in attention, audio-visual interactions in speech perception, and the combined perceived quality of audio-visual stimuli.
The creative process can be described as a continuous feedback loop between the material and an artist's decision-making process. A skill can be described as knowledge of a material that allows making more informed decisions and more controlled interactions. As the artist attains a deeper knowledge of a material, the cognitive process involved in creation diverges from technical considerations and concerns itself more with meaning and expression. With computation, the creative process is better described as an evaluative process. Computers allow a multitude of stored copies and variations and also permit the visual artist to create many compositions. The artist may subsequently choose the most appealing among them, refining procedures and algorithms through an evaluative process of trial and error. The traditional relationship between the artist and the computer has been one of artists exercising visual judgement in light of manipulation of the material. In this paper, we contrast the extensive use of randomness with a more controlled expression made possible by advances in our modeling of human vision and of imaging systems. The context for this discussion is computational expressionism, an exploration of computational drawing that redefines the concept of lines and compositions for the digital medium.
The contrast sensitivity function (csf) is central to describing spatial vision and to models of visual coding, yet little is known about the form of the function under natural viewing conditions. We examined how contrast sensitivity is affected by adaptation states that should arise in the course of normal viewing. Webster and Miyahara showed that adaptation to the low-frequency biases in natural scenes selectively reduces sensitivity at low frequencies. Here we examine how these sensitivity changes depend on the properties of observers, by varying subjects' refractive state or by measuring adaptation to chromatic contrast rather than luminance contrast. Defocus and physical blurring have similar effects, altering the adaptation only for strongly blurred images. Switching to chromatic contrast induces larger sensitivity changes at low frequencies, consistent with the different csfs for color and luminance. Thus natural viewing may lead to characteristic adaptation states that differ for luminance and color. To examine the basis for these sensitivity changes, we adapted to 1/f patterns filtered over different frequency bands. Adding lower frequencies to images reduces the adaptation induced by higher frequencies. Thus in natural-image adaptation, the low-frequency bias may result not from the bias in the input spectra, but because the adaptation at different spatial scales is not independent.
In a previous study, the simulation of image appearance from different distances was shown to be effective. The simulated observation distance accurately predicted the distance at which the simulated image could be discriminated from the original image. Due to the 1/f nature of natural images' spatial spectra, the individual CSF used was actually tested only at one retinal spatial frequency. To test the CSF relevant for the discrimination task over a wide range of frequencies, the same simulations and testing procedure were applied to 5 contrast versions of the images. The lower contrast images probe the CSF at lower spatial frequencies, while higher contrast images test the CSF value at higher spatial frequencies. Images were individually processed for each of 4 observers, using their individual CSF, to represent the appearance of the images from 3 distances where they span 1, 2, and 4 deg of visual angle, respectively. Each of the 4 pictures at the 5 contrast levels and the 3 simulated distances was presented 10 times side-by-side with the corresponding original image. Images were observed from 9 different observation distances. The subjects' task was to determine which of the two was the original, unprocessed image. For each simulated distance, the data were used to determine the discrimination distance threshold.
We studied the detection of chromoluminance patterns in the presence of chromoluminance pedestals. We examined how thresholds depend on the color directions of the target and the pedestal. Both targets and pedestals were spatial Gabor patterns. The patterns were spatially modulated in color, luminance, or both. Equidiscrimination contours describe contrast thresholds for targets in different color directions on the same pedestal. We measured the equidiscrimination contours on green/red and blue/yellow pedestals. The equidiscrimination contours change with the contrast and the color directions of the pedestals. We applied to these data a model with three pairs of mechanisms that we proposed earlier. Each mechanism consists of a linear receptive-field-like color-spatial operator followed by a nonlinear process. The nonlinear process takes two inputs: the excitation comes directly from the linear operator, and the divisive inhibition is a nonlinear sum of all linear operator responses. Two linear operator pairs are color opponent while the third is non-opponent. The detection variable is computed from the outputs of the nonlinear processes combined by Quick's pooling rule.
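The abstract does not give the model's equations, but the generic form of such divisive-inhibition mechanisms followed by Quick pooling is easy to sketch. The Python fragment below is a minimal illustration under assumed parameter values; p, q, sigma, and beta are placeholders, not the authors' fitted constants.

```python
import numpy as np

def mechanism_response(excitation, all_responses, p=2.4, q=2.0, sigma=0.1):
    """Generic divisive-inhibition nonlinearity: the excitatory input is
    raised to a power and divided by a pooled sum over all linear-operator
    responses (parameter values are hypothetical)."""
    inhibition = sigma**q + np.sum(np.abs(all_responses)**q)
    return np.sign(excitation) * np.abs(excitation)**p / inhibition

def detection_variable(responses, beta=4.0):
    """Quick's pooling rule: a Minkowski sum over mechanism outputs;
    beta around 3-4 is typical in the psychophysical literature."""
    return np.sum(np.abs(responses)**beta)**(1.0 / beta)

# Example: six mechanisms (three pairs) responding to a target on a pedestal.
linear = np.array([0.8, -0.3, 0.5, 0.1, -0.2, 0.4])  # linear operator outputs
nonlin = np.array([mechanism_response(r, linear) for r in linear])
print(detection_variable(nonlin))
```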
Nonlinear contributions to pattern classification by humans are analyzed by using previously obtained data on discrimination between aligned lines and offset lines. We show that the optimal linear model can be rejected even when the parameters of the model are estimated individually for each observer. We use a new measure of agreement to reject the linear model and to test simple nonlinear operators. The first nonlinearity is position uncertainty. The linear kernels are shrunk to different extents and convolved with the input images. A Gaussian window weights the results of the convolutions, and the maximum in that window is selected as the internal variable. The size of the window is chosen so as to maintain a constant total amount of spatial filtering, i.e., the smaller kernels have a larger position uncertainty. The results for two observers indicate that the best agreement is obtained at a moderate degree of position uncertainty, about plus-minus one minute of arc. Finally, we analyze the effect of orientation uncertainty and show that agreement can be further improved in some cases.
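As a rough illustration of the position-uncertainty stage described above, the sketch below filters an image, weights the responses with a Gaussian window around the expected target position, and takes the maximum as the internal variable. The toy kernel and window size are assumptions; the authors' shrunk kernels and constant-total-filtering constraint are not reproduced.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def position_uncertain_response(image, kernel, window_sigma):
    """Linear filtering followed by a max over a Gaussian-weighted window,
    a minimal sketch of a position-uncertainty nonlinearity."""
    linear = convolve(image, kernel, mode='constant')
    # Gaussian weighting centered on the expected target position.
    weights = np.zeros_like(image)
    cy, cx = np.array(image.shape) // 2
    weights[cy, cx] = 1.0
    weights = gaussian_filter(weights, window_sigma)
    return (linear * weights).max()   # internal decision variable

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))
kern = np.ones((1, 5)) / 5.0          # toy horizontal line-detector kernel
print(position_uncertain_response(img, kern, window_sigma=2.0))
```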
We are all familiar with a number of contrast experiments in which two identical reflectance patches appear different in different spatial contexts. Examples are simultaneous contrast and White's effect. The simultaneous contrast experiment places a gray patch in a white surround to make the gray appear darker. In this case, identical radiances at the target are not identical radiances at the retina. If we want equal retinal radiances, we need, with the aid of an intraocular scatter model, to make a display that has lower display radiances in a white surround. This paper compares a number of scatter-corrected contrast experiments with their uncorrected counterparts. In simultaneous contrast, correcting for scatter shows that the underlying spatial interactions have a larger effect on sensations than in uncorrected displays. Scatter and contrast tend to cancel, but in this case they just reduce the apparent size of the spatial interaction. White's effect is just the opposite. The contrast effect and scatter add. In this case, correcting for scatter reduces the size of the effect, but does not overpower it. This paper describes a number of contrast experiments corrected for scattered light. The paper further discusses the magnitude of lightness shifts due to spatial interactions after scatter has been corrected. Scatter cannot explain White's effect, although correction for scatter reduces its magnitude.
Perceptual Models for Sampling, Quantization, and Compression
An accurate and efficient model of human perception has been developed to control the placement of samples in a realistic image synthesis algorithm. Previous sampling techniques have sought to spread the error equally across the image plane. However, this approach neglects the fact that the renderings are intended to be displayed for a human observer. The human visual system has a varying sensitivity to error that is based upon the viewing context. This means that equivalent optical discrepancies can be very obvious in one situation and imperceptible in another. It is ultimately the perceptibility of this error that governs image quality and should be used as the basis of a sampling algorithm. This paper focuses on a simplified version of the Lubin Visual Discrimination Metric (VDM) that was developed for insertion into an image synthesis algorithm. The sampling VDM makes use of a Haar wavelet basis for the cortical transform and a less severe spatial pooling operation. The model was extended for color, including the effects of chromatic aberration. Comparisons are made between the execution time and visual difference map for the original Lubin and simplified visual difference metrics. Results for the realistic image synthesis algorithm are also presented.
In human color vision, it is well accepted that signals issued from the three types of receptors (L, M, S) are combined into two opponent color components and one achromatic component. In this paper, we are concerned with the cardinal directions A, Cr1 and Cr2 defined by Krauskopf. We study in particular the interactions between luminance and chromatic components. These interactions should be taken into account in visual coding since they modify the visibility thresholds. We present here results that show the influence of the two chromatic components on the optimal perceptual quantizer of the achromatic component in particular subbands. On the luminance subband called III-1, we show the influence of Cr1 and Cr2 sinusoidal maskers. Other results are also presented on the subband called II-1 with Cr1 and Cr2 maskers.
In today's digital prepress workflow, images are most often stored in the CMYK color representation. In the lossy compression of CMYK color images, most techniques do not take the tonal correlation between the color channels into account, or they are not able to perform a proper color decorrelation in four dimensions. In a first stage, a compression method has been developed that takes this type of redundancy into account. The basic idea is to divide the image into blocks. The color information in those blocks is then transformed from the original CMYK color space into a decorrelated color space. In this new color space not all components are of the same magnitude, so here the gain for compression purposes becomes clear. After the color transformation step, any regular compression scheme meant to reduce the spatial redundancy can be applied. In this paper, a more advanced approach for the quantization procedure in the compression algorithm is presented. The proposed scheme tries to control the quantization parameters differently for each block and color component. Therefore, the influence on the CIELab (Delta)E measure is investigated when making a small shift in the four main directions of the decorrelated color space.
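The decorrelating transform itself is not specified in the abstract; a plausible realization is a per-block Karhunen-Loeve transform of the four channels, sketched below in Python. The block shape and data layout are assumptions for illustration.

```python
import numpy as np

def decorrelate_block(block_cmyk):
    """Decorrelate the four color channels of one image block with a
    Karhunen-Loeve transform, a plausible realization of the
    four-dimensional color decorrelation described above.
    block_cmyk: array of shape (h, w, 4)."""
    pixels = block_cmyk.reshape(-1, 4).astype(float)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov = np.cov(centered, rowvar=False)       # 4x4 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Project onto the eigenvectors: components come out sorted by
    # increasing variance, so most signal energy lands in the last one.
    transformed = centered @ eigvecs
    return transformed.reshape(block_cmyk.shape), mean, eigvecs

# Inverse: centered = transformed @ eigvecs.T; pixels = centered + mean
```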
In this paper we present a new wavelet-based coding scheme for the compression of color images at compression ratios up to 100:1. It is originally based on the LZC algorithm of Taubman. The main points of discussion in this paper are the color space used and the combination of a coding scheme with a model of human color vision. We describe two approaches: one is based on the pattern-color separable opponent space described by Poirson and Wandell; the other is based on the YCbCr space that is often used for compression. In this article we show the results of some psychovisual experiments we performed to refine the model of the opponent space concerning its color contrast sensitivity function. These are necessary to use it for image compression. They consist of color matching experiments performed on a calibrated computer display. We discuss this particular opponent space concerning its fidelity of prediction for human perception and its characteristics in terms of compressibility. Finally, we compare the quality of the coded images of our approach to standard JPEG, DCTune 2.0, and the SPIHT coding scheme. We demonstrate that our coder outperforms these three coders in terms of visual quality.
Packet transmissions over the Internet incur delay jitter that requires data buffering for resynchronization, which is unfavorable for interactive applications. Last year we reported the results of formal subjective quality evaluation experiments on delay cognizant video coding (DCVC), which introduces temporal jitter into the video stream. Measures such as MSE and MPQM indicate that the introduction of jitter should degrade video quality. However, most observers actually preferred compressed video sequences with delay to sequences without. One reason for this puzzling observation is that the delay introduced by DCVC suppresses the dynamic noise artifacts introduced by compression, thereby improving quality. This observation demonstrates the possibility of reducing bit rate and improving perceived quality at the same time. We have been characterizing conditions in which dynamic quantization noise suppression might improve video quality. A new battery of video test sequences using simple stimuli was developed to avoid the complexity of natural scenes. These sequences are cases where quantization noise produces bothersome temporal flickering artifacts. We found that the significance of the artifacts depends strongly on the local image content. Pseudo code is provided for generating these test stimuli, in the hope that they will lead to the development of future video compression algorithms that take advantage of this technique of improving quality by dampening temporal artifacts.
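The paper's pseudo code is not reproduced here, but a minimal stand-in for the kind of simple stimulus described - a smooth ramp whose quantization step alternates from frame to frame, producing temporal flicker of the contouring artifacts - might look as follows. All parameter values are illustrative.

```python
import numpy as np

def quantize(frame, step):
    """Uniform quantization, the source of the contouring artifacts."""
    return np.round(frame / step) * step

def flicker_sequence(n_frames=30, size=64, step_a=8, step_b=12):
    """A smooth luminance ramp whose quantization step alternates between
    frames, so the contour positions jump and flicker over time."""
    ramp = np.tile(np.linspace(0, 255, size), (size, 1))
    frames = []
    for t in range(n_frames):
        step = step_a if t % 2 == 0 else step_b
        frames.append(quantize(ramp, step))
    return np.stack(frames)

seq = flicker_sequence()
print(seq.shape)   # (30, 64, 64)
```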
One of the largest sources of compression in vision is the reduction of sensitivity and resolution as a function of eccentricity. However, utilization of this visual property has been limited to systems that directly measure the viewer's gaze position. We have applied visual eccentricity models to videophone compression applications without using eye tracking, by combining the visual model with a face tracking algorithm. In lieu of a gaze detector, we assume the gaze will be directed to the faces appearing in images. The incorporation of both resolution- as well as sensitivity-based eccentricity models in a low bitrate video compression standard will be discussed. For videophone applications, the reduction in bitrate while retaining similar image quality is up to 50 percent. Problems arising from the improved temporal sensitivity of the periphery, despite its reduced spatial bandwidth, will be discussed.
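The specific eccentricity model used is not given in the abstract. A common form from the literature, which may serve as a stand-in, lets the resolvable cutoff frequency fall off hyperbolically with eccentricity; f0 and e2 below are representative constants, not this paper's fit.

```python
import numpy as np

def cutoff_frequency(eccentricity_deg, f0=40.0, e2=2.3):
    """A common eccentricity model of spatial resolution: the highest
    resolvable frequency falls off roughly as 1/(1 + E/E2).
    f0: assumed foveal cutoff (cyc/deg); e2: half-resolution
    eccentricity (deg). Both are illustrative values."""
    return f0 / (1.0 + np.asarray(eccentricity_deg) / e2)

# Pixels far from the tracked face can be low-pass filtered to the local
# cutoff before encoding, saving bits where the detail is invisible.
for e in (0, 2, 5, 10, 20):
    print(e, "deg ->", round(float(cutoff_frequency(e)), 1), "cyc/deg")
```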
The growth of digital video has given rise to a need for computational methods for evaluating the visual quality of digital video. We have developed a new digital video quality metric, which we call DVQ. Here we provide a brief description of the metric, and give a preliminary report on its performance. DVQ accepts a pair of digital video sequences, and computes a measure of the magnitude of the visible difference between them. The metric is based on the Discrete Cosine Transform. It incorporates aspects of early visual processing, including light adaptation, luminance and chromatic channels, spatial and temporal filtering, spatial frequency channels, contrast masking, and probability summation. It also includes primitive dynamics of light adaptation and contrast masking. We have applied the metric to digital video sequences corrupted by various typical compression artifacts, and compared the results to quality ratings made by human observers.
In this paper I present a distortion metric for color video sequences. It is based on a contrast gain control model of the human visual system that incorporates spatial and temporal aspects of vision as well as color perception. The model achieves a close fit to contrast sensitivity and contrast masking data from several different psychophysical experiments for both luminance and color stimuli. The metric is used to assess the quality of MPEG-coded sequences.
With the rapid growth of the internet and digital video technology, perceptual assessment of video quality is becoming more important. An automatic assessment procedure based on a human vision model is desirable because subjective video quality measurement is costly and time-consuming. We propose a video quality metric called spatio-temporal CIELAB (ST-CIELAB). ST-CIELAB is an extension of the spatial CIELAB (S-CIELAB) image quality metric, which is itself an extension of the CIELAB color standard. The ST-CIELAB metric is based on a spatial, temporal, and chromatic model of human contrast sensitivity. Few video quality metrics accounting for human temporal as well as spatial sensitivities have been proposed, and those that exist are based on multi-resolution, subband-decomposition models of human pattern sensitivity. ST-CIELAB is single-resolution, which offers computational advantages over multi-resolution models. The ST-CIELAB rating is in units of JND, providing a meaningful interpretation of the rating. The ST-CIELAB metric was designed to fit published contrast sensitivity data, and ST-CIELAB is backward compatible with the well-established CIELAB standard, i.e., it reduces to CIELAB for uniform color fields. We tested the ST-CIELAB metric by performing psychophysical experiments with MPEG video sequences to see how well ST-CIELAB ratings correlate with subjective ratings.
The present investigation compares the performance of two objective video quality metrics in predicting the visual threshold for the detection of blocking impairments associated with MPEG-2 compression. The visibility thresholds for both saturated color and gray-scale targets are measured. The test material consists of image sequences in which either saturated color or gray-scale targets exhibiting blocking are varied in luminance contrast from -44 dB to -5 dB against a constant gray background. Stimulus presentation is by the 'method of limits' under International Telecommunications Union Rec. 500 conditions. The results place the detection of blocking impairments at Michelson contrast levels between -28 dB and -33 dB. This result is consistent with values reported by other investigators for luminance contrast detection thresholds. A small, but statistically significant, difference is found between the detection threshold of saturated color patterns versus luma-only images. The results suggest, however, that blocking impairment detection is controlled mainly by display luminance. Two objective metrics are applied to gray-scale image sequences, yielding measures of perceptible image blocking for each frame. A relatively simple blocking detector and a more elaborate discrete cosine transform error metric correlate well over the contrast range examined. Also, the two measures correlate highly with measured image contrast. Both objective metrics agree closely with visual threshold measurements, yielding threshold predictions of approximately -29 dB.
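For readers unused to contrast expressed in decibels, the conversion between the quoted dB values and linear Michelson contrast is straightforward:

```python
import math

def michelson(l_max, l_min):
    """Michelson contrast of a pattern against its background."""
    return (l_max - l_min) / (l_max + l_min)

def to_db(c):
    """Contrast in decibels, the units used for the thresholds above."""
    return 20.0 * math.log10(c)

# The reported -28 dB to -33 dB thresholds correspond to linear
# Michelson contrasts of about 0.040 and 0.022:
for db in (-28.0, -33.0):
    print(db, "dB ->", round(10 ** (db / 20.0), 3))

print(round(to_db(michelson(110.0, 100.0)), 1))   # example: about -26.4 dB
```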
We study the effects of channel losses on block-based coders over packet-switched networks. Such coders rely on motion-compensated block prediction with intra refresh and/or intra-coded conditional replenishment for robust data compression. We consider a motion-compensated prediction coder that optimizes the coding mode selection based on channel loss characteristics. We also consider an intra-frame coder with conditional replenishment. We assume that a packet contains a single block and the associated prediction parameters, and that macroblock losses are independent. We investigate various error concealment strategies and evaluate their effects on perceived video quality. One approach is to use spatial interpolation techniques to replace the lost packets. An alternative approach is to use temporal replacement. In the case of motion-compensated prediction, the motion vectors of neighboring blocks are used to obtain an approximation of the lost block from the previous frame. For the intra-frame coder with conditional replenishment, a lost packet is replaced by the block at the same location in the previous frame. Our results indicate that temporal replacement results in lower perceptual distortion than spatial interpolation. We also found that, when the packet containing the residual error is received but the motion-compensated prediction depends on blocks that have been lost, the lowest distortion results when the residual error is added only to the part of the signal that has not sustained any losses.
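A minimal sketch of the temporal-replacement concealment strategy that the results favor is given below. The single motion vector and the boolean loss mask are simplifications of the per-block bookkeeping a real decoder would perform.

```python
import numpy as np

def conceal_temporal(current, previous, lost_mask, motion=None):
    """Temporal replacement of lost regions: copy from the previous frame,
    optionally displaced by a motion vector borrowed from neighboring
    blocks. lost_mask is True where packets were lost."""
    concealed = current.copy()
    if motion is None:
        # Intra-frame / conditional replenishment case: same location.
        concealed[lost_mask] = previous[lost_mask]
    else:
        # Motion-compensated case: shift the previous frame first.
        dy, dx = motion   # vector estimated from neighboring blocks
        shifted = np.roll(np.roll(previous, dy, axis=0), dx, axis=1)
        concealed[lost_mask] = shifted[lost_mask]
    return concealed
```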
Many models of image quality have been developed to predict the visibility of differences between pairs of still images. Various methods have been suggested for combining predictions generated for individual frames of a video sequence. To explore this issue, we have compared the objective quality predictions of three temporal pooling methods against viewer ratings of perceived video quality. The subjective data were obtained from a large group of viewers in an experiment performed by Cable Television Laboratories, Inc., that utilized extended-duration video sequences containing material of significant complexity. Objective quality measures obtained for each field of the sequences were pooled using three simple methods: histogram, Minkowski summation and exponentially-weighted Minkowski summation. The results demonstrated that, for sequence durations on the order of 1 min, the three pooling methods have similar ability to predict overall perceived video quality.
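Of the three pooling methods, the two Minkowski variants can be stated compactly; the sketch below uses illustrative values for the exponent beta and the recency constant tau, which are not taken from the paper.

```python
import numpy as np

def minkowski_pool(per_field_scores, beta=4.0):
    """Minkowski summation over per-field quality scores."""
    q = np.asarray(per_field_scores, dtype=float)
    return (np.mean(q**beta))**(1.0 / beta)

def exp_weighted_minkowski(per_field_scores, beta=4.0, tau=20.0):
    """Exponentially-weighted variant: recent fields count more, a simple
    recency model (beta and tau are illustrative)."""
    q = np.asarray(per_field_scores, dtype=float)
    n = len(q)
    w = np.exp(-(n - 1 - np.arange(n)) / tau)   # weight decays into the past
    w /= w.sum()
    return (np.sum(w * q**beta))**(1.0 / beta)

scores = np.concatenate([np.full(50, 2.0), np.full(10, 6.0)])  # late impairment
print(minkowski_pool(scores), exp_weighted_minkowski(scores))
```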
Visual difference predictors accept two images as input, perform some processing, and produce a single image as output. The output image represents a map of the visibility of differences between the two input images. In practical applications, input images are received at whatever resolution is convenient for the application, and viewing distances are as appropriate for the task at hand. To match retinal sampling rates, the images are typically filtered and down-sampled. Given that the typically employed optical point spread functions do not completely remove high-frequency information, we ask whether, and to what extent, high-frequency leakage leads to aliasing. In this paper we explore the amount of aliasing possible in our implementation of the Sarnoff Visual Discrimination Model, and describe a modification that uses a sampling grid similar to the Poisson disk distribution. We then compare the results of this sampling to those of the unmodified model.
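A Poisson-disk-like grid can be generated by simple dart throwing, as sketched below; this illustrates the general idea of a minimum-separation, blue-noise sample distribution, not the model's actual sampling stage.

```python
import numpy as np

def poisson_disk_samples(width, height, r, k=30, seed=0):
    """Dart-throwing sketch of Poisson-disk sampling: candidate points are
    accepted only if at least r away from every accepted point. Simple
    O(n^2) rejection; real implementations use faster grid-accelerated
    algorithms such as Bridson's."""
    rng = np.random.default_rng(seed)
    points = []
    n_tries = k * width * height // int(r * r)
    for _ in range(n_tries):
        p = rng.uniform([0.0, 0.0], [width, height])
        if all(np.hypot(*(p - q)) >= r for q in points):
            points.append(p)
    return np.array(points)

pts = poisson_disk_samples(64, 64, r=4.0)
print(len(pts), "samples")
```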
This paper describes a perceptual measure for still image compression systems. Considering the fact that the conventional PSNR cannot sufficiently reflect the results of subjective assessment, other quality measures have been considered for the design of variable bit-rate coders. Indeed, there is a growing interest in perceptual quality measures. Some work has been carried out in the field of still picture quality evaluation that tries to introduce properties of the human visual system. In the recent literature there are roughly three properties that are identified as being useful. The best known, and generally most widely used, property is the modulation transfer function of the human visual system. The other two properties can be described as luminance masking and texture masking. A large number of image quality measures of this kind have been developed, with different degrees of success. In previous work, we provided a rigorous evaluation of metrics which take into account artifacts generated by compression methods like JPEG. The results show that these metrics are highly correlated with the subjective quality grading but also depend on the complexity of the images under study. We therefore propose a new perceptual metric for still image compression based on a multiresolution decomposition. It allows image texture to be characterized, takes masking effects into account better, and does not depend on the compression method.
In two experiments, dissimilarity data and numerical scaling data were obtained to determine the underlying attributes of image quality in baseline sequential JPEG coded images. Although several distortions were perceived, i.e., blockiness, ringing and blur, the subjective data for all attributes were highly correlated, so that image quality could approximately be described by one independent attribute. We therefore proceeded by developing an instrumental measure for one of these distortions, i.e., blockiness. In this paper a single-ended blockiness measure is proposed, i.e., one that uses only the coded image. Our approach is therefore fundamentally different from most image quality models, which use both the original and the degraded image. The measure is based on detecting the low-amplitude edges that result from blocking and estimating their amplitudes. Because of the approximate one-dimensionality of the underlying psychological space, the proposed blockiness measure also predicts the image quality of sequential baseline coded JPEG images.
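As a crude single-ended stand-in for such a measure (not the authors' edge detector), one can compare gradient magnitudes at assumed 8-pixel block boundaries with those inside blocks; values well above 1 indicate visible blocking.

```python
import numpy as np

def blockiness(img, block=8):
    """Single-ended blockiness sketch: ratio of the mean absolute luminance
    step across block boundaries to the mean step inside blocks. Uses
    only the coded image, as in the approach described above."""
    img = img.astype(float)
    dh = np.abs(np.diff(img, axis=1))              # horizontal gradients
    at_boundary = dh[:, block - 1::block].mean()   # steps across block edges
    inside = np.delete(dh, np.s_[block - 1::block], axis=1).mean()
    return at_boundary / (inside + 1e-6)

# Example: a synthetic image of constant 8x8 tiles is maximally blocky.
tiles = np.kron(np.arange(64).reshape(8, 8) * 4.0, np.ones((8, 8)))
print(round(blockiness(tiles), 2))
```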
Although the concept of image dissimilarity is very familiar in the context of instrumental measures of image quality, it is fairly uncommon to use it as an experimental paradigm. Most instrumental measures relate image quality to some distance, such as the root-mean-squared error (RMSE), between the original and the processed image, such that image dissimilarity arises naturally in this context. Dissimilarity can, however, also be judged consistently by subjects. In this paper we compare the performance of a number of representative instrumental models for image dissimilarity with respect to their ability to predict both image dissimilarity and image quality, as perceived by human subjects. Two sets of experimental data, one for images degraded by noise and blur, and one for JPEG-coded images, are used in the comparison. In none of the examined cases could a clear advantage of complicated distance metrics be demonstrated over simple measures such as RMSE.
Several discriminability measures were correlated with reading speed over a range of screen backgrounds. Reading speed was measured using a search task in which observers tried to find one of three words in a short paragraph of black text. There were four background patterns combined with three colors at two intensities. The text contrast had a small positive correlation with speed. Background RMS contrast showed a stronger, negative correlation. Text energy in the spatial frequency bands corresponding to lines and letters also showed strong relationships. A general procedure for constructing a masking index from an image discrimination model is described and used to generate two example indices: a global masking index, based on a single-filter model combining text contrast and background RMS contrast, and a spatial-frequency-selective masking index. These indices did not lead to better correlations than those of the RMS measures alone, but they should lead to better correlations when there are larger variations in text contrast and masking patterns.
We present the results of experiments in which we investigate how physical gamma maps onto perceived gamma, and how the subjective quality of a printed image may be influenced by the grey scale mapping that is used to print the image. The grey scale is altered by changing the gamma, the exponent relating the input to the output luminance. Five natural images, widely different in content, have been printed with 16 different values of gamma. First the printer gamma is measured. In general, this gamma differs from 1. In the printer a gamma transform is performed as well. The selected gamma values are corrected using the directly measured gamma function. The aim of the first experiment is to find out what the relation is between physical gamma and perceived gamma. The stimulus-response function derived from our data is found to be quite linear over the range investigated. The aim of the second experiment is to test the subjective preference for a particular gamma. We have found that the preferred gammas are not independent of image content. The mean preference averaged over the subjects for the various images ranges from 1.6 to 2.2.
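Because gamma transforms compose multiplicatively, correcting the selected gammas for the measured printer gamma amounts to dividing exponents, as the sketch below illustrates; the numeric values are examples only.

```python
import numpy as np

def apply_gamma(x, gamma):
    """Grey-scale mapping: normalized input luminance raised to gamma."""
    return np.clip(x, 0.0, 1.0) ** gamma

def corrected_gamma(target_gamma, printer_gamma):
    """Gammas compose multiplicatively, (x**a)**b == x**(a*b), so the
    exponent applied in software must be the target divided by the
    measured printer gamma for the end-to-end exponent to match."""
    return target_gamma / printer_gamma

# Example: to realize an end-to-end gamma of 1.8 on a printer whose
# measured gamma is 1.25, apply 1.8 / 1.25 = 1.44 in software.
g = corrected_gamma(1.8, 1.25)
x = 0.5
print(g, apply_gamma(apply_gamma(x, g), 1.25), x ** 1.8)  # last two agree
```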
Previously we reported a study into the effect of stereoscopic filming parameters on perceived quality, naturalness and eye strain. In a pilot experiment, using a 25-second exposure duration, a marked shift occurred between naturalness and quality ratings as a function of camera separation. This shift was less clearly present in the main experiment, in which we used an exposure duration of 5 seconds. This suggests a potential effect of exposure duration on observer appreciation of stereoscopic images. To further investigate this, we performed an experiment using exposure durations of both 5 and 10 seconds. For these durations, twelve observers rated naturalness of depth and quality of depth for stereoscopic still images varying in camera separation, convergence distance and focal length. The results showed no significant main effect of exposure duration. A small yet significant shift between naturalness and quality was found for both duration conditions. This result replicated earlier findings, indicating that this is a reliable, albeit content-dependent, effect. A second experiment was performed with exposure durations ranging from 1 to 15 seconds. The results of this experiment showed a small yet significant effect of exposure duration. Whereas longer exposure durations do not have a negative impact on the appreciative scores of optimally reproduced stereoscopic images, observers do give lower judgments to monoscopic images and stereoscopic images with unnatural disparity values as exposure duration increases.
Perception and Performance in Real and Virtual Environments
A person has a feeling of 'being in' an image when watching a screen with a wide visual field, and his somatic sensation and sense of direction are affected by the image. Making use of this effect, we investigated the sensation of reality in images based on testing the sense of direction. In our studies, we examined the relationship between information as it is perceived by the human visual and vestibular systems. In our experiment, we used images from a horizontally rotating camera for the visual information, and displayed these images through a head-mounted display. Also, we set up an angular acceleration using a turntable as information to be perceived by the vestibular system. The directions of the rotating camera and the turntable were separately controlled, and the directions were varied. We found that the human visual system dominates in the case of stimuli which are small for the vestibular system, and overestimates in the case of stimuli which are significant for the vestibular system. The results showed that the visual system is important for the perception of the sensation of reality, which is enhanced if stimuli to the various sensory systems are in correspondence.
We differentiate a cognitive branch of the visual system from a sensorimotor branch with the Roelofs effect, a perception that a target's position is biased in the direction opposite the offset of a surrounding frame. Previous research left open the possibility that accurate motor responses to a perceptually mislocated target might be mediated by oculomotor fixation of the target. Subjects performed judging and jabbing tasks to probe cognitive and motor system representations, respectively, while engaging in a saccadic task that prevented fixation of the target. Three experiments with an oculomotor distractor task evaluated judging and jabbing responses to the target. Motor responses did not show a Roelofs effect, in spite of the prevention of fixation on the target. Further, a decision about which of two targets to jab did not result in cognitive-system information affecting the motor response. The Roelofs effect was present, however, in judging trials that also involved the saccadic task.
Visual telepresence systems which utilize virtual reality style helmet mounted displays have a number of limitations. The geometry of the camera positions and of the display is fixed and is most suitable only for viewing elements of a scene at a particular distance. In such a system, the operator's ability to gaze around without use of head movement is severely limited. A trade-off must be made between a poor viewing resolution or a narrow width of viewing field. To address these limitations, a prototype system where the geometry of the displays and cameras is dynamically controlled by the eye movement of the operator has been developed. This paper explores the reasons why it is necessary to actively adjust both the display system and the cameras, and furthermore justifies the use of mechanical adjustment of the displays as an alternative to adjustment by electronic or image processing methods. The electronic and mechanical design is described, including optical arrangements and control algorithms. An assessment of the performance of the system against a fixed camera/display system, when operators are assigned basic tasks involving depth and distance/size perception, is presented. The sensitivity to variations in transient performance of the display and camera vergence is also assessed.
Theodore T. Blackmon, Laurent Ngyuen, Charles Frederick Neveu, Daryl Rasmussen, Eric Zbinden, Mark Maimone, Larry Henry Matthies, Scott M. Thayer, James Teza, et al.
Initiated by the Department of Energy's International Nuclear Safety Program, an effort is underway to deliver and employ a telerobotic diagnostic system for structural evaluation and monitoring within the Chornobyl Unit-4 shelter. A mobile robot, named Pioneer, will enter the damaged Chornobyl structure and employ devices to measure radiation, temperature and humidity; acquire core samples of concrete structures for subsequent engineering analysis; and make photo-realistic 3D maps of the building interior. This paper details the latter element, dubbed 'C-Map', the Chornobyl Mapping System. C-Map consists of an automated 3D modeling system using stereo computer vision along with an interactive, virtual reality software program to acquire and analyze the photo-realistic 3D maps of the damaged building interior.
New paradigms such as 'affective computing' and user-based research are extending the realm of facets traditionally addressed in IR systems. This paper builds on previous research reported to the electronic imaging community concerning the need to provide access to more abstract attributes of images than those currently amenable to a variety of content-based and text-based indexing techniques. Empirical research suggests that, for visual materials, in addition to standard bibliographic data and broad subject, and in addition to such visually perceptual attributes as color, texture, shape, and position or focal point, additional access points such as themes, abstract concepts, emotions, stories, and 'people-related' information such as social status would be useful in image retrieval. More recent research demonstrates that similar results are also obtained with 'fine arts' images, which generally have no access provided for these types of attributes. Current efforts to match image attributes as revealed in empirical research with those addressed in both current text-based and content-based indexing systems are discussed, as well as the need for new representations for image attributes and for collaboration among diverse communities of researchers.
We are developing a system to implement gestural drawing in an immersive 3D environment. We present a virtual artist who draws expressive forms in virtual space. In the art world, the term 'gestural' commonly refers to mark making that derives from the richness of movement of the artist. This focus on the character of motion is much like a similar focus on follow-through in athletic activity. Accordingly, we base the appearance of the rendered image on the body language of the artist, hence the acronym BLUI. BLUI is developed on the ImmersaDESK, an immersive virtual reality environment where the artist wears head-tracking goggles and uses a wand. Information from video, wand, and head tracker is used to generate a virtual artist, whose brush tracks with the wand.
This paper proposes a systematic approach for exploring the interactions of aesthetic properties and design variables, integrating knowledge from other fields such as philosophy and the arts. Commonly accepted aesthetic properties and language terms used for evaluation and criticism are first discussed, and a common set of nine principles for achieving aesthetic products in a number of creative disciplines is identified. We then analyze the way these principles influence product characteristics and extract concrete and computable properties of products that may be varied to induce different aesthetic judgments and responses.
It is well known that colors displayed on two color devices differ from each other. Gamut compression techniques are required to compress colors displayed on a monitor with a large gamut to colors displayed on another monitor with a smaller gamut. Although the minimum color difference method is commonly used, such a method transforms a color out of the smaller gamut to a target color which is on the surface of the smaller gamut. In this method, a target color is set to the intersection of the surface of the gamut with a compression vector directed toward a certain point, such as L* = 50 in CIELAB color space. The point is used uniformly for all hues. We therefore investigated a gamut compression method, a modified centroid vector method, that adapts the target point of the compression vectors for six basic hue categories, based on subjective experiments using two CRT monitors. We carried out subjective experiments and determined the compression vectors. We compressed colors out of gamut with interpolation weights calculated in each of the six hue categories. We applied this method to natural images and a chart. The results showed this method was effective.
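The essence of centroid-style compression is to move out-of-gamut colors along a vector toward an anchor point until they reach the gamut surface. The sketch below illustrates this with a deliberately simplified spherical gamut; the paper's hue-dependent anchors and interpolation weights are not reproduced.

```python
import numpy as np

def compress_toward_anchor(lab, anchor, gamut_radius):
    """Centroid-style gamut compression sketch: a color outside the target
    gamut is moved along the vector toward the anchor (e.g. L* = 50 on
    the neutral axis, or a hue-dependent point) until it reaches the
    gamut surface. The gamut is modeled here as a sphere of radius
    gamut_radius around the anchor, a deliberate simplification of a
    real device gamut."""
    lab = np.asarray(lab, dtype=float)
    v = lab - anchor
    d = np.linalg.norm(v)
    if d <= gamut_radius:
        return lab                              # already inside: unchanged
    return anchor + v * (gamut_radius / d)      # clipped to the surface

anchor = np.array([50.0, 0.0, 0.0])             # L* = 50 on the neutral axis
print(compress_toward_anchor([70.0, 80.0, 10.0], anchor, gamut_radius=60.0))
```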
We have measured the contrast detection thresholds for small bandpass targets embedded in digitized monochrome photographs of natural scenes. The targets are used to probe the properties of watermarking patterns, which might be embedded in a photograph to state copyright or authenticity while remaining invisible to a human observer. Thresholds were measured for targets embedded in different parts of the photographs in order to determine where in a photograph it would be most suitable to hide a watermarking pattern. Thresholds were also compared when the photographs were bandpass filtered or notch filtered, in order to determine how the localized spectral energy in the photograph affected the visibility of a potential watermarking pattern. We also studied the visibility of targets embedded in synthetic pictures whose spectral amplitude was similar to that of natural scenes. The test targets were most easily visible when embedded in parts of photographs where the luminance was relatively uniform, and they were especially easy to see where the average luminance was low. This was explicable on a simple model of contrast encoding in the human visual system. The targets were much harder to see when embedded in contrast-rich parts of the digitized photographs. Indeed, the thresholds were elevated more than the simple human model predicted: the spatially-localized contrast energy in the photograph masked the test target effectively. The experiments with notch-filtered photographs produced surprising results that were not predicted at all by the human model. Even when the spectral energy was removed from the photograph in the band occupied by the test target, there was still substantial masking. This implies considerable masking between visual primitives encoding different spectral bands. It also implies that watermarking technology might be facilitated, since any contrast energy may hide a watermarking target regardless of their respective spectral content.
Many image discrimination models are available for static images. However, in many applications temporal information is important, so image fidelity metrics for image sequences are needed as well. Ahumada et al. presented a discrimination model for image sequences. It is unusual in that it does not decompose the images into multiple frequency and orientation channels. This helps make it computationally inexpensive. It was evaluated for predicting psychophysical experiments measuring contrast sensitivity and temporal masking, and the results were promising. In this paper we investigate the performance of the above-mentioned model in a practical application - surveillance with IR imagery. Model evaluation is based on two-alternative forced-choice experiments, using a staircase procedure to control signal amplitude. The observer is presented with two one-second-duration IR-image sequences, one of which has an added target signal. The observer's task is to guess which sequence contained the target. While the target is stationary in the image center, the background moves in one direction, simulating a tracking station in which the observer has locked on to the target. The results show that the model qualitatively has the desired behavior in four out of five cases.
We present a method for reconstructing multidimensional scaling (MDS) as a biologically plausible algorithm for storing object data. To do so, we must modify the definition of stress and the way the process is localized. We make these modifications by appealing to physical definitions of stress and deformation. In this framework, classical MDS becomes the system in which these quantities are modeled as perfectly elastic deformation, and a variety of systems can be created or trained which are, by contrast, viscoelastic. The resulting model is useful in applications in which the relationship between stress and the underlying metric used for MDS is complicated by local phenomena, or in which these quantities need to be modeled as learned or changing attributes.
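As a reference point for the 'perfectly elastic' baseline, here is a minimal sketch of classical MDS and the Kruskal stress-1 measure in Python. The paper's viscoelastic variants modify these definitions; only the standard forms are shown.

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical (metric) MDS from a matrix of pairwise distances.

    d -- (n, n) symmetric distance matrix
    k -- embedding dimension
    Double-centers the squared distances and takes the top-k
    eigenvectors: the 'perfectly elastic' baseline.
    """
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
    lam = np.maximum(vals[order], 0.0)
    return vecs[:, order] * np.sqrt(lam)

def stress(d, x):
    """Kruskal stress-1 between target distances d and embedding x."""
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return np.sqrt(((d - dx) ** 2).sum() / (d ** 2).sum())

# Toy check: points on a line are recovered with near-zero stress
pts = np.array([[0.0], [1.0], [2.0], [4.0]])
d = np.abs(pts - pts.T)
x = classical_mds(d, k=1)
print(f"stress = {stress(d, x):.2e}")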
Bootstrapping provides a novel approach to training a neural network to estimate the chromaticity of the illuminant in a scene given image data alone. For initial training, the network requires feedback about the accuracy of its current results. In the case of a network for color constancy, this feedback is the chromaticity of the incident scene illumination. In the past, perfect feedback has been used, but in the bootstrapping method feedback with a considerable degree of random error can be used to train the network instead. In particular, the grayworld algorithm, which only provides modest color constancy performance, is used to train a neural network which in the end performs better than the grayworld algorithm used to train it.
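A toy sketch of the bootstrapping idea: the grayworld estimate, rather than ground truth, supplies the training labels. The histogram features and the linear least-squares learner below are illustrative stand-ins for the paper's neural network and its inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def grayworld_chromaticity(img):
    """Grayworld illuminant estimate: mean RGB reduced to (r, g) chromaticity."""
    mean_rgb = img.reshape(-1, 3).mean(axis=0)
    return mean_rgb[:2] / mean_rgb.sum()

def features(img, bins=8):
    """Binarized chromaticity histogram as a crude stand-in for the net's input."""
    rgb = img.reshape(-1, 3)
    rg = rgb[:, :2] / rgb.sum(axis=1, keepdims=True)
    h, _, _ = np.histogram2d(rg[:, 0], rg[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    return (h > 0).astype(float).ravel()

# Bootstrap training: the label is the (imperfect) grayworld estimate,
# not the true illuminant chromaticity.
X, Y = [], []
for _ in range(200):                               # synthetic training scenes
    img = rng.uniform(0.05, 1.0, size=(32, 32, 3))
    X.append(features(img))
    Y.append(grayworld_chromaticity(img))
X, Y = np.array(X), np.array(Y)
w = np.linalg.lstsq(X, Y, rcond=None)[0]           # linear stand-in for the net
print("RMS error vs bootstrap labels:", np.sqrt(((X @ w - Y) ** 2).mean()))
```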
While it is recognized that images are described through color, texture and the shapes of objects in the scene, general image understanding is still very difficult. Thus, to perform image retrieval in a human-like manner one has to choose a specific domain, understand how users judge similarity within that domain and then build a system that duplicates human performance. Since color and texture are fundamental aspects of human perception we propose a set of techniques for the retrieval of color patterns. To determine how humans judge the similarity of color patterns we performed a subjective study. Based on the results of the study, the five most relevant visual categories for the perception of pattern similarity were identified. We also determined the hierarchy of rules governing the use of these categories. Based on these results we designed a system which accepts one or more texture images as input and, depending on the query, produces a set of choices that follow human behavior in pattern matching. Processing steps in our model follow those of the human visual system, resulting in perceptually based features and distance measures. As expected, search results closely correlate with human choices.
In computer vision and image processing, we often perform different processing on 'objects' than on 'texture'. In order to do this, we must have a way of localizing textured regions of an image. For this purpose, we suggest a working definition of texture: texture is a substance that is more compactly represented by its statistics than by specifying the configuration of its parts. Texture, by this definition, is stuff that seems to conform to the local statistics. Outliers, on the other hand, seem to deviate from the local statistics and tend to draw our attention, or 'pop out'. This definition suggests that to find texture we first extract certain basic features and compute their local statistics. Then we compute a measure of saliency, or the degree to which each portion of the image seems to be an outlier to the local feature distribution, and label as texture the regions with low saliency. We present a method, based upon this idea, for labeling points in natural scenes as belonging to texture regions. The method draws upon recent psychophysical results on the processing of texture and popout.
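A minimal sketch of the low-saliency labeling idea, using raw intensity as the only feature and a local z-score as the outlier measure. The paper's method uses richer features and psychophysically motivated statistics, so everything below (window size, threshold, feature choice) is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_mask(img, win=15, z_thresh=2.0):
    """Label pixels as 'texture' where they conform to local statistics.

    A pixel within z_thresh local standard deviations of the local mean
    is treated as belonging to the texture; larger deviations are
    outliers that 'pop out'.
    """
    mu = uniform_filter(img, size=win)                      # local mean
    var = uniform_filter(img ** 2, size=win) - mu ** 2      # local variance
    z = np.abs(img - mu) / np.sqrt(np.maximum(var, 1e-8))   # local z-score
    return z < z_thresh                                     # low saliency

rng = np.random.default_rng(1)
img = rng.normal(0.5, 0.05, size=(64, 64))                  # uniform texture
img[30:34, 30:34] = 1.0                                     # a conspicuous outlier
mask = texture_mask(img)
print("fraction labeled texture:", mask.mean())
```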
Feature detectors in the early preattentive stage of the Human Visual System (HVS) are believed to cause regions of the viewing field to be identified as perceptually salient, attracting the attention of the viewer. It is anticipated that this characteristic of the HVS can be incorporated into a feature-based fuzzy scene decomposition model, which will assist an image rendering system in allocating the highest levels of detail to the most conspicuous objects. Efficiency gains should occur, with minimal loss of perceptual image quality. This paper describes the early stages of the development of this fuzzy model for a small subset of commonly accepted visual features: color, size, location, edges and depth cues. Previous researchers have used arbitrary feature relationship models in image processing systems, with some success. Our aim is to improve on these models by integrating present knowledge of visual feature relationships with experimental results of our own, and to apply this model to the area of image synthesis. Preliminary results from experiments with size features are presented, along with planned experimentation for other visual features. This work will have applications in the areas of scientific visualization, vision simulation and entertainment.
Bottom-up or saliency-based visual attention allows primates to detect non-specific conspicuous targets in cluttered scenes. A classical metaphor, derived from electrophysiological and psychophysical studies, describes attention as a rapidly shiftable 'spotlight'. The model described here reproduces the attentional scanpaths of this spotlight: simple multi-scale 'feature maps' detect local spatial discontinuities in intensity, color, orientation or optical flow, and are combined into a unique 'master' or 'saliency' map. The saliency map is sequentially scanned, in order of decreasing saliency, by the focus of attention. We study the problem of combining feature maps, from different visual modalities and with unrelated dynamic ranges, into a unique saliency map. Four combination strategies are compared using three databases of natural color images: (1) simple normalized summation, (2) linear combination with learned weights, (3) global non-linear normalization followed by summation, and (4) local non-linear competition between salient locations. Performance was measured as the number of false detections before the most salient target was found. Strategy (1) always yielded the poorest performance and (2) the best, with a 3- to 8-fold improvement in time to find a salient target. However, (2) yielded specialized systems with poor generalization. Interestingly, strategy (4) and its simplified, computationally efficient approximation (3) yielded significantly better performance than (1), with up to 4-fold improvement, while preserving generality.
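Strategy (3) can be sketched compactly. The code below follows the published form of the global non-linear normalization operator (scale each map to a fixed range, then weight it by the squared difference between its global maximum and the mean of its other local maxima); the map sizes and the neighborhood parameter for local-maximum detection are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(m, m_max=1.0, neighborhood=16):
    """Global non-linear normalization, strategy (3) in the comparison.

    Scales the map to [0, m_max], then multiplies by (M - mbar)^2, where
    M is the global maximum and mbar the mean of the other local maxima.
    Maps with one strong peak are promoted; maps with many comparable
    peaks are suppressed.
    """
    m = m - m.min()
    if m.max() > 0:
        m = m * (m_max / m.max())
    local_max = (m == maximum_filter(m, size=neighborhood)) & (m > 0)
    peaks = m[local_max]
    if peaks.size < 2:
        return m
    M = peaks.max()
    mbar = peaks[peaks < M].mean() if (peaks < M).any() else 0.0
    return m * (M - mbar) ** 2

rng = np.random.default_rng(2)
feature_maps = [rng.random((64, 64)) for _ in range(3)]
saliency = sum(normalize_map(f) for f in feature_maps)   # summation step
print("saliency map range:", saliency.min(), saliency.max())
```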
Strong evidence has shown that visual processing based on selective attention is both data- and knowledge-driven. However, most previous work has focused mainly on the former. We propose in this paper a new selective-attention visual computing model based on both. The novelty lies in: (1) a structure-variable non-uniform sampling method is proposed to separate visual computing into foveal and peripheral channels; (2) a combination of the bottom-up and the top-down selective attention mechanisms based on a two-layered pyramid is proposed - the data-driven bottom-up selective attention includes the sequential extraction of feature maps, conspicuity maps and interest maps based on multi-channel filtering and a relaxation process, while the knowledge-driven top-down selective attention is based on distributed associative memory mapping; (3) a movement control mechanism is also proposed. Experimental results on artificial and real images demonstrate the validity of our model.
The scanpath theory proposed that an internal spatial-cognitive model controls perception and the active looking eye movements, EMs, of the scanpath sequence. Evidence for this came from new quantitative methods, from experiments with ambiguous figures and visual imagery, and from MRI studies, all on cooperating human subjects. Besides recording EMs, we introduce other experimental techniques wherein the subject must depend upon memory bindings, as in visual imagery, but may call upon motor behaviors other than EMs to read out the remembered patterns. How is the internal model distributed and operationally assembled? The concept of binding speaks to the assigning of values for the model and its execution in various parts of the brain. Current neurological information helps to localize different aspects of the spatial-cognitive model in the brain. We suppose that there are several levels of 'binding' -- semantic or symbolic binding, structural binding for the spatial locations of the regions-of-interest, and sequential binding for the dynamic execution program that yields the sequence of EMs. Our aim is to dissect out the respective contributions of these different forms of binding.
An eye-movement sequence, or scanpath, during viewing of a stationary stimulus has been described as a set of fixations onto regions-of-interest, ROIs, and the saccades or transitions between them. Such scanpaths have high similarity for the same subject and stimulus, both in the spatial loci of the ROIs and in their sequence; scanpaths also take place during recollection of a previously viewed stimulus, suggesting that they play a similar role in visual memory and recall.
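Scanpath similarity of this kind is classically quantified by coding each fixation by its ROI label and comparing the resulting strings with an edit distance. A minimal sketch, assuming single-character ROI labels:

```python
def scanpath_similarity(s1, s2):
    """Similarity of two scanpaths coded as ROI strings (e.g. 'ABCAB').

    Uses the string-editing distance (insertions, deletions,
    substitutions) employed in the scanpath literature to compare
    fixation sequences, normalized to a 0..1 similarity score.
    """
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1.0 - d[m][n] / max(m, n, 1)

# Same ROIs in nearly the same order score high:
print(scanpath_similarity("ABCDA", "ABCDA"))   # 1.0
print(scanpath_similarity("ABCDA", "ABDCA"))   # 0.6
```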
There is, at present, a critical need within image retrieval research for an image testbed which would enable the objective evaluation of different content-based search engines, indexing and metadata schemes, and search heuristics, as well as research and evaluation in image-based knowledge structures and system architectures, users' needs in image retrieval, and the cognitive processes involved in image searching. This paper discusses a pilot project specifying and establishing a prototype testbed for the evaluation of image retrieval techniques. A feasibility study is underway focusing on the development of a large set of standardized test images accessible through a web interface, and researchers in the field are being surveyed for input. Areas being addressed in the feasibility study include technical specifications as well as content issues such as: which specific image domains to include; the useful proportion of images belonging to specific domains to images belonging to a general 'world' domain; the types of image attributes and the baseline and 'advanced' levels of image description needed; the research needs to be accommodated; the development of a standardized set of test queries; and the establishment of methods for 'truthing' the database and test queries.
Models that predict human performance on narrow classes of visual stimuli abound in the vision science literature. However, the vision and applied imaging communities need robust general-purpose, rather than narrow, computational human visual system models to evaluate image fidelity and quality and ultimately improve imaging algorithms. Psychophysical measures of image quality are too costly and time consuming to gather for evaluating the impact each algorithm modification might have on image quality.
We have developed a focused-procedure based upon a collection of image processing algorithms that serve to identify regions-of-interest (ROIs) over a digital image. The loci of these ROIs are quantitatively compared with ROIs identified by human eye fixations, or glimpses, while subjects were looking at the same digital images. The focused-procedure is applied to adjust and adapt the compression ratio over a digital image: high resolution and weak compression for ROIs; low resolution and strong compression for the major expanse of the entire image. In this way, an overall high compression ratio can be achieved while at the same time preserving important visual information within particularly relevant regions of the image. We have bundled the focused-procedure with JPEG, so that the result of the compression can be formatted into a file compatible with standard JPEG decoding. Thus, once the image has been compressed, it can be read without difficulty.
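The effect of ROI-adaptive compression can be mimicked with a stock JPEG codec by encoding at two quality levels and compositing by the ROI mask. This is only a crude, illustrative approximation of the bundled focused-procedure (a real implementation would vary quantization within a single codestream); the quality settings and random test image are assumptions. A sketch using Pillow:

```python
import numpy as np
from PIL import Image
from io import BytesIO

def roi_adaptive_jpeg(img, roi_mask, q_roi=90, q_bg=20):
    """Approximate ROI-adaptive compression with two standard JPEG passes.

    Encodes the image at high and low quality, then composites by the
    ROI mask, so the result can be re-saved as an ordinary JPEG that any
    standard decoder can read.
    """
    def encode(im, q):
        buf = BytesIO()
        im.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        return Image.open(buf).convert("RGB")

    hi, lo = encode(img, q_roi), encode(img, q_bg)
    mask = Image.fromarray((roi_mask * 255).astype(np.uint8)).convert("L")
    return Image.composite(hi, lo, mask)   # mask selects the high-quality pixels

rng = np.random.default_rng(3)
img = Image.fromarray(rng.integers(0, 256, (128, 128, 3), dtype=np.uint8))
roi = np.zeros((128, 128), dtype=np.uint8)
roi[32:96, 32:96] = 1                      # central region-of-interest
out = roi_adaptive_jpeg(img, roi)
print(out.size)
```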
A vision-based tele-operating system has been designed and simulated in our lab with model-based supervisory control and model-based, top-down image processing, IP, for robot pose recovery. A secondary global positioning system, GPS, is used as backup for situations in which IP is not expected to work. These modules have been integrated to achieve robust performance of the system with reduced human attendance. Robust top-down IP with near-real-time feedback is achieved by a pose extraction algorithm based on the Scanpath Theory of human eye movement. Extensive model-directed pre-filtering, low-level image processing and post-filtering of visual images, as well as model-directed data fusion, are used to ensure consistency between the internal model and the external environment. Simulation of the system under a wide range of image and plant noise was performed to verify the stability of the system, as well as to characterize the influence of such noise injection and the system's modes of failure.
In this paper we introduce a new method for determining the relationship between signal spectra and camera RGB, which is required for many applications in color. We work with the standard camera model, which assumes that the response is linear. We also provide an example of how the fitting procedure can be augmented to include a previously estimated non-linearity. The basic idea of our method is to minimize squared error subject to linear constraints, which enforce positivity and limit the range of the result. It is also possible to constrain the smoothness, but we have found that it is better to add a regularization term to the objective function to promote smoothness. With this method, smoothness and error can be traded against each other without being restricted by arbitrary bounds. The method is easily implemented, as it is an example of a quadratic programming problem, for which many software solutions are available. In this paper we provide results of using this method and others to calibrate a Sony DXC-930 CCD color video camera. We find that the method gives low error while delivering sensors which are smooth and physically realizable. Thus we find the method superior to methods which ignore any of these considerations.
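A minimal sketch of this kind of constrained, regularized fit. The paper poses a quadratic program; the version below handles only the positivity constraint, folding the smoothness penalty into an augmented least-squares system solved with scipy's NNLS, so the difference operator, weight and toy data are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_sensor(S, r, lam=0.1):
    """Positivity-constrained, smoothness-regularized sensor estimation.

    S   -- (n_measurements, n_wavelengths) stimulus spectra
    r   -- (n_measurements,) camera responses for one channel
    lam -- smoothness weight, traded against fit error

    Minimizes ||S c - r||^2 + lam * ||L c||^2 subject to c >= 0, where L
    is a second-difference operator penalizing rough sensors.
    """
    n = S.shape[1]
    L = np.diff(np.eye(n), n=2, axis=0)       # second-difference operator
    A = np.vstack([S, np.sqrt(lam) * L])      # stack data and penalty rows
    b = np.concatenate([r, np.zeros(L.shape[0])])
    c, _ = nnls(A, b)                         # non-negative least squares
    return c

# Toy example: recover a smooth Gaussian-shaped sensor from noisy responses
rng = np.random.default_rng(4)
n = 40
true_c = np.exp(-0.5 * ((np.arange(n) - 20) / 5.0) ** 2)
S = rng.random((200, n))
r = S @ true_c + rng.normal(0, 0.05, 200)
est = estimate_sensor(S, r)
print("max abs error:", np.abs(est - true_c).max())
```

Raising lam smooths the recovered sensor at the cost of fit error, which is exactly the trade-off the abstract describes.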
One of the important functions of DCPs, on which image quality largely depends, is the calculation of the color matrix from the CCD's optical filter output. In this paper we propose an algorithm which comes closer to the optimal solution for generating color component data from the CCD filter output. As far as image quality is concerned, the ideal would be to acquire complete color data per pixel without any noise and without using an optical low-pass filter. The most important roles of the optical low-pass filter are to scatter the rays penetrating the lens system and to cut off infrared light. Basically, the proposed algorithm is a non-filtered color reproduction flow. First, the algorithm corrects the errors introduced along the optical path in order to obtain a better interpolation result. Then, different procedures are adopted for green-component and red- or blue-component interpolation, based on edge information extracted from the green image. The last step is edge-preserving smoothing, in which high-quality image data are obtained by preserving edge segments while smoothing the overall image.
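The edge-directed step can be illustrated for the green channel: at each missing-green site, interpolate along whichever direction (horizontal or vertical) has the smaller gradient. A toy sketch, assuming a GRBG-style checkerboard of green sites; the paper's full pipeline (optical-path correction, red/blue handling, edge-preserving smoothing) is not reproduced.

```python
import numpy as np

def interpolate_green(bayer, pattern_green=((0, 0), (1, 1))):
    """Edge-directed green interpolation at red/blue sites of a Bayer mosaic.

    At each missing-green pixel, compare horizontal and vertical gradients
    and average along the direction of the smaller gradient, so edges are
    interpolated along rather than across. Borders are left untouched.
    """
    g = bayer.astype(float).copy()
    h, w = g.shape
    green_sites = np.zeros((h, w), dtype=bool)
    for (dy, dx) in pattern_green:             # checkerboard of green sites
        green_sites[dy::2, dx::2] = True
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if green_sites[y, x]:
                continue
            dh = abs(g[y, x - 1] - g[y, x + 1])    # horizontal gradient
            dv = abs(g[y - 1, x] - g[y + 1, x])    # vertical gradient
            if dh < dv:
                g[y, x] = 0.5 * (g[y, x - 1] + g[y, x + 1])
            else:
                g[y, x] = 0.5 * (g[y - 1, x] + g[y + 1, x])
    return g

rng = np.random.default_rng(5)
mosaic = rng.integers(0, 256, (16, 16)).astype(float)
green = interpolate_green(mosaic)
print(green.shape)
```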
Directional filters are not normally used as pre-filters for optical flow estimation because orientation selectivity tends to increase the aperture problem. Despite this fact, here we apply a subband decomposition using directional spatio-temporal filters at different resolutions to discriminate multiple motions at the same location. We first obtain multiple estimates of the velocity by applying the classic gradient constraint to the output of each filter. Spatio-temporal gradients of the GD2 channel responses are easily obtained as linear combinations of the set of 10 separable GD3 channel responses, which constitutes a multipurpose scheme for the visual representation of image sequences. Then, we obtain an overdetermined linear system by imposing locally constant velocity. This system is solved by least-squares, yielding an estimate of the velocity and its covariance matrix. After segmenting the resulting 6 x 3 velocity estimates we combine them using Bayesian probability rules. Segmentation maintains the ability to represent multiple motions, while the combination of estimates reduces the aperture problem. Results for synthetic and real sequences are highly satisfactory. Mean errors on complex standard sequences are below those of most published methods.
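The classic gradient constraint and its least-squares solution can be sketched in a few lines. The version below works on raw intensity over a single window rather than on directional-filter outputs, so it illustrates only the core estimation step; the window size and toy sequence are illustrative assumptions.

```python
import numpy as np

def local_velocity(frames, y, x, win=7):
    """Least-squares velocity from the gradient constraint in one window.

    frames -- (t, h, w) sequence; the constraint Ix*u + Iy*v = -It is
    solved over a win x win patch of the middle frame, returning the
    velocity and (A^T A)^-1, the unnormalized covariance used to weight
    estimates before combination.
    """
    It, Iy, Ix = np.gradient(frames.astype(float))
    r = win // 2
    ys, xs = slice(y - r, y + r + 1), slice(x - r, x + r + 1)
    a = np.stack([Ix[1, ys, xs].ravel(), Iy[1, ys, xs].ravel()], axis=1)
    b = -It[1, ys, xs].ravel()
    ata = a.T @ a
    vel = np.linalg.solve(ata, a.T @ b)
    return vel, np.linalg.inv(ata)

# Toy sequence: a 2-D sinusoidal pattern translating 1 pixel/frame rightward
yy, xx = np.mgrid[0:32, 0:32].astype(float)
pattern = np.sin(0.5 * xx) * np.sin(0.7 * yy)
frames = np.stack([np.roll(pattern, t, axis=1) for t in range(3)])
vel, cov = local_velocity(frames, 16, 16)
print("estimated (u, v) ~ (1, 0):", np.round(vel, 2))
```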
Visual field evaluation is important in the detection, diagnosis and assessment of ophthalmologic and neurologic dysfunctions. Kohn and Clynes were the first to appreciate that the pupil reacts to changes in the wavelength composition of a stimulus independently of luminance. We have also shown that the effective pupillary area changes at varying perimetric angles. Based on these studies, a new technique is described to quantify the pupillary response during a visual field examination, using a signal- and image-processing system combined with chromatic LED stimuli.
Wavelet transforms are efficient tools for texture analysis and classification. Separable techniques are classically used but present several drawbacks. First, the diagonal coefficients contain poor information. Second, the other coefficients contain useful information only if the texture is oriented in the vertical or horizontal direction. We therefore propose an approach to texture analysis using a non-separable transform. The quincunx scheme allows an improved interscale resolution, and this analysis leads to only one detail image in which no particular orientation is favored. New orthogonal isotropic filters for the decomposition are constructed by applying the McClellan transform to one-dimensional B-spline filters. The wavelet functions obtained have better isotropy and frequency properties than those previously proposed by Feauveau. Since the resulting filters are IIR, an implementation of the whole transform in the Fourier domain is proposed. Texture analysis is performed on the wavelet detail coefficients: simple parameters are calculated at each scale, and the evolution of these parameters over scales is used as a multiscale signature to characterize the different textures. An application of this method is proposed for the analysis of human cells, where the aim is to distinguish states of evolution. While classical monoscale methods provide no information on these images, the proposed process allows several states to be identified. In this process a reference curve is constructed for each state, calculated from the multiscale variance of known images. When a new image is analyzed, a new evolution curve is calculated and its distance to the references is measured. This technique is more efficient than classical ones because multiscale information is used.
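The multiscale signature and nearest-reference classification can be sketched as follows. For simplicity this uses a separable transform via PyWavelets (pooling the three detail subbands per level) rather than the paper's quincunx decomposition, so the transform, the toy textures and the distance-based classifier are illustrative assumptions.

```python
import numpy as np
import pywt

def multiscale_variance(img, levels=3, wavelet="haar"):
    """Variance of detail coefficients at each scale as a texture signature."""
    coeffs = pywt.wavedec2(img, wavelet, level=levels)
    sig = []
    for details in coeffs[1:]:                  # skip the approximation band
        pooled = np.concatenate([d.ravel() for d in details])
        sig.append(pooled.var())
    return np.array(sig)                        # coarse-to-fine signature

def classify(signature, references):
    """Assign to the reference curve at minimal Euclidean distance."""
    names = list(references)
    dists = [np.linalg.norm(signature - references[n]) for n in names]
    return names[int(np.argmin(dists))]

rng = np.random.default_rng(6)
fine = rng.normal(size=(64, 64))                              # fine-grained texture
coarse = np.kron(rng.normal(size=(16, 16)), np.ones((4, 4))) # blocky texture
refs = {"fine": multiscale_variance(fine),
        "coarse": multiscale_variance(coarse)}
probe = np.kron(rng.normal(size=(16, 16)), np.ones((4, 4)))
print(classify(multiscale_variance(probe), refs))             # expect 'coarse'
```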
This paper describes the design and some preliminary results of a visual study that was conducted over the WWW. The subjects connected to our server over the Internet, and their own computers were controlled with Java software to produce the visual stimuli. By these means we were able to access a large population of subjects at very low cost, and we were able to conduct a large-scale study in a small amount of time. We developed tools and techniques that allowed some degree of calibration of the display and the viewing conditions, so that the results obtained from the different subjects could be analyzed. We found that we could get good estimates of the gamma values and pixel sizes of the subjects' displays. However, we also encountered some problems that may limit the types of experiments that can be conducted over the WWW with the present technology. In particular, we found that we could not consistently control the presentation time of the stimuli, due to inconsistencies between implementations of Java on different platforms.
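Display gamma can be estimated remotely with a simple psychophysical match: the subject adjusts a uniform patch until it matches a 50% black/white dither, whose mean physical luminance is half the maximum. The abstract does not state the authors' exact procedure, so the sketch below is only the standard version of this trick.

```python
import math

def gamma_from_match(matched_level, max_level=255):
    """Display gamma from a dither-matching measurement.

    A 50% spatial dither of black and full-white pixels has a mean
    physical luminance of half the maximum. If a uniform patch of
    digital level v matches the dither, then (v / max_level) ** gamma
    = 0.5, so gamma can be solved for directly.
    """
    ratio = matched_level / max_level
    return math.log(0.5) / math.log(ratio)

# A subject matching at level 186 implies gamma of about 2.2:
print(round(gamma_from_match(186), 2))
```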
The perception of an image by a human observer is usually modeled as a parallel process in which all parts of the image are treated more or less equivalently. In reality, however, the analysis of scenes is a highly selective procedure in which only a small subset of image locations is processed by the precise and efficient neural machinery of foveal vision. To understand the principles behind this selection of the 'informative' regions of images, we have developed a hybrid system which combines a knowledge-based reasoning system with low-level preprocessing by linear and nonlinear neural operators. This hybrid system is intended as a first step towards a complete model of the sensorimotor system of saccadic scene analysis. In the analysis of a scene, the system calculates at each step which eye movement has to be made to gain a maximum of information about the scene. The possible information gain is calculated by means of a parallel strategy which is suitable for adaptive reasoning. The output of the system is a fixation sequence and, finally, a hypothesis about the scene.