Image fidelity is the subset of overall image quality that specifically addresses the visual equivalence of two images. This paper describes an algorithm for determining whether the goal of image fidelity is met as a function of display parameters and viewing conditions. Using a digital image processing approach, this algorithm is intended for the design and analysis of image processing algorithms, imaging systems, and imaging media. The visual model, which is the central component of the algorithm, comprises three parts: an amplitude nonlinearity, a contrast sensitivity function, and a hierarchy of detection mechanisms.
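As a rough illustration of that three-stage structure, the sketch below chains an amplitude nonlinearity, a CSF weighting, and a simple detection stage. Everything in it is an assumption for illustration only: the cube-root nonlinearity, the Mannos-Sakrison CSF form standing in for the paper's function, the 32 pixels-per-degree display, and the 0.01 detection threshold.

    import numpy as np

    def amplitude_nonlinearity(luminance):
        # Compressive response to luminance; the cube root is an assumed form.
        return luminance ** (1.0 / 3.0)

    def csf_filter(img, ppd):
        # Weight each spatial frequency by a generic contrast sensitivity
        # function (Mannos-Sakrison form), standing in for the paper's CSF.
        fy = np.fft.fftfreq(img.shape[0]) * ppd   # cycles/degree
        fx = np.fft.fftfreq(img.shape[1]) * ppd
        f = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
        csf = 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)
        return np.real(np.fft.ifft2(np.fft.fft2(img) * csf))

    def visible_difference(reference, test, ppd=32.0, jnd=0.01):
        # Crude stand-in for the hierarchy of detection mechanisms: mark
        # locations where the filtered responses differ by more than one
        # assumed just-noticeable difference.
        r = csf_filter(amplitude_nonlinearity(reference), ppd)
        t = csf_filter(amplitude_nonlinearity(test), ppd)
        return np.abs(r - t) > jnd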
The urge to compress the amount of information needed to represent digitized images while preserving perceptual image quality has led to a plethora of image-coding algorithms. At high data compression ratios, these algorithms usually introduce several coding artifacts, each impairing image quality to a greater or lesser extent. These impairments often occur simultaneously. For the evaluation of image-coding algorithms, it is important to find out how these impairments combine and how this can be described. The objective of the present study is to show that Minkowski-metrics can be used as a combination rule for small impairments like those usually encountered in digitally coded images. To this end, an experiment has been conducted in which subjects assessed the perceptual quality of scale-space-coded color images comprising three kinds of impairment, viz., 'unsharpness', 'phantoms' (dark/bright patches within bright/dark homogeneous regions) and 'color desaturation'. The results show an accumulation of these impairments that is efficiently described by a Minkowski-metric with an exponent of about two. The latter suggests that digital-image-coding impairments may be represented by a set of orthogonal vectors along the axes of a multidimensional Euclidean space. An extension of Minkowski-metrics is presented to generalize the proposed combination rule to large impairments.
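The combination rule itself is compact enough to state in a few lines. A minimal sketch, with impairment strengths on an arbitrary common scale (the sample values are invented):

    import numpy as np

    def combined_impairment(strengths, p=2.0):
        # Minkowski combination D = (sum_i d_i**p) ** (1/p).  With p ~ 2 the
        # impairments add like orthogonal vectors in a Euclidean space, as
        # the study's data suggest.
        d = np.asarray(strengths, dtype=float)
        return (d ** p).sum() ** (1.0 / p)

    # Unsharpness, phantoms, and color desaturation scores (invented values):
    print(combined_impairment([1.0, 0.5, 0.25]))  # ~1.15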
In digital imaging systems it is often necessary to reduce the bit precision of an image due to limitations in transmission or storage. The 8 bit storage of a 12 bit medical image is a good example. Generally, the input images are linearly quantized, and the goal is to find the fixed, digital mapping that preserves the highest image quality at the reduced bit precision. Since the response of the visual system to brightness differences is nonlinear, the optimal mapping is nonlinear. The traditional approach is to use one of the commonly accepted models of the visual system, e.g. a logarithm or power-law, to construct a Look-Up-Table (LUT) that performs the digital mapping. This paper will demonstrate that this approach is visually suboptimal for finite input precision, even if the visual model is perfect. A better method for constructing the digital mapping or LUT will be derived by posing the problem as a combinatorial optimization problem of taking N bits from M bits, where N is less than M, such that a visual distortion metric is minimized. Computer generated images will be used to demonstrate the method in a 12 bit to 8 bit application, and a 6 bit to 2 bit example will be included to illustrate its convergence characteristics.
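A minimal sketch of the idea follows, assuming a cube-root perceptual transform and a Lloyd-style relaxation in place of the paper's combinatorial search; the 6-bit-to-2-bit case mirrors the convergence example mentioned above. The point is that the output codes adapt to the actual discrete input levels rather than being read off a fixed log or power-law curve.

    import numpy as np

    def perceptual(levels):
        # Assumed visual response: a cube-root, lightness-like transform.
        return levels ** (1.0 / 3.0)

    def design_lut(m_bits=6, n_bits=2, iters=50):
        # Pick 2**n output codes for 2**m linearly quantized inputs so that
        # quantization error measured AFTER the perceptual transform is
        # small.  A Lloyd-style iteration, a simplification of the
        # combinatorial optimization posed in the paper.
        inputs = np.arange(2 ** m_bits) / (2 ** m_bits - 1)
        v = perceptual(inputs)
        codes = np.quantile(v, np.linspace(0, 1, 2 ** n_bits))  # initial guess
        for _ in range(iters):
            assign = np.argmin(np.abs(v[:, None] - codes[None, :]), axis=1)
            for k in range(len(codes)):        # move each code to its mean
                if np.any(assign == k):
                    codes[k] = v[assign == k].mean()
        return assign  # the LUT: input code -> output code index

    lut = design_lut()  # the 6-bit-to-2-bit example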
We measured the overall luminance, the luminance profile of individual pixels, and the interactions among neighboring pixels of a grayscale (i.e., non-color) CRT, using a two-dimensional CCD camera. We find that the luminance of a pixel depends on the luminance of the two preceding pixels in the raster, and that the interaction results in superadditivity. Furthermore, the luminance of a pixel depends on the proportion of the screen illuminated. Two consequences of these nonlinearities are that average luminance (as in a halftone image) cannot be predicted from the linear summation of individual pixel spread functions, and that inverting the polarity of a display does not simply invert the luminance profiles. These interactions must be taken into account wherever the luminance profile of displayed stimuli is important.
The contrast sensitivity of the human eye and its dependence on luminance and display size is described on the basis of internal noise in the visual system. With the addition of a global description of the optical MTF of the eye, a complete physical model is obtained for the spatial contrast sensitivity function. The results of this model are compared with measurements published by various authors.
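The structure of such a model can be sketched compactly: the eye's optics attenuate the stimulus, while an internal noise floor, falling with luminance and display size, sets the detectable contrast. All constants below are illustrative assumptions, not the paper's fitted values, and the low-frequency decline of the real CSF is omitted.

    import numpy as np

    def optical_mtf(f, sigma_deg=0.5 / 60.0):
        # Gaussian approximation of the eye's optical MTF; sigma is an
        # assumed point-spread width in degrees (here half an arc minute).
        return np.exp(-2.0 * (np.pi * sigma_deg * f) ** 2)

    def contrast_sensitivity(f, luminance=100.0, area_deg2=100.0):
        # Noise-limited sensitivity: photon noise shrinks as luminance and
        # stimulus area grow, but a fixed neural noise floor remains.
        photon_noise = 0.03 / np.sqrt(luminance * area_deg2)
        neural_noise = 0.002
        return optical_mtf(f) / np.hypot(photon_noise, neural_noise)

    # Sensitivity at 4 cycles/degree for a 100 cd/m^2, 10x10 degree display:
    print(contrast_sensitivity(4.0))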
International standards require that VDU products be flicker-free for at least 90% of the user population. Many methods have been proposed to achieve this objective. However, it is extremely difficult to validate or compare data between the various methods because of the nature of subjective variability and the lack of a common quantifiable reference. This paper describes an objective flicker measurement technique that provides such a reference, and also two subjective experiments designed to validate the measurement accuracy.
Temporal sampling artifacts may cause jitter in moving video images. Most often, these artifacts are attributed to aliasing in the spatiotemporal spectrum of the image. However, the spatiotemporal spectrum is only a mathematical representation, so the practical value of this approach depends on a number of assumptions about human vision. Most importantly, there must be a linear summation of the individual components. This assumption was tested with both quantitative and informal experiments. Initial results showed that the spatiotemporal model was accurate for simple sine-wave gratings and slightly overestimated aliasing for more complex gratings. However, it was possible to create compound gratings where the model grossly overestimated aliasing. Results showed that, in general, the assumption of linear summation is unwarranted and that jitter in complex images cannot always be predicted from their constituent components. One practical implication of this result is that image quality testing should include natural images.
Using Human Visual Models to Develop and Evaluate Halftoning Algorithms
In this work, we propose a new method to generate halftone images which are visually optimized for the display device. The algorithm searches for a binary array of pixel values that minimizes the difference between the perceived displayed continuous-tone image and the perceived displayed halftone image. The algorithm is based on the direct binary search (DBS) heuristic. Since the algorithm is iterative, it is computationally intensive. This limits the complexity of the visual model that can be used. It also impacts the choice of the metric used to measure distortion between two perceived images. In particular, we use a linear, shift-invariant model with a point spread function based on measurement of contrast sensitivity as a function of spatial frequency. The non-ideal spot shape rendered by the output devices can also have a major effect on the displayed halftone image. This source of non-ideality is explicitly accounted for in our model for the display device. By recursively computing the change in perceived mean-squared error due to a change in the value of a binary pixel, we achieve a substantial reduction in computational complexity. The effect of a trial change may be evaluated with only table lookups and a few additions.
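A brute-force rendition of the DBS loop conveys the idea; it recomputes the full perceived error after every trial toggle, whereas the paper's recursive update reduces each trial to table lookups and a few additions. The Gaussian eye filter is a stand-in for the CSF-derived point spread and printer spot model.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def perceived(img, sigma=1.5):
        # Stand-in linear, shift-invariant eye model (Gaussian point spread).
        return gaussian_filter(img.astype(float), sigma)

    def dbs_halftone(gray, sweeps=5):
        # Direct binary search: start from a thresholded image, then keep
        # any pixel toggle that lowers the perceived mean-squared error.
        h = (gray > 0.5).astype(float)
        target = perceived(gray)
        for _ in range(sweeps):
            improved = False
            for i in range(h.shape[0]):
                for j in range(h.shape[1]):
                    before = np.sum((perceived(h) - target) ** 2)
                    h[i, j] = 1.0 - h[i, j]             # trial toggle
                    after = np.sum((perceived(h) - target) ** 2)
                    if after >= before:
                        h[i, j] = 1.0 - h[i, j]         # revert
                    else:
                        improved = True
            if not improved:                             # converged
                break
        return h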
When models of human vision adequately measure the relative quality of candidate halftonings of an image, the problem of halftoning the image becomes equivalent to the search problem of finding a halftone that optimizes the quality metric. Because of the vast number of possible halftones, and the complexity of image quality measures, this principled approach has usually been put aside in favor of fast algorithms that seem to perform well. We find that the principled approach can lead to a range of useful halftoning algorithms, as we trade off speed for quality by varying the complexity of the quality measure and the thoroughness of the search. High quality halftones can be obtained reasonably quickly, for example, by using as the measure the vector length of the error image filtered by a contrast sensitivity function and, as the search procedure, the sequential adjustment of individual pixels to improve the quality measure. If computational resources permit, simulated annealing can find nearly optimal solutions.
We present a new class of dithering algorithms for black and white (b/w) images. The basic idea behind our technique is to divide the image into small blocks and minimize the distortion between the original continuous tone image and its low pass filtered halftone. This corresponds to a quadratic programming problem with linear constraints which is solved via standard optimization techniques. Examples of b/w halftone images using our technique are compared to images obtained via the error diffusion algorithm.
An optimization approach to the design of dither matrices used in the dispersed-dot digital halftoning method is described. Digital halftoning techniques are used to render continuous-tone images on high resolution binary display devices. An important class of digital halftoning techniques converts each gray level in an image into a binary pattern. The design of such patterns is important to reduce visible artifacts so as to render the image with higher fidelity. An approach to design based on shaping the spectrum of the dithering signal according to a model of the human visual system is presented.
This paper presents a method of dithering which attempts to exploit the inevitable textures generated by all dithering schemes. We concentrate on rendering continuous tone (monochrome or color) images on a CRT display with a small number (on the order of 16-256) of distinct colors. Monochrome (especially bi-level) dithering techniques are well studied. We have previously demonstrated that texture introduced by the dithering process can significantly affect the appearance of the image. We then developed a scheme by which the user had some control over these texture effects. The primary tradeoff was between very fine grained textures which depend critically on the local gray level and relatively coarser, more obvious, textures which appear uniform across the entire image. In this paper we try to actively exploit the texture effects to enhance the appearance of the rendered image. The key idea is to use our previously described hybrid method (which combines ordered-dither with error diffusion) but to choose (locally) different ordered-dither matrices based on measured properties of the original image (e.g., the local gradient). We also show how to use anisotropic error diffusion to generate similar texture effects.
Error diffusion is a powerful means to improve the subjective quality of a quantized image by shaping the spectrum of the display error. Considering an image in raster ordering, this is done by adding a weighted sum of previous quantization errors to the current pixel before quantization. These weights form an error diffusion filter. In this paper a method is proposed to find an optimized error diffusion filter for image display applications. The design is based on the lowpass characteristic of the contrast sensitivity of the human visual system. The filter is chosen so that a cascade of the quantization system and the observer's visual modulation transfer function yields a whitened spectrum of error. It is shown in this paper that the optimal error diffusion filter corresponds to a linear prediction filter of the human visual transfer function. A first order linear filter for an underlying non-separable vision model is examined. The resulting images contain mostly high frequency components of the display error, which are less noticeable for the viewer. This corresponds well to previously published results about the visibility of halftoning patterns. An informal comparison with other error diffusion algorithms shows less artificial contouring and increased image quality.
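The raster-order mechanics described above look like this in code; the weights below are the familiar Floyd-Steinberg set, used only as a placeholder for the optimized filter the paper derives from a linear predictor of the visual transfer function.

    import numpy as np

    def error_diffusion(gray):
        # Weights for (same row, x+1) and (next row, x-1 .. x+1); these are
        # Floyd-Steinberg's values, standing in for the optimized filter.
        weights = {(0, 1): 7/16, (1, -1): 3/16, (1, 0): 5/16, (1, 1): 1/16}
        img = gray.astype(float).copy()
        out = np.zeros_like(img)
        H, W = img.shape
        for y in range(H):
            for x in range(W):
                out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
                err = img[y, x] - out[y, x]       # quantization error
                for (dy, dx), w in weights.items():
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        img[yy, xx] += err * w    # diffuse to unvisited pixels
        return out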
A least-squares model-based approach to digital halftoning is proposed. It exploits both a printer model and a model for visual perception. It attempts to produce an 'optimal' halftoned reproduction, by minimizing the squared error between the response of the cascade of the printer and visual models to the binary image and the response of the visual model to the original gray-scale image. Conventional methods, such as clustered ordered dither, use the properties of the eye only implicitly, and resist printer distortions at the expense of spatial and gray-scale resolution. In previous work we showed that our printer model can be used to modify error diffusion to account for printer distortions. The modified error diffusion algorithm has better spatial and gray-scale resolution than conventional techniques, but produces some well known artifacts and asymmetries because it does not make use of an explicit eye model. Least-squares model-based halftoning uses explicit eye models and relies on printer models that predict distortions and exploit them to increase, rather than decrease, both spatial and gray-scale resolution. We have shown that the one-dimensional least-squares problem, in which each row or column of the image is halftoned independently, can be implemented with Viterbi's algorithm. Unfortunately, no closed form solution can be found in two dimensions. The two-dimensional least-squares solution is obtained by iterative techniques. Experiments show that least-squares model-based halftoning produces more gray levels and better spatial resolution than conventional techniques. We also show that the least-squares approach eliminates the problems associated with error diffusion. Model-based halftoning can be especially useful in transmission of high quality documents using high fidelity gray-scale image encoders. As we have shown, in such cases halftoning can be performed at the receiver, just before printing. Apart from coding efficiency, this approach permits the halftoner to be tuned to the individual printer, whose characteristics may vary considerably from those of other printers, for example, write-black vs. write-white laser printers.
The quality of an image can be evaluated by performing a psychovisual test or by using quantitative quality measures. In order to assess the performance of different halftone techniques, gray scale images are halftoned in various ways and then presented to human viewers for quality evaluation. Quantitative quality criteria, such as edge correlation, mean square error and local error measures are also used for quality evaluation of the halftone images. Since the ultimate judges of image quality are human viewers, the success of these quantitative criteria as quality measures for halftones is assessed by comparing their results with the results of the psychovisual test.
A computational model for the human perception of image brightness utilizing both local and global interactions has been advanced by Grossberg, Mingolla and Todorovic. A simulation of this multi-layer, non-linear recurrent network model can be used to assess perceived image quality. The model is validated by examining the simulation of a classical brightness perception phenomenon, in particular, Glass patterns. Results of a comparative evaluation of three halftoning algorithms are offered, which indicate that the model is useful for the evaluation of image processing algorithms. Human subjects ranked the quality of the images halftoned with each of three different algorithms at two different viewing distances. After processing by the brightness perception model, rankings based on objective measures of the simulated model output correspond with the rankings assigned by human observers.
Algorithms and Image Coding and Processing Based on Visual Models
This paper asks how the vision community can contribute to the goal of achieving perceptually lossless image fidelity with maximum compression. In order to maintain a sharp focus the discussion is restricted to the JPEG-DCT image compression standard. The numerous problems that confront vision researchers entering the field of image compression are discussed. Special attention is paid to the connection between the contrast sensitivity function and the JPEG quantization matrix.
Spatial/spatial-frequency representations have proven to be an interesting and powerful framework for the simulation of a number of visual effects. Results consistent with observations of the human visual system have been obtained at levels ranging from the shape of receptive field profiles to perceptual grouping and texture segmentation. A number of representations are under study, in a number of different fields. A key issue in comparing these representations is the resolution that can be attained (simultaneously) in the joint domain. The uncertainty principle dictates that arbitrarily high resolution cannot be achieved in both space and spatial-frequency. Joint resolution can range from singular functions in space (with infinite extent in spatial-frequency), to the reverse (e.g., the pixel representation at one extreme, and the Fourier transform at the other). In this paper, we discuss some of the available representations in the context of image sequence coding, and establish some of the characteristics desired in a representation for this application. We show that the joint resolution of a representation, in particular, can affect the performance of coding methods based on that representation. Examples which illustrate this point using industry standard DCT-based methods are given.
We report on psychovisual experiments designed to obtain subjective-based thresholds for a novel conditional-replenishment image-sequence coder. This coder attempts to avoid the replenishment of textured blocks for which no subjective change has occurred from the previous to the current frame. Typically, such blocks give rise to a large difference signal with respect to the corresponding block in the previous image, and hence are coded (replenished) in commonly used coders. We designed and conducted extensive visual experiments to study the response of the human visual system to stimuli that are relevant to the coding algorithm. Three major classes of experiments were conducted with numerous parametric variations for each, in which the observers were asked to discriminate target elements with properties that differed from those of the background: (1) Uniform targets on uniform background of different intensity. (2) Textured targets of varying standard deviation on a uniform background of the same average intensity. (3) Textured targets on a textured background with the same standard deviation, but different average intensity. We report on the results of these experiments and on the improvement in the performance of the coder, as a result of implementing these results in the encoding algorithm.
Coding results at 384 kbps are presented based on a three-dimensional subband framework in which the original image data is decomposed into spatio-temporal frequency bands. This tree-structured framework was originally introduced in [1]. The 3-D subband decomposition consists of two temporal subbands followed by a cascade of spatial decompositions, as shown in Figure 1. The temporal filtering is based on the 2-tap Haar filterbank, while the spatial filtering, both horizontal and vertical, is based on the 10-tap quadrature mirror filterbanks (QMFs) of Johnston [2]. The 11 subbands for any video sequence are displayed as given in the template of Figure 2. Figure 3 shows the frequency decomposition based on the framework given in Figure 1 for one of the image sequences presented here, "Melanie". A more extensive study of the effects of different filter types on the coding results in a subband framework will be discussed in [3]. In general, for the image sequences that we have looked at, subbands 9, 10, 11 can be discarded without causing severe degradations in the original image sequences, due to the low signal energy and low perceptual sensitivity in the higher frequency bands. A fixed coding rate implies a fixed number of bits for the subbands at each instant in time. The bits are adaptively allocated to the subbands based on a local energy criterion. More bits are allocated to subband 8 (the low spatial-high temporal frequency band) when the motion activity is high. Bits are dynamically reallocated to subbands 2-6 (the high spatial-low temporal frequency bands) when the motion activity drops below a threshold. Conditional replenishment is also implemented in all of the frequency bands in order to code static objects and background at a very low bit rate. This results in a locally adaptive frame-rate coder. Conditional replenishment (CR) implemented in the 3-D framework also appears in [4, 5]. The lowest frequency subbands, labeled subbands 1-3 in Figure 1, contain the most signal energy and require very high quality encoding in order to preserve good reconstructed picture quality. These subbands have first priority in the bit allocation and are encoded using PCM with a uniform quantizer. Once conditional replenishment allows the bit rate for subbands 1-3 to drop, the higher frequency subbands can be more accurately encoded with the additional bits. The significant highpass subbands (for the case examined here, subbands 4-11 in Figure 2) are encoded using a new form of vector quantization called Geometric Vector Quantization (GVQ), which takes advantage of the sparse and highly structured characteristics of the upper frequency data. GVQ was first introduced in [6, 7]. GVQ consists of purely deterministic codebooks which require no training. The codebook entries consist of L levels (where L and the codevector size are determined by the overall bit rate). For the results presented here, we will examine a codebook consisting of 3 levels, with some constraints on the levels in order to reduce the search complexity. Section 2 discusses the bit allocation, conditional replenishment, and encoding of the lowest frequency bands. Section 3 describes the generalized GVQ method and results based on 3-level quantization. Section 4 includes coding results, future avenues of research, and the conclusion.
The theory of Intensity-Dependent Spread functions (IDS) is a model of the human visual system. The motivation behind IDS is to balance resolution and reliability. The system does this, but it also predicts many phenomena in human vision. IDS is a nonlinear, adaptive system that enhances edges, nonlinearly compresses dynamic range and automatically adjusts to local variations in intensity. It has a single free parameter, uses only additions in its on-line calculations and can be implemented on a parallel processor. Another property of the system is that for inputs with only two intensities, e.g. disks, square waves and step edges, the output reduces exactly to one plus the convolution of the input with a bandpass filter whose passband is determined by the two intensities. For a Gaussian spread function the transfer function becomes the Difference-of-Gaussians (DoG) filter, but with the bandwidth set automatically by the input intensities. This paper demonstrates how IDS can be used for digital image enhancement. An artificial image illustrates the characteristics of IDS processing and shows how the theoretical results translate into visual effects; several realistic scenes enhanced by IDS are also shown.
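A direct, unoptimized rendering of the IDS idea: each pixel spreads a fixed unit of response over an area inversely proportional to its intensity. The spread constant k plays the role of the model's single free parameter; its value here is an arbitrary choice.

    import numpy as np

    def ids(image, k=2.0):
        # Intensity-Dependent Spread: every input pixel deposits a unit
        # volume spread over an area ~ k / intensity, so bright pixels
        # spread narrowly and dark pixels widely.  Brute force (O(N^2) per
        # pixel); for illustration only.
        H, W = image.shape
        out = np.zeros((H, W))
        yy, xx = np.mgrid[0:H, 0:W]
        for y in range(H):
            for x in range(W):
                sigma2 = k / max(image[y, x], 1e-6)     # spread area ~ 1/I
                g = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * sigma2))
                out += g / g.sum()                      # fixed unit volume
        return out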
We present a new algorithm that utilizes mathematical morphology for pyramidal decomposition of color images. Several previous approaches have utilized linear or morphological smoothing to obtain pyramidal representations of monochrome images. In this paper an extension of various previously developed monochrome pyramid algorithms is presented. Our decomposition algorithm allows for lossy color image compression by using Block Truncation Coding at the pyramid levels to attain reduced bit rates.
Previous work at the Institute for Perception Research has resulted in a new model for representing images, called a polynomial transform. This transform is perceptually relevant since it mimics properties of the early stages of human vision such as localization and decomposition of luminance changes into specific basic patterns, i.e., localized polynomials. The transform also has interesting signal processing properties, some of which will be illustrated in this paper. Image interpolation is an example of how polynomial transforms can be used for image restoration. It is derived that the polynomial transform coefficients of a blurred and sampled image are related by a linear transformation to the polynomial transform coefficients of the original image. By inverting this transformation, we can obtain deblurring and interpolation. This inversion is based on the assumption that the image can be locally approximated by a low-order polynomial description. By adopting a fixed degree for this a priori polynomial description, we obtain non-adaptive interpolation algorithms. The performance of the algorithm can be further improved by varying the degree of this a priori polynomial description depending on whether the image region is locally uniform or non-uniform. Especially in the presence of noise, this adaptivity is usually very important. It is shown how such space-variant image processing can be easily described and implemented using polynomial transforms. Subjective evaluation of the image interpolation technique aims at optimizing the parameter values of the algorithm, as well as comparing the new algorithm to existing interpolation techniques. Some results of this evaluation are presented.
In this paper we describe how a recently developed model for image representation based on polynomial transforms can be applied in image restoration. Two related algorithms for noise reduction in Computed Tomography (CT) images are presented. They show the efficiency of the model to detect and restore edges embedded in noise by adapting the amount of noise reduction to the local image content in a multiresolution fashion. A subjective evaluation of the noise reduction techniques is used to optimize the parameter values of the algorithms. Results of this evaluation are presented.
We present a spatial frequency analysis of the luminance and chrominance images derived from 20 scenes representative of natural terrain. Our results weakly support the claim that the human visual system has access to spatial frequencies of luminance and chrominance in relative proportion to their occurrence in natural scenes. The weak effect that we found may be due to the limited gamut of colors present in the scenes.
This is a two-part study of the human visual system's mechanism for normalization in color constancy phenomena. Part I uses a viewing technique shown by Maximov. The viewing box consists of a cardboard shoe box with a hole cut in the top to pass light, a color correcting filter and a viewing tube. The viewing tube restricts the field of view on the opposite wall where small five-area Mondrians, called Tatami are mounted.
With the development of electrophysiology in this century, many predictions made by visual psychophysics, like trichromacy, have been linked to data on visual cells, but the proposed links with respect to chromatic and achromatic visual processes are not clearly consistent beyond photoabsorption, and the interactions of photoreceptor signals at the higher visual stages are still under very active study and discussion. This paper reviews the problems involved in trying to relate an important postulate of psychophysical zone theories of color vision, post-receptor chromatic-achromatic independence, to neurophysiological data and visual multiplexing.
This paper describes the design and operation of a new simulation model for color matrix display development. It models the physical structure, the signal processing, and the visual perception of static displays, to allow optimization of display design parameters through image quality measures. The model is simple, implemented in the Mathematica computer language, and highly modular. Signal processing modules operate on the original image. The hardware modules describe backlights and filters, the pixel shape, and the tiling of the pixels over the display. Small regions of the displayed image can be visualized on a CRT. Visual perception modules assume static foveal images. The image is converted into cone catches and then into luminance, red-green, and blue-yellow images. A Haar transform pyramid separates the three images into spatial frequency and direction-specific channels. The channels are scaled by weights taken from human contrast sensitivity measurements of chromatic and luminance mechanisms at similar frequencies and orientations. Each channel provides a detectability measure. These measures allow the comparison of images displayed on prospective devices and thereby the optimization of display designs.
An experiment to quantify the effect of white point, black point, and surround color in color matching is presented. In this experiment, as in 'classical' colorimetry experiments, subjects are shown a color stimulus and are asked to adjust a parameter until a match is obtained. The stimuli used in this investigation consist of uniform color fields and continuous tone color images, whose white point, black point and surround color are varied. The results of this investigation are directly applicable to the display of images in a variety of displays and viewing conditions, for example, in the exchange of images among a variety of displays.
The precision of human vision requires displays to be accurate to about 0.2% of the luminance range. We present a technique by which this grey-level precision can be achieved with the use of an 8-bit color monitor. The basic idea is to 'steal' adjacent bits from the color variation for use in increasing the precision of the luminance variation. On a monitor with 8 bits per color gun, the technique can provide 1786 or more grey levels at a cost of one bit of color jitter, with standard D/A hardware. The color variations are invisible under almost all conditions.
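The counting behind the 1786 figure can be reproduced in a few lines. Assuming relative gun luminances of roughly 0.30/0.59/0.11 (an assumption; the real values depend on the monitor's phosphors), offsetting each gun by at most one DAC step around a grey level yields eight distinct sub-luminances per 8-bit step:

    import numpy as np
    from itertools import product

    GUN_WEIGHTS = np.array([0.30, 0.59, 0.11])  # assumed R, G, B luminances

    def bitstolen_levels():
        # Offset each gun by 0 or 1 DAC step around every 8-bit grey level.
        # Each (r, g, b) offset combination yields a slightly different
        # luminance, filling sub-steps between the 256 pure-grey levels at
        # the cost of a tiny color jitter.
        levels = set()
        for grey in range(255):
            for c in product((0, 1), repeat=3):
                levels.add(round(grey + GUN_WEIGHTS @ np.array(c), 4))
        return sorted(levels)

    print(len(bitstolen_levels()))  # 1786, matching the count quoted above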
A model is developed to approximate visibility thresholds for discrete cosine transform (DCT) coefficient quantization error based on the peak-to-peak luminance of the error image. Experimentally measured visibility thresholds for R, G, and B DCT basis functions can be predicted by a simple luminance-based detection model. This model allows DCT coefficient quantization matrices to be designed for display conditions other than those of the experimental measurements: other display luminances, other veiling luminances, and other spatial frequencies (different pixel spacings, viewing distances, and aspect ratios).
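In outline, the application is: model the visibility threshold of each DCT basis function under the target viewing conditions, then set the quantization step proportional to that threshold. The threshold shape below (a parabola in log spatial frequency peaking near 4 cycles/degree) and all constants are illustrative stand-ins for the paper's measured model; DC is simply clamped rather than modeled.

    import numpy as np

    def dct_frequencies(n=8, ppd=32.0):
        # Spatial frequency (cycles/degree) of each DCT basis function on a
        # display with ppd pixels per degree: f(u, v) = hypot(u, v)*ppd/(2n).
        k = np.arange(n)
        u, v = np.meshgrid(k, k, indexing="ij")
        return np.hypot(u, v) * ppd / (2.0 * n)

    def quant_matrix(ppd=32.0, peak_thresh=0.02, f_peak=4.0, width=0.5):
        # Threshold grows as a parabola in log10(frequency) away from the
        # most sensitive frequency; quantization steps scale with threshold.
        f = np.maximum(dct_frequencies(8, ppd), 1.0)   # clamp DC and near-DC
        log_t = np.log10(peak_thresh) + width * np.log10(f / f_peak) ** 2
        steps = np.round(2.0 * 255.0 * 10.0 ** log_t)
        return np.clip(steps, 1, 255).astype(int)

    # Changing viewing distance changes ppd, and with it the whole matrix:
    print(quant_matrix(ppd=64.0))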
We present the results of a study of the sensitivity of the human visual system (HVS) to spatially varying color stimuli. Sinusoidal grating patterns of different spatial frequencies were presented to six observers and the contrast required to just distinguish the pattern from the surrounding uniform field was determined. Tables and curves of contrast (measured in ΔELab) as a function of frequency were generated at different values of the orientation (horizontal, vertical and diagonal) of the pattern, the average luminance, the x and y chromaticity co-ordinates, and the direction of the variation of the stimulus in color space (luminance, red-green, and blue-yellow). The results show that the HVS is more sensitive to sinusoidal gratings oriented horizontally and vertically regardless of the type of variation. Furthermore, the HVS is more sensitive to luminance variations than it is to chromatic variations. Tables and curves of the data are presented.
The limited color spatial acuity of the human visual system is exploited to develop a more efficient algorithm for realistic image synthesis. A screen subdivision ray tracer is modified to control the amount of chromatic and achromatic detail present at the edges in an environment. An opponent color space (previously used to select wavelengths for synthetic image generation) is used to define the chromatic and achromatic channels present in the image. Computational savings achieved by the algorithm are discussed. A perceptual evaluation shows that image quality is not seriously degraded by the use of the technique.
Higher-Level Vision: Perceiving Surfaces and Objects
Machine vision has exhibited rather slow progress over the last 25 years compared to other areas of computer technology, and an interesting question is whether more progress will be made by continuing intensively the current approaches in research or instead by searching for new directions. I will present a thesis that more research along some of the prevailing lines will lead at best to marginal advances. For example, current edge detectors are very good at finding step edges. The problem is that their major objective, finding object outlines, is not equivalent to step edge finding. It seems that people find object outlines by a process of simultaneous interpretation and low level processing of the image. Such integration should be contrasted to one of the prevailing models in machine vision, which assumes a linear sequence of a few distinct processes from low level to high level vision. (When researchers talk about 'interpretation guided segmentation' they usually refer to the labeling of already obtained features using high level models of the scene.) If the levels interact strongly, then optimizing the processing techniques for each level separately (for example, edge detection for low level vision) is not going to be fruitful. Since human vision is the reason we believe that machine vision is even possible, a deeper examination of the human or animal visual recognition process is essential to further progress in machine vision.
The theory of Wilson's human spatial vision model was tested for real-world target detection and discrimination. The model was implemented on an image-processing system (IPS) as a rectangular sampling grid with Nyquist frequency resolution. A digitized airplane image target (AIT) was filtered with the model and the response magnitudes of the filters documented. Two filter targets were generated using the combined visual outputs of either the three filters with the highest absolute response magnitudes (FT1) or three filters with one-half maximum response magnitude (FT2). Six experiments were conducted to determine whether FT1 or FT2 yields psychophysical thresholds similar to those for AIT. Four experiments measured detection thresholds under contrast reduction, static noise, and dynamic noise conditions, and two experiments measured discrimination thresholds. AIT and filter target thresholds were found to be significantly different when FT2 was the filter target and when FT1 was measured under the dynamic noise condition. We conclude that the theory of Wilson's model holds for real-world target detection and discrimination when temporal visual processing is not required.
Real-world images are highly complex, and it is not yet understood how the human visual system processes this information in recognition tasks. Most current psychophysical models of discrimination assume that decisions are made on the basis of information directly accessible from the spatially-tuned mechanisms that mediate detection. Detection mechanisms are localized with respect to orientation and spatial frequency in the Fourier domain, and have been shown to process disparate components along each dimension independently. We demonstrate that in discrimination tasks with complex stimuli, disparate Fourier components are not generally processed independently. Both masking and configuration-dependent effects are found. The pattern of results suggests that mediating pathways are not always localized in Fourier space, but in some cases integrate information across wide regions of the domain. The integrating mechanisms appear specialized to signal particular differences or transformations that apply to rigid objects. We present a quantitative model based both on primary Fourier components and on non-arbitrary combinations of these components, and we show how this model accounts for our current complex-discrimination results. Finally, we suggest how a class of concurrent-rating experiments can be used to further test this model and identify the nature of the integrating mechanisms.
A fundamental issue in texture analysis is that of deciding what textural features are important in texture perception, and how they are used. Experiments on human pre-attentive vision have identified several low-level features (such as orientation of blobs, and size of line segments), which are used in texture perception. However, the question of what higher level features of texture are used has not been adequately addressed. We designed an experiment to help identify the relevant higher order features of texture perceived by humans. We used twenty subjects, who were asked to perform an unsupervised classification of thirty pictures from Brodatz's album on texture. Each subject was asked to group these pictures into as many classes as desired. Both hierarchical cluster analysis and non-metric MDS were applied to the pooled similarity matrix generated from the subjects' groupings. A surprising outcome is that the MDS solutions fit the data very well. The stress in the two dimensional case is 0.10, and in the three dimensional case is 0.045. We rendered the original textures in these coordinate systems, and interpreted the (rotated) axes. It appears that the axes in the 2D case correspond to periodicity versus irregularity, and directional versus non-directional. In the 3D case, the third dimension represents the structural complexity of the texture. Furthermore, the clusters identified by the hierarchical cluster analysis remain virtually intact in the MDS solution. The results of our experiment indicate that people use three high level features for texture perception. Future studies are needed to determine the appropriateness of these high-level features for computational texture analysis and classification.
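The analysis pipeline is easy to reproduce. A sketch assuming scikit-learn and SciPy, with the pooled grouping data supplied as a co-occurrence matrix (how often each pair of textures landed in the same group); note that sklearn reports raw stress, which is not directly comparable to the Kruskal stress values quoted above.

    import numpy as np
    from sklearn.manifold import MDS
    from scipy.cluster.hierarchy import linkage

    def texture_space(co_occurrence, n_dims=2):
        # Convert pooled similarities (co-grouping counts) into
        # dissimilarities, then embed with non-metric MDS as in the study.
        # Hierarchical clustering runs on the same dissimilarities so the
        # cluster structure can be compared with the embedding.
        sim = co_occurrence / co_occurrence.max()
        dissim = 1.0 - sim
        np.fill_diagonal(dissim, 0.0)
        mds = MDS(n_components=n_dims, metric=False,
                  dissimilarity="precomputed", random_state=0)
        coords = mds.fit_transform(dissim)
        tree = linkage(dissim[np.triu_indices_from(dissim, k=1)],
                       method="average")
        return coords, mds.stress_, tree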
There is a great deal of uncertainty and controversy surrounding the artwork legacy of Rembrandt van Rijn. Much of the difficulty stems from the dearth of reliable contemporary documentation covering the artist's activities as well as the great number of students who painted in his studio. Consequently, attributions have rested heavily upon subjective assessments of style and execution, together with whatever historical evidence can be uncovered. The dilemma associated with selecting those works which should be assigned to Rembrandt is complicated further by his fame and the potential for great financial return from the discovery of new pieces. In recent decades this dilemma has been alleviated to a considerable degree by the introduction of analytical scientific methods for analyzing (and, in some cases, dating) the materials of an artwork. However, the greatest impact of materials analyses has been to throw out many style-based attributions after finding that the materials were inconsistent with the artist's legacy. Thus, materials analyses typically play a negative role in showing that an attribution is impossible rather than proving that the work in question was by the artist in question. On the other hand, a new opportunity is at hand as a consequence of the emergence of digital computer image processing technology. It is now possible to apply this tool to the direct attribution of a painting through analyses of statistical properties pertaining to palette, albedo, and impasto. This paper describes the first efforts at creating a data base on the properties and statistics of Rembrandt portraits so as to provide a basis for determining which should be included in the body of his works, rather than which should be excluded.
Depth Perception and Sensory Integration for Complex Visual Environments
Most virtual-reality systems use LCD-based displays that achieve a large field-of-view at the expense of resolution. A typical display will consist of approximately 86,000 pixels uniformly distributed over an 80-degree by 60-degree image. Thus, each pixel subtends about 13 minutes of arc at the retina; about the same as the resolvable features of the 20/200 line of a Snellen Eye Chart. The low resolution of LCD-based systems limits task performance in some applications. We have examined target-detection performance in a low-resolution virtual world. Our synthesized three-dimensional virtual worlds consisted of target objects that could be positioned at a fixed distance from the viewer, but at random azimuth and constrained elevation. A virtual world could be bounded by chromatic walls or by wire-frame, or it could be unbounded. Viewers scanned these worlds and indicated by appropriate gestures when they had detected the target object. By manipulating the viewer's field size and the chromatic and luminance contrast of annuli surrounding the field-of-view, we were able to assess the effect of field size on the detection of virtual objects in low-resolution synthetic worlds.
A new evaluation method of visual wide-field effects using human postural control analysis is proposed. In designing a television system for the future, it is very important to understand the dynamic response of human beings in order to evaluate the visual effects of displayed images objectively. Visual effects produced by 3-D wide-field images are studied. An observer's body sway produced by postural control is discussed using rotating 2-D and 3-D images. Comparisons between stationary and rotating images are also performed. A local peak appears in the power spectra of the body sway for the rotating images (3-D and 2-D). On the other hand, no distinctive component appears in the power spectra for the stationary images. By extending the visual field, the cyclic component can be detected in the auto-correlation function of the body sway for the rotating images. These results suggest that displayed images induce the postural control. The total length of the body sway locus is also analyzed to evaluate the postural control. The total length for the rotating images increases in proportion to viewing angle, and nearly saturates beyond 50 degrees. Moreover, it is shown that the total length for the rotating 3-D image is greater than that for the rotating 2-D image.
Contrast thresholds are lower for detection of a vertical pattern than for an obliquely oriented pattern. Is there an analogous oblique effect for the depth threshold of a stereoscopic luminance pattern? If so, why? Are the causes different from those for the oblique effect in monocular vision? To explore these issues, we used stereoscopic blurry-bar (D6) luminance patterns with a peak spatial frequency of 2 or 4 cycles/degree (cpd) and either a vertical or an oblique orientation. We obtained psychometric functions from a method-of-constant-stimuli procedure, using 100 forced-choice trials for each datum, and for each of three observers we estimated stereoacuity with a maximum-likelihood curve-fitting procedure. Subjects showed better stereoacuity for the vertical spatial patterns than for the oblique patterns. Possible causes are that for oblique patterns (unlike vertical patterns) (1) the total vertical extent of the pattern is shrunk by a factor of sin(θ), where θ = 90° for vertical; (2) the pattern is 'stretched out' in the horizontal direction by a factor of csc(θ); (3) there are vertical as well as horizontal retinal disparities. Perhaps the resulting sparseness of horizontal-disparity information, or the potential vertical disparities in the oblique patterns, reduces stereoacuity. To disentangle these causes, we used several different experimental conditions (e.g., elongation of oblique patterns) run in randomized blocks of trials. We discuss these results and their implications for stereopsis.
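A sketch of the kind of maximum-likelihood psychometric fit described above; the authors' exact psychometric function and fitting code are not specified, so the cumulative-Gaussian form for 2AFC data and the scipy usage here are assumptions.

```python
# Maximum-likelihood fit of P(correct) = 0.5 + 0.5 * Phi((d - mu)/sigma)
# to forced-choice data; mu is the 75%-correct threshold, sigma the slope.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_psychometric(disparities, n_correct, n_trials):
    def neg_log_likelihood(params):
        mu, log_sigma = params
        p = 0.5 + 0.5 * norm.cdf(disparities, loc=mu, scale=np.exp(log_sigma))
        p = np.clip(p, 1e-6, 1 - 1e-6)  # guard the log at the extremes
        return -np.sum(n_correct * np.log(p)
                       + (n_trials - n_correct) * np.log(1 - p))
    res = minimize(neg_log_likelihood, x0=[np.median(disparities), 0.0])
    mu, sigma = res.x[0], np.exp(res.x[1])
    return mu, sigma  # threshold and slope estimates
```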
We analyzed the apparent depth and size of stereoscopic images and verified the results in a psychophysical experiment. Random-dot stereograms containing horizontal disparity (parallax) were displayed field-sequentially on a CRT display, and the apparent depth and size of the image were measured using two LEDs as a matching index. The conclusions are: stereoscopic images appear larger than the image on the screen when formed behind it, and smaller when formed in front. When an image is formed in front of the screen, its apparent depth saturates as a function of parallax, so the sensitivity of apparent depth decreases at large values of parallax. Neither apparent depth nor apparent size is a linear function of the disparity between the left and right images on the screen.
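For reference, the standard viewing geometry (not spelled out in the abstract; D is viewing distance, e interocular separation, p on-screen parallax) predicts a perceived depth of

\[
z = \frac{D\,p}{e - p}\ \ \text{(behind the screen)}, \qquad z = \frac{D\,p}{e + p}\ \ \text{(in front of the screen)},
\]

so depth in front of the screen is a compressive function of p that saturates toward the viewing distance, consistent with the loss of depth sensitivity at large parallax reported above.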
Random-dot stereograms, first generated by Julesz, have since been widely used for research in vision and perception. When such stereograms are viewed binocularly, three-dimensional surfaces can be perceived hovering over the random-dot background. When the viewing distance changes, the hovering depth of the surface changes as well, and if we move our eyes from side to side while viewing, the hovering surface moves with the eye movement. It is believed that the information about depth and three-dimensional shape available from the horizontal component of the stereo disparity field must be interpreted in conjunction with information about egocentric viewing distance. This paper shows the relationship between hovering depth and viewing position. The hovering depth can be calculated provided the interocular distance, the convergence angle, and the disparity are known. The ratio of the hovering depths at two different viewing positions is equal to the ratio of the corresponding viewing distances. A mathematical explanation is given of the fact that changing viewing position changes the perceived depth of the hovering surface in stereograms. The horizontal shift of the hovering surface is linearly related to the amount of eye movement, and the ratio between them is determined by the disparity and the interocular distance.
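The abstract states these relationships without formulas; the standard geometry that yields them is sketched here (symbols assumed: e interocular distance, D viewing distance, d crossed on-screen disparity). Intersecting the two lines of sight gives a hovering depth of

\[
h = \frac{D\,d}{e + d},
\]

so for fixed d and e, h is proportional to D, and the ratio of the hovering depths at two viewing positions equals the ratio of the viewing distances. Translating the eyes sideways by \(\Delta x\) shifts the intersection point by

\[
\Delta x_{\text{surface}} = \frac{d}{e + d}\,\Delta x_{\text{eyes}},
\]

which is linear in the eye movement, with a ratio fixed by the disparity and the interocular distance, as claimed.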
In a dichoptic stimulus in which one feature in one eye could participate in both rivalry and stereopsis with features in the other eye, 3D perception was lost intermittently. The periods of loss of the 3D percept were positively correlated with the periods of rivalrous suppression, and the degree of difference in rivalrous suppression between the eyes, due to variable eye dominance, was positively correlated with the degree of loss of the 3D percept. This suggested that, because differential luminance between the eyes affects their dominance in rivalry, stereopsis in the presence of rivalry would be similarly affected, and our results indicate that it is. These results are not predicted by any of the presently popular accounts of how rivalry and 'fusion' coexist. Instead, it appears that rivalry and 'fusion' are not two modular processes, and that rivalry and stereopsis are affected by the same factors within an interactive network.
This paper examines the mapping of data onto perceptual dimensions to create a visualization. The choice of perceptual variables depends critically on the goals of the visualization, and this task-dependency has implications for (1) the veridicality of the representation, (2) the mapping of data onto 'preattentive' features, and (3) the use of multiple senses to represent the data.
Despite aggressive work on the development of sensor fusion algorithms and techniques, no formal evaluation procedures have been proposed. Based on existing integration models in the literature, an evaluation framework is developed to assess the operator's ability to use multisensor, or sensor fusion, displays. The framework is normative: the operator's performance with the sensor fusion display is compared to model predictions derived from the operator's performance when viewing the original sensor displays prior to fusion. This makes it possible to determine when a sensor fusion system leads to: (1) poorer performance than one of the original sensor displays (clearly an undesirable system, in which fusion introduces distortion or interference); (2) better performance than with either single-sensor display alone, but at a sub-optimal level compared to the model predictions; (3) optimal performance (matching the model predictions); or (4) super-optimal performance, which may occur if the operator can exploit highly diagnostic 'emergent features' in the sensor fusion display that were unavailable in the original sensor displays. An experiment demonstrating the usefulness of the proposed evaluation framework is discussed.
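A minimal sketch of the normative comparison and the four-way classification above. The abstract does not name the integration model, so independent probability summation is assumed here purely for illustration.

```python
# Hypothetical normative baseline: probability summation of two
# independent single-sensor detection probabilities.
def predicted_fused(p_sensor_a, p_sensor_b):
    return 1.0 - (1.0 - p_sensor_a) * (1.0 - p_sensor_b)

def classify(p_fused_observed, p_a, p_b, tol=0.02):
    """Place observed fused-display performance into the four cases above."""
    p_model = predicted_fused(p_a, p_b)
    if p_fused_observed < max(p_a, p_b):
        return "poorer than best single sensor"      # case (1)
    if p_fused_observed < p_model - tol:
        return "better than single, sub-optimal"     # case (2)
    if p_fused_observed <= p_model + tol:
        return "optimal (matches model prediction)"  # case (3)
    return "super-optimal (emergent features)"       # case (4)
```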
As the world of telecommunications moves toward providing images along with sound in the form of videophones, the need arises to establish the ways in which an image can enhance the communicative effectiveness of the users. Providing users with an acceptable image of each other as they speak is neither cheap nor simple to deliver, and a case for it has to be made. This paper briefly summarizes work already carried out in the field and describes three experiments assessing the contribution of images to speech intelligibility. The first experiment dealt with connected speech masked by noise, the second with the discrimination of single words from similar-sounding words, and the third with the comprehension of information-intense connected speech. The conclusion drawn is that an image does indeed have a part to play, but that the benefits depend on the sufficiency of the speech information and the way it is presented.
Human Perception, Performance, and Presence in Virtual Environments
Pictorial display aids present synthetic environments within which users interact with symbolic elements representing objects and processes in the real world. The design of these environments challenges researchers to understand the elements of the physical environment that make it predictable and understandable. By incorporating these aspects of the physical world, useful representation aids can improve the naturalness of their symbolic representation, geometric structure, and dynamic response.
Virtual environment displays can show realistic scenes that include a wide variety of visual depth cues. This realism may also elicit telepresence, the feeling by the viewer that he or she is actually present in the distant or virtual world. For applications such as telerobotics, realistic representations of the remote site may improve task performance. However, timing demands and hardware power may limit the realism that can be achieved, so it is necessary to consider how telepresence affects the operator's task performance. Increasing telepresence may improve performance only asymptotically, because of the adaptability of the human operator. In the present experiments, we examined the contribution of two important depth cues, occlusion and disparity, to the performance of a simulated telerobotic task. We simulated a three-axis tracking task viewed under four different levels of realism, to determine whether the combined presentation of the depth cues has a more beneficial effect on performance than either depth cue presented singly. Results showed similar performance improvements with the presentation of occlusion or disparity individually; when both cues were present together, a somewhat larger performance improvement was measured.
Object manipulation performance in a virtual environment is discussed, along with factors that affect it such as stereo visual cues, motion parallax, and force sensation. For the visual factors, subjects were assigned tasks such as placing a virtual cylindrical object onto a virtual platform. The hardware used to generate the virtual 3D space consisted of a fixed stereo CRT with LC-shutter glasses, or an HMD; a VPL Data Glove was used for virtual object manipulation. Results showed that stereo display and motion parallax are very helpful for this type of task, as is the presence of physical laws such as self-alignment through physical constraints. Force/touch factors were then investigated by adding a force display, a mechanical master arm, to the system; the assigned task was to place a cubic object into a hole. The results give a good indication that force sensation is indispensable for performing this type of task. In particular, boundary constraints on the operator's motion played an important role in providing realistic sensations. Through these experiments, the importance of the aforementioned factors in providing realistic sensations within a virtual environment was clarified quantitatively.
A force-reflecting teleoperation training simulator with a high-fidelity real-time graphics display has been developed for operator training. A novel feature of this simulator is that it enables the operator to feel contact forces and torques through a force-reflecting controller during execution of the simulated peg-in-hole task, providing the operator with visual and kinesthetic force virtual reality. A peg-in-hole task is used in our simulated teleoperation trainer as a generic teleoperation task. A quasi-static analysis of a two-dimensional peg-in-hole task model has been extended to a three-dimensional model to compute contact forces and torques for a virtual realization of kinesthetic force feedback. The simulator allows the user to specify force-reflection gains and stiffness (compliance) values of the manipulator hand for the three translational and three rotational axes in Cartesian space. Three viewing modes are provided for the graphics display: single view, two split views, and stereoscopic view.
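A sketch of the per-axis stiffness mapping such a simulator exposes; the actual control law is not given in the abstract, so the linear spring model and the names below are assumptions.

```python
# Hypothetical 6-DOF stiffness (compliance) mapping: displacement error
# times per-axis stiffness, scaled by a force-reflection gain.
import numpy as np

def reflected_wrench(x_cmd, x_actual, k_trans, k_rot, gain):
    """x_cmd, x_actual: 6-vectors (3 translations, 3 rotations) in
    Cartesian space; k_trans, k_rot: per-axis stiffness triples;
    gain: force-reflection gain felt at the controller."""
    k = np.concatenate([k_trans, k_rot])  # six per-axis stiffnesses
    return gain * k * (np.asarray(x_cmd) - np.asarray(x_actual))
```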
The Helmet Mounted Display (HMD) system developed in our lab should be a useful display for teleoperator systems if it increases operator performance on the desired task; it can, however, degrade performance through display-update-rate constraints and communication delays. Display update rates are slowed by communication bandwidth and/or computational power limitations. We used simulated 3D tracking and pick-and-place tasks to characterize performance over a range of update rates. Initial experiments with 3D tracking indicate that performance plateaus at an update rate between 10 and 20 Hz. We are currently running pick-and-place experiments and plan to explore whether graphical enhancements that slow the display can improve performance by more than they cost in speed. Delays degrade performance in any manipulation task, and we have found that performance with the HMD decreases as the delay increases. We are also comparing the effects produced by manipulator control and display control. Using a predictive display, we will explore whether the display effects can be mitigated and whether the HMD can be used to increase performance under delays.
A key task in virtual environments is visual search. To obtain quantitative measures of human performance and to document visual search strategies, we have used three experimental arrangements (eye, head, and mouse control of viewing windows) by exploiting various combinations of helmet-mounted displays, graphics workstations, and eye-movement tracking facilities. We contrast two categories of viewing strategy: one for 2D pictures with large numbers of targets and clutter scattered randomly, the other for quasi-natural 3D scenes with targets and non-targets placed in realistic, sensible positions. Different searching behaviors emerge from these contrasting conditions, reflecting different visual and perceptual modes. A regular 'searchpattern' is a systematic, repetitive, idiosyncratic sequence of movements carrying the eye over the entire 2D scene. Irregular 'searchpatterns' take advantage of wide windows and the wide human visual lobe; here, hierarchical detection and recognition are performed with the appropriate capabilities of the 'two visual systems'. The 'searchpath', also efficient, repetitive, and idiosyncratic, provides only a small set of fixations to check continually the smaller number of targets in the naturalistic 3D scene; searchpaths are likely driven by top-down spatial models. If the viewed object is known and nameable, a hypothesized top-down cognitive model drives active looking in the 'scanpath' mode, again continually checking important subfeatures of the object. Spatial models for searchpaths may be primitive predecessors, in the evolutionary history of animals, of cognitive models for scanpaths.
Users of virtual and teleoperator display systems are said to experience "being in" the simulated or remote environment; this experiential state is commonly referred to as "presence" or "telepresence". This article identifies another experiential state, "distal attribution", in which the user experiences "being in touch with" the simulated or remote environment while fully cognizant of being in the real environment in which the display is situated. The article focuses on the factors or conditions that promote presence and distal attribution, and on empirical methods that might be used to assess the two. Out of the author's conviction that a proper understanding of these states will derive only from explicit consideration of the nature of ordinary perceptual experience, the article begins with some important phenomenological facts.