The development of an image processing pipeline for each new camera design can be time-consuming. To speed
camera development, we developed a method named L3 (Local, Linear, Learned) that automatically creates an
image processing pipeline for any design. In this paper, we describe how we used the L3 method to design and
implement an image processing pipeline for a prototype camera with five color channels. The process includes
calibrating and simulating the prototype, learning local linear transforms and accelerating the pipeline using
graphics processing units (GPUs).
To speed the development of novel camera architectures we proposed a method, L3 (Local, Linear and Learned), that automatically creates an optimized image processing pipeline. The L3 method assigns each sensor pixel to one of 400 classes, and applies class-dependent local linear transforms that map the sensor data from a pixel and its neighbors into the target output (e.g., CIE XYZ rendered under a D65 illuminant). The transforms are precomputed from training data and stored in a table used for image rendering. The training data are generated by camera simulation, consisting of sensor responses and rendered CIE XYZ outputs. The sensor and rendering illuminant can be equal (same-illuminant table) or different (cross-illuminant table). In the original implementation, illuminant correction is achieved with cross-illuminant tables, and one table is required for each illuminant. We find, however, that a single same-illuminant table (D65) effectively converts sensor data for many different same-illuminant conditions. Hence, we propose to render the data by applying the same-illuminant D65 table to the sensor data, followed by a linear illuminant correction transform. The mean color reproduction error using the same-illuminant table is on the order of 4 ΔE units, which is only slightly larger than the cross-illuminant table error. This approach reduces table storage requirements significantly without substantially degrading color reproduction accuracy.
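The rendering step described above (classify a pixel, look up its transform, apply it to the local neighborhood, then apply a linear illuminant correction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class rule used here (center-pixel channel plus a quantized mean response level), the function names, and the table layout are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

ClassKey = Tuple[int, int]
Table = Dict[ClassKey, List[List[float]]]  # 3 weight rows (X, Y, Z) per class


def classify(patch: Sequence[float], center_channel: int,
             n_levels: int = 4, max_value: float = 1.0) -> ClassKey:
    """Hypothetical class rule: CFA channel of the center pixel plus a
    quantized mean response level of the neighborhood."""
    mean = sum(patch) / len(patch)
    level = min(int(n_levels * mean / max_value), n_levels - 1)
    return (center_channel, level)


def render_pixel(patch: Sequence[float], center_channel: int, table: Table,
                 n_levels: int = 4, max_value: float = 1.0) -> List[float]:
    """Apply the class-dependent linear transform to a flattened patch."""
    weights = table[classify(patch, center_channel, n_levels, max_value)]
    return [sum(w * v for w, v in zip(row, patch)) for row in weights]


def correct_illuminant(xyz: Sequence[float],
                       matrix: Sequence[Sequence[float]]) -> List[float]:
    """Linear illuminant correction: multiply the rendered XYZ by a 3x3 matrix."""
    return [sum(m * v for m, v in zip(row, xyz)) for row in matrix]


if __name__ == "__main__":
    # Toy table with a single class; real tables are learned from simulated data.
    toy_table: Table = {
        (0, 2): [[0.25, 0.25, 0.25, 0.25],
                 [1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]],
    }
    xyz = render_pixel([0.25, 0.5, 0.75, 1.0], 0, toy_table)
    print(correct_illuminant(xyz, [[2.0, 0.0, 0.0],
                                   [0.0, 1.0, 0.0],
                                   [0.0, 0.0, 0.5]]))
```

In this layout one table serves every illuminant, and only the 3x3 correction matrix changes, which is the storage saving the abstract refers to.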
The high density of pixels in modern color sensors provides an opportunity to experiment with new color filter
array (CFA) designs. A significant bottleneck in evaluating new designs is the need to create demosaicking,
denoising and color transform algorithms tuned for the CFA. To address this issue, we developed a method (local,
linear, learned or L3) for automatically creating an image processing pipeline. In this paper we describe the L3 algorithm and illustrate how we created a pipeline for a CFA organized as a 2×2 RGB/W block containing a clear
(W) pixel. Under low light conditions, the L3 pipeline developed for the RGB/W CFA produces images that are
superior to those from a matched Bayer RGB sensor. We also use L3 to learn pipelines for other RGB/W CFAs
with different spatial layouts. The L3 algorithm shortens the development time for producing a high quality
image pipeline for novel CFA designs.
We design and analyze a high-speed document sensing and misprint detection system for real-time monitoring of printed
pages. We implemented and characterized a prototype system, comprising a solid-state line sensor and a high-quality imaging
lens, that measures in real time the light reflected from a printed page. We use sensor simulation software and signal
processing methods to create an expected sensor response given the page that is being printed. The measured response is
compared with the predicted response based on a system simulation. A computational misprint detection system measures
differences between the expected and measured responses, continuously evaluating the likelihood of a misprint. We describe
several algorithms to identify rapidly any significant deviations between the expected and actual sensor response.
The parameters of the system are determined by a cost-benefit analysis.
We introduce a new metric, the visible signal-to-noise ratio (vSNR), to analyze how pixel-binning and resizing methods
influence noise visibility in uniform areas of an image. The vSNR is the inverse of the standard deviation of the S-CIELAB
representation of a uniform field; its units are 1/ΔE. The vSNR metric can be used in simulations to predict
how imaging system components affect noise visibility. We use simulations to evaluate two image rendering methods:
pixel binning and digital resizing. We show that vSNR increases with scene luminance, pixel size and viewing distance
and decreases with read noise. Under low illumination conditions and for pixels with relatively high read noise, images
generated with the binning method have less noise (higher vSNR) than resized images, but
noticeably lower spatial resolution. The binning method also reduces demands on the ADC rate and channel throughput.
When comparing binning and resizing, there is an image quality tradeoff between noise and blur. Depending on the
application, users may prefer one error over another.
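The vSNR definition above can be illustrated with a short sketch. It assumes the uniform field has already been passed through the S-CIELAB spatial filters and converted to L*, a*, b* values (that stage is omitted here), and it takes "standard deviation" to mean the root-mean-square ΔE of the pixels about the field mean. That reading and the function name are assumptions made for the example.

```python
import math
from typing import Sequence, Tuple

Lab = Tuple[float, float, float]


def vsnr(lab_pixels: Sequence[Lab]) -> float:
    """Inverse of the RMS Delta E of a uniform field about its mean.

    lab_pixels: S-CIELAB (L*, a*, b*) values, assumed already spatially filtered.
    Returns a value in units of 1/Delta E; math.inf for a noise-free field.
    """
    n = len(lab_pixels)
    mean = [sum(p[c] for p in lab_pixels) / n for c in range(3)]
    mean_sq_de = sum(
        sum((p[c] - mean[c]) ** 2 for c in range(3)) for p in lab_pixels
    ) / n
    if mean_sq_de == 0.0:
        return math.inf
    return 1.0 / math.sqrt(mean_sq_de)


if __name__ == "__main__":
    noisy_field = [(48.0, 0.0, 0.0), (52.0, 0.0, 0.0)]
    print(vsnr(noisy_field))
```

A larger value means the noise is less visible, so comparing two rendering methods amounts to comparing their vSNR on the same simulated uniform field.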
The surface reflectance function of many common materials varies slowly over the visible wavelength range. For
this reason, linear models with a small number of bases (5-8) are frequently used for representation and estimation
of these functions. In other signal representation and recovery applications, it has been recently demonstrated
that dictionary based sparse representations can outperform linear model approaches. In this paper, we describe
methods for building dictionaries for sparse estimation of reflectance functions. We describe a method for building
dictionaries that account for the measurement system; in estimation applications these dictionaries outperform
the ones designed for sparse representation without accounting for the measurement system. Sparse recovery
methods typically outperform traditional linear methods by 20-40% (in terms of RMSE).
As the number of imaging pixels in camera phones increases, users expect camera phone image quality to be comparable to digital still cameras. The mobile imaging industry is aware, however, that simply packing more pixels into the very limited camera module size need not improve image quality. When the size of a sensor array is fixed, increasing the number of imaging pixels decreases pixel size and thus photon count. Attempts to compensate for the reduction in light sensitivity by increasing exposure durations increase the amount of handheld camera motion blur, which effectively reduces spatial resolution. Perversely, what started as an attempt to increase spatial resolution by increasing the number of imaging pixels may result in a reduction of effective spatial resolution. In this paper, we evaluate how the performance of mobile imaging systems changes with shrinking pixel size, and we propose to replace the widely misused "physical pixel count" with a new metric that we refer to as the "effective pixel count" (EPC). We use this new metric to analyze design tradeoffs for four different pixel sizes (2.8 μm, 2.2 μm, 1.75 μm and 1.4 μm) and two different imaging arrays (1/3.2 and 1/8 inch). We show that optical diffraction and camera motion make 1.4 μm pixels less perceptually effective than larger pixels and that this problem is exacerbated by the introduction of zoom optics. Image stabilization optics can increase the effective pixel count and are, therefore, important features to include in a mobile imaging system.
Under low illumination conditions, such as moonlight, there simply are not enough photons present to create a high quality color image with integration times that avoid camera-shake. Consequently, conventional imagers are designed for daylight conditions and modeled on human cone vision. Here, we propose a novel sensor design that parallels the human retina and extends sensor performance to span daylight and moonlight conditions. Specifically, we describe an interleaved imaging architecture comprising two collections of pixels. One set of pixels is monochromatic and high sensitivity; a second, interleaved set of pixels is trichromatic and lower sensitivity. The sensor implementation requires new image processing techniques that allow for graceful transitions between different operating conditions. We describe these techniques and simulate the performance of this sensor under a range of conditions. We show that the proposed system is capable of producing high quality images spanning photopic, mesopic and near scotopic conditions.
Precise simulation of digital camera architectures requires an accurate description of how the radiance image is transformed by optics and sampled by the image sensor array. Both for diffraction-limited imaging and for all practical lenses, the width of the optical point-spread function differs at each wavelength. These differences are relatively small compared to coarse pixel sizes (6 μm-8 μm). But as pixel size decreases, to say 1.5 μm-3 μm, wavelength-dependent point-spread functions have a significant impact on the sensor response. We provide a theoretical treatment of how the interaction of spatial and wavelength properties influences the response of high-resolution color imagers. We then describe a model of these factors and an experimental evaluation of the model's computational accuracy.
During the last decade, a number of remarkable magnetic resonance imaging (MRI) techniques have been developed for measuring human brain activity and structure. These MRI techniques have been accompanied by the development of signal processing, statistical and visualization methodologies. We review several examples of these methods, drawn mainly from work on the human visual pathways. We provide examples of how two methods, functional MRI (fMRI) and diffusion tensor imaging (DTI), are used. First, we explain how fMRI enables us to identify and measure several distinct visual field maps and measure how these maps reorganize following disease or injury. Second, we explain how DTI enables us to visualize neural structures within the brain's wires (white matter) and measure the patterns of connectivity in individual brains. Throughout, we identify signal processing, statistical, and visualization topics in need of further methodological development.
We describe a method for integrating information from lens design into image system simulation tools. By coordinating these tools, image system designers can visualize the consequences of altering lens parameters. We describe the critical computational issues we addressed in converting lens design calculations into a format that could be used to model image information as it flows through the imaging pipeline from capture to display. The lens design software calculates information about relative illumination, geometrical distortion, and the wavelength and field height dependent optical point spread functions (PSF). These data are read by the image systems simulation tool, and they are used to transform the multispectral input radiance into a multispectral irradiance image at the sensor. Because the optical characteristics of lenses frequently vary significantly across the image field, the process is not shift-invariant. Hence, the method is computationally intense and includes a number of parameters and methods designed to reduce artifacts that can arise in shift-variant filtering. The predicted sensor irradiance image includes the effects of geometric distortion, relative illumination, vignetting, pupil aberrations, as well as the blurring effects of monochromatic and chromatic aberrations, and diffraction.
The steady increase in CMOS imager pixel count is built on the technology advances summarized as Moore's law. Because imagers must interact with light, the impact of Moore's law on imagers differs from its impact on other integrated circuit applications. In this paper, we investigate how the trend towards smaller pixels interacts with two fundamental properties of light: photon noise and diffraction. Using simulations, we investigate three consequences of decreasing pixel size on image quality. First, we quantify the likelihood that photon noise will become visible and derive a noise-visibility contour map based on photometric exposure and pixel size. Second, we illustrate the consequence of diffraction and optical imperfections on image quality and analyze the implications of decreasing pixel size for aliasing in monochrome and color sensors. Third, we calculate how decreasing pixel size impacts the effective use of microlens arrays and derive curves for the concentration and redirection of light within the pixel.
In many imaging applications, there is a tradeoff between sensor spatial resolution and dynamic range. Increasing sampling density by reducing pixel size decreases the number of photons each pixel can capture before saturation. Hence, imagers with small pixels operate at levels where photon noise limits image quality. To understand the impact of these noise sources on image quality we conducted a series of psychophysical experiments. The data revealed two general principles. First, the luminance amplitude of the noise standard deviation predicts threshold, independent of color. Second, this threshold is 3-5% of the mean background luminance across a wide range of background luminance levels (ranging from 8 cd/m² to 5594 cd/m²). The relatively constant noise threshold across a wide range of conditions has specific implications for the imaging sensor design and image processing pipeline. An ideal image capture device, limited only by photon noise, must capture at least 1000 photons/pixel (1/sqrt(10^3) ≈ 3%) to render photon noise invisible. The ideal capture device should also be able to achieve this SNR or higher across the whole dynamic range.
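The photon-count figure quoted above follows from Poisson statistics: for a mean of N captured photons the standard deviation is sqrt(N), so the noise relative to the mean is 1/sqrt(N). A minimal sketch of that arithmetic follows; the function names are illustrative and not taken from the paper.

```python
import math


def relative_photon_noise(mean_photons: float) -> float:
    """Poisson noise standard deviation divided by the mean: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(mean_photons)


def photons_for_relative_noise(threshold: float) -> float:
    """Mean photon count at which the relative noise equals the threshold."""
    return 1.0 / threshold ** 2


if __name__ == "__main__":
    for photons in (100, 400, 1000, 10000):
        print(photons, f"{100 * relative_photon_noise(photons):.2f}%")
    for threshold in (0.03, 0.05):
        print(threshold, round(photons_for_relative_noise(threshold)))
```

Under this model the 3-5% visibility threshold corresponds to roughly 400 to 1100 photons per pixel, consistent with the 1000-photon figure in the abstract.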
The Image Systems Evaluation Toolkit (ISET) is an integrated suite of software routines that simulate the capture and processing of visual scenes. ISET includes a graphical user interface (GUI) for users to control the physical characteristics of the scene and many parameters of the optics, sensor electronics and image-processing pipeline. ISET also includes color tools and metrics based on international standards (chromaticity coordinates, CIELAB and others) that assist the engineer in evaluating the color accuracy and quality of the rendered image.
Digital imager sensor responses must be transformed to calibrated (human) color representations for display or print reproduction. Errors in these color rendering transformations can arise from a variety of sources, including (a) noise in the acquisition process (including photon noise and sensor noise) and (b) sensor spectral responsivities inconsistent with those of the human cones. These errors can be summarized by the mean deviation and variance of the reproduced values. It is desirable to select a color transformation that produces both low mean deviations and low noise variance. We show that in some conditions there is an inherent trade-off between these two measures: when selecting a color rendering transformation either the mean deviation or the variance (caused by imager noise) can be minimized. We describe this trade-off mathematically, and we describe a methodology for choosing an appropriate transformation for different applications. We illustrate the methodology by applying it to the problem of color filter selection (CMYG vs. RGGB) for digital cameras. We find that under moderate illumination conditions photon noise alone introduces an uncertainty in the estimated CIELAB coordinates on the order of 1-2 ΔE units for RGGB sensors and in certain cases even higher uncertainty levels for CMYG sensors. If we choose color transformations that equate this variance, the color rendering accuracy of the CMYG and RGGB filters is similar.
When rendering photographs, it is important to preserve the gray tones despite variations in the ambient illumination. When the illuminant is known, white balancing that preserves gray tones can be performed in many different color spaces; the choice of color space influences the renderings of other colors. In this behavioral study, we ask whether users have a preference for the color space where white balancing is performed. Subjects compared images using a white balancing transformation that preserved gray tones, but the transformation was applied in one of four different color spaces: XYZ, Bradford, a camera sensor RGB and a sharpened RGB color space. We used six scene types (four portraits, fruit, and toys) acquired under three calibrated illumination environments (fluorescent, tungsten, and flash). For all subjects, transformations applied in XYZ and sharpened RGB were preferred to those applied in Bradford and device color space.
This paper describes practical algorithms and experimental results using the sensor correlation method. We improve the algorithms to increase the accuracy and applicability to a variety of scenes. First, we use the reciprocal scale of color temperature, called 'mired,' in order to obtain perceptually uniform illuminant classification. Second, we propose to calculate correlation values between the image color gamut and the reference illuminant gamut, rather than between the image pixels and the illuminant gamuts. Third, we introduce a new image scaling operation with an adjustable parameter to adjust overall intensity differences between images and find a good fit to the illuminant gamuts. Finally, the image processing algorithms incorporating these changes are evaluated using a real image database.
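The mired scale mentioned above is the reciprocal of color temperature scaled by one million: mired = 10^6 / T, with T in kelvin. The sketch below shows the conversion and why equal mired steps are unequal kelvin steps; the function names are illustrative only.

```python
def kelvin_to_mired(kelvin: float) -> float:
    """Convert color temperature in kelvin to micro reciprocal degrees (mired)."""
    return 1.0e6 / kelvin


def mired_to_kelvin(mired: float) -> float:
    """Convert mired back to color temperature in kelvin."""
    return 1.0e6 / mired


if __name__ == "__main__":
    # Equal 50-mired steps correspond to very different kelvin intervals.
    for mired in (100, 150, 200, 250, 300, 350, 400):
        print(mired, round(mired_to_kelvin(mired)))
```

Sampling reference illuminants at equal mired intervals therefore places them more densely at low color temperatures, where a given kelvin difference produces a larger color change.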
With the development of high-speed CMOS imagers, it is possible to acquire and process multiple images within the imager, prior to output. We refer to an imaging architecture that acquires a collection of images and produces a single result as multiple capture single image (MCSI). In this paper we describe some applications of the MCSI architecture using a monochrome sensor and modulated light sources. By using active light sources, it is possible to measure object information in a manner that is independent of the passive illuminant. To study this architecture, we have implemented a test system using a monochrome CMOS sensor and several arrays of color LEDs whose temporal modulation can be precisely controlled. First, we report on experimental measurements that evaluate how well the active and passive illuminant can be separated as a function of experimental variables, including passive illuminant intensity, temporal sampling rate and modulation amplitude. Second, we describe two applications of this technique: (a) creating a color image from a monochrome sensor, and (b) measuring the spatial distribution of the passive illuminant.
Pixel design is a key part of image sensor design. After deciding on pixel architecture, a fundamental tradeoff is made to select pixel size. A small pixel size is desirable because it results in a smaller die size and/or higher spatial resolution; a large pixel size is desirable because it results in higher dynamic range and signal-to-noise ratio. Given these two ways to improve image quality and given a set of process and imaging constraints, an optimal pixel size exists. It is difficult, however, to analytically determine the optimal pixel size, because the choice depends on many factors, including the sensor parameters, imaging optics and the human perception of image quality. This paper describes a methodology, using a camera simulator and image quality metrics, for determining the optimal pixel size. The methodology is demonstrated for APS implemented in CMOS processes down to 0.18 μm technology. For a typical 0.35 μm CMOS technology the optimal pixel size is found to be approximately 6.5 μm at a fill factor of 30%. It is shown that the optimal pixel size scales with technology, but at a slower rate than the technology itself.
In this paper, we review several algorithms that have been proposed to transform a high dynamic range image into a reduced dynamic range image that matches the general appearance of the original. We organize these algorithms into two categories: tone reproduction curves (TRCs) and tone reproduction operators (TROs). TRCs operate pointwise on the image data, making the algorithms simple and efficient. TROs use the spatial structure of the image data and attempt to preserve local image contrast.
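The pointwise character of a TRC can be illustrated with a short sketch: every pixel is passed through the same curve, independent of its neighbors. The logarithmic curve used here is a generic illustrative choice and is not one of the specific algorithms reviewed in the paper; the function names are likewise assumptions.

```python
import math
from typing import List, Optional, Sequence


def log_tone_curve(luminance: float, max_luminance: float) -> float:
    """Map luminance in [0, max_luminance] to [0, 1] with a log curve."""
    return math.log1p(luminance) / math.log1p(max_luminance)


def apply_trc(image: Sequence[Sequence[float]],
              max_luminance: Optional[float] = None) -> List[List[float]]:
    """Apply the same curve to every pixel (no spatial context is used)."""
    peak = max_luminance if max_luminance is not None else max(max(r) for r in image)
    if peak <= 0.0:
        return [[0.0 for _ in row] for row in image]
    return [[log_tone_curve(v, peak) for v in row] for row in image]


if __name__ == "__main__":
    hdr = [[0.0, 1.0, 10.0], [100.0, 1000.0, 10000.0]]
    for row in apply_trc(hdr):
        print([round(v, 3) for v in row])
```

A TRO would differ by letting the mapping at each pixel depend on nearby values, which is what allows it to preserve local contrast at the cost of more computation.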
We have developed a software simulator to create physical models of a scene, compute camera responses, render the camera images and measure the perceptual color errors between the scene and rendered images. The simulator can be used to measure color reproduction errors and analyze the contributions of different sources to the error. We compare three color architectures for digital cameras: (a) a sensor array containing three interleaved color mosaics, (b) an architecture using dichroic prisms to create three spatially separated copies of the image, (c) a single sensor array coupled with a time-varying color filter measuring three images sequentially in time. Here, we analyze the color accuracy of several exposure control methods applied to these architectures. The first exposure control algorithm simply stops image acquisition when one channel reaches saturation. In a second scheme, we determine the optimal exposure time for each color channel separately, resulting in a longer total exposure time. In a third scheme we restrict the total exposure duration to that of the first scheme, but we preserve the optimum ratio between color channels. Simulator analyses measure the color reproduction quality of these different exposure control methods as a function of illumination, taking into account photon and sensor noise, quantization and color conversion errors.
We describe computational experiments to predict the perceived quality of multilevel halftone images. Our computations were based on a spatial color difference metric, S-CIELAB, that is an extension of CIELAB, a widely used industry standard. CIELAB predicts the discriminability of large uniform color patches. S-CIELAB includes a preprocessing stage that accounts for certain aspects of the spatial sensitivity to different colors. From simulations applied to multilevel halftone images, we found that (a) for grayscale images, L-spacing of the halftone levels results in better halftone quality than linear-spacing of the levels; (b) for color images, increasing the number of halftone levels for magenta ink results in the most significant improvement in halftone quality. Increasing the number of halftone levels of the yellow ink resulted in the least improvement.
A simple method of converting scanner (RGB) responses to estimates of object tristimulus (XYZ) coordinates is to apply a linear transformation to the RGB values. The transformation parameters are selected subject to minimization of some relevant error measure. While the linear method is easy, it can be quite imprecise. Linear methods are only guaranteed to work when the scanner sensor responsivities are within a linear transformation of the human color-matching functions. In studying the linear transformation methods, we have observed that the error distribution between the true and estimated XYZ values is often quite regular: plotted in tristimulus coordinates, the error cloud is a highly eccentric ellipse, often nearly a line. We will show that this observation is expected when the collection of surface reflectance functions is well-described by a low-dimensional linear model, as is often the case in practice. We will discuss the implications of our observation for scanner design and for color correction algorithms that encourage operator intervention.
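The linear method described above can be sketched as an ordinary least-squares fit of a 3x3 matrix M such that XYZ ≈ RGB · M over a set of training surfaces. The squared-error criterion, the pure-Python solver and all function names are assumptions made for this illustration; the abstract only says that "some relevant error measure" is minimized.

```python
from typing import List, Sequence

Rows = Sequence[Sequence[float]]


def transpose_times(a: Rows, b: Rows) -> List[List[float]]:
    """Return a^T b for row-major a (n x p) and b (n x q)."""
    p, q = len(a[0]), len(b[0])
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(q)]
            for i in range(p)]


def solve(lhs: Rows, rhs: Rows) -> List[List[float]]:
    """Solve lhs @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(lhs)
    aug = [list(l) + list(r) for l, r in zip(lhs, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("training RGB values do not span three dimensions")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_rgb_to_xyz(rgb: Rows, xyz: Rows) -> List[List[float]]:
    """Least-squares M via the normal equations (RGB^T RGB) M = RGB^T XYZ."""
    return solve(transpose_times(rgb, rgb), transpose_times(rgb, xyz))


def apply_transform(matrix: Rows, rgb_row: Sequence[float]) -> List[float]:
    """Estimate XYZ for one RGB triplet as rgb_row @ matrix."""
    return [sum(rgb_row[i] * matrix[i][j] for i in range(3)) for j in range(3)]
```

When the scanner responsivities are not a linear transformation of the color-matching functions, the residuals of this fit are the error cloud the abstract discusses.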
We describe a linear scanner model that provides a useful
characterization of the response of a scanner to diffusely reflecting surfaces. We show how the linear model can be used to estimate that portion of the scanner sensor responsivities that falls within the
linear space spanned by the input signals. We also describe how the model can be extended to characterize a scanner's response to surfaces that fluoresce under the scanner illuminant.
The spectral power distribution of the light that reflects from a surface to the eye depends both on the reflectance function of the surface and on the spectral power distribution of the illuminant. The human visual system actively adjusts to reduce the dependence of surface color appearance on the illumination. We use a matching paradigm to measure the change in the cone coordinates of a color signal necessary to maintain constant color appearance across an illuminant change. If the visual system made no adjustment, the measured cone coordinate change would be zero. If the visual system were perfectly color constant, the measured change would compensate exactly for the physical change in cone coordinates due to the change in illuminant.
It is not possible to measure the visual system’s adjustment to all combinations of illuminant changes and surfaces. Therefore we develop and test a model of this adjustment. In our laboratory experiments two variables govern the adjustment: the cone coordinates of the reflected color signal and the change in the spectral power distribution of the illuminant. We model the visual system’s adjustment using a bilinear function. We determine the parameters of the bilinear function from a small number of measurements. We show that the bilinear model predicts the visual system’s adjustment to many other surface and illuminant combinations.
This paper describes a method for estimating the surface spectral reflectance function of inhomogeneous
objects. The standard reflectance model for inhomogeneous materials suggests that surface
reflectance functions can be described as the sum of a constant (specular) function and a subsurface
(diffuse) function. First, we present an algorithm to generate an illuminant estimate without using a
reference white standard. Next we show that several physical constraints on the reflectance functions can
be used to estimate the subsurface component. A band of the estimated spectral reflectance functions
is recovered as possible solutions for the subsurface component.
SC762: Device Simulation for Image Quality Evaluation
Customers judge the image quality of a digital camera by viewing the final rendered output. Achieving a high quality output depends on multiple system components, including the optical system, imaging sensor, image processor and display device. Consequently, analyzing components singly, without reference to the characteristics of the other components, provides only a limited view of the system performance. An integrated simulation environment that models the entire imaging pipeline is a useful tool that improves understanding and guides design.
This course will introduce computational models to simulate the scene, optics, sensor, processor, display, and human observer. Example simulations of calibrated devices and imaging algorithms will be used to clarify how specific system components influence the perceived quality of the final output.