Recent mobile imaging seeks to expedite the autofocus process by embedding a phase detector in the image sensor to provide information for controlling both the magnitude and direction of lens movement. Compared to conventional contrast-detection autofocus, phase-detection autofocus (PDAF) is able to quickly bring the lens toward the in-focus position. However, the presence of sensor noise, the lack of image contrast, and the spatial offset between the left and right phase detectors can easily affect the performance of phase detection. We present a statistical approach to address this issue by characterizing the distribution of phase shift for a given distance of the lens to the in-focus position. We model the phase shift as a skew-normal distribution and verify it empirically. The results show that the skew-normal distribution is indeed a proper model for the phase shift data. We also propose a method based on Bayes’ theorem to determine the lens movement. Experimental results show that the proposed method is able to improve the reliability of PDAF.
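The Bayes-rule decision described above can be sketched as follows; the two lens states and the skew-normal parameters below are hypothetical placeholders, not the calibrated values from the paper:

```python
import math

def skewnorm_pdf(x, a, loc, scale):
    """Skew-normal density: (2/scale) * phi(z) * Phi(a*z), with z = (x - loc)/scale."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(a * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * Phi

# Hypothetical calibration: (shape a, loc, scale) of the phase-shift
# distribution for each lens state; real values come from measurement.
STATES = {"near_focus": (2.0, 0.5, 1.0), "far_focus": (-2.0, -3.0, 2.0)}
PRIOR = {"near_focus": 0.5, "far_focus": 0.5}

def posterior(shift):
    """Bayes' theorem: P(state | shift) is proportional to P(shift | state) * P(state)."""
    joint = {s: skewnorm_pdf(shift, *p) * PRIOR[s] for s, p in STATES.items()}
    z = sum(joint.values())
    return {s: v / z for s, v in joint.items()}
```

The state with the highest posterior probability then determines the direction (and, with a finer set of states, the magnitude) of the lens movement.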
In the presence of light bloom or glow, multiple peaks may appear in the focus profile and mislead the autofocus system
of a digital camera into an incorrect in-focus decision. We present a novel method to overcome the blooming effect. The
key idea behind the method is based on the observation that multiple peaks are generated due to the presence of false
features in the captured image, which, in turn, are due to the presence of fringe (or feather) of light extending from the
border of the bright image area. By detecting the fringe area and excluding it from focus measurement, the blooming
effect can be reduced. Experimental results show that the proposed anti-blooming method can indeed improve the
performance of an autofocus system.
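A minimal sketch of the idea: mark the saturated area, dilate it to cover the surrounding fringe band, and exclude both from a gradient-energy focus measure. The threshold, fringe radius, and the focus measure itself are illustrative choices, not the paper's exact operators:

```python
import numpy as np

def dilate(mask, r):
    """Naive binary dilation by a (2r+1) x (2r+1) square via shifted ORs."""
    out = mask.copy()
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= np.roll(np.roll(mask, dy, 0), dx, 1)
    return out

def focus_measure(img, bright_thresh=240, fringe_radius=3):
    """Gradient-energy focus value, excluding bright cores and their fringes."""
    bright = img >= bright_thresh
    excluded = dilate(bright, fringe_radius)   # bright area plus fringe band
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    valid = ~excluded
    return energy[valid].sum() / max(valid.sum(), 1)
```

With the fringe excluded, false peaks created around the bloom no longer contribute to the focus profile, while genuine detail elsewhere in the frame still does.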
This paper concerns the compensation of specular highlight for handheld image projectors. By employing a projector-camera configuration, where the camera is aligned with the viewer, the distortion caused by nonideal (e.g., colored, reflective) projection surfaces can be estimated from the captured image and compensated for accordingly to improve the projection quality. This works fine when the viewing direction relative to the system is fixed. However, the compensation becomes inaccurate when this condition changes, because the position of the specular highlight changes as well. We propose a novel method that, without moving the camera, can estimate the specular highlight seen from any position and integrate it with Grossberg’s radiometric compensation framework to demonstrate how view-dependent compensation can be achieved. Extensive results, both objective and subjective, are provided to demonstrate the performance of the proposed algorithm.
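In Grossberg-style radiometric compensation the captured color is commonly modeled as a per-pixel affine map of the projected color; inverting that map yields the compensation image. A minimal per-pixel sketch, where the 3x3 color-mixing matrix V and offset F would come from calibration (the values in the test are made up):

```python
import numpy as np

def compensate(target_rgb, V, F):
    """Solve V @ projected + F = target for the projector input,
    then clip to the projector's valid range [0, 1]."""
    projected = np.linalg.solve(V, np.asarray(target_rgb, float) - F)
    return np.clip(projected, 0.0, 1.0)
```

View-dependent compensation amounts to re-estimating the specular component of F for the viewer's position before inverting the model.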
We consider the quality assessment of images displayed on a liquid crystal display (LCD) with dim backlight, a
situation where the power consumption of the LCD is set to a low level. This energy-saving mode decreases the
perceived image quality. In particular, some image regions may appear so dark that they become non-perceptible to
the human eye. The problem becomes more severe when the backlight is very dim. Ignoring the
effect of dim backlight on image quality assessment and directly applying an image quality assessment metric to the
entire image may produce results inconsistent with human evaluation. We propose a method to fix the problem. The
proposed method works as a precursor of image quality assessment. Specifically, given an image and the backlight
intensity level of the LCD on which the image is to be displayed, the method automatically classifies the pixels of an
image into perceptible and non-perceptible pixels according to the backlight intensity level and excludes the non-perceptible
pixels from quality assessment. Experimental results are shown to demonstrate the performance of the proposed method.
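The classification step can be sketched as a threshold on displayed luminance. The display model and all constants below (peak luminance, gamma, visibility threshold) are assumed for illustration, not taken from the paper:

```python
import numpy as np

def perceptible_mask(gray, backlight, threshold=2.0, gamma=2.2, l_max=250.0):
    """Classify pixels as perceptible under a given backlight level.
    Displayed luminance is modeled as backlight * L_max * (g/255)^gamma
    (an illustrative display model); pixels whose luminance falls below a
    visibility threshold (in cd/m^2) are flagged non-perceptible and would
    be excluded from the subsequent quality assessment."""
    lum = backlight * l_max * (gray / 255.0) ** gamma
    return lum >= threshold
```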
Switching the backlight of handheld devices to low power mode saves energy but affects the color appearance of an
image. In this paper, we consider the chroma degradation problem and propose an enhancement algorithm that
incorporates the CIECAM02 appearance model to quantitatively characterize the problem. In the proposed algorithm, we
enhance the color appearance of the image in low power mode by weighted linear superposition of the chroma of the
image and that of the estimated dim-backlight image. Subjective tests are carried out to determine the perceptually
optimal weighting and to verify the effectiveness of our framework.
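The core enhancement step reduces to a one-line blend of the two chroma values; the default weight below is a placeholder, since the perceptually optimal value is what the subjective tests determine:

```python
def enhance_chroma(c_full, c_dim, w=0.7):
    """Weighted linear superposition of the original image's chroma (c_full)
    and the estimated dim-backlight image's chroma (c_dim); w = 0.7 is an
    illustrative placeholder, not the perceptually optimal value."""
    return w * c_full + (1.0 - w) * c_dim
```

In the paper's framework both chroma values would be the CIECAM02 chroma correlates of the respective images.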
The saliency map is useful for many applications such as image compression, display, and visualization. However, the
bottom-up model used in most saliency map construction methods is computationally expensive. The purpose of this
paper is to improve the efficiency of the model for automatic construction of the saliency map of an image while
preserving its accuracy. In particular, we remove the contrast sensitivity function and the visual masking component of
the bottom-up visual attention model and retain the components related to perceptual decomposition and center-surround
interaction, which are critical properties of the human visual system. The simplified model is verified by performance
comparison with the ground truth. In addition, a salient region enhancement technique is adopted to enhance the
connectivity of the saliency map, and the saliency maps of three color channels are fused to enhance the prediction
accuracy. Experimental results show that the average correlation between our algorithm and the ground truth is close to
that between the original model and the ground truth, while the computational complexity is reduced by 98%.
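The retained center-surround interaction can be sketched as the rectified difference between a fine-scale and a coarse-scale version of a channel. Box filters and the radii below are stand-ins for the model's actual filters:

```python
import numpy as np

def blur(img, r):
    """Box blur of radius r implemented with shifted sums (wraps at borders)."""
    acc = np.zeros_like(img, dtype=float)
    n = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(img, dy, 0), dx, 1)
            n += 1
    return acc / n

def center_surround(img, rc=1, rs=4):
    """Center-surround difference: fine-scale response minus coarse-scale
    response, half-wave rectified."""
    return np.maximum(blur(img, rc) - blur(img, rs), 0.0)
```

Fusing such maps from the three color channels, then enhancing connected salient regions, gives the final saliency map.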
Automatic white balancing is an important function for digital cameras. It adjusts the colors of an image so that the
image looks as if it were taken under canonical light. White balance is usually achieved by estimating the chromaticity of the
illuminant and then using the resulting estimate to compensate the image. The grey world method is the basis of most
automatic white balance algorithms. It generally works well but fails when the image contains a large object or
background with a uniform color. The algorithm proposed in this paper solves the problem by considering only pixels
along edges and by imposing an illuminant constraint that confines the possible colors of the light source to a small
range during the estimation of the illuminant. By considering only edge points, we reduce the impact of the dominant
color on the illuminant estimation and obtain a better estimate. By imposing the illuminant constraint, we further
minimize the estimation error. The effectiveness of the proposed algorithm is tested thoroughly. Both objective and
subjective evaluations show that the algorithm is superior to other methods.
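The two ideas, averaging only edge pixels and clamping the estimate to a plausible illuminant range, can be sketched as follows. The gradient threshold and the chromaticity bounds are illustrative values, not the constraint used in the paper:

```python
import numpy as np

def estimate_illuminant(img, grad_thresh=10.0, lo=0.25, hi=0.45):
    """Grey-world estimate restricted to edge pixels, with the (r, g)
    chromaticities clamped to an assumed illuminant range [lo, hi]."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy) > grad_thresh
    mean_rgb = img[edges].mean(axis=0) if edges.any() else img.mean(axis=(0, 1))
    r = mean_rgb[0] / mean_rgb.sum()
    g = mean_rgb[1] / mean_rgb.sum()
    r, g = np.clip(r, lo, hi), np.clip(g, lo, hi)   # illuminant constraint
    return r, g, 1.0 - r - g
```

Because a large uniform-colored object contributes few edge pixels, it no longer dominates the average, and the constraint caps whatever bias remains.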
As opposed to the global shutter, which starts and stops the light integration of each pixel at the same time by
incorporating a sample-and-hold switch with analog storage in each pixel, the electronic rolling shutter found in most
low-end CMOS image sensors today collects the image data row by row, analogous to an open slit that scans over the
image sequentially. Each row integrates the light when the slit passes over it. Therefore, the scanlines of the image
are not exposed at the same time. This sensor architecture creates an objectionable geometric distortion, known as the
rolling shutter effect, for moving objects. In this paper, we address this problem by using digital image processing
techniques. A mathematical model of the rolling shutter is developed. The relative image motion between the moving
objects and the camera is determined by block-based motion estimation. Bézier curve fitting is applied to smooth the
resulting motion data, which are then used for the alignment of scanlines. The basic ideas behind the algorithm
presented here can be generalized to deal with other complicated cases.
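For the special case of a uniform horizontal pan, the scanline alignment reduces to shifting each row in proportion to its readout delay. This is a one-parameter sketch of the paper's model; vx would come from the block-based motion estimation:

```python
import numpy as np

def correct_rolling_shutter(img, vx, readout_frac=1.0):
    """Align scanlines for a horizontally panning camera: row y is exposed
    later than row 0 by y/H of the frame time, so it is shifted back by
    vx * readout_frac * y / H pixels (readout_frac is the fraction of the
    frame time spent reading out the sensor)."""
    h, w = img.shape[:2]
    out = np.empty_like(img)
    for y in range(h):
        shift = int(round(vx * readout_frac * y / h))
        out[y] = np.roll(img[y], -shift, axis=0)
    return out
```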
Multimedia applications running over wireless or other error-prone transmission media require compression algorithms that are resilient to channel degradation. This paper presents a data packetization approach that makes the emerging ISO JPEG-2000 image compression standard resilient to transmission errors. The proposed technique can be easily extended to other wavelet-based image codecs. Extensive simulation results show that, with the proposed approach, a decoder is able to recover up to 8.5 dB in PSNR with minimal overhead and without affecting coding efficiency or spatial/quality scalability. Finally, the proposed approach supports unequal error protection of the wavelet subbands.
We derive a visual image quality metric from a model of human visual processing that takes as its input an original image and a compressed or otherwise altered version of that image. The model has multiple channels tuned to spatial frequency, orientation, and color. Channel sensitivities are scaled to match a bandpass achromatic spatial frequency contrast sensitivity function (CSF) and lowpass chromatic CSFs. The model has a contrast gain control with parameters based on the results of human psychophysical experiments on pattern masking and contrast induction. These experiments have shown that contrast gain control within the visual system is selective for spatial frequency, orientation, and color. The model accommodates this result by placing a contrast gain control within each channel and by letting each channel's gain control be influenced selectively by contrasts within all channels. A simple extension to this model provides predictions of color image quality.
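The cross-channel gain control described above is a divisive normalization: each channel's excitatory response is divided by a weighted pool of all channels' contrasts. A minimal sketch, with illustrative exponents and saturation constant rather than the fitted model parameters:

```python
import math

def channel_response(contrasts, weights, p=2.4, q=2.0, b=0.1):
    """Divisive contrast gain control: channel i responds with
    sign(c_i) * |c_i|^p / (b + sum_j w_j * |c_j|^q), so contrast in any
    channel suppresses (masks) the responses of the others."""
    pool = b + sum(w * abs(c) ** q for c, w in zip(contrasts, weights))
    return [math.copysign(abs(c) ** p, c) / pool for c in contrasts]
```

The metric then compares such responses for the original and the altered image.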
In this paper, we describe an approach to detecting and tracking feature points in the mouth region of a talking-head sequence. These feature points are interconnected in a polygonal mesh so that the detection and tracking of the points is based on information not only at the points themselves but also in the surrounding elements. The nodes are detected in an initial frame by a feature detection algorithm. They are then tracked in successive frames by deforming the mesh so that, when one mesh is warped to the other, the image patterns over corresponding elements in the two meshes match each other. This is accomplished by a modified Newton algorithm that iteratively minimizes the error between the two images after mesh-based warping. The numerical calculation involved in the optimization is simplified by using the concept of master elements and shape functions from the finite element method. The algorithm has been applied to a SIF-resolution sequence containing fairly rapid mouth movement. Our simulation results show that it can locate and track the feature points in the mouth region quite accurately.
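The iterative minimization can be illustrated in its simplest form: a one-dimensional, single-parameter analogue of the mesh warping, where a Gauss-Newton step aligns a reference signal with a shifted one. This is a drastic simplification of the paper's method, for intuition only:

```python
import numpy as np

def sample(sig, x):
    """Linear interpolation of a 1-D signal at fractional positions."""
    x = np.clip(x, 0, len(sig) - 1.001)
    i = x.astype(int)
    f = x - i
    return (1 - f) * sig[i] + f * sig[i + 1]

def track_shift(ref, cur, iters=30):
    """Gauss-Newton minimization of the SSD between ref and a shifted
    sampling of cur; the shift d plays the role of the mesh node motion."""
    xs = np.arange(len(ref), dtype=float)
    d = 0.0
    for _ in range(iters):
        w = sample(cur, xs + d)
        g = np.gradient(w)                    # d(warped signal)/dd
        err = w - ref
        d -= (g @ err) / max(g @ g, 1e-12)    # Gauss-Newton step
    return d
```

In the full method the single parameter d becomes the vector of mesh node displacements, and the interpolation is carried out with finite-element shape functions.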
In this paper, we discuss issues related to the analysis and synthesis of facial images using speech information. An approach to speaker-independent, acoustic-assisted image coding and animation is studied. A perceptually based sliding-window encoder is proposed. It utilizes the high-rate (or oversampled) acoustic viseme sequence from the audio domain for image-domain viseme interpolation and smoothing. The image-domain visemes in our approach are dynamically constructed from a set of basic visemes. The look-ahead and look-back moving interpolations in the proposed approach provide an effective way to compensate for the mismatch between auditory and visual perception.
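The look-back/look-ahead interpolation can be sketched as a windowed average over an oversampled sequence of viseme blend weights (rows are time steps, columns are basic visemes). The window sizes are illustrative:

```python
import numpy as np

def smooth_visemes(weights, back=2, ahead=2):
    """Look-back/look-ahead moving interpolation of an acoustic viseme
    weight sequence: each output frame averages the surrounding window
    and renormalizes so the blend weights still sum to 1."""
    T = len(weights)
    out = np.empty_like(weights, dtype=float)
    for t in range(T):
        win = weights[max(0, t - back): min(T, t + ahead + 1)]
        m = win.mean(axis=0)
        out[t] = m / m.sum()
    return out
```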
Motion estimation is a key issue in video coding. In very low bitrate applications, the side information for the motion field represents a significant portion of the total bitrate. This paper presents a joint motion estimation, segmentation, and coding technique that aims to reduce the segmentation and motion side information while providing a similar or smaller prediction error compared to classical motion estimation techniques. The main application in mind is a region-based coding approach in which consecutive frames of the video are divided into regions with similar motion vectors and simple shapes that are easy to encode.
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and demonstrate speech-assisted coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.