Gesture recognition has attracted extensive research interest in the field of human–computer interaction. Real-time, affine-invariant gesture recognition is an important and challenging problem. This paper presents a robust, affine-view-invariant gesture recognition system for real-time LED smart-light control. To the best of our knowledge, this is the first time gesture recognition has been applied to control LED smart lights in real time. Employing skin detection, hand blobs captured by a top-view camera are first localized and aligned. SVM classifiers trained on HOG features and robust shape features are then used for gesture recognition. By accurately recognizing two types of gestures ("gesture 8" and a "5-finger gesture"), a user can toggle lighting on/off efficiently and control light intensity on a continuous scale. In both cases, gesture recognition is rotation- and translation-invariant. Extensive evaluations in an office setting demonstrate the effectiveness and robustness of the proposed gesture recognition algorithm.
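As a rough illustration of the HOG-plus-SVM pipeline described above, the sketch below extracts a minimal HOG-style descriptor in NumPy. The cell size, bin count, and toy edge image are illustrative assumptions, not the paper's parameters; a production system would add block normalization and feed the descriptors to a trained SVM classifier.

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Minimal HOG-style descriptor: per-cell histograms of gradient
    orientation, weighted by gradient magnitude (no block normalization)."""
    gy, gx = np.gradient(img.astype(float))   # axis 0 = rows (y), axis 1 = cols (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation [0, 180)
    h, w = img.shape
    feats = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            a = ang[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    f = np.concatenate(feats).astype(float)
    n = np.linalg.norm(f)
    return f / n if n > 0 else f

# A 32x32 hand-blob stand-in: a vertical edge produces horizontal gradients
# (orientation 0 degrees), which should dominate the edge cells' histograms.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
f = hog_features(img)
```

In the full system, descriptors like `f` would be collected from aligned hand blobs and classified with a linear SVM.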
In many robotics and automation applications, it is often required to detect a given object and determine its pose (position and orientation) from input images with high speed, high robustness to photometric changes, and high pose accuracy. We propose a new object matching method that improves efficiency over existing approaches by decomposing orientation and position estimation into two cascaded steps. In the first step, an initial position and orientation are found by matching with Histograms of Oriented Gradients (HOG), reducing the orientation search from 2D template matching to 1D correlation matching. In the second step, a more precise orientation and position are computed by matching based on the Dominant Orientation Template (DOT), using robust edge-orientation features. The cascade combination of HOG and DOT features for high-speed, robust object matching is the key novelty of the proposed method. Experimental evaluation was performed with real-world single-object and multi-object inspection datasets, using software implementations on an Atom CPU platform. Our results show that the proposed method achieves a significant speed improvement over an already accelerated template matching method at comparable accuracy.
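The 1D correlation idea in the first step can be sketched as follows: reduce the template and the query region to edge-orientation histograms and find the circular shift that maximizes their correlation. The bin count and the synthetic histograms below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def orientation_offset(hist_t, hist_q):
    """Estimate the rotation between template and query as the circular
    shift that maximizes the 1D cross-correlation of their
    edge-orientation histograms."""
    n = len(hist_t)
    scores = [np.dot(np.roll(hist_t, s), hist_q) for s in range(n)]
    return int(np.argmax(scores))

bins = 36                       # 10-degree orientation bins (assumed)
rng = np.random.default_rng(0)
hist_t = rng.random(bins)       # stand-in template orientation histogram
true_shift = 7                  # query rotated by 7 bins (70 degrees)
hist_q = np.roll(hist_t, true_shift)
est = orientation_offset(hist_t, hist_q)
```

Because a histogram's circular autocorrelation peaks at zero lag, the best-matching shift recovers the rotation; the 2D position search then only has to be run near this initial estimate.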
Traditional methodologies for primary selection usually consider the optimization of parameters that characterize the global performance of the display system, such as the luminance of the white point, gamut volume, and power consumption. We propose a methodology for primary design that optimizes a figure of merit designed to favor gamuts for which the maximum luminance at each chromaticity is uniformly related to the corresponding maximum luminance over the set of optimal colors. We contrast the results obtained with the proposed methodology against those obtained by an alternative strategy based on the optimization of gamut volume, and analyze differences in performance between these approaches for both three- and four-primary systems. Results indicate that the global vs. local design choices lead to significantly different primary designs.
Primary selection plays a fundamental role in display design. Primaries affect not only the gamut of colors the system is able to reproduce but also the power consumption and other cost-related variables. Using more than the traditional three primaries has been shown to be a versatile way of extending the color gamut, widening the viewing angle of LCD screens, and improving the power consumption of display systems. Adequate selection of primaries requires a trade-off among the multiple benefits the system offers, the costs and complexity it implies, and other design parameters.
The purpose of this work is to present a methodology for optimal design of three-primary and multiprimary display systems. We consider the gamut in perceptual spaces, which offer the advantage of an evaluation that correlates with human perception; determine a design that maximizes the gamut volume, constrained to a certain power budget; and analyze the benefits of increasing the number of primaries and their effect on other performance variables such as gamut coverage.
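For intuition, in a linear additive display the gamut spanned by three primaries is the parallelepiped of their tristimulus vectors, so a simple volume objective is |det M|. The sketch below uses hypothetical primary values and a toy luminance budget; the methodology above instead evaluates volume in a perceptual space, which requires a color transform and gamut sampling.

```python
import numpy as np

# Hypothetical XYZ tristimulus vectors of three candidate primaries
# (columns of M). In a linear additive display the reachable gamut is the
# parallelepiped spanned by the primary vectors, so its volume is |det M|.
M = np.array([[0.41, 0.36, 0.18],
              [0.21, 0.72, 0.07],
              [0.02, 0.12, 0.95]])   # columns: R, G, B primaries (assumed)

gamut_volume = abs(np.linalg.det(M))

# A toy power constraint: the white-point luminance (sum of the Y row)
# must stay within an assumed budget.
white_Y = M[1, :].sum()
power_ok = white_Y <= 1.05
```

A primary-design search would sweep candidate matrices M, keep those satisfying the power budget, and maximize the (perceptual-space) volume figure of merit.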
Nowadays, multi-domain vertical alignment (MVA) LCD technology is becoming the mainstream approach in the LCD TV (LCTV) industry due to its high contrast ratio and wide viewing angle. However, like many other types of LCD devices, MVA still suffers from color shift, namely color washout, between normal and oblique viewing angles. In particular, MVA shows an evident gamma-curve distortion at large oblique viewing angles. This paper formulates MVA LCTV color performance in two stages. First, an in-depth LCD characterization was performed on an MVA LCTV, accounting for spectral characteristics, backlight leakage, sub-pixel crosstalk, and primary shift. In addition, micrograph-based sub-pixel analysis was employed and suggested that the off-axis color-shift problem of MVA can be mitigated by using dual sub-pixel technology. Next, a generic LCD color model was proposed to relate the RGB input to the output tristimulus values in XYZ. Sub-pixel-based image rendering was performed, and the results show the feasibility of LCD color prediction with high colorimetric accuracy at both on-axis and off-axis viewing conditions using the proposed generic color model.
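A minimal sketch of such a generic additive model, assuming a simple gamma linearization, a primary tristimulus matrix, and a black-level (leakage) offset. The matrix, gamma, and black level here are illustrative placeholders, and the paper's model additionally accounts for effects such as crosstalk and primary shift.

```python
import numpy as np

def rgb_to_xyz(rgb, M, black, gamma=2.2):
    """Generic additive LCD model (a simplified sketch): linearize the RGB
    drive with a gamma curve, mix through the primary tristimulus matrix M,
    and add the backlight-leakage black level."""
    lin = np.asarray(rgb, float) ** gamma
    return M @ lin + black

M = np.array([[0.41, 0.36, 0.18],
              [0.21, 0.72, 0.07],
              [0.02, 0.12, 0.95]])        # columns: R, G, B primary XYZ (assumed)
black = np.array([0.002, 0.002, 0.003])   # leakage when all subpixels are off

white = rgb_to_xyz([1.0, 1.0, 1.0], M, black)
```

Characterization fills in M, the black level, and the per-channel tone curves from measurements; prediction accuracy is then checked against measured XYZ at both on-axis and off-axis viewing.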
One of the image-quality issues of LC TVs is motion blur. In this paper, LCD motion blur is modeled using a frequency-domain analysis, in which the motion of an object introduces a temporal component into the spatial/temporal spectrum. The combination of the display's temporal low-pass filtering and eye tracking causes the perception of motion blur. One way to reduce motion blur is backlight flashing, where a shorter "on" duration narrows the display's temporal aperture function and thus improves the temporal transfer function of the display. Backlight flashing was implemented on an LCD with a backlight consisting of an array of light-emitting diodes (LEDs). The LEDs can be flashed on for a short duration after the LCD reaches the target level. The motion-blur reduction was evaluated both objectively and subjectively. In the objective experiment, the retinal image was derived from a sequence of images captured with a high-speed camera. The subjective study compared the motion blur of an edge against a simulated edge blur. The objective and subjective experiments show good agreement, and both show a clear reduction in motion blur with synchronized backlight flashing.
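The temporal-aperture argument can be made concrete: a rectangular aperture of width T has temporal MTF |sinc(fT)|, so flashing the backlight for a fraction of the frame raises the transfer function at a given temporal frequency. The frame rate, test frequency, and flash fraction below are illustrative assumptions.

```python
import numpy as np

def temporal_mtf(freq_hz, on_time_s):
    """MTF of a rectangular temporal aperture of width on_time_s:
    |sinc(f * T)|. Note np.sinc is the normalized sinc, sin(pi x)/(pi x)."""
    return np.abs(np.sinc(freq_hz * on_time_s))

frame = 1 / 60.0                          # 60 Hz frame time (assumed)
hold = temporal_mtf(30.0, frame)          # hold-type: backlight on all frame
flash = temporal_mtf(30.0, frame / 4)     # backlight flashed for 1/4 frame
```

At 30 Hz the hold-type aperture attenuates to 2/pi (about 0.64), while the quarter-frame flash keeps the response above 0.97, which is the mechanism behind the measured blur reduction.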
The drive for larger size, higher spatial resolution, and wider aperture in LCDs has been shown to increase the electrical crosstalk between electrodes in the driver circuit. This crosstalk leads to additivity errors in color LCDs. In this paper, LCD color crosstalk was modeled using a capacitance-coupling model, and the crosstalk effect was analyzed with micrographs captured from an imaging colorimeter. The experimental results reveal the subpixel nature of color crosstalk: whenever two neighboring subpixels are "on" at the same time, there is crosstalk from one subpixel to the other, but whenever an "off" subpixel lies between two "on" subpixels, there is no crosstalk between them. The crosstalk is positive and acts from right to left across all three subpixels. Based on this crosstalk model, the crosstalk of an LCD was characterized, and a spatial subpixel crosstalk-correction algorithm was developed to improve the color performance of the LCD. The correction algorithm reduced crosstalk by a factor of 16. Compared to a 3D lookup-table approach, the new algorithm is easier to implement and more accurate.
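The reported neighbor-dependent behavior suggests a simple first-order correction: predict the coupling and subtract it from the drive levels. The toy 1D model below uses an assumed coupling coefficient and drive values for illustration; it is not the paper's measured model or its full algorithm.

```python
import numpy as np

def apply_crosstalk(drive, k=0.05):
    """Toy model of the measured behavior: a subpixel picks up a fraction k
    of its right neighbor, but only when both subpixels are 'on'."""
    out = drive.astype(float).copy()
    both_on = (drive[:-1] > 0) & (drive[1:] > 0)
    out[:-1][both_on] += k * drive[1:][both_on]
    return out

def precorrect(drive, k=0.05):
    """First-order correction: subtract the predicted coupling from the
    drive levels before display."""
    out = drive.astype(float).copy()
    both_on = (drive[:-1] > 0) & (drive[1:] > 0)
    out[:-1][both_on] -= k * drive[1:][both_on]
    return out

row = np.array([0.8, 0.6, 0.0, 0.5])        # subpixel drive levels (assumed)
naive = apply_crosstalk(row)                # displayed without correction
corrected = apply_crosstalk(precorrect(row))  # displayed after pre-correction
```

In this toy case the pre-correction cancels the coupling exactly because the right neighbor is itself unchanged; in general a first-order correction leaves a small second-order residual.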
Contone imagery usually has eight bits per pixel for each of the three primaries in typical displays. However, there are often points in the imaging pipeline that constrain this number for cost reasons. Conversely, higher-quality displays seek to achieve 9-10 bits/pixel/color, though there may be system bottlenecks limited to 8. In both cases, the goal is to achieve a higher perceived bit-depth quality than the imaging system affords. The two main artifacts caused by reduced bit-depth are contouring and loss of low-amplitude detail. These distortions can be prevented by applying a dithering process before the bit-depth limitation. A technique for achieving bit-depth extension via spatiotemporal dithering has previously been presented. In applications where it is only possible to affect the image after the bit-depth losses have already occurred, it is impossible to accurately restore the lost low-amplitude detail. However, it is possible to remove the false contours. Of the several approaches used to remove false contours, we will discuss predictive cancellation and its dependence on the spatial-frequency localization and masking properties of the visual system. We discuss the key visual properties that arose while investigating these two applications, which include the optical transfer function (OTF) of the eye, masking by noise, and contour integration.
Document scanners are used to convert paper documents to digital format for distribution or archiving. Scanners are also used in copiers and fax machines to convert documents to electrical signals in analog or digital format. Most document scanners use a white backing to avoid black borders or black holes in scanned images. One problem with white backing is that show-through from the back side is visible for duplex-printed (two-sided) documents. This paper describes an optical method to eliminate show-through without reverting to the black border or black hole. The scanner cover is made into a saw-tooth-shaped mirror surface, oriented so that it reflects the light from the scanner lamp to the scanner lens. When the scanner cover itself is scanned, as in the case of a hole in the paper, it reflects light (specular reflection) from the scanner lamp directly into the scanner lens. Because this specular reflection is much brighter than the light reflected from a document, only a small portion of it is needed to produce the same output as scanning a piece of white paper. Radiometric calculation shows that this new approach can reduce the overall reflection from the scanner cover to 8% when scanning a document, yet the cover appears white when no document lies between the cover and the scan bar. The show-through is greatly reduced due to this reduced overall reflection from the scanner cover.
Continuous-tone, or "contone," imagery usually has 24 bits/pixel as a minimum, with eight bits each for the three primaries in typical displays. However, lower-cost displays constrain this number because of various system limitations. Conversely, high-quality displays seek to achieve 9-10 bits/pixel/color, though there may be system bottlenecks limited to 8. The two main artifacts from reduced bit-depth are contouring and loss of low-amplitude detail; these can be prevented by dithering the image prior to the bit-depth losses. Early work in this area includes Roberts' noise-modulation technique, Mitsa's blue-noise mask, Tyler's technique of bit-stealing, and Mulligan's use of the visual system's spatiotemporal properties for spatiotemporal dithering. However, most halftoning/dithering work was directed primarily at displays at the lower end of bits/pixel (e.g., 1 bit, as in halftoning) and higher ppi. Like Tyler, we approach the problem from the higher end of bits/pixel/color, say 6-8, and use available high-frequency color content to generate even higher luminance amplitude resolution. Bit-depth extension with a high starting bit-depth (and often lower spatial resolution) changes the game substantially from halftoning experience. For example, complex algorithms like error diffusion and annealing are not needed, just the simple addition of noise. Instead of a spatial dither, it is better to use an amplitude dither, termed "microdither" by Pappas. We have looked at methods of generating the highest invisible opponent-color spatiotemporal noise and other patterns, and have used Ahumada's concept of equivalent input noise to guide our work. This paper reports on techniques and observations made in achieving contone quality on ~100 ppi, 6 bits/pixel/color LCD displays with no visible dither patterns, noise, contours, or loss of amplitude detail at viewing distances as close as the near-focus limit (~120 mm). These include the interaction of display nonlinearities and their role in generating a low-spatial-frequency flicker from mutually high-pass spatial and temporal noise, as well as the temporal response symmetries.
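The "simple addition of noise" amplitude dither can be illustrated as follows: add roughly one LSB of uniform noise before quantization, so that the temporally averaged output recovers amplitude resolution below the quantizer step. The bit depth, frame count, and noise amplitude below are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits):
    """Uniform mid-tread quantization to the given bit depth on [0, 1]."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

rng = np.random.default_rng(1)
ramp = np.linspace(0.0, 1.0, 4096)       # smooth 'contone' ramp

plain = quantize(ramp, 6)                # hard 6-bit quantization: contours
# Amplitude dither: add +/- 0.5 LSB uniform noise before quantization, then
# average over frames (temporal integration by the eye stands in for the
# display's frame sequence).
lsb = 1.0 / (2 ** 6 - 1)
frames = [quantize(ramp + rng.uniform(-lsb / 2, lsb / 2, ramp.size), 6)
          for _ in range(64)]
dithered = np.mean(frames, axis=0)

err_plain = np.abs(plain - ramp).mean()
err_dither = np.abs(dithered - ramp).mean()
```

The plain 6-bit output collapses the ramp onto 64 levels (visible as contours), while the dithered average tracks the ramp with a much smaller mean error.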
While the use of visual models for assessing all aspects of the imaging chain is steadily increasing, one hindrance is the complexity of these models. This has an impact in two ways: not only does the more complex visual model take longer to run, making it difficult to place into optimization loops, but it also takes longer to code, test, and calibrate. As a result, a number of shortcut models have been proposed and used. Some shortcuts involve more efficient frequency transforms, such as using a Cartesian-separable wavelet, while others omit the steps required to simulate certain visual mechanisms, such as masking. A key example of the latter is spatial CIELAB, which models only the opponent-color CSFs and not the spatial-frequency channels. Watson's recent analysis of the ModelFest data showed that while a multi-channel model gave the best performance, versions dispensing with the complex frequency bank and using only frequency attenuation did nearly as well. Of course, the ModelFest data addressed detection of a signal on a uniform field, so no masking properties were probed. At the other end of the complexity range is the model by D'Zmura, which includes not only radial and orientation channels but also the interactions between the channels in both luminance and color. This talk will dissect several types of practical distortions that require more advanced visual models. One of these is the need for orientation channels to predict edge jaggies due to aliasing. Other visual mechanisms in search of an exigent application that we will explore include cross luminance-chrominance masking and facilitation, local contrast, and cross-channel masking.
Binary image compression is different from contone image compression: the compression ratio varies greatly with the halftoning algorithm as well as the image type. Most binary compression methods cannot efficiently compress images halftoned using frequency-modulation (FM) screening or error diffusion, because the blue-noise characteristic of the output pattern makes run-length-based compression algorithms ineffective. In this paper, we describe a method that combines prior knowledge of the halftone screen used in the halftoning process with local statistics to improve the prediction of the FM-screened halftone image. The binary image is first broken into sub-blocks and the mean of each block is calculated. The block mean and the halftone screen are used to generate a predicted image. A residual image is then constructed by taking the exclusive OR of the original halftone image and the predicted image. Since the prediction is strongly correlated with the original halftone image, the residual consists mainly of zeros and can be compressed with run-length encoding. We applied this method to a number of test images with both photo and text content; the compression ratio is improved by up to a factor of 10 compared to a standard run-length encoding algorithm.
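A sketch of the prediction/residual idea, under assumed toy inputs (a random threshold screen standing in for the real FM screen, constant-gray content, and an 8x8 block size): the predicted halftone is the screen thresholded at each block mean, the residual is the XOR against the original, and run-length coding then benefits from the residual being mostly zeros.

```python
import numpy as np

def predict_halftone(halftone, screen, block=8):
    """Predict a screened halftone from block means: threshold the tiled
    screen against each block's mean gray level."""
    h, w = halftone.shape
    ty, tx = -(-h // screen.shape[0]), -(-w // screen.shape[1])  # ceil tiling
    thr = np.tile(screen, (ty, tx))[:h, :w]
    pred = np.zeros_like(halftone)
    for r in range(0, h, block):
        for c in range(0, w, block):
            mean = halftone[r:r + block, c:c + block].mean()
            pred[r:r + block, c:c + block] = (
                mean > thr[r:r + block, c:c + block]).astype(halftone.dtype)
    return pred

def run_lengths(bits):
    """Run-length encode a flat binary sequence; returns the run lengths."""
    flat = bits.ravel()
    edges = np.flatnonzero(np.diff(flat)) + 1
    return np.diff(np.concatenate(([0], edges, [flat.size])))

rng = np.random.default_rng(2)
screen = rng.random((16, 16))                  # stand-in FM threshold screen
gray = np.full((64, 64), 0.4)                  # constant-gray original
halftone = (gray > np.tile(screen, (4, 4))).astype(np.uint8)

residual = np.bitwise_xor(halftone, predict_halftone(halftone, screen))
```

Because the prediction reproduces most of the screen-driven dot pattern, the residual has far fewer transitions than the halftone itself, so the same run-length coder compresses it far better.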
Color misregistration is one of the common artifacts for 3-CCD desktop scanners. The misregistration of red, green and blue image layers causes both color fringing and blur in the scanned images. These effects are quantified by linear system theory analysis. Knowing the bandwidth and peak sensitivity asymmetries in the opponent color representation of the visual system, we developed a method to reduce the color misregistration artifact by attempting to capture signals in an approximate opponent color space. A new sensor arrangement facilitates this goal, in which the luminance and chrominance signals are captured independently. The luminance signal (Y) is captured at the full resolution using one row of the 3-row CCD linear arrays. The first chrominance signal is captured on another row with an interleaved half resolution red (R) and half resolution luminance sensor elements, and the second chrominance signal is similarly captured on a third row using blue (B) and luminance (Y). Since each luminance and chrominance signal is isolated on a single row, and since there is no registration error within a row, color misregistration is theoretically prevented in luminance as well as in the chrominance signals. Simulation shows that the new method does eliminate the blur and reduces the visibility of color fringing. Since residual luminance and chrominance misregistration may occur, a psychophysical experiment was conducted to judge the improvement in the scanned image quality. The experiment shows that this new capture scheme can significantly reduce the perception of misregistration artifacts.
In this paper, we describe a visual experiment to measure the contrast detection threshold of both halftone and continuous-tone images. A continuous-tone sinusoidal grating was halftoned with a classical 45-degree dot screen, and a calibrated CRT monitor was used to display the images. Observers made a forced choice as to whether the displayed image contained a grating pattern, and the contrast detection threshold was determined using probit analysis. The threshold elevation, the ratio of the contrast threshold for the halftone grating to that of the continuous-tone grating, was calculated from the measured thresholds. The threshold elevation was found to depend strongly on the halftone dot frequency. At a high halftone frequency there is little difference in the measured contrast detection threshold between the continuous-tone and halftone gratings, but at a lower halftone frequency the detection threshold is significantly higher for the halftone grating. The threshold elevation is much higher for gratings oriented at 45 degrees, where the peaks of the halftone spectrum lie. A multiple-channel vision model was implemented to predict the visual difference for both continuous-tone and halftone images. The model correctly predicted the detection threshold of the continuous-tone grating, but it failed to predict the threshold elevation due to halftoning.
Proc. SPIE. 2654, Solid State Sensor Arrays and CCD Cameras
KEYWORDS: Signal to noise ratio, Imaging systems, Sensors, Image processing, Scanners, Interference (communication), Signal processing, Charge-coupled devices, Modulation transfer functions, Algorithm development
Scanner noise is one of the fundamental parameters of image quality. In this paper, we present an algorithm developed to derive the noise of a scanner using the 2D Wiener spectrum of the test pattern and the scanner's MTF. The Wiener spectrum of the test pattern was measured, and its contribution to the measured RMS noise was estimated by integrating the volume under the product of the test-pattern Wiener spectrum and the scanner's MTF. The test-pattern contribution was then removed from the measured noise. The derived noise agrees very well with the noise model for both drum and CCD scanners. The structured 1D noise is also of interest, especially when evaluating CCD scanner systems. A method is described to accurately determine 1D structured noise by averaging over the fast-scan and slow-scan directions. Finally, an experiment was conducted to verify the noise-measurement technique. The true noise of a drum scanner was measured at its analog output terminal and compared to the noise estimated with the proposed new metric. The agreement between the hardware-measured noise and the estimated noise is very good, with an RMS error of less than 0.001 in reflectance units. With this new technique, we can effectively improve noise-measurement accuracy by up to 500% for a photographic test pattern.
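The subtraction step can be sketched numerically: the test pattern's share of the measured variance is the integral of its Wiener spectrum weighted by the squared MTF, and the scanner's own RMS noise is the square root of what remains. The spectra and numbers below are illustrative assumptions, and a 1D integral stands in for the paper's 2D one.

```python
import numpy as np

# Assumed 1D stand-ins for the measured quantities.
f = np.linspace(0, 10, 1001)              # spatial frequency, cycles/mm
df = f[1] - f[0]
W_pattern = 1e-5 * np.exp(-f / 2.0)       # test-pattern Wiener spectrum (toy)
mtf = np.exp(-(f / 6.0) ** 2)             # scanner MTF (toy)

# Noise power through a linear system is weighted by |MTF|^2, so the
# pattern's contribution to the measured variance is this integral.
pattern_var = np.sum(W_pattern * mtf ** 2) * df

measured_var = pattern_var + 4e-5         # pretend total measured variance
scanner_rms = np.sqrt(measured_var - pattern_var)   # scanner's own RMS noise
```

In practice `measured_var` comes from the scanned pattern and `W_pattern` from an independent measurement of the pattern itself; the subtraction then isolates the scanner noise.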
This paper describes a method of modeling and testing a modular imaging spectrometer instrument (MISI), with special emphasis on system and sub-system modulation transfer function (MTF) analysis. The optical system was modeled using optical ray-tracing methods. The dynamic deformation of the scan mirror was modeled using finite element analysis, and the image degradation due to the deformation was estimated using optical image-formation theory. The detector and conditioning electronics were also modeled using transfer-function theory. This modeling approach was used as a trade-off tool for the design of MISI. Laboratory experiments were conducted to test the performance of each sub-system against its design criteria, and a field test is planned to test the overall optical/mechanical/electrical performance of the entire imaging chain.
The Digital Imaging and Remote Sensing Laboratory at the Rochester Institute of Technology is developing a new airborne multispectral imaging scanner. One of the most critical components of the scanner system is the scan mirror assembly. The scan mirror must satisfy at least two basic requirements: (1) optical image quality: the image blur caused by deformation of the mirror surface should not exceed the detector size, and (2) mechanical stability: the scan mirror assembly must be dynamically balanced to prevent vibration due to centrifugal force. Due to the large size (6-in. diameter) and high rotation speed (4800 rpm), these two requirements are difficult to meet at the same time. We present a modeling approach for evaluation of mechanical design alternatives using image quality metrics. Several mirror design configurations were evaluated. Each configuration was modeled using a finite element analysis method. The deformation of the mirror surface as well as the centrifugal forces were calculated. The image quality was modeled using optical image formation theory. The modeling approach was validated experimentally. A 3-in. scan mirror was modeled using the same procedures, and the line spread function (LSF) of the scan mirror due to the deformation at high speed was calculated. The actual LSF at that speed was also measured using a CCD linear array camera. The test results obtained with a 3-in. mirror agree with the model within 20% in the width of the LSF. (Approximately 500% error is observed if no distortion is assumed.)