When viewing images on a monitor, we are adapted to the lighting conditions of our viewing environment as well as the monitor itself, which can be very different from the lighting conditions in which the images were taken. As a result, our perception of these photographs depends directly on the environment in which they are displayed. For high-dynamic-range images, the disconnect in the perception of scene and viewing environments is potentially much larger than in conventional film and photography. To prepare an image for display, luminance compression alone is therefore not sufficient. We propose to augment current tone reproduction operators with the application of color appearance models as an independent preprocessing step to preserve chromatic appearance across scene and display environments. The method is independent of any specific tone reproduction operator and color appearance model (CAM) so that for each application the most suitable tone reproduction operator and CAM can be selected.
We test perception of 3-D spatial relations in 3-D images rendered by a 3-D display and compare it to that of a high-resolution flat panel display. Our 3-D display is a device that renders a 3-D image by displaying, in rapid succession, radial slices through the scene on a rotating screen. The image is contained in a glass globe and can be viewed from virtually any direction. We conduct a psychophysical experiment in which objects of varying complexity are used as stimuli. On each trial, an object or a distorted version of it is shown at an arbitrary orientation. The subject's task is to decide whether or not the object is distorted under several viewing conditions (monocular/binocular, with/without motion parallax, and near/far). The subject's performance is measured by the detectability d′, a conventional dependent variable in signal detection experiments. The highest d′ values are measured for the 3-D display when the subject is allowed to walk around the display.
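As a rough illustration of the dependent variable named above, the following sketch computes the standard signal-detection detectability, d′ = z(hit rate) − z(false-alarm rate), from trial counts. The function name `d_prime` and the log-linear correction (adding 0.5 to each count) are assumptions made for this example; the abstract does not say how the authors handled extreme rates.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate) from trial counts.

    A "hit" is a distorted object correctly reported as distorted; a
    "false alarm" is an undistorted object reported as distorted.
    """
    # Assumed log-linear correction: keeps both rates strictly inside (0, 1)
    # so that the inverse normal CDF is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one viewing condition (not data from the study).
print(d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=45))
```

A subject who responds at chance (equal hit and false-alarm rates) obtains d′ = 0; larger values indicate that distorted and undistorted objects are easier to tell apart.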
The sensor element of an imaging system should be mounted into its housing in such a way that the scene can be properly focused onto the sensor element's focal plane over the active area. Operational imaging requirements are forcing increasingly smaller tolerances on sensor alignment, and manufacturing systems must improve alignment capability to keep pace. Imaging system designs include reference datums that provide the basis for manufacturing alignment of optical components in each subassembly. Design constraints for alignment of the sensor element into the camera housing typically include x,y,z, clocking, and parallelism specifications. Measurement of z and parallelism positioning is often problematic, since the relevant reference datum features are often beneath the mounting platform and are obscured to the measurement system. General algorithms for determining sensor chip alignment when datum features are inaccessible to the measurement system are described. Precharacterization measurements of datum surfaces are used to determine datum locations during alignment measurement. The algorithms are useful for active manufacturing alignment as well as postmounting alignment measurement. The algorithms are successfully implemented for ultraprecision, active manufacturing alignment, and postalignment measurement of IR imaging systems.
Markov chain models (MCMs) were recently adopted in many recognition applications. The well-known clustering algorithm, the k-means algorithm, is used to design the codebooks of the MCM, and then each code word in the codebook is regarded as one state of MCM. However, users usually have no idea how to determine the number of states before the design of the MCM, and therefore doubt whether the MCM produced by the k-means algorithm is optimal. We propose a new MCM based on the genetic algorithm for recognition applications. Genetic algorithms combine the clustering algorithm and the MCM design. The users do not need to define the size of the codebook before the design of the MCM. The genetic algorithm can automatically find the number of states in MCM, and thereby obtain a near-optimal MCM. Furthermore, we propose the fuzzy MCM (FMCM) and the fuzzy genetic algorithm (FGA) to enhance the recognition rate. Experimental results show that the proposed MCM outperforms the traditional MCM and other texture and speech recognition methods.
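The conventional design described above (a codebook whose code words act as the states of the chain) can be sketched as follows. This is only a minimal illustration under stated assumptions: the features are scalars, the codebook is given rather than learned, and the helper names `quantize` and `transition_matrix` are hypothetical. It does not implement the proposed genetic-algorithm design.

```python
def quantize(sequence, codebook):
    """Map each scalar feature to the index of its nearest code word (state)."""
    return [min(range(len(codebook)), key=lambda k: abs(v - codebook[k]))
            for v in sequence]


def transition_matrix(states, n_states):
    """Estimate Markov chain transition probabilities by counting transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state that is never left gets a uniform row.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix


# Hypothetical two-word codebook and feature sequence.
codebook = [0.0, 1.0]
states = quantize([0.1, 0.9, 0.2, 0.8, 0.95], codebook)
print(states)                        # [0, 1, 0, 1, 1]
print(transition_matrix(states, 2))  # [[0.0, 1.0], [0.5, 0.5]]
```

In this baseline the number of states equals the codebook size, which is exactly the quantity the user must fix in advance and which the proposed method aims to find automatically.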
In this paper, we show how Markovian strategies used to solve well-known segmentation problems such as motion estimation, motion detection, motion segmentation, stereovision, and color segmentation can be significantly accelerated when implemented on programmable graphics hardware. More precisely, we show how the parallel capabilities of a standard graphics processing unit (GPU), usually devoted to image synthesis, can be used to infer the labels of a segmentation map. The problems we address are stated in the maximum a posteriori sense with an energy-based or probabilistic formulation, depending on the application. In every case, the label field is inferred with an optimization algorithm such as iterated conditional modes (ICM) or simulated annealing. In the case of probabilistic segmentation, mixture parameters are estimated with the K-means and the iterative conditional estimation (ICE) procedures. For both the optimization and the parameter estimation algorithms, the GPU's fragment processor is used to update every label of the segmentation map in parallel, while rendering passes and graphics textures are used to simulate optimization iterations. The hardware results obtained with a mid-range graphics card show that these Markovian applications can be accelerated by a factor of 4 to 200 without requiring any advanced skills in hardware programming.
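To make the optimization step concrete, here is a small sequential CPU sketch of ICM for an energy-based segmentation. It is not the GPU implementation described above. The energy (squared distance to a class mean plus a Potts smoothness term weighted by `beta`), the 4-neighbourhood, and the function name `icm_segment` are all assumptions chosen for illustration.

```python
def icm_segment(image, means, beta=1.0, max_iters=10):
    """Label a grey-level image (list of rows) by iterated conditional modes."""
    h, w = len(image), len(image[0])
    classes = range(len(means))

    def data_cost(y, x, k):
        # Assumed data term: squared distance between the pixel and class mean k.
        return (image[y][x] - means[k]) ** 2

    # Initial labels: nearest class mean, ignoring the neighbourhood.
    labels = [[min(classes, key=lambda k: data_cost(y, x, k)) for x in range(w)]
              for y in range(h)]

    for _ in range(max_iters):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x),
                                             (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]

                def energy(k):
                    # Potts prior: pay beta for each neighbour with another label.
                    return data_cost(y, x, k) + beta * sum(n != k for n in neighbours)

                best = min(classes, key=energy)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:  # stop once a full sweep changes no label
            break
    return labels


# Hypothetical 3x3 image with one noisy pixel in the centre.
noisy = [[0.0, 0.0, 0.0], [0.0, 0.6, 0.0], [0.0, 0.0, 0.0]]
print(icm_segment(noisy, means=[0.0, 1.0], beta=1.0))
```

Each local update depends only on a pixel and its four neighbours, which is what makes this family of algorithms a natural fit for per-fragment parallel evaluation.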
A nonlinear functional is considered for segmentation of images containing structural textures. A structural texture pattern in an image is characterized by a certain amplitude spectrum, and segmentation of different patterns is obtained by detecting different regions with different amplitude spectra. A gradient-descent-based algorithm is proposed by deriving equations minimizing the functional. This algorithm, implementing the solutions minimizing the functional, is based on the level set method. An effective method employed in this algorithm is shown to be robust in a noisy environment. Experimental results demonstrate that the proposed method outperforms segmentation obtained by using the simulated annealing algorithm based on Gaussian Markov random fields.
A method for the automatic 3-D segmentation of the spinal canal in computed tomographic (CT) images is presented. The method uses a priori radiological and anatomical information, mathematical morphology, and region-growing methods. The skin and peripheral fat structures are determined by delineating the air and other materials external to the body. Using the fat layer as a reference, the bone structure is segmented. The Hough transform for the detection of circles is applied to a cropped bone edge map that includes the thoracic vertebral structure. The centers of the detected circles are used to derive the information required for the fuzzy connectivity algorithm that is employed to segment the spinal canal. In a preliminary study, the method successfully segmented the spinal canal in eight CT volumes of four patients, with Hausdorff distances with reference to contours drawn independently by a radiologist in the range 1.7±0.8 mm.
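The evaluation metric quoted above can be illustrated with a brute-force sketch of the symmetric Hausdorff distance between two contours given as point lists. The function names and the example coordinates are hypothetical; the abstract does not state how the authors sampled the contours or which variant of the distance they used.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical contour points in millimetres (not data from the study).
automatic = [(0.0, 0.0), (1.0, 0.0)]
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(hausdorff(automatic, reference))  # 2.0
```

The measure is driven by the single worst-matched point, so it is a strict indicator of contour agreement. The brute-force search costs time proportional to the product of the two set sizes.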
We present our research efforts toward the deployment of 3-D sensing technology to an under-vehicle inspection robot. The 3-D sensing modality provides flexibility with ambient lighting and illumination in addition to the ease of visualization, mobility, and increased confidence toward inspection. We leverage laser-based range-imaging techniques to reconstruct the scene of interest and address various design challenges in the scene modeling pipeline. On these 3-D mesh models, we propose a curvature-based surface feature toward the interpretation of the reconstructed 3-D geometry. The curvature variation measure (CVM) that we define as the entropic measure of curvature quantifies surface complexity indicative of the information present in the surface. We are able to segment the digitized mesh models into smooth patches and represent the automotive scene as a graph network of patches. The CVM at the nodes of the graph describes the surface patch. We demonstrate the descriptiveness of the CVM on manufacturer CAD and laser-scanned models.
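One simple way to read "entropic measure of curvature" is as the Shannon entropy of the distribution of curvature values over a surface patch. The sketch below estimates that distribution with a fixed-width histogram. The histogram estimator, the bin count, and the name `curvature_entropy` are assumptions for this example, and the authors' actual estimator may differ.

```python
import math


def curvature_entropy(curvatures, bins=8):
    """Shannon entropy (in bits) of a histogram of per-vertex curvature values."""
    lo, hi = min(curvatures), max(curvatures)
    if hi == lo:
        return 0.0  # constant curvature (e.g. a plane or sphere): no variation
    counts = [0] * bins
    for c in curvatures:
        # Map the value to a bin; the maximum falls into the last bin.
        idx = min(int((c - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(curvatures)
    return -sum((k / n) * math.log2(k / n) for k in counts if k)


# Hypothetical curvature samples for two patches.
print(curvature_entropy([0.2, 0.2, 0.2, 0.2]))          # 0.0
print(curvature_entropy([0.0, 1.0, 2.0, 3.0], bins=4))  # 2.0
```

Under this reading, a smooth patch with nearly constant curvature scores close to zero, while a patch whose curvature is spread evenly over many values scores high.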
With the wide use of digital multimedia equipment, exchange of image data has become prevalent. Images and video data are exchanged through personal computers, mobile phones, and personal digital assistants (PDAs). Since display resolutions differ from device to device, the image size must be changed accordingly. Image resizing in the discrete cosine transform (DCT) domain is known to be fast for compressed images. Most DCT-domain methods truncate the high-frequency components when downsampling an image and assume them to be zero when upsampling. We instead estimate the high-frequency components using the correlation between the low- and high-frequency components, and compare the peak SNR (PSNR) performance. We verify that the correlation-based estimate is the best linear estimate in the mean square error sense.
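The baseline that most DCT-domain methods follow (truncate high frequencies to downsample, zero-pad to upsample) can be sketched in one dimension as below. This illustrates only that baseline, not the correlation-based estimation of the high-frequency components. The orthonormal DCT-II convention, the `sqrt(m / n)` gain, and the function names are assumptions for this example.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(s * math.sqrt((1 if k == 0 else 2) / n))
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n)
                 * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                 for k in range(1, n))
        out.append(s)
    return out


def resize_dct(x, m):
    """Resize a 1-D signal to length m by truncating or zero-padding its DCT."""
    n = len(x)
    c = dct(x)
    # Downsampling drops high frequencies; upsampling assumes they are zero.
    c = c[:m] if m <= n else c + [0.0] * (m - n)
    gain = math.sqrt(m / n)  # keeps the mean level unchanged after resizing
    return idct([gain * v for v in c])


# A constant signal should stay constant at any size.
print(resize_dct([5.0] * 8, 4))
```

In the upsampling branch, the zero-filled coefficients are where an estimate of the missing high-frequency content would be substituted.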
We present a region of interest (ROI) based rate control for H.263 compatible video conferencing. A face detection and tracking scheme with very low complexity is proposed for segmentation. By analyzing quadratic rate models at the frame layer, video object plane (VOP) layer, and macroblock (MB) layer extracted from the test data, a quadratic rate model at the MB layer with a modified physical meaning is proposed to improve the model accuracy. The basic idea is to use a group of uncoded MBs in the current VOP instead of individual MBs to update model parameters. A joint VOP layer and MB layer rate control algorithm is proposed. The VOP layer rate control assigns a target bit rate to each VOP based on its coding complexity and visual importance, and determines an average quantization parameter (QP) for each VOP. Some new features of the MB layer rate control are designed to use the average statistics of a VOP together with the individual statistics of each MB. Compared with conventional TMN8 and object-based VM8 rate control, the proposed algorithm can achieve better peak SNR (PSNR) for the ROI and more accurate rate control for various video sequences. The proposed rate control algorithm can be extended to H.264 ROI-based scalable video coding.
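For orientation, the sketch below shows how a quadratic rate model is typically inverted to pick a quantization parameter. It assumes the commonly used form R = X1·MAD/Q + X2·MAD/Q², where MAD is the mean absolute difference of the prediction residual, with non-negative parameters. The modified MB-layer model proposed here is not specified in the abstract, and the function names and parameter values are hypothetical.

```python
import math


def quadratic_rate(mad, q, x1, x2):
    """Predicted bits under the assumed model R = x1*MAD/Q + x2*MAD/Q**2."""
    return x1 * mad / q + x2 * mad / (q * q)


def solve_qp(target_bits, mad, x1, x2, q_min=1, q_max=31):
    """Return the QP whose predicted rate matches target_bits, clamped to range.

    The default range 1..31 is the H.263 quantizer range. Assumes x1, x2 >= 0.
    """
    if target_bits <= 0 or mad <= 0:
        return q_max  # no budget or no residual: use the coarsest quantizer
    a, b = x2 * mad, x1 * mad
    if a == 0:
        q = b / target_bits  # model degenerates to first order
    else:
        # Solve a*u**2 + b*u - target_bits = 0 for u = 1/Q (positive root).
        u = (-b + math.sqrt(b * b + 4 * a * target_bits)) / (2 * a)
        q = 1 / u
    return max(q_min, min(q_max, round(q)))


# Hypothetical model parameters, for illustration only.
print(quadratic_rate(mad=2.0, q=10, x1=100.0, x2=400.0))            # 28.0
print(solve_qp(target_bits=28.0, mad=2.0, x1=100.0, x2=400.0))      # 10
```

In a rate controller of this kind, the parameters would be refitted from recently coded units after each encoding step, and the target would come from the layer above (for example, the bits assigned to the current VOP).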
Advances in multimedia and networking technologies have created opportunities for Internet pirates, who can copy digital content and distribute it illegally, thus violating the legal rights of content owners. In this situation, digital watermarking has gained popularity as a main technology for implementing copyright protection of multimedia digital content distributed on the Internet. We present a novel "nonblind" watermarking procedure for JPEG images based on the use of protected extensible markup language (XML) documents. The procedure enables the copyright owner to insert a distinct watermark code identifying the buyer into the distributed images. Furthermore, to increase the security and robustness of the procedure, the watermark is repeatedly embedded into an image in the discrete cosine transform (DCT) domain at different frequencies, exploiting both block classification techniques and perceptual analysis. The embedded watermark is then extracted from an image according to the information contained in a protected XML document that is associated with the image.
We propose a new asymmetric watermarking scheme. In the proposed scheme, two linear transformation matrices are constructed. One matrix is secret and is applied to a secret watermark to form a public asymmetric watermark. The other matrix is public and non-full rank, and is used to perform public (asymmetric) detection using a correlation test. Theoretical analysis and simulation results show that the proposed scheme not only provides reliable asymmetric detection but also ensures that the embedded watermark can be detected even if the asymmetric detection is disabled.