Most eye localization methods suffer from illumination variation. To overcome this problem, we propose an illumination normalization technique as a preprocessing step before localizing eyes. This technique requires no training process, no assumption about the lighting conditions, and no alignment between different images for illumination normalization. Moreover, it is fast and thus suitable for real-time applications. Experimental results verify the effectiveness and efficiency of the eye localization scheme with the proposed illumination normalization technique.
TOPICS: Video, Video surveillance, Computer programming, Digital watermarking, Video compression, Data compression, Electrical engineering, Affine motion model, Algorithm development, Information technology
We propose a frame-matching algorithm for video sequences, when a video sequence is modified from its original through frame removal, insertion, shuffling, and data compression. The proposed matching algorithm defines an effective matching cost function and minimizes cost using dynamic programming. Experimental results show that the proposed algorithm provides a significantly lower probability of matching errors than the conventional algorithm.
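The dynamic programming idea can be illustrated with a minimal sketch. It assumes each frame is summarized by a single scalar feature (a hypothetical stand-in for a real frame descriptor), uses the absolute feature difference as the matching cost, and charges a fixed `gap_penalty` for every removed or inserted frame. The cost function, the names, and the penalty value are assumptions, not the authors' formulation, and this order-preserving alignment does not handle shuffled frames.

```python
def match_frames(original, modified, gap_penalty=10.0):
    """Align two sequences of per-frame features with dynamic programming.

    Returns (total_cost, pairs), where pairs lists the (i, j) indices of
    matched frames. Each unmatched frame is charged gap_penalty.
    """
    n, m = len(original), len(modified)
    # cost[i][j]: minimum cost of aligning original[:i] with modified[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        cost[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + abs(original[i - 1] - modified[j - 1])
            skip_original = cost[i - 1][j] + gap_penalty  # frame removed
            skip_modified = cost[i][j - 1] + gap_penalty  # frame inserted
            cost[i][j] = min(match, skip_original, skip_modified)

    # Backtrack from the end to recover which frames were matched.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        match = cost[i - 1][j - 1] + abs(original[i - 1] - modified[j - 1])
        if cost[i][j] == match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

For example, aligning the features `[10, 20, 30, 40, 50]` with `[10, 20, 40, 50]` pairs every surviving frame with its source and leaves the removed third frame unmatched.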
Image segmentation is one of the most important tasks of image processing, as it provides information used to interpret and analyze image contents. The tuning of the parameters of the segmentation method can be considered an optimization problem by defining an objective function based on the similarity of the segmented image and the ground truth. The problem becomes harder to solve when the ground truth is known only under uncertainty. A solution is proposed for the design and the automatic tuning of a real-time segmentation method for infrared images where the ground truth is uncertain. The proposed solution consists of three steps: the proposal of a segmentation method adapted for the considered images, the definition of an objective function that takes the uncertainty of the ground truth into account, and the automatic tuning of the segmentation method by means of genetic algorithms.
3-D scanning has become increasingly popular in a wide range of applications. We present a prototype 3-D/stereoscopic scanning system based on a cheap, readily available flatbed scanner. Stereoscopic imaging is achieved by modifying the optical path of the ordinary flatbed scanner. The results of 3-D imaging using our prototype system are demonstrated, and a number of design alternatives are discussed.
A new steganographic method for data hiding in jig swap puzzle images is proposed. First, a color image is taken as input and divided into blocks. Second, each block is rearranged to a new position according to the secret data and a stegokey. The resulting image is a perfect jig swap puzzle. The original image is needed for extracting the secret data. Under the assumption that the receiver and the sender share some common images, the receiver can extract the secret data from the jig swap puzzle image. We also present a scenario for secret data transmission based on an online jig swap puzzle. Experimental results show that the proposed method is undetectable and robust under lossy image compression and format conversion.
Image thresholding is a very common image processing operation, since almost all image processing schemes need some sort of separation of the pixels into different classes. In order to determine the thresholds, most methods analyze the histogram of the image. The optimal thresholds are often found by either minimizing or maximizing an objective function with respect to the values of the thresholds. By defining two classes of objective functions for which the optimal thresholds can be found by efficient algorithms, this paper provides a framework for determining the solution approach for current and future multilevel thresholding algorithms. We show, for example, that the method proposed by Otsu and other well-known methods have objective functions belonging to these classes. By implementing the algorithms in ANSI C and comparing their execution times, we can also make quantitative statements about their performance.
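As a concrete instance of maximizing an objective function over the histogram, the following minimal sketch implements the bilevel form of Otsu's criterion (maximizing the between-class variance). It covers only the single-threshold case, not the multilevel algorithms discussed here, and the function name is an assumption for illustration.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes the between-class variance.

    Levels <= t form one class and levels > t form the other.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_score = 0, -1.0
    weight0 = 0  # number of pixels in the lower class
    sum0 = 0     # sum of gray levels in the lower class
    for t, count in enumerate(histogram):
        weight0 += count
        sum0 += t * count
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue  # one class is empty, so the variance is undefined
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Proportional to the between-class variance (the constant
        # factor 1 / total**2 does not change the maximizer).
        score = weight0 * weight1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t
```

Because the running sums are updated incrementally, the search over all thresholds takes a single pass over the histogram.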
Block-matching motion estimation plays an important role in real-time video compression and thus has significant impact on searching speed and quality of performance. In order to address these issues, we introduce a highly efficient block motion estimation algorithm, referred to as a predictive cross-hexagon search (PCHS) algorithm, that can considerably reduce the complexity of the Joint Video Team (JVT) encoder. In contrast to many classical fast motion estimation algorithms, PCHS has three desirable features that make it adaptive and effective: (1) prediction of a search center, (2) usage of search patterns with different sizes, and (3) early algorithm termination. We set four predictor candidates as options for the initial search point, which increases the accuracy of the predictor. The different-size search patterns used in the searching process, including small cross search patterns, hexagon search patterns, and cross-hexagon search patterns, can better suit more motion types. Owing to the high accuracy of the predictor, the proposed algorithm adopts early termination: when the predictor is good enough, the search stops early. Therefore, the PCHS algorithm is suitable for real-time video encoding, as it can speed up the encoder without sacrificing performance compared with other fast algorithms.
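Two of these ingredients, a predicted search center and early termination, can be sketched with a plain small cross search. This is a simplified illustration and not the PCHS algorithm itself: it uses a single predictor, only the small cross pattern, and the sum of absolute differences (SAD) as the matching cost. The function names, `stop_threshold`, and `max_range` are assumptions, and the predictor is assumed to lie inside the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and its displaced match."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def small_cross_search(cur, ref, bx, by, size, predictor=(0, 0),
                       stop_threshold=0, max_range=4):
    """Refine a predicted motion vector with a small cross pattern."""
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # The displaced block must stay in the search range and the frame.
        return (abs(dx) <= max_range and abs(dy) <= max_range
                and 0 <= bx + dx and bx + dx + size <= width
                and 0 <= by + dy and by + dy + size <= height)

    best = predictor
    best_cost = sad(cur, ref, bx, by, best[0], best[1], size)
    # Early termination: skip the search when the cost is already low enough.
    while best_cost > stop_threshold:
        center = best
        for ox, oy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (center[0] + ox, center[1] + oy)
            if valid(*cand):
                cost = sad(cur, ref, bx, by, cand[0], cand[1], size)
                if cost < best_cost:
                    best, best_cost = cand, cost
        if best == center:  # no neighbor improved: local minimum reached
            break
    return best, best_cost
```

A more accurate predictor means the loop starts closer to the minimum, so fewer candidate blocks are evaluated before the search stops.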
To reduce the cost of a machine-vision system for pharmaceutical capsule inspection, a custom approach is explored. In order to minimize cost while maintaining high data throughput, the USB 2.0 interface is used. By developing custom USB 2.0 cameras with minimal hardware and using conventional PCs to perform the image processing algorithms, a low-cost yet versatile real-time system is developed. We discuss the development of a custom USB 2.0 camera and its associated hardware, the use of PCs to acquire image data and perform inspection algorithms, and a custom system controller that synchronizes system signals and manages passed and failed capsules in real time.
To improve the spatial resolution of video, a superresolution reconstruction method based on a sliding window is proposed that utilizes the movement information between frames in the low-resolution video. We propose a registration algorithm based on a four-parameter transformation model through Taylor series expansion, using an iterative solving method as well as the Gaussian pyramid image model to estimate the movement parameters from coarse to fine. Superresolution frames are reconstructed using an iterative back projection (IBP) algorithm. We also determine a suitable length for the sliding window and a reasonable iteration number for the IBP algorithm in video superresolution reconstruction. Our algorithm is compared to other algorithms on simulated images and actual color videos. Both comparisons show that our registration algorithm achieves higher subpixel accuracy than other algorithms, even in the case of large movements, and that the reconstructed video has better visual quality and higher resolving power. It can be extensively applied to the superresolution reconstruction of video sequences in which the frames differ from each other mainly by translation and rotation.
Tsinghua University, Department of Computer Science and Technology, Beijing, China 100084
Tsinghua University, School of Software, Beijing, China 100084
We propose a novel multiscale decomposition (MSD) image fusion algorithm, which is region-based image fusion using bidimensional empirical mode decomposition (BEMD). BEMD is a 2-D data-driven decomposition derived from the empirical mode decomposition (EMD), which does not require a predetermined filter or wavelet function. The input images are decomposed into a number of intrinsic mode functions (IMFs) as well as a residual image. The fusion is performed region by region based on the segmentations of the input images to produce a composite BEMD representation, and then the inverse BEMD transform is applied to obtain the fused image. Experiments show that the proposed image fusion algorithm provides superior performance over traditional fusion schemes in terms of both objective metrics and visual quality.
We present a scalable registration algorithm for aligning large-frame imagery compressed with the JPEG2000 coding standard. Unlike traditional approaches, the proposed method registers the images in the compressed domain, which eliminates the need to reconstruct the full image prior to performing registration. Two forms of scalability are exploited during registration: resolution and quality. Resolution scalability results from the native multiresolution image representation of the discrete wavelet transform utilized as a building block in JPEG2000. Quality scalability relates to the embedded block coding with optimal truncation (EBCOT) used for compressing the wavelet coefficients. This combination allows registration on selectable resolution levels and quality layers, which enables registration of large-frame imagery at low bit rates over constrained bandwidth channels. Furthermore, the hierarchical nature of the algorithm provides a trade-off between registration accuracy and computational complexity. Experimental results show that the proposed algorithm exhibits consistent registration performance across a range of quality levels (3.5 to 0.5 bpp) for frame sizes of 2K×4K. We present simulation results with imagery collected from a prototype persistent surveillance system to demonstrate the feasibility of the proposed algorithm in real-world scenarios.
In the field of digital image processing, the description of image content is one of the most crucial tasks. Indeed, it is a mandatory step for various applications, such as industrial vision, medical imaging, content-based image retrieval, etc. The description of the image content is achieved through the computation of some predefined features, which can be performed at different scales. Among global features that describe the content of the whole image, the gray level histogram focuses on the distribution of gray levels within the image, while morphological features (e.g., the pattern spectrum) measure the distribution of object sizes in the image. Despite their broad interest, such morphological size-distribution features are limited due to their monodimensional nature. Our goal is to review multidimensional extensions of these features able to deal with complementary information (such as shape, orientation, spectral, intensity, or spatial information). Moreover, we illustrate each multidimensional feature by an illustrative example that shows their relevance compared to the standard morphological size distribution. These features can be seen as relevant solutions when the standard monodimensional features fail to accurately represent the image content.
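The standard monodimensional size distribution that these extensions build on can be illustrated in one dimension. The minimal sketch below computes the pattern spectrum of a binary 1-D signal by opening it with line segments of increasing length and recording how much area each size step removes. The 1-D binary setting and the function names are simplifying assumptions; images would use 2-D structuring elements.

```python
def erode(signal, k):
    """Keep position i only if signal[i:i+k] is entirely foreground."""
    n = len(signal)
    return [int(i + k <= n and all(signal[i:i + k])) for i in range(n)]


def dilate(signal, k):
    """Spread each foreground sample over the k positions starting at it."""
    n = len(signal)
    out = [0] * n
    for i, value in enumerate(signal):
        if value:
            for j in range(k):
                if i + j < n:
                    out[i + j] = 1
    return out


def opening(signal, k):
    """Erosion followed by dilation: removes runs shorter than k."""
    return dilate(erode(signal, k), k)


def pattern_spectrum(signal, max_size):
    """Area removed when the opening size grows from k to k + 1, k = 1..max_size."""
    areas = [sum(opening(signal, k)) for k in range(1, max_size + 2)]
    return [areas[i] - areas[i + 1] for i in range(max_size)]
```

For a signal with runs of lengths 3, 1, 2, and 3, the spectrum concentrates 1, 2, and 6 samples at sizes 1, 2, and 3, which shows that the feature records object sizes but nothing about shape, orientation, or position.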
A two-stage method for detecting microcalcifications in mammograms is presented. In the first stage, the determination of the candidates for microcalcifications is performed. For this purpose, a 2-D linear prediction error filter is applied, and for those pixels where the prediction error is larger than a threshold, a statistical measure is calculated to determine whether they are candidates for microcalcifications or not. In the second stage, a feature vector is derived for each candidate, and after a classification step using a support vector machine, the final detection is performed. The algorithm is tested with 40 mammographic images with 50-µm resolution from Screen Test: The Alberta Program for the Early Detection of Breast Cancer, and the results are evaluated using a free-response receiver operating characteristics curve. Two different analyses are performed: an individual microcalcification detection analysis and a cluster analysis. In the analysis of individual microcalcifications, detection sensitivity values of 0.75 and 0.81 are obtained at 2.6 and 6.2 false positives per image, on average, respectively. The best performance is characterized by a sensitivity of 0.89, a specificity of 0.99, and a positive predictive value of 0.79. In cluster analysis, a sensitivity value of 0.97 is obtained at 1.77 false positives per image, and a value of 0.90 is achieved at 0.94 false positives per image.
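The reported figures follow from standard definitions, which the minimal sketch below spells out. The function names and any counts passed to them are hypothetical; the abstract does not give the underlying confusion counts.

```python
def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and positive predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true lesions detected
        "specificity": tn / (tn + fp),  # fraction of negatives rejected
        "ppv": tp / (tp + fp),          # fraction of detections that are true
    }


def froc_point(tp, fn, fp, n_images):
    """One operating point of a free-response ROC curve.

    Returns (average false positives per image, sensitivity).
    """
    return fp / n_images, tp / (tp + fn)
```

Sweeping the detector's decision threshold and plotting the resulting `froc_point` pairs traces the free-response curve used for evaluation.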
We propose the use of the morphological pattern spectrum, or pecstrum, as the basis of a biometric shape-based hand recognition system. The system receives an image of the right hand of a subject in an unconstrained pose, which is captured with a commercial flatbed scanner. Because the pecstrum is invariant to translation and rotation, the system does not require pegs to fix the hand position, which simplifies the image acquisition process. This novel feature-extraction method is tested using a Euclidean distance classifier for identification and verification cases, obtaining 97% correct identification and an equal error rate (EER) of 0.0285 (2.85%) for the verification mode. The obtained results indicate that the pattern spectrum represents a good feature-extraction alternative for low- and medium-level hand-shape-based biometric applications.
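The two operating modes of a Euclidean distance classifier can be sketched as follows. The feature vectors stand in for pattern spectra, and the function names, enrolled labels, and threshold are hypothetical choices for illustration, not values from the reported system.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, gallery):
    """Identification: return the enrolled label nearest to the probe."""
    return min(gallery, key=lambda label: euclidean(probe, gallery[label]))


def verify(probe, template, threshold):
    """Verification: accept the claimed identity if the distance is small."""
    return euclidean(probe, template) <= threshold
```

In verification mode, raising the threshold lowers the false rejection rate and raises the false acceptance rate; the EER is the operating point where the two rates are equal.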
We propose a novel and accurate technique based on edge map for the determination of core points in fingerprint images. An edge map is obtained by detecting the edges on a smoothed orientation map. An edge pixel deletion operation is then performed on the edge map based on gradient information. Last, upper and lower core points can be extracted by analyzing the orientation consistency of a few edge pixels. Experimental results show that the proposed method can effectively detect the core points with high speed for all types of fingerprints.
Deinterlacing is a method to construct a complete image from an interlaced signal. The interlaced signal format was adopted by the National Television System Committee (NTSC) based on eye remanence. Previous methods, such as traditional edge line averaging (ELA), use intrafield interpolation to find the minimum difference value without considering the existence of edges and boundaries. Consequently, the interpolated values are blurred at edges. A novel algorithm, an edge-based correlation adaptive (ECA) method, is proposed. ECA detects edges based on various edge directions. This new intrafield method performs better at smoothing edges and stripes. ECA is improved by using a weighted summation of the ELA component to improve the interpolation result. We also interpolate half-pixel values to increase the accuracy of edge detection. The architecture and very large scale integration (VLSI) implementation results are also presented.
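For reference, the baseline ELA step can be sketched as below. This minimal version interpolates one missing scan line from the lines above and below it, testing the vertical direction and the two diagonals and averaging along the direction with the smallest difference. It illustrates the conventional method only, not the proposed ECA algorithm, and the function name and tie-breaking toward the vertical direction are assumptions.

```python
def ela_interpolate(upper, lower):
    """Interpolate a missing scan line between two known lines using ELA."""
    width = len(upper)
    out = []
    for x in range(width):
        best_diff = None
        best_value = None
        # Direction 0 is vertical; -1 and +1 are the two diagonals.
        for d in (0, -1, 1):
            xu, xl = x + d, x - d
            if 0 <= xu < width and 0 <= xl < width:
                diff = abs(upper[xu] - lower[xl])
                if best_diff is None or diff < best_diff:
                    best_diff = diff
                    best_value = (upper[xu] + lower[xl]) / 2
        out.append(best_value)
    return out
```

On a diagonal step edge such as `upper = [0, 0, 0, 100, 100]` and `lower = [0, 100, 100, 100, 100]`, plain vertical averaging would produce intermediate values of 50 along the edge, whereas the directional choice keeps the transition sharp.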
A thorough analysis of discrete polynomial moments and their suitability for application to geometric surface inspection is presented. A new approach is taken to the analysis based on matrix algebra, revealing some formerly unknown fundamental properties. It is proven that there is one and only one unitary polynomial basis that is complete, i.e., the polynomial basis for a Chebychev system. Furthermore, it is proven that the errors in the computation of moments are almost exclusively associated with the application of the recurrence relationship, and it is shown that QR decomposition can be used to eliminate the systematic propagation of errors. It is also shown that QR decomposition produces a truly orthogonal basis set despite the presence of stochastic errors. Fourier analysis is applied to the polynomial bases to determine the spectral distribution of the numerical errors. The new unitary basis offers almost perfect numerical behavior, enabling the modeling of larger images with higher-degree polynomials for the first time. The application of a unitary polynomial basis eliminates the need to compute pseudo-inverses. This improvement in numerical efficiency enables real-time modeling of surfaces in industrial surface inspection. Two applications in industrial quality control via artificial vision are demonstrated.
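The orthogonalization step can be illustrated with a minimal sketch. It uses modified Gram-Schmidt on the monomial columns as a dependency-free stand-in for a library QR routine (in exact arithmetic both yield the same orthonormal columns up to sign). The function name, the choice of nodes, and the use of Gram-Schmidt are assumptions for illustration, not the authors' procedure, and the sketch assumes distinct nodes and more nodes than the polynomial degree.

```python
def orthonormal_polynomial_basis(nodes, degree):
    """Orthonormalize the columns 1, x, x**2, ... sampled at the nodes.

    Returns a list of degree + 1 vectors, each of length len(nodes),
    that are mutually orthogonal with unit Euclidean norm.
    """
    basis = []
    for k in range(degree + 1):
        v = [x ** k for x in nodes]
        # Modified Gram-Schmidt: remove each earlier direction in turn,
        # always projecting the partially reduced vector.
        for q in basis:
            projection = sum(a * b for a, b in zip(q, v))
            v = [a - projection * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    return basis
```

Because the basis is orthonormal, the moments of a sampled signal are plain dot products with the basis vectors, and reconstruction is the corresponding weighted sum, so no pseudo-inverse is needed.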