Eye localization has been an intensive research topic owing to its wide applications in face recognition, driver fatigue monitoring systems, etc.1, 2, 3, 4 Unfortunately, it is still very hard to find an illumination-independent eye localization algorithm. Thus, active infrared-based approaches have been employed to diminish the influence of illumination variation.1, 2 However, eye localization in visible-spectrum images is more relevant, since it is more practical in applications and needs no extra equipment. Consequently, it is often essential to normalize illumination before localizing eyes in visible-spectrum images taken under widely varying lighting. Jung et al.3 apply the self-quotient image (SQI) to rectify illumination and obtain satisfactory eye localization results on their private face database. However, in SQI the image is normalized by division over its smoothed version, which depends to a great extent on the kernel size of the weighted Gaussian filter. The kernel size is rather difficult to determine: if it is small, the intrinsic information will be severely reduced, while if it is large, halo effects may appear. Although a multiscale technique is adopted to alleviate this problem, it adds computational cost, and overcompensation regions sometimes still exist. In addition, image noise is amplified by the division operation, which also deteriorates the performance of eye localization. This letter proposes to take advantage of a more effective illumination normalization method, the logarithmic total variation (LTV) model,5 as a preprocessing step for eye localization, and it validates the performance of this eye localization scheme on several public face databases. In Sec. 2, the LTV model for normalizing illumination on the face is presented. Since the computational cost of the LTV model is too high for real-time applications, a modified graph cut–based algorithm for solving the model is proposed in Sec. 3, which accelerates the proposed preprocessing step. Experimental results are given in Sec. 4, followed by conclusions in Sec. 5.
The LTV Model for Illumination Normalization
According to the Lambertian model, the captured face image can be represented as

$I(x, y) = \rho(x, y)\, S(x, y)$,  (1)

where $\rho(x, y)$ is the albedo of the object surface and $S(x, y)$ is the final light strength received at location $(x, y)$. The albedo is the intrinsic representation of the captured face and is independent of the ambient lighting condition, so it can be exploited for illumination-independent eye localization. Taking the logarithm of Eq. 1, we have

$\log I(x, y) = \log \rho(x, y) + \log S(x, y)$.  (2)

Denoting $f = \log I$, $v = \log \rho$, and $u = \log S$, respectively, then

$f = u + v$.  (3)
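As a toy numerical check of Eqs. 1–3, the sketch below (with hypothetical albedo and lighting arrays, not data from the letter) shows that the logarithm turns the multiplicative Lambertian model into the additive decomposition $f = u + v$:

```python
import numpy as np

# Hypothetical toy arrays: rho is the intrinsic albedo, S is a smooth,
# large-scale lighting field.
rho = np.array([[0.2, 0.8], [0.5, 0.9]])
S = np.array([[10.0, 10.0], [2.0, 2.0]])
I = rho * S          # Eq. 1: captured image under the Lambertian model

f = np.log(I)        # f = log I
u = np.log(S)        # u = log S, the large-scale (illumination) part
v = np.log(rho)      # v = log rho, the intrinsic part
assert np.allclose(f, u + v)   # Eq. 3: f = u + v
```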
Chen et al.5 argued that one of the differences between the intrinsic structure and the illumination pattern of a face image is their scale: the intrinsic structure is usually of smaller scale than the illumination artifacts and shadows. In a sense, $v$ carries the variation patterns of the albedos of the small-scale facial features. Thus, in order to eliminate the interference of ambient lighting in eye localization, one needs to extract $v$ from $f$. We notice that the TV-$L^1$ model has shown its effectiveness for this task.5 Hence, we use the TV-$L^1$ model to estimate $u$:

$u = \arg\min_{u} \int \left( |\nabla u| + \lambda |f - u| \right) dx$,  (4)

where $\lambda$ is a penalty parameter. As $\lambda$ decreases, the regularization term $\int |\nabla u|$ becomes more dominant, and thus $u$ becomes smoother. One advantage of this LTV model is that the parameter $\lambda$, which depends only on the scale of the image, is very easy to set. According to the LTV model, larger structures such as extrinsic illumination are left in $u$ (or $e^{u}$), and $\rho' = e^{v}$, which is taken as the output image of illumination normalization, can be obtained by $\rho' = e^{f - u} = I / e^{u}$.
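The whole LTV pipeline can be sketched as follows. Solving Eq. 4 exactly is the subject of Sec. 3; purely for illustration, a local median here stands in for the TV-$L^1$ minimizer (an assumption of this sketch, chosen because a median also removes small-scale detail while keeping sharp large-scale edges):

```python
import numpy as np

def ltv_normalize(I, radius=1):
    """Illustrative LTV-style normalization.

    f = log I is split into a large-scale part u and a small-scale part
    v = f - u; the normalized output is exp(v) = I / exp(u).  The true
    u is the TV-L1 minimizer of Eq. 4; a local median is used here only
    as a crude stand-in for that minimizer.
    """
    f = np.log(I.astype(float) + 1.0)   # +1 avoids log(0)
    p = np.pad(f, radius, mode='edge')  # pad with boundary elements
    H, W = f.shape
    u = np.empty_like(f)
    for i in range(H):
        for j in range(W):
            u[i, j] = np.median(p[i:i + 2 * radius + 1,
                                  j:j + 2 * radius + 1])
    return np.exp(f - u)                # rho' = exp(v)
```

On a constant image the large-scale estimate absorbs everything, so the normalized output is identically 1, as expected of an illumination-normalized albedo map.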
Therefore, the illumination normalization problem is transformed into the problem of solving the TV-$L^1$ model. Since partial differential equation–based algorithms often run into numerical difficulties, Chen et al.5 cast Eq. 4 as a second-order cone program and solved it with modern interior-point methods. But this iterative solution is expensive in both memory and computation time. To make this illumination normalization technique suitable for real-time applications, a graph cut–based algorithm is proposed here to solve Eq. 4.
Efficient Solution to the TV-$L^1$ Model
First, we show how to decompose Eq. 4 into several independent binary energy minimization problems. The images discussed in this letter are defined as matrices in $\mathbb{Z}_{+}^{M \times N}$, where $\mathbb{Z}_{+}$ denotes the set of nonnegative integers that represent the grayscale levels of images, taken here as $\{0, 1, \ldots, L-1\}$, and $M \times N$ denotes the size of the images. Let $f$ and $u$ denote the original and separated images, respectively. According to Eq. 3, each element of these matrices satisfies

$f_{i,j} = u_{i,j} + v_{i,j}$.  (5)

We assume Neumann boundary conditions, i.e., the differentials on the image edges are defined to be zero; this assumption can be guaranteed by padding the image using the boundary elements. In this letter, to simplify and accelerate our algorithm, we just use the 4-neighbors of $(i, j)$ to approximate the gradient of $u$ at the location $(i, j)$. Consequently, the regularization term in Eq. 4 can be defined in the discrete case by

$\mathrm{TV}(u) = \sum_{i,j} \left( |u_{i+1,j} - u_{i,j}| + |u_{i,j+1} - u_{i,j}| \right)$.  (6)

Take the grayscale levels as given, and define $a^{t} = 1$ for $a > t$; otherwise, $a^{t} = 0$, where $t$ is an arbitrary real number. For any $a, b \in \{0, 1, \ldots, L-1\}$, there exists

$|a - b| = \sum_{t=0}^{L-2} |a^{t} - b^{t}|$.  (7)

For each pair of neighboring pixels $p$ and $q$, $|u_{p} - u_{q}|$ can thus be expressed in terms of the elements of $u^{t}$ over all grayscale levels. In this way, the original problem is reformulated into several independent binary problems based on the decomposition of a function into its level sets. Hence, combining Eq. 6 with Eq. 7, the first term on the right side of Eq. 4 can be binarized as

$\mathrm{TV}(u) = \sum_{t=0}^{L-2} \sum_{(p,q)} |u_{p}^{t} - u_{q}^{t}|$.  (8)

Similarly, define $f_{p}^{t} = 1$ for $f_{p} > t$; otherwise, $f_{p}^{t} = 0$. For binary numbers $a$ and $b$, there exists $|a - b| = a + b - 2ab$. The second term on the right side of Eq. 4 can then be binarized as

$\sum_{p} |u_{p} - f_{p}| = \sum_{t=0}^{L-2} \sum_{p} |u_{p}^{t} - f_{p}^{t}|$.  (9)

As a result, Eq. 4 is reformulated by combining Eq. 8 with Eq. 9:

$\min_{u} \sum_{t=0}^{L-2} \left[ \sum_{(p,q)} |u_{p}^{t} - u_{q}^{t}| + \lambda \sum_{p} |u_{p}^{t} - f_{p}^{t}| \right]$.  (10)

For given input $f$ and $\lambda$ and a fixed level $t$, Eq. 4 can be rewritten as

$E^{t}(u^{t}) = \sum_{(p,q)} |u_{p}^{t} - u_{q}^{t}| + \lambda \sum_{p} |u_{p}^{t} - f_{p}^{t}|$.  (11)
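The level-set decomposition can be verified numerically. The sketch below (using the convention that the level set of $a$ at threshold $t$ is 1 iff $a > t$) checks that the total discrete energy equals the sum of the per-level binary energies on a random toy image:

```python
import numpy as np

L_LEVELS = 8  # grayscale levels 0 .. L-1

def level_sets(a, L):
    # binarize: a^t = 1 if a > t, else 0, for thresholds t = 0 .. L-2
    return np.stack([(a > t).astype(int) for t in range(L - 1)])

def energy(u, f, lam):
    # discrete anisotropic TV over 4-neighbors plus L1 fidelity
    tv = np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()
    return tv + lam * np.abs(u - f).sum()

rng = np.random.default_rng(0)
f = rng.integers(0, L_LEVELS, size=(4, 4))
u = rng.integers(0, L_LEVELS, size=(4, 4))
lam = 2.0

# |a - b| = sum_t |a^t - b^t|, so the energy decomposes into
# independent binary energies, one per grayscale level.
per_level = sum(energy(level_sets(u, L_LEVELS)[t],
                       level_sets(f, L_LEVELS)[t], lam)
                for t in range(L_LEVELS - 1))
assert np.isclose(energy(u, f, lam), per_level)
```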
We then construct a directed capacitated graph corresponding to $E^{t}$ to find its minimizer at every level $t$. It is worth noting that the nodes/pixels in the graph are all binary, that the cost of each $n$-link connecting one pair of neighboring pixels equals 1, and that the $t$-links connecting a pixel $p$ with the source and the sink cost $\lambda f_{p}^{t}$ and $\lambda (1 - f_{p}^{t})$, respectively (with the convention that pixels remaining on the source side of the cut take label 1). In this way, a simplified two-terminal graph representation of Eq. 11 is constructed, and the minimizer is then obtained via the min-cut algorithm on the graph.7
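To make the construction concrete, the sketch below solves each level's binary problem by exhaustive search (a stand-in for the min-cut solver of Ref. 7, feasible only for tiny images) and recovers the grayscale minimizer by summing the binary minimizers over all levels:

```python
import numpy as np
from itertools import product

def binary_energy(b, g, lam):
    # per-level energy: n-link cost 1 per disagreeing neighbor pair,
    # t-link cost lam per pixel where b differs from the level set g
    tv = np.abs(np.diff(b, axis=0)).sum() + np.abs(np.diff(b, axis=1)).sum()
    return tv + lam * np.abs(b - g).sum()

def solve_level(g, lam):
    # Brute force over all binary labelings; a real implementation
    # would run max-flow/min-cut on the two-terminal graph instead.
    best, best_e = None, np.inf
    for bits in product((0, 1), repeat=g.size):
        b = np.array(bits).reshape(g.shape)
        e = binary_energy(b, g, lam)
        if e < best_e:
            best, best_e = b, e
    return best

L_LEVELS, lam = 4, 2.0
f = np.array([[0, 0, 3],
              [0, 1, 3]])
# one independent binary problem per level; summing the binary
# minimizers reconstructs the grayscale minimizer u
u = sum(solve_level((f > t).astype(int), lam) for t in range(L_LEVELS - 1))
```

With this relatively large penalty the fidelity term dominates and the minimizer coincides with the input itself; a smaller penalty would smooth small-scale structures away into the residual.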
To sum up, by introducing a divide-and-conquer methodology and a simplified graph representation, the minimizer of Eq. 4 can be computed more efficiently. This method is essentially identical to that of Ref. 6 but is easier to understand and implement. Consequently, the LTV model–based illumination normalization technique is accelerated; we call it fast LTV (FLTV) in this letter.
Experimental Results
Three well-known benchmark databases were chosen to evaluate the performance of the proposed eye localization scheme under both good and bad lighting conditions. From the Chinese Academy of Sciences Pose, Expression, Accessory, and Lighting (CAS-PEAL) face database,8 the Lighting and Normal subsets were used, which contain 2450 frontal face images under widely varying illumination and 1040 frontal face images under normal illumination, respectively. The Yale face database B (Ref. 9), which contains 650 frontal face images, was also adopted, since it allows testing under large variations of illumination, including strong shadows and side lighting. Another 3368 frontal face images under general illumination were chosen from the Face Recognition Technology (FERET) face database.10 All images are roughly cropped so that only the facial regions remain and are then resized to the appointed size. Illumination normalization is then executed on the images; SQI and FLTV are both applied here for comparison.
The darkest pixel in the eye region most often belongs to a pupil. Thus, this gray valley can be employed to localize eyes in face images. Generally, to suppress noise and alleviate the interference of other objects (e.g., hair, eye corners), a mean filter and a circular averaging filter are usually used to enhance the image. This simple eye localization approach requires no initialization or training process. Moreover, it is extremely fast and easy to implement and thus is widely used in practical applications. This approach is also used here to test the illumination normalization methods. For higher accuracy and speed, we limit the search for gray valleys to the top half of the face image.
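A minimal sketch of this gray-valley approach follows; the window size and the left/right split are illustrative choices of this sketch, not the letter's exact settings:

```python
import numpy as np

def locate_eyes(gray, k=3):
    """Gray-valley eye localization sketch: smooth with a k x k mean
    filter, then take the darkest pixel in each lateral half of the
    top half of the (illumination-normalized) face image."""
    H, W = gray.shape
    p = np.pad(gray.astype(float), k // 2, mode='edge')
    smooth = np.zeros((H, W))
    for di in range(k):                 # simple k x k mean filter
        for dj in range(k):
            smooth += p[di:di + H, dj:dj + W]
    smooth /= k * k
    top = smooth[:H // 2]               # search only the top half
    left, right = top[:, :W // 2], top[:, W // 2:]
    li = np.unravel_index(np.argmin(left), left.shape)
    ri = np.unravel_index(np.argmin(right), right.shape)
    return li, (ri[0], ri[1] + W // 2)
```

For example, on a synthetic bright face with two dark 3×3 "pupils" in the upper half, the two minima land at the pupil centers.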
A few localization results on the Lighting subset of CAS-PEAL and on the Yale B face database are illustrated in Figs. 1 and 2, respectively. The upper images are the original images; the lower images are the eye localization results on the corresponding illumination-normalized images using FLTV. Note that the same parameter selection method as in Ref. 4 is adopted here. It can be observed that no overcompensation regions exist in the normalized images. To evaluate the accuracy of eye localization, a general criterion4 to claim successful eye localization is adopted:

$\max\left( \|C_l - C_l'\|, \|C_r - C_r'\| \right) / \|C_l - C_r\| \le 0.25$,

where $C_l$ and $C_r$ are the manually marked left and right eye positions, and $C_l'$ and $C_r'$ are the automatically located positions. Thus, the correct localization rates directly on the original images and on the images preprocessed with SQI and FLTV are obtained separately on the four test sets and are shown in Table 1. It can be seen that SQI and FLTV both greatly improve eye localization accuracy on all test sets, and that FLTV outperforms SQI under both good and bad illumination. To evaluate the computational cost, we obtain the average localization time per image by averaging the total execution time. The average localization times on a face image using SQI and FLTV as a preprocessing step are and , respectively. It is easy to conclude that FLTV is much faster than SQI and is effective for real-time eye localization. In addition, the original LTV model takes on average to process a image, which is much slower than the proposed FLTV and SQI methods. All experiments are conducted in C++ on a Pentium D computer.
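The relative-error criterion of Ref. 4 can be sketched as follows; the 0.25 threshold is the commonly used value (roughly half an eye width):

```python
import numpy as np

def eye_localization_ok(Cl, Cr, Cl_hat, Cr_hat, thresh=0.25):
    """Localization counts as correct when the worse of the two eye
    errors, normalized by the interocular distance, stays below the
    threshold: max(||Cl-Cl'||, ||Cr-Cr'||) / ||Cl-Cr|| <= thresh."""
    Cl, Cr = np.asarray(Cl, float), np.asarray(Cr, float)
    Cl_hat, Cr_hat = np.asarray(Cl_hat, float), np.asarray(Cr_hat, float)
    worst = max(np.linalg.norm(Cl - Cl_hat), np.linalg.norm(Cr - Cr_hat))
    return worst / np.linalg.norm(Cl - Cr) <= thresh

# a near-perfect localization passes; one badly off eye fails
assert eye_localization_ok((30, 40), (70, 40), (31, 40), (70, 42))
assert not eye_localization_ok((30, 40), (70, 40), (30, 40), (70, 55))
```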
Table 1. Correct localization rates on the four test sets.

| Illumination preprocess | Bad illumination | Good illumination |
The experimental results demonstrate that our illumination normalization technique is reliable for robust eye localization under extreme lighting conditions. It also greatly improves eye localization accuracy on images taken under good lighting conditions. One reason may be that FLTV not only retains the information useful for eye localization but also eliminates the interference of large-scale structures such as hair by leaving them in $u$. Therefore, even this simple eye localization approach can achieve accuracy better than or close to that of more complicated eye localization algorithms.
Conclusion and Discussion
In this letter, we propose an illumination normalization technique as a preprocessing step before localizing eyes. The resulting eye localization scheme is shown to be very fast and reliable under variable illumination. Motivated by the effectiveness and efficiency of the proposed illumination normalization technique, we expect good performance when combining it with other existing eye localization algorithms, which will be our future work.