1 January 2009 Fast illumination normalization for robust eye localization under variable illumination
Abstract
Most eye localization methods suffer from illumination variation. To overcome this problem, we propose an illumination normalization technique as a preprocessing step before localizing eyes. This technique requires no training process, no assumption on the light conditions, and no alignment between different images for illumination normalization. Moreover, it is fast and thus effective for real-time applications. Experiment results verify the effectiveness and efficiency of the eye localization scheme with the proposed illumination normalization technique.
Yang and Su: Fast illumination normalization for robust eye localization under variable illumination

1. Introduction

Eye localization has been an intensive research topic because of its wide applications in face recognition, driver fatigue monitoring systems, etc.1, 2, 3, 4 Unfortunately, it is still very hard to find an illumination-independent eye localization algorithm. Thus, active infrared-based approaches are employed to diminish the influence of illumination variation.1, 2 However, eye localization in visible-spectrum images is more practical in applications, since it needs no extra equipment. Consequently, it is often essential to normalize illumination before localizing eyes in visible-spectrum images with widely varying lighting. Jung and Yoo3 apply the self quotient image (SQI) to rectify illumination and obtain satisfactory eye localization results on their private face database. However, in SQI, the image is normalized by dividing it by its smoothed version, which depends to a great extent on the kernel size of the weighted Gaussian filter. The kernel size is rather difficult to determine, since the intrinsic information will be severely reduced if the kernel size is small, and halo effects might appear if the kernel size is large. Although a multiscale technique is adopted to alleviate this problem, it adds computational cost, and overcompensated regions sometimes still exist. In addition, the image noise is amplified by the division operation, which also degrades the performance of eye localization.

This letter proposes to take advantage of a more effective illumination normalization method, the logarithmic total variation (LTV) model,5 as a preprocessing step for eye localization, and it validates the performance of this eye localization scheme on several public face databases. In Sec. 2, the LTV model is presented to normalize illumination on the face. Since the LTV model is computationally expensive for real-time applications, a modified graph cut–based algorithm is proposed in Sec. 3 to solve the model, so that the proposed preprocessing step is accelerated. Experimental results are given in Sec. 4, followed by conclusions in Sec. 5.

2. The LTV Model for Illumination Normalization

According to the Lambertian model, the captured face image I(x,y) can be represented as

$$I(x,y)=\rho(x,y)\,S(x,y), \tag{1}$$
where ρ is the albedo of the object surface, and S(x,y) is the final light strength received at location (x,y). The albedo ρ is the intrinsic representation of the captured face and is independent of the ambient lighting condition, so it can be exploited for illumination-independent eye localization. Taking the logarithm of Eq. 1, we have

$$\log(I)=\log(\rho)+\log(S). \tag{2}$$
If we denote f = log(I), v = log(ρ), and u = log(S), then

$$f=v+u. \tag{3}$$

Chen et al.5 argued that one of the differences between the intrinsic structure and the illumination pattern of a face image is scale: the intrinsic structure is usually smaller than the illumination artifacts and shadows. In a way, v captures the variation patterns of the albedos of small-scale facial features. Thus, to eliminate the interference of ambient lighting in eye localization, one needs to extract v from f. The TV-L1 model has shown its effectiveness for this task.5 Hence, we use the TV-L1 model to estimate u:

$$u=\arg\min_{u}\int_{\Omega}\left(|\nabla u|+\lambda\,|f-u|\right)dx, \tag{4}$$
where λ is a penalty parameter. As λ increases, the fidelity term |f − u| becomes more dominant, and thus u stays closer to f; a smaller λ yields a smoother u. One advantage of this LTV model is that the parameter λ, which depends only on the scale of the image, is very easy to set. According to the LTV model, larger structures such as extrinsic illumination are left in u (or S), and ρ, which is taken as the output image of illumination normalization, is obtained as ρ = exp(v) = exp(f − u).
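As a minimal sketch of Eqs. 1 to 3 and the final exponentiation, the following Python snippet builds a toy image from a known albedo and light field and recovers the albedo from f − u. The function names, the nested-list image format, and the use of the exact log(S) in place of a TV-L1 estimate of u are assumptions made only for illustration.

```python
import math

def log_image(image):
    """Return f = log(I) element-wise; assumes strictly positive intensities."""
    return [[math.log(x) for x in row] for row in image]

def recover_albedo(image, u):
    """Return rho = exp(f - u), where f = log(I) and u estimates log(S)."""
    f = log_image(image)
    return [[math.exp(fv - uv) for fv, uv in zip(f_row, u_row)]
            for f_row, u_row in zip(f, u)]

# Toy data (hypothetical): a known albedo lit twice as strongly in the second row.
albedo = [[0.2, 0.8], [0.5, 0.5]]
light = [[100.0, 100.0], [200.0, 200.0]]
image = [[a * s for a, s in zip(a_row, s_row)]
         for a_row, s_row in zip(albedo, light)]

# Here u is the exact log(S); in practice it would be estimated by solving Eq. 4.
u = [[math.log(s) for s in row] for row in light]
rho = recover_albedo(image, u)
```

Because the illumination enters additively in the log domain, subtracting u removes it regardless of how strong the light is in each region.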

Therefore, the illumination normalization problem reduces to solving the TV-L1 model. Since partial differential equation–based algorithms often have numerical difficulties, Chen et al.5 cast Eq. 4 as a second-order cone program and solved it with modern interior-point methods. But this iterative solution is expensive in both memory and computation time. To make this illumination normalization technique suitable for real-time applications, a graph cut–based algorithm is proposed to solve Eq. 4.

3. Efficient Solution to the TV-L1 Model

First, we show how to decompose Eq. 4 into several independent binary energy minimization problems. The images discussed in this letter are defined as m×n matrices in $Z^{m\times n}$, where Z denotes the set of nonnegative integers that represent the grayscale levels of images, and m×n denotes the size of images. Let $f\in Z^{m\times n}$ and $u\in Z^{m\times n}$ denote the original and separated images, respectively. According to Eq. 3, each element of these matrices satisfies

$$f_{i,j}=v_{i,j}+u_{i,j},\quad \text{for } i=1,\ldots,m,\; j=1,\ldots,n. \tag{5}$$
Moreover, we assume that all images satisfy the Neumann condition on the boundary of the domain Ω, i.e., the differentials on the image edges are defined to be zero. This assumption can be guaranteed by padding the image with its boundary elements. In this letter, to simplify and accelerate our algorithm, we use only the 4-neighbors of $u_{i,j}$ to approximate the gradient of u at location (i,j). Consequently, the regularization term in Eq. 4 can be defined in the discrete case by

$$\int_{\Omega}|\nabla u|\,dx=\sum_{i,j}\left(|u_{i+1,j}-u_{i,j}|+|u_{i,j+1}-u_{i,j}|\right). \tag{6}$$
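The discrete regularization term can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' C++ implementation: the function name `discrete_tv` and the nested-list image format are assumptions, and the Neumann condition is realized by replicating the boundary element so that differences across the image edge vanish.

```python
def discrete_tv(u):
    """Sum of absolute forward differences over the 4-neighborhood (Eq. 6)."""
    m, n = len(u), len(u[0])
    total = 0
    for i in range(m):
        for j in range(n):
            # Replicate the boundary element so the difference there is zero.
            down = u[i + 1][j] if i + 1 < m else u[i][j]
            right = u[i][j + 1] if j + 1 < n else u[i][j]
            total += abs(down - u[i][j]) + abs(right - u[i][j])
    return total

flat = discrete_tv([[1, 1], [1, 1]])    # constant image, no variation
corner = discrete_tv([[0, 0], [0, 4]])  # one bright corner pixel
```

In the second call the bright pixel differs by 4 from each of its two neighbors, so the term evaluates to 8.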
Suppose μ is given, and define $B_{i,j}=1$ for $u_{i,j}\ge\mu$; otherwise, $B_{i,j}=0$. Define $x^{+}=\max\{x,0\}$, where x is an arbitrary real number. Then $|B_{i,j}-B_{k,l}|=(B_{i,j}-B_{k,l})^{+}+(B_{k,l}-B_{i,j})^{+}$. For each pair of neighboring pixels (i,j) and (k,l), $|u_{i,j}-u_{k,l}|$ can be expressed in terms of the elements $B_{i,j}$ over all grayscale levels $\mu=0,1,\ldots,l_{\max}$ as follows:

$$|u_{i,j}-u_{k,l}|=\sum_{\mu=0}^{l_{\max}}|B_{i,j}-B_{k,l}|=\sum_{\mu=0}^{l_{\max}}\left[(B_{i,j}-B_{k,l})^{+}+(B_{k,l}-B_{i,j})^{+}\right], \tag{7}$$
where $l_{\max}=\max_{i,j}\{u_{i,j}\}\le 255$. In this way, the original problem is reformulated into several independent binary problems based on the decomposition of a function into its level sets. Hence, combining Eq. 6 with Eq. 7, the first term on the right side of Eq. 4 can be binarized as

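$$\int_{\Omega}|\nabla u|\,dx=\sum_{\mu=0}^{l_{\max}}\sum_{i,j}\left\{\left[(B_{i,j}-B_{i+1,j})^{+}+(B_{i+1,j}-B_{i,j})^{+}\right]+\left[(B_{i,j}-B_{i,j+1})^{+}+(B_{i,j+1}-B_{i,j})^{+}\right]\right\}. \tag{8}$$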
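The level-set decomposition can be checked numerically. The sketch below assumes the thresholding convention that a pixel's binary value is 1 when its gray level is at least μ, and it uses hypothetical helper names; it compares the total variation of a small integer image with the sum of the binary terms over all levels.

```python
def level_set(u, mu):
    """Binary image B with B[i][j] = 1 where u[i][j] >= mu."""
    return [[1 if x >= mu else 0 for x in row] for row in u]

def pos(x):
    """Positive part x^+ = max(x, 0)."""
    return max(x, 0)

def neighbor_pairs(m, n):
    """Yield each vertical and horizontal pair of 4-neighbors once."""
    for i in range(m):
        for j in range(n):
            if i + 1 < m:
                yield (i, j), (i + 1, j)
            if j + 1 < n:
                yield (i, j), (i, j + 1)

def tv(u):
    """Discrete total variation computed directly on gray levels."""
    return sum(abs(u[a][b] - u[c][d])
               for (a, b), (c, d) in neighbor_pairs(len(u), len(u[0])))

def binary_tv(B):
    """The same term on a binary image, written with positive parts."""
    return sum(pos(B[a][b] - B[c][d]) + pos(B[c][d] - B[a][b])
               for (a, b), (c, d) in neighbor_pairs(len(B), len(B[0])))

u = [[3, 7, 7], [0, 7, 2]]
l_max = max(max(row) for row in u)
direct = tv(u)
by_levels = sum(binary_tv(level_set(u, mu)) for mu in range(l_max + 1))
```

Both `direct` and `by_levels` come out equal, which is what allows each gray level to be handled as a separate binary problem.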
Similarly, we define $\tilde{B}_{i,j}=1$ for $f_{i,j}\ge\mu$; otherwise, $\tilde{B}_{i,j}=0$. For binary numbers $B_{i,j}$ and $\tilde{B}_{i,j}$, we have $|B_{i,j}-\tilde{B}_{i,j}|=(1-B_{i,j})\tilde{B}_{i,j}+B_{i,j}(1-\tilde{B}_{i,j})$. The second term on the right side of Eq. 4 can then be binarized as

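$$\int_{\Omega}|f-u|\,dx=\sum_{i,j}|f_{i,j}-u_{i,j}|=\sum_{\mu=0}^{l_{\max}}\sum_{i,j}\left[(1-B_{i,j})\tilde{B}_{i,j}+B_{i,j}(1-\tilde{B}_{i,j})\right], \tag{9}$$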
where $l_{\max}=\max_{i,j}\{f_{i,j}\}\le 255$. As a result, Eq. 4 is reformulated by combining Eq. 8 with Eq. 9. For given input f and λ and a fixed level $\mu\in\{0,1,\ldots,l_{\max}\}$, Eq. 4 can be rewritten as

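$$u=\arg\min\sum_{\mu=0}^{l_{\max}}E(B;f,\lambda,\mu), \tag{10}$$

$$E(B;f,\lambda,\mu)=\sum_{i,j}\left\{\left[(B_{i,j}-B_{i+1,j})^{+}+(B_{i+1,j}-B_{i,j})^{+}\right]+\left[(B_{i,j}-B_{i,j+1})^{+}+(B_{i,j+1}-B_{i,j})^{+}\right]+\lambda\left[(1-B_{i,j})\tilde{B}_{i,j}+B_{i,j}(1-\tilde{B}_{i,j})\right]\right\}. \tag{11}$$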
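To make Eq. 11 concrete, the following sketch evaluates the binary energy at one level and checks that summing it over all levels reproduces the discrete TV-L1 objective of Eq. 4 for a candidate u. The helper names and the convention that both binary images are obtained by thresholding at "greater than or equal to μ" are assumptions for illustration.

```python
def threshold(img, mu):
    """Binary image with 1 where img >= mu."""
    return [[1 if x >= mu else 0 for x in row] for row in img]

def level_energy(B, f, lam, mu):
    """E(B; f, lam, mu) of Eq. 11 for a binary image B."""
    Bf = threshold(f, mu)
    m, n = len(B), len(B[0])
    e = 0.0
    for i in range(m):
        for j in range(n):
            if i + 1 < m:  # vertical neighbor term
                e += max(B[i][j] - B[i + 1][j], 0) + max(B[i + 1][j] - B[i][j], 0)
            if j + 1 < n:  # horizontal neighbor term
                e += max(B[i][j] - B[i][j + 1], 0) + max(B[i][j + 1] - B[i][j], 0)
            # fidelity term: 1 exactly when B and the thresholded f disagree
            e += lam * ((1 - B[i][j]) * Bf[i][j] + B[i][j] * (1 - Bf[i][j]))
    return e

def tv_l1_objective(u, f, lam):
    """Discrete version of the objective in Eq. 4."""
    m, n = len(u), len(u[0])
    e = 0.0
    for i in range(m):
        for j in range(n):
            if i + 1 < m:
                e += abs(u[i + 1][j] - u[i][j])
            if j + 1 < n:
                e += abs(u[i][j + 1] - u[i][j])
            e += lam * abs(f[i][j] - u[i][j])
    return e

f = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]
u = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]  # candidate that drops the bright spot
lam = 0.5
l_max = max(max(row) for row in f)
total = sum(level_energy(threshold(u, mu), f, lam, mu) for mu in range(l_max + 1))
```

For this candidate the regularization term is zero and the fidelity term contributes λ·4, so both `total` and `tv_l1_objective(u, f, lam)` equal 2.0.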

Thus, the problem of minimizing the discretized Eq. 4 is decomposed into minimizing $E(B;f,\lambda,\mu)$ for all levels $\mu=0,1,\ldots,l_{\max}$. The minimizer $u^{*}$ of Eq. 4 can be constructed from the minimizers $\{B_{\mu}^{*}:\mu=0,1,\ldots,l_{\max}\}$ using the relationship6

$$u_{i,j}^{*}=\max\{\mu \mid B_{\mu}^{*}(i,j)=1\}. \tag{12}$$

We then construct a directed capacitated graph corresponding to $E(B;f,\lambda,\mu)$ to find its minimizer $B_{\mu}^{*}$ at every level $\mu=0,1,\ldots,l_{\max}$. It is worth noting that the nodes/pixels in the graph are all binary, that the cost of each n-link connecting a pair of neighboring pixels equals 1, and that the t-links connecting (i,j) with the source and the sink cost $\lambda\tilde{B}_{i,j}$ and $\lambda(1-\tilde{B}_{i,j})$, respectively. In this way, a simplified two-terminal s-t graph representation of Eq. 11 is constructed, and the minimizer $B_{\mu}^{*}$ is then obtained via the min-cut algorithm on the graph.7
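A minimal Python sketch of this construction follows. It is not the authors' C++ implementation: it uses a plain Edmonds-Karp max-flow rather than the algorithm of Ref. 7, assumes the thresholding convention "greater than or equal to μ", takes the source side of the cut as the pixels labeled 1, and all function names are hypothetical.

```python
from collections import deque

EPS = 1e-9  # tolerance for treating a residual capacity as exhausted

def add_edge(cap, a, b, c):
    """Add a directed edge a -> b with capacity c and ensure a reverse entry."""
    cap.setdefault(a, {})
    cap.setdefault(b, {})
    cap[a][b] = cap[a].get(b, 0.0) + c
    cap[b].setdefault(a, 0.0)

def min_cut(cap, s, t):
    """Edmonds-Karp max-flow; returns (cut value, nodes on the source side)."""
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > EPS and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:          # no augmenting path is left
            return flow, set(parent)
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        flow += push

def solve_level(f, lam, mu):
    """Minimize E(B; f, lam, mu) by a min cut; returns (B, minimum energy)."""
    m, n = len(f), len(f[0])
    cap = {}
    for i in range(m):
        for j in range(n):
            bf = 1 if f[i][j] >= mu else 0
            add_edge(cap, "s", (i, j), lam * bf)        # paid if B = 0 but f >= mu
            add_edge(cap, (i, j), "t", lam * (1 - bf))  # paid if B = 1 but f < mu
            for q in ((i + 1, j), (i, j + 1)):
                if q[0] < m and q[1] < n:
                    add_edge(cap, (i, j), q, 1.0)       # n-link of cost 1, both ways
                    add_edge(cap, q, (i, j), 1.0)
    energy, source_side = min_cut(cap, "s", "t")
    B = [[1 if (i, j) in source_side else 0 for j in range(n)] for i in range(m)]
    return B, energy

def solve_tv_l1(f, lam):
    """Assemble u from the per-level minimizers using Eq. 12."""
    m, n = len(f), len(f[0])
    u = [[0] * n for _ in range(m)]
    for mu in range(max(max(row) for row in f) + 1):
        B, _ = solve_level(f, lam, mu)
        for i in range(m):
            for j in range(n):
                if B[i][j]:
                    u[i][j] = mu  # levels are visited in increasing order
    return u

f = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]
u_small = solve_tv_l1(f, 0.5)   # small lambda
u_large = solve_tv_l1(f, 10.0)  # large lambda
```

With λ = 0.5 the isolated bright pixel is cheaper to move into v than to keep in u, so `u_small` is flat at 5; with λ = 10 the fidelity term dominates and `u_large` reproduces f.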

To sum up, by introducing a divide-and-conquer methodology and a simplified graph representation, the minimizer $u^{*}$ of Eq. 4 can be computed more efficiently. This method is essentially identical to that of Ref. 6 but is easier to understand and implement. The accelerated LTV model–based illumination normalization technique is called Fast LTV (FLTV) in this letter.

4. Experiments

Three well-known benchmark databases were chosen to evaluate the performance of the proposed eye localization scheme under both good and bad lighting conditions. In the Chinese Academy of Science: Pose, Expression, Accessory, Lighting (CAS-PEAL) face database,8 the Lighting and Normal subsets were used, which contain 2450 frontal face images under widely variable illumination and 1040 frontal face images under normal illumination, respectively. Yale face database B (Ref. 9), which contains 650 frontal face images, was also adopted, since it allows for testing under large variations of illumination, including strong shadow and side lighting. Another 3368 frontal face images under general illumination were chosen from the Face Recognition Technology (FERET) face database.10 All images are roughly cropped so that only the facial regions remain and are then resized to a specified size. Illumination normalization is then executed on the images. SQI and FLTV are both applied here for comparison.

The darkest pixel in the eye region is most often a part of a pupil. Thus, this gray valley can be employed to localize eyes in face images. To suppress noise and alleviate the interference of other objects (e.g., hair, eye corner), a mean filter and a circular averaging filter are usually used to enhance the image. This simple eye localization approach requires no initialization or training process. Moreover, it is extremely fast and easy to implement and thus is widely used in practical applications. This approach is also used to test the illumination normalization methods here. For higher accuracy and speed, we limit the search for gray valleys to the top half of the face image.
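The gray-valley search can be sketched as follows. This is a simplified illustration, not the exact procedure used in the experiments: it applies only a mean filter (the circular averaging filter is omitted), and the filter radius, the split of the top half into left and right search regions, and the function names are assumptions.

```python
def mean_filter(img, r=1):
    """Average over a (2r+1) x (2r+1) window, clipped at the image border."""
    m, n = len(img), len(img[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            vals = [img[a][b]
                    for a in range(max(0, i - r), min(m, i + r + 1))
                    for b in range(max(0, j - r), min(n, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def locate_eyes(img, r=1):
    """Return (row, col) of the darkest smoothed pixel in each top quadrant."""
    smooth = mean_filter(img, r)
    m, n = len(img), len(img[0])
    top = m // 2  # search only the top half of the face image

    def darkest(col_start, col_end):
        candidates = ((i, j) for i in range(top) for j in range(col_start, col_end))
        return min(candidates, key=lambda p: smooth[p[0]][p[1]])

    return darkest(0, n // 2), darkest(n // 2, n)

# Toy face (hypothetical): bright background, two dark pupils, a dark mouth.
face = [[200] * 12 for _ in range(8)]
for i in range(1, 4):
    for j in range(1, 4):
        face[i][j] = 0   # left pupil blob centered at (2, 2)
    for j in range(8, 11):
        face[i][j] = 0   # right pupil blob centered at (2, 9)
for i in range(5, 8):
    for j in range(4, 8):
        face[i][j] = 0   # dark region in the bottom half, outside the search area
left_eye, right_eye = locate_eyes(face)
```

On this toy image the search returns the two blob centers, and the dark region in the bottom half is ignored because it lies outside the search area.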

A few localization results on the Lighting subset of CAS-PEAL and on the Yale B face database are illustrated in Fig. 1 and Fig. 2, respectively. The upper images are the original images; the lower images are the eye localization results on the corresponding illumination-normalized images using FLTV. Note that the same method used to select λ in Ref. 4 is also adopted here. It can be observed that there are no overcompensated regions in the normalized images. To evaluate the accuracy of eye localization, a general criterion4 for claiming successful eye localization is adopted:

$$\mathrm{err}=\frac{\max\{\|l_c-\tilde{l}_c\|,\;\|r_c-\tilde{r}_c\|\}}{\|l_c-r_c\|}<0.25, \tag{13}$$
where $l_c$ and $r_c$ are the manually marked left and right eye positions, and $\tilde{l}_c$ and $\tilde{r}_c$ are the automatically located positions. The correct localization rates obtained directly on the original images and on the images preprocessed with SQI and FLTV are shown for the four test sets in Table 1. It can be seen that SQI and FLTV both greatly improve eye localization accuracy on all test sets, and FLTV outperforms SQI under both good and bad illumination. To evaluate the computational cost, we compute the average localization time per image as the mean of the total execution time. The average localization times on a 128×128 face image using SQI and FLTV as a preprocessing step are 0.781 s and 0.057 s, respectively. FLTV is thus much faster than SQI and is effective for real-time eye localization. In addition, the original LTV model takes 6.53 s on average to process a 128×128 image, which is much slower than both the proposed FLTV and SQI. All the experiments are conducted with C++ on a Pentium D 2.8 GHz computer.
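The success criterion of Eq. 13 is straightforward to compute. The sketch below assumes Euclidean distances and (x, y) coordinate tuples; the function names and the sample coordinates are hypothetical.

```python
import math

def localization_error(left_true, right_true, left_found, right_found):
    """Larger of the two eye position errors, relative to the true eye distance."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    worst = max(dist(left_true, left_found), dist(right_true, right_found))
    return worst / dist(left_true, right_true)

def is_correct(left_true, right_true, left_found, right_found, threshold=0.25):
    """Apply the err < 0.25 rule of Eq. 13."""
    return localization_error(left_true, right_true, left_found, right_found) < threshold

# Marked eyes 40 pixels apart; the left estimate is off by 5 pixels, the right by 2.
err = localization_error((30, 40), (70, 40), (33, 44), (70, 42))
ok = is_correct((30, 40), (70, 40), (33, 44), (70, 42))
# A left estimate that is 12 pixels off exceeds a quarter of the eye distance.
bad = is_correct((30, 40), (70, 40), (30, 52), (70, 42))
```

Normalizing by the distance between the marked eyes makes the threshold independent of the image resolution and face size.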

Fig. 1 Correct localization samples from the CAS-Lighting subset.

Fig. 2 Correct localization samples from Yale B.

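Table 1 Correct localization rates on the four test sets.

| Illumination preprocess | CAS-Lighting (bad illumination) | Yale B (bad illumination) | CAS-Normal (good illumination) | FERET (good illumination) |
| --- | --- | --- | --- | --- |
| None | 50.8% | 53.4% | 69.4% | 78.6% |
| SQI | 95.5% | 85.4% | 99.3% | 97.3% |
| FLTV | 97.4% | 86.5% | 100.0% | 98.1% |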

The experimental results demonstrate that our illumination normalization technique is reliable for robust eye localization under extreme lighting conditions. It can also greatly improve the eye localization accuracy on images under good lighting conditions. One reason may be that FLTV not only retains useful information for eye localization but also eliminates the interference of large structures such as hair by leaving them in S. Therefore, even such a simple eye localization approach can achieve accuracy better than or comparable to that of more complicated eye localization algorithms.

5. Conclusion and Discussion

In this letter, we propose an illumination normalization technique as a preprocessing step before localizing eyes. The experiments show this eye localization scheme to be very fast and reliable under variable illumination. Motivated by the effectiveness and efficiency of the proposed illumination normalization technique, we expect good performance when it is combined with other existing eye localization algorithms, which we leave as future work.

References

1. W. Hizem, Y. Yang, and B. Dorizzi, “Near-infrared sensing and associated landmark detection for face recognition,” J. Electron. Imaging 17(1), 011005 (2008). https://doi.org/10.1117/1.2898556

2. Z. Zhu and Q. Ji, “Robust real-time eye detection and tracking under variable lighting conditions and various face orientations,” Comput. Vis. Image Underst. 98(1), 124–154 (2005). https://doi.org/10.1016/j.cviu.2004.07.012

3. S. U. Jung and J. H. Yoo, “A robust eye detection method in facial region,” Lect. Notes Comput. Sci. 4418, 596–606 (2007).

4. Z. H. Zhou and X. Geng, “Projection functions for eye detection,” Pattern Recogn. 37(5), 1049–1056 (2004). https://doi.org/10.1016/j.patcog.2003.09.006

5. T. Chen, X. S. Zhou, D. Comaniciu, and T. S. Huang, “Total variation models for variable lighting face recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1519–1524 (2006). https://doi.org/10.1109/TPAMI.2006.195

6. J. Darbon and M. Sigelle, “Image restoration with discrete constrained total variation. Part I: fast and exact optimization,” J. Math. Imaging Vision 26(3), 261–276 (2006). https://doi.org/10.1007/s10851-006-8803-0

7. Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004). https://doi.org/10.1109/TPAMI.2004.60

8. W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, and D. Zhao, “The CAS-PEAL large-scale Chinese face database and evaluation protocols,” IEEE Trans. Syst. Man Cybern., Part A 38(1), 149–161 (2008).

9. A. Georghiades, D. Kriegman, and P. Belhumeur, “From few to many: generative models for recognition under variable pose and illumination,” IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 643–660 (2001). https://doi.org/10.1109/34.927464

10. P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodology for face-recognition algorithms,” IEEE Trans. Pattern Anal. Mach. Intell. 22(10), 1090–1104 (2000). https://doi.org/10.1109/34.879790

Fei Yang, Jianbo Su, "Fast illumination normalization for robust eye localization under variable illumination," Journal of Electronic Imaging 18(1), 010503 (1 January 2009). https://doi.org/10.1117/1.3086868
JOURNAL ARTICLE
3 PAGES