## 1. Introduction

Eye localization has been an intensive research topic because of its wide applications in face recognition, driver fatigue monitoring systems, etc.^{1, 2, 3, 4} Unfortunately, it is still very hard to find an illumination-independent eye localization algorithm. Thus, active infrared-based approaches are employed to diminish the influence of illumination variation.^{1, 2} However, eye localization in visible-spectrum images is more relevant, since it is more practical in applications and needs no extra equipment. Consequently, it is often essential to normalize illumination before localizing eyes in visible-spectrum images with widely varying lighting. Jung et al.^{3} apply the self quotient image (SQI) to rectify illumination and obtain satisfactory eye localization results on their private face database. However, in SQI, the image is normalized by dividing it by its smoothed version, which depends to a great extent on the kernel size of the weighted Gaussian filter. The kernel size is rather difficult to determine, since the intrinsic information is severely reduced if the kernel is small, whereas halo effects might appear if the kernel is large. Although a multiscale technique is adopted to alleviate this problem, it adds computational cost, and overcompensated regions sometimes still exist. In addition, the image noise is amplified by the division operation, which also degrades the performance of eye localization. This letter proposes to take advantage of a more effective illumination normalization method, the logarithmic total variation (LTV) model,^{5} as a preprocessing step for eye localization, and it validates the performance of this eye localization scheme on several public face databases. In Sec. 2, the LTV model is presented to normalize illumination on the face. Since the LTV model is computationally expensive for real-time applications, a modified graph cut–based algorithm is proposed in Sec. 3 to solve the model so that the proposed preprocessing step is accelerated. Experimental results are given in Sec. 4, followed by conclusions in Sec. 5.

## 2. The LTV Model for Illumination Normalization

According to the Lambertian model, the captured face image $I(x,y)$ can be represented as

$$I(x,y)=\rho (x,y)\,S(x,y),\tag{1}$$

where $\rho $ is the albedo of the object surface, and $S(x,y)$ is the final light strength received at location $(x,y)$. The albedo $\rho $ is the intrinsic representation of the captured face and is independent of the ambient lighting condition, so it can be exploited for illumination-independent eye localization. Taking the logarithm of Eq. 1, we have

$$\mathrm{log}\left(I\right)=\mathrm{log}\left(\rho \right)+\mathrm{log}\left(S\right).\tag{2}$$

If we denote $f=\mathrm{log}\left(I\right)$, $v=\mathrm{log}\left(\rho \right)$, and $u=\mathrm{log}\left(S\right)$, respectively, then

$$f=v+u.\tag{3}$$

Chen et al.^{5} argued that one of the differences between the intrinsic structure and the illumination pattern of a face image is the scale difference, and the intrinsic structure is usually smaller than the illumination artifacts and shadows. In a way, $v$ promotes the variation patterns of the albedos of small-scale facial features. Thus, in order to eliminate the interference of ambient lighting in eye localization, one needs to extract $v$ from $f$. We notice that the TV-${L}^{1}$ model has shown its effectiveness for this task.^{5} Hence, we could use the TV-${L}^{1}$ model to estimate $u$:

$$u=\mathrm{arg}\,{\mathrm{min}}_{u}{\int}_{\Omega}(|\nabla u|+\lambda |f-u|)\,\mathrm{d}x.\tag{4}$$

Once $u$ is estimated, $v=f-u$ follows from Eq. 3. Therefore, the illumination normalization problem reduces to solving the TV-${L}^{1}$ model. Since partial differential equation–based algorithms often have numerical difficulties, Chen et al.^{5} cast Eq. 4 as a second-order cone program and solved it by modern interior-point methods. However, this iterative solution is expensive in both memory and computation time. To make this illumination normalization technique suitable for real-time applications, a graph cut–based algorithm is proposed to solve Eq. 4.
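To make the objective of Eq. 4 concrete, the following Python sketch evaluates a discrete version of it for a candidate $u$ and recovers $v=f-u$. It is only an illustration under stated assumptions: the total variation is taken as the sum of absolute forward differences (as in Eq. 6), differences that would cross the image border are simply dropped, and the function names `tv_l1_energy` and `small_scale_component` are hypothetical, not part of the original method.

```python
def tv_l1_energy(u, f, lam):
    """Discrete TV-L1 energy: sum |grad u| + lam * sum |f - u|.

    u and f are equally sized lists of lists (log-domain gray levels).
    Assumption: forward differences, with border-crossing terms omitted.
    """
    m, n = len(u), len(u[0])
    tv = 0
    for i in range(m):
        for j in range(n):
            if i + 1 < m:
                tv += abs(u[i + 1][j] - u[i][j])
            if j + 1 < n:
                tv += abs(u[i][j + 1] - u[i][j])
    fidelity = sum(abs(f[i][j] - u[i][j]) for i in range(m) for j in range(n))
    return tv + lam * fidelity


def small_scale_component(f, u):
    """v = f - u (Eq. 3): what remains after removing the large-scale part u."""
    return [[fij - uij for fij, uij in zip(f_row, u_row)]
            for f_row, u_row in zip(f, u)]


# A flat patch with one small bright feature in the middle.
f = [[5, 5, 5],
     [5, 9, 5],
     [5, 5, 5]]
u_flat = [[5, 5, 5] for _ in range(3)]

print(tv_l1_energy(f, f, 1.0))       # keeping the feature in u costs 16
print(tv_l1_energy(u_flat, f, 1.0))  # moving it into v costs only 4
print(small_scale_component(f, u_flat))
```

In this toy case with $\lambda =1$, the flat $u$ has the lower energy, so the small-scale feature ends up in $v$, which is the behavior the LTV model relies on.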

## 3. Efficient Solution to the TV-${L}^{1}$ Model

First, we show how to decompose Eq. 4 into several independent binary energy minimization problems. The images discussed in this letter are defined as $m\times n$ matrices in ${\mathbb{Z}}^{m\times n}$, where $\mathbb{Z}$ denotes the set of nonnegative integers that represent the grayscale levels of images, and $m\times n$ denotes the size of images. Let $f\in {\mathbb{Z}}^{m\times n}$ and $u\in {\mathbb{Z}}^{m\times n}$ denote the original and separated images, respectively. According to Eq. 3, each element of these matrices satisfies

$${f}_{i,j}={v}_{i,j}+{u}_{i,j},\quad \text{for}\ i=1,\dots ,m,\quad j=1,\dots ,n.\tag{5}$$

The total variation term of Eq. 4 is discretized as

$$|\nabla u|=\sum _{i,j}(|{u}_{i+1,j}-{u}_{i,j}|+|{u}_{i,j+1}-{u}_{i,j}|).\tag{6}$$

Let ${B}_{i,j}$ denote the binary value obtained by thresholding ${u}_{i,j}$ at gray level $\mu $, and let ${(x)}^{+}=\mathrm{max}(x,0)$. Each absolute difference in Eq. 6 can then be written as a sum over gray levels,

$$|{u}_{i,j}-{u}_{k,l}|=\sum _{\mu =0}^{{l}_{\mathrm{max}}}|{B}_{i,j}-{B}_{k,l}|=\sum _{\mu =0}^{{l}_{\mathrm{max}}}[{({B}_{i,j}-{B}_{k,l})}^{+}+{({B}_{k,l}-{B}_{i,j})}^{+}],\tag{7}$$

so that

$${\int}_{\Omega}|\nabla u|\,\mathrm{d}x=\sum _{\mu =0}^{{l}_{\mathrm{max}}}\sum _{i,j}\{[{({B}_{i,j}-{B}_{i+1,j})}^{+}+{({B}_{i+1,j}-{B}_{i,j})}^{+}]+[{({B}_{i,j}-{B}_{i,j+1})}^{+}+{({B}_{i,j+1}-{B}_{i,j})}^{+}]\}.\tag{8}$$

Similarly, with ${B}_{i,j}^{\prime}$ denoting the binary value obtained by thresholding ${f}_{i,j}$ at level $\mu $, the fidelity term becomes

$${\int}_{\Omega}|f-u|\,\mathrm{d}x=\sum _{i,j}|{f}_{i,j}-{u}_{i,j}|=\sum _{\mu =0}^{{l}_{\mathrm{max}}^{\prime}}\sum _{i,j}[(1-{B}_{i,j}^{\prime}){B}_{i,j}+{B}_{i,j}^{\prime}(1-{B}_{i,j})].\tag{9}$$

Therefore,

$$u=\mathrm{arg}\,\mathrm{min}\sum _{\mu =0}^{{l}_{\mathrm{max}}^{\prime}}E(B;f,\lambda ,\mu ),\tag{10}$$

where

$$E(B;f,\lambda ,\mu )=\sum _{i,j}\{[{({B}_{i,j}-{B}_{i+1,j})}^{+}+{({B}_{i+1,j}-{B}_{i,j})}^{+}]+[{({B}_{i,j}-{B}_{i,j+1})}^{+}+{({B}_{i,j+1}-{B}_{i,j})}^{+}]+\lambda [(1-{B}_{i,j}^{\prime}){B}_{i,j}+{B}_{i,j}^{\prime}(1-{B}_{i,j})]\}.\tag{11}$$

Thus, the problem of minimizing discretized Eq. 4 is decomposed into minimizing $E(B;f,\lambda ,\mu )$ for all levels $\mu =0,1,\dots ,{l}_{\mathrm{max}}^{\prime}$. It is noted that the minimizer ${u}^{*}$ of Eq. 4 can be constructed from the minimizers $\{{B}_{\mu}^{*}:\mu =0,1,\dots ,{l}_{\mathrm{max}}^{\prime}\}$ using the relationship given in Ref. 6.
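The level decomposition behind Eq. 7 can be checked numerically with a short Python sketch. It assumes one particular thresholding convention, $B=1$ where the gray level exceeds $\mu $ and $B=0$ otherwise; this convention and the summation used to rebuild the image are assumptions made for illustration and are not claimed to be the exact relationship of Ref. 6. The helper names `level_set` and `reconstruct` are hypothetical.

```python
def level_set(u, mu):
    """Binary image B with B[i][j] = 1 where u[i][j] > mu (assumed convention)."""
    return [[1 if x > mu else 0 for x in row] for row in u]


def reconstruct(levels):
    """Rebuild a gray-level image by summing its binary level sets."""
    m, n = len(levels[0]), len(levels[0][0])
    return [[sum(B[i][j] for B in levels) for j in range(n)] for i in range(m)]


u = [[0, 3, 1],
     [2, 2, 4]]
l_max = max(max(row) for row in u)
levels = [level_set(u, mu) for mu in range(l_max + 1)]

# Eq. 7 for the pixel pair (0, 1) and (1, 2): |3 - 4| on the left-hand side.
lhs = abs(u[0][1] - u[1][2])
rhs = sum(abs(B[0][1] - B[1][2]) for B in levels)
print(lhs, rhs)

# Under the assumed convention, summing the level sets returns the image.
print(reconstruct(levels) == u)
```

Because every gray-level difference splits into independent binary differences, each level $\mu $ can be minimized separately and the results combined afterwards.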

We then construct a directed capacitated graph corresponding to $E(B;f,\lambda ,\mu )$ to find its minimizer ${B}_{\mu}^{*}$ at every level $\mu =0,1,\dots ,{l}_{\mathrm{max}}^{\prime}$. It is worth noting that the nodes/pixels in the graph are all binary, that each $n$-link connecting a pair of neighboring pixels has cost 1, and that the $t$-links connecting $(i,j)$ with the source and the sink have costs $\lambda {B}_{i,j}^{\prime}$ and $\lambda (1-{B}_{i,j}^{\prime})$, respectively. In this way, a simplified two-terminal $s\text{-}t$ graph representation of Eq. 11 is constructed, and the minimizer ${B}_{\mu}^{*}$ is then obtained via the min-cut algorithm on the graph.^{7}
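The graph construction described above can be sketched in a few lines of Python for a single level $\mu $. This is a minimal illustration, not the letter's implementation: a plain Edmonds-Karp max-flow routine is swapped in for the min-cut algorithm of Ref. 7, pixels that remain connected to the source after the cut are assumed to take the label 1, and the names `binary_energy`, `min_cut_labels`, and `brute_force_min_energy` are hypothetical. The brute-force helper is included only to check the result on tiny images.

```python
from collections import deque
from itertools import product


def binary_energy(B, Bp, lam):
    """E(B; f, lam, mu) of Eq. 11 for binary images B and B' (lists of lists)."""
    m, n = len(B), len(B[0])
    e = 0.0
    for i in range(m):
        for j in range(n):
            if i + 1 < m:
                e += abs(B[i][j] - B[i + 1][j])
            if j + 1 < n:
                e += abs(B[i][j] - B[i][j + 1])
            e += lam * ((1 - Bp[i][j]) * B[i][j] + Bp[i][j] * (1 - B[i][j]))
    return e


def min_cut_labels(Bp, lam):
    """Minimize Eq. 11 with an s-t min cut (Edmonds-Karp max flow)."""
    m, n = len(Bp), len(Bp[0])
    s, t = "s", "t"
    cap = {}  # residual capacities, cap[a][b]

    def add_edge(a, b, c):
        cap.setdefault(a, {})
        cap.setdefault(b, {})
        cap[a][b] = cap[a].get(b, 0.0) + c
        cap[b].setdefault(a, 0.0)  # reverse edge for the residual graph

    for i in range(m):
        for j in range(n):
            p = (i, j)
            add_edge(s, p, lam * Bp[i][j])        # t-link cut if p is labeled 0
            add_edge(p, t, lam * (1 - Bp[i][j]))  # t-link cut if p is labeled 1
            for q in ((i + 1, j), (i, j + 1)):
                if q[0] < m and q[1] < n:
                    add_edge(p, q, 1.0)           # n-links of cost 1, both ways
                    add_edge(q, p, 1.0)

    eps = 1e-12
    while True:
        parent = {s: None}  # breadth-first search for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > eps and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            break  # no path left: parent now holds the source side of the cut
        bottleneck, b = float("inf"), t
        while parent[b] is not None:
            bottleneck = min(bottleneck, cap[parent[b]][b])
            b = parent[b]
        b = t
        while parent[b] is not None:
            a = parent[b]
            cap[a][b] -= bottleneck
            cap[b][a] += bottleneck
            b = a
    return [[1 if (i, j) in parent else 0 for j in range(n)] for i in range(m)]


def brute_force_min_energy(Bp, lam):
    """Exhaustive minimum of Eq. 11, feasible only for very small images."""
    m, n = len(Bp), len(Bp[0])
    return min(
        binary_energy([list(bits[r * n:(r + 1) * n]) for r in range(m)], Bp, lam)
        for bits in product((0, 1), repeat=m * n)
    )


Bp = [[0, 0, 0],
      [0, 1, 0],
      [0, 0, 0]]
print(min_cut_labels(Bp, 1.0))  # small lambda: the isolated pixel is smoothed away
print(min_cut_labels(Bp, 5.0))  # large lambda: the pixel is kept
```

Running the routine once per level, with $B^{\prime}$ obtained by thresholding $f$ at that level, yields the set of binary minimizers from which ${u}^{*}$ is assembled.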

To sum up, by introducing a divide-and-conquer methodology and a simplified graph representation, the minimizer ${u}^{*}$ of Eq. 4 can be computed more efficiently. This method is essentially identical to that of Ref. 6 but is easier to understand and implement. The resulting accelerated LTV model–based illumination normalization technique is called Fast LTV (FLTV) in this letter.

## 4. Experiments

Three well-known benchmark databases were chosen to evaluate the performance of the proposed eye localization scheme under both good and bad lighting conditions. In the Chinese Academy of Sciences: Pose, Expression, Accessory, Lighting (CAS-PEAL) face database,^{8} the Lighting and Normal subsets were used, which contain 2450 frontal face images under widely variable illumination and 1040 frontal face images under normal illumination, respectively. Yale face database B (Ref. 9), which contains 650 frontal face images, was also adopted, since it allows for testing under large variations of illumination, including strong shadow and side lighting. Another 3368 frontal face images under general illumination were chosen from the Face Recognition Technology (FERET) face database.^{10} All images are roughly cropped so that only the facial regions remain and are then resized to a specified size. Illumination normalization is then performed on the images. Both SQI and FLTV are applied here for comparison.

The darkest pixel in the eye region is most often part of a pupil. Thus, this gray valley can be employed to localize eyes in face images. To suppress noise and alleviate the interference of other objects (e.g., hair, eye corners), a mean filter and a circular averaging filter are usually used to enhance the image. This simple eye localization approach requires no initialization or training process. Moreover, it is extremely fast and easy to implement and is thus widely used in practical applications. This approach is also used here to test the illumination normalization methods. For higher accuracy and speed, we limit the search for gray valleys to the top half of the face image.

A few localization results on the Lighting subset of CAS-PEAL and on the Yale B face database are illustrated in Fig. 1 and Fig. 2, respectively. The upper images are the original images; the lower images are the eye localization results on the corresponding illumination-normalized images using FLTV. Note that the same method used to select $\lambda $ in Ref. 4 is also adopted here. It can be observed that no overcompensation regions appear in the normalized images. To evaluate the accuracy of eye localization, a general criterion^{4} to claim successful eye localization is adopted:

$$\mathit{err}=\frac{\mathrm{max}\{\Vert {l}_{c}-{l}_{c}^{\prime}\Vert ,\Vert {r}_{c}-{r}_{c}^{\prime}\Vert \}}{\Vert {l}_{c}-{r}_{c}\Vert}<0.25.\tag{13}$$

## Table 1

Correct localization rates on the four test sets.

| Illumination preprocess | CAS-Lighting (bad illumination) | Yale B (bad illumination) | CAS-Normal (good illumination) | FERET (good illumination) |
|---|---|---|---|---|
| None | 50.8% | 53.4% | 69.4% | 78.6% |
| SQI | 95.5% | 85.4% | 99.3% | 97.3% |
| FLTV | 97.4% | 86.5% | 100.0% | 98.1% |
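The success criterion of Eq. 13 is simple to evaluate, as the following Python sketch shows. It assumes that ${l}_{c}$ and ${r}_{c}$ are the reference (labeled) left and right eye centers and that the primed symbols are the detected positions; the function names and the sample coordinates are hypothetical and chosen only for illustration.

```python
import math


def localization_error(l_ref, r_ref, l_det, r_det):
    """Normalized error of Eq. 13: the larger of the two eye-center
    deviations divided by the reference inter-eye distance."""
    worst = max(math.dist(l_ref, l_det), math.dist(r_ref, r_det))
    return worst / math.dist(l_ref, r_ref)


def is_correct(l_ref, r_ref, l_det, r_det, threshold=0.25):
    """A localization counts as correct when the error is below the threshold."""
    return localization_error(l_ref, r_ref, l_det, r_det) < threshold


# Reference eyes 40 pixels apart; detections are off by 5 and 6 pixels.
err = localization_error((30, 40), (70, 40), (33, 44), (70, 46))
print(err, is_correct((30, 40), (70, 40), (33, 44), (70, 46)))
```

Dividing by the inter-eye distance makes the measure independent of the face size in the image, so rates such as those in Table 1 can be compared across databases.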

The experimental results demonstrate that our illumination normalization technique is reliable for robust eye localization under extreme lighting conditions. It can also greatly improve the eye localization accuracy on images under good lighting conditions. One reason may be that FLTV not only retains useful information for eye localization, but also eliminates the interference of large-scale structures such as hair by leaving them in $S$. Therefore, even such a simple eye localization approach can achieve accuracy better than or close to that of other, more complicated eye localization algorithms.

## 5. Conclusion and Discussion

In this letter, we propose an illumination normalization technique as a preprocessing step before localizing eyes. The experiments show this eye localization scheme to be very fast and reliable under variable illumination. Motivated by the effectiveness and efficiency of the proposed illumination normalization technique, we expect good performance when it is combined with other existing eye localization algorithms, which we leave for future work.