In recent years, driver fatigue has become one of the leading causes of traffic accidents, at great cost to the safety and property of drivers and pedestrians. Researchers have proposed many fatigue detection methods to address this problem, which can be divided into three types (Ref. 1): physiological parameters, vehicle behaviors, and facial feature analysis. The first type measures the driver’s physiological parameters (Refs. 2–6) using tools such as the electroencephalogram and electrocardiogram. However, these methods are invasive and require contact with the driver’s body. The second type measures vehicle behaviors (Refs. 7–11), such as speed, steering wheel rotation angle, and lane departure; however, it is affected by driving conditions, driving experience, and vehicle type. The third type analyzes the driver’s face using cues such as the PERCLOS value, blink frequency, head posture, and yawning. PERCLOS is the abbreviation of percentage of eyelid closure over the pupil over time, that is, the percentage of a specific period during which the eye is closed. This type of method is noninvasive and easy to implement, and it is the one adopted for fatigue detection in this paper. The following is a brief introduction to some of the algorithms for detecting fatigue by facial feature analysis.
You et al. presented a night monitoring system for real-time fatigue driving detection (Ref. 12), but the system requires additional infrared lighting equipment, which limits it to specific applications. An eye state detection method based on projection is proposed in Ref. 13: the height and width of the iris are estimated by integral or variance projection, and the eye state is determined according to the aspect ratio. Omidyeganeh et al. calculated the mouth width and height by horizontal and vertical grayscale projection of the mouth area (Ref. 14) and used the aspect ratio to determine whether a driver is yawning. However, this method is not effective when the teeth are exposed or the driver has a beard. The authors of Ref. 15 proposed a smartphone-based fatigue detection system that detects the eyes with a progressive locating method (PLM) algorithm. When the eyes are closed in at least three of five adjacent frames, the driver is considered to be in a state of fatigue. However, the limitation of the PLM algorithm is that it relies too heavily on the gray distribution of the facial region. Further, some methods train neural network or support vector machine classifiers to identify the state of the eyes and mouth (Refs. 16–20), and they achieve high detection accuracy. However, these methods need a large amount of training data and a long training time.
The rest of this paper is organized as follows. Section 2 introduces face region detection, and the location of the eyes and mouth on this basis. Section 3 explains the contour feature extraction method, including eye contour extraction and mouth internal contour extraction. Section 4 evaluates the performance of the algorithm based on the experimental results and analyzes its state. Finally, the conclusion and future work are presented in Sec. 5.
Proposed Eye and Mouth Region Detection Algorithm
The block diagram of the proposed eye and mouth state detection algorithm is shown in Fig. 1. The image is acquired from three different databases. Subsequently, we introduce each part of the algorithm.
The images in the databases are obtained under different lighting conditions. Because the highlights and shadows caused by the light source strongly influence the skin color, we use the algorithm of Ref. 21 for color compensation. To obtain the eye and mouth regions, the facial area must first be extracted from the database images; for this we use the Viola Jones face detection algorithm (Ref. 22). This algorithm combines the integral image, a cascade classifier, and the AdaBoost algorithm, which greatly improves the speed and accuracy of face detection.
There are fixed spatial relationships among facial features. For example, the eyes lie in the upper part of the face and the mouth in the lower part. To improve the accuracy and speed of detection, our algorithm first determines the region of interest (ROI) of the eyes and of the mouth, and then detects the target within the ROI. After obtaining the facial image, the upper half of the image is extracted; the upper one-eighth of this half is removed, and the remaining lower seven-eighths is set as the eye ROI, as shown in Fig. 2(a). In this ROI, we use the EyeMap algorithm (Ref. 23) to locate the eye region. This method builds two maps in the YCbCr space (Ref. 24), EyeMapC and EyeMapL, and then combines them into a single map. Experiments show that the region around the eyes has high Cb components and low Cr components, and EyeMapC is calculated as follows:
The three terms of EyeMapC are each normalized to the range [0, 255]. In addition, the eyes contain both bright and dark pixels in the luminance component; therefore, grayscale dilation and erosion with ball structuring elements are used to construct EyeMapL (Ref. 25). EyeMapL is calculated as follows:
Then, EyeMapC is multiplied by EyeMapL to obtain EyeMap
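For reference, the EyeMap construction is usually written in the following form in the literature on which this step builds. The notation is an assumption here rather than a quotation: C̃r denotes the negative of Cr (that is, 255 − Cr), ⊕ and ⊖ denote grayscale dilation and erosion, and g is the structuring element.

```latex
\begin{align}
\mathrm{EyeMapC} &= \frac{1}{3}\left\{ C_b^{2} + \left(\tilde{C}_r\right)^{2} + \frac{C_b}{C_r} \right\}, \\
\mathrm{EyeMapL} &= \frac{Y(x,y) \oplus g(x,y)}{Y(x,y) \ominus g(x,y) + 1}, \\
\mathrm{EyeMap}  &= \mathrm{EyeMapC} \times \mathrm{EyeMapL}.
\end{align}
```

The chrominance map responds to the high Cb and low Cr values around the eyes, while the luminance map emphasizes regions that contain both bright and dark pixels, so their product is large only where both cues agree.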
The EyeMap of a typical image (from the California Institute of Technology color face database) is shown in Fig. 2. The original eye ROI is shown in Fig. 2(a), and EyeMapC, EyeMapL, and EyeMap are shown in Figs. 2(b)–2(d), respectively.
To locate the eye region accurately, the optimal threshold is obtained with the OTSU algorithm and used to convert the EyeMap gray image into a binary image, as shown in Fig. 3(a). We analyze the aspect ratio, position, and other characteristics of every connected component (white part) to exclude the noneye regions, and finally take a pair of connected components as the eye region, as shown in Fig. 3(b). If no such pair of connected components is found, the threshold is reduced from the optimal value and detection is repeated. Experiments show that the eye length is approximately half of the distance between the centers of the eyes, and the eye height is approximately half of the eye length. Therefore, we locate the regions of the left and right eyes and mark them with rectangular boxes, as shown in Fig. 3(c).
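The thresholding step with its fallback can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the experiments: `find_eye_pair` is a hypothetical callable standing in for the connected-component analysis, and the reduction `step` is an assumed value that the text does not specify.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def locate_eyes(pixels, find_eye_pair, step=5):
    """Binarize at the Otsu threshold; lower it until an eye pair is found.

    find_eye_pair(mask) is a hypothetical callable that returns the eye pair
    or None. step is an assumed reduction per retry.
    """
    t = otsu_threshold(pixels)
    while t >= 0:
        mask = [1 if p > t else 0 for p in pixels]
        pair = find_eye_pair(mask)
        if pair is not None:
            return t, pair
        t -= step
    return None
```

In this sketch the image is a flat list of integer gray levels; lowering the threshold admits more white pixels on each retry, which mirrors the redetection described above.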
To improve the speed and accuracy of mouth detection, we set the ROI based on where the mouth lies in the face region. Saeed and Dugelay (Ref. 26) proposed that the mouth ROI is the lowest one-third of the detected face region. Accordingly, the lower one-third of the face image is extracted, and the middle half of this image is set as the mouth ROI, as shown by the green box in Fig. 4(a). However, when the mouth opens widely (yawning), this ROI does not contain the complete mouth region, as shown in Fig. 4(b). When the facial region is extended downward by one-fifth of its height, we obtain the complete mouth region, as shown in Fig. 4(c). Conversely, when the mouth opens only slightly, the ROI is too large, which affects the extraction of the mouth internal contour; thus, it is necessary to locate the mouth accurately. Based on the color difference between the lips and the skin, the mouth region is positioned precisely by lip segmentation, and we segment the lips according to the value of chromatism in the RGB space (Ref. 27). The value of chromatism is defined as follows:
Experiments demonstrate that the value of chromatism of the lip region is larger than that of the skin. We therefore rank all pixels by their chromatism values and select a fixed number of pixels with the largest values as the lip region. Talea and Yaghmaie (Ref. 28) proposed that this number is in the range of 10% to 20% of the pixels in the initial mouth ROI, and Pan et al. set it to 15% of the mouth ROI (Ref. 27). In this study, we set it to 20% to extract the complete lip region; the selected pixels are shown as the white part in Fig. 4(d). Because the upper and lower lips are not always connected, as shown in Fig. 4(d), and their sizes do not differ greatly, we select the two largest connected components and determine from their sizes whether the upper and lower lips are connected. The located mouth region is the bounding rectangle of the two largest connected components (when the upper and lower lips are not connected) or of the single largest connected component (when they are connected). In practice, the final mouth region is taken slightly larger than this rectangle.
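The ranking step can be illustrated with a short sketch. It assumes the chromatism values have already been computed into a 2-D list `chroma` (a hypothetical input name), and it returns the bounding rectangle of all selected pixels, which simplifies the connected-component selection described above.

```python
def select_lip_pixels(chroma, fraction=0.20):
    """Mark the given fraction of pixels with the largest chromatism values."""
    flat = [(v, r, c) for r, row in enumerate(chroma) for c, v in enumerate(row)]
    n_lip = int(round(fraction * len(flat)))
    flat.sort(key=lambda item: item[0], reverse=True)
    mask = [[0] * len(row) for row in chroma]
    for _, r, c in flat[:n_lip]:
        mask[r][c] = 1
    return mask


def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)
```

With `fraction=0.20` the sketch follows the 20% setting used in this study; passing 0.15 would reproduce the setting attributed to Pan et al.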
Contour Feature Extraction
After locating the eye and mouth regions, we determine whether the eyes are open or closed by extracting the eye contour, and we analyze how widely the mouth is open by extracting the mouth internal contour.
Eye Contour Extraction
The sclera is the white part of the eye. The sclera region is segmented by a K-means clustering method that exploits two cues: the difference in saturation between the sclera and the skin, and the fact that the difference between the red and blue components is large for the skin but relatively small for the sclera. First, we exclude the influence of the iris (Refs. 29 and 30) and eyelashes; in certain instances the eyebrows are also included. Given that the gray values of the iris and eyelash region are the smallest, and the gray values of the sclera are larger than those of the skin region, we obtain the best segmentation threshold via the OTSU algorithm and threshold the image to separate the iris and eyelash region, shown in blue in Fig. 5(a), from the rest of the eye. In the remaining sclera and skin region, we compute the difference between the red and blue components (R−B) and cluster the pixels into two parts with K-means. The eye region is thus divided into three parts, as shown in Fig. 5(b). The sclera region is then identified as the part that has a small saturation value and is adjacent to the iris. In addition, through saturation analysis in the HSV space, individual pixels with large saturation values are removed to locate the sclera region accurately, as shown in the green area of Fig. 5(c).
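The two-class clustering on R−B can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions: the centers are initialized at the minimum and maximum values, and the iteration cap is arbitrary; neither choice is specified in the text.

```python
def two_means(values, iterations=20):
    """Cluster scalar values (e.g., R-B per pixel) into two groups.

    Returns (labels, centers). Label 0 is the cluster with the smaller
    center, which corresponds to the sclera candidate because the sclera
    has a small R-B difference compared with the skin.
    """
    centers = [float(min(values)), float(max(values))]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Pixels labeled 0 would then be checked for low saturation and adjacency to the iris before being accepted as sclera.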
From the extracted sclera region, boundary points of the sclera are selected to fit the upper and lower eyelids with quadratic curves, as shown in Fig. 6(a); the two intersections of the curves are defined as the left and right eye corners. The minimum circumscribed rectangle of the eye contour is then calculated, as shown in Fig. 6(b), and the aspect ratio of this rectangle is used to determine whether the eye is open or closed. The details are described in the next section.
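A quadratic eyelid fit of this kind can be sketched with an ordinary least-squares solve. The code below is an illustrative assumption about how the fit might be done (normal equations solved by Gaussian elimination); the text does not state which fitting procedure was used.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c to (x, y) points.

    Requires at least three points with distinct x values.
    Returns (a, b, c).
    """
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sums of x^k
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sums of y*x^k
    # Augmented matrix of the normal equations.
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Forward elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for c in range(i, 4):
                m[r][c] -= f * m[i][c]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        acc = m[i][3] - sum(m[i][c] * coeffs[c] for c in range(i + 1, 3))
        coeffs[i] = acc / m[i][i]
    return tuple(coeffs)
```

One curve would be fitted to the upper boundary points and one to the lower boundary points; the eye corners are where the two curves intersect.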
Special circumstances–eye strabismus
There is another special case. When the eye looks to one side (strabismus), as shown in Fig. 7(a), the sclera is visible on only one side of the iris, and the aforementioned method can yield a poor fit to the eye contour. In this case, we fit the contour of the eye from the boundary points of both the sclera and the iris. We first estimate the center of the iris; when the sclera lies on only one side of the iris center, the eye is considered to be in a state of strabismus. Given that the iris has the lowest values in the eye region, we traverse the entire eye image with a square window; the center of the window for which the sum of the pixel values is smallest is regarded as the center of the iris. Reference 1 proposes that the iris radius is approximately one-tenth of the distance between the centers of the eyes, and the window size is set from this distance; the final location of the iris center is shown by the red dot in Fig. 7(b). A partial image of the iris, bounded at the iris center, is then considered, as shown in Fig. 7(c); its side length is a quarter of the reference length derived from the distance between the eyes. The binary image is obtained by threshold segmentation using the OTSU algorithm, as shown in Fig. 7(d); a morphological operation is used to remove the noise, and then a set of iris boundary points is extracted. With the previously proposed method, we extract the sclera, as shown in the green area of Fig. 7(b), and part of the sclera boundary points. From the extracted boundary points of the sclera and iris, the eye contour is fitted with a quadratic curve, as shown in Fig. 7(e).
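The iris-center search can be sketched with an integral image so that each window sum costs constant time. The window side length `side` is left as a parameter because its value relative to the eye distance is not recoverable from the text, and the function name is hypothetical.

```python
def darkest_window_center(gray, side):
    """Return (row, col) of the center of the side x side window with the
    smallest pixel sum. gray is a 2-D list; side must not exceed its size."""
    rows, cols = len(gray), len(gray[0])
    # integral[r][c] holds the sum of gray[0:r][0:c].
    integral = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += gray[r][c]
            integral[r + 1][c + 1] = integral[r][c + 1] + row_sum
    best = None
    for r in range(rows - side + 1):
        for c in range(cols - side + 1):
            total = (integral[r + side][c + side] - integral[r][c + side]
                     - integral[r + side][c] + integral[r][c])
            if best is None or total < best[0]:
                best = (total, r + side // 2, c + side // 2)
    return best[1], best[2]
```

Because the iris is the darkest compact region of the eye image, the minimizing window tends to sit on it as long as `side` is comparable to the iris diameter.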
Mouth Internal Contour Extraction
In the RGB space, the difference between the red and green components is larger for the lips than for the skin (Ref. 31). When the mouth is open, particularly when a person yawns, the R, G, and B values of the internal part of the mouth are roughly balanced, even if the teeth are exposed. Consequently, the difference between the R and G components is smallest for the mouth interior, intermediate for the skin, and largest for the lips. Because this difference is largest for the lips, we can effectively separate the internal part of the mouth from the skin. We set the adaptive threshold according to the following formula and apply threshold segmentation to obtain the binary image, as shown in Fig. 8(a):
To obtain the internal contour of the mouth more accurately, we determine the segmentation threshold via an iterative method. The algorithm for calculating the threshold consists of the following steps:
1. Calculate the (R − G) values of all pixels, and sort them from smallest to largest;
2. Select the first 10% of the sorted pixels (those with the smallest values) and set them to 255; set the gray values of the remaining pixels to 0. Remove image noise via a morphological operation, analyze the position of each connected component (white part), and count the selected pixels that do not belong to the inside of the mouth; this count is the skin value;
3. Determine whether the skin value exceeds a preset fraction of the total number of pixels in the image. If so, the threshold is the maximum value among the selected pixels. Otherwise, proceed to the next step;
4. From the remaining pixels, select a further batch with the smallest values (the batch size is a fixed fraction of the total number of pixels in the image), merge it into the previously selected pixels, and recalculate the skin value as in step 2. Then, return to step 3.
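The control flow of the steps above can be sketched as follows. Several parts are assumptions made only for illustration: `count_skin` is a hypothetical callable that stands in for the morphological cleaning and connected-component position analysis of step 2, and the default values of `step_fraction` and `skin_fraction` are placeholders because those percentages are not recoverable from the text.

```python
def iterative_threshold(values, count_skin, first_fraction=0.10,
                        step_fraction=0.05, skin_fraction=0.02):
    """Grow the set of smallest-valued pixels until too much skin is included.

    values        : per-pixel values used for segmentation (flat list)
    count_skin(t) : hypothetical callable returning the number of selected
                    pixels (value <= t) that lie outside the mouth interior
    Returns the maximum value among the pixels selected at the stopping point.
    """
    order = sorted(values)                                   # step 1
    total = len(order)
    n_selected = max(1, int(round(first_fraction * total)))  # step 2
    step = max(1, int(round(step_fraction * total)))
    while True:
        threshold = order[n_selected - 1]
        skin = count_skin(threshold)
        # Step 3: stop once the skin count exceeds the allowed fraction
        # (or when every pixel has been selected).
        if skin > skin_fraction * total or n_selected >= total:
            return threshold
        n_selected = min(total, n_selected + step)           # step 4
```

The design keeps the segmentation criterion and the skin test separate, so the same loop can be reused with a different position analysis.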
The mouth lies at the center of the located mouth image; therefore, if the center of mass of a connected component (white part) is near the center of the image, that connected component is regarded as the inner area of the mouth. The internal area of the mouth is obtained by position analysis of each connected component, as shown in Fig. 8(b). The external contour of this connected component is the internal contour of the mouth, as shown in Fig. 8(c). Finally, we calculate the minimum circumscribed rectangle of the mouth contour, as shown in Fig. 8(d), and use its aspect ratio to determine whether the person is yawning.
State Analysis and Test Results
To verify the effectiveness of our algorithm, we conduct experiments on three image databases: the color face database of the California Institute of Technology (CIT), part of the FERET face database, and a self-built face database. The pictures are taken under different conditions, such as different lighting, indoor and outdoor environments, and different head postures. Our experiments start from the face region, which is obtained by applying the Viola Jones face detector on the OpenCV 2.4 platform.
Eye state analysis
We determine the eye state according to the aspect ratio of the smallest external rectangle of the eye contour. From the length of the rectangle and its width (or height), the eyelid closure value is defined as follows:
We define the threshold value according to the following criteria to determine the eye state:
To determine the threshold, we use the P80 standard of the PERCLOS parameter: when the degree of eye closure is more than 80%, the eye is in a closed state. To evaluate the eyelid aspect ratio when the eyes are open normally, we performed related experiments. First, we collected eye images from different people whose eyes are open normally, and then calculated the eyelid aspect ratio for each image. Applying the P80 standard to the resulting aspect ratio of normally open eyes, the eyelid aspect ratio is less than 0.1125 when the eyes are closed, i.e., the threshold should be 0.1125. However, the experimental results show that an empirically adjusted threshold gives a more accurate judgment of the eye state, with an accuracy rate of 98.67%. In addition, when the eye is completely closed, the OTSU algorithm divides the eye region into only two parts, and the sclera region cannot be detected. Therefore, when the sclera region is not detected or the eyelid closure value is below the threshold, the eye is considered to be in a closed state.
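The decision rule can be sketched as below. Two points are assumptions: the eyelid closure value is taken to be the height of the rectangle divided by its length, and the default threshold is the P80-derived value 0.1125 quoted above, since the empirically adjusted value is not recoverable from the text.

```python
def eye_is_closed(sclera_found, length, height, threshold=0.1125):
    """Return True if the eye is judged closed.

    sclera_found : whether a sclera region was detected
    length/height: sides of the smallest external rectangle of the eye contour
    threshold    : closure threshold (assumed default from the P80 standard)
    """
    if not sclera_found or length <= 0:
        # No sclera means the eye is completely closed.
        return True
    return height / length < threshold
```

For example, a contour rectangle 40 pixels long and 20 pixels high is classified as open, whereas one 40 pixels long and 3 pixels high is classified as closed.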
Mouth state analysis
According to the internal contour of the mouth, the mouth opening degree is defined by the aspect ratio of the external rectangle
In general, when people yawn, the mouth is open widely for a few seconds. There is no universal international standard for the degree of mouth opening; therefore, we perform experiments to determine the degree of mouth opening when people yawn. First, we collect more than 100 pictures of people yawning from different groups, in which the degree of mouth opening varies. We calculate the mouth opening degree of these pictures with the proposed algorithm. The experimental results show that when the mouth opening degree exceeds the experimentally determined threshold, it reaches the degree of mouth opening observed when yawning. In this case, the detection of the mouth state is more accurate, with an accuracy rate of 97.5%. Yawning is a process that lasts for a few seconds, with the mouth widely open over multiple consecutive frames; therefore, a small error in the value for a single frame has little effect on the result. In addition, the mouth is sometimes open only slightly even though the value is greater than 0.75. To eliminate this situation and improve accuracy, the state detection method also requires that the internal area of the mouth be greater than one-eighth of the image size. When the mouth opening degree exceeds the threshold, the internal area of the mouth is large enough, and the mouth remains widely open for a few seconds, it can be determined that the driver is yawning.
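The yawning rule combines a per-frame test with a duration test, which can be sketched as follows. The opening degree is assumed to be the height of the external rectangle divided by its width, the 0.75 threshold and the one-eighth area condition are taken from the text above, and `min_frames` is a hypothetical parameter for "a few seconds" that depends on the frame rate.

```python
def mouth_wide_open(width, height, inner_area, image_area,
                    ratio_threshold=0.75, area_fraction=1 / 8):
    """Per-frame test: large opening degree and large enough inner area."""
    if width <= 0:
        return False
    opening_degree = height / width  # assumed definition of the aspect ratio
    return (opening_degree > ratio_threshold
            and inner_area > area_fraction * image_area)


def is_yawning(frame_flags, min_frames):
    """Return True if the mouth is wide open in min_frames consecutive frames."""
    run = 0
    for flag in frame_flags:
        run = run + 1 if flag else 0
        if run >= min_frames:
            return True
    return False
```

Requiring a run of consecutive frames makes the decision tolerant of a single misjudged frame, in line with the observation that a small error in the value has little effect on the result.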
Test and Experimental Results
The performance of the proposed algorithm is evaluated on three databases. For eye state detection, we inspect the eyes in each image to decide their true state (open or closed) and compare it with the state obtained from the algorithm. The experimental results demonstrate that the algorithm can fit the contour of eyes with different degrees of opening, as shown by the results in Table 1 and the six fitting examples in Fig. 9, which cover different lighting conditions, different groups of people, different degrees of eye opening, and subjects wearing glasses.
Table 1. Performance of the proposed eye state detection algorithm.
|Database|Number of images|Number of open|Number of closed|Open state determination accuracy|Closed state determination accuracy|
The CIT and FERET databases contain many pictures in which the teeth are exposed or the subject has a beard. In these situations, the gray-projection algorithm for detecting the mouth state may lose accuracy (Ref. 14); however, the algorithm discussed in this paper obtains a better internal contour of the mouth, as the experimental results in Table 2 show. Six fitting examples of the mouth contour are shown in Fig. 10, covering different lighting conditions, beards, different degrees of opening, and exposed teeth.
Table 2. Performance of the proposed mouth state detection algorithm.
|Database|Number of images|Number of open|Number of closed|Open state determination accuracy|Closed state determination accuracy|
The pictures in the databases were obtained under different lighting conditions. Reference 14 uses the projection method to determine the eye and mouth state, that is, the image is projected horizontally and vertically to calculate the sum of the gray values of the pixels in each direction. This method is greatly affected by the light intensity, and its detection performance decreases significantly under uneven illumination. The algorithm proposed in this paper performs illumination compensation on the image before face detection, which reduces the impact of highlights and shadows on the experiment. Thus, the proposed algorithm is robust against moderate illumination changes, although its performance is still reduced for images obtained under dramatic lighting changes. In addition, there are several mouth pictures in which the teeth are exposed or the subject has a beard. Owing to the large differences between the gray values of teeth or beards and the skin color, the projection method does not work well in these cases. Our algorithm takes into account the color difference between teeth, beard, and skin and determines the degree of mouth opening from the internal contour of the mouth. The experimental results show that the detection performance is significantly improved; the comparison is shown in Tables 3 and 4.
|Number of images|Eye state recognition accuracy: proposed method|Eye state recognition accuracy: projection method in Ref. 14|
|Number of images|Mouth state recognition accuracy: proposed method|Mouth state recognition accuracy: projection method in Ref. 14|
In this paper, we proposed a method for detecting the eye and mouth state by extracting contour features. In each step, we presented new algorithms and modifications to achieve better results. The eye contour is fitted from the extracted sclera border points, and the eyelid closure value is defined from the smallest circumscribed rectangle of the eye contour to determine whether the eye is open or closed. When the mouth is open, the internal contour of the mouth is extracted, and its external rectangle is used to define the mouth opening degree for analyzing the mouth state. The experimental results demonstrate that the proposed algorithm has a high accuracy rate in different environments; because it requires no training data, it is also computationally efficient. The proposed algorithm can be used in future energy vehicles (Ref. 32), intelligent vehicles, and advanced driver assistance systems, and its eye and mouth states can be analyzed further to determine whether a driver is fatigued. When driver fatigue is detected, the system can issue a warning to remind the driver to pay attention, in order to reduce traffic accidents. Our future work is to build a fatigue model that provides a warning signal when the driver shows symptoms of fatigue.
This work was supported by the State Key Program of National Natural Science Foundation of China (61631009) and the Jilin province science and technology development project of China (20150204006GX).
Yingyu Ji received his BS degree from the School of Information Technology, Hebei University of Economics and Business, Hebei, China, in 2016. He is currently pursuing his MS degree with the College of Communication Engineering of Jilin University. His major research interests include pattern recognition and image processing.
Shigang Wang received his BS degree from Northeastern University in 1983, his MS degree in communication and electronics from Jilin University of Technology in 1998, and his PhD in communication and information system from Jilin University in 2001. Currently, he is a professor of communication engineering. His research interests include multidimensional signal processing and stereoscopic, multiview video coding, and so on.
Yang Lu received her BS degree from the College of Communication Engineering, Jilin University, Jilin, China, in 2014. Currently, she is pursuing her PhD in the College of Communication Engineering of Jilin University. Her major research interests include pattern recognition and image processing.
Jian Wei received his BS degree from Jilin University in 2008, his MS degree in communication and information systems from Jilin University in 2011, and his PhD in informatics from Tuebingen University in 2016. Currently, he is working in the Department of Communication Engineering, Jilin University. His research interests include multiview stereo and 3D display technology.
Yan Zhao received her BS degree in communication engineering from Changchun Institute of Posts and Telecommunications in 1993, her MS degree in communication and electronic from Jilin University of Technology in 1999, and her PhD in communication and information system from Jilin University in 2003. Currently, she is a professor of communication engineering. Her research interests include image and video processing, image and video coding, and so on.