In recent years, the broadcast environment has changed significantly with the proliferation of digital television (TV), internet protocol (IP) TV, and smart TV, all of which provide a variety of multimedia services in addition to conventional TV channels. A TV is no longer a passive device for receiving broadcast signals. It has evolved into a smart device that offers multimedia services, such as video on demand, social network services, and teleconferencing, in addition to multichannel television broadcasting. The popularity of smart TV has increased because of these functionalities; however, the technology needs a more convenient interface. To this end, gesture recognition using the Microsoft Kinect device has been researched intensively.1–3 Other research projects have proposed eye gaze-detection methods based on near-infrared (NIR) light cameras and illuminators.4–7 However, devices that require additional NIR cameras and illuminators are difficult to use with smart TVs because of their large size and high cost, and using a conventional (visible light) web camera for gaze detection avoids these drawbacks.

Previously, Nguyen et al.8 used a conventional (visible light) web camera for gaze detection with a smart TV. They measured the horizontal head pose using the face boundary and facial feature positions, and estimated the vertical head pose using facial features and shoulder positions. Specifically, the horizontal head pose was measured from the distance between the left face boundary and the left eye and the distance between the right eye and the right face boundary. The vertical head pose was measured from the distance between the center of the face and the shoulder, where the center of the face region was obtained from the average position of the eyes and the center of both nostrils.
However, shoulder positions are difficult to detect accurately because shoulder shapes vary among individuals, and accurate detection of the shoulder line requires considerable processing time. Corcoran et al.9 and Zhang et al.10 proposed gaze-tracking methods that use visible light cameras. However, these methods limit the freedom of the user’s head movement and require a short Z distance between the user’s face and the gaze-tracking system, which hinders their application to TVs. Mardanbegi and Hansen11 proposed a gaze-detection method for 55-in. TVs; however, they used a wearable, glasses-type gaze-tracking device, which may be inconvenient for users.
To solve these problems, we propose a new gaze-detection method. Using the user-dependent facial information obtained in an initial calibration stage, the head pose can be calculated accurately. The horizontal and vertical head poses are calculated with theoretical, generalized models of the changes in facial feature positions. Accurate gaze positions on the TV screen are then obtained from the user-dependent calibration information and the calculated head poses, using a single low-cost conventional web camera without any additional device for measuring the Z distance.
The remainder of this paper is organized as follows. An overview of our proposed gaze-tracking system and the methods used is presented in Sec. 2. Experimental results are presented in Sec. 3. Finally, we discuss our findings and conclude the paper in Sec. 4.
Overview of the Proposed Method
Figure 1 shows the environment in which our proposed gaze-tracking system for smart TVs is used. A conventional web camera equipped with a zoom lens () is used. The image resolution captured by the camera is . The Z distance between the user and the TV screen is , and the size of the TV screen is 60 in.
In the user-calibration procedure, RGB color images are first captured by the camera equipped with a zoom lens (step 1 in Fig. 2). In step 2, the user turns his or her head to look at five points on the TV screen (center, left-top, right-top, right-bottom, and left-bottom), and facial feature data are acquired from these images in step 3. In the gaze-detection procedure, RGB color images are captured by the camera (step 4). In step 5, the face regions in the input images are detected using adaptive boosting (AdaBoost) face detection,12 and in step 6 the detected face regions are tracked continuously using the continuously adaptive mean shift (CamShift) method.13 In steps 7 and 8, the eye and nostril regions are detected with adaptive template matching (ATM) and sub-block-based template matching, respectively. Based on the detected eye and nostril regions, the head pose is estimated and the gaze position on the TV screen is determined in step 9.
To estimate a user’s head pose, an initial calibration must be performed. In the initial stage, the user is positioned at the center in front of the smart TV at a Z distance (between the user and the camera) of 2 m. The user then looks at five points on the TV (center, left-top, right-top, right-bottom, and left-bottom) by rotating his or her head, as shown in Fig. 3.
The face and eye regions in the input image are detected using the AdaBoost method,8,12 and the nostril area is located using sub-block-based template matching. Figure 4 shows five examples of face images acquired in the calibration stage. To remove the background from the detected face region, the face region is redefined based on the detected eye positions, as shown in Fig. 4. The distance between the eyes ( of Fig. 4) is obtained from the frontal face. In addition, the distances between the eyes and the nose ( and of Fig. 4) are obtained from the two images captured when the user looks at the left-bottom and right-bottom positions. These distances are used as reference values for calculating the head pose in the gaze-detection stage in Fig. 2(b).
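The reference values collected during calibration can be summarized as a small data structure. The sketch below is only illustrative: the class and function names are hypothetical, the eyes-to-nostril distance is assumed to be the vertical offset from the eye midline to the nostril center, and the left-bottom and right-bottom measurements are assumed to be averaged. The text does not specify these details.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

@dataclass
class Landmarks:
    """Facial features detected in one calibration image (pixel units)."""
    left_eye: Point
    right_eye: Point
    nostril: Point       # center of both nostrils
    face_width: float    # width of the (redefined) face box

@dataclass
class Calibration:
    """Hypothetical container for the user-dependent reference values."""
    eye_dist: float            # eye-to-eye distance in the frontal (center) image
    eye_face_width: float      # face-box width in that image
    eye_nostril_dist: float    # eyes-to-nostril distance from the lower-gaze images
    nostril_face_width: float  # face-box width in those images
    z_ref: float = 2.0         # calibration Z distance in metres (stated in the text)

def eye_distance(lm: Landmarks) -> float:
    return math.dist(lm.left_eye, lm.right_eye)

def eye_nostril_distance(lm: Landmarks) -> float:
    # Assumption: vertical offset between the eye midline and the nostril center.
    eye_mid_y = (lm.left_eye[1] + lm.right_eye[1]) / 2
    return abs(lm.nostril[1] - eye_mid_y)

def calibrate(center: Landmarks, left_bottom: Landmarks,
              right_bottom: Landmarks) -> Calibration:
    # Assumption: the two lower-gaze measurements are simply averaged.
    en = (eye_nostril_distance(left_bottom) + eye_nostril_distance(right_bottom)) / 2
    fw = (left_bottom.face_width + right_bottom.face_width) / 2
    return Calibration(eye_distance(center), center.face_width, en, fw)
```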
Face and Facial Feature Detection in the Gaze-Detection Stage
In the gaze-detection stage of Fig. 2(b), the continuity between successive frames can be exploited: the face region detected with AdaBoost is tracked using the CamShift method.13 This is because, although the AdaBoost method detects faces with high accuracy, it requires a long processing time. Face tracking with CamShift is fast and is less affected by variations in head pose. The CamShift algorithm has been widely used for object detection and tracking.13 It is based on the MeanShift method,14 and both MeanShift and CamShift usually use color histograms of the target to be tracked. CamShift tracks the object while adapting to the probability distribution of the target as it changes over time.15 In our method, we apply CamShift to color images using the hue histogram of the HSV color space. Because CamShift tracks the target based on histogram correlation rather than pixel-based matching, it is less sensitive to head pose and fast to compute.
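A minimal, dependency-free sketch of the CamShift idea follows. It assumes a probability map with values in [0, 1], such as a hue-histogram back-projection (OpenCV's `calcBackProject` produces one), and iterates mean-shift steps while resizing a square window to 2·sqrt(M00), loosely following the window rule of Bradski's original CamShift description rescaled for [0, 1] probabilities. It is not the authors' implementation; OpenCV's `cv2.CamShift`, for instance, also returns a rotated rectangle describing the target orientation.

```python
import math

def camshift(prob, window, max_iter=10, eps=1.0):
    """Track a target on a 2-D probability map (values in [0, 1]).

    prob   -- list of rows, e.g. a hue-histogram back-projection of a frame
    window -- (x, y, w, h) initial search window, e.g. the AdaBoost face box
    Returns the converged (x, y, w, h) window.
    """
    rows, cols = len(prob), len(prob[0])
    x, y, w, h = window
    for _ in range(max_iter):
        # Zeroth and first moments of the probability inside the window.
        m00 = m10 = m01 = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p
                m10 += c * p
                m01 += r * p
        if m00 == 0:
            break  # target lost inside the window
        cx, cy = m10 / m00, m01 / m00
        # CamShift step: adapt the window size to the zeroth moment.
        side = max(1, int(round(2 * math.sqrt(m00))))
        nx, ny = int(round(cx - side / 2)), int(round(cy - side / 2))
        moved = math.hypot(nx - x, ny - y)
        x, y, w, h = nx, ny, side, side
        if moved < eps:
            break  # converged: the window center no longer moves
    return x, y, w, h
```

Starting from a window much smaller than the target, the window grows over a few iterations until it encloses the whole high-probability region, which is what distinguishes CamShift from fixed-size mean shift.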
The eyes are detected with AdaBoost and tracked with ATM within a predetermined area of the detected face box. The eyes are detected by the AdaBoost method only in the first frame, and they are tracked by ATM in successive frames. That is, in the second frame, the left and right eyes are located with the two templates determined by the AdaBoost method in the first frame. If the eyes are successfully detected in the second frame, the templates are updated with the newly detected eye regions, and these are used for template matching in the third frame. Because of this template-update scheme, the method is called adaptive template matching. If the eyes are not detected in the second frame, they are located by the AdaBoost method within the predetermined area based on the face region detected in the second frame. This procedure is repeated in successive images. For template matching, the correlation value between the template and the corresponding candidate region is calculated while shifting the region by 1 pixel at a time in the horizontal and vertical directions. The position that maximizes the correlation value is selected as the final matching area.
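The adaptive template matching loop described above can be sketched as follows. This sketch assumes grayscale images stored as lists of rows and uses normalized cross-correlation as the correlation measure (the text says only "correlation value"). The acceptance threshold of 0.7 and the `detect` callback, which stands in for the AdaBoost eye detector, are hypothetical.

```python
import math

def ncc(image, tpl, x, y):
    """Normalized cross-correlation between tpl and the image patch at (x, y)."""
    th, tw = len(tpl), len(tpl[0])
    a = [v for row in image[y:y + th] for v in row[x:x + tw]]
    b = [v for row in tpl for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0

def match_template(image, tpl, search):
    """Exhaustive search with a 1-pixel step; returns (best_score, x, y).

    search -- (x, y, w, h) area that must lie entirely inside the image
    """
    sx, sy, sw, sh = search
    th, tw = len(tpl), len(tpl[0])
    best = (-2.0, sx, sy)  # below any possible NCC value
    for y in range(sy, sy + sh - th + 1):
        for x in range(sx, sx + sw - tw + 1):
            s = ncc(image, tpl, x, y)
            if s > best[0]:
                best = (s, x, y)
    return best

def atm_step(image, tpl, search, detect, threshold=0.7):
    """One frame of adaptive template matching.

    detect -- fallback detector (standing in for AdaBoost): (image, search) -> (x, y) or None
    Returns (template for the next frame, matched position or None).
    """
    score, x, y = match_template(image, tpl, search)
    if score < threshold:
        pos = detect(image, search)  # matching failed: re-detect
        if pos is None:
            return tpl, None         # keep the old template
        x, y = pos
    th, tw = len(tpl), len(tpl[0])
    # Template update: the newly located region becomes the next template.
    new_tpl = [row[x:x + tw] for row in image[y:y + th]]
    return new_tpl, (x, y)
```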
Further, the nostril region is detected using a nostril-detection mask, and it is tracked using ATM. That is, in the first frame, the nostril region is detected using the nostril-detection mask, and it is located using ATM from the second frame on. If it is not located in the second frame, the nostril area is detected by the nostril detection mask within the predetermined area based on the detected face region in the second frame. This procedure is iterated in successive images.
The nostril-detection mask is designed based on the shape of the nostrils. Nostrils generally share a similar characteristic: the intensity difference between the nostrils and the neighboring skin region is large. Using this property, nostril-detection masks were defined as shown in Fig. 5, and sub-block-based template matching was performed.8 Figures 5(a) and 5(c) show a frontal face and its nostril-detection mask, respectively. Figures 5(b) and 5(d) show a rotated face and its nostril-detection mask, respectively. The numbers of sub-blocks in Figs. 5(c) and 5(d) are and , respectively. When the face is rotated as shown in Fig. 5(b), the two nostrils often overlap, so we use the mask with only one black area, shown in Fig. 5(d). Because it is difficult to know in advance whether the face is frontal or rotated, both masks in Figs. 5(c) and 5(d) are used: if the matching value obtained with the mask in Fig. 5(c) is less than a predetermined threshold, further matching is performed using the mask in Fig. 5(d).
The size of the nostril region in the image varies with the Z distance between the camera and the user. Therefore, we change the size of the mask based on the Z distance estimated from the width of the detected face box: a small mask is used when the estimated Z distance is large, and vice versa.
For sub-block-based template matching, we compute the average intensity of each sub-block in Figs. 5(c) and 5(d). Within the search area of the nostril region, matching is repeated while moving the masks of Figs. 5(c) and 5(d) in steps of 2 pixels in the horizontal and vertical directions. If the average intensity of the black sub-blocks is lower than those of the other sub-blocks, the location is taken as a nostril candidate. Among the candidates, the region with the maximum intensity difference between the black sub-block and its neighboring sub-blocks is selected as the nostril region. To reduce the computational complexity, the integral image technique is used to calculate the average intensities.12 The detected nostril region is tracked using ATM in successive frames.
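The sub-block search with an integral image can be sketched as below. The 3×3 mask layouts are hypothetical (the actual sub-block counts of Fig. 5 are not reproduced here), the candidate score is assumed to be the mean skin-block intensity minus the mean dark-block intensity, and `block_size_for` is a hypothetical helper that scales the sub-block size with the face-box width, as a proxy for the Z distance.

```python
def integral_image(img):
    """Summed-area table padded with a zero row/column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        run = 0
        for x in range(w):
            run += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + run
    return ii

def block_mean(ii, x, y, size):
    """Average intensity of the size-by-size block with top-left corner (x, y)."""
    total = ii[y + size][x + size] - ii[y][x + size] - ii[y + size][x] + ii[y][x]
    return total / (size * size)

# Hypothetical masks: 'B' marks a dark (nostril) sub-block, '.' a skin sub-block.
FRONTAL_MASK = ["...",
                "B.B",
                "..."]
ROTATED_MASK = ["...",
                ".B.",
                "..."]

def block_size_for(face_width, ref_face_width, ref_block):
    """Scale the sub-block size with the face-box width (larger Z -> smaller mask)."""
    return max(1, round(ref_block * face_width / ref_face_width))

def match_mask(img, mask, block, search, step=2):
    """Return (score, x, y) of the best nostril candidate, or None."""
    ii = integral_image(img)
    rows, cols = len(mask), len(mask[0])
    sx, sy, sw, sh = search
    best = None
    for y in range(sy, sy + sh - rows * block + 1, step):
        for x in range(sx, sx + sw - cols * block + 1, step):
            dark, skin = [], []
            for r in range(rows):
                for c in range(cols):
                    m = block_mean(ii, x + c * block, y + r * block, block)
                    (dark if mask[r][c] == "B" else skin).append(m)
            if max(dark) >= min(skin):
                continue  # dark blocks must be darker than every skin block
            score = sum(skin) / len(skin) - sum(dark) / len(dark)
            if best is None or score > best[0]:
                best = (score, x, y)
    return best

def detect_nostrils(img, block, search, threshold):
    """Try the frontal mask first; fall back to the rotated mask if it scores low."""
    best = match_mask(img, FRONTAL_MASK, block, search)
    if best is None or best[0] < threshold:
        best = match_mask(img, ROTATED_MASK, block, search)
    return best
```

Because every sub-block mean costs four table lookups regardless of block size, enlarging the mask for a nearby user does not increase the per-position cost.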
Head Pose Estimation
We estimate the head pose based on the detected eye and nostril positions. In order to estimate rotation in the direction of the axis (horizontal rotation), we utilize the change in distance between the two eyes. Specifically, we calculate the rotation angle in the direction of the axis based on the distance between the eyes in the rotated-face image and in the front-facing image in the calibration stage (gazing at the center position of the TV screen in Fig. 4).
Figure 6(a) shows a frontally directed face in the calibration stage, and Fig. 6(b) shows a face rotated in the direction of the axis in the gaze-detection stage. The angle in Fig. 6(b) is the face rotation angle in the direction of the axis; it can be calculated as follows.
Fig. 6(b) is calculated using Eq. (2) as follows:
In Eq. (6), and are the distances between the two detected eyes in the image captured during the calibration stage and in the current input image, respectively. and are the widths of the face boxes detected in the image captured in the calibration stage and in the current input image, respectively.
In Fig. 6(b), it is difficult to determine whether the face is rotated in the clockwise or counterclockwise direction with Eq. (6) alone. Therefore, we measured the X-distance (horizontal distance) between the left eye and the nostril, and between the right eye and the nostril, as and , respectively. If is larger than as shown in Fig. 6(b), the face has been rotated in the counterclockwise direction. If is smaller than , the face is determined as having been rotated in the clockwise direction.
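Because Eq. (6) is not reproduced here, the following is only a sketch of how such a horizontal-angle estimate could work under an assumed cosine-foreshortening model: the eye distance, normalized by the face-box width to cancel Z changes, shrinks by cos θ as the head turns. The sign rule is the one stated in the text (left eye-to-nostril X-distance larger than the right one means counterclockwise). The function name and the model itself are assumptions, not the authors' equation.

```python
import math

def horizontal_angle(d_ref, w_ref, d_cur, w_cur, dl, dr):
    """Estimate horizontal head rotation in degrees (assumed cosine model).

    d_ref, w_ref -- eye distance and face-box width in the frontal calibration image
    d_cur, w_cur -- the same quantities in the current frame
    dl, dr       -- X-distances left eye->nostril and right eye->nostril (current frame)
    Returns a positive angle for counterclockwise rotation (dl > dr), negative otherwise.
    """
    # Dividing by the face-box width cancels the scale change caused by a new
    # Z distance (assumes the box width itself is insensitive to rotation).
    ratio = (d_cur / w_cur) / (d_ref / w_ref)
    ratio = max(0.0, min(1.0, ratio))  # clamp: noise can push the ratio above 1
    theta = math.degrees(math.acos(ratio))
    return theta if dl > dr else -theta
```

For example, if the normalized eye distance halves, the model yields acos(0.5), i.e., a 60° rotation, whether or not the user has also moved closer to or farther from the camera.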
If the face is rotated in the direction of the axis (vertical direction), we can calculate the rotation angle in the direction of the axis from the change in the distance between the eyes and the nostril, as illustrated in Fig. 7. Figure 7(a) shows a face in the calibration stage (when the user is looking at the left-lower or right-lower position), and Fig. 7(b) shows a face rotated in the direction of the axis in the gaze-detection stage. As shown in Fig. 1, the camera is positioned below the TV screen while the user gazes at a position on the screen. Therefore, the image plane of the camera lies below the gaze vector extending from the user, as shown in Fig. 7(b). Angle in Fig. 7(b), the face rotation angle, can be calculated as follows.
Distance in Fig. 7(a), the distance between the eyes and the nostril, is measured from the face images captured when the user looks at the left-lower and right-lower positions of Fig. 4 during the calibration stage ( and of the left-lower and right-lower images of Fig. 4). We use these images, instead of an image of the user gazing at the center position [as in the measurement of of Fig. 6(a)], for the following reason. Because the camera is positioned below the TV, distance in Fig. 7(a) spans the largest number of pixels (i.e., has the highest resolution) when the user gazes at a (left or right) lower position. Because the vertical head pose is measured from distance , obtaining the largest possible distance gives the best accuracy in estimating the vertical head pose.
Distance satisfies the following relationship:
Fig. 7(b) is calculated with Eq. (12) as follows:
In Eq. (16), and are the distances between the detected eyes and nostrils in the image captured in the calibration stage and in the current input image, respectively. and are the widths of the face boxes detected in the image captured in the calibration stage and in the current input image, respectively. The width () of the user’s detected face box is obtained from the image of the initial calibration stage. By comparing the width () of the face box in the current image with , the change in the Z distance can be estimated based on the camera-perspective model. Because the initial calibration is performed at a Z distance () of 2 m, the Z distance [ of Eq. (18)] in the current frame is obtained as from Eqs. (17) and (18).
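The Z-distance estimate follows directly from the pinhole camera model: the apparent face width is inversely proportional to Z, so with calibration at 2 m the current distance is 2 m scaled by the width ratio. This is offered as a plausible reading of Eqs. (17) and (18), which are not reproduced here. `vertical_angle` reuses the cosine-foreshortening model assumed for the horizontal case, measuring upward rotation relative to the lower-gaze calibration pose. It ignores the camera being below the screen [Fig. 7(b)], so it is a simplification and not Eq. (16). Both function names are hypothetical.

```python
import math

def z_distance(w_ref, w_cur, z_ref=2.0):
    """Pinhole model: the apparent face width is inversely proportional to Z."""
    return z_ref * w_ref / w_cur

def vertical_angle(e_ref, w_ref, e_cur, w_cur):
    """Upward head rotation (degrees) relative to the lower-gaze calibration pose.

    e_ref, w_ref -- eyes-to-nostril distance and face-box width at calibration
    e_cur, w_cur -- the same quantities in the current frame
    """
    # Width normalization removes the effect of a Z change, as in z_distance.
    ratio = (e_cur / w_cur) / (e_ref / w_ref)
    ratio = max(0.0, min(1.0, ratio))  # assumed cosine model, clamped against noise
    return math.degrees(math.acos(ratio))
```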
To estimate the gaze position on the TV screen, the five rotation angles acquired when the user looks at the five positions in the calibration stage are used. From these angles, user-dependent thresholds that define the candidate areas for the gaze position are calculated. Based on these thresholds and the head poses of and in Figs. 6(b) and 7(b), respectively, we obtain the final gaze position on the TV screen.
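The text does not state how the user-dependent thresholds are derived, so the sketch below makes one simple, hypothetical choice: each decision boundary lies halfway between the center calibration pose and the mean of the corresponding corner poses. It assumes a hypothetical sign convention (yaw grows toward the screen's right, pitch toward the bottom) and the numbering implied by the experiments, in which positions 1 to 3 form the left column from top to bottom.

```python
def calibration_thresholds(calib):
    """Derive decision boundaries from the five calibration poses.

    calib -- {"center", "left_top", "right_top", "right_bottom", "left_bottom"}
             mapped to (yaw, pitch) in degrees.
    Returns (yaw_left, yaw_right, pitch_top, pitch_bottom), each halfway between
    the center pose and the mean of the corresponding corner poses.
    """
    cy, cp = calib["center"]
    left = (calib["left_top"][0] + calib["left_bottom"][0]) / 2
    right = (calib["right_top"][0] + calib["right_bottom"][0]) / 2
    top = (calib["left_top"][1] + calib["right_top"][1]) / 2
    bottom = (calib["left_bottom"][1] + calib["right_bottom"][1]) / 2
    return ((left + cy) / 2, (cy + right) / 2, (top + cp) / 2, (cp + bottom) / 2)

def gaze_region(yaw, pitch, thresholds):
    """Map a head pose to one of nine regions (1-3 = left column, top to bottom)."""
    yaw_left, yaw_right, pitch_top, pitch_bottom = thresholds
    col = 0 if yaw < yaw_left else 2 if yaw > yaw_right else 1
    row = 0 if pitch < pitch_top else 2 if pitch > pitch_bottom else 1
    return 3 * col + row + 1
```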
We tested our proposed gaze-tracking method on a desktop computer with an Intel Core™ i7 3.5-GHz CPU and 8 GB of RAM. The algorithm was implemented in C++ using the Microsoft Foundation Class (MFC) library and the DirectX 9.0 software development kit.
To measure the gaze-detection performance of our proposed method, we used a database of head poses from five people.8 In our experiment, each user was asked to look at nine different positions on a 60-in. TV screen (Fig. 8) by rotating his or her head up and down and left and right (left-upper, left-middle, left-lower, center-upper, center-middle, center-lower, right-upper, right-middle, and right-lower positions).8 The Z distance between the camera and each user’s face was . Our database contained 1350 head pose images ().8 In addition, five images per person were obtained for user calibration while the user gazed at positions 1, 3, 5, 7, and 9 in Fig. 8.
We used the strictly correct estimation rate (SCER)8,16 to measure the accuracy of our proposed gaze-detection method. The SCER is the ratio of the number of strictly correctly determined frames to the total number of image frames. For example, if a user gazes at region 2 in an image frame and our system determines that he or she is gazing at region 2, that frame is counted as correct. If 1100 of 1350 images are correct frames, the SCER is 1100/1350 ≈ 81.5%.8 Thus, a higher SCER indicates better estimation performance.
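The SCER defined above is a plain exact-match rate over frames. A minimal sketch (the function name is ours) that reproduces the worked example:

```python
def scer(reference, predicted):
    """Strictly correct estimation rate (%): the share of frames whose
    predicted gaze region exactly equals the reference region."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return 100.0 * correct / len(reference)

# Worked example from the text: 1100 correct frames out of 1350.
ref = [2] * 1350
pred = [2] * 1100 + [3] * 250
print(round(scer(ref, pred), 1))  # 81.5
```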
Table 1 shows the SCER results on the collected database for both the proposed method and our previous method.8 As shown in Table 1, the average SCER over all nine target positions is 90.5%, which means that our method correctly detected the gaze position in 90.5% of the images. The average SCER of the proposed method is also higher than that of our previous method.8 The previous method measured the vertical head pose using facial features and shoulder positions and therefore suffered from shoulder-detection errors, which the proposed method avoids.
Strictly correct estimation rate (SCER) result for each gaze region (target region) for our proposed method and previous one (Ref. 8).
|Target region|SCER (%), previous method8|SCER (%), proposed method|
Table 2 shows the confusion matrix of the gaze-detection results of the proposed method. Rows correspond to the reference gaze positions in Fig. 8 and columns to the calculated gaze positions. For example, the value 7 in cell (2, 3) means that in seven frames the system determined that the user was looking at position 3 when the user was actually gazing at position 2 of Fig. 8. The diagonal of Table 2 gives the number of correct detections by the proposed method, and the matrix confirms that the correct detection rates are similar for all gaze positions.
The confusion matrix of gaze detection results by the proposed method.
|Reference gaze position (Fig. 8)|Calculated gaze position|
| |1|2|3|4|5|6|7|8|9|
|1|149|1|0|0|0|0|0|0|0|
Table 3 shows the accuracies of the proposed gaze-detection method for horizontal (column) and vertical (row) rotations. For example, the value 95.6 in the first column is the rate at which the system calculated the gaze position to be 1, 2, or 3 when the user actually gazed at position 1, 2, or 3 (Table 2). Table 3 confirms that the horizontal and vertical accuracies are similar.
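The column accuracies of Table 3 can be derived from a confusion matrix such as Table 2 by counting a frame as correct whenever the predicted position falls in the same screen column as the reference. The row grouping (positions 1/4/7, 2/5/8, 3/6/9) is assumed by analogy. In the usage check, only the first row [149, 1, 0, ...] comes from Table 2; the other rows are a hypothetical perfect diagonal.

```python
def grouped_accuracy(cm, group):
    """Rate (%) at which the prediction falls in the same group as the reference.

    cm[i][j] -- number of frames with reference position i+1 predicted as j+1
    group    -- function mapping a position 1..9 to a group label
    Returns {group label: percentage}.
    """
    hits, totals = {}, {}
    for i, row in enumerate(cm):
        g = group(i + 1)
        totals[g] = totals.get(g, 0) + sum(row)
        hits[g] = hits.get(g, 0) + sum(n for j, n in enumerate(row) if group(j + 1) == g)
    return {g: 100.0 * hits[g] / totals[g] for g in totals}

def column_of(p):  # positions 1-3, 4-6, and 7-9 share a screen column
    return (p - 1) // 3

def row_of(p):     # positions 1/4/7, 2/5/8, and 3/6/9 share a screen row (assumed)
    return (p - 1) % 3
```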
Horizontal and vertical accuracies of the proposed method (%).
In the next experiment, we measured the influence of the accuracy of facial feature detection and tracking. Using the 1350 head pose images () of Table 1, we generated noisy data by adding Gaussian random noise with sigma values of 0.5, 1.0, and 1.5 to the detected positions of both eyes and the nostrils. As shown in Table 4, the accuracy of gaze detection is only slightly affected by the Gaussian random noise. Here, a sigma value of 0 indicates that no noise was added to the detected positions of the facial features.
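The noise-injection protocol can be sketched as follows. The sigma values are assumed to be in pixels (the text does not state the unit). Passing a per-feature sigma dictionary also covers the single-feature experiments of Tables 5 to 7. The function name and landmark keys are hypothetical.

```python
import random

def add_landmark_noise(landmarks, sigmas, rng):
    """Perturb detected landmark positions with zero-mean Gaussian noise.

    landmarks -- {name: (x, y)}, e.g. "left_eye", "right_eye", "nostril"
    sigmas    -- {name: sigma}; features not listed get sigma 0 (unchanged)
    rng       -- a random.Random instance, so experiments are reproducible
    """
    noisy = {}
    for name, (x, y) in landmarks.items():
        s = sigmas.get(name, 0.0)
        noisy[name] = (x + rng.gauss(0.0, s), y + rng.gauss(0.0, s))
    return noisy

# Example: noise on both eyes and the nostrils with sigma 1.0.
lm = {"left_eye": (100.0, 80.0), "right_eye": (160.0, 80.0), "nostril": (130.0, 120.0)}
noisy = add_landmark_noise(lm, {k: 1.0 for k in lm}, random.Random(0))
```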
SCER results for each gaze region (target region) according to the sigma value of Gaussian random noise added to the detected positions of facial features (%).
|Sigma value|Target region|
We also measured the influence of the detection accuracy of each individual facial feature. In Table 5, for each of the three sigma values of the Gaussian random noise added to the detected position of the left eye, we measured the gaze-detection accuracy as a function of the sigma value of the noise added to the detected position of the right eye.
Average SCER of nine target regions according to the sigma values of Gaussian random noise added to the detected positions of left (S_LE) and right eyes (S_RE), respectively (%).
Similarly, we measured the gaze-detection accuracy with noise added to the detected positions of the left eye and nostrils, and to those of the right eye and nostrils, as shown in Tables 6 and 7, respectively. As shown in Tables 5, 6, and 7, the detection accuracy of each facial feature has a similar influence on the gaze-detection accuracy. We also confirm that the gaze-detection accuracy is not much affected by Gaussian random noise added to the detected positions of the facial features.
Average SCER of nine target regions according to the sigma values of Gaussian random noise added to the detected positions of left eye (S_LE) and nostril (S_N), respectively (%).
Average SCER of nine target regions according to the sigma values of Gaussian random noise added to the detected positions of right eye (S_RE) and nostril (S_N), respectively (%).
Figure 9 shows successful and unsuccessful gaze-detection results of the proposed method. Figure 9(a) shows images that gave correct results; the left, center, and right images correspond to the user gazing at regions 1, 7, and 2 of Fig. 8, respectively. Figures 9(b) and 9(c) show images that gave incorrect results. In Fig. 9(b), the user is actually gazing at region 3. However, because the downward rotation of his head is small [similar to that in the right image of Fig. 9(a)], his gaze is incorrectly determined to be directed at region 2. The head rotation is small because the user also rotated his eyes toward the target.
The left and right images of Fig. 9(c) show the user gazing at regions 8 and 5 of Fig. 8, respectively. However, because the user is blinking, the left eye positions are incorrectly detected, which causes incorrect gaze detection: the gazes in the left and right images of Fig. 9(c) were determined as being directed at regions 9 and 6, respectively.
In the next test, we measured the gaze-detection accuracy when the training images for the calibration stage and the current test image come from different users. From the 1350 head pose images () used for Table 1, we randomly selected cases in which the training and test images are from different persons and measured the SCER. The accuracy was 59.4%, much lower than that obtained with user-dependent training data in Table 1, for the following reasons.
As shown in Figs. 6 and 7, and are obtained from the distance between two facial features in the images obtained in the initial user-dependent calibration stage. and are obtained from the current captured image. In addition, and are measured by the widths of face boxes detected in the image of the initial user-dependent calibration stage and the current input image, respectively.
If the training images in the calibration stage and the current test image are of different people, of Fig. 6(a) differs from that of Fig. 6(b), and of Fig. 7(a) differs from that of Fig. 7(b). These do not satisfy our assumption that and are equal in the initial calibration stage [Figs. 6(a) and 7(a)] and in the current input image [Figs. 6(b) and 7(b)]. In addition, the actual width of the face in the calibration stage [Figs. 6(a) and 7(a)] differs from that in the current input image [Figs. 6(b) and 7(b)], which also violates our assumptions. As shown in Figs. 6 and 7, and are calculated from , and and are obtained from . In addition, and are measured based on the actual width of the face.
Consequently, we inevitably obtain inaccurate , , and [Eqs. (6) and (16)], which degrades the accuracy of the estimated () of Eq. (6) [Eq. (10)] and () of Eq. (16). As a result, the gaze-detection accuracy is much reduced.
Next, we compared the processing time of the proposed method with that of our previous method.8 The processing time of the proposed method was , which is much shorter than that of the previous method ().8
In the next test, we performed the experiments with two open datasets, the CAS-PEAL-R1 database17,18 and the FEI face database.19 Although many open face databases exist, such as the AR database and Pal database, few include a variety of poses for each face. The CAS-PEAL-R1 database includes 30,900 images of 1040 Mongolian subjects (595 males and 445 females). Among them, 21,840 images () with pose variations were acquired according to different camera positions (C1 to C7) as shown in Fig. 10.17,18 The image resolution is .
For each face in the CAS-PEAL-R1 database, we use only the nine images captured by cameras C3 to C5 of Fig. 10, because we divide the gazing regions into a grid as shown in Fig. 8. We assume that the upper, middle, and lower images from C5 are obtained when a user gazes at positions 1, 2, and 3 of Fig. 8, respectively. Similarly, C4 corresponds to positions 4, 5, and 6, and C3 to positions 7, 8, and 9.
The remaining images from C1, C2, C6, and C7 in Fig. 10 are not used in our experiments because they involve severe head rotation, which does not occur when a user watches TV normally; in addition, one of the eyes or nostrils is occluded because of the severe rotation. Consequently, a total of 9360 images () were used in our experiments. Of the nine images for each face, five (upper and lower of C5, middle of C4, and upper and lower of C3) were used for user calibration, as in Fig. 4. We therefore define these five images as the training set and the remaining four as the test set. The accuracies of our gaze-detection method measured on the training and test sets are shown in Table 8.
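The mapping from CAS-PEAL-R1 camera and head-tilt labels to the gaze positions of Fig. 8, together with the five/four training/test split, can be written down explicitly. The tilt labels "upper", "middle", and "lower" are placeholders, not the database's actual file-naming codes.

```python
CAMERA_COLUMN = {"C5": 0, "C4": 1, "C3": 2}       # screen columns 1-3, 4-6, 7-9
TILT_ROW = {"upper": 0, "middle": 1, "lower": 2}   # hypothetical tilt labels
CALIBRATION_POSITIONS = {1, 3, 5, 7, 9}            # corners and center of Fig. 8

def grid_position(camera, tilt):
    """Map a camera/tilt pair to a gaze position of Fig. 8."""
    return 3 * CAMERA_COLUMN[camera] + TILT_ROW[tilt] + 1

def split_subject(images):
    """Split one subject's nine (camera, tilt, image) tuples into train/test lists."""
    train, test = [], []
    for camera, tilt, image in images:
        pos = grid_position(camera, tilt)
        (train if pos in CALIBRATION_POSITIONS else test).append((pos, image))
    return train, test
```

With 1040 subjects and nine images each, this selection yields the 9360 images used in the experiments.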
SCER results with CAS-PEAL-R1 face database.
|Reference gaze positions|Training sets|Test sets|
|Average of total sets (%)|86.15|
As shown in Table 8, the average SCERs of the training and test sets are similar, and the average SCER of all sets is 86.15%. Figure 11 shows examples of facial feature detection results on the CAS-PEAL-R1 database.
The FEI open face database consists of 2800 images of 200 subjects (100 males and 100 females). Among them, 2200 images () include pose variations. Ten images were obtained by profile rotation (up to ) of each person in an upright frontal position, and one additional frontal image was acquired.19 The image resolution is . All participants were Brazilian and between 19 and 40 years old. Figure 12 shows examples from the FEI face database.
The FEI face database does not include images with rotations in the direction of the axis (upper, middle, and lower directions). Therefore, we define new gazing positions on a 1 (row) by 7 (column) grid on the screen (instead of the grid positions in Fig. 8), and use the seven images of Figs. 12(a) and 12(b), assuming that they are obtained when the user gazes at these seven positions (1 to 7). Of these seven images, the three in Fig. 12(a) were used for user calibration (training); they are assumed to be obtained when the user gazes at positions 2, 4, and 6, respectively. The other four images, in Fig. 12(b), were used for testing; they are assumed to be obtained when the user gazes at positions 1, 3, 5, and 7, respectively. The remaining four images of Fig. 12(c) were not used because they involve severe head rotation, as discussed above. Therefore, 1400 images () were used in our experiments. Table 9 shows the SCER results on the FEI face database.
SCER result with FEI face database.
|Reference gaze positions|Training sets|Test sets|
|Average of total sets (%)|88.21|
As shown in Table 9, the average SCER of all sets is 88.21%. The SCER of the test data is lower than that of the training data because the head rotations in images 1 and 2 of Figs. 12(a) and 12(b) are often similar in this database, as are those in images 6 and 7. Figure 13 shows examples of facial feature detection results on the FEI database.
In our experiments, we used three databases (the database we collected, CAS-PEAL-R1, and FEI databases). The rough size () of faces in the database we collected is . Those in the CAS-PEAL-R1 and FEI databases are and , respectively.
Our research aims to develop gaze detection for smart TVs based on a conventional, low-cost web camera, without a high-power zoom lens or additional devices for pan and tilt functionality. With such a camera, eye gaze cannot be detected because the image resolution in the eye region is low. Figure 14 shows nine images obtained when a user gazes at the nine positions in Fig. 8 using only eye movement. As shown in Fig. 14, the X- and Y-disparities of the eye positions across gazing positions are very small, and the eye positions detected with an AdaBoost eye detector are inaccurate because of the low image resolution of the eye region. The detected eye positions are therefore impractical for gaze detection. Consequently, we base gaze detection in smart TVs on the user’s natural head movements rather than on eye-gaze detection.
In this paper, we proposed a new gaze-detection method that uses a conventional (visible light) web camera. By using facial information obtained in the initial calibration stage, accurate head poses can be calculated. Horizontally and vertically rotated head poses were calculated from a geometric analysis of the changes in facial feature positions. Experimental results show that the gaze-detection accuracy of our method with a 60-in. smart TV is 90.5%. The accuracy decreases when the user’s eye is falsely detected because of blinking, and also when the user’s head rotation is small because the gaze is partly achieved by eye movement.
In our research, we aim to develop a gaze-detection system that uses only a conventional visible-light web camera without additional special devices, which reduces the cost and size of the system and makes it easy to adopt in smart TVs. Thus, additional devices for measuring the Z distance, such as the Microsoft Kinect, are not considered in our research.
In future work, we plan to combine the facial feature positions and shoulder positions in order to enhance gaze-detection accuracy. Further, we plan to incorporate facial texture information into the calculation of the gaze position.
This research was supported by the Korea Communications Commission, Korea, under the title of Development of Beyond Smart TV Technology (11921-03001). The research in this paper uses the CAS-PEAL-R1 face database collected under the sponsorship of the Chinese National Hi-Tech Program and ISVISION Tech. Co. Ltd.
Won Oh Lee received his BS degree in electronics engineering from Dongguk University, Seoul, Republic of Korea, in 2009. He is currently pursuing a combined course of MS and PhD degrees in electronics and electrical engineering at Dongguk University. His research interests include biometrics and pattern recognition.
Yeong Gon Kim received his BS and MS degrees in computer engineering and electronics and electrical engineering from Dongguk University, Seoul, Republic of Korea, in 2011 and 2013, respectively. He is currently pursuing his PhD degree in electronics and electrical engineering at Dongguk University. His research interests include biometrics and pattern recognition.
Kwang Yong Shin received his BS degree in electronics engineering from Dongguk University, Republic of Korea, in 2008, and combined MS and PhD degrees in electronics and electrical engineering from Dongguk University in 2014. He is a researcher at the Korea Research Institute of Standards and Science. His research interests include biometrics and image processing.
Dat Tien Nguyen received his BS degree in electronics and telecommunication technology from Hanoi University of Technology, Hanoi, Vietnam, in 2009. He is currently pursuing a combined course of MS and PhD degrees in electronics and electrical engineering at Dongguk University. His research interests include biometrics and image processing.
Ki Wan Kim received his BS degree in computer science from Sangmyung University, Seoul, Republic of Korea, in 2012. He is currently pursuing his MS degree in electronics and electrical engineering at Dongguk University. His research interests include biometrics and image processing.
Kang Ryoung Park received his BS and MS degrees in electronic engineering from Yonsei University, Seoul, Republic of Korea, in 1994 and 1996, respectively. He received his PhD degree in electrical and computer engineering from Yonsei University in 2000. He has been a professor in the Division of Electronics and Electrical Engineering at Dongguk University since March 2013. His research interests include image processing and biometrics.
Cheon In Oh received his BS degree in electronic engineering from Sungkyunkwan University, Suwon, Republic of Korea, in 2005 and his MS degree from the University of Science and Technology, Daejeon, Republic of Korea, in 2007. He is now a senior researcher at ETRI, Daejeon, South Korea. His research interests lie in broadcasting systems, with particular emphasis on audience recognition, and advertising services and systems.