Recently, with the commercialization of various stereoscopy technologies, more three-dimensional (3-D) applications have become part of modern life. Three-dimensional televisions (3-DTVs) and 3-D movie theaters are also becoming popular. However, the development of 3-D technology faces some critical barriers, in particular stereoscopic visual fatigue. Visual fatigue caused by the conflict between accommodation and convergence is unavoidable in most stereoscopic applications. As described in Refs. 1 and 2, although viewers are able to perceive a smooth 3-D image after resolving the visual conflicts, a range of fatigue symptoms (such as eyestrain and headaches) can be incurred, usually after about 20 min of viewing on 3-D displays. To ensure the safety of 3-D applications, it is essential to measure the visual fatigue induced by stereoscopic images. Thus, many studies have investigated the visual fatigue of stereoscopy.3–6
Figure 1(a) describes the main measurement schemes in 3-D visual fatigue research: the mean opinion score (MOS)-based scheme and the schemes based on contact and contactless physiological features [such as electroencephalogram (EEG), electrocardiograph (ECG), and eye movement (EM) detection]. As noted by Kim and Cho,7 the MOS measures subjective 3-D visual fatigue using questionnaires whose results correlate highly with the subjective 3-D visual fatigue, for example, the question “How much do you feel visual fatigue?” with the answers “comfortable, a little uncomfortable, uncomfortable.” The contactless physiological feature (CLPF)-based scheme, as shown by Kim et al.8 and Chae et al.,9 builds a visual fatigue measurement model from the eyes’ response curve and blink frequency; the level of visual fatigue in stereoscopy is determined from the eye-tracking results. The contact physiological feature (CPF)-based scheme, as described by Gomarus et al.10 and Fang et al.,11 is a measurement model based on recordings of the electrical activity associated with visual fatigue; the level of stereoscopic visual fatigue is determined from the bio-signals measured on the human body.
However, both subjective and objective measurements have their own advantages and drawbacks. Unfortunately, most studies ignore the influence of extraneous state variables (e.g., the condition of the human body and the testing environment). For this reason, the same test method applied to different subjects may yield significantly different measurement results. Therefore, we develop a measurement model based on a strong correlation structure, the Bayesian network (BN) structure depicted in Fig. 1(b), that can reliably recognize stereoscopic visual fatigue.
Figure 1(b) shows our proposed measurement model on a BN structure. Each feature vector forms a node of the BN tree. The results of the nodes are fused with a BN inference algorithm, and the final fusion result is inferred from the probability values of the different variable states. To the best of our knowledge, this is the first adaptation of a probabilistic framework on the BN structure for inferring the 3-D viewer’s state of visual fatigue. In contrast to the previous works described in Refs. 4, 5, 8, 10, and 12, our proposed model does not employ a single physiological feature as the decision factor, but deals with the probability values of different variables’ states, derived from the interdependencies between observation and contextual features.
The organization of this article is as follows. After a brief introduction in Sec. 1, Sec. 2 introduces the background and related work for this study. Section 3 describes the BN-based 3-D visual fatigue measurement framework. Section 4 presents the experimental results. Finally, Sec. 5 summarizes the article.
Background and Related Work
Visual Fatigue Description in Stereoscopy
Binocular vision is produced when two separate, slightly different images corresponding to the left and right eyes are merged in the viewer’s brain to build a common impression.13 Hodges and McAllister14 describe the method of presenting right and left perspective views in a 3-D display. A 3-D screen based on binocular parallax relies on the format of the presented image and on the viewing format. Figure 2 illustrates how the viewer experiences a stereoscopic sensation when the appropriate view is presented to each eye on a 3-D screen. The improved depth perception also adds realism to stereoscopy. Although stereoscopic imagery can be presented on 3-D displays, it violates the relationship that holds for natural viewing in the real world. In Fig. 2, when the viewer observes a real object or an image on a two-dimensional (2-D) device, the eyes accommodate (focus) and converge on the same point, so the accommodation distance matches the convergence distance. Conversely, when a viewer observes a stereoscopic image on a 3-D display, the focus point remains on the plane of the screen, while the eyes converge at a different distance. This breakdown of the relationship between accommodation and convergence causes visual discomfort.
For 3-D comfort evaluation, Choi et al.15 identify several factors that capture the spatiotemporal characteristics of disparity, and predict visual comfort by fusing these factors. Figure 3 illustrates the types of disparity during stereoscopic viewing. Two disparities are indicated on the coordinate plane: positive (uncrossed) and negative (crossed) disparities, shown by the blue and red zones.13 In Fig. 3, the horizontal gray line at the position of the display represents the zero disparity plane. The zero disparity plane is the converged domain of stereoscopic imaging, and the zero disparity area is commonly referred to as the comfortable zone of stereoscopic imaging.16,17 Depending on the stereoscopic disparity, the 3-D image can be formed at different positions, such as in front of or behind the screen. Stereoscopic disparity refers to the difference in the image location of one object as viewed by the left and right eyes. When a 3-D camera captures a stereoscopic image, each lens separately converges on the main object, which generates stereoscopic disparity. The main object is seen as a single image, but the background is seen as double images with disparity.
In Fig. 3(a), the positive disparity in the stereoscopic image corresponds to the uncrossed lines. In Fig. 3(b), the negative disparity corresponds to the crossed lines. The negative disparity exhibits crosstalk that occurs between the accommodations of each eye. In addition, the negative disparity shows a larger disparity and object size than the positive disparity, since the image in negative disparity is formed closer to the viewer than in positive disparity. This phenomenon follows from the geometry of binocular viewing. Therefore, negative disparity can incur more visual fatigue than positive disparity.7,17 Yilmaz and Gudukbay18 point out that crosstalk (or the ghosting effect) is the faded image viewed by the untargeted eye. This effect is undesirable because it may cause visual fatigue and other problems. Gudukbay and Yilmaz19 indicate that a more comfortable stereo view can be achieved by reducing crosstalk.
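The relation between the sign of the disparity and the position of the fused image can be sketched numerically. The following is a minimal illustration and not part of the original study: it uses the standard similar-triangles model of binocular viewing, and the function names, the sign convention for the on-screen parallax (positive for uncrossed, negative for crossed), and the default eye separation of 0.065 m are assumptions made for this example.

```python
def perceived_distance(parallax_m, viewing_distance_m, eye_separation_m=0.065):
    """Distance from the viewer to the fused point for a given screen parallax.

    Similar triangles give Z = e * D / (e - p), where e is the eye separation,
    D the viewing distance, and p the on-screen parallax (all in meters).
    """
    if parallax_m >= eye_separation_m:
        raise ValueError("parallax must be smaller than the eye separation")
    return eye_separation_m * viewing_distance_m / (eye_separation_m - parallax_m)


def accommodation_vergence_mismatch(parallax_m, viewing_distance_m,
                                    eye_separation_m=0.065):
    """Vergence demand minus accommodation demand, in 1/m (diopters).

    Accommodation stays on the screen (1/D) while vergence follows the
    fused point (1/Z), so the result is zero only for zero parallax.
    """
    z = perceived_distance(parallax_m, viewing_distance_m, eye_separation_m)
    return 1.0 / z - 1.0 / viewing_distance_m


# Hypothetical viewing distance of 2 m.
behind = perceived_distance(0.02, 2.0)     # positive (uncrossed) parallax
on_screen = perceived_distance(0.0, 2.0)   # zero disparity plane
in_front = perceived_distance(-0.02, 2.0)  # negative (crossed) parallax
```

Under these assumptions a positive parallax places the fused point behind the screen, zero parallax places it on the screen, and a negative parallax places it in front of the screen, closer to the viewer.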
Visual Fatigue Measurement Model Description
Body fatigue can be easily tracked from observable physiological features.20,21 This scheme is considered a relatively objective method for visual fatigue measurement. Physiological features may be classified into contactless and contact features. Contactless features include EMs, head movement, etc., and can be easily detected with a real-time monitor. Contact features include brain activity, heart rate variability, etc., and can be detected by EEG, ECG, and other bio-sensor systems.
The CLPF-based scheme focuses on inferring fatigue from the contactless features. Ji et al.22 demonstrate that a fatigued person exhibits certain visual cues in long-duration visual experiments. Horng et al.23 present a fatigue measurement algorithm that depends on eye tracking and dynamic matching. Kim et al.24 construct a neural network-based scheme for fatigue recognition by detecting the movements of the mouth and eyes, respectively.
The CPF-based scheme focuses on inferring fatigue from the contact features. For example, the EEG carries abundant information on human cognitive states, which can be detected in the major EEG bands (δ, θ, α, and β). Lal et al.25 present a fatigue recognition algorithm based on the different levels of the EEG bands. Also, Jung et al.26 and Wilson and Bracewell27 propose methods to estimate and predict the fatigue level based on EEG power spectrum estimation and a fuzzy neural network model. From the main EEG activities (δ, θ, α, and β) of 52 subjects (36 males and 16 females) during fatigue measurement, Budi et al.21 found that the δ and θ activities are stable over time, whereas there is a slight decrease in α activity and a significant decrease in β activity. For ECG, the other important CPF signal, fatigue recognition in Refs. 28 and 29 relies on the cardiac activity in the low frequency (LF), very low frequency (VLF), and high frequency (HF) bands and on the LF/HF ratio.
Previous physiological feature-based schemes focus on only a single specific aspect. This may lead to inaccurate results because fatigue is not directly observable and can only be inferred from the available information. There are several reasons for the inaccuracy of the schemes mentioned above: (1) Contextual factor. Fatigue recognition involves considerable subjectivity, which does not always reflect the objective state. (2) Environment factor. For example, when a person is in an unfamiliar environment,30 facial expressions (such as eye and mouth movements) may be interpreted inaccurately, especially for introverted persons. Therefore, fusing as many features as possible from uncertain events is a better way to make an accurate inference.31 Further, Picard et al.32 pointed out that it is necessary to fuse the contextual and physiological features with the human performance in order to make the fatigue measurement more reliable.
By considering the evidence and beliefs of the contextual information and physiological features obtained from measurement, Ji et al.22 construct a BN-based algorithm to infer and predict human fatigue, enhancing the reliability of fatigue detection. Yang et al.20 develop a BN-based fatigue recognition model for use in systems that evolve over time. However, the fatigue networks in Refs. 20, 22, and 33–36 mostly apply to driving, visual display terminal monitoring, and the marine industry. To the best of our knowledge, there is no related work on stereoscopic visual fatigue based on a probabilistic framework or a BN. Therefore, considering the states and beliefs of contextual information and physiological features, a novel probabilistic framework-based (BN-based) measurement model for stereoscopic visual fatigue is proposed in this article.
Bayesian Networks Method Description
Hubbard37 describes uncertainty as the lack of certainty, a state of having limited knowledge in which it is difficult to infer precisely the existing state or a future outcome. Decision making is generally recognized by engineers as an indispensable part of the whole engineering design process. Like most fatigue recognition tasks, stereoscopic visual fatigue measurement involves a number of uncertain factors. Because uncertainty has a significant impact on judgment, engineers try to manage it with compound methods and intelligent systems. The most reliable tool for modeling uncertainty is probability theory.35
One of the most prevalent and effective graphical models for managing uncertainty is the BN.38 A BN, also called a belief network or directed acyclic graphical model, is a probabilistic graphical model that represents the conditional dependencies among a set of random variables with a directed acyclic graph (DAG). A DAG is a directed graph with no directed cycles. It consists of vertices and directed edges, each edge connecting one vertex to another in such a way that no cyclic route can occur. Figure 4 shows the DAG used in our application.
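As a minimal sketch of the DAG property described above, the graph can be stored as an adjacency mapping and checked for directed cycles with Kahn's topological sort. The node abbreviations follow the text, but the edge set below is only an assumed reading of the structure (contextual nodes feeding the fatigue node, which in turn feeds the observation nodes); the label `SCF` for the stereoscopic contextual node and the function name `is_acyclic` are hypothetical.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the directed graph {node: [children]} has no directed cycle."""
    nodes = set(edges)
    for children in edges.values():
        nodes.update(children)

    # Count incoming edges for every node.
    indegree = {node: 0 for node in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edge.
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Every node can be removed only if the graph contains no cycle.
    return removed == len(nodes)


# Assumed structure for illustration; the actual edges are those of Fig. 4.
fatigue_dag = {
    "SQ": ["NSCF"], "CR": ["NSCF"], "EE": ["NSCF"],
    "BD": ["SCF"], "DQ": ["SCF"],
    "NSCF": ["Fatigue"], "SCF": ["Fatigue"],
    "Fatigue": ["EEG", "EM"],
}
```

Adding an edge that leads from an observation node back to a contextual node would create a directed cycle, and the check would then return False.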
The basic concept in the Bayesian treatment of certainties in causal networks is conditional probability. Whenever a statement of the probability of an event is given, then it is given conditioned by other known factors. Therefore, according to the feature vector mentioned above and conditional probability, the probability of estimated fatigue is obtained through Bayesian theorem in Refs. 20 and 39:
• represents the fatigue node, and represents the fatigue state value.
• represents the evidences , represents the contextual evidences and represents the observations.
• represents the posterior probability of given , and hence it is the new estimation for the probability that the hypothesis is true, taking evidence into account.
• represents the conditional probability of observable evidence , if the hypothesis turns out to be true.
• represents the prior probability of hypothesis before providing contextual evidences.
• represents the marginal probability, which is the prior probability under all possible fatigue hypotheses.
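For reference, Bayes' theorem for this setting can be written in generic notation. The symbols used here are assumptions chosen for illustration and are not necessarily those of the original equation: F denotes the fatigue node, f one of its state values, and E the available evidence.

```latex
P(F = f \mid E) = \frac{P(E \mid F = f)\, P(F = f)}{P(E)},
\qquad
P(E) = \sum_{f'} P(E \mid F = f')\, P(F = f')
```

In this form the left-hand side is the posterior, the numerator is the product of the likelihood and the prior, and the denominator is the marginal probability of the evidence summed over all fatigue hypotheses.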
BN-Based Visual Fatigue Measurement Implementation
To set up a fatigue recognition model based on the discrete BN, the first step is to specify the nodes of the discrete BN. In other words, we need to specify the contextual, contactless and contact physiological variables that are used to construct the discrete BN. The second step is to determine the values that are used to represent the discrete variables. The third step is to configure the states of the variables, to calculate the conditional probability, and to evaluate the visual fatigue in stereoscopy. In the following, these steps are described.
Specifying the Nodes of the Discrete Bayesian Networks
As shown in Fig. 4, there are many contextual and physiological features related to fatigue. Some of these features contribute more to fatigue, while others contribute less. For the sake of simplicity but without loss of generality, we select only those contextual and physiological features that are directly related to the fatigue measurement. The selected features are described below as step 1. For the contextual, hidden, and observable variables selected in Fig. 4, a fuzzy method is used to determine the discrete values of each variable based on a set of heuristic knowledge rules.40
Stereoscopic contextual features node
Binocular disparity (BD) node. Lambooij et al.41 noted that the conflict between accommodation and vergence experienced by the human eye is the factor that most affects visual fatigue in stereoscopy. Ohzawa et al.13 classified disparity as positive or negative. Kim and Cho7 suggested a simplified relative visual fatigue metric that considers the “accommodation and vergence” factors and can be calculated from the disparities in stereoscopy. Motivated by Ohzawa et al.13 and Kim and Cho,7 we provided several sets of stereoscopic instances to evaluate visual fatigue, as exhibited in Fig. 5. Sample images in the negative disparity zone and in the positive disparity zone were shown in the experiment for 3-D fatigue measurement.
Display quality (DQ) node. As Michel et al.12 described, 3-DTV and 3-D cinema lie at the extremes of the screen size spectrum, so the comfort zone issues for stereoscopy differ when the same content is presented on both. Resolution and luminance are also key elements of a display; for example, an unsuitable resolution or luminance also causes visual discomfort. However, among these features, the screen size is the one most directly related to the DQ issues of concern here, as mentioned in Refs. 12 and 42. Therefore, the display size is taken as the main contextual feature corresponding to the DQ node.
Nonstereoscopic contextual features (NSCF) node
Sleeping quality (SQ) node. SQ is directly associated with fatigue.29 Therefore, we take the SQ as a nonstereoscopic node on the BN DAG (Fig. 4). Gomarus et al.10 noted that the SQ is related to such quantities as the duration of sleep, difficulty in falling asleep at night, the sleeping environment, and so on. Among them, the sleeping time and the sleeping satisfaction were taken as the key contributors to the SQ, since a certain minimum sleep time is necessary for everyone, and whether the sleep is satisfactory depends on the person’s subjective judgment.
Circadian rhythm (CR) node. CR is also a cardinal factor in fatigue measurement. Lal and Craig43 identified that the CR plays an important role in the study of fatigue recognition. There are two sleep peaks each day, one of which appears after midnight and the other approximately after lunch time. Humans are easily fatigued during these peak periods.
Experiment environment (EE) node. EE is the last factor selected by the proposed method. Light, noise, temperature, and other EE factors are strongly related to fatigue measurement, especially the influence of light on the viewer watching the screen. Therefore, we take the EE as a nonstereoscopic node on the BN graph.
Observation state node
EEG node. In the frequency domain, the EEG mainly includes the δ band (0.5 to 4 Hz) corresponding to sleep activity, the θ band (4 to 7 Hz) related to drowsiness, the α band (8 to 13 Hz) corresponding to relaxation and creativity, and the β band (13 to 25 Hz) corresponding to activity and alertness. Budi et al.21 note that the β band is strongly related to visual fatigue. In the EEG traces, the power of the β frequencies increases as the watching duration increases, and it is much stronger in 3-D than in 2-D conditions, as shown in Fig. 6(a). Li et al.44 identified that 3-D content affected the power of the brain waves in the β frequency band: the power was stronger when viewing 3-D content. Subjective results also showed stronger visual fatigue in the 3-D condition than in the 2-D condition. Therefore, we take the magnitude of the EEG spectrum in the β band as an observable variable node in the BN diagram.
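The idea of using the spectral magnitude of one EEG band as an observable variable can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing used in the study: the band limits are the ones listed above with their conventional names (delta, theta, alpha, beta), treated as half-open intervals, the spectrum is computed with a plain discrete Fourier transform, and the function names and the synthetic test signal are hypothetical.

```python
import cmath
import math

# Band limits in Hz as listed in the text; half-open intervals are an assumption.
EEG_BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 7.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 25.0),
}


def band_powers(samples, sample_rate_hz):
    """Sum the squared DFT magnitudes that fall into each EEG band."""
    n = len(samples)
    powers = {name: 0.0 for name in EEG_BANDS}
    for k in range(1, n // 2):
        freq = k * sample_rate_hz / n
        # Direct DFT coefficient for bin k (O(n) per bin, fine for short windows).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        for name, (low, high) in EEG_BANDS.items():
            if low <= freq < high:
                powers[name] += abs(coeff) ** 2
                break
    return powers


def relative_band_powers(samples, sample_rate_hz):
    """Normalize the band powers so that they sum to one."""
    powers = band_powers(samples, sample_rate_hz)
    total = sum(powers.values())
    if total == 0.0:
        raise ValueError("signal has no power in the EEG bands")
    return {name: value / total for name, value in powers.items()}


# Synthetic one-second test signal: a 20 Hz sine sampled at 64 Hz.
signal = [math.sin(2 * math.pi * 20 * i / 64) for i in range(64)]
relative = relative_band_powers(signal, 64)
```

For the synthetic 20 Hz signal, nearly all of the power falls into the 13 to 25 Hz band; a real recording would additionally need windowing and artifact rejection, which are omitted here.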
EM node. EM-based visual fatigue measurement is related to such quantities as eye gaze, eye blink, and eyelid closure. These manifestations are described in Ref. 45 for fatigue detection. Zhu and Lan22 pointed out that EM is a reliable and valid indicator of fatigue. Ref. 46 introduces the percentage of eyelid closure over the pupil in a given time (PERCLOS), which indicates that the viewer is possibly in a state of fatigue if the eyes are at least 80% closed during a period of 1 min. Thus, the proportion of eye-closed time was taken in this article as one of the observable variables corresponding to the nodes of the BN diagram.
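The proportion of eye-closed time can be computed directly from a sequence of eyelid-closure measurements. The sketch below is illustrative only: it assumes that each sample is the fraction of the pupil covered by the eyelid (0 for fully open, 1 for fully closed), it uses the 80% closure criterion mentioned above, and the function name `perclos` and the sample values are hypothetical.

```python
def perclos(closure_fractions, closed_threshold=0.8):
    """Fraction of samples in which the eyelid covers at least closed_threshold.

    closure_fractions -- eyelid closure per sample, each between 0 and 1
    closed_threshold  -- closure level counted as "closed" (0.8 as in the text)
    """
    if not closure_fractions:
        raise ValueError("at least one sample is required")
    closed = sum(1 for c in closure_fractions if c >= closed_threshold)
    return closed / len(closure_fractions)


# Hypothetical samples taken at equal time steps within one observation window.
window = [0.10, 0.20, 0.90, 0.95, 0.85, 0.30, 0.10, 0.80]
eye_closed_proportion = perclos(window)
```

The resulting proportion would then be discretized into the states of the EM node; the discretization thresholds are not specified here.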
Determining Discrete Variables in Each Node
The construction of a BN has two tasks: one is the determination of the nodes, and the other is the determination of the parent discrete variables and their states for each node. The related nodes were determined in the previous step. In this section, as step 2, we describe the discrete variables and their states, which indicate the likelihood that a particular feature contributes to fatigue.
Visual fatigue node: in which and represent the fatigue and no-fatigue states, respectively.
Contextual features node: represents the nonstereoscopic factor node state, in which , , and represent the sleep quality, CR and EE, respectively. Here, in which and represent the sleep parameters, including the sleep time and sleep satisfaction. represents the stereoscopic factor node, in which and represent the binocular disparity and DQ, respectively.
Observation features node: represents the observation features node, in which represents the CLPF (e.g., EM), and represents the CPF (e.g., EEG).
As remarked in Fig. 4, , , , and denote the specific values taken by , , , and , respectively. In Fig. 4 the variables, together with the directed edges, form the DAG. represents the probability of the sleep quality node states , CR node states and EE node states ; represents the probability of the binocular disparity node states and DQ node states ; represents the probability of the contact physiological node states (EEG node) and contactless physiological node states (EM node) .
Calculating Bayesian Networks
Assume that the evidences from the contextual nodes are represented as , and the evidences from the observable nodes are represented as , where represents the evidence of the ’th contextual node with the ’th state value ( and ), and represents the evidence of the ’th observable node with the ’th state value (). as evidences from the contextual factor and observable feature nodes, respectively. In Eqs. (2) and (3), is the prior probability of visual fatigue that was inferred before the parents’ contextual evidence was available. is the conditional probability of observable evidence , if the parent visual fatigue turns out to be true.
Then the conditional probability of given the occurrence of the node can be written as in Ref. 39
The conditional probability of given the occurrence of node can be written as in Ref. 39
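The two inference steps described in this section can be sketched for a small discrete network. This is a simplified illustration under stated assumptions and not the authors' implementation: the contextual parents are treated as mutually independent, the observations are treated as conditionally independent given the fatigue state, only two parents are used, and all function names and probability values are hypothetical.

```python
from itertools import product


def fatigue_prior(cpt, parent_beliefs):
    """P(fatigue) obtained by marginalizing the CPT over the parent beliefs.

    cpt            -- maps a tuple of parent states to P(fatigue | parents)
    parent_beliefs -- one {state: probability} dict per parent, in CPT order
    """
    total = 0.0
    for states in product(*(belief.keys() for belief in parent_beliefs)):
        weight = 1.0
        for belief, state in zip(parent_beliefs, states):
            weight *= belief[state]
        total += weight * cpt[states]
    return total


def fatigue_posterior(prior, likelihoods):
    """P(fatigue | observations) by Bayes' theorem.

    likelihoods -- (P(obs | fatigue), P(obs | no fatigue)) per observed node
    """
    joint_fatigue = prior
    joint_no_fatigue = 1.0 - prior
    for given_fatigue, given_no_fatigue in likelihoods:
        joint_fatigue *= given_fatigue
        joint_no_fatigue *= given_no_fatigue
    return joint_fatigue / (joint_fatigue + joint_no_fatigue)


# Hypothetical numbers for illustration only; they are not the values of Tables 1 to 3.
cpt = {
    ("good", "good"): 0.1, ("good", "bad"): 0.4,
    ("bad", "good"): 0.5, ("bad", "bad"): 0.9,
}
sq_belief = {"good": 0.9, "bad": 0.1}  # belief over the SQ states
cr_belief = {"good": 0.8, "bad": 0.2}  # belief over the CR states

prior = fatigue_prior(cpt, [sq_belief, cr_belief])
# One likelihood pair each for an EEG observation and an EM observation.
posterior = fatigue_posterior(prior, [(0.7, 0.2), (0.8, 0.3)])
```

In this example both observations are more likely under fatigue than under no fatigue, so the posterior is higher than the prior obtained from the contextual evidence alone.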
Simulation Results and Discussion
In this work, to acquire the conditional probability information for each node, we draw on methods from several previous studies. For example, the conditional probabilities for the BD and DQ nodes are obtained from Refs. 7 and 12. The conditional probabilities for the CR, SQ, and EE nodes are obtained from Refs. 20, 22, 29, 32, and 47–50. The conditional probabilities for the EEG and EM nodes are obtained from Refs. 5, 8, and 20. However, some probabilities cannot be obtained directly from these studies; for those, we adopted similar acquisition methods based on our own experiments. For instance, the comfort judgment for binocular disparity is based mainly on personal satisfaction, because visual sensing differs from person to person. Here, the influence of subjective feeling (such as the MOS) is considered to be relatively high. To obtain this data set, we adopt a statistical analysis scheme based on Ref. 7. Through these efforts, all probabilities in the BN model have been acquired, as shown in the following tables. Table 1 gives the conditional probability of visual fatigue given the states of the BD node, the main factor of visual fatigue in stereoscopy. Table 2 gives the conditional probability of visual fatigue given the states of CR, SQ, and EE. Table 3 gives the conditional probabilities of EEG and EM, respectively, given that visual fatigue occurs.
Table 1 Conditional probability for fatigue node with BD.
| BD negative | Fatigue node | BD positive | Fatigue node |
Table 2 Conditional probability for fatigue node with CR, SQ, and EE.
| CR node | SQ node | EE node | Fatigue node |
Table 3 Conditional probabilities for EM and EEG given fatigue.
| Fatigue | EEG node | EM node |
With the help of the System Neuroscience Laboratory at Sungkyunkwan University, we obtained the EEG and EM data sets. We used an EM tracking system, the Eyelink II, which measures at a temporal resolution of 500 Hz. Twenty students from Sungkyunkwan University volunteered to participate in the experiments. Each participant was asked to watch the test 3-D images at different disparities on a 3-DTV, and no break or rest was permitted during the 25 min experiment. Due to display limitations (our research focuses only on the 3-D-HDTV application), we could not include a variety of DQ conditions. The EEG and EM signals of each participant were collected at a rate of 1 sample/min. The results were then processed according to their statistical properties to form the evidence data sets needed to infer the viewer’s fatigue. For example, according to the statistical properties of the contactless physiological data from the participants, if the PERCLOS value of EM is equal to 85, , , ; and for the contact physiological data, if the EEG signal indicates that the decreases of rhythms are large, , , .
In order to obtain the probability for CR, SQ, and EE, we adopted a statistical analysis-based questionnaire that mainly concerned the information about the CR, SQ, and EE state. The questionnaires were distributed among the twenty students before the simulation experiment. There are two groups of probability for CR and SQ. For the first group simulation, we required 20 students who did not have any kind of sleep disorder to maintain a relatively good SQ state before the test day, so the probability for SQ were and . We asked the volunteers to participate in the simulation test from 8:30 to 11:30 AM, so the probabilities for CR were and . For the second group simulation, some of the volunteers were deprived of a good sleep during the previous night (e.g., sleep time was less than 6 h), and we asked them to participate in the simulation test from 1:00 to 2:30 PM the next day. Then the probabilities for SQ and CR were , , , and . In our experiment, EE was relatively good, and the probabilities for EE were and .
A subset of the test images is shown in Fig. 5(a). For a fair evaluation, we adopted pairwise comparisons of different parallaxes. Figure 5(b) plots the MOS results for various averages of the converged objective disparity. We obtained a relatively accurate visual fatigue measure from the validated MOS evaluation in Ref. 7. MOS is a common evaluation method for stereoscopic visual fatigue. Therefore, we fitted a curve to these results to serve as the reference database in our simulation. From Fig. 5(b) we can observe that the disparity of the comfortable zone is between and disparity 70.
In Fig. 7(a), the measurement results are calculated for various converged objective disparities, based on the SQ, CR, and EE probabilities , , , , , and . In Fig. 7(b), the results are based on the different SQ and CR probabilities , , , and . From Fig. 7(b) we can observe that when the SQ and CR factors are in a worse state, the estimate of the viewer’s fatigue deviates substantially in measuring the stereoscopic fatigue. To quantify the results, we also validate them with the mean absolute error (MAE): the MAE is 0.0848 in Fig. 7(a) and 0.2782 in Fig. 7(b). Thus, the measurement of visual fatigue in stereoscopy is influenced by other (nonstereoscopic) factors. If the nonstereoscopic contextual features are ignored, the measurement of visual fatigue in stereoscopy becomes unreliable.
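The MAE used for this validation is the average absolute difference between the estimated values and the reference values. A minimal sketch follows; the function name and the sample values are hypothetical and do not reproduce the data behind Fig. 7.

```python
def mean_absolute_error(estimated, reference):
    """Average of |estimated - reference| over paired values."""
    if len(estimated) != len(reference):
        raise ValueError("both sequences must have the same length")
    if not estimated:
        raise ValueError("at least one pair of values is required")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical fatigue estimates and reference values at three disparities.
estimates = [0.2, 0.5, 0.9]
references = [0.1, 0.5, 0.7]
mae = mean_absolute_error(estimates, references)
```

A smaller MAE means that the estimates follow the reference curve more closely.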
We proposed a BN-based measurement model for stereoscopic visual fatigue estimation. Two important conclusions can be drawn from this study. (1) Multiple features, including stereoscopic contextual features, nonstereoscopic contextual features, CPFs, and CLPFs, were used to infer the viewer’s fatigue, providing wide coverage of the feature categories. Covering more fatigue-related nodes in the BN helps to infer fatigue more reliably and accurately. In particular, most previous studies have ignored the influence of condition variables such as CR, SQ, and EE. (2) The CLPFs and CPFs are two important observation features for fatigue recognition. The test validation indicates that visual fatigue in stereoscopy can be accurately measured based on the EM and EEG model. It would be of significant interest to extend the current measurement model to handle more practical situations with various 3-D devices. We are also interested in how to improve the handling of subjective factors in determining the probabilities.
This work is supported by Ministry of Trade, Industry and Energy (MOTIE) Foundation of the World-Class300 Project: Development of automated manufacturing robot system technology integrating with the 6 Degree of Freedom (DOF) robot mechanism and the S/W platform for assembling mobile Information Technology (IT) products. (10043213).
Zhongyun Yuan received his BS degree and MS degree in the Department of Electronic and Electrical Engineering from North University of China, in 2005 and 2008. He is currently pursuing a PhD degree in the Department of Electrical and Computer Engineering, Sungkyunkwan University (SKKU), Suwon, Korea. His interests include measurement, 3-D vision, data compression, data acquisition, and compressive sampling.
Jong Hak Kim received a BS degree in radio communication engineering from the Kyunghee University, Suwon, Korea, in 2009, the MS degree from the Department of Electrical and Computer Engineering, Sungkyunkwan University, in 2012, and he is studying for a PhD degree at Sungkyunkwan University. He is interested in efficient low-power and real-time processing systems for mobile equipment and currently studies image processing algorithms and hardware implementation for stereo-systems.
Jun Dong Cho received a BS degree in electronic engineering, Sungkyunkwan University in Seoul, Korea, 1980, an MS degree from Polytechnic University, Brooklyn, NY, 1989, and a PhD degree from Northwestern University, Evanston, IL, 1993, both in computer science. He was a senior CAD engineer at Samsung Electronics, Co., Ltd. He is now professor of Department of Electronic Engineering, Sungkyunkwan University, Korea. He was a visiting scientist of IBM T.J. Watson Research Center, from 2000 to 2001. He has been an IEEE senior member since April 1996. His research interests are in the area of VLSI/SoC CAD and lower power design of multimedia and communication.