Recent advances in psychophysical studies have revealed that the visual-attention system often fails to detect large and obvious changes in a visual scene if the changes are masked by a uniform gray blank screen for a short duration (i.e., a one-shot or flicker paradigm),1 introduced gradually (slow change),2 or presented together with attentional distracters such as “mud splashes.”3 Blindness to changes occurs not only in laboratory settings but also in real-world situations.4, 5 Contrary to the common belief that our visual system maintains a meticulously detailed copy of a visual scene, changes in a scene are not readily detected unless focused spatial attention is allocated to the location where they occur.6 (For a discussion of how much visual memory is maintained, see Ref. 7.) This inability to detect visual changes is referred to as change blindness and has been studied extensively in psychophysics. Understanding the neural and psychological processes underlying change detection and change blindness is also of practical interest for preventing accidents caused by an operator's oversight or inattention in areas such as industrial interface design,8 operator training,9 and driving safety.10
Neural correlates of change detection and change blindness have been investigated with functional magnetic-resonance imaging (fMRI)11, 12, 13, 14 and electroencephalography (EEG),15, 16, 17, 18, 19, 20 as well as with single-unit recordings in humans and monkeys.21, 22 These studies have identified neural loci and processes related to change detection and change blindness. The fMRI studies found stronger activations in the dorsal and ventral visual areas12 and in the parieto-frontal attentional network when a change was detected.11, 13, 14 The EEG studies, on the other hand, revealed the sub-second temporal dynamics of these neural correlates and found enhanced P1 and P3 components that reflected subjective awareness of changes in a visual scene15, 16, 17 and predicted the onset of visual awareness.15, 20
A recent trend in functional neuroimaging, especially in EEG and fMRI, is to extract, or decode, the information encoded in activation patterns; this field is now known as brain decoding.23, 24, 25 In contrast to traditional analyses of single-voxel activation,26, 27 brain decoding uses distributed, multivoxel activation patterns to infer primary visual28, 29, 30, 31 and sensorimotor32 representations, often on a trial-by-trial basis. This approach extends neuroimaging to extracting latent information buried in subtle patterns in the data and opens new possibilities for brain–machine interfaces (BMIs) for both healthy and physically impaired users. Decoding is not limited to primary sensory and motor areas; it has also been applied successfully to higher cognitive states such as intention, decision-making, and emotion.33, 34, 35, 36 It is therefore reasonable to expect that change detection and change blindness can also be decoded from neuroimaging data.
This study attempted to classify change detection and change blindness using near-infrared spectroscopy (NIRS) signals. NIRS offers centimeter-order spatial resolution for identifying neural loci and sub-second temporal sampling of hemodynamic changes,37 and has been successfully applied to sensorimotor functions,38, 39, 40, 41 visual functions,42, 43 auditory functions,44 and higher cognitive functions, such as working memory,45, 46, 47 cognitive inhibition,48, 49 and language processing.50, 51 Several recent studies have extended NIRS to brain decoding of hand-movement motor imagery,52 moment-by-moment magnitudes of pinch-force production,53 subjective preference for beverages,54 and emotional responses to facial expressions,55 as well as to brain–computer interfaces.56, 57, 58 Compared with other imaging modalities such as fMRI, NIRS has the advantages of relatively low cost and negligible physical constraints, making it an ideal candidate for brain–machine interfaces in real-life environments.57
In the current study, while subjects were performing a change-detection task, their cerebral activities were continuously monitored with NIRS. We tested the hypothesis that successful and unsuccessful trials in detecting a change can be classified on a trial-by-trial basis by applying a machine learning algorithm to the NIRS signals.
Seven subjects recruited in our laboratory (five male and two female, all with normal or corrected-to-normal vision and no reported history of neurological problems; age 27 to 37) volunteered for the behavioral experiment, and five of them (three male and two female; age 33 to 37) participated in the NIRS experiment. Two of the subjects were authors of this study. The other subjects were generally familiar with psychophysical and NIRS experiments but were not informed of the purpose of these particular experiments. All subjects provided written informed consent. All measurements were conducted with the approval of the Ethics Committee of Hitachi, Ltd.
A task designed by Beck et al.11 was used for both the behavioral and NIRS experiments. A single trial of this task consisted of two subtasks, a character task and a face task, and the subjects were instructed to respond to both. The time line of stimulus presentation and subjects' responses is shown in Fig. 1. First, a fixation cross appeared at the center of the display, and subjects were instructed to keep fixating on it throughout the trial. A screen with two faces, one on the left and one on the right (each 2.0 deg from the fixation cross), was then shown for 500 ms. At the same time, two three-letter strings were shown 2.4 deg above and below the fixation cross. The face images and character strings subtended 3.2 × 3.7 and 1.8 × 1.0 deg, respectively. Four facial images of middle-aged men wearing glasses, adopted from a face-image database,59 were used (see upper-right inset in Fig. 1). Then, a uniform gray blank screen was shown for 500 ms. This pair of screens (faces with character strings, followed by the gray screen) was repeated four times. The subjects were seated ∼30 cm in front of the computer display with their heads stabilized on a chin rest. While the screen was flickering, the subjects were asked to report whether a target character (here, X) appeared in the top or bottom string by using the keypad of an external keyboard (pressing “7” for an X at the top, “0” for an X at the bottom, or pressing no key if no X was present). This procedure is referred to as the character task. The target character appeared in one-third of the runs, in either the top or the bottom string but never in both at once.
At the end of each trial, a question mark appeared at the center of the screen for 1000 ms, prompting subjects to report whether the face stimuli had changed during the character task (pressing “0” for no change or “8” for change). This subtask is referred to as the face task. The subjects had three chances to notice a change of the face stimuli, one at each transition between consecutive face screens. A single trial of the change-detection task took 5.0 s. Because the attentional resources of the visual system were divided between the two tasks, occasional failures to detect face changes were expected. In-house Matlab code on a Windows-based laptop computer was used to present the face and character-string stimuli and to record subjects' responses for later off-line analysis.
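The timing above can be checked with a small Python sketch that lays out one trial as a list of screens. It assumes, as the text implies but does not state outright, that the fixation cross stays on throughout and adds no extra time; `trial_timeline` and `change_opportunities` are illustrative names, not part of the original in-house Matlab code.

```python
# Durations (ms) as stated in the task description; the fixation cross is
# assumed to stay on screen throughout and is not timed separately.
SCREEN_MS = 500      # two faces + two three-letter strings
BLANK_MS = 500       # uniform gray blank screen
N_REPEATS = 4        # stimulus/blank pair shown four times
QUESTION_MS = 1000   # question-mark prompt for the face task


def trial_timeline():
    """Return (label, onset_ms, offset_ms) for every screen of one trial."""
    events, t = [], 0
    for i in range(N_REPEATS):
        events.append((f"stimulus_{i + 1}", t, t + SCREEN_MS))
        t += SCREEN_MS
        events.append((f"blank_{i + 1}", t, t + BLANK_MS))
        t += BLANK_MS
    events.append(("question", t, t + QUESTION_MS))
    return events


def change_opportunities(n_repeats=N_REPEATS):
    """A face change can only occur between consecutive stimulus screens."""
    return n_repeats - 1


timeline = trial_timeline()
print(timeline[-1])            # ('question', 4000, 5000)
print(change_opportunities())  # 3
```

The last event ends at 5000 ms, matching the stated 5.0-s trial, and four stimulus screens give three transitions at which a change can occur.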
The face stimuli were changed in two-thirds of all trials. In these face-change trials, either the left or the right stimulus was changed, with equal frequency [referred to as left-change (LC) and right-change (RC) trials, respectively]. In the remaining one-third of trials, the face stimuli did not change [no-change (NC) trials]. The NC trials were included to guard against subjects reporting a change without paying attention to the face stimuli. The sequence of LC, RC, and NC trials was randomized in each session, under the constraint of equal frequencies, so that subjects could not memorize it from previous sessions. In some change trials the subjects correctly reported a change in the face stimuli (defined as successful trials), whereas in others they failed to report a change despite the physical change (defined as unsuccessful trials). Subjects were instructed to report a change in the face task only when they were confident that a change had occurred.
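A minimal sketch of the session-wise randomization, assuming that a plain shuffle of equal numbers of LC, RC, and NC labels is an acceptable way to meet the equal-frequency constraint (the text does not say how the sequences were generated). `balanced_sequence` is a hypothetical helper.

```python
import random
from collections import Counter


def balanced_sequence(n_per_condition, seed=None):
    """Return a shuffled session with equal numbers of LC, RC and NC trials.

    Using a different seed for each session gives each session its own order.
    """
    trials = ["LC", "RC", "NC"] * n_per_condition
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials


session = balanced_sequence(25, seed=1)  # behavioral session: 75 trials
print(Counter(session))                  # 25 trials of each condition
```

Fixing the counts first and then shuffling guarantees exact equality of frequencies in every session, which independent random draws per trial would not.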
A behavioral experiment was conducted prior to the NIRS experiment described below. Its purpose was twofold: to familiarize the subjects with this rather difficult change-detection task and to confirm that the task was difficult enough to induce change blindness in a portion of the trials. One session consisted of 25 NC trials, 25 LC trials, and 25 RC trials (75 trials in total) and took ∼8 min. The intertrial interval was 1.0 s. The seven subjects attended three to seven sessions each (4.0 sessions on average). The subjects' responses in the character task and the face task were recorded for later analysis. During this behavioral experiment, the experimenter closely watched the subjects' eye movements and instructed them to keep fixating on the fixation cross whenever they broke fixation.
Near-infrared spectroscopy experiment
The same change-detection task was used for the NIRS experiment, with a few modifications. First, the intertrial interval was drawn at random from 20 to 25 s so that the hemodynamic responses returned to baseline by the beginning of the next trial.60 Second, to minimize the physical burden and keep each session to a reasonable length, the number of trials was limited to 30 (10 NC, 10 LC, and 10 RC trials). The 30 trials took ∼14 min, and the whole experiment, including NIRS preparation, took ∼30 min. No subjects reported any discomfort either during or after the experimental session.
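The session lengths reported for both experiments follow from the trial duration and the intertrial interval (ITI). The sketch below assumes the ITI is drawn uniformly from 20 to 25 s and that one ITI follows every trial; the helpers are hypothetical.

```python
import random

TRIAL_S = 5.0  # duration of one change-detection trial (s)


def sample_itis(n_trials, low=20.0, high=25.0, seed=None):
    """Draw one intertrial interval per trial uniformly from [low, high] s."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_trials)]


def expected_session_minutes(n_trials, low, high):
    """Expected session length, assuming one ITI follows every trial."""
    mean_iti = (low + high) / 2
    return n_trials * (TRIAL_S + mean_iti) / 60


print(expected_session_minutes(30, 20.0, 25.0))  # NIRS session: 13.75
print(expected_session_minutes(75, 1.0, 1.0))    # behavioral session: 7.5
```

The expected NIRS session of 13.75 min agrees with the reported ∼14 min, and the behavioral session (5.0-s trials plus a 1.0-s ITI) gives 7.5 min, in line with the reported ∼8 min.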
An ETG-7000 (Hitachi Medical Co., Tokyo, Japan) was used for the NIRS measurements and was controlled by the same Windows-based computer used for visual-stimulus presentation and response recording. Sixty probes (32 light-emitting and 28 light-detecting optical fibers) were held on the scalp by four probe-holder sheets, each carrying 15 probes. Each neighboring emitter-detector pair formed a recording channel, resulting in 88 channels in total (Fig. 2). Note that a channel refers to the cortical location lying approximately beneath the midpoint between the corresponding emitter and detector on the scalp.61 The probe-holder sheets were connected to each other with elastic bands. The nasion, inion, and left and right tragi were identified for each subject, and the reference site Cz was determined from these four landmarks according to the international 10–20 system.62 The probe-holder sheets were arranged so that their center of mass was located at Cz. To map the channels to cortical locations, the three-dimensional positions of all probes were measured with a three-dimensional digitizer (ISOTRAK II, Polhemus Corporation, Colchester, Vermont) for two subjects. The three-dimensional location of each channel was taken as the midpoint of the corresponding emitter-detector pair. These locations were first transformed into MNI coordinates using statistical spatial registration63 and then assigned to cortical areas by automatic anatomical labeling.64
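The probe and channel counts can be reconciled with a hypothetical layout in which each sheet is a 3 × 5 checkerboard of alternating emitters and detectors with 30-mm spacing. Neither the grid shape nor the spacing is stated in the text (Fig. 2 is authoritative). Under that assumption each sheet has 8 emitters and 7 detectors (32 and 28 over four sheets) and 22 adjacent emitter-detector pairs (88 channels). Each channel position is the midpoint of its pair, which is also how the digitized 3D positions are combined.

```python
import numpy as np


def checkerboard(rows=3, cols=5):
    """Hypothetical 3 x 5 probe sheet: True = emitter, False = detector."""
    return np.array([[(r + c) % 2 == 0 for c in range(cols)] for r in range(rows)])


def channel_midpoints(layout, spacing_mm=30.0):
    """Midpoints (x, y) in mm of horizontally/vertically adjacent emitter-detector pairs."""
    rows, cols = layout.shape
    mids = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and layout[r, c] != layout[rr, cc]:
                    mids.append(((c + cc) / 2 * spacing_mm, (r + rr) / 2 * spacing_mm))
    return np.array(mids)


def midpoint_3d(emitter_xyz, detector_xyz):
    """Channel position from digitized emitter and detector positions."""
    return (np.asarray(emitter_xyz, float) + np.asarray(detector_xyz, float)) / 2


sheet = checkerboard()
print(sheet.sum(), (~sheet).sum())        # 8 7  (emitters, detectors per sheet)
print(4 * len(channel_midpoints(sheet)))  # 88  (channels for four sheets)
```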
The subjects’ behavioral data and NIRS signal data were analyzed with Matlab (The MathWorks, Natick, Massachusetts). The NIRS signals were preprocessed with POTATo (Platform for Optical Topography Analysis Tools), an analysis package running on Matlab that was developed by one of the authors (T.K.).65 POTATo provides several convenient functions for preprocessing NIRS data. The overall flow of preprocessing and classification of the NIRS data is summarized in Fig. 3.
Behavioral data analysis
Correct responses and reaction times in the behavioral experiment were tabulated for both the character task and the face task. In a single trial of the character task, the subjects had to report the presence or absence of the target character X, and its location, in each of four consecutive screens. A response was defined as successful when the subject correctly reported the location of the X when it was present or pressed no key when no X was present; all other responses were regarded as unsuccessful. The correct-response ratio was computed by dividing the number of successful responses by the number of screens presented. The distribution of reaction times was also computed for successful responses in which the subjects correctly reported the presence of an X.
For the face task, a trial was defined as successful if the subject reported a change when the face stimuli changed or reported no change when they did not. The correct-response ratio was defined as the ratio of successful trials to the total number of trials. The correct-response ratios and response times were computed separately for the NC, LC, and RC trials.
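The scoring rules above can be written as a few small functions. This is a sketch assuming the key mapping described in the task section (“7”/“0” for an X at the top/bottom, “8”/“0” for change/no change). A missing face-task answer is scored as unsuccessful here, so trials answered outside the time limit should be filtered out first if they are to be excluded. All function names are hypothetical.

```python
CHAR_KEYS = {"top": "7", "bottom": "0", None: None}  # None = no X / no key press
CHANGE_KEY, NO_CHANGE_KEY = "8", "0"


def character_response_correct(target_pos, key):
    """target_pos: 'top', 'bottom' or None (no X); key: '7', '0' or None."""
    return key == CHAR_KEYS[target_pos]


def character_ratio(screens):
    """screens: list of (target_pos, key), one entry per screen presented."""
    return sum(character_response_correct(t, k) for t, k in screens) / len(screens)


def face_trial_correct(condition, key):
    """condition: 'LC', 'RC' or 'NC'; key: '8', '0' or None (no answer in time)."""
    if key not in (CHANGE_KEY, NO_CHANGE_KEY):
        return False  # missing answers count as unsuccessful in this sketch
    changed = condition in ("LC", "RC")
    return (key == CHANGE_KEY) == changed


def face_ratio_by_condition(trials):
    """trials: list of (condition, key); correct-response ratio per condition."""
    ratios = {}
    for cond in ("NC", "LC", "RC"):
        keys = [k for c, k in trials if c == cond]
        if keys:
            ratios[cond] = sum(face_trial_correct(cond, k) for k in keys) / len(keys)
    return ratios
```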
Preprocessing of near-infrared spectroscopy data
All analysis procedures were performed off-line. Oxy-hemoglobin concentration signals were used for the analysis because they have been shown to have a high signal-to-noise ratio.66 The NIRS signals were preprocessed as follows. First, to remove noise of extracortical origin, such as pulsation, a temporal moving average over a 3.0-s window was applied.67, 68 Next, a third-order polynomial $a_0 + a_1 t + a_2 t^2 + a_3 t^3$ was fitted by the least-squares method to the NIRS signal of an entire experimental session, and this polynomial component was subtracted from the signal to remove global trends. This polynomial detrending was applied to each channel separately and removed slow drift components from the NIRS signals.
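A minimal NumPy sketch of the two preprocessing steps. It assumes a sampling rate of 10 Hz (not stated in the text) and edge padding at the ends of the moving-average window; POTATo's actual implementation may treat the edges differently.

```python
import numpy as np

FS = 10.0  # assumed sampling rate (Hz)


def moving_average(x, fs=FS, window_s=3.0):
    """Centered moving average over window_s seconds, padding with edge values."""
    x = np.asarray(x, dtype=float)
    w = int(round(window_s * fs))
    padded = np.pad(x, (w // 2, w - 1 - w // 2), mode="edge")
    return np.convolve(padded, np.ones(w) / w, mode="valid")  # same length as x


def cubic_detrend(x, fs=FS):
    """Subtract the least-squares fit a0 + a1 t + a2 t^2 + a3 t^3 over the session."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    coeffs = np.polyfit(t, x, deg=3)
    return x - np.polyval(coeffs, t)


def preprocess(signals, fs=FS):
    """signals: (n_channels, n_samples) oxy-Hb; each channel is processed separately."""
    return np.vstack([cubic_detrend(moving_average(ch, fs), fs) for ch in signals])
```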
The subsequent classification analysis used NIRS signals from combinations of channels and required considerable computational resources. Noisy channels were therefore excluded from the 88 channels if their amplitudes exceeded 0.5. Channels whose power spectra appeared white were also excluded according to the following criterion. Low-frequency components (<3.0 Hz) contain biologically relevant signals, including hemodynamic changes, whereas high-frequency components (>3.0 Hz) can be regarded as biologically irrelevant noise. If the spectral power in the high-frequency band of a NIRS channel is comparable to that in the low-frequency band, the biologically relevant signals in the low-frequency band cannot be distinguished from high-frequency noise. In other words, the ratio of power in the low- and high-frequency bands can be used as a measure of “whiteness.” A t-test was applied to determine whether the spectral power in the low- (0.01–0.5 Hz) and high-frequency (4.0–4.5 Hz) bands of each channel was statistically distinguishable (an indication that the channel's spectrum differed from a white spectrum). A channel was used for further analysis if the p-value of this t-test was smaller than $1 \times 10^{-10}$ and was excluded otherwise. This threshold was determined empirically by visual inspection of the NIRS data in this study; a more objective criterion for determining the threshold is being developed.
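The channel-screening rule can be sketched with SciPy. Several details are assumptions rather than statements from the text: the sampling rate (10 Hz), Welch's method with 1024-sample segments to estimate the power spectrum, a pooled two-sample t-test on the power values in the two bands, and “amplitude” taken as the largest deviation from the channel mean (the text gives no unit for the 0.5 limit).

```python
import numpy as np
from scipy import signal, stats

FS = 10.0  # assumed sampling rate (Hz); must exceed 9 Hz to reach the 4.0-4.5 Hz band


def channel_is_usable(x, fs=FS, amp_limit=0.5, p_threshold=1e-10,
                      low_band=(0.01, 0.5), high_band=(4.0, 4.5)):
    """Return (usable, p_value) for one channel's oxy-Hb time series."""
    x = np.asarray(x, dtype=float)
    # Criterion 1: the largest deviation from the mean must not exceed the limit.
    if np.max(np.abs(x - x.mean())) > amp_limit:
        return False, float("nan")
    # Criterion 2: PSD values in the two bands must differ (spectrum not white).
    freqs, psd = signal.welch(x, fs=fs, nperseg=1024)
    low = psd[(freqs >= low_band[0]) & (freqs <= low_band[1])]
    high = psd[(freqs >= high_band[0]) & (freqs <= high_band[1])]
    p = stats.ttest_ind(low, high).pvalue
    return bool(p < p_threshold), float(p)


def usable_channels(signals, **kwargs):
    """signals: (n_channels, n_samples); indices of channels that pass both criteria."""
    return [i for i, ch in enumerate(signals) if channel_is_usable(ch, **kwargs)[0]]
```

A white-noise channel has similar power in both bands and is rejected, whereas a channel dominated by slow (<0.5 Hz) fluctuations yields a very small p-value and is kept.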
Classification analysis by support-vector machine
A variety of machine-learning algorithms have been applied to the classification of neuroimaging data, for example, minimum-distance classification,69 Fisher's linear discriminant analysis (LDA),70 support-vector machines (SVMs),71 and hidden Markov models.52 In this study, an SVM with a linear kernel72 was adopted for the binary classification problem.
To exploit the temporal resolution of NIRS, classification probabilities of successful and unsuccessful trials were computed in time steps of 0.1 s (Fig. 4). First, a combination of channels was chosen, NIRS signals were extracted from all face-change trials, and each trial was labeled as a change-detected (successful) or change-undetected (unsuccessful) trial [green and red lines, respectively, in Fig. 4a]. The signals were then averaged within a temporal window [blue-shaded areas in Fig. 4a] to give points in a multidimensional space whose dimension equals the number of channels [Fig. 4b]. The width of the temporal window was fixed at 3.0 s, and the onset of the window was varied from −5.0 to 13.0 s in steps of 0.1 s (0 s was defined as the onset of the task). The SVM classification algorithm was then applied to these points [Fig. 4b]. To evaluate how well the data points could be classified, twofold cross-validation was used [Figs. 4c and 4d]. Approximately half of the data points were randomly selected as training data [green and red points in Fig. 4c], and the rest were held out as test data [gray points in Fig. 4c]. An SVM decision boundary was computed using only the training data [blue dashed line in Fig. 4c]. The performance of the decision boundary was evaluated by applying it to the test data [green and red points in Fig. 4d] and computing the probability that the test data were correctly classified (defined as the classification probability). This cross-validation procedure [Figs. 4c and 4d] was repeated 30 times with randomly chosen training and test data in order to compute the mean classification probability and its confidence intervals. The SVM classification was performed on a subject-by-subject basis: a decision boundary was determined from a subject's data, and its performance was evaluated with the same subject's data.
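The sliding-window classification can be sketched with scikit-learn's `SVC` using a linear kernel. The sketch follows the procedure as described (random half split, training on one half and testing on the other, repeated 30 times) and assumes the default regularization parameter `C=1`, which the text does not specify. The data layout (trials × channels × samples, with time 0 at task onset) and the function names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def window_features(signals, times, onset, width=3.0):
    """Average each channel over the window [onset, onset + width) s.

    signals: (n_trials, n_channels, n_samples); times: (n_samples,) in s, 0 = task onset.
    """
    mask = (times >= onset) & (times < onset + width)
    return signals[:, :, mask].mean(axis=2)


def classification_probability(X, y, n_repeats=30, seed=0):
    """Mean and SD of test accuracy over repeated random half splits (linear SVM)."""
    y = np.asarray(y)
    if len(np.unique(y)) < 2:
        raise ValueError("both successful and unsuccessful trials are required")
    rng = np.random.default_rng(seed)
    half, scores = len(y) // 2, []
    while len(scores) < n_repeats:
        idx = rng.permutation(len(y))
        train, test = idx[:half], idx[half:]
        if len(np.unique(y[train])) < 2:
            continue  # resample: the training half must contain both classes
        clf = SVC(kernel="linear").fit(X[train], y[train])
        scores.append(clf.score(X[test], y[test]))
    return float(np.mean(scores)), float(np.std(scores))


def probability_profile(signals, times, y, pair, onsets):
    """Classification probability of one channel pair for each window onset."""
    sub = signals[:, list(pair), :]
    return np.array([classification_probability(window_features(sub, times, t), y)[0]
                     for t in onsets])


# Window onsets from -5.0 to 13.0 s in 0.1-s steps, as in the text.
ONSETS = np.round(np.arange(-50, 131) * 0.1, 1)
```

Running `probability_profile` over all `ONSETS` for every channel pair is what makes the exhaustive search expensive: each pair requires 181 × 30 SVM fits.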
Number of channels used for classification analysis
Before applying the classification procedure described earlier, it was necessary to determine how many channels (i.e., feature dimensions) should be used. In general, a classification problem has an optimal feature dimension. Too few dimensions may not provide sufficient information for classifying change-detected and change-undetected trials, whereas too many incur the risk of overfitting the training data; both result in poor classification of the test data. Moreover, the number of possible channel combinations grows rapidly with the feature dimension, so it is desirable to keep the dimension as small as possible to save computation time. For the classification problem stated in Sec. 2.2.3, the classification performance obtained with single channels, channel pairs, and channel triplets was compared using the test data of Subject 1. Once the number of channels was determined, the same number was used for the data of the other subjects.
Clustering analysis of temporal classification probabilities
To identify typical temporal profiles of classification probabilities, an unsupervised clustering algorithm was applied to the classification probabilities computed by the SVM method (Sec. 2.2.4). The k-means clustering algorithm was adopted, with squared Euclidean distance as the similarity measure.73 The number of clusters was adjusted by visual inspection.
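Clustering the temporal profiles can be sketched with scikit-learn's `KMeans`, which minimizes within-cluster squared Euclidean distance. The number of clusters is left as a parameter, just as it was chosen by inspection in the text; `cluster_profiles` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_profiles(profiles, n_clusters=2, seed=0):
    """profiles: (n_pairs, n_time_steps) classification probabilities.

    Returns one cluster label per channel pair and the mean profile of each cluster.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.asarray(profiles, dtype=float))
    return labels, km.cluster_centers_
```

Each cluster center is itself a temporal profile, so a cluster whose center peaks near task onset and one whose center peaks after task completion can be read off directly.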
Most subjects reported that the task was difficult at first because of its tight response-time requirement and the unfamiliar method of reporting responses, and that these difficulties disappeared after a few training sessions. The seven subjects attended 28 sessions in total (i.e., 75 × 28 = 2100 trials) of the behavioral experiment (4.0 sessions per subject on average). For the character task, the correct-response ratio across all seven subjects was 95.4%, indicating that they performed the task almost perfectly. The mean reaction time when subjects reported an X was 577 ms (SD 127 ms).
For the face task, the subjects responded within the response time limit (1.0 s) in 97.05% of all trials; only these trials were used in the following behavioral analysis. Reaction times had a mean of 436 ms and a standard deviation of 165 ms. Figure 5a plots the distribution of reaction times for all trials of the seven subjects. Reaction times for the NC, LC, and RC trials were 445 ms (SD 170 ms), 431 ms (SD 167 ms), and 432 ms (SD 160 ms), respectively, with no statistically significant difference among the three conditions [one-way analysis of variance (ANOVA); F(2,2035) = 1.32, p = 0.266].
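The degrees of freedom of the ANOVA follow from the trial counts: 97.05% of the 2100 trials leaves 2038 trials in three groups, giving F(2, 2035). A short sketch using `scipy.stats.f_oneway`; the helper names are hypothetical, and the reaction-time data themselves are not reproduced here.

```python
from scipy import stats


def anova_dof(n_trials, fraction_valid, n_groups=3):
    """Degrees of freedom (between, within) of a one-way ANOVA on the valid trials."""
    n_valid = round(n_trials * fraction_valid)
    return n_groups - 1, n_valid - n_groups


def reaction_time_anova(rt_nc, rt_lc, rt_rc):
    """One-way ANOVA across NC, LC and RC reaction times (ms)."""
    return stats.f_oneway(rt_nc, rt_lc, rt_rc)


print(anova_dof(2100, 0.9705))  # (2, 2035)
```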
The correct-response ratio for the face task was 54.5%, indicating that the task difficulty was properly adjusted. Interestingly, the success rates differed between the LC and RC trials: the subjects detected visual changes more often in the left visual hemifield [15.5 (SD 4.57) trials] than in the right [13.3 (SD 5.33) trials] [Fig. 5b]. This difference was statistically significant [unpaired two-sided t-test; t(54) = 2.82, p = 0.0066]. This left-right asymmetry is consistent with a clinical study showing right-hemisphere (i.e., left-visual-hemifield) superiority in a face match-to-sample task in split-brain patients74 and with imaging studies showing right-hemisphere dominance of face-processing activity.75, 76 The false-positive rate (reporting a change when there was none) was 6.4%.
Analysis of Near-Infrared Spectroscopy Signals
The correct-response ratios of the five subjects are summarized in Table 1. Four of them had correct-response ratios in the range of 43.8–66.7%, and their NIRS signals were classified according to the procedure described in Sec. 2. Subject 5, however, had a high correct-response ratio (83.3%), with far fewer undetected than detected trials, which left too few unsuccessful trials for classification; this subject was therefore excluded from further analysis. For the other four subjects, 70 channels on average (out of 88) were used (Table 1) after excluding channels considered not to reflect cortical activity under the criteria described in Sec. 2.2.2.
Table 1. Summary of behavioral and NIRS data used for the classification analysis.

| Subject | No. 1 | No. 2 | No. 3 | No. 4 | No. 5 |
|---|---|---|---|---|---|
| Correct response (%) | 43.8 | 43.8 | 62.5 | 66.7 | 83.3 |
| No. NIRS channels used | 60/88 | 56/88 | 88/88 | 76/88 | N/A |
| Maximum classification probability (%) | 86.4 ± 2.4 | 89.4 ± 3.2 | 77.3 ± 1.5 | 84.8 ± 3.0 | N/A |
| Most informative channel(s) | 33 (left angular gyrus) | 35 (left occipital area) | 20 (right occipital area) | 32, 36 (left posterior parietal association cortex) | N/A |
Optimal number of channels necessary for classification algorithm
To determine the optimal number of channels, classification probabilities were evaluated for single channels (60), channel pairs ($\binom{60}{2} = 1770$), and channel triplets ($\binom{60}{3} = 34{,}220$) using the response data of Subject 1. The computations for these three cases took 15 min, 7.2 h, and five days, respectively, on a Windows-based computer (Intel Core2Duo, 3.0 GHz). Figure 6 summarizes, for each case, 50 classification-probability time courses that varied markedly over time. The averages of the maximum classification probabilities of these 50 combinations were 77% (SD 6.7%), 86% (SD 3.1%), and 87% (SD 2.1%) for single channels, pairs, and triplets, respectively. The classification probabilities for channel pairs were considerably higher than those for single channels. In contrast, using channel triplets instead of pairs changed the classification probabilities little, indicating that channel pairs were sufficient for classifying successful and unsuccessful trials. Moreover, an exhaustive search over channel triplets was prohibitively time consuming for analyzing the data of multiple subjects. Channel pairs were therefore used in the analyses of the other subjects.
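The combination counts and a rough runtime extrapolation can be checked with the standard library. The extrapolation assumes the cost per channel combination is constant (15 min / 60 single channels = 0.25 min each); it gives about 7.4 h for pairs and about 5.9 days for triplets, the same order as the reported 7.2 h and five days.

```python
from math import comb


def extrapolated_hours(n_channels, k, minutes_for_singles=15.0):
    """Runtime for all k-channel combinations, assuming a constant cost per combination."""
    per_combination = minutes_for_singles / n_channels
    return comb(n_channels, k) * per_combination / 60


for k in (1, 2, 3):
    print(k, comb(60, k))                        # 1 60 / 2 1770 / 3 34220
print(extrapolated_hours(60, 2))                 # 7.375 (reported: 7.2 h)
print(round(extrapolated_hours(60, 3) / 24, 1))  # 5.9 days (reported: five days)
```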
Distribution of classification probabilities
Moment-by-moment classification probabilities were computed using all possible channel pairs for the four subjects. First, the temporal window was fixed, and the distributions of classification probabilities over all channel pairs were examined. The red histograms in Fig. 7 show these distributions for Subject 1 during the task period (0–3 s) [Fig. 7a] and 5 s after task completion (10–13 s) [Fig. 7b]. The distribution in Fig. 7a had a mean of 51.6% and a standard deviation of 7.3%, whereas that in Fig. 7b had a mean of 61.3% and a standard deviation of 10.1%. Figures 7c and 7d illustrate the NIRS signals of the channel pairs that yielded the maximum and a 50% classification probability, respectively, in the distribution of Fig. 7b.
Note that the distributions computed using randomly relabeled surrogate data [blue histograms in Figs. 7a and 7b] were sharply peaked and did not change between the task and posttask periods. The differences between the histograms computed from the subjects’ responses and from the surrogate data were statistically significant [Mann–Whitney U-test; $p = 8.9 \times 10^{-4}$ for Fig. 7a and $p = 6.2 \times 10^{-128}$ for Fig. 7b]. This analysis confirmed that the high posttrial classification probabilities could not be attributed to chance.
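The surrogate comparison can be sketched as follows. The trial labels are shuffled once (keeping the numbers of successful and unsuccessful trials). Classification accuracies are then computed for every channel pair with the true and the shuffled labels, and the two distributions are compared with a two-sided Mann–Whitney U-test. The sketch uses fewer repeats than the 30 in the text to stay fast and assumes the features have already been averaged within one time window; all names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.svm import SVC


def holdout_accuracy(X, y, rng, n_repeats=5):
    """Mean test accuracy of a linear SVM over repeated random half splits."""
    if len(np.unique(y)) < 2:
        raise ValueError("both classes are required")
    half, accs = len(y) // 2, []
    while len(accs) < n_repeats:
        idx = rng.permutation(len(y))
        train, test = idx[:half], idx[half:]
        if len(np.unique(y[train])) < 2:
            continue  # the SVM needs both classes in the training half
        clf = SVC(kernel="linear").fit(X[train], y[train])
        accs.append(clf.score(X[test], y[test]))
    return float(np.mean(accs))


def pair_accuracies(features, y, seed=0):
    """features: (n_trials, n_channels) window averages; one accuracy per channel pair."""
    rng = np.random.default_rng(seed)
    pairs = combinations(range(features.shape[1]), 2)
    return np.array([holdout_accuracy(features[:, list(p)], y, rng) for p in pairs])


def surrogate_test(features, y, seed=0):
    """Compare accuracies for the true labels with those for randomly relabeled trials."""
    rng = np.random.default_rng(seed)
    y_surrogate = rng.permutation(y)  # same class counts, labels shuffled across trials
    real = pair_accuracies(features, y, seed)
    surrogate = pair_accuracies(features, y_surrogate, seed + 1)
    return real, surrogate, mannwhitneyu(real, surrogate, alternative="two-sided")
```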
Temporal profiles of classification probabilities
Two general types of temporal profiles were found, which we call postdictive and predictive. The postdictive profile showed a plateau before and during the task period, a gradual increase upon task completion, and a peak ∼5 s after task completion. Figure 8a illustrates a representative profile computed from channel pair (25, 33) of Subject 1 [for the corresponding images of the NIRS signal, see Appendix (Fig. 11 and 1)]. By classifying the signals of this channel pair, it was possible to determine whether this subject had noticed a face change in the change trial that had just finished. We therefore interpreted this temporal profile as reflecting the success or failure of a trial.
The predictive type of temporal profile was found for Subject 2 [Fig. 8b]. The classification probability began to increase 3.0 s before task initiation, reached a maximum around the time of task initiation, and then decreased to approximately the correct-response ratio [for the corresponding snapshots of the NIRS signal, see Appendix (Fig. 12 and 1)]. By classifying the signals of this channel pair, it was possible to predict, before the trial was completed, whether the subject would notice a face change. The signals from this channel pair were therefore interpreted as predictive of the success or failure of a trial.
Analysis of four subjects
We performed the same classification procedure exhaustively on all possible channel pairs (1770, 1540, 3828, and 2850 pairs for Subjects 1–4, respectively, given the channel counts in Table 1). Because the temporal dynamics of the classification probabilities were the focus of interest, the 50 channel pairs with the largest temporal amplitudes of classification probability (i.e., maximum minus minimum probability in each temporal profile) were selected. Figure 9 summarizes the results for Subjects 1 and 3. The classification probabilities in Fig. 9a (Subject 1) had postdictive temporal profiles similar to the one shown in Fig. 8a. The channel pairs from which these probabilities were computed are shown in Fig. 9b. We then looked for the most informative channels, or hubs, defined as the NIRS channels that appeared most frequently in the 50 channel pairs. Channel 33 (blue circle), located in the left angular gyrus, appeared in 32 of the 50 channel pairs and thus contributed most to the postdictive classification. The results for Subject 3 are summarized in Figs. 9c and 9d and show characteristics similar to those of Subject 1. The most informative channel was channel 35 (blue circle), located in the left occipital area, which appeared in 24 channel pairs. For both subjects, pairs containing channels in the left temporoparietal or occipital areas gave higher classification probabilities, and the most informative channels tended to be paired with channels in the frontal lobe [Figs. 9b and 9d].
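Selecting the 50 pairs with the largest temporal amplitude and finding the hub channel amounts to a sort and a frequency count. A minimal sketch, with hypothetical function names:

```python
from collections import Counter

import numpy as np


def top_pairs_by_amplitude(profiles, pairs, n_top=50):
    """profiles: (n_pairs, n_time_steps) probabilities; pairs: list of channel tuples.

    The amplitude of each profile is its maximum minus its minimum.
    """
    amplitude = profiles.max(axis=1) - profiles.min(axis=1)
    order = np.argsort(amplitude)[::-1][:n_top]
    return [pairs[i] for i in order]


def most_informative_channel(top_pairs):
    """Return (channel, number of selected pairs that contain it)."""
    counts = Counter(ch for pair in top_pairs for ch in pair)
    return counts.most_common(1)[0]
```

For Subject 1, for example, this count is what identifies channel 33, which appears in 32 of the 50 selected pairs.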
Figure 10a shows predictive temporal profiles of classification probabilities computed from 50 channel pairs of Subject 2. Most profiles reached their maximum before task completion, thereby predicting whether the subject would report the presence or absence of a face change. Pairs containing channels in the right temporoparietal areas contributed to these high classification probabilities [Fig. 10b]. Channel 20 (at the right occipito-temporal junction), which appeared in 34 pairs, was the most informative for this predictive classification. Subjects 1–3 had only predictive or only postdictive temporal profiles. Interestingly, Subject 4 had both predictive and postdictive components [Fig. 10c]. Channel pairs that included posterior-parietal channels contributed to both predictive and postdictive classification [light blue and purple curves for postdictive and predictive classification, respectively, Fig. 10d]. Channel 32 (in the posterior parietal area) appeared in 34 pairs and was the most informative for postdictive classification (blue circle), and channel 36 (in the postcentral area) appeared in 16 pairs and was the most informative for predictive classification (red circle).
The above analysis, which focused on channel pairs whose classification probabilities had large temporal amplitudes, found either predictive or postdictive components. When the clustering analysis was performed on the classification probabilities computed from all possible channel pairs, three of the four subjects (Subjects 1–4) were found to have both predictive and postdictive components [see Appendix (Fig. 13)].
This study demonstrated that moment-by-moment classification probabilities could be computed from NIRS signals measured in a change-detection task. The NIRS signals provided the temporal dynamics and the most informative cortical locations simultaneously for classifying whether subjects noticed a change in a visual scene.
Postdictive and Predictive Classification Probabilities
The classification probabilities were of two distinct types: postdictive and predictive. The postdictive type [Figs. 8a and 9] remained at ∼50% before and during the task, began to increase at task completion, and peaked approximately 5 s after task completion. These postdictive classification probabilities arose mainly from pairs combining one of the most informative channels in the parietal, temporal, or occipital lobe with a channel in the frontal lobe. The predictive type [Figs. 8b and 10] predicted performance in the upcoming trial immediately before and around task onset. Combinations of frontal and temporoparietal channels contributed most to this type of classification probability.
Although we did not initially expect to find a predictive component in the change-detection experiment, such a component is not surprising in retrospect, because recent fMRI studies reported that BOLD signals immediately before task initiation predicted task performance, such as the magnitude of force production32 and sensitivity to somatosensory stimuli.77 Another fMRI study reported brain activity that began to evolve gradually as early as 30 s before subjects made errors in a flanker task.78 Moreover, hemodynamic signals recorded simultaneously with electrophysiological signals revealed an anticipatory component predicting upcoming sensory stimuli.79 The contribution of NIRS signals from the temporoparietal and frontal cortices to the predictive classification probabilities is in general agreement with these fMRI studies.
There are, in general, two types of visual attention:80, 81 bottom-up, sensory-driven attention, which enhances a stimulus whose features differ from those of surrounding stimuli, and top-down, behavior-driven attention, which enhances a stimulus of behavioral relevance. It is tempting to speculate that the predictive and postdictive types we found correspond to these two types of visual attention. However, each component admits equally plausible alternative interpretations. If the predictive component resulted from the allocation of attentional resources to the face task, it could be related to top-down attention; if it instead reflected enhanced processing of the sensory stimulus, it could be attributed to bottom-up attention. Likewise, the postdictive component could be due to either top-down or bottom-up attention. It might reflect a process in which top-down attention was covertly drawn to face changes that happened to be found, or it might arise from bottom-up attention captured by a novel event such as a face change. Designing an experiment that dissociates top-down and bottom-up contributions to the predictive and postdictive components will be of neuroscientific interest.
Statistical Validation of Decoding Results
A recent NIRS study reported that subjective preference for beverages could be decoded on a trial-by-trial basis with a probability of ∼80% by using analysis and classification methods similar to ours.54 A subsequent commentary pointed out that high classification probabilities can arise from random data containing no information about actual preference, and it questioned whether the high classification probability in that study might be a statistical coincidence resulting from choosing “the best feature” out of a large number of features.82 We emphasize that this methodological criticism does not apply to our study. We demonstrated that the distribution of classification probabilities shifted from the task period to the posttask period (Fig. 7); if no information distinguishing change-detected from change-undetected trials were encoded, such a shift would not occur. Moreover, the distributions of classification probabilities computed from surrogate data (i.e., with successful and unsuccessful trial labels randomly reassigned) differed significantly from those computed from the subjects’ actual responses. We thus concluded that the structured temporal profiles of classification probabilities in Figs. 8, 9, 10 reflected the subjects’ performance.
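The commentary's concern can be addressed directly with a label-permutation (surrogate) test in which “the best feature” is re-selected on every surrogate data set, so that the selection step itself is built into the null distribution. The Python sketch below illustrates this max-statistic variant under simplifying assumptions (one-dimensional features, a nearest-class-mean classifier, 99 permutations, synthetic data); it is a hypothetical illustration of the technique, not necessarily the exact procedure used in this study.

```python
import random


def loo_accuracy_1d(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule on one feature."""
    correct = 0
    for i, (v, y) in enumerate(zip(values, labels)):
        sums, counts = {}, {}
        for j, (w, lab) in enumerate(zip(values, labels)):
            if j != i:
                sums[lab] = sums.get(lab, 0.0) + w
                counts[lab] = counts.get(lab, 0) + 1
        means = {lab: sums[lab] / counts[lab] for lab in sums}
        pred = min(means, key=lambda lab: abs(v - means[lab]))
        correct += pred == y
    return correct / len(values)


def best_feature_accuracy(features, labels):
    """features[f][k]: value of feature f on trial k. Returns the best LOO accuracy."""
    return max(loo_accuracy_1d(f, labels) for f in features)


def max_stat_permutation_test(features, labels, n_perm=99, seed=0):
    """Compare the observed best-feature accuracy with surrogate data sets in
    which trial labels are randomly reassigned and the best feature is re-selected."""
    rng = random.Random(seed)
    observed = best_feature_accuracy(features, labels)
    null = []
    for _ in range(n_perm):
        surrogate = labels[:]
        rng.shuffle(surrogate)
        null.append(best_feature_accuracy(features, surrogate))
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, null, p_value


# Hypothetical synthetic data: ten candidate features, only feature 0 is informative.
rng = random.Random(1)
labels = [1] * 15 + [0] * 15
features = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(10)]
features[0] = [rng.gauss(3.0 * y, 1.0) for y in labels]
observed, null, p_value = max_stat_permutation_test(features, labels)
```

Because both the observed statistic and every surrogate statistic take the maximum over all candidate features, a lucky pick of a single feature is accounted for in the null distribution rather than inflating the apparent accuracy.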
Possible Applications toward Brain-Machine Interfaces
According to a recent study,83 NIRS can be an appropriate substitute for fMRI across multiple cognitive tasks, although care should be taken regarding its lower spatial resolution and weaker signal-to-noise ratio. NIRS has important advantages for BMIs, such as fewer physical constraints and relatively low cost.37 This opens up the alternative possibility of monitoring an operator's latent cognitive states from NIRS measurements in real-world settings. Our successful classification of detected versus undetected changes suggests, for example, an interface that monitors an operator's performance and attentive state from NIRS signals. In addition, our analysis of how many channels were needed to decode awareness of visual changes revealed that a small number of channels sufficed if their locations were carefully chosen. It is therefore possible to build a compact, portable device84, 85 for NIRS-based decoding.
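The search for a compact montage can be sketched as an exhaustive evaluation of all channel pairs at a single time point, ranked by cross-validated accuracy. In this Python sketch, the nearest-class-mean classifier, the leave-one-out scheme, the synthetic eight-channel data, and the function names are assumptions for illustration. A real montage search would also need to guard against the selection bias discussed under Statistical Validation of Decoding Results.

```python
import itertools
import math
import random


def loo_nearest_mean(points, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        means = {}
        for cls in set(labels):
            pts = [p for j, (p, lab_j) in enumerate(zip(points, labels))
                   if lab_j == cls and j != i]
            if pts:  # a class may vanish if it had a single trial
                means[cls] = [sum(col) / len(pts) for col in zip(*pts)]
        correct += min(means, key=lambda cls: math.dist(x, means[cls])) == y
    return correct / len(points)


def rank_channel_pairs(trial_features, labels):
    """trial_features[k][ch]: oxy-Hb of trial k on channel ch at one time point.

    Returns a list of (accuracy, (ch_a, ch_b)) for every channel pair, best first.
    """
    n_channels = len(trial_features[0])
    scores = []
    for pair in itertools.combinations(range(n_channels), 2):
        points = [[f[ch] for ch in pair] for f in trial_features]
        scores.append((loo_nearest_mean(points, labels), pair))
    return sorted(scores, reverse=True)


# Hypothetical synthetic data: 8 channels, only channels 2 and 5 are informative.
rng = random.Random(2)
labels = [1] * 20 + [0] * 20
trial_features = [
    [rng.gauss(2.0 * y if ch in (2, 5) else 0.0, 1.0) for ch in range(8)]
    for y in labels
]
ranking = rank_channel_pairs(trial_features, labels)
best_accuracy, best_pair = ranking[0]
```

Exhaustive search is affordable here because the number of pairs grows only quadratically with the number of channels; larger subsets would call for a greedy or regularized selection instead.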
Limitations of Current Study
Despite its success in classifying successful and unsuccessful trials, the current study has several limitations. First, we could not monitor the subjects’ eye movements because no gaze-tracking instrument was available. Classification performance might be improved by removing trials in which overt eye movements occurred. Second, although most subjects exhibited both predictive and postdictive types, their proportions differed considerably: some subjects had strong postdictive and weak predictive components, whereas others showed weak postdictive and strong predictive components. What caused this difference is unclear at present. Third, although we showed that two channels sufficed for successful classification, the locations of the most informative channels varied from subject to subject; it would be desirable to optimize the number of required channels on a subject-by-subject basis. In addition, we used only instantaneous oxy-hemoglobin signals, whereas a recent study suggested that decoding performance can be improved by taking into account the history or gradients of NIRS signals.86 A thorough search for an optimal set of variables should be performed to improve the performance and robustness of decoding. Lastly, the current study employed a small number of subjects, two of whom were the authors themselves and were aware of the aim of the study. We tried to minimize the confounding risk of using the authors as subjects by randomizing trial sequences session by session so that they could not anticipate which trial type would come next. However, the authors knew the proportions of the three trial types; thus, we cannot exclude the possibility that they implicitly made statistical guesses about trial types.
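The suggestion to go beyond instantaneous oxy-hemoglobin values can be illustrated by augmenting each sample with a causal gradient and a short moving-average history. The Python sketch below is hypothetical: the feature set, the default window length, and the name `temporal_features` are illustrative assumptions and are not taken from Ref. 86.

```python
def temporal_features(signal, dt, window=3):
    """Augment an oxy-Hb time series with causal temporal features.

    signal: samples of one channel
    dt: sampling interval in seconds
    window: number of most recent samples averaged as the "history" feature
    Returns, per sample, [instantaneous value, backward-difference gradient,
    mean of the last `window` samples]. Only past and present samples are
    used, so the features could in principle be computed online.
    """
    feats = []
    for t, value in enumerate(signal):
        # No earlier sample exists at t = 0, so the gradient is set to zero there.
        grad = (value - signal[t - 1]) / dt if t > 0 else 0.0
        start = max(0, t - window + 1)
        history = sum(signal[start:t + 1]) / (t + 1 - start)
        feats.append([value, grad, history])
    return feats


ramp = [0.0, 0.5, 1.0, 1.5, 2.0]  # toy signal rising 0.5 per sample
feats = temporal_features(ramp, dt=0.1)
```

Keeping the features causal matters for a brain-machine interface: a central difference would use the next sample and therefore could not be computed at the moment a decision is needed.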
To overcome these limitations, it would be worth extending the current study by recruiting a larger number of subjects. By inspecting a larger data set, we expect to identify which variables play a dominant role in boosting decoding performance and what determines the relative strengths of the predictive and postdictive components. Another interesting direction is to decode one subject's state by using a classifier trained on other subjects' NIRS data. This would eliminate the need for a subject-specific training session, which is of practical convenience for NIRS-based brain-machine interfaces. With a few exceptions,87 this approach has not yet been examined. We will pursue these lines of research in future work.
Appendix A: Temporal Snapshots of Near-Infrared Spectroscopy Signals
Figures 11 and 12 demonstrate typical examples of NIRS signals that were used to compute the classification probabilities in Fig. 8. These snapshots were computed from –5.0 to +8.0 s in steps of 1.0 s. The green and red circles in each snapshot denote NIRS signals from successful (change detected) and unsuccessful (change undetected) trials (see 1).
Appendix B: Coexisting Predictive and Postdictive Components
In Figs. 9 and 10, only the 50 temporal profiles of classification probabilities with the largest amplitudes were analyzed; for Subjects 1–3, either a predictive or a postdictive component was found, and both components were found only for Subject 4. Here we show that, when all possible profiles were analyzed, Subjects 1, 3, and 4 exhibited both predictive and postdictive components.
The k-means clustering algorithm was applied to all possible temporal profiles of classification probabilities, with the number of clusters set to 3 (k = 3). In Fig. 13a, the representative profiles of the three clusters are depicted in three colors (light blue, magenta, and black, in descending order of temporal amplitude). For Subjects 1 and 3, the largest-amplitude components (light blue) peaked after task completion. The second-largest-amplitude components (magenta) had two peaks, one before and one after task completion, indicating that they contained both predictive and postdictive components. Interestingly, the cortical locations of the light-blue and magenta components overlapped considerably. The smallest-amplitude components (black) were almost flat, indicating that they were unrelated to subjective visual experience. The same trend was observed for Subject 4, who had already shown both predictive and postdictive components in Fig. 10. The three temporal profiles computed from Subject 2's data peaked only in the task period, indicating that only predictive components were found for Subject 2. Figure 13c depicts the ratios of the three components; for Subjects 1, 3, and 4, the postdictive components were dominant.
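The clustering step can be sketched in Python as plain k-means (Lloyd's algorithm) with k = 3 on the temporal profiles, followed by ordering the clusters by temporal amplitude and computing the fraction of profiles in each cluster, analogous to the ratios in Fig. 13c. The farthest-first initialization, the peak-to-trough definition of temporal amplitude, and the synthetic profiles (with hypothetical peak times) are assumptions made for this illustration, not details taken from the study.

```python
import math
import random


def kmeans(profiles, k=3, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(profiles))]
    while len(centroids) < k:
        # Next seed: the profile farthest from all centroids chosen so far.
        far = max(profiles, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(n_iter):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in profiles]
        if new_assign == assign:
            break  # assignments are stable: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(profiles, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return centroids, assign


def summarize_clusters(profiles, k=3, seed=0):
    """Return centroids in descending order of peak-to-trough amplitude,
    together with the fraction of profiles assigned to each cluster."""
    centroids, assign = kmeans(profiles, k, seed=seed)
    amplitude = [max(c) - min(c) for c in centroids]
    order = sorted(range(k), key=lambda c: amplitude[c], reverse=True)
    ratios = [assign.count(c) / len(profiles) for c in order]
    return [centroids[c] for c in order], ratios


def bump(t, center, height):
    """Gaussian-shaped bump used to build hypothetical profiles."""
    return height * math.exp(-((t - center) ** 2) / 2)


# Hypothetical synthetic profiles sampled from -5 s to +8 s.
times = list(range(-5, 9))
rng = random.Random(3)
profiles = []
for _ in range(30):
    profiles.append([0.5 + bump(t, 5, 0.3) + rng.gauss(0, 0.01) for t in times])  # postdictive
    profiles.append([0.5 + bump(t, 0, 0.15) + bump(t, 5, 0.15) + rng.gauss(0, 0.01)
                     for t in times])  # both components
    profiles.append([0.5 + rng.gauss(0, 0.01) for t in times])  # flat
centroids, ratios = summarize_clusters(profiles)
```

Farthest-first seeding is used here only because it makes the toy example deterministic and avoids two seeds landing in the same cluster; k-means results on real data can depend on initialization, so several restarts are advisable.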
We thank Kyoko Yamazaki for assisting with the experiments, Hiroki Sato for his advice on NIRS experiments and analyses, Hiroshi Imamizu for commenting on a previous version of the manuscript, and Hideaki Koizumi for his continuous encouragement.