Functional near-infrared spectroscopy (fNIRS) is a convenient, noninvasive tool for assessing cortical hemodynamics in various clinical and neuropsychological situations.12.3.–4 The first noninvasive monitoring of brain hemodynamics during hyperventilation1 was realized by Jobsis.5 One and a half decades later, fNIRS developed into a genuinely “functional” measurement tool allowing the assessment of human cognitive and visual functions.67.8.–9 As the term “spectroscopy” implies, fNIRS started as a single-channel measurement. The number of channels increased to a few that were distantly placed on the scalp so as to avoid light interference between neighboring channels.10 Later, the problem of light interference was solved by adopting a frequency division multiple access technique, and thus multichannel fNIRS using an array of optodes was invented to simultaneously monitor multiple brain regions.11 Multichannel fNIRS basically generates discrete channel data. This also enables interpolation from neighboring channels to fill the interchannel space, called optical topography, to produce two-dimensional continuous images of cortical activation.11 Moreover, in diffuse optical tomography (DOT), a combination of short- and long-distance channels utilizes depth resolution to reconstruct a continuous image, providing more accurate source estimations in three-dimensional space.2,1213.–14
Since the invention of multichannel fNIRS, the number of channels has increased to a few dozen in typical studies or to over 100 for whole-head measurements.15 With increasing multiplicity in fNIRS channels, we have to be aware of the increasing importance of controlling Type I errors (or false positives). In the fNIRS scheme, this corresponds to the risk of falsely treating nonactivated channels as activated channels. In a single-channel measurement, the issue is whether or not the channel is activated: there is only one hypothesis. In a multiple-channel measurement, the number of hypotheses increases up to the number of channels. Such multiple comparisons entail increased risk of Type I errors, which should be controlled as family-wise errors (FWEs). Unlike the experimental-wise error rate control often used in psychological studies, where a large number of possible comparisons are conducted, FWE rate control deals with a limited number of comparisons that are necessary to test hypotheses, which is to say the number of channels.16
The FWE in multichannel fNIRS studies is typically controlled using one of the two different approaches. When a continuous image is constructed, it is often treated using a random field theory (RFT), where, in a smoothed image, the number of falsely detected active blobs, or clusters of activated pixels or voxels, is controlled to a certain statistical threshold (e.g., 0.05 representing 5 blobs on average in 100 randomly generated images).17 Typically, the number of false active blobs is set to 0.05.
On the other hand, fNIRS data are often represented as channel-wise data. In this case, multiplicity is equal to the number of channels. The most stringent control for FWE in channel-wise data is the Bonferroni method, in which the statistical significance level () is divided by the number of channels ()
The purpose of Bonferroni correction is not to allow a single FWE, but this can be too stringent as it increases the chance for Type II errors or false negatives. One alternative is an extension of the Bonferroni method, called Holm’s correction, which offers less stringent multistep and step-down procedures.18 Another often used alternative is the false discovery rate (FDR) control method,19 which maintains a proportion of false positives relative to the total number of significant results. The FDR is often favored in functional magnetic resonance imaging (fMRI)20 and fNIRS21 studies because of its increased power in comparison with FWE control methods.
However, we must note here that the Bonferroni correction, Holm’s correction, and FDR controlling methods all start from the same multiplicity, which is equal to the number of channels, to control Type I errors for the most activated channel. Differences in stringency appear only in channels with less activation. Here, we encounter a dilemma: in order to identify as many activated channels as possible, regions of interest should be broader with a larger number of channels, but this can lead to a reduced number of statistically activated channels being identified due to inflated multiplicity. For example, in a typical multiple-channel fNIRS with 20 channels, an uncorrected -value of 0.0025 in the most activated channel is necessary to achieve a corrected -value of 0.05. Nevertheless, in a recent whole-head measurement using 120 channels, the uncorrected -value necessary for a corrected -value of 0.05 was 0.0004. Hence, fNIRS researchers employing a multiple-channel measurement often get stuck in what is known as the “Valley of Bonferroni.”
Thus far, only one approach, called the Dubey/Armitage–Parmar (D/AP) method,22 has been used to avoid applying correction with a large multiplicity equaling the number of channels in fNIRS studies.2324.–25 The D/AP method utilizes the spatial correlation between a given channel and other channels to correct the multiplicity of the given channel. One problem with this method is that the multiplicity differs among channels, so regional bias cannot be avoided. In addition, since the D/AP method is dependent on correlation, it is only applicable to a single contrast.
One plausible approach for solving these FWE correction issues in multichannel fNIRS would be the correction with the effective multiplicity () using eigenvalues of correlation matrices. The method was invented by Nyholt26 for correction for multiple testing in the field of genome studies. This method adjusts the original multiplicity by reducing the contribution of large correlations to obtain the . It was later modified to recount the after leveling out large eigenvalues.27 Since this method was dependent on logical criteria and separation of integer and fraction parts of the eigenvalues, Galwey28 proposed a more general equation to exclude such dependence.
When a set of correlated data, say activation data from fNIRS channels, are given, the method decomposes the correlation matrices of the data to yield eigenvectors that reflect the strength of the correlation between each data point. From these eigenvalues, the method estimates the effective multiplicity that corresponds to the number of independent tests. Accordingly, multiple hypotheses are corrected by instead of .
Inspired by the potential of the method, here we examined its utility in statistical analyses for multiple-channel fNIRS data. We prepared three sets of experimental data with focused, moderate, and broad activations. From these, we generated synthetic data with smaller multiplicity and compared the performance of the Bonferroni and methods. Based on these results, we will discuss whether the approach can be an effective alternative to Bonferroni-based methods.
Materials and Methods
In this study, we compared two methods for correcting FWE. One is a conventional Bonferroni method used to adjust the significance level () by dividing it by the number of tests as in Eq. (1) presented above. The other is the method utilizing eigenvalues of correlation matrices. To reflect the strength of correlation between each data point, the method adjusts the original multiplicity to yield the effective multiplicity, 2627.–28 In multichannel fNIRS group analyses with multiple subjects, we derive a correlation matrix from measured signals, such as channel-wise average values of oxy-Hb signals, from multiple subjects and calculate the eigenvalues. We then obtain the from the eigenvalues and apply it instead of the original to adjust the value.
Let us explain the theoretical framework of the method using a typical fNIRS data structure as an example. Note that the major methods described below were originally developed for genetic data, but we modified them to fit the fNIRS data scheme. When multiple-channel fNIRS data are obtained from channels for subjects, summary data for group analyses can be denoted as . Then, the eigenvalues () are derived from a correlation matrix () of the measured signals () as follows:
Using the eigenvalue variance , Nyholt26 calculated the as described below:
In order to present a more accurate estimate of , Li and Ji27 introduced a new calculation for as described below:
is an indicator function, which yields 1 when and gives 0 otherwise.  is the floor function, which gives the largest integer less than or equal to .
Galwey28 pointed out that the by Li and Ji gives more weight to the fractional part than to the integer part and proposed a generalized function as follows:
In this research, we implemented Galwey’s function to calculate because this method is optimized for multiple signals with strong correlations and can be treated in a continuous way.
Experimental data used in this study were obtained from three of our previous, separate studies2930.–31 and reanalyzed from different perspectives. Here, we briefly describe the experimental procedures.
Participants in the visual-based oddball task (OBT, task 1) comprised 22 right-handed, healthy children (15 boys, 7 girls; , , and range: 6 to 13 years). Those for the verbal category fluency task (CFT, task 2) comprised 22 right-handed, healthy volunteers (6 participants were excluded; 16 males, 6 females; , , and range: 22 to 57 years). Those for the naming task (NMT, task 3) comprised 26 right-handed, healthy volunteers (4 participants were excluded; 19 males, 7 females; , , and range: 22 to 57 years). Written informed consent was given by all participants and, in the case of the OBT experiment, their parents. The study was approved by the Jichi Medical University ethics committee.
In the OBT, participants were requested to push a button in response to stimuli. The task included detection of and response to infrequent (oddball) target events included in a series of repetitive events. During the session, subjects viewed a series of pictures once every second and responded with a key press to every picture. In the baseline block, subjects were presented one picture and asked to press a blue button for that picture. Following the baseline block, two kinds of pictures were presented sequentially and the participants were instructed to respond to the standard stimuli and target stimuli by pressing a blue and red button, respectively. Each session consisted of six block sets, each containing alternating baseline (25 s) and oddball (25 s) blocks.
In the CFT, participants were requested to overtly generate words for five categories. The task paradigm was a periodic block design with five alternating conditions of rest (30 s) and experimental task (20 s).
In the NMT, participants were requested to overtly name the content of a picture exhibited by an experimenter. The task paradigm was a periodic block design with five alternating conditions of rest (30 s) and experimental task (20 s). During the experimental task, each picture was presented for about 3 s.
Using resampling simulations from experimental data of fNIRS studies, we evaluated the relationship between values and the number of channels. For the resampling simulations, we selected, from published studies, three kinds of experimental data with different activation profiles: OBT data from children with lateralized focused activation (task 1),29 CFT data from adults with lateralized moderately focused activation (task 2),30 and NMT data from adults with bilateral broad activation (task 3).31 The -values of a general linear model (GLM) with regression to a hemodynamic response function or the average values of oxy-Hb signals were used as summary data. Because the measured signals of the children contained many movement artifacts, the averaged values of oxy-Hb, which were compatible with -values, were selected to analyze the children’s data (note that -values match the average when a GLM is performed with regression to a box-car function alone). Specifically, after the blocks with considerable artifacts were removed, for each channel of each subject the averaged differences in oxy-Hb values between each task and the preceding baseline periods were further averaged across the task blocks.29
To find the relationship between the number of channels and the , we randomly resampled a given number channels () from each dataset and calculated . For each , resampling was performed 1000 times, and the average value and standard deviation were calculated.
In order to find the relationship between the overall number of channels and the number of channels detected as active with the derived for a given dataset, truly active channels must be set. We performed one-sample -tests for the measured signals for each channel, and those channels with -values above a given -value threshold were defined as “truly active channels” for the corresponding task. The -value thresholds were set to 2.0, 2.5, 3.0, and 3.5. Using these obtained truly active channels as a starting point (e.g., 2 channels for task 1, 9 for task 2, and 12 for task 3 at a -value threshold of 3.0), we added additional channels, derived the , and calculated the number of active channels. For each channel number, resampling was performed 1000 times (or for a theoretical maximum when the resampling range was limited), and average values and standard deviations were calculated. For comparison, the numbers of active channels were also obtained for uncorrected data and for Bonferroni correction.
Selection of Active Channels as “Truly Active”
For each task, second-level group analyses using one-sample -tests against zero were performed. We defined these channels as “truly active channels” for the resampling simulation. The numbers of truly active channels are shown in Fig. 3. For example, when the -value threshold was 3.0, the number of truly active channels was 2 for task 1 (OBT), 9 for task 2 (CFT), and 12 for task 3 (NMT). These cases are presented in Fig. 1 for each task. The numbers and distributions of truly active channels varied for each task. We confirmed that task 1 had a right-lateralized focused activation pattern, task 2 had a left-lateralized moderate activation pattern, and task 3 had a bilateral diffuse activation pattern.
We plotted the average values and SD for each number of channels for each task (Fig. 2). The values, used for correction in the Bonferroni method, were also plotted for comparison. When the number of measurement channels was less than 10, there was little difference between and for all tasks. However, when the number of measurement channels exceeded 10, the value of became gradually controlled, with tapered response curves in all cases. Eventually at around , converged at a range of approximately 10 to 15.
Evaluation of Detected Active Channels
For each task, significantly active channels for which -values were less than 0.05 were calculated for different numbers of channels. The number of active channels was derived with no correction, with Bonferroni correction, and with the method. The number of active channels for each case is plotted in Fig. 3.
As shown in Fig. 3, without correction, the number of active channels increased linearly for each task. For a large number (approximately 40) of channels, the number of detected active channels did not inflate linearly but converged to a certain value, which equaled the number of detected active channels with the method applied to the whole number of channels. Irrespective of the total number of channels and truly active channels initially given, the numbers of detected active channels with methods remained larger than or equal to those with Bonferroni methods and smaller than or equal to those without correction.
The current study examined the relationship between the number of channels and detected active channels based on FWE correction with the approach. We conducted resampling simulations applied to actual multichannel fNIRS data for different cognitive tasks. Compared with the conventional Bonferroni method, the approach generally yielded sufficiently moderate FWE corrections and increased statistical power.
Since the Bonferroni method regards each channel as independent, its multiplicity linearly increased as the number of channels increased. In contrast, the approach limited the inflation of when there were, roughly, more than 10 channels. Accordingly, converged to a certain value, which was specific to task species. Considering these characteristics, can be considered an effective FWE correction method for modern fNIRS measurement with a large number of channels.
methods are expected to have a high affinity to fNIRS data. First, they do not require any assumptions about the spatial configuration of channels, in contrast to RFT, which assumes spatial smoothness and continuity in a good-lattice assumption. Sparsely and irregularly distributed fNIRS channels can be treated effectively using the method. Second, the method deals with the spatial correlation structure of data holistically, but is not adjusted by local variations of spatial correlation. Thus, is applicable to data from all channels without local bias. Third, the method is applicable to a wide variety of data structures including both single- and multiple-subject analyses and to multivariate analyses such as ANOVA.26,27
Interpretation of Results
increased with the number of channels, but not in a linear manner. The increase ratio went down and finally converged at a range of 10 to 15. Average and standard deviation at 20 channels were , , and for OBT, CFT, and NMT, respectively. Since variability (SD) due to different channel selections was negligible, we suggest that the estimations in resampling simulations could well represent actual channel number increases. Note that relatively small SD at around channel numbers 1 and 44 simply reflect the limited number of channel selections.
For each task, reached 80% of maximum values at 20 channels. This observation suggests that a substantial increase beyond 44 channels is not likely to occur and that converges to a certain level for each task.
As shown in Fig. 3, the lack of FWE correction resulted in an increased number of false positive detected-active channels. This suggests the necessity for Type I error corrections.
Although application of the Bonferroni method maintained a number of active channels similar to the number of truly active channels with a small number of channels, the number of detected active channels tended to drop below that of truly active channels as the total number of channels increased. Hence, we suggest that the current resampling simulations represent the over-correction of FWE, the so-called “Valley of Bonferroni,” often encountered in actual multichannel fNIRS measurements well.
On the other hand, the approach yielded a number of detected active channels similar to that of truly active channels at a -value threshold of 3.0 even when the overall number of channels increased. This clearly demonstrates that the approach can reduce Type II errors compared with the conventional Bonferroni method, while maintaining control over Type I errors compared with uncorrected states.
In other words, consideration of spatial correlation among channels and analysis of their eigenvalue can lead to estimations that are similar to actual multiplicity for FWE correction. Accordingly, the approach can provide a simple, data-driven FWE correction method that maintains statistical power for multichannel fNIRS and its increasing number of channels.
Consideration of Task Species
To explore the applicability of the approach, we selected three different experimental tasks from previously published studies with the same numbers of channels. This was in order to cover a wide range of activation patterns from a focused and lateralized activation pattern evoked by an OBT in healthy children,29 to a moderately focused and lateralized activation pattern evoked by a CFT in healthy adults,30 and to a broad and less lateralized activation pattern evoked by an NMT in healthy adults.31 Our first expectation was to detect changes in associated with degrees of focus. However, we did not find any tendency between focus size (the number of truly active channels) and . Thus, we tentatively suggest that is affected by task species, but not by focus size. However, further exploration is essential to draw discrete conclusions. First, the current samples were from different groups and included both adults and children. The true effect may have been buried by sample heterogeneity. In order to explore the factors determining , we have to examine the effects of focus size and task species using the same group of subjects. Despite this, however, did not vary much among different tasks and subject groups, with values ranging from 10 to 15 out of 44 channels. This leads us to expect that is robust to different tasks when similar experimental conditions are adopted. In this sense, the approach could be regarded as a practical option that can be readily adapted to many fNIRS studies.
For the current analyses, we selected Galwey’s method28 from among several different methods used in the realm of genetics.26,27 As represented in Eq. (5) root-mean-square values, Galwey’s method puts weight on large spatial correlations with eigenvalues over 1. Selection of an appropriate multiplier (currently 2) in Galwey’s equation would require further examination.
One inherent merit of the approach is its wide applicability. Since its core idea lies in determination of in a data-driven way, it can be combined with other methods such as the FDR method. The combination of and FDR has been adopted in genetic studies.27 Indeed, the current study itself may be rather interpreted as a comparison of two Bonferroni correction methods with or without data-driven . Thus, how the combination of FDR and affects the statistical power would be an interesting topic to explore.
Another aspect of the applicability of is its extension to various experimental designs beyond one-sample -tests. Thus far, two methods with less lenient statistical thresholds than Bonferroni correction have been adopted for fNIRS studies. The first is the D/AP alpha boundary2223.24.–25 for adjusting the spatial correlation among channels which confers effective values to each fNIRS channel.23 However, this method has two major limitations: One is that values differ among channels so that no unified criterion can be realized for multichannel measurements, and the other is that it is limited to a one-sample -test scheme. Similarly, Singh and Dan explored a resampling-based approach based on the Max-T method,17,21 but this was also limited to a one-sample -test scheme.
On the other hand, the approach can be adopted for a wide range of experimental designs categorized into a GLM including two-sample -tests, ANOVA, and regression analysis. The former two designs are important for group studies and the latter is essential for individual studies dealing with temporal as well as spatial correlations.
In addition, although the current study limited its focus to channel-wise analyses, the approach may be applicable to pixel- or voxel-based methods, namely continuous images created by interpolation, image reconstruction, and DOT. Since these continuous images are constructed from channel-wise data, comparison between the two approaches as to how they affect would be an intriguing topic to investigate.
In this sense, we would expect the current study to be regarded as a primer with the goal of inspiring a wide range of applications for in order to take fNIRS studies in various directions.
We appreciate ELCS for English proofreading. This work was supported in part by the Grant-in-Aid for Scientific Research from the Japan Society for Promotion of Science (25670625 to MU, 25282243 and 24300105 to ID, 26120528 and 23300357 to YHT, and 23390354 to EW).
Minako Uga received her BSc (1992) and MSc (1994) degrees in engineering from Sophia University and her MS (2005) and PhD (2010) degrees in medical sciences from the University of Tsukuba. She is a postdoctoral fellow at Jichi Medical University, Japan. Her current research interests are development of fNIRS methodology and adaptation to brain science research of miniature pigs as laboratory animals.
Ippeita Dan graduated from International Christian University (1993) and received his PhD from the University of Tokyo (2002). Was a senior research fellow at the National Food Research Institute and an associate professor at Jichi Medical University. Was appointed a professor at Chuo University. Has authored 80 peer-reviewed articles attracting approximately 3000 citations. His main research missions lie in clinical application and methodological development of fNIRS, and application of psychometrics for marketing in food-related business.
Haruka Dan received her MA (2001) degree from International Christian University and her PhD (2005) degree from Tohoku University, Japan. She was a postdoctoral fellow at the National Food Research Institute and Jichi Medical University, Japan. She is now a visiting assistant professor of Research and Development Initiatives at Chuo University, Japan. She has authored 32 peer-reviewed scientific articles including 20 top-authored contributions over various topics ranging from plant physiology, behavioral food science, and cognitive neuroscience.
Yasushi Kyutoku received his BA (2001) degree from the University of Texas at San Antonio and his MSc (2007) and PhD (2008) degrees from the University of Texas at Arlington, Texas, USA. He was a postdoctoral fellow in Japan at the National Food Research Institute and at Jichi Medical University. Currently, he is an associate professor at Research and Development Initiative, Chuo University, Tokyo, Japan. His research focuses on psychometrics and their applications.
Y-h Taguchi received his BSc (1984), MSc (1986), and PhD (1988) degrees in physics from Tokyo Institute of Technology (TITech). After serving as an assistant professor at TIT (1988 to 1997), he was appointed as an associate professor in the Department of Physics at Chuo University, where he has been a professor since 2006. He is an expert in multidimensional scaling (MDS) techniques with various fields of application most notably in the field of bioinformatics.
Eiju Watanabe received his MD (1976) and PhD (1984) from the University of Tokyo. Former research fellow at the University of Erlangen, and chairman of the Department of Neurosurgery at Tokyo Metropolitan Police Hospital. He is currently a professor and the chairman of the Department of Neurosurgery at Jichi Medical University, Japan. He is known for his work as an inventor of a neuro-navigation system and an initiator of fNIRS imaging known as optical topography.