Exploring effective multiplicity in multichannel functional near-infrared spectroscopy using eigenvalues of correlation matrices

Abstract. Recent advances in multichannel functional near-infrared spectroscopy (fNIRS) allow wide coverage of cortical areas while entailing the necessity to control family-wise errors (FWEs) due to increased multiplicity. Conventionally, the Bonferroni method has been used to control FWE. While Type I errors (false positives) can be strictly controlled, the application of a large number of channel settings may inflate the chance of Type II errors (false negatives). The Bonferroni-based methods are especially stringent in controlling Type I errors of the most activated channel with the smallest p value. To maintain a balance between Types I and II errors, effective multiplicity (Meff) derived from the eigenvalues of correlation matrices is a method that has been introduced in genetic studies. Thus, we explored its feasibility in multichannel fNIRS studies. Applying the Meff method to three kinds of experimental data with different activation profiles, we performed resampling simulations and found that Meff was controlled at 10 to 15 in a 44-channel setting. Consequently, the number of significantly activated channels remained almost constant regardless of the number of measured channels. We demonstrated that the Meff approach can be an effective alternative to Bonferroni-based methods for multichannel fNIRS studies.

Exploring effective multiplicity in multichannel functional near-infrared spectroscopy using eigenvalues of correlation matrices Minako

Introduction
Functional near-infrared spectroscopy (fNIRS) is a convenient, noninvasive tool for assessing cortical hemodynamics in various clinical and neuropsychological situations. [1][2][3][4] The first noninvasive monitoring of brain hemodynamics during hyperventilation 1 was realized by Jobsis. 5 One and a half decades later, fNIRS developed into a genuinely "functional" measurement tool allowing the assessment of human cognitive and visual functions. [6][7][8][9] As the term "spectroscopy" implies, fNIRS started as a single-channel measurement. The number of channels increased to a few that were distantly placed on the scalp so as to avoid light interference between neighboring channels. 10 Later, the problem of light interference was solved by adopting a frequency division multiple access technique, and thus multichannel fNIRS using an array of optodes was invented to simultaneously monitor multiple brain regions. 11 Multichannel fNIRS basically generates discrete channel data. This also enables interpolation from neighboring channels to fill the interchannel space, called optical topography, to produce two-dimensional continuous images of cortical activation. 11 Moreover, in diffuse optical tomography (DOT), a combination of short-and longdistance channels utilizes depth resolution to reconstruct a continuous image, providing more accurate source estimations in three-dimensional space. 2,[12][13][14] Since the invention of multichannel fNIRS, the number of channels has increased to a few dozen in typical studies or to over 100 for whole-head measurements. 15 With increasing multiplicity in fNIRS channels, we have to be aware of the increasing importance of controlling Type I errors (or false positives). In the fNIRS scheme, this corresponds to the risk of falsely treating nonactivated channels as activated channels. In a single-channel measurement, the issue is whether or not the channel is activated: there is only one hypothesis. In a multiple-channel measurement, the number of hypotheses increases up to the number of channels. Such multiple comparisons entail increased risk of Type I errors, which should be controlled as family-wise errors (FWEs). Unlike the experimental-wise error rate control often used in psychological studies, where a large number of possible comparisons are conducted, FWE rate control deals with a limited number of comparisons that are necessary to test hypotheses, which is to say the number of channels. 16 The FWE in multichannel fNIRS studies is typically controlled using one of the two different approaches. When a continuous image is constructed, it is often treated using a random field theory (RFT), where, in a smoothed image, the number of falsely detected active blobs, or clusters of activated pixels or voxels, is controlled to a certain statistical threshold (e.g., 0.05 representing 5 blobs on average in 100 randomly generated images). 17 Typically, the number of false active blobs is set to 0.05.
On the other hand, fNIRS data are often represented as channel-wise data. In this case, multiplicity is equal to the number of channels. The most stringent control for FWE in channel-wise data is the Bonferroni method, in which the statistical significance level (α) is divided by the number of channels (M) The purpose of Bonferroni correction is not to allow a single FWE, but this can be too stringent as it increases the chance for Type II errors or false negatives. One alternative is an extension of the Bonferroni method, called Holm's correction, which offers less stringent multistep and step-down procedures. 18 Another often used alternative is the false discovery rate (FDR) control method, 19 which maintains a proportion of false positives relative to the total number of significant results. The FDR is often favored in functional magnetic resonance imaging (fMRI) 20 and fNIRS 21 studies because of its increased power in comparison with FWE control methods.
However, we must note here that the Bonferroni correction, Holm's correction, and FDR controlling methods all start from the same multiplicity, which is equal to the number of channels, to control Type I errors for the most activated channel. Differences in stringency appear only in channels with less activation. Here, we encounter a dilemma: in order to identify as many activated channels as possible, regions of interest should be broader with a larger number of channels, but this can lead to a reduced number of statistically activated channels being identified due to inflated multiplicity. For example, in a typical multiple-channel fNIRS with 20 channels, an uncorrected p-value of 0.0025 in the most activated channel is necessary to achieve a corrected p-value of 0.05. Nevertheless, in a recent whole-head measurement using 120 channels, the uncorrected p-value necessary for a corrected p-value of 0.05 was 0.0004. Hence, fNIRS researchers employing a multiple-channel measurement often get stuck in what is known as the "Valley of Bonferroni." Thus far, only one approach, called the Dubey/Armitage-Parmar (D/AP) method, 22 has been used to avoid applying correction with a large multiplicity equaling the number of channels in fNIRS studies. [23][24][25] The D/AP method utilizes the spatial correlation between a given channel and other channels to correct the multiplicity of the given channel. One problem with this method is that the multiplicity differs among channels, so regional bias cannot be avoided. In addition, since the D/AP method is dependent on correlation, it is only applicable to a single contrast.
One plausible approach for solving these FWE correction issues in multichannel fNIRS would be the correction with the effective multiplicity (M eff ) using eigenvalues of correlation matrices. The M eff method was invented by Nyholt 26 for correction for multiple testing in the field of genome studies. This method adjusts the original multiplicity by reducing the contribution of large correlations to obtain the M eff . It was later modified to recount the M eff after leveling out large eigenvalues. 27 Since this method was dependent on logical criteria and separation of integer and fraction parts of the eigenvalues, Galwey 28 proposed a more general equation to exclude such dependence.
When a set of correlated data, say activation data from fNIRS channels, are given, the M eff method decomposes the correlation matrices of the data to yield eigenvectors that reflect the strength of the correlation between each data point. From these eigenvalues, the M eff method estimates the effective multiplicity that corresponds to the number of independent tests. Accordingly, multiple hypotheses are corrected by M eff instead of M.
Inspired by the potential of the M eff method, here we examined its utility in statistical analyses for multiple-channel fNIRS data. We prepared three sets of experimental data with focused, moderate, and broad activations. From these, we generated synthetic data with smaller multiplicity and compared the performance of the Bonferroni and M eff methods. Based on these results, we will discuss whether the M eff approach can be an effective alternative to Bonferroni-based methods.

Correction Methods
In this study, we compared two methods for correcting FWE. One is a conventional Bonferroni method used to adjust the significance level (α) by dividing it by the number of tests as in Eq. (1) presented above. The other is the M eff method utilizing eigenvalues of correlation matrices. To reflect the strength of correlation between each data point, the M eff method adjusts the original multiplicity to yield the effective multiplicity, M eff [26][27][28] In multichannel fNIRS group analyses with multiple subjects, we derive a correlation matrix from measured signals, such as channel-wise average values of oxy-Hb signals, from multiple subjects and calculate the eigenvalues. We then obtain the M eff from the eigenvalues and apply it instead of the original M to adjust the α value.
Let us explain the theoretical framework of the M eff method using a typical fNIRS data structure as an example. Note that the major M eff methods described below were originally developed for genetic data, but we modified them to fit the fNIRS data scheme. When multiple-channel fNIRS data are obtained from M channels for N subjects, summary data for group analyses can be denoted as β M×N . Then, the eigenvalues (λ i ) are derived from a correlation matrix (M × M) of the measured signals (β M×N ) as follows: λ 1 ; λ 2 ; : : : ; λ M : (2) Using the eigenvalue variance V λ , Nyholt 26 calculated the M eff as described below: In order to present a more accurate estimate of M eff , Li and Ji 27 introduced a new calculation for M eff as described below: Iðx ≥ 1Þ is an indicator function, which yields 1 when x ≥ 1 and gives 0 otherwise. [x] is the floor function, which gives the largest integer less than or equal to x.
Galwey 28 pointed out that the M eff by Li and Ji gives more weight to the fractional part than to the integer part and proposed a generalized function as follows: In this research, we implemented Galwey's function to calculate M eff because this method is optimized for multiple signals with strong correlations and can be treated in a continuous way.

Experimental Procedures
Experimental data used in this study were obtained from three of our previous, separate studies [29][30][31] and reanalyzed from different perspectives. Here, we briefly describe the experimental procedures.
Participants in the visual-based oddball task (OBT, task 1) comprised 22 right-handed, healthy children (15 boys, 7 girls; average age ¼ 9.8 years, SD ¼ 2.0, and range: 6 to 13 years). Those for the verbal category fluency task (CFT, task 2) comprised 22 right-handed, healthy volunteers (6 participants were excluded; 16 males, 6 females; average age ¼ 34.0 years, SD ¼ 10.5, and range: 22 to 57 years). Those for the naming task (NMT, task 3) comprised 26 right-handed, healthy volunteers (4 participants were excluded; 19 males, 7 females; averageage ¼ 33.7 years, SD ¼ 10.7, and range: 22 to 57 years). Written informed consent was given by all participants and, in the case of the OBT experiment, their parents. The study was approved by the Jichi Medical University ethics committee.
In the OBT, participants were requested to push a button in response to stimuli. The task included detection of and response to infrequent (oddball) target events included in a series of repetitive events. During the session, subjects viewed a series of pictures once every second and responded with a key press to every picture. In the baseline block, subjects were presented one picture and asked to press a blue button for that picture. Following the baseline block, two kinds of pictures were presented sequentially and the participants were instructed to respond to the standard stimuli and target stimuli by pressing a blue and red button, respectively. Each session consisted of six block sets, each containing alternating baseline (25 s) and oddball (25 s) blocks.
In the CFT, participants were requested to overtly generate words for five categories. The task paradigm was a periodic block design with five alternating conditions of rest (30 s) and experimental task (20 s).
In the NMT, participants were requested to overtly name the content of a picture exhibited by an experimenter. The task paradigm was a periodic block design with five alternating conditions of rest (30 s) and experimental task (20 s). During the experimental task, each picture was presented for about 3 s.

Evaluation of M eff
Using resampling simulations from experimental data of fNIRS studies, we evaluated the relationship between M eff values and the number of channels. For the resampling simulations, we selected, from published studies, three kinds of experimental data with different activation profiles: OBT data from children with lateralized focused activation (task 1), 29 CFT data from adults with lateralized moderately focused activation (task 2), 30 and NMT data from adults with bilateral broad activation (task 3). 31 The β-values of a general linear model (GLM) with regression to a hemodynamic response function or the average values of oxy-Hb signals were used as summary data. Because the measured signals of the children contained many movement artifacts, the averaged values of oxy-Hb, which were compatible with β-values, were selected to analyze the children's data (note that β-values match the average when a GLM is performed with regression to a box-car function alone). Specifically, after the blocks with considerable artifacts were removed, for each channel of each subject the averaged differences in oxy-Hb values between each task and the preceding baseline periods were further averaged across the task blocks. 29 To find the relationship between the number of channels and the M eff , we randomly resampled a given number channels (m∶ð1 ≤ m ≤ 44Þ) from each dataset and calculated M eff . For each m, resampling was performed 1000 times, and the average value and standard deviation were calculated.
In order to find the relationship between the overall number of channels and the number of channels detected as active with the M eff derived for a given dataset, truly active channels must be set. We performed one-sample t-tests for the measured signals for each channel, and those channels with t-values above a given t-value threshold were defined as "truly active channels" for the corresponding task. The t-value thresholds were set to 2.0, 2.5, 3.0, and 3.5. Using these obtained truly active channels as a starting point (e.g., 2 channels for task 1, 9 for task 2, and 12 for task 3 at a t-value threshold of 3.0), we added additional channels, derived the M eff , and calculated the number of active channels. For each channel number, resampling was performed 1000 times (or for a theoretical maximum when the resampling range was limited), and average values and standard deviations were calculated. For comparison, the numbers of active channels were also obtained for uncorrected data and for Bonferroni correction.

Selection of Active Channels as "Truly Active"
For each task, second-level group analyses using one-sample t-tests against zero were performed. We defined these channels as "truly active channels" for the resampling simulation. The numbers of truly active channels are shown in Fig. 3. For example, when the t-value threshold was 3.0, the number of truly active channels was 2 for task 1 (OBT), 9 for task 2 (CFT), and 12 for task 3 (NMT). These cases are presented in Fig. 1 for each task. The numbers and distributions of truly active channels varied for each task. We confirmed that task 1 had a right-lateralized focused activation pattern, task 2 had a left-lateralized moderate activation pattern, and task 3 had a bilateral diffuse activation pattern.

Evaluation of M eff
We plotted the average M eff values and SD for each number of channels for each task (Fig. 2). The M values, used for correction in the Bonferroni method, were also plotted for comparison. When the number of measurement channels was less than 10, there was little difference between M eff and M for all tasks. However, when the number of measurement channels exceeded 10, the value of M eff became gradually controlled, with tapered response curves in all cases. Eventually at around m ¼ 44, M eff converged at a range of approximately 10 to 15.

Evaluation of Detected Active Channels
For each task, significantly active channels for which p-values were less than 0.05 were calculated for different numbers of channels. The number of active channels was derived with no correction, with Bonferroni correction, and with the M eff method. The number of active channels for each case is plotted in Fig. 3.
As shown in Fig. 3, without correction, the number of active channels increased linearly for each task. For a large number (approximately 40) of channels, the number of detected active channels did not inflate linearly but converged to a certain value, which equaled the number of detected active channels with the M eff method applied to the whole number of channels. Irrespective of the total number of channels and truly active channels initially given, the numbers of detected active channels with M eff methods remained larger than or equal to those with Bonferroni methods and smaller than or equal to those without correction.

Overview
The current study examined the relationship between the number of channels and detected active channels based on FWE correction with the M eff approach. We conducted resampling simulations applied to actual multichannel fNIRS data for different cognitive tasks. Compared with the conventional Bonferroni method, the M eff approach generally yielded sufficiently moderate FWE corrections and increased statistical power.
Since the Bonferroni method regards each channel as independent, its multiplicity linearly increased as the number of channels increased. In contrast, the M eff approach limited the inflation of M eff when there were, roughly, more than 10 channels. Accordingly, M eff converged to a certain value, which was specific to task species. Considering these characteristics, M eff can be considered an effective FWE correction method for modern fNIRS measurement with a large number of channels.
M eff methods are expected to have a high affinity to fNIRS data. First, they do not require any assumptions about the spatial configuration of channels, in contrast to RFT, which assumes spatial smoothness and continuity in a good-lattice assumption. Sparsely and irregularly distributed fNIRS channels can be treated effectively using the M eff method. Second, the M eff method deals with the spatial correlation structure of data holistically, but is not adjusted by local variations of spatial correlation. Thus, M eff is applicable to data from all channels without local bias. Third, the M eff method is applicable to a wide variety of data structures including both single-and multiple-subject analyses and to multivariate analyses such as ANOVA. 26,27

Interpretation of Results
M eff increased with the number of channels, but not in a linear manner. The increase ratio went down and finally converged at a range of 10 to 15. Average M eff and standard deviation at 20 channels were 12.2 AE 0.4, 8.8 AE 0.3, and 11.4 AE 0.5 for OBT, CFT, and NMT, respectively. Since variability (SD) due to different channel selections was negligible, we suggest that the M eff estimations in resampling simulations could well represent actual channel number increases. Note that relatively small SD  For each task, M eff reached 80% of maximum values at 20 channels. This observation suggests that a substantial M eff increase beyond 44 channels is not likely to occur and that M eff converges to a certain level for each task.
As shown in Fig. 3, the lack of FWE correction resulted in an increased number of false positive detected-active channels. This suggests the necessity for Type I error corrections.
Although application of the Bonferroni method maintained a number of active channels similar to the number of truly active channels with a small number of channels, the number of detected active channels tended to drop below that of truly active channels as the total number of channels increased. Hence, we suggest that the current resampling simulations represent the over-correction of FWE, the so-called "Valley of Bonferroni," often encountered in actual multichannel fNIRS measurements well.
On the other hand, the M eff approach yielded a number of detected active channels similar to that of truly active channels at a t-value threshold of 3.0 even when the overall number of channels increased. This clearly demonstrates that the M eff approach can reduce Type II errors compared with the conventional Bonferroni method, while maintaining control over Type I errors compared with uncorrected states.
In other words, consideration of spatial correlation among channels and analysis of their eigenvalue can lead to M eff estimations that are similar to actual multiplicity for FWE correction. Accordingly, the M eff approach can provide a simple, data-driven FWE correction method that maintains statistical power for multichannel fNIRS and its increasing number of channels.

Consideration of Task Species
To explore the applicability of the M eff approach, we selected three different experimental tasks from previously published studies with the same numbers of channels. This was in order to cover a wide range of activation patterns from a focused and lateralized activation pattern evoked by an OBT in healthy children, 29 to a moderately focused and lateralized activation pattern evoked by a CFT in healthy adults, 30 and to a broad and less lateralized activation pattern evoked by an NMT in healthy adults. 31 Our first expectation was to detect changes in M eff associated with degrees of focus. However, we did not find any tendency between focus size (the number of truly active channels) and M eff . Thus, we tentatively suggest that M eff is affected by task species, but not by focus size. However, further exploration is essential to draw discrete conclusions. First, the current samples were from different groups and included both adults and children. The true effect may have been buried by sample heterogeneity. In order to explore the factors determining M eff , we have to examine the effects of focus size and task species using the same group of subjects. Despite this, however, M eff did not vary much among different tasks and subject groups, with values ranging from 10 to 15 out of 44 channels. This leads us to expect that M eff is robust to different tasks when similar experimental conditions are adopted. In this sense, the M eff approach could be regarded as a practical option that can be readily adapted to many fNIRS studies.

Technical Considerations
For the current analyses, we selected Galwey's method 28 from among several different M eff methods used in the realm of genetics. 26,27 As represented in Eq. (5) root-mean-square values, Galwey's method puts weight on large spatial correlations with eigenvalues over 1. Selection of an appropriate multiplier (currently 2) in Galwey's equation would require further examination. One inherent merit of the M eff approach is its wide applicability. Since its core idea lies in determination of M eff in a data-driven way, it can be combined with other methods such as the FDR method. The combination of M eff and FDR has been adopted in genetic studies. 27 Indeed, the current study itself may be rather interpreted as a comparison of two Bonferroni correction methods with or without data-driven M eff . Thus, how the combination of FDR and M eff affects the statistical power would be an interesting topic to explore.
Another aspect of the applicability of M eff is its extension to various experimental designs beyond one-sample t-tests. Thus far, two methods with less lenient statistical thresholds than Bonferroni correction have been adopted for fNIRS studies. The first is the D/AP alpha boundary [22][23][24][25] for adjusting the spatial correlation among channels which confers effective M values to each fNIRS channel. 23 However, this method has two major limitations: One is that M values differ among channels so that no unified criterion can be realized for multichannel measurements, and the other is that it is limited to a one-sample t-test scheme. Similarly, Singh and Dan explored a resamplingbased approach based on the Max-T method, 17,21 but this was also limited to a one-sample t-test scheme.
On the other hand, the M eff approach can be adopted for a wide range of experimental designs categorized into a GLM including two-sample t-tests, ANOVA, and regression analysis. The former two designs are important for group studies and the latter is essential for individual studies dealing with temporal as well as spatial correlations.
In addition, although the current study limited its focus to channel-wise analyses, the M eff approach may be applicable to pixel-or voxel-based methods, namely continuous images created by interpolation, image reconstruction, and DOT. Since these continuous images are constructed from channel-wise data, comparison between the two approaches as to how they affect M eff would be an intriguing topic to investigate.
In this sense, we would expect the current study to be regarded as a primer with the goal of inspiring a wide range of applications for M eff in order to take fNIRS studies in various directions.