Signal processing of functional NIRS data acquired during overt speaking

Abstract. Functional near-infrared spectroscopy (fNIRS) offers an advantage over traditional functional imaging methods [such as functional magnetic resonance imaging (fMRI)] by allowing participants to move and speak relatively freely. However, neuroimaging while actively speaking has proven to be particularly challenging due to the systemic artifacts that tend to be located in the critical brain areas. To overcome these limitations and enhance the utility of fNIRS, we describe methods for investigating cortical activity during spoken language tasks through refinement of deoxyhemoglobin (deoxyHb) signals with principal component analysis (PCA) spatial filtering to remove global components. We studied overt picture naming and compared oxyhemoglobin (oxyHb) and deoxyHb signals with and without global component removal using general linear model approaches. Activity in Broca’s region and supplementary motor cortex was observed only when the filter was applied to the deoxyHb signal and was shown to be spatially comparable to fMRI data acquired using a similar task and to meta-analysis data. oxyHb signals did not yield expected activity in Broca’s region with or without global component removal. This study demonstrates the utility of a PCA spatial filter on the deoxyHb signal in revealing neural activity related to a spoken language task and extends applications of fNIRS to natural and ecologically valid conditions.


Introduction
Speech is a primary human function; however, brain activity related to tasks using overt speaking is difficult to investigate using traditional imaging methods, such as functional magnetic resonance imaging (fMRI), due to motion artifacts resulting from mouth and head movements. Language production has primarily been studied using imagined (covert or internal) speech 1 or sparse sampling methods. 2,3 These studies generally support classic literature on the canonical language system, [4][5][6] in which brain activity associated with speech production has been localized to Broca's region and supplementary motor cortex. This prior literature, together with the gold standard from lesion studies and neurosurgical interventions in which cortical stimulation documents functional loci for speech production based on picture-naming tasks, 7 provides a valid reference for the findings of this study. Our primary goal in this study was to develop a technique to reliably acquire hemodynamic signals during overt speech production. Here, we compare the blood oxygen level-dependent signals of fMRI using the picture-naming task and other prior language studies using Neurosynth 8 with hemodynamic signals of functional near-infrared spectroscopy (fNIRS) (acquired during overt object naming) based on concentrations of both oxyHb and deoxyHb with and without spatial filtering.
Although fNIRS has been available as a neuroimaging methodology for more than 20 years, 9,10-15 many technical and computational challenges remain in order to investigate spatially localized neural cognitive functions in adult subjects. [16][17][18] However, one of the primary advantages of fNIRS is signal acquisition in natural conditions that allow relatively free movement and communication. One of the specific challenges for this application is filtering of systemic artifacts, such as effects of blood pressure and respiratory changes, that are often prominent in fNIRS signals. 16,19,20 Overt speaking tasks, as compared to nonverbal cognitive tasks such as mental arithmetic, have been shown to affect breathing and the end-tidal CO2 concentration in blood (PetCO2) with differential global effects on task-related changes in oxyHb and deoxyHb signals. 20 The complex combination of effects due to speaking and breathing activities as well as volitional cognitive tasks challenges interpretations of fNIRS signals. In this paper, we attempt to address the issue of global systemic artifact using a spatial component removal method 21 and using the deoxyhemoglobin (deoxyHb) signal, which may be less susceptible to global systemic components as well as local variations within and across subjects. However, both deoxyHb and oxyHb signals are shown for illustrative purposes.
The global systemic artifact in fNIRS is often addressed by using short channel recording, 22,23 which is assumed to be only sensitive to systemic components that can be removed from the data. This approach is a method of choice for region-of-interest (ROI) studies that do not employ full head coverage. However, since short channel separation relies on the temporal characteristics of the waveform of the systemic artifact, this method is challenged by the fact that these artifacts can have similar waveforms to the task-related fNIRS signal. 16,21,22 Thus, a regression method using temporal domain information from the short channels may remove both the global effects as well as the spatially localized task-related neuronal signals, reducing sensitivity to main effects.
To address this problem, we previously reported the results of a principal component analysis (PCA) spatial filter that was used to remove global components from oxyhemoglobin (oxyHb) and deoxyHb signals during a finger-thumb tapping task, with optode coverage that was distributed over most of the head. 21 The effects of global systemic artifacts within the oxyHb signal were more pronounced relative to the deoxyHb signal. However, following the application of the PCA filter, the oxyHb signal also showed expected spatial specificity as did deoxyHb signals.
In this study, we applied the previously developed PCA spatial filter to fNIRS signals recorded during an overt picture-naming task, which was similar to the classic Boston Naming Test. 24 In addition, we compared recorded fNIRS signals with fMRI data previously acquired during silent speech 25 to evaluate the spatial correlation of results between these two methods using similar tasks and paradigms. Tasks that elicit hemodynamic signals with well-defined functional patterns, such as finger-thumb tapping or flashing checkerboard viewing, have typically been used to develop and verify fNIRS recording and systemic artifact removal techniques. Spatial patterns generated by simple language tasks, such as picture naming and description, can also be compared to meta-analyses of functional imaging results. Figure 1 shows the results of a Neurosynth forward inference map generated from a meta-analysis of 6983 studies using the search term "Broca." Neurosynth is an online meta-analysis tool that uses references to specific terms in many published studies to generate activity maps. 8 To generate the forward inference map, a statistical analysis is performed using the coordinates reported in studies that do and do not reference Broca's region.
We employed picture naming and description in order to confirm well-known, previously verified, functional results that serve as fiducial markers for verification of the spatial filter technique. We aim to compare results from oxyHb and deoxyHb signals and two signal processing methods (with and without spatial filtering) to validate mapping procedures associated with spoken language using fNIRS.

Participants
A total of 22 individuals (14 female, mean age = 24.5 ± 7.8 years, ranging from 18 to 55 years) participated in the experiment. All were fluent English speakers, but language history and lateralization were not obtained for this study. All but two participants were right-handed, as determined by the Edinburgh Handedness Inventory. 26 No participants were excluded from the experiment. Written informed consent was obtained from each participant in accordance with guidelines approved by Yale University Human Investigations Committee (HIC #1501015178). All data were obtained from the Brain Function Laboratory at Yale School of Medicine, New Haven, Connecticut, and each person was compensated for their participation in the study.

Functional NIRS Signal Acquisition
fNIRS signals were acquired using a LABNIRS system (Shimadzu Corp., Kyoto, Japan). Thirty emitter and 29 detector optodes were positioned 3 cm apart, providing a grid of 98 acquisition channels [Fig. 2(a)]. Each emitter optode was connected to laser diodes at three wavelengths (780, 805, and 830 nm) used to measure changes in concentration of deoxyHb and oxyHb. Signals were acquired every 0.093 s. For analysis, signals were down-sampled by averaging every 10 data points into one value, giving one sample every 0.93 s (approximately 1.08 samples/s).
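The down-sampling step can be illustrated with a minimal sketch. It assumes that consecutive, non-overlapping groups of 10 samples are averaged and that an incomplete final group is discarded; neither detail is stated in the text, and the function name `block_average` is hypothetical.

```python
import numpy as np

def block_average(signal, factor=10):
    """Average consecutive, non-overlapping groups of `factor` samples.

    signal: array of shape (n_samples, n_channels).
    Trailing samples that do not fill a complete group are dropped
    (an assumption; the paper does not describe this case).
    """
    signal = np.asarray(signal, dtype=float)
    n_blocks = signal.shape[0] // factor
    trimmed = signal[: n_blocks * factor]
    return trimmed.reshape(n_blocks, factor, -1).mean(axis=1)

raw_interval_s = 0.093                            # acquisition interval from the text
raw = np.arange(25, dtype=float).reshape(-1, 1)   # 25 synthetic samples, 1 channel
down = block_average(raw, factor=10)              # 2 averaged samples remain
down_interval_s = raw_interval_s * 10             # interval between averaged samples
```

With a 0.093-s acquisition interval, averaging 10 points yields one value every 0.93 s.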

Task and Paradigm
To investigate cortical activity during language production acquired by fNIRS, we used an overt picture-naming task that was similar to the object-naming tasks commonly used in fMRI for neurosurgical planning applications. 7 Participants were instructed to name and give a short description of each picture, which was presented for 3 s. A 15-s task block (five pictures) alternated with a 15-s rest block [Fig. 2(b)]. Each run consisted of six task/rest cycles, and two runs were performed for a total of 6 min.
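As an illustration of how this block timing could enter a general linear model, the sketch below builds a task regressor for one run and fits it to synthetic data. Several details are assumptions rather than statements from the paper: the task block is taken to precede the rest block in each cycle, the response shape is a double-gamma function with assumed parameters, the sampling interval is taken as 0.93 s, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import gamma

def boxcar(times, block_s=15.0, n_cycles=6):
    """True during task blocks, False during rest (task assumed first in each cycle)."""
    cycle_s = 2 * block_s
    in_run = times < n_cycles * cycle_s
    return ((times % cycle_s) < block_s) & in_run

def double_gamma_hrf(t):
    """Assumed double-gamma response shape: early positive lobe, later undershoot."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

dt = 0.93                                  # assumed sampling interval in seconds
times = np.arange(0, 180, dt)              # one run: six 30-s cycles = 3 min
task = boxcar(times).astype(float)
hrf = double_gamma_hrf(np.arange(0, 32, dt))
regressor = np.convolve(task, hrf)[: times.size] * dt

# Fit y = beta * regressor + constant by ordinary least squares on synthetic data.
rng = np.random.default_rng(0)
true_beta = 2.0
y = true_beta * regressor + 0.5 + 0.05 * rng.standard_normal(times.size)
X = np.column_stack([regressor, np.ones_like(regressor)])
beta_hat, intercept_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

The fitted `beta_hat` plays the role of the beta value described in the preprocessing section: the scale of the best-fitting response function for a channel.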

Optode Localization and Definition of Region of Interest
The locations of emitters and receivers, along with standard 10-20 (Ref. 27) landmarks, including inion, nasion, Cz, T3, and T4, were determined using a Patriot three-dimensional (3-D) digitizer (Polhemus, Vermont). The Montreal Neurological Institute (MNI) coordinates for each recording channel and the corresponding anatomical locations of these channels were determined with the statistical parametric mapping package, NIRS-SPM. 28 The native form of fNIRS data is channel-based since signals are recorded through channels and not individual voxels, which are interpolated between channel locations. Due to individual anatomical variations (e.g., head size and shape), the channel locations (represented by MNI coordinates) are not necessarily identical across participants (Fig. 3). To correct for these variations, we projected the data from each participant onto regions that represent the median channel locations for the group (Table 1 in Appendix A).

Functional NIRS Data Preprocessing
Temporal baseline drift was removed with the wavelet detrending procedure provided in NIRS-SPM. 28 Global components were removed using the PCA spatial filter algorithm reported previously. 21 The value of the width at half-maximum of the spatial filter was set at 46 deg rather than 50 deg. See Appendix B for a detailed explanation of the optimization of this parameter. Beta values (i.e., the amplitude of neural activity defined as the scale of the best-fit hemodynamic response function) were projected into MNI standard brain space (2 × 2 × 2 mm³).
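The general idea of a PCA-based spatial filter can be sketched as follows. This is not the published algorithm of Ref. 21, which should be consulted for the actual procedure. The sketch assumes that the data are decomposed into principal components, that each component's spatial pattern is smoothed across channels with a Gaussian kernel defined on angular distance, and that the smoothed reconstruction is treated as the global component and subtracted. Channel positions are assumed to be unit vectors from a head-centered origin, and all function names are hypothetical.

```python
import numpy as np

def angular_distance_deg(unit_xyz):
    """Pairwise angles in degrees between channel directions (unit vectors)."""
    cosines = np.clip(unit_xyz @ unit_xyz.T, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))

def gaussian_smoother(unit_xyz, fwhm_deg=46.0):
    """Row-normalized Gaussian weights between channels, from angular distance."""
    sigma = fwhm_deg / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM to standard deviation
    weights = np.exp(-0.5 * (angular_distance_deg(unit_xyz) / sigma) ** 2)
    return weights / weights.sum(axis=1, keepdims=True)

def remove_global_component(data, unit_xyz, fwhm_deg=46.0):
    """Split data of shape (n_samples, n_channels) into (local, global) parts."""
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    G = gaussian_smoother(unit_xyz, fwhm_deg)
    smooth_Vt = Vt @ G.T                  # smooth each spatial pattern across channels
    global_part = (U * s) @ smooth_Vt     # reconstruct from the smoothed patterns
    return data - global_part, global_part

# Synthetic check: a signal identical in every channel is purely "global".
rng = np.random.default_rng(0)
xyz = rng.standard_normal((20, 3))
xyz[:, 2] = np.abs(xyz[:, 2])             # keep channels on the upper hemisphere
unit_xyz = xyz / np.linalg.norm(xyz, axis=1, keepdims=True)
t = np.sin(np.linspace(0, 6 * np.pi, 200))
uniform = np.outer(t, np.ones(20))
local, global_part = remove_global_component(uniform, unit_xyz)
```

Because the smoothing in this sketch is linear and applied to every component, it is equivalent to spatially smoothing the data directly; the decomposition is kept only to mirror the description. A spatially uniform signal is assigned entirely to the global part, so `local` is zero up to rounding error.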
Transformation of fNIRS data into a 3-D volume was done with triangulation-based linear interpolation (using the griddata command in MATLAB). For voxels located directly on a channel, the spatial smoothing range was zero. For a voxel at the center of a triangular pyramid, the smoothing value was the mean of the surrounding channels. In general, the range of spatial smoothing was less than 1.5 cm, half the distance between two channels. No additional smoothing was applied.
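The two cases described above (a voxel on a channel and a voxel at the center of a triangular pyramid) can be reproduced with a small sketch. It uses SciPy's `griddata` as a stand-in for the MATLAB command; the channel coordinates and beta values are invented for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Five hypothetical channel positions (mm) and their beta values.
channels = np.array([[0.0, 0.0, 0.0],
                     [30.0, 0.0, 0.0],
                     [0.0, 30.0, 0.0],
                     [0.0, 0.0, 30.0],
                     [-30.0, -30.0, -30.0]])
betas = np.array([1.0, 2.0, 3.0, 6.0, 0.0])

# Center of the pyramid formed by the first four channels.
centroid = channels[:4].mean(axis=0)

queries = np.vstack([channels[1],              # voxel directly on a channel
                     centroid,                 # voxel at the pyramid center
                     [100.0, 100.0, 100.0]])   # voxel outside channel coverage
values = griddata(channels, betas, queries, method="linear")
```

The first query returns the channel's own value, the second returns the mean of the four surrounding channels, and the third is undefined (NaN) because it lies outside the region spanned by the channels.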

Voxel-Wise Analysis
First-level (single subject) and second-level (group) general linear model analyses were performed using SPM8. 29 Beta values (i.e., hemodynamic signal amplitude as fit to the hemodynamic response function) were projected into MNI standard brain space using linear interpolation. Any voxel located farther than 18 mm away from the brain surface was excluded. In order to compare the effect of the task on the deoxyHb and the oxyHb signals, we have adopted a convention of inverting the polarity of the deoxyHb signals for the group analyses so that both oxyHb and deoxyHb data show the same polarity in terms of representing neural activity. A reduction in deoxyHb concentration and an increase in oxyHb concentration both correspond to "positive" fNIRS activity as represented by the figures, and the reverse is true for "negative" activity. Results for the contrast, object naming versus rest, were rendered at threshold level p < 0.05 corrected by a false discovery rate (FDR). 30

Results

Deoxyhemoglobin
We report results from both the deoxyHb and oxyHb signals that were processed (1) to remove global components ("clean" results) and (2) to show the unmodified signals ("raw" results). Figures 4(a) and 4(c) show the uncorrected results at a lenient threshold to illustrate the overall pattern of activity. The clean deoxyHb (upper left) data show positive (red-yellow) activity covering left pars triangularis, premotor, and supplementary areas, whereas the raw deoxyHb data show distributed activity covering most of the recorded area. Data from deoxyHb signals with the spatial filter applied were corrected for multiple comparisons using FDR (p < 0.05), 30 and are shown in Figs. 5(a) and 5(b) and Table 3 (Appendix C).
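For readers unfamiliar with FDR correction, the sketch below shows the Benjamini-Hochberg step-up procedure on a handful of invented p-values. It is an assumption that this is the variant applied by the analysis software used here; the p-values and the function name are illustrative only.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of p-values that survive FDR control at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m     # rank-dependent cutoffs
    passed = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()          # largest rank meeting its cutoff
        keep[order[: k + 1]] = True
    return keep

p_vals = [0.001, 0.008, 0.039, 0.041, 0.30, 0.74]   # invented example values
mask = benjamini_hochberg(p_vals)
```

In this example only the two smallest p-values survive, although four of the six fall below an uncorrected threshold of 0.05.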

Oxyhemoglobin Results
Uncorrected results at a lenient threshold obtained from the oxyHb signals with and without the spatial filter are shown in Figs. 4(b) and 4(d) to illustrate the general distribution patterns. Both the clean and raw signals show a large cluster of negative activity covering most of the recording area. Negative activity indicates that the oxyHb concentration was higher during baseline (resting) epochs compared to speaking epochs. Thresholded and corrected results from the spatially filtered oxyHb signal [ Fig. 5

Event Triggered Average Results
Figure 6(a) shows the event-triggered average plot for each channel from a representative subject prior to general linear modeling analyses. Following the fNIRS data presentation convention as stated above, both an upward oxyHb signal (red) and a downward deoxyHb signal (blue) indicate positive neural activity. A global component is clearly visible in all of the channels and is especially noticeable in the oxyHb ("w-shaped" signal). The oxyHb signal shows a decrease (negative activity) in almost all channels consistent with the raw data shown in Fig. 4(d). The deoxyHb signal shows a decrease (positive activity) in almost all channels, consistent with the raw data shown in Fig. 4

Comparison of Functional NIRS, Neurosynth, and Functional Magnetic Resonance Imaging Results
An independent fMRI dataset based on a similar task and paradigm is presented here for comparison with the fNIRS results [Fig. 7(a)]. Although the task completed during acquisition of these fMRI images was covert (silent) naming rather than our overt (spoken) picture naming, the activity around Broca's region is expected to be similar and serves as a second fiducial marker for the findings of this study. Figure 7(b) shows the neural activity measured with fNIRS deoxyHb data after global component removal. Within the coverage of the fNIRS channels, activity around Broca's region overlays the activity shown in the fMRI data. Note that the optode coverage [Fig. 2(a)] does not include the most lateral ventral regions observed in either the fMRI data [Fig. 7(a)] or the Neurosynth marker (Fig. 1). The fNIRS data [Fig. 7(b), dorsal view] show increased activity near the supplementary motor area (SMA) as compared to the fMRI data [Fig. 7(a), dorsal view]. This is as expected for an overt speaking task where the supplementary motor system is actively engaged during speech articulation. The result obtained from the spatially filtered deoxyHb signals was compared with the fMRI data set [Fig. 7(a)] and the Neurosynth map of Broca's area (Fig. 1). Figure 8 shows the fMRI activity during covert speaking [Fig. 8(a)], the Neurosynth map of Broca's area [Fig. 8(b)], and the present fNIRS result [Fig. 8(d)]. The overlap of all three is shown within the open circle in Fig. 8(c), illustrating a common area of activity. Note that since SPM group analysis is limited to the channels that are present for all subjects, the fNIRS coverage shown in Fig. 8 (the white boundary) is smaller than the individual coverage shown as median channel locations in Fig. 2(a). As shown in Fig. 8, the coverage in common across all subjects does not include the most ventral regions observed in either the fMRI data [Fig. 8(a)] or the Neurosynth marker [Fig. 8(b)].

Discussion
Previously, we have shown that global component removal during preprocessing using spatial filtering reveals activity consistent with expected cortical activity for finger tapping tasks. 21 Here, we extend these findings to include overt speaking and determine that this spatial filter can be applied to deoxyHb signals, revealing expected cortical activity in areas of the brain specialized for speech production. Specifically, "clean" deoxyHb signals yielded activity localized to left frontal regions included in Broca's region, and pre- and supplementary motor cortex, consistent with a previous fMRI study using a similar task and paradigm with silent speech 25 as well as the Neurosynth meta-analysis using a wide range of silent language tasks performed during scanning with fMRI. Both are consistent with well-described findings from intraoperative stimulation.
Although the deoxyHb signals with global component removal show specific activity in Broca's region and the SMA [Fig. 5(a)], the unfiltered deoxyHb data show a widespread global component [Fig. 4(c)] during the picture-naming task. This is different from our previous findings based on finger-thumb tapping, which suggested that global components in the deoxyHb signal were not significant. 21 The current results imply that the global component in the deoxyHb signal is more apparent in some tasks than others, suggesting that global component removal is generally beneficial to an analysis pipeline to maximize the likelihood of reflecting neural activity.
The coupling between neurological and physiological processes that underlie changes in oxyHb and deoxyHb concentrations in the brain during cognitive and motor tasks is an active topic of investigation. The anticorrelation between these two signals that is typically observed during task-rest cycles is believed to reflect (1) increases in blood flow related to neurally active tissue, which serve as a proxy for task-specific neural activity that underlies cognitive function; (2) increases in blood flow related to systemic physiological factors; and (3) relative decreases in deoxyHb concentrations, also related to neurovascular coupling, which likewise serve as a proxy for neural activity. Multiple systemic physiological factors not directly related to the neurovascular coupling have been described. 18 For example, variations in partial pressure of end-tidal carbon dioxide (PetCO2) associated with respiration have been observed during speech production and shown to decrease with similar tasks performed with only internal and cognitive responses. 20 Other nonneural physiological factors, such as heart rate, blood pressure, respiration rate, and concentration of CO2, have also been shown to influence blood oxygen concentrations as measured by fNIRS (Refs. 18, 31, and 32). It is widely understood that these factors are modulated by subject characteristics, such as age, gender, fitness, body size, time from exercise, medications, and anxiety levels, and that they further complicate computational approaches to separate neural and systemic components in both oxyHb and deoxyHb signals. Furthermore, assumptions of equal variance across whole brains of individual subjects may also be violated by both individual differences and task demands. 33 To the extent that these sources of variation are systemic in origin, they would be expected to differentially affect the oxyHb and deoxyHb signals. For example, the task-related increase in the oxyHb signal is attributed to both neural and systemic physiological factors, whereas the task-related decrease in the deoxyHb signal is primarily attributed to neurovascular coupling.
The paradoxical group observation in the unthresholded, averaged raw oxyHb signals [Fig. 4(d)], showing both the absence of signal in the ROI (Broca's area) and the negative group average in frontal areas, is consistent with the hypothesis that systemic factors such as end-tidal carbon dioxide may have resulted in a negative signal. Regional differences in systemic factors were also present, as illustrated by the difference between the oxyHb signal in the three channels in Figs. 6(b)-6(d). These localized systemic effects may have prevented the spatial filter from adequately removing this global negative signal, as shown by the group-averaged result in Fig. 5(b). When the oxyHb signal was subjected to a threshold and multiple comparisons correction, individual differences in systemic factors may have washed out a group effect. However, the widely distributed group signal for the simultaneously acquired raw deoxyHb data [Fig. 4(c)] suggests that the deoxyHb signal may be less affected by these sources of variation than the oxyHb signal for a speaking task. This observation is an important topic for future research and for the development of computational and experimental approaches as fNIRS emerges as a method of choice for studies of cognitive processes in natural conditions.

Limitations
The finding that group data for the oxyHb signal during overt speaking did not reveal canonical regions associated with Broca's area, i.e., left pre- and supplementary motor cortex and left pars opercularis, was unexpected. Although increased individual variability of systemic factors associated with breathing during a speaking task, as well as individually specific regional brain differences, may contribute, there are other possible contributing factors. The movement of the head, mouth, and temporalis muscle during overt speech creates particularly challenging circumstances for an imaging study. These findings suggest that future investigations of speech functions would benefit from movement extraction algorithms, and, in particular, the oxyHb signal may benefit from simultaneous measurements of PetCO2, as previously suggested by Scholkmann et al. 20 Algorithms that employ physiological regressors in addition to PetCO2, such as heart rate, blood pressure, and respiration, 18 to further refine the separation between neural and systemic effects may also be particularly beneficial to the oxyHb signal. Additionally, while traditional short channel regression techniques in the temporal domain may also remove cortical responses, newer techniques that regress only data with a positive (nonstandard) correlation between oxyHb and deoxyHb have been suggested and may further increase the signal-to-noise ratio in the oxyHb recordings. 33 An additional limitation of the study was the variability of detector locations in the inferior aspect of the left frontal lobe. This was due to variability of channel location in that area resulting from variations in head and cap size. As the field of view indicates (Fig. 3), the inferior aspects of Broca's area were not reliably sampled. This is a potential pitfall that can be avoided in future investigations with cap sizes designed to fit various head sizes.

Conclusion
In this study, we compared fNIRS activity from an overt picture-naming task to both a Neurosynth activity map and fMRI activity during a silent picture-naming task. 25 Spatial filtering of global components from the fNIRS deoxyHb signal yielded results similar to those obtained with fMRI. Even after spatial filtering, fNIRS oxyHb signals did not show expected activity patterns related to picture naming. One possible explanation is that the oxyHb signal is more sensitive to modulation by systemic sources. The deoxyHb signal yielded activity patterns similar to fMRI and Neurosynth results only after global component removal was applied. This study is the first to our knowledge to show the benefits of systemic artifact removal on fNIRS signals recorded during a task involving spoken language to reveal neural responses from Broca's area. Findings suggest that fNIRS may be used to study spoken language outside the confines of an fMRI scanner and thereby extend the applications of fNIRS to neuroimaging in natural and freely moving conditions.

Appendix A: Median Channel Locations
The median locations for each channel are listed in Table 1.

Table 1 Median channel locations for all subjects. The X, Y, and Z columns represent MNI coordinates. MNI coordinates were converted to Talairach coordinates to generate anatomical areas. The last column lists the atlas-based probability that the XYZ coordinates are within that anatomical location (only probabilities greater than 20% were listed here).

Swethasri Dravida received her BS degree in mathematics and brain and cognitive sciences from the Massachusetts Institute of Technology in 2013. She is a graduate student at Yale School of Medicine. Her current research interests include using functional near-infrared spectroscopy and EEG to study social interaction, especially in clinical contexts such as autism.
Joy Hirsch received her PhD in psychology and visual science from Columbia University and is now a professor of psychiatry and neurobiology at the Yale School of Medicine, and a professor of neuroscience at University College London. She is also the director of the Brain Function Laboratory at Yale University. Her research is focused on investigations of neural circuitry that underlies human social interactions using multimodal neuroimaging techniques including fNIRS, fMRI, EEG, eye-tracking, and behavioral measures. Prior to recruitment to Yale, she was a director of the fMRI Research Center at Columbia University.