Cerebral hemodynamic and oxygenation changes induced by inner and heard speech: a study combining functional near-infrared spectroscopy and capnography

Abstract. The aim of this study was to investigate the effects of inner and heard speech on cerebral hemodynamics and oxygenation in the anterior prefrontal cortex (PFC) using functional near-infrared spectroscopy and to test whether potential effects were caused by alterations in the arterial carbon dioxide pressure (PaCO2). Twenty-nine healthy adult volunteers performed six different tasks of inner and heard speech according to a randomized crossover design. During the tasks, we generally found a decrease in PaCO2 (only for inner speech), tissue oxygen saturation (StO2), and oxyhemoglobin ([O2Hb]) and total hemoglobin ([tHb]) concentrations, as well as an increase in deoxyhemoglobin concentration ([HHb]). Furthermore, we found significant relations between changes in [O2Hb], [HHb], [tHb], or StO2 and the participants’ age, the baseline PETCO2, or certain speech tasks. We conclude that changes in breathing during the tasks led to lower PaCO2 (hypocapnia) for inner speech. During heard speech, no significant changes in PaCO2 occurred, but the decreases in StO2, [O2Hb], and [tHb] suggest that changes in PaCO2 were also involved here. Different verse types (hexameter and alliteration) led to different changes in [tHb], implying different brain activations. In conclusion, StO2, [O2Hb], [HHb], and [tHb] are affected by the interplay of PaCO2 reactivity and functional brain activity.


Introduction
Previous studies on the effects of guided rhythmic speech exercises in the context of arts speech therapy (AST) on human physiology showed that AST tasks, i.e., recitations, are associated with characteristic changes in heart rate variations 1,2 and cardiorespiratory interaction. In a next step, effects on cerebral and systemic changes of hemodynamics and oxygenation were investigated by our research group using functional near-infrared spectroscopy (fNIRS). 3,4 A decrease in cerebral hemodynamics and oxygenation was found to occur during speech exercises, which was hypothesized to be a result of a decrease in the partial pressure of carbon dioxide (CO2) in the arterial blood (PaCO2) during speaking. This hypothesis was confirmed in a subsequent study combining fNIRS and capnography, 5 where we observed changes in end-tidal CO2 (PETCO2), a reliable and accurate estimate of PaCO2, 6-8 during all speech tasks and even during the control task (mental arithmetic). This led us to conclude that the changes in hemodynamics and oxygenation are a combination of two factors: (1) a hypocapnia mediated by changes in breathing (hyperventilation) during the tasks and (2) changes in brain activity (neurovascular coupling), where previously the first factor was probably stronger than the second one.
In a subsequent study, 9 we demonstrated that even inner speech, i.e., speech not spoken aloud, leads to significant changes in PETCO2 as well as cerebral hemodynamics and oxygenation. Table 1 gives an overview of previous study results.
These results raised the question of whether simply hearing a recitation also causes changes in PaCO2 and, thus, in cerebral hemodynamics and oxygenation. Since this question had not been addressed yet, but is of importance in functional studies involving speech, the aim of the present study was to investigate the effect of different types of (1) inner speech and (2) heard speech on PETCO2 dynamics as well as cerebral hemodynamics and oxygenation measured using fNIRS and capnography.

Subjects, Experimental Protocol, and Instrumentation
Twenty-nine healthy subjects (14 men, 15 women, mean age: 47.0 ± 12.8 years) participated in this study, which was carried out as a controlled and randomized crossover trial. Study participants were German/Swiss German native speakers who had no previous knowledge of AST and were asked not to smoke, eat, or consume any stimulants (such as caffeine or other ingredients in energy drinks) for at least 2 h before the start of the measurements. Approval for the study was obtained from the Ethical Committee of the Canton of Zurich; the design of the study was in accordance with the Declaration of Helsinki, and informed consent was obtained from the subjects prior to each measurement.
Three task modalities were applied: (1) inner recitation of the text (inner speech, IS), and listening to the text (heard speech, HS) while (2) the text was recited by a person (HSP), or (3) a recorded recitation was played (HSR). Two different types of text were used, i.e., one with hexameter and one with alliterative verses. Thus, six different tasks were investigated: inner speech of a (1) hexameter (IS-H) and (2) alliterative (IS-A) text, listening to a live recited (3) hexameter (HSP-H) and (4) alliterative (HSP-A) text, and listening to a recorded (5) hexameter (HSR-H) and (6) alliterative (HSR-A) recitation.
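The task structure described above (three modalities crossed with two text types, presented in randomized order) can be sketched in a few lines of Python. This is only an illustration: the paper does not state how the randomization was generated, so the per-subject seeding scheme and the function names below are assumptions.

```python
import itertools
import random

# Modalities and text types as named in the text.
MODALITIES = ("IS", "HSP", "HSR")  # inner speech, heard speech (person), heard speech (recording)
TEXT_TYPES = ("H", "A")            # hexameter, alliteration


def task_labels():
    """Return the six task labels, e.g. 'IS-H' or 'HSR-A'."""
    return [f"{m}-{t}" for m, t in itertools.product(MODALITIES, TEXT_TYPES)]


def randomized_order(subject_id, seed=0):
    """Return one random task order for a subject.

    The seeding scheme is hypothetical; it only makes the sketch reproducible.
    """
    rng = random.Random(seed * 1000 + subject_id)
    order = task_labels()
    rng.shuffle(order)
    return order
```

Calling `randomized_order(subject_id)` once per participant yields a permutation of the six tasks, one task per measurement day.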
The two text types have the following characteristics: alliteration is a form of rhythmic speech with repetition of a particular initial sound in the first syllables of a series of words or phrases; the hexameter is a metrical line of verse consisting of six feet. These two verse types induce a different flow of speech and may induce different patterns of brain activity. Alliteration and hexameter recitation were investigated in this study since they are frequently used in the context of AST. Recitation of a person was applied because it closely resembles the situation in AST; recorded recitation was additionally employed because it ensures the same recitation for every participant.
Each measurement lasted 43 min [8 min baseline (interval 1), 5 min task (interval 2), 5 min recovery (interval 3), 5 min task (interval 4), and 20 min recovery (intervals 5 to 8)]. To avoid potential carry-over effects, each task was performed on a separate day. During the IS and HSP tasks, the subjects sat opposite a person who recited the respective text verse-by-verse. During the HSR tasks, the subjects sat opposite two speakers (LS11, Logitech Inc., Fremont, USA) and the prerecorded recitation of the texts (performed by the same person who recited during the IS and HSP tasks) was played. The recitations were recorded using an electret condenser microphone (ECIMF8, Sony, Japan) and the open-source software Audacity (http://audacity.sourceforge.net). In order to improve sound quality, the recorded recitations were denoised, normalized, and equalized in Audacity. The recorded recitation was played at such a volume that, at the position of the subject (directly in front of the subject's face), the sound level was 65 dBA on average, as measured using a sound meter (320, Voltcraft, Hirschau, Germany). During the HSP condition, the sound level was also 65 dBA on average.
The OxiplexTS optodes were placed on the left and right side of the forehead over the prefrontal cortex (PFC) at Fp1-F3/Fp2-F4 according to the international 10-20 system. 10 The PFC was chosen because it constitutes a region of the brain that has been demonstrated to be activated in language processing, 11 speech production, 12 semantic processing, and phonological/lexical processing, 13,14 and also easily allows for fNIRS measurements with a high signal-to-noise ratio due to the absence of absorbing hair in this region of the head.
The PETCO2 probe was positioned directly below the right nostril of the subject. A visualization of the placement of the NIRS optode, the CO2 sensor, and a schematic representation of the NIRS optode can be found in Fig. 1. For a detailed description of the ISS OxiplexTS frequency-domain NIR spectrometer specifications, please refer to Ref. 5. Absolute values of StO2, [O2Hb], [HHb], and [tHb] were calculated by the OxiplexTS software using the frequency-domain multidistance (FDMD) method, which enables sensitivity to the extracerebral tissue to be reduced. 15,16 This methodological approach is superior to the traditional modified Beer-Lambert method in excluding extracranial hemodynamic changes. 17 The FDMD method calculates the following slopes: (1) the amplitude slope ($S_{AC}$), i.e., $\ln(d^2 U_{AC})$ versus $d$, where $d$ is the distance and $U_{AC}$ is the amplitude of the light intensity, and (2) the phase slope ($S_\varphi$), i.e., $\varphi$ versus $d$, where $\varphi$ is the phase shift of the light. From $S_{AC}$ and $S_\varphi$, the absorption coefficient ($\mu_a$) was determined using the equation

$$\mu_a = \frac{\omega}{2v}\left(\frac{S_\varphi}{S_{AC}} - \frac{S_{AC}}{S_\varphi}\right),$$

with $v$ the speed of light in the tissue and $\omega$ the angular modulation frequency of the source intensity. Based on the $\mu_a$ at 690 and 830 nm and using the molar extinction coefficients for O2Hb and HHb, the concentrations of O2Hb and HHb were calculated in a subsequent step (for the equations, please refer to Ref. 5).
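To make the two calculation steps concrete, the following Python sketch computes the absorption coefficient from the two slopes using the standard multidistance frequency-domain relation μa = (ω/2v)(Sφ/SAC − SAC/Sφ), and then solves the two-wavelength linear system for [O2Hb] and [HHb]. It is not the OxiplexTS implementation. The modulation frequency (110 MHz), the tissue refractive index (1.4), and in particular the extinction coefficients are assumptions chosen for illustration; the coefficient values are rough placeholders, not tabulated constants. The forward model is included only to check the inversion.

```python
import cmath
import math

C_VACUUM_CM_PER_S = 2.998e10  # speed of light in vacuum (cm/s)

# Placeholder extinction coefficients (natural-log based, cm^-1 per µM):
# wavelength (nm) -> (epsilon_O2Hb, epsilon_HHb). Illustrative values only.
EPS_PLACEHOLDER = {690: (0.0007, 0.0049), 830: (0.0023, 0.0018)}


def mu_a_from_slopes(s_ac, s_phi, mod_freq_hz=110e6, n_tissue=1.4):
    """Absorption coefficient (cm^-1) from amplitude and phase slopes (per cm)."""
    omega = 2.0 * math.pi * mod_freq_hz   # angular modulation frequency
    v = C_VACUUM_CM_PER_S / n_tissue      # speed of light in tissue
    return omega / (2.0 * v) * (s_phi / s_ac - s_ac / s_phi)


def slopes_from_optical_properties(mu_a, mu_s_prime, mod_freq_hz=110e6, n_tissue=1.4):
    """Diffusion-theory forward model (S_AC, S_phi), used here only as a check."""
    omega = 2.0 * math.pi * mod_freq_hz
    v = C_VACUUM_CM_PER_S / n_tissue
    diffusion = 1.0 / (3.0 * mu_s_prime)
    k = cmath.sqrt((mu_a + 1j * omega / v) / diffusion)
    return -k.real, k.imag                # amplitude slope is negative


def hemoglobin_from_mu_a(mu_a_690, mu_a_830, eps=EPS_PLACEHOLDER):
    """Solve mu_a(lambda) = eps_O2Hb*[O2Hb] + eps_HHb*[HHb] at two wavelengths."""
    a, b = eps[690]
    c, d = eps[830]
    det = a * d - b * c
    o2hb = (mu_a_690 * d - b * mu_a_830) / det   # Cramer's rule
    hhb = (a * mu_a_830 - c * mu_a_690) / det
    thb = o2hb + hhb
    sto2 = 100.0 * o2hb / thb                    # tissue oxygen saturation (%)
    return o2hb, hhb, thb, sto2
```

With slopes generated by the forward model for a chosen μa, `mu_a_from_slopes` returns that same μa, which is a convenient consistency check before applying the routine to measured slopes.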

Signal Processing and Statistical Analysis
Signal processing was performed using MATLAB® ( 26 Care was taken to ensure that no artificial new trends were introduced to the signals while applying the algorithm. The PETCO2 values were calculated directly from the capnography waveform signal by detecting the peaks of each breath cycle using a method recently developed in-house 27 and by interpolating the peaks by a piecewise cubic interpolation to form a continuous signal. Each time series was segmented into intervals with a length of 3 min each, covering the time intervals 4 to 7, 9 to 12, 14 to 17, 19 to 22, 24 to 27, 29 to 32, 34 to 37, and 39 to 42 min. For further analysis, all signals were downsampled to 5 Hz and the fNIRS-derived signals were low-pass filtered using a moving average filter (window length: 10 s). For all signals, median values of each interval were calculated and normalized by subtracting the median value of the first interval.
Statistical analysis was performed using SPSS software (version 20.0, IBM Corp., Armonk, NY). It was tested whether (1) the interval median values have a distribution with a zero median (Wilcoxon test), (2) the changes in the left PFC are different from those in the right PFC (Wilcoxon paired test), (3) the six types of tasks cause significantly different changes (Kruskal-Wallis test), and (4) the two types of recitation texts (hexameter and alliteration) are associated with different changes in the signals (Mann-Whitney U test). For all tests, we calculated the raw p-values and also applied the Benjamini-Hochberg correction 28 to account for multiple comparisons.
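The interval-based reduction and the multiple-comparison correction can be sketched as follows. This is a plain-Python illustration, not the MATLAB or SPSS code used in the study. It assumes 3 min analysis windows that end 1 min before the end of each protocol interval (which reproduces windows such as 4 to 7 and 39 to 42 min), a centered moving average, and the usual step-up form of the Benjamini-Hochberg adjustment; the peak detection of Ref. 27 and the cubic interpolation are not reproduced. All function names are hypothetical.

```python
import statistics

# End of each protocol interval in minutes (8 min baseline, then 5 min blocks).
INTERVAL_ENDS_MIN = [8, 13, 18, 23, 28, 33, 38, 43]


def analysis_windows():
    """3 min windows ending 1 min before each interval end (assumption)."""
    return [(end - 4, end - 1) for end in INTERVAL_ENDS_MIN]


def moving_average(x, window):
    """Centered moving average; windows are shortened at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def normalized_interval_medians(signal, fs_hz=5.0):
    """Median per analysis window minus the median of the first window."""
    medians = []
    for start_min, end_min in analysis_windows():
        lo = int(start_min * 60 * fs_hz)
        hi = int(end_min * 60 * fs_hz)
        medians.append(statistics.median(signal[lo:hi]))
    return [m - medians[0] for m in medians]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

At 5 Hz, a 10 s smoothing window corresponds to `moving_average(signal, 50)`; the resulting eight normalized medians per signal are the values that enter the nonparametric tests.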
Additionally, data of the present study and of our two previous studies with the same design 5,9 were combined. To test for significant relations between changes in cerebral hemodynamics and oxygenation and the different speech tasks, stepwise linear regression analyses were calculated with each of StO2, [O2Hb], [HHb], and [tHb] as the dependent variable and age, gender, body mass index, side of measurement (right or left PFC), baseline PETCO2, and the various speech tasks as the covariates.

Results
The changes (during the task and during the recovery period) in StO2 and [HHb] differed significantly between the right and left PFC (Benjamini-Hochberg corrected) for the HSR-A task. StO2 generally decreased (and [HHb] increased) more strongly at the right PFC than at the left PFC, with the difference reaching statistical significance for the HSR-A task. The test of different dynamic behaviors of the signals over the right and left PFC when combining all tasks performed in the present study showed statistically significant differences. The comparison of the changes with respect to the two different speech types (hexameter versus alliteration) showed that for the left PFC, statistically significant differences in changes occurred for [tHb] during and after the task, and for [O2Hb] after the task. The right PFC showed no differences; PETCO2 also did not differ between the two types of tasks.
The test of differences in changes for all tasks and signals revealed that only the PETCO2 changes in the recovery period were statistically different, namely when comparing HSP-A and HSR-A. When comparing hexameter and alliteration tasks, the changes in [tHb] at the left PFC during and after the tasks were statistically different. Linear regression analyses revealed several significant relations between changes in [O2Hb], [HHb], [tHb], or StO2 and the participants' age, the baseline PETCO2, or certain speech tasks (see Table 2).
The participants' age influenced changes in cerebral hemodynamics and oxygenation. With increasing age, the decreases in StO2, [O2Hb], and [tHb] during the tasks (interval 4) were less pronounced. This significant relation largely vanished when only participants below the age of 50 were included in the analyses (not shown). We also found that, compared to younger subjects, older subjects had lower [O2Hb] and [HHb] values in the baseline period (interval 1).
Reciting hexameter or alliteration verses aloud showed no significant relation with any of the dependent variables, in contrast to mental arithmetic, inner speech, or listening to hexameter or alliteration verses. Hexameter verses affected changes during the tasks (interval 4), while alliteration verses only affected changes during the recovery phase (interval 7). Figure 4, illustrated by the inner recitation tasks, shows that the changes in PETCO2 appear quite early after the start of the tasks.

Discussion
In order to explain the observed cerebral hemodynamic and oxygenation changes obtained from the studies involving speech tasks, two major physiological processes should be considered: 5 neurovascular coupling (NC) and CO2 reactivity (CO2R).
NC occurs due to increased neuronal activity leading to an increase in the cerebral metabolic rate of O2, resulting in an increase in cerebral blood flow (CBF) and thus volume (CBV). 29,30 This effect causes the following characteristic changes of the fNIRS signals: increase in [O2Hb], decrease in [HHb], and increase in [tHb] and StO2. CO2R describes the changes in CBF and CBV in response to changes in PaCO2, mediated by cerebral vasoconstriction or vasodilation. Changes in PaCO2 have a strong and robust effect on cerebral hemodynamics, i.e., an increase in breathing frequency and/or volume (hyperventilation) causes a decrease in PaCO2 (hypocapnia), which leads to a reduction in CBF by cerebral vasoconstriction. 31,32 This effect results in the following characteristic changes of the fNIRS signals: decrease in [O2Hb], increase in [HHb], and decrease in [tHb] and StO2.
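The two opposite signal signatures described above can be written down as a small lookup, which may help when screening task responses. The following Python sketch is purely illustrative: the labels and the function are hypothetical, and a sign pattern alone cannot separate the two processes when both act at the same time.

```python
# Expected direction of change for each fNIRS signal (+1 increase, -1 decrease).
NC_PATTERN = {"O2Hb": +1, "HHb": -1, "tHb": +1, "StO2": +1}
# Hypocapnia-driven CO2 reactivity shows the mirror image of the NC pattern.
HYPOCAPNIA_PATTERN = {name: -direction for name, direction in NC_PATTERN.items()}


def sign(value, tol=0.0):
    """Return +1, -1, or 0 for changes within +/- tol."""
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0


def classify(changes, tol=0.0):
    """Label a set of signal changes by its sign pattern."""
    signs = {name: sign(changes[name], tol) for name in NC_PATTERN}
    if signs == NC_PATTERN:
        return "NC-like"
    if signs == HYPOCAPNIA_PATTERN:
        return "hypocapnia-like (CO2R)"
    return "mixed"
```

For example, a response with decreasing [O2Hb], [tHb], and StO2 and increasing [HHb] is labeled as hypocapnia-like, whereas any other combination of signs is reported as mixed.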
Regarding the results obtained from the present study, the question arises whether one of the two effects mentioned prevailed or whether the results were caused by a combination of both. The decrease generally found in StO2, [O2Hb], and [tHb] as well as the increase in [HHb] during all six interventions may at first glance be interpreted as if the NC was overpowered by the CO2R, i.e., a hyperventilation-induced hypocapnia causing cerebral vasoconstriction as the main effect. However, our results show that the relationships between changes in PETCO2 and hemodynamics and oxygenation are not linear [see Fig. 3(b)]. If the hypocapnia were the sole relevant effect, we would expect linear relationships (at least to a first approximation) as demonstrated by other investigations. 33 But in the present study, a task evoking a significantly larger change in PETCO2 compared to another task is not accompanied by a larger change in [tHb] [see Fig. 3(b)]. In addition, the response of the cerebral blood vessels to changes in PETCO2 is known to be robust and much stronger than the response to other physiological parameters, such as oxygenation or blood pressure. Thus, even a small change in PETCO2, which does not reach significance in our results, may have a relevant and significant effect on cerebral hemodynamics. 34 For both reasons, the observed effects in hemodynamics and oxygenation cannot solely be explained by a hypocapnia. Thus, we consider a combination of NC and CO2R. The NC characteristics are task-dependent, i.e., they counteract the CO2R to different degrees, leading to an apparent nonlinearity between PETCO2 and hemodynamics/oxygenation. As already indicated in Ref. 9, it is reasonable to assume that the different speech tasks are associated with different characteristics of brain activity.
It is, for example, known that mainly stress 35-37 and specific types of cognitive processes (particularly memory retrieval and multitasking) 38 are modulating factors for the activity of the PFC. Thus, we deem it likely that two overlapping and counterbalancing effects (NC and CO2R) are causal in our study, where the strength of each effect appears to be task-dependent. For inner speech and heard speech (person) of a hexameter, the CO2R was quite different, yet the changes in StO2 were the same. This indicates that the NC during inner speech was stronger and counteracted the CO2R more strongly, which therefore resulted in similar changes in StO2. We would also expect that inner speech, which includes hearing and reciting the verses, necessitates a more pronounced effort than simply hearing the verses and, consequently, leads to larger brain activation. This is in line with the results of the measurements, and thus, it appears reasonable to assume that the effects elicited by speech cannot be explained by the CO2R alone. For a visualization of the interplay between NC and CO2R during a speech task, please refer to Fig. 4 in Ref. 5. The observation from this figure that PETCO2 changes quite early after the start of the tasks indicates that changes in PETCO2 can also be elicited during relatively short task intervals, a fact that might be relevant for all experimental protocols involving speech tasks in general.
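The argument can be made explicit with a deliberately simplified additive model. The linear CO2R term and the symbols $\beta_X$ and $\Delta X_{\mathrm{NC}}$ are assumptions introduced here for illustration; they are not quantities estimated in the study.

```latex
% X stands for one of StO2, [O2Hb], [HHb], [tHb]; beta_X is a hypothetical
% CO2 reactivity coefficient (beta_X > 0 for StO2, [O2Hb], and [tHb]).
\Delta X_{\mathrm{obs}}(\mathrm{task}) \approx
    \beta_X \, \Delta P_{ET}CO_2(\mathrm{task}) + \Delta X_{\mathrm{NC}}(\mathrm{task})

% Two tasks with equal observed change but different P_ET CO2 change
% (e.g., IS-H and HSP-H for StO2):
\Delta X_{\mathrm{NC}}(\mathrm{IS}) - \Delta X_{\mathrm{NC}}(\mathrm{HSP}) \approx
    -\beta_X \left[ \Delta P_{ET}CO_2(\mathrm{IS}) - \Delta P_{ET}CO_2(\mathrm{HSP}) \right]
```

Under this sketch, a larger drop in PETCO2 during inner speech combined with an unchanged StO2 response implies a larger positive NC contribution during inner speech. If the NC term were zero for all tasks, the observed changes would scale linearly with the PETCO2 changes, which is not what Fig. 3(b) shows.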
The increase of StO2 and [O2Hb] in the postbaseline period after the tasks (i.e., intervals 5 to 8) (see Fig. 2) cannot simply be explained by a change in regional CBF (rCBF) due to relaxation and entering a state of consciousness on the transition to sleep, since a reduction of rCBF in the PFC was found during light sleep, 39 which would lead to a decrease in StO2 and [O2Hb].
The finding that hexameter verses influenced changes during the tasks (interval 4) and alliteration verses only influenced changes during the recovery phase (interval 7) (see Fig. 2) suggests that the different flow of speech of these verses might induce different patterns in brain activity and different dynamics of the interplay between NC and CO2R.
The results obtained by the present study are in general agreement with a previous fNIRS speech study. We demonstrated that inner speech (recitation of hexameter and prose) induces a CO2R associated with characteristic changes in cerebral hemodynamics and oxygenation. 9 The nonlinearity between PETCO2 changes and changes in the fNIRS parameters was also observed and attributed to a task-dependent intensity of the brain activity and thus NC. In an fNIRS study involving reading aloud as a task, Fallgatter et al. 40 observed a decrease in [O2Hb] and an increase in [HHb], which is in line with our observation. We are not aware of functional magnetic resonance imaging (fMRI) studies with a study design similar to ours. Studies investigating language processing and speech production, e.g., Refs. 11 and 12, reported changes in the blood-oxygen-level-dependent (BOLD) signal but not whether these changes reflect an actual increase or decrease in hemodynamics/oxygenation. In general, the comparison of fMRI and fNIRS results is also not straightforward since the CO2R depends on the specific type of tissue compartment and characteristic (arterial, venous, arterio-venous, small versus large vessel radius), 41 and since fMRI and fNIRS are sensitive to different tissue structures: fNIRS is sensitive to the microvasculature comprising arterioles, venules, and capillaries, i.e., small vessels, 42 whereas fMRI is more sensitive to larger vessels 43 and especially large draining veins. 44 Because the CO2R is especially high in arterioles and venules, 41 fNIRS seems generally to be more sensitive to task-evoked CO2 changes compared to fMRI.
A novel finding of the present study is that even simply hearing speech causes a weak and often not significant change in PETCO2, partly interfering with NC and thus leading to changes in fNIRS signals that would not be expected if only NC, and not CO2R, were considered as the cause of the observed changes.
The generally stronger decrease in StO2 and increase in [HHb] observed at the right PFC compared to the left PFC during the speech tasks may indicate that the task-related NC in the left PFC is stronger than in the right PFC, counterbalancing the CO2R effect. This conclusion is in agreement with fMRI findings that overt and inner speech cause a left-hemispheric dominance of activity in the PFC. 45 A higher activity in the left part of the cortex was also observed by fNIRS. 46 A dichotic listening test revealed hemispheric lateralization of speech sound perception, where the largest [O2Hb] increase and [HHb] decrease were found in the left superior temporal gyrus. Furthermore, an fMRI study involving a word generation task (Ref. 47) found maximum brain activity in the left hemisphere, mainly in the frontal lobe (Broca's area, premotor cortex, and dorsolateral PFC).
The large intersubject variability of the analyzed signals in our study is in agreement with findings in numerous other fNIRS studies (e.g., Refs. 48-51). However, on the group level, fNIRS studies provide reproducible results. 52 When interpreting our results, it is also worth considering systemic changes that might have interfered with the two main effects (NC and CO2R): (1) changes in the activity state of the autonomic nervous system (ANS), which have an influence on cerebral hemodynamics and oxygenation, 53-55 and (2) changes in mean blood pressure (MBP). 56,57 Our fNIRS signals are reasonably immune to superficial MBP changes (due to the FDMD method used), 15 and the cerebral autoregulation in healthy adult subjects is expected to reduce the effect of systemic MBP changes on cerebral hemodynamics. However, due to the transient changes in PETCO2 during the tasks (see Fig. 4), cerebral autoregulation might not remain unaffected, and thus the fNIRS signals might also contain a component originating from MBP changes. Further studies should investigate this aspect more closely.
In addition to the results of the present study, when analyzing our last three studies (i.e., the present study and the studies reported in Refs. 5 and 9) combined using regression models, we found significant relations between the participants' age as well as baseline (i.e., interval 1) PETCO2 63 were in agreement with our results since the observed hypocapnia in our study is associated with a mild cerebral hypoxia. An age-related decrease in resting-state CBF was also shown in studies using arterial spin labeling, 64 positron emission tomography, 65-68 and single photon emission computed tomography. 69 Regarding the observed significant relation between the baseline PETCO2 values and the changes in cerebral hemodynamics and oxygenation during and after the tasks, similar effects were found in other studies. For example, Blockley et al. 70 showed that the amplitude of the fMRI BOLD response depends on the baseline physiological state (hematocrit, oxygen extraction fraction, and CBV) of the subject. Other studies demonstrated that the magnitude of the BOLD response 71,72 and the strength of neural activity 73,74 depend on the PaCO2 level.
Regarding possible confounding factors in our study design, two factors should be discussed: (1) possible different characteristics and (2) different personal perception of the texts. We controlled for the first factor by choosing texts with similar substance and emotional content. Concerning the second factor, we reduced its influence by measuring a large number of subjects and thus compensating for personal differences.

Implications for Further Research on the Topic
The results obtained in the present study and the combined analysis of our last three studies with regression models allow four implications to be drawn for further research on the topic: (1) The impact of changes in PaCO2 should be considered in the interpretation of fNIRS studies involving speech tasks, including audible, inner, and even heard speech. (2) The use of capnography in combination with fNIRS is recommended. (3) Signal processing techniques should be developed and applied to distinguish between CO2R-related and NC-related changes in fNIRS signals. (4) In order to investigate the influence of the ANS state on the fNIRS signals, the measurement and analysis of skin conductance and heart rate variability changes during speech tasks might also be important for a proper interpretation of fNIRS signals. In addition, measurements of MBP changes in future studies would contribute to interpreting the fNIRS results.
In conclusion, we found that changes in brain activation and breathing during different speech tasks affected cerebral hemodynamics and oxygenation. We showed that inner speech causes changes in PaCO2, which have an impact on cerebral hemodynamics and oxygenation. A new finding is that even during heard speech, a CO2R takes place, leading to characteristic changes in cerebral hemodynamics and oxygenation. In the left PFC, we found a significant difference between the alliteration and hexameter verses. Our analysis also showed that hexameter verses influenced changes during the tasks, while alliteration verses influenced changes only during the recovery phase, indicating that the two different types of verses seem to evoke different physiological reactions. Furthermore, we found significant relations between changes in [O2Hb], [HHb], [tHb], or StO2 and the participants' age, the baseline PETCO2, or certain speech tasks.
To the best of our knowledge, the present study is the first to investigate the impact of PaCO2 changes on cerebral hemodynamics and oxygenation during speech listening tasks measured with fNIRS and capnography simultaneously. We highlight that PETCO2, measured during functional speech tasks, appears to be an important parameter for the reliable and correct interpretation of fNIRS results. Thus, we recommend that PETCO2 changes be measured in future fNIRS and possibly also fMRI neuroscientific studies involving speech tasks.