Near-infrared spectroscopy (NIRS) is a noninvasive optical imaging technique that measures changes in the oxygenation of brain tissue, i.e., changes in regional cerebral blood flow (CBF) and cerebral oxygenation. NIRS has been used to monitor changes in cerebral oxygenation in a variety of motor, cognitive, and perceptual tasks,1, 2, 3, 4 and there is growing consensus that NIRS data are consistent with those obtained with other brain imaging techniques, such as positron emission tomography (PET) and functional magnetic resonance imaging.5, 6, 7 Compared to these techniques, a number of factors add to the promise of NIRS as a clinical tool: it is relatively low cost, the equipment is mobile, and measurements are fairly tolerant of bodily movements of participants. A number of NIRS studies have indeed been performed in a clinical setting with premature or medically at-risk infants and adults.8, 9, 10
In the present study we used NIRS to monitor cortical activation in response to auditory stimuli consisting of a series of pure tones alternating in frequency. In experiment 1, listeners judged the rhythm of the series while NIRS measurements were made. In experiment 2, they either judged the rhythm of the stimuli or listened to them passively during measurements. The stimuli were so-called auditory streaming stimuli,11, 12, 13 which typically consist of repetitive and alternating high (H) and low (L) pure tones of 100 in duration (Fig. 1). When the frequency difference between the tones is smaller than about a quarter of an octave, a single, integrated “stream” of tones can be heard (e.g., H-L-H-L-H-L, etc.). When the frequency difference between the H and L tones increases, however, the series splits into two streams (H-H-H- and L-L-L-). This often occurs when the frequency difference exceeds half an octave.12 Ambiguous percepts are often reported when the frequency difference between H and L lies between a quarter and half an octave.
To further explore future clinical applications of NIRS, the streaming stimuli were used for the following purposes. In the auditory domain, NIRS has been successful in assessing brain responses to speech, both in developing brains14, 15, 16 and adult brains.17 Only a few NIRS studies exist, however, on brain activity in response to less complex sounds.18 As a first aim, we therefore explored whether changes in the stimuli’s frequency and perceived rhythm would be accompanied by spatiotemporal changes in cortical hemodynamics in areas that potentially respond to these features. If so, assessing cortical functioning and development of listeners who do not (yet) possess a sufficiently developed language system, such as children, might be feasible with the use of nonverbal stimuli as well.
Second, we tested whether the listener’s attentional engagement to the stimuli, i.e., active-response listening by requirement of rhythm judgment, would affect cortical hemodynamic activity differently than passive listening to the stimuli. The bulk of NIRS studies on auditory functioning described so far concerns the latter listening mode. With the use of other imaging techniques, it has been shown that active-response listening results in increased cortical activity compared to passive listening, also in areas outside the auditory cortex.19, 20, 21 With NIRS, enhancing effects of attentional engagement (alertness) have been shown so far in the visual domain, with tasks that required a psychophysical response such as a visual reaction task22 and a visual search task.23 Whether attention modulates hemodynamic activity levels also in the auditory domain when a psychophysical response is required has not been investigated yet. NIRS’ usefulness as a clinical tool, however, would be greatly enhanced if it could visualize whether a listener is actually attending to a stimulus rather than merely receiving it in an otherwise conscious state, especially when the listener is unable to verbalize what he/she perceives or is attending to.
Seven students and researchers of cognitive psychology, three females and four males, participated in experiment 1. Three additional participants (three males) joined experiment 2. The participants were of age, right-handed, and had normal hearing. Informed consent was obtained from all participants after an explanation about the workings of the NIRS equipment and the procedure of each experiment. The latter was approved by the Ethics Committee of Kanazawa University Hospital and followed the Declaration of Helsinki.
Stimuli and Design
Figure 1 schematically shows the three stimulus types used in experiment 1. The stimuli consisted of pure tones of in duration. The tones had a rise and fall time of with cosine-shaped ramps. The tones were grouped in H-L-H or in L-H-L triplets, with tone H having a higher frequency than tone L. The pause in between the tones within a triplet was , whereas the pause in between each triplet was . Including the pauses, each triplet thus had a duration of . A single stimulus condition lasted and consisted of 20 triplets presented in succession. Because of the difference between the “within” and “between” triplet pause duration, each stimulus condition could potentially have a galloping rhythm.
As described by Van Noorden,12 whether a galloping rhythm in auditory streaming stimuli is actually perceived depends on the frequency separation between the H-L tones. The frequency separation between H-L in the present study was calculated by using a measure of the equivalent rectangular bandwidth (ERB) from the stimuli’s center frequency. The ERB is an approximation of an auditory filter and is often used to express a frequency separation between two sounds [e.g., Ref. 24]. Here we used , following Glasberg and Moore,25 with cf denoting the center frequency in between H-L.
Three different frequency separations were used in experiment 1: a gallop condition, a two-streams condition, and an ambiguous condition. In the gallop condition, the frequency separation between the H-L tones was 0.2 ERB of the center frequency. In the ambiguous and two-streams conditions, the frequency separation was 2 and 6 ERBs, respectively. The values of 0.2, 2, and 6 ERB correspond roughly to frequency separations of 0.5, 4.5, and 14.5 semitones, and should facilitate the perception of gallop, an ambiguous rhythm, and two streams, respectively, according to Van Noorden.12
The three ERB values were calculated from a total of five different center frequencies of 600, 800, 1000, 1200, and . As an example, in a stimulus with a center frequency of and a frequency separation of 2 ERB , the high tones H had a frequency of , and the low tones L had a frequency of . The main reason for having five different center frequencies was to make the stimuli not too monotone for the participants and to counteract possible repetition effects on the oxy-Hb and deoxy-Hb changes, which are known to plateau or even become less pronounced when a sound is presented with a repetition rate over .26 The main reason for having center frequencies in the range is that tones within this frequency range are generally perceived as equally loud (see for example Equal Loudness Contours in Ref. 27).
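As a rough check on the separations described above, the following sketch converts an ERB-based separation into tone frequencies and semitones. It rests on two assumptions that are not confirmed by the text as extracted: that the ERB follows the widely cited Glasberg and Moore expression, ERB = 24.7(4.37 cf/1000 + 1) Hz with cf in Hz, and that the H and L tones are placed symmetrically (in Hz) around the center frequency. The function names are illustrative only.

```python
import math


def erb_hz(cf_hz):
    """ERB in Hz at center frequency cf_hz (assumed Glasberg-Moore form)."""
    return 24.7 * (4.37 * cf_hz / 1000.0 + 1.0)


def tone_pair(cf_hz, n_erb):
    """Return (low_hz, high_hz), assuming H and L lie symmetrically
    in Hz around cf_hz and are n_erb ERBs apart."""
    half_separation = 0.5 * n_erb * erb_hz(cf_hz)
    return cf_hz - half_separation, cf_hz + half_separation


def semitones(low_hz, high_hz):
    """Musical interval between two frequencies in semitones."""
    return 12.0 * math.log2(high_hz / low_hz)


if __name__ == "__main__":
    # 1000 Hz is one of the center frequencies listed in the text.
    for n_erb in (0.2, 2, 6):
        low, high = tone_pair(1000.0, n_erb)
        print(f"{n_erb} ERB: L = {low:.1f} Hz, H = {high:.1f} Hz, "
              f"{semitones(low, high):.2f} semitones")
```

Under these assumptions, the separations at a 1000-Hz center frequency come out close to the rounded semitone values quoted in the text; the exact interval varies somewhat with center frequency.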
Psychophysical Task and Procedure
In a darkened room, the listener was seated in front of the black screen of a computer (Epson Endeavor NJ1000), which was used to present the sound stimuli and for gathering the psychophysical data. The sound stimuli were presented through two speakers (Dell A215) placed behind the computer screen. The level of the stimuli was on average, as measured with a Rion (Tokyo, Japan) NL-32 sound level meter. The background noise was on average and mainly came from the NIRS equipment. Before the start of the experiment the listener was given random examples of the stimuli and asked whether he/she could distinguish a galloping rhythm in some of the stimuli. All listeners responded that they could. For the actual experiment, the listener was instructed to judge the rhythm of the stimuli (active-response listening) by pressing one of two buttons on the computer’s keyboard: one for stimuli heard as galloping and the other for stimuli with a different rhythm. Judgments had to be made within after the end of each stimulus. The listener was instructed to try to limit bodily movements to those necessary for key pressing, and to maintain a stable head position with the use of a head-and-chin rest.
In experiment 1, 12 blocks of stimuli were presented, with each block regarded as a single session. Each block consisted of 15 sound stimuli, with the three stimulus types (gallop, ambiguous, and two streams) represented in five stimuli, one for each center frequency (600, 800, 1000, 1200, and ). For practical reasons, the order of presentation within each block was pseudorandomized in that a certain stimulus type was never presented twice in succession. The stimuli’s center frequency, though, was completely randomized. In total, 60 judgments were obtained for each of the three stimulus types. The many repetitions were given to obtain a relatively stable average of oxy- and deoxy-Hb changes in each individual brain.
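One way to generate a block order of this kind is sketched below. It is only an illustration, not the procedure actually used: the stimulus-type sequence is reshuffled until no type occurs twice in succession (simple rejection sampling), and the center frequencies, represented here by indices 0 to 4, are then assigned in random order within each type. The names `make_block` and `STIMULUS_TYPES` are hypothetical.

```python
import random

STIMULUS_TYPES = ("gallop", "ambiguous", "two_streams")


def make_block(n_center_freqs=5, rng=None, max_tries=100_000):
    """Return a list of (stimulus_type, center_freq_index) pairs in which
    no stimulus type appears twice in succession."""
    rng = rng or random.Random()
    types = [t for t in STIMULUS_TYPES for _ in range(n_center_freqs)]
    for _ in range(max_tries):
        rng.shuffle(types)
        # Accept the shuffle only if all neighbouring types differ.
        if all(a != b for a, b in zip(types, types[1:])):
            break
    else:
        raise RuntimeError("no valid order found within max_tries")
    # Each type gets every center-frequency index once, in random order.
    cf_orders = {
        t: rng.sample(range(n_center_freqs), n_center_freqs)
        for t in STIMULUS_TYPES
    }
    return [(t, cf_orders[t].pop()) for t in types]


if __name__ == "__main__":
    for stimulus in make_block(rng=random.Random(1)):
        print(stimulus)
```

Rejection sampling is wasteful in principle, since only about 1% of random orders of 15 stimuli satisfy the constraint, but at this block size it finishes almost instantly and keeps all valid orders equally likely.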
In experiment 2, only the ambiguous stimuli were used. The listeners performed four blocks of 15 sound stimuli. In two blocks, similar to experiment 1, the listener was asked to judge the rhythm of each stimulus by button pressing (active-response listening condition). In the other two blocks, the listener was asked to passively listen to the sound stimuli and randomly press one of the two buttons after the end of each stimulus. During both active-response listening and passive listening, the listener was asked to keep his/her eyes open.
Near-Infrared Spectroscopy Measurements
NIRS measurements (for details, see Ref. 28) were made with a continuous wave system (ETG-4000, Hitachi Medical Company, Japan) with two optode probe sets. Each set consisted of five light emitters and four photodetectors placed in a silicone rubber frame, yielding 12 channels per set. Oxy-, deoxy-, and total (oxy + deoxy) Hb values were obtained from channels 1 through 12 covering the left hemisphere and channels 13 through 24 covering the right hemisphere. The light emitted by the NIRS system had wavelengths of 695 and (each ), and the frequency was modulated for wavelength and the number of channels. The unabsorbed light that left the brain was received by the photodetectors and amplified for each particular frequency. Because the optical path length cannot be measured by a continuous wave system, as used here, from here on the (de)oxy-Hb changes as measured are indicated by the scale unit (molar concentration times the unknown path length). The measurable depth with the interoptode distance was beneath the scalp, following Hoshi.29
Measurements were made over the right and left frontotemporal areas of the listener’s brain. Epochs of per stimulus were recorded, including for the rest period, with a sampling rate of . Each set of 12 channels was symmetrically placed on one side of the brain, with channels 1 through 12 covering the left hemisphere and channels 13 through 24 covering the right hemisphere. Following the international 10-20 system for EEG (Ref. 30; see Ref. 31, for correspondence with NIRS measurements), the penultimate posterior optode row was placed on the imaginary line connecting the electrode positions C3-T3 for the left hemisphere and C4-T4 for the right hemisphere (shown later).
Both sets subtended a fixed distance of centered around electrode position , with channels 2, 4, 5, and 7 approximately surrounding motor area C3 (left hemisphere), and channels 13, 15, 16, and 18 covering motor area C4 (right hemisphere). We opted to cover the motor areas, because studies have shown that the classical auditory system of the temporal lobe (i.e., areas T3 and T4) does not play a major role in rhythm perception.32 Rather, the areas involved in rhythm perception are those involved in stimulus prediction (the lateral and mesial premotor areas), as well as areas involved in motor activity and preparation, among others.33, 34, 35, 36 Activity related to button pressing, required after stimulus end, was expected in (pre)motor areas as well.
The raw oxy- and deoxy-Hb data were digitally low-pass filtered at to remove measurement artifacts, i.e., abrupt value changes that could have occurred because of bodily movements of the listener. After baseline correction, for each of the 24 channels, the oxy- and deoxy-Hb data were averaged over four time windows of (during stimulus) and over four time windows of after stimulus end (poststimulus). A similar division of the time scale has been performed in other NIRS studies (e.g., Ref. 18).
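The windowing step can be sketched as follows for a single channel and a single epoch. This is a minimal illustration under stated assumptions: the epoch is taken to start with the baseline period, and the sampling rate, baseline duration, and window duration are passed in as parameters because their values are not legible in the text as extracted. The function name `window_means` is hypothetical.

```python
from statistics import fmean


def window_means(signal, fs_hz, baseline_s, window_s, n_windows):
    """Average a (de)oxy-Hb time course over consecutive windows.

    signal     -- samples for one epoch, starting at baseline onset
    fs_hz      -- sampling rate in Hz
    baseline_s -- baseline duration in seconds
    window_s   -- duration of each analysis window in seconds
    n_windows  -- number of consecutive windows after the baseline
    Returns the window means relative to the baseline mean.
    """
    n_base = round(baseline_s * fs_hz)
    n_win = round(window_s * fs_hz)
    baseline = fmean(signal[:n_base])
    means = []
    for k in range(n_windows):
        start = n_base + k * n_win
        chunk = signal[start:start + n_win]
        if len(chunk) < n_win:
            raise ValueError("signal too short for requested windows")
        means.append(fmean(chunk) - baseline)
    return means
```

With eight windows (four during the stimulus and four after stimulus end), the first four returned values would correspond to the during-stimulus windows and the last four to the poststimulus windows.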
As a first analysis, we compared the averaged data for each time window, both during and poststimulus, with the averaged (de)oxy-Hb values of a baseline before the start of each stimulus. For each participant, stimulus condition, center frequency, and channel, we subtracted the mean (de)oxy-Hb values for the baseline from those obtained during each of the four during-stimulus and the four poststimulus windows. We used the results to check for a significant change in (de)oxy-Hb by means of multiple t-tests against zero. The Bonferroni correction was applied to account for the number of comparisons per condition and window (24 being the number of channels). This lowered the alpha-level to
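The Bonferroni step amounts to dividing the nominal alpha level by the number of comparisons, here the 24 channels. The sketch below assumes a nominal alpha of 0.05, which is a common default but is not stated in the text as extracted, and it takes per-channel p-values as given rather than computing the t-tests. The function names are illustrative.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha level after Bonferroni correction."""
    return alpha / n_comparisons


def significant_channels(p_values, alpha=0.05):
    """Return the channels whose p-value survives correction.

    p_values -- dict mapping channel number to the p-value of its t-test
    alpha    -- nominal (uncorrected) alpha level; 0.05 is an assumption
    """
    threshold = bonferroni_alpha(alpha, len(p_values))
    return sorted(ch for ch, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    print(bonferroni_alpha(0.05, 24))
```

For example, with 24 channels and the assumed nominal alpha, a channel with p = 0.003 would not count as significant after correction, whereas one with p = 0.001 would.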
Subsequently, the stimulus minus baseline (de)oxy-Hb values were subjected to three-way analyses of variance (ANOVAs) with repeated measures (participants and center frequency). The first two ANOVAs were on the oxy-Hb data and on the deoxy-Hb data obtained during stimulus presentation, with stimulus condition (3), time window (4), and hemisphere (2) as main factors. Values for the latter were obtained by averaging the (de)oxy-Hb values relative to the baseline for the 12 channels on the left hemisphere (LH) and the 12 channels on the right hemisphere (RH). The poststimulus (de)oxy-Hb data were subjected to the same three-way ANOVAs. Posthoc tests were performed with Tukey honestly significant difference (HSD) tests .
In addition to the global time course of (de)oxy-Hb throughout stimulus presentation and judgment, as captured by the time windows, the temporal aspects of the oxy-Hb data were further analyzed by determining the average point in time at which oxy-Hb reached its peak for each stimulus condition over each hemisphere. Both a during-stimulus and a poststimulus two-way ANOVA were performed for the data of experiments 1 and 2.
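A peak latency of this kind can be read off an averaged time course as sketched below. The sketch assumes that the peak is simply the sample with the largest value inside a search interval (for instance, the stimulus period or the poststimulus period); whether any smoothing or other criterion was applied is not stated. The function name `peak_latency_s` is hypothetical.

```python
def peak_latency_s(time_course, fs_hz, start_s=0.0, end_s=None):
    """Return the time (in seconds from the start of time_course) of the
    largest value between start_s and end_s (end excluded)."""
    first = round(start_s * fs_hz)
    last = len(time_course) if end_s is None else round(end_s * fs_hz)
    segment = time_course[first:last]
    if not segment:
        raise ValueError("empty search interval")
    # Index of the maximum within the segment; first occurrence on ties.
    peak_index = max(range(len(segment)), key=segment.__getitem__)
    return (first + peak_index) / fs_hz
```

Calling the function once on the during-stimulus interval and once on the poststimulus interval, separately for the LH and RH averages, would give the four latencies per condition that enter the two-way ANOVAs.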
Figure 2 shows the results of the two alternative forced choice (2AFC) task of experiment 1. As can be derived from the 95% confidence intervals, the 0.2-ERB conditions resulted in significantly more galloping percepts than the 2-ERB and 6-ERB conditions. The 6-ERB conditions caused significantly more two-streams percepts than the ambiguous 2-ERB conditions. A two-way ANOVA with repeated measures (7 sessions) and posthoc Tukey tests confirmed the existence of significant differences between all the conditions [ , ]. In experiment 2, only 2-ERB stimuli were used. The average proportion of two-streams percepts for these stimuli was 0.62 , showing that the listeners heard the stimuli as having an ambiguous rhythm in this experiment as well.
(De)oxygenated Hemoglobin—Temporal Characteristics
Figure 3 shows the average time courses of the oxy-Hb changes obtained in experiment 1 in response to the three stimulus types. The figure shows that oxy-Hb slowly increased after the start of a stimulus, reached a first peak, on average at over LH and at over RH after stimulus onset, and gradually decreased as the stimulus reached its end. Two-way ANOVA showed that the difference in the temporal peak of oxy-Hb between LH and RH was significant [ , ]. The oxy-Hb peak during stimulus presentation, however, did not differ in time of occurrence for the 0.2-, 2-, and 6-ERB conditions .
With regard to the overall time course of the oxy-Hb changes, three-way ANOVA showed a significant main effect of time window for oxy-Hb during stimulus [ , ]. Posthoc tests showed that oxy-Hb was significantly higher during windows 2 and 3 ( after stimulus onset) as compared to window 1 . Oxy-Hb during window 3 was also significantly higher than during window 4 .
A second increase in oxy-Hb occurred after stimulus end, when the listeners made the rhythm judgment. This oxy-Hb increase peaked on average at over LH and at over RH after stimulus end. Again, this difference was significant, with oxy-Hb peaking faster over RH than LH this time [ , ]. Here too, stimulus condition had no influence on the speed with which oxy-Hb peaked . Three-way ANOVA showed a significant main effect of window [ , ]. Posthoc tests showed that oxy-Hb was significantly higher in window 7 ( after stimulus offset) as compared to windows 5 and 6 ( after stimulus offset) and window 8 ( after stimulus offset). Oxy-Hb in window 6 ( after stimulus offset) was higher than that in window 5 ( after stimulus offset).
In experiment 1, the effect of window on deoxy-Hb was not significant during stimulus , but was significant for the poststimulus period [ , ]. Posthoc tests showed that average deoxy-Hb was significantly higher during window 8 ( after stimulus offset) compared to window 7 ( after stimulus offset).
Figure 4 shows the average time courses of the oxy-Hb changes obtained in experiment 2 in the active-response and passive listening conditions. In experiment 2, oxy-Hb during stimulus peaked on average at 4.95 over LH and at 4.93 over RH. This difference was not significant . With regard to the overall time course of oxy-Hb, three-way ANOVA revealed a significant main effect of window [ , ], with posthoc tests showing that oxy-Hb was higher during window 2 ( after stimulus onset) than during window 1 . After stimulus end, oxy-Hb peaked again on average at 5.97 over LH and at 5.47 over RH after stimulus end, which was significant [ , ]. The effect of window was also significant for the poststimulus three-way ANOVA of experiment 2 [ , ]. Here window 7 ( after stimulus offset) showed higher oxy-Hb than windows 5 and 8 (0 to 2.4 and after stimulus offset). Window 6 ( after stimulus offset) also showed higher oxy-Hb than window 5 ( after stimulus offset). For deoxy-Hb, no significant effect of window appeared in experiment 2.
(De)oxygenated Hemoglobin—Spatial Characteristics
Figure 5 shows the results of the t-tests ( , with Bonferroni correction) against zero for both experiments 1 and 2. The upper panel shows that significant oxy-Hb changes in experiment 1 occurred most often over channel 4, followed by channels 7 and 10 on LH. Channels 4 and 7, along with channels 2 and 5, approximately surrounded motor area C3; channel 10 is located inferior to channel 5. Most significant oxy-Hb changes on RH were found over channels 13, 16, 18, and 20. The first three channels, along with channel 15, approximately surrounded motor area C4; channel 20 is located inferior to channel 15. Significant deoxy-Hb changes occurred less frequently, but were generally found over the same areas as the significant oxy-Hb changes.
Figure 5 suggests that generally more channels signaled a significant hemodynamic change over RH as compared to LH. In experiment 1, a significant effect of hemisphere was found in the during-stimulus ANOVA for oxy-Hb [ , ] and a significant hemisphere by window interaction [ , ]. Posthoc tests showed that oxy-Hb was larger in RH than LH for time windows 3 and 4 (the last of the stimulus). No effect of hemisphere on oxy-Hb was found in experiment 2. Hemispheric differences with regard to deoxy-Hb were also not found in experiments 1 and 2.
(De)oxygenated Hemoglobin—Perceived Rhythm
Neither the during-stimulus nor the poststimulus ANOVA showed a significant effect of frequency separation on oxy-Hb in experiment 1. Deoxy-Hb also did not vary with stimulus condition ( and for the during-stimulus and poststimulus ANOVA, respectively). The significant influence of frequency separation on perceived rhythm, as found in the behavioral data of experiment 1 (Fig. 2), thus was not accompanied by significant changes in (de)oxy-Hb values.
(De)oxygenated Hemoglobin—Active Versus Passive Listening
Figure 6 shows the oxy-Hb levels as obtained in experiment 2 during active-response listening and passive listening. The figure was made through MRI-fusion software (Shimadzu, Kyoto, Japan), with the oxy-Hb data plotted over the brain of an average-sized male head. NIRS optode positions (both emitters and receivers) were as during measurements and, along with nasion and ear references, were obtained with a 3-D digitizer (Fastrak, Polhemus, Incorporated, Colchester, Vermont) after the experiment. The during-stimulus ANOVA showed a significant main effect of listening mode [ , ], as well as a significant three-way interaction of listening mode by hemisphere by window [ , ]. Posthoc tests revealed that active listening caused significantly larger oxy-Hb changes over the left hemisphere during the final of the stimulus. The poststimulus ANOVA also showed a significant effect of listening mode [ , ]. Here too, the three-way interaction was significant [ , ]. A significant difference in oxy-Hb for active and passive listening was not found over the right hemisphere during the final of the stimulus; other windows showed significantly higher oxy-Hb during active listening. An effect of listening mode was not found for deoxy-Hb values, either during or after stimulus presentation in experiment 2.
Discussion and Conclusions
In the present study, we explored whether cortical hemodynamics related to the perception and judgment of sound stimuli consisting of simple, rhythmic tones could be visualized with NIRS. With our (potentially) rhythmic stimuli we mainly targeted the (pre)motor areas of the brain, because these are known to be more involved in rhythm perception than the primary auditory areas32 and are likely to mediate the actual act of button pressing, through which listeners were asked to judge the rhythm of the stimuli. The present study showed that, with regard to their time course, the hemodynamic response patterns were typical of those found in response to sensory stimulation in general.29 Oxy-Hb levels increased gradually, peaked during the stimulus, decreased, picked up again after the stimulus, and then decreased again. Deoxy-Hb levels in general dipped slightly below zero and stayed relatively flat after stimulus onset.
Oxy-Hb levels during stimulus presentation were partly lateralized, with increased activity mainly over (pre)motor areas of RH during the last of the stimulus in experiment 1. This is in line with earlier research showing that these areas respond to rhythmic, unstressed sounds,37 which the stimuli basically are. It has to be noted, though, that the LH-RH difference did not appear in experiment 2 in the active-listening condition as used in experiment 1. Furthermore, the temporal peak of the oxy-Hb levels occurred faster over LH than RH in experiment 1. We further expected lateralized hemodynamic changes in response to the button pressing after stimulus end. Contralateral hemispheric dominance has been reported for a number of simple motor tasks and handedness.38, 39 Because listeners made their judgment by pressing a button with their right hand, one thus can expect relatively larger hemodynamic changes over the (pre)motor areas of LH. Rather surprisingly, though, these were not found. Oxy-Hb in the active-response listening mode of experiment 2 in particular was unexpectedly high over RH after stimulus end. We have no plausible explanation for this.
We also did not observe an effect of frequency separation on hemodynamics, either during or after stimulus presentation. Although the behavioral data showed that changes in the sounds’ frequency separation had significant effects on perceived rhythm, different rhythm percepts were not accompanied by significant differences in (de)oxy-Hb. It is possible that such differences become apparent when primary auditory areas (T3 and T4) are targeted as well. Studies have shown that the actual process of stream segregation is reflected in primary cortical activity,40, 41 although nonprimary auditory areas also seem involved.42 More specifically, the perception of two segregated streams is accompanied by activity that is more sustained and larger in magnitude than that reflecting the perception of a single stream.43 Future NIRS research is necessary to remedy the present study’s limitations with regard to the cortical areas covered. The oxy-Hb levels also did not differ between the three stimulus conditions with regard to their temporal peak after stimulus onset. Perceptually, the two-streams percept takes several seconds to build up, whereas the galloping rhythm is more quickly established.13 Clearly though, hemodynamic response patterns do not reflect this buildup, if theoretically possible.
Experiment 2 showed that hemodynamic activity significantly increased during active-response listening as compared to passive listening. The main effect of listening mode occurred both during and after stimulus presentation. The enhancing effect of active-response listening was significant over LH, except for the first of the stimulus. After the stimulus, the enhancing effect occurred over both hemispheres. The increase was not merely due to motor activity connected to button pressing, since such a response was also required in the passive listening condition. Other studies have also shown that sustained and/or selective attention necessary during active listening has a widespread influence on neural activity. Auditory attention is known to cause increased activity not only over the auditory cortex,44, 45 but also over a broad range of other cortical areas, including frontal, prefrontal, parietal and supplementary motor areas.46, 47
In view of the enhancing effect of active-response listening, future NIRS research might address possible effects of attention on auditory streaming with more specific listening instructions. It is assumed that the frequency separation between the tones in auditory streaming stimuli may not be the only catalyst of different rhythm percepts. A number of studies have suggested that focused attention to either the low or high tones can facilitate the buildup of the two-streams percept and the integration of successive tones within the streams.48, 49 Within the active listening mode, one could ask listeners to adopt such analytical listening. Experiment 1 indicated that the perception of different rhythms does not cause significant differences between hemodynamic response patterns, nor in oxy-Hb peak latency. However, the effort of focused attention, i.e., analytical listening to either the high or low tones, might bring out different hemodynamic response patterns. Neuromagnetic data obtained during analytical listening have suggested that focused attention enhanced cortical responses in addition to physical manipulations of frequency separation.42 Furthermore, in view of the fact that listeners can often exert control over percepts in the ambiguous stimulus,12 asking listeners to engage in active switching between rhythm percepts might cause different hemodynamic patterns as well.
This study was supported by the COE program Innovative Brain Science for Development, Learning and Memory of Kanazawa University and grants from the Ishikawa High-Tech Sensing Cluster (Knowledge Cluster Initiative from the Japanese Ministry of Education, Culture, Sports, Science and Technology). We thank Koichiro Miyaji and Shuichiro Taya for their technical assistance, and two anonymous reviewers for their help with the manuscript.