Neuroimaging evidence of visual-vestibular interaction accounting for perceptual mislocalization induced by head rotation

Abstract. Significance A fleeting flash aligned vertically with an object remaining stationary in the head-centered space would be perceived as lagging behind the object during the observer’s horizontal head rotation. This perceptual mislocalization is an illusion named head-rotation-induced flash-lag effect (hFLE). While many studies have investigated the neural mechanism of the classical visual FLE, the hFLE has been hardly investigated. Aim We measured the cortical activity corresponding to the hFLE on participants experiencing passive head rotations using functional near-infrared spectroscopy. Approach Participants were asked to judge the relative position of a flash to a fixed reference while being horizontally rotated or staying static in a swivel chair. Meanwhile, functional near-infrared spectroscopy signals were recorded in temporal-parietal areas. The flash duration was manipulated to provide control conditions. Results Brain activity specific to the hFLE was found around the right middle/inferior temporal gyri, and bilateral supramarginal gyri and superior temporal gyri areas. The activation was positively correlated with the rotation velocity of the participant around the supramarginal gyrus and negatively related to the hFLE intensity around the middle temporal gyrus. Conclusions These results suggest that the mechanism underlying the hFLE involves multiple aspects of visual-vestibular interactions including the processing of multisensory conflicts mediated by the temporoparietal junction and the modulation of vestibular signals on object position perception in the human middle temporal complex.


Introduction
Motion can distort the position perception of objects. [2][3] One of the best-known and most-studied motion-induced mislocalizations is the flash-lag effect (FLE), [4][5][6] where a flash is perceived as lagging behind a moving object even if the two are actually aligned. In addition to the classical FLE, Schlag et al. 7 found an extension of the FLE without visual motion stimulation: a flash is perceived as lagging behind an object that remains in one spot relative to the head while the observer is making head rotations. To account for this novel head-rotation-induced FLE (herein abbreviated as hFLE), Schlag et al. followed the "motion extrapolation" hypothesis proposed by Nijhawan, 5 according to which the neural system predicts the position of a moving object to compensate for the delay caused by the transmission and processing of neural signals. Although there was no retinal motion in this hFLE, Schlag et al. proposed that extraretinal motion could also be extrapolated.
As to the neural origin of the FLE, Maus et al. 8 and Wang et al. 9 demonstrated a causal role of the human middle temporal complex (hMT+) in the FLE with TMS and tDCS, respectively. It is well known that hMT+ plays an essential role in forming the perception of visual motion and object position. This area not only represents physical motion or position but also encodes implied motion, such as the motion aftereffect, [10][11][12][13][14] and the perceived position of the observer. 15,16][19] By contrast, to our knowledge, the brain activity in the hFLE has not been investigated. As a multisensory illusion, the hFLE can be expected to involve even more complicated processing than its classical visual version, yet surprisingly little is known about it. Elucidating the mechanism underlying the hFLE may shed light on the complex visual-vestibular interactions that occur every day in natural viewing conditions. ][22][23] In previous work, we pointed out that the motion extrapolation hypothesis could not adequately explain how the hFLE forms. 24 Instead, an account based on visual-vestibular interaction was preferred. We note that vestibular stimulation [25][26][27] and visual-vestibular integration 28 can also activate hMT+. Moreover, hMT+ is functionally connected to several regions in the vestibular cortical network. 29,30 Considering the contribution of hMT+ to the classical FLE and the functions of hMT+ concerning both motion processing and visual-vestibular integration, we hypothesized that hMT+ would be activated when the observer experienced the hFLE.
To test this hypothesis, one must first find an appropriate neuroimaging tool. This is considerably challenging owing to the technical limitations of common tools (e.g., fMRI and EEG). Compared with fMRI and EEG, which are both susceptible to motion artifacts, functional near-infrared spectroscopy (fNIRS) is more tolerant of motion and has thus been broadly employed in research on exercise, in natural scenes, or with special participants such as infants. In this study, we used fNIRS to measure the cortical activity around hMT+ in participants who underwent horizontal head rotations to experience the hFLE.

Participants
Twenty-eight participants (11 males and 17 females) aged 19 to 32 years (M ± SD = 25 ± 3 years) were recruited. The sample size was determined using G*Power 31 after a pilot experiment with 11 participants. All the participants had normal or corrected-to-normal visual acuity and reported no history of motion sickness [including simulator sickness with VR or three-dimensional (3D) displays] or dysfunction of the visual or vestibular system.
This study protocol was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences. All the participants gave their informed consent and were paid after the experiment.

Apparatus
The experiment program was written and run with MATLAB R2013a (MathWorks Inc.) and Psychtoolbox v3.0.12 32 on a Dell XPS 8700 PC (Dell Inc.), and the visual stimuli were presented to the participants on Sony HMZ-T3 head-mounted goggles (Sony, Japan) with a field of view of 50 (horizontal) × 28 (vertical) degrees (deg) of visual angle, a resolution of 1280 × 720 pixels, and a refresh rate of 60 Hz. The goggles were worn outside the fNIRS headcap. Participants sat on a barber's chair with their feet off the ground so that they could be rotated, and a TSS-WL 3-Space wireless motion sensor (Yost Labs Inc.) was fixed to one arm of the chair.
The fNIRS recording was implemented by a LABNIRS continuous-wave fNIRS system (Shimadzu Inc., Japan) with near-infrared wavelengths of 780, 805, and 830 nm. Optode localization was carried out with a Fastrak 3D tracking and digitizing system (Polhemus Inc.).

fNIRS Acquisition
4][35] On each side was a 3 × 3 square layout with five emitters and four detectors yielding 12 channels [Fig. 1(a)]. The raw sampling rate of fNIRS data was 55.6 Hz. After each participant completed the main task, each optode was localized along with the anatomical landmarks of the vertex (Cz), nasion (Nz), and bilateral preauricular points (AL/AR). 7][38] The average locations of the channels on the brain template are shown in Fig. 1(b). The cortical subdivisions they fell on were then estimated based on the LONI Probabilistic Brain Atlas (LPBA40) 39 and Brodmann areas (BA).

Stimuli and Procedure
We assumed that the relevant brain activation during the hFLE was the result of highly independent components caused by three types of neural signals, namely: (i) the response to the visual stimulus (i.e., the flash bar), (ii) the response to the vestibular stimulus (i.e., the horizontal body rotation), and (iii) the signal of visual-vestibular interaction leading to the hFLE. Based on this assumption, we designed five experimental conditions to decompose the brain activation during the hFLE.

Condition A: short flash + moving (sF+M)
In light of previous findings that passive movement could also induce the hFLE, 40 participants in this study were seated in a swivel barber's chair with their feet off the ground. The horizontal rotation was performed by the experimenter to reduce the influence of cervical movement on the fNIRS recording. The experimenter received a tone cue through earphones to start or stop the rotation. The display on the LCD monitor was identical to that on the goggles, allowing the experimenter to read the instruction.
Throughout the whole experimental session, the background of the display was pure black with a centered red fixation point 0.15 deg in diameter. Before each block started, two lines of text displayed the number of upcoming blocks and the instruction "Rotate!" to remind participants that they would be rotated during the block. After participants were ready with their feet off the ground, the experimenter pressed a key to start the block. For the first 5 s, only a central reference bar and the fixation point were presented while a countdown with beep tones through earphones reminded the experimenter to prepare to rotate the chair. The reference bar was white with a size of 5 deg (height) × 0.5 deg (width).
The experimenter rotated the chair once per trial based on the auditory cues from the earphones. A high tone (1500 Hz) signaled the start of the rotation and a low tone (1000 Hz) signaled its end, each lasting for 50 ms. Each rotation lasted for 2 s with 1-s intertrial intervals (Fig. 2). The experimenter deliberately ended the rotation trajectories at approximately the same positions to keep the rotation amplitude roughly constant across trials. Each block consisted of 15 trials with the rotating direction alternating between leftward and rightward.
Fig. 2 [...] C). The participant was asked to judge whether the flash bar was to the left or the right of the reference bar although the two bars were actually vertically aligned. The rotation was stopped after the low-pitch tone, which came 2 s after the high-pitch tone. These auditory cues were played to the experimenter through earphones, so the participant could not hear them. The figure shows only the case where the participant was rotated leftward, but the rotating direction actually alternated between trials. All the distances and sizes of the visual stimuli in the figure are for illustration purposes only and are not to scale.
After a random delay ranging from 0.6 to 1.2 s from the beginning of each rotation, a flash bar identical to the reference bar appeared 5 deg directly above the reference bar for 1 frame (16.7 ms) and then vanished. Although the flash bar and the reference bar were physically aligned in the vertical direction, the participant was informed in advance that there was always a slight misalignment between the two bars and was required to perform a two-alternative forced-choice (2AFC) task, judging whether the flash bar was to the left or the right of the reference bar by pressing the left or the right arrow key while maintaining fixation at the central fixation point. The reference bar remained centered on the display throughout each block.
The mean velocity across the 0.5 s right before each flash onset was recorded as the rotation velocity. The average rotation velocities for conditions with rotation (i.e., conditions A, C, and E) are listed in Table 1.
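The velocity measure described above can be illustrated with a minimal Python sketch. It assumes the motion sensor yields timestamped angular-velocity samples; the function name `pre_flash_velocity`, the 100-Hz synthetic trace, and the 60 deg/s ramp are hypothetical and are not taken from the study's MATLAB code or data.

```python
import numpy as np

def pre_flash_velocity(t, omega, flash_onset, window=0.5):
    """Mean angular velocity over the `window` seconds before `flash_onset`.

    t      : sample times in seconds (hypothetical sensor timestamps)
    omega  : angular velocity in deg/s at each sample
    """
    t = np.asarray(t, dtype=float)
    omega = np.asarray(omega, dtype=float)
    mask = (t >= flash_onset - window) & (t < flash_onset)
    if not mask.any():
        raise ValueError("no sensor samples in the pre-flash window")
    return float(np.mean(omega[mask]))

# Synthetic example: the chair speeds up linearly, capped at 60 deg/s.
t = np.arange(200) / 100.0            # 2 s sampled at an assumed 100 Hz
omega = np.clip(60.0 * t, 0.0, 60.0)
v = pre_flash_velocity(t, omega, flash_onset=0.9)
```

Because rotation direction alternated between trials, a real analysis would presumably average the absolute value or sign-correct the trace first; that choice is left open here.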

Condition B: short flash + static (sF+S)
Condition B was designed to measure the brain activation caused by the visual stimulus alone in condition A. Therefore, participants in condition B were not rotated. The other procedures and tasks were the same as those in condition A, except that the instruction before the block read "Do NOT rotate!" instead.

Condition C: long flash + moving (lF+M)
2][43][44] Furthermore, the upper limit of the flash duration that could induce the FLE was 80 to 500 ms. 41,42,44,45 Thus, in condition C we lengthened the duration of the flash bar to 1 s. We expected this modification to suppress the hFLE while keeping the visual stimuli similar to those in condition A. In terms of timing, the random delay before the flash was changed to 0.3 to 0.8 s (Fig. 2). Note that the rotation velocity recorded in condition C was lower than that in condition A (see Table 1) because of this shorter delay. The other procedures and tasks were identical to those in condition A, and participants were required to respond after the flash bar disappeared.

Condition D: long flash + static (lF+S)
Condition D was almost the same as condition C except that participants in this condition were not rotated.The instruction before the block also read "Do NOT rotate!" instead of "Rotate!"

Condition E: no flash + moving (nF+M)
Condition E served as a control to measure the brain activation caused solely by the self-rotation. Therefore, the flash bar was not presented and participants did not have to make any response. All the participant had to do was to keep gazing at the fixation point. The other procedures were the same as those in condition A. In particular, the program sent a trigger to the LABNIRS system to mark a virtual "flash-onset" event at a random time point of each trial so that the data structure stayed consistent with the other conditions for the fNIRS data analyses.
The conditions are summarized and compared in Fig. 3 and Table 1. Each condition consisted of five blocks, resulting in a total of 25 blocks for each participant, with the order randomized according to a Latin square design. Participants received a minimum of 15 s of rest between blocks, during which they could take a break without large head or body movements.
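As an illustration only, one common way to build a Latin square ordering for five conditions is the cyclic construction sketched below in Python. The text does not state which Latin square was used or how its rows were assigned to participants, so the function `cyclic_latin_square` and the way rows are concatenated into a 25-block sequence are assumptions.

```python
def cyclic_latin_square(items):
    """Return an n x n cyclic Latin square: row r is `items` shifted by r."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]

conditions = ["A", "B", "C", "D", "E"]
square = cyclic_latin_square(conditions)

# Assumed usage: concatenate the five rows into one 25-block sequence,
# so every condition appears five times and once at each within-row position.
block_order = [cond for row in square for cond in row]
```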

Behavioral Data Analysis
For conditions A and C, the proportion of responses that were in the opposite direction to self-rotation (e.g., reporting the flash bar to the left of the reference bar when rotating rightward and vice versa) was calculated to indicate the intensity of the hFLE. A proportion significantly higher than 50% would indicate an hFLE. As for conditions B and D, where the participants did not rotate, we simply calculated the proportion of responses reporting the flash bar to the left to measure response bias. A 2 × 2 repeated-measures ANOVA (short/long flash × moving/static) was then conducted on these response proportions to compare the intensity of the hFLE across conditions.
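The response proportion used as the hFLE index can be sketched as follows. This is a minimal Python illustration under the assumption that each trial is coded by its rotation direction and the reported side; the function name `hfle_proportion` and the example trials are hypothetical.

```python
def hfle_proportion(rotation_dirs, responses):
    """Fraction of trials where the reported side is opposite to the rotation."""
    opposite = {"left": "right", "right": "left"}
    pairs = list(zip(rotation_dirs, responses))
    if not pairs:
        raise ValueError("no trials given")
    hits = sum(1 for rot, resp in pairs if resp == opposite[rot])
    return hits / len(pairs)

# Four made-up trials: three responses oppose the rotation direction.
rotations = ["left", "right", "left", "right"]
reports = ["right", "left", "left", "left"]
p_opposite = hfle_proportion(rotations, reports)  # 0.75
```

A value reliably above 0.5 across participants would correspond to the hFLE criterion described above.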

fNIRS Data Analysis
Both the absorbance (ABS) data of each wavelength and the relative changes in concentration of oxygenated, deoxygenated, and total hemoglobin (denoted as [HbO], [HbR], and [tHb], respectively) were obtained from the fNIRS system. These data were processed and analyzed using MATLAB 2017a (MathWorks Inc.) with the NIRS-KIT 46 and SPM12 47 packages.

Pre-processing
The pre-processing consisted of several steps. Figure 4 demonstrates the pre-processing pipeline step by step with an example.
(I) Downsampling: We first downsampled the data from a sampling period of 18 ms to 54 ms (i.e., a sampling rate from 55.56 to 18.52 Hz) using the resampling function built into the LABNIRS system. Then, both the ABS data and [Hb] data were exported as the raw data.
(II) Motion artifact correction: The head-motion-related artifacts were corrected with the temporal derivative distribution repair (TDDR) method 48 built into the NIRS-KIT toolbox. Based on iteratively reweighting the temporal derivatives of the fNIRS signal, this parameter-free algorithm uses a robust regression approach to remove large fluctuations such as spikes and baseline shifts attributed to motion artifacts while leaving smaller, ][50][51] To confirm the validity of the TDDR method, we conducted a repeated-measures ANOVA on the signal-to-noise ratio (SNR) of ABS data with three factors (wavelength, channel, and correction: raw versus TDDR-corrected). Focusing on the factor of correction, the results showed a significant main effect [F(1, 27) = 31.81, p < 0.001, ηp² = 0.54], with an SNR increment (ΔSNR) of 3.42 dB after the TDDR correction. The wavelength × correction interaction was also significant [F(2, 54) = 8.31, p = 0.001, ηp² = 0.24], showing that the size of the correction effect varied across wavelengths but was reliable at all of them (ΔSNR at 780 nm = 3.61 dB, ΔSNR at 805 nm = 3.29 dB, ΔSNR at 830 nm = 3.34 dB, ps < 0.001). The channel × correction interaction was not significant [F(23, 621) = 1.17, p > 0.3, ηp² = 0.04], indicating a relatively homogeneous correction effect across channels (ΔSNR range = [2.25, 5.70] dB, ps < 0.014). The three-way interaction was not significant either [F(46, 1242) = 0.80, p > 0.6, ηp² = 0.03]. These results indicate that the TDDR method was suitable and effective for the current data.
(III) Detrending: The data were detrended using a second-order polynomial regression model.
(IV) Filtering: The data were filtered using a third-order IIR filter with a passband of 0.01 to 0.39 Hz, since the actual interstimulus intervals between flashes were all longer than 2.6 s.
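The detrending step (III) can be sketched in Python as a least-squares fit of a second-order polynomial that is then subtracted from the signal. This is only an illustration with NumPy, not the NIRS-KIT routine used in the study; the function name `detrend_poly2` and the synthetic drift are assumptions.

```python
import numpy as np

def detrend_poly2(signal, fs):
    """Remove a least-squares second-order polynomial trend from a 1-D signal."""
    y = np.asarray(signal, dtype=float)
    t = np.arange(y.size) / fs              # time axis in seconds
    coeffs = np.polyfit(t, y, deg=2)        # quadratic, linear, constant terms
    return y - np.polyval(coeffs, t)

fs = 1000.0 / 54.0                          # 18.52 Hz after downsampling
t = np.arange(600) / fs
drift = 0.002 * t**2 - 0.05 * t + 1.0       # synthetic slow baseline drift
residual = detrend_poly2(drift, fs)         # should be numerically ~0
```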
6) the standard deviation (SD). Among them, indices (2) to (6) were specially defined to automatically detect a certain type of low-quality data: upon visual inspection, we noticed that many abnormal data shared a common pattern of varying among a few values or remaining fixed for a period of time, which might not be screened out by SNR because their SNR often seemed normal owing to an abnormally small SD. SNR was calculated for the average ABS of the three wavelengths, and the other indices for [HbO]. First, SNR, NDL, and MCDL were calculated directly from the raw data, and blocks with SNR < 20 dB, NDL > 6, or MCDL > 2 were rejected. Then, R, SD, and RR of the remaining blocks were assessed by their deviation within the distribution. Considering the high skewness of these distributions, we adopted the adjusted boxplot method 53 with the LIBRA toolbox 54,55 instead of ordinary normalization. Blocks with R or SD beyond the adjusted-boxplot fences, which are derived from Q1, Q3, IQR, and MC, were further rejected (Q1, first quartile; Q3, third quartile; IQR, interquartile range; MC, medcouple; see Ref. 53). All the criteria except for SNR were chosen by visual inspection to separate as many of the abnormal data described above from the rest as possible.
Finally, only those channels for which each of the five conditions retained at least one surviving block entered the further analyses. That is, if any condition on a channel had all of its blocks rejected, the whole channel was excluded. Altogether, about 4.2% of all the blocks were rejected.
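For readers unfamiliar with the adjusted boxplot, the Python sketch below shows how skewness-adjusted fences can be computed. It uses the commonly cited Hubert and Vandervieren form with a multiplier of 1.5 and exponents of −4 and 3 (swapped to −3 and 4 for negative medcouple); whether the study used exactly these constants is an assumption, since the fence expression is not legible in the text. The study used the MATLAB LIBRA toolbox, whereas `medcouple_naive` here is a simplified O(n²) stand-in that skips pairs of identical values.

```python
import numpy as np

def medcouple_naive(x):
    """Naive medcouple: median of the kernel over all pairs (x_i <= m <= x_j).

    Simplification: pairs with x_i == x_j are skipped instead of using the
    special tie kernel, so heavily tied data are not handled exactly.
    """
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    lo = x[x <= m]
    hi = x[x >= m]
    diff = hi[:, None] - lo[None, :]
    num = (hi[:, None] - m) - (m - lo[None, :])
    keep = diff > 0
    return float(np.median(num[keep] / diff[keep]))

def adjusted_fences(x, k=1.5):
    """Lower and upper fences of the skewness-adjusted boxplot (assumed constants)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mc = medcouple_naive(x)
    if mc >= 0:
        return q1 - k * np.exp(-4 * mc) * iqr, q3 + k * np.exp(3 * mc) * iqr
    return q1 - k * np.exp(-3 * mc) * iqr, q3 + k * np.exp(4 * mc) * iqr

# For symmetric data the medcouple is 0 and the fences reduce to Tukey's rule.
low, high = adjusted_fences([1.0, 2.0, 3.0, 4.0, 5.0])
```

Blocks whose index falls outside `[low, high]` would be flagged under this rule.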

First-level (individual) analysis
0][61] However, the results for [HbR] will also be reported as a comparison.
The data were analyzed using a block design approach. The onset of the first flash and the offset of the last one in each block were defined as the beginning and end of the block, respectively, and the interval between them was taken as the block duration. After pre-processing, the data of each participant were fitted to a GLM based on the canonical (two-gamma) hemodynamic response function provided by the SPM12 toolbox to obtain a β-value for each condition (βa to βe). Then, we focused on three contrasts, Δβ1, Δβ2, and Δβ3. According to the assumption mentioned above, βa contained all three types of signals while βb represented only the effect of the visual stimulus, so (βa − βb) represented the vestibular component plus the visual-vestibular interaction component (i.e., the hFLE). The case was similar for βc and βd, except that the long flash was supposed not to induce the hFLE, so (βc − βd) represented only the vestibular component. Therefore, Δβ1 = (βa − βb) − (βc − βd) should represent a relatively pure effect of the hFLE. We also calculated Δβ2 = (βa − βb) − βe by replacing (βc − βd) with βe. Finally, to examine whether the long flash was capable of inducing the hFLE, Δβ3 was also calculated.
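The contrast logic can be made concrete with a small Python sketch. Δβ1 and Δβ2 follow directly from the description above. The form used for Δβ3, (βc − βd) − βe, is an assumption made by analogy with Δβ2, because its definition is not given explicitly in the text. The function name `hfle_contrasts` and the example β-values are hypothetical.

```python
def hfle_contrasts(beta):
    """Compute the three contrasts from per-condition GLM betas.

    beta: dict with keys "a".."e" for conditions A..E on one channel.
    """
    a, b, c, d, e = (beta[k] for k in "abcde")
    delta1 = (a - b) - (c - d)   # hFLE: short-flash effect minus long-flash effect
    delta2 = (a - b) - e         # hFLE: vestibular part estimated from condition E
    delta3 = (c - d) - e         # ASSUMED form: residual effect of the long flash
    return delta1, delta2, delta3

# Made-up betas for one channel, for illustration only.
example = {"a": 1.0, "b": 0.2, "c": 0.5, "d": 0.1, "e": 0.3}
d1, d2, d3 = hfle_contrasts(example)
```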

Second-level (group) analysis
We conducted one-sample t-tests to compare Δβ on each channel with a baseline of 0 in order to examine the differential brain activation across conditions.A significant deviation from 0 for Δβ 1 or Δβ 2 would indicate a potential locus that represented the hFLE on that channel.Besides, we analyzed the correlation between Δβ and rotation velocity or behavioral performance to relate neural activation to stimulus intensity or perception.As the analyses were conducted separately on each channel, multiple comparison corrections were implemented using the false discovery rate (FDR) method 62 within either hemisphere.
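A per-hemisphere FDR correction of this kind can be sketched in Python as below. The sketch assumes the Benjamini-Hochberg step-up procedure at q = 0.05, which is the most common FDR method but is not named explicitly in the text; the function name `fdr_bh` and the 12 example p-values (one per channel in a hemisphere) are made up.

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, n + 1) / n
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))   # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values for the 12 channels of one hemisphere.
p_hemi = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060,
          0.074, 0.205, 0.212, 0.216, 0.222, 0.251]
rejected = fdr_bh(p_hemi)
```

Running the procedure separately on each hemisphere's 12 channels mirrors the "within either hemisphere" correction described above.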
For all the statistics above and below, effect sizes are reported as Cohen's d (d for short) for t-tests or ηp² for ANOVA. All the p-values reported in the ANOVA have been adjusted where necessary using the Greenhouse-Geisser correction, and post hoc tests were conducted with the Tukey-Kramer method.
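As a quick consistency check, both effect sizes can be recovered from the reported test statistics using the standard identities d = t / √n for a one-sample t-test and ηp² = F·df1 / (F·df1 + df2). The Python sketch below applies them to values reported in this paper; the function names are illustrative.

```python
import math

def cohens_d_one_sample(t, n):
    """Cohen's d for a one-sample t-test with n observations."""
    return t / math.sqrt(n)

def partial_eta_squared(f, df1, df2):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f * df1 / (f * df1 + df2)

d_cond_a = round(cohens_d_one_sample(9.50, 28), 2)        # reported d = 1.80
d_cond_c = round(cohens_d_one_sample(3.48, 28), 2)        # reported d = 0.66
eta_correction = round(partial_eta_squared(31.81, 1, 27), 2)   # reported 0.54
eta_interaction = round(partial_eta_squared(22.33, 1, 27), 2)  # reported 0.45
```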

Behavioral Performance
According to the one-sample t-test results (Fig. 5), when the participants remained static (i.e., conditions B and D), the proportion of their responses reporting the flash bar on the left side did not deviate significantly from 50% [B: M ± SD = (47.36 ± 21.32)%; D: (52.31 ± 23.93)%; ps > 0.5], indicating little response bias. However, when the participants were moving (conditions A and C), the proportion of responses reporting the flash bar on the side opposite to the rotating direction was significantly greater than 50% [A: (72.69 ± 12.64)%, t(27) = 9.50, p < 0.001, d = 1.80; C: (59.98 ± 15.19)%, t(27) = 3.48, p = 0.002, d = 0.66], which suggested an evident hFLE for both flash durations.
As for the ANOVA, the main effect of flash duration was significant [F(1, 27) = 9.68, p = 0.004, ηp² = 0.26], with a higher response proportion for short flashes than for long ones, and so was the main effect of motion state [F(1, 27) = 12.01, p = 0.002, ηp² = 0.31], with a higher response proportion when moving than when static. The interaction between flash duration and motion state was also significant [F(1, 27) = 22.33, p < 0.001, ηp² = 0.45]: the response proportion was higher when moving than when static (p < 0.001), but only for short flashes. In addition, the response proportion for short flashes was much higher than that for long flashes when moving (p < 0.001) but slightly lower when static (p = 0.043). These results showed that although both short and long flashes could induce the hFLE, the effect was weaker for long flashes than for short flashes.

fNIRS
The actual sample sizes in the fNIRS analyses were less than the number of participants since some channels of some participants were excluded in the pre-processing stage, which will be reported below.
We then decided to focus on Ch. #3, #10, #20, and #22, which passed the FDR correction, as the main results. Although Ch. #23 also passed the FDR correction for Δβ2, it was cautiously excluded because it also did so for the control contrast Δβ3 and was therefore not specific enough to the hFLE. By similar reasoning, Ch. #17 and #24 were leniently included because they showed a relatively large raw Δβ2 but no Δβ3, which was consistent with the expectation of the experiment. The localization results for these channels are listed in Table 3.
The correlation between β and rotation velocity in the two conditions with rotation (i.e., A and C) was then explored. As shown in Fig. 7 and Table 4, Ch. #3 showed a significant positive correlation in condition A (N = 26, r = 0.52, p = 0.006) and Ch. #15 in condition C (N = 28, r = 0.39, p = 0.041). Besides, both channels showed a trend toward positive correlation in the other condition (Ch. #3, condition C: N = 26, r = 0.38, p = 0.055; Ch. #15, condition A: N = 28, r = 0.33, p = 0.086). In fact, the posterior temporal-parietal area showed a slightly higher correlation than the other tested regions (Fig. 8).
Finally, we tested the correlation between Δβ and the corresponding hFLE intensity in conditions A and C. None of the focused channels were found to have a significant correlation, [...] showed significant positive correlation between Δβ3 and the response proportion in condition C (Fig. 9). However, these results did not pass the FDR correction. The regions involved for these channels are listed in Table 5. As shown in Fig. 10, the negative correlation was mainly distributed across the posterior part of the middle temporal gyrus in condition A, whereas the lateral middle occipital area showed positive correlation in condition C. However, no significant correlation was observed directly between rotation velocity and behavioral performance (condition A: r = −0.21, p > 0.2; condition C: r = 0.04, p > 0.8).

Discussion
By using fNIRS, this study measured for the first time the brain activity corresponding to the hFLE. We provided neuroimaging evidence for the engagement of the left supramarginal gyrus (SMG)/superior temporal gyrus (STG) area (BA 22), the posterior part of the bilateral middle temporal gyri (MTG)/inferior temporal gyri (ITG) areas (BA 37), and lateral occipital areas (BA 18/19). To manipulate the hFLE while minimizing the alteration of the visual stimuli, we extended the duration of the flash bar based on previous literature on the classical FLE. 41,42,44 However, it turned out that the hFLE was reduced but not fully eliminated when the flash duration was extended to 1 s: the behavioral results showed that the hFLE for the long flash was also significant, though weaker than that for the short flash and not significantly different from the control condition in which the participants were static. Although it is difficult to propose an explicit account for this finding with the current data alone, we believe that it might reflect a difference between the mechanisms of the hFLE and the classical FLE, which requires further empirical evidence. For example, the formation of the hFLE might involve higher-level sensory processing such as multisensory interaction compared to the classical FLE and thus take a longer time span, which could cover a longer flash stimulus. On the other hand, the current behavioral task forced participants to discriminate a misalignment between two physically aligned bars, so there was virtually no signal to detect. Therefore, intrinsic neural noise might have had a significant impact on perception, 63,64 leading to the noticeable individual response biases in both static control conditions (B and D). This could also explain the observed hFLE for the long flash.
Consistent with our hypothesis that hMT+ is activated during the hFLE, Ch. #22 and #24 were identified by the analyses of the Δβ contrasts of interest. 6][67][68] The hMT+ and its homolog in the macaque have been well recognized as key areas for processing visual motion information. [69][70][71][72][73][74] Della-Justina et al. 28 presented participants with a visual stimulus of flickering checkerboards, a vestibular stimulus of galvanic vestibular stimulation (GVS), and a combined stimulus, and observed activation of the ITG in response to the visual stimulus alone and to the combined visual-vestibular stimulus. Furthermore, Indovina et al. 30 used graph-theoretical network analysis and found that BA 37 was structurally connected to the posterior insula cortex (PIC), a key hub in the vestibular cortical network, and that BA 37 itself also showed some hubness in the vestibular cross-modal network.
Based on these previous findings, we reckon that the activity of BA 37 is related to the visual-vestibular integration processing that occurs in hMT+. The negative correlation between the activity in BA 37 and the behavioral performance may help clarify its role. Given that there was no visual motion stimulus in this study, it is likely that the activity of hMT+ represented the afferent retinal motion signals generated by eye movements and the intrinsic neural noise of the participant. One possible explanation is that weaker activation of hMT+ in a participant reflects fewer eye movements or less intrinsic neural noise, resulting in weaker feedforward visual signals. This would reduce the weight of visual signals in the visual-vestibular integration and ultimately leave the estimation of object position more susceptible to modulation by vestibular signals. Alternatively, hMT+ could be subject to inhibition by vestibular signals or by feedback from visual-vestibular integration. Indeed, several past studies have observed mutual inhibition in visual-vestibular integration, 29,75,76 especially when the stimulus from the suppressed modality was weak. This suppression might aim to reduce neural noise from the weak modality and resolve sensory conflict. 75 In any case, the intensity of the hFLE reflects the relative disadvantage of the visual modality compared to the vestibular modality in their integration during self-motion in our study.
On the other hand, Ch. #3 and #17 (and #14 in the correlation analysis) are located around the SMG and STG (especially the posterior part, BA 22), two areas that also participate in visual-vestibular integration, especially the representation and processing of sensory conflicts. 27,33,770][81] Previous neuroimaging studies have found that the temporoparietal areas, including the SMG and STG, respond to caloric vestibular stimulation or GVS 22,82 and are activated in body-related tasks such as standing balance or inner verticality in which cues from sensory modalities other than the vestibular one were removed, 83,84 and that the STG response increased with the GVS intensity. 82 In an fNIRS study using horizontal rotation as the vestibular stimulation, Nguyen et al. 33 found that the TPJ was sensitive to the incongruency between visual and vestibular stimuli, and that the activation of the dorsal SMG correlated negatively with the intensity of subjective vertigo reported by the participant.
Although both hMT+ and the TPJ engage in visual-vestibular integration, there are still some differences. Compared with hMT+, which is more of a visual area, the TPJ is closer to vestibular processing in visual-vestibular integration. This could also explain the different correlation results. The brain activation to the hFLE on Ch. #3 and #14 (TPJ) was positively correlated with the velocity, probably because the TPJ was more active in resolving larger visual-vestibular conflicts when the vestibular stimulation was stronger. In contrast, the brain activity on Ch. #6 and #8, identified in the correlation analysis on β and located in hMT+, influenced behavioral responses, which reflected the visual perception and subsequent sensory decision making affected by the visual-vestibular interaction.
As to Ch. #10 and #20 (and #15 and #18 in the correlation analysis), they fell in the lateral part of the middle occipital area (BA 18/19, extrastriate visual cortex V2/V3). The brain activity here might reflect the remaining effect of the visual stimulus of the flash, which was not fully eliminated in Δβ2 and Δβ3. This also explains the two distinct types of correlations between Δβ and the hFLE intensity in conditions A and C: in condition A, with a weak visual input and a strong hFLE, the performance was mainly determined by the cross-modal interaction in hMT+ as stated above, whereas in condition C, where the visual input was relatively strong and the illusion was weak, the perception was likely influenced largely by the response bias, which reflected the intrinsic neural noise of the extrastriate visual cortex V2/V3, as discussed at the beginning of this section.
Finally, although the brain activity was observed to correlate with both the rotation velocity and the hFLE intensity, no significant correlation was found directly between the latter two, which was consistent with our previous findings. 24 In that study, an hFLE was significantly induced both when the participants actively rotated the head (experiment 1, HM condition) and when they were passively rotated in the chair by the experimenter (experiment 1, BM condition), but the hFLE intensity correlated significantly with the rotation velocity only in the HM condition and not in the BM condition, which is interesting especially because these two conditions did not differ significantly in hFLE intensity. That finding in the BM condition was replicated in this study. ][87] Note that our findings were not symmetric across hemispheres, with activation in the right hemisphere being generally stronger. This might be attributed to fluctuations and noise in neural activity and data acquisition, but another possibility is lateralization. Many studies have shown that at least part of the vestibular network is lateralized both anatomically and functionally toward a right-sided dominance, 23,29,30,[88][89][90] including when processing horizontal rotating directions. 91 This lateral dominance has been shown to depend partly on handedness and often appears in the non-dominant hemisphere, 81,92 which might account for the current findings, but other factors and mechanisms are also involved. 22,30,88,935][96] These different roles may even relate to unique physiological characteristics such as an elongated hemodynamic response. 22
The current results should be interpreted cautiously, and further evidence is still needed owing to several limitations. First, the spatial resolution of fNIRS is much lower than that of fMRI, and fNIRS can only detect cortical surface activation because of its imaging principle. Therefore, the results may not be sufficiently accurate or complete. Besides, the brain atlas used here is mainly anatomical rather than functional, making it challenging to match the current findings to past fMRI research on visual-vestibular integration. Future studies can use neuronavigation techniques to locate ROIs more precisely and improve data reliability [97][98][99] and neuromodulation tools such as TMS to confirm whether these areas play a causal role. Also, a motion platform 100 can be used to implement finer and stricter manipulations of the motion profiles.

Conclusion
This study observed the activation of right SMG/STG area (BA 22), bilateral MTG/ITG areas (BA 37), and bilateral MOG/IOG areas (BA 18/19) in participants experiencing the hFLE. The activation of SMG was positively correlated with the rotation velocity, while the activation of MTG was negatively correlated with the intensity of the hFLE. These findings provide indirect support for the visual-vestibular interaction account of the hFLE, 24 suggesting that multiple visual-vestibular interactions help form the hFLE, including the processing of multisensory conflicts in TPJ and the biasing of position perception by vestibular information in hMT+. Another minor but intriguing observation was that the hFLE could tolerate a longer flash duration than the classical visual FLE, indicating distinct mechanisms of visual processing between these two phenomena. This study also lends further support to the application of fNIRS in multisensory neuroimaging investigations that require vestibular stimulation with real motion.

Fig. 1 (a) The optode layout used in the study. Emitters are marked with red circles (T) and detectors with blue circles (R). Each adjacent emitter-detector pair was 3 cm apart and formed a channel (underlined). There were 12 channels over each hemisphere and 24 in total. The distances marked are geodesic along the scalp. All the distances and sizes shown in the figure are for illustration only and are not to scale. (b) Average locations of the channels on the brain template.

Fig. 2 Illustration of a single trial in condition A or C. The experimenter started rotating the chair on hearing a high-pitch tone. After 0.6 to 1.2 s (condition A) or 0.3 to 0.8 s (condition C), the flash bar appeared above the reference bar for 16.7 ms (condition A) or 1 s (condition C). The participant was asked to judge whether the flash bar was to the left or the right of the reference bar, although the two bars were actually vertically aligned. The rotation was stopped after the low-pitch tone, which came 2 s after the high-pitch tone. These auditory cues were played to the experimenter through earphones, so the participant could not hear them. The figure shows only the case where the participant was rotated leftward, but the rotation direction actually alternated between trials. All the distances and sizes of the visual stimuli in the figure are for illustration only and are not to scale.
52 (V) Outlier rejection. The data at each channel of each participant were then segmented and analyzed block-wise, based on the time series from the first flash onset until the last flash offset of a block. The data were assessed with six indices: (1) SNR (= 20 log10(M/SD)); (2) number of data loss (NDL), the number of the local data sequences with a fixed value (≥ 2 consecutive data points); (3) maximum continuous data loss (MCDL), the maximum length of the local data sequences with a fixed value; (4) repetition rate (RR), the proportion of repetitive values (= 1 − number of distinct values / data length); (5) the range (R); and (
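The first five indices listed above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that M and SD denote the mean and the sample standard deviation of the segment, that "a fixed value" means exact equality between consecutive samples, and that the absolute value of the mean can be used to keep the logarithm defined. The function names `run_lengths` and `quality_indices` are hypothetical, and the sixth index is omitted because its definition is not available in the text.

```python
import numpy as np


def run_lengths(x):
    """Return the lengths of maximal runs of identical consecutive values."""
    x = np.asarray(x)
    if x.size == 0:
        return np.array([], dtype=int)
    # Positions where the value changes mark the start of a new run.
    change = np.flatnonzero(x[1:] != x[:-1]) + 1
    bounds = np.concatenate(([0], change, [x.size]))
    return np.diff(bounds)


def quality_indices(segment):
    """Compute SNR, NDL, MCDL, RR, and range for one block-wise segment."""
    x = np.asarray(segment, dtype=float)
    runs = run_lengths(x)
    flat = runs[runs >= 2]  # runs of >= 2 identical points count as data loss

    mean = x.mean()
    sd = x.std(ddof=1)  # assumption: sample standard deviation
    # Assumption: |mean| is used so that the logarithm stays defined.
    snr_db = 20.0 * np.log10(abs(mean) / sd) if sd > 0 else float("inf")

    return {
        "snr_db": float(snr_db),
        "ndl": int(flat.size),                        # number of flat runs
        "mcdl": int(flat.max()) if flat.size else 0,  # longest flat run
        "rr": 1.0 - np.unique(x).size / x.size,       # repetition rate
        "range": float(x.max() - x.min()),
    }


if __name__ == "__main__":
    demo = [1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 4.0]
    print(quality_indices(demo))
```

For the demo segment, the runs of identical values have lengths 3, 1, 2, and 1, so NDL is 2 and MCDL is 3. The thresholds at which a segment would be rejected are not specified in this passage and are therefore left out of the sketch.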

Fig. 4 Demonstration of a sample raw [HbO] time series (second row; already downsampled) undergoing the pre-processing procedure step by step (third to bottom rows). This signal was obtained from Ch. #3 of participant #5. The corresponding raw [HbR] time series (top row) is also plotted for comparison. All the signals are in arbitrary units.

Fig. 7 Correlation between β on Ch. #3 (upper row) and #14 (lower row) and rotation velocity in conditions A (left column) and C (right column).

Fig. 8 Distribution of the correlation between β and rotation velocity in conditions (a) A and (b) C.

Fig. 9 Correlation between Δβ and the corresponding response proportion.

Fig. 10 (a) Distribution of the correlation between Δβ₂ and the behavioral performance in condition A and (b) between Δβ₃ and the behavioral performance in condition C.

Table 1 Summary of all conditions.

Table 2 Results of the examinations on Δβ with uncorrected ps < 0.05.

Table 3 Brain regions corresponding to the focused channels.

Table 4 Channels with correlation to the rotation velocity.

Table 5 Channels with correlation to the behavioral performance.