## 1.

## Introduction

Near-infrared (NIR) light, with a wavelength between
$650\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
and
$950\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
, is capable of penetrating deeply through biological tissues. This is because NIR light is weakly absorbed by biological chromophores such as hemoglobin, myoglobin, and cytochrome c oxidase.^{1} The relatively deep penetration depth of NIR light in the human brain makes it possible to measure brain activities associated with regional changes of oxy- and deoxy hemoglobin concentrations.^{2} This spectroscopic technique using NIR light for monitoring brain activities is known as functional near-infrared spectroscopy (NIRS).^{3}

NIRS is regarded as a promising neuroimaging modality owing to a number of advantages over other neuroimaging modalities such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI).^{3} For example, there is no theoretical limitation on temporal resolution (while of course there are practical limits arising from, for example, the speed of the analog-to-digital converter). The temporal resolution of NIRS, therefore, is sufficient to investigate hemodynamic responses due to brain activations as well as other fast varying physiological conditions. Furthermore, NIRS does not require that the subject lie on his/her back in a confined environment during experiments, thereby making it possible to investigate subjects that would normally be difficult to examine using fMRI or PET, including infants, children, and patients with psychological issues. Moreover, NIRS is highly flexible, portable, and relatively low cost.

However, there remain several theoretical and practical difficulties for quantitative analysis of NIRS data. For example, the differential path length factor (DPF)^{4} depends on various parameters such as the age of the subject,^{5} wavelength of the imaging system,^{6} and position within a brain.^{7} Time-resolved or frequency domain NIRS equipment may be used to estimate the mean optical path length; however, continuous wave (CW) systems are more commonly used due to several practical concerns such as cost of implementation.^{7} Other subject-dependent parameters such as depth of the skull and optical properties of hair may influence measurement of optical parameters.

Recently, many researchers have been developing statistical analysis toolboxes for NIRS based on the generalized linear model (GLM).^{8, 9, 10, 11} The GLM is a statistical linear model that explains data as a linear combination of explanatory variables plus an error term. Since the GLM analysis relies on the temporal variational pattern of signals, it is more robust to differential path length factor (DPF) variation, optical scattering, or poor contact. Furthermore, statistical parameter mapping (SPM) using the GLM is a standard method for analyzing fMRI data,^{12} and thus integration of NIRS and fMRI within the same SPM framework may offer the advantage of modeling both types of data in the same mathematical framework to make inferences. Based on these observations, we have developed a new public domain statistical toolbox called NIRS-SPM.^{11} By incorporating the GLM with the
$p$
-value calculation using Sun’s tube formula,^{13, 14} NIRS-SPM not only enables calculation of activation maps of oxy-, deoxy, and total hemoglobin but also allows for super resolution localization, which is not possible using conventional analysis tools.^{11}

Note that the GLM often fails when there exist global drifts in the NIRS measurements due to various reasons, such as subject movement, blood pressure variation, and instrumental instability. This global trend causes a low-frequency bias in NIRS measurements. Moreover, the amplitude of the global drift is often comparable to that of the signal from brain activation, which degrades the signal-to-noise ratio. In order to eliminate the global trend and thus improve the signal-to-noise ratio, high-pass filtering is often used in practice.^{15} However, the signals from brain activations are often degraded by a simple filtering, since the frequency response of the hemodynamic response can also be affected during high-pass filtering.

In order to overcome this problem, this paper investigates a wavelet-based detrending algorithm for fMRI^{16} and adapts it for NIRS applications. Specifically, the wavelet transform was applied to decompose NIRS measurements into global trends, hemodynamic signals, and noise components at distinct scales. However, unlike fMRI data, the NIRS time series is considerably long due to the fast sampling frequency. Hence, we observed that direct application of the model order selection rule in Ref. 16 leads to erroneous results in NIRS. To remedy this problem, the minimum description length (MDL) principle with universal prior of integers^{17} is found to be suitable for the NIRS time series, as it avoids over- and underfitting of the global trend estimate, thanks to the asymptotic optimality of MDL. Experimental results confirm that the new detrending algorithm outperforms the conventional approaches. We have therefore incorporated the proposed Wavelet-MDL detrending algorithms within our NIRS-SPM framework, which will soon be publicly available at the website of the authors (http://bisp.kaist.ac.kr/NIRS-SPM). We observed that the Wavelet-MDL detrending method within NIRS-SPM provides more specific localizations of the neuronal activation than the standard high-pass filtering approach based on the discrete cosine transform (DCT).

## 2.

## NIRS Measurement Model

The modified Beer-Lambert law (MBLL), which describes optical attenuation in a highly scattering medium such as biological tissue,^{4} provides a relation between raw optical density (OD) data and changes of chromophore concentrations. According to the MBLL, the change in
$\mathit{OD}(\lambda ,r,t)$
for the wavelength
$\lambda $
at the cerebral cortex position
$r\u220a{\mathfrak{R}}^{3}$
at time
$t$
due to the
${N}_{c}$
number of chromophore concentration changes
${\left[\Delta {c}^{\left(i\right)}(r,t)\right]}_{i=1}^{{N}_{c}}$
is described as

## Eq. 1

$\Delta \mathit{OD}(\lambda ,r,t)=-\mathrm{ln}\left(\frac{{I}_{F}}{{I}_{o}}\right)=\sum _{i=1}^{{N}_{c}}{a}_{i}\left(\lambda \right)\Delta {c}^{\left(i\right)}(r,t)d\left(r\right)l\left(r\right),$## Eq. 2

$\left[\begin{array}{c}\Delta \mathit{OD}(r,t;{\lambda}_{1})\\ \Delta \mathit{OD}(r,t;{\lambda}_{2})\end{array}\right]=d\left(r\right)l\left(r\right)\left[\begin{array}{cc}{a}_{1}\left({\lambda}_{1}\right)& {a}_{2}\left({\lambda}_{1}\right)\\ {a}_{1}\left({\lambda}_{2}\right)& {a}_{2}\left({\lambda}_{2}\right)\end{array}\right]\left[\begin{array}{c}\Delta {c}_{\mathit{Hb}O}(r,t)\\ \Delta {c}_{\mathit{Hb}R}(r,t)\end{array}\right]+\left[\begin{array}{c}w(r,t;{\lambda}_{1})\\ w(r,t;{\lambda}_{2})\end{array}\right],$## Eq. 3

$\left[\begin{array}{c}{y}_{\mathit{Hb}O}(r,t)\\ {y}_{\mathit{Hb}R}(r,t)\end{array}\right]=d\left(r\right)l\left(r\right)\left[\begin{array}{c}\Delta {c}_{\mathit{Hb}O}(r,t)\\ \Delta {c}_{\mathit{Hb}R}(r,t)\end{array}\right]+\left[\begin{array}{c}{\u03f5}_{\mathit{Hb}O}(r,t)\\ {\u03f5}_{\mathit{Hb}R}(r,t)\end{array}\right],$^{7}this information is not obtainable in commonly available CW systems. Furthermore, NIRS data acquisition is considerably affected by a variety of measurement conditions, such as the color of hair and the scalp depth, which introduces position- and subject-dependent scattering effects. For these reasons, analyzing NIRS data using the magnitude of chromophore concentration changes is often problematic.

## 3.

## General Linear Model for NIRS

In the fMRI domain, the validity of the GLM has been extensively tested, and the GLM has been established as a standard analyzing method. Statistical parametric mapping (SPM)^{18} and analysis of functional neuro images (AFNI)^{19} are widely used programs based on the GLM. Analysis based on the GLM consists of three steps: model specification, parameter estimation, and statistical inference.^{15} In this section, we review the GLM approach for NIRS applications.^{11}

The GLM describes a measurement ${y}_{\mathit{Hb}X}(r,t)$ (i.e., ${y}_{\mathit{Hb}O}$ or ${y}_{\mathit{Hb}R}$ ) in terms of a linear combination of $L$ explanatory variables plus an error term:

## Eq. 4

${y}_{\mathit{Hb}X}(r,t)={x}_{1}\left(t\right){\beta}_{1}+\cdots +{x}_{L}\left(t\right){\beta}_{L}+{\u03f5}_{\mathit{Hb}X}(r,t).$## Eq. 5

$\mathit{y}={[{y}_{\mathit{Hb}X}(r,{t}_{1})\phantom{\rule{0.3em}{0ex}}{y}_{\mathit{Hb}X}(r,{t}_{2})\phantom{\rule{0.3em}{0ex}}\cdots \phantom{\rule{0.3em}{0ex}}{y}_{\mathit{Hb}X}(r,{t}_{N})]}^{T},$## Eq. 6

$\mathbf{\u03f5}={[{\u03f5}_{\mathit{Hb}X}(r,{t}_{1})\phantom{\rule{0.3em}{0ex}}{\u03f5}_{\mathit{Hb}X}(r,{t}_{2})\phantom{\rule{0.3em}{0ex}}\cdots \phantom{\rule{0.3em}{0ex}}{\u03f5}_{\mathit{Hb}X}(r,{t}_{N})]}^{T}.$^{15}

For fMRI signals, Boynton showed that the BOLD signal can be approximated as a convolution model between a stimulus function and a hemodynamic response function (HRF).^{20} Based on a similar argument, several statistical analysis toolboxes using the GLM are currently available for NIRS.^{8, 9, 10, 11} The stick function or the boxcar function is typically used for the stimulus function. For the HRF, there are a number of possible models. In this paper, we follow fMRI approaches and employ the so-called canonical HRF, which is composed of two gamma functions.^{15} Additionally, the derivatives of the HRF with respect to delay and dispersion can be used to mitigate the problem that the precise shape of the HRF varies across the brain.^{21} An adaptive estimation of HRF using multiple gamma functions can also be used in NIRS to account for oxygen species–dependent hemodynamics variation.^{11}

After the model specification, the least-squares parameter estimator is derived using the ordinary least squares. If the design matrix $\mathit{X}$ is of full rank, the least-squares estimate is:

## Eq. 8

$\widehat{\mathbf{\beta}}={\left({\mathit{X}}^{T}\mathit{X}\right)}^{-}{\mathit{X}}^{T}\mathit{y},$## Eq. 9

${c}_{1}{\widehat{\beta}}_{1}+\cdots +{c}_{L}{\widehat{\beta}}_{L}={\mathit{c}}^{T}\widehat{\mathbf{\beta}},$^{15}Similar to the fMRI-SPM analysis,

^{15}the error $\mathbf{\u03f5}$ in Eq. 7 is assumed to be normally distributed with a temporal covariance matrix $\mathbf{\Sigma}$ ; hence, we have

## Eq. 10

${\mathit{c}}^{T}\widehat{\mathbf{\beta}}\sim N[{\mathit{c}}^{T}\mathbf{\beta},{\mathit{c}}^{T}{\left({\mathit{X}}^{T}\mathit{X}\right)}^{-}{\mathit{X}}^{T}\mathbf{\Sigma}\mathit{X}{\left({\mathit{X}}^{T}\mathit{X}\right)}^{-}\mathit{c}].$## Eq. 11

$t=\frac{{\mathit{c}}^{T}\widehat{\mathbf{\beta}}}{{\left[{\mathit{c}}^{T}{\left({\mathit{X}}^{T}\mathit{X}\right)}^{-}{\mathit{X}}^{T}\mathbf{\Sigma}\mathit{X}{\left({\mathit{X}}^{T}\mathit{X}\right)}^{-}\mathit{c}\right]}^{1\u22152}},$^{15}

## Eq. 12

$\mathit{df}=\frac{\mathit{tr}{\left[\mathit{R}\mathbf{\Sigma}\right]}^{2}}{\mathit{tr}\left[\mathit{R}\mathbf{\Sigma}\mathit{R}\mathbf{\Sigma}\right]},\phantom{\rule{1em}{0ex}}\mathit{R}={\mathit{I}}_{N\times N}-\mathit{X}{\left({\mathit{X}}^{T}\mathit{X}\right)}^{-}{\mathit{X}}^{T},$The estimate of the temporal correlation matrix
$\mathbf{\Sigma}$
hence affects the overall
$t$
-value and corresponding inferences. For a detailed discussion on the estimation of
$\mathbf{\Sigma}$
, readers can refer to our previous work on this issue.^{11} If the calculated
$t$
-value is larger than a certain threshold value, then the inference steps abandon the null hypothesis and we declare the area to be activated. The threshold value is calculated by fixing a
$p$
-value in the range of 0.001 to 0.05. A smaller
$p$
-value provides a higher threshold value. The nonlinear relationship between the
$p$
-value and threshold can be explicitly represented using the tube formula, as described in our companion paper.^{11}

## 4.

## Wavelet-MDL Detrending

In this section, we develop a novel detrending algorithm that is designed to address the global-drift issues described in the Introduction. Our point of departure is an algorithm developed for detrending the fMRI time series.^{16}

## 4.1.

### Notation

We first introduce the notation associated with a discrete wavelet transform by following the standard conventions.^{22} Let
$\psi \left(t\right)$
denote a wavelet associated with the multiresolution analysis.^{22} Let
$\Phi \left(t\right)$
,
$h$
, and
$g$
be the scaling function, the low-pass filter, and the high-pass filter associated with this wavelet transform, respectively. With a slight abuse of notation, let
$\mathbf{\theta}=\left\{\theta \left[n\right]\right\},n=0,\dots ,N-1$
be a discrete version of a continuous signal
$\mathbf{\theta}\left(t\right)$
. For simplicity, we assume
$N={2}^{J}$
, where
$J$
is the maximum level of wavelet decomposition. The wavelet coefficients composed of approximation coefficients
${\left\{a{\theta}_{j}\left[k\right]\right\}}_{j,k}$
and detail coefficients
${\left\{d{\theta}_{j}\left[k\right]\right\}}_{j,k}$
are defined by the following recursions:^{22}

## Eq. 13

$\mathit{W}\mathbf{\theta}={\{a{\theta}_{J}\left[0\right],{\mathit{d}}_{J},{\mathit{d}}_{J-1},\dots ,{\mathit{d}}_{1}\}}^{T},$## 4.2.

### Modified GLM with Baseline Drift

In the wavelet detrending algorithm for fMRI,^{16} the baseline drift is included as part of the GLM:

## Eq. 15

$\left[\begin{array}{c}\Delta \mathit{OD}(r,t;{\lambda}_{1})\\ \Delta \mathit{OD}(r,t;{\lambda}_{2})\end{array}\right]=d\left(r\right)l\left(r\right)\left[\begin{array}{cc}{a}_{1}\left({\lambda}_{1}\right)& {a}_{2}\left({\lambda}_{1}\right)\\ {a}_{1}\left({\lambda}_{2}\right)& {a}_{2}\left({\lambda}_{2}\right)\end{array}\right]\left[\begin{array}{c}\Delta {c}_{\mathit{Hb}O}(r,t)\\ \Delta {c}_{\mathit{Hb}R}(r,t)\end{array}\right]+\left[\begin{array}{c}w(r,t;{\lambda}_{1})\\ w(r,t;{\lambda}_{2})\end{array}\right]+\left[\begin{array}{c}\stackrel{\u0303}{\theta}(r,t;{\lambda}_{1})\\ \stackrel{\u0303}{\theta}(r,t;{\lambda}_{2})\end{array}\right],$## Eq. 16

$\left[\begin{array}{c}{y}_{\mathit{Hb}O}(r,t)\\ {y}_{\mathit{Hb}R}(r,t)\end{array}\right]=d\left(r\right)l\left(r\right)\left[\begin{array}{c}\Delta {c}_{\mathit{Hb}O}(r,t)\\ \Delta {c}_{\mathit{Hb}R}(r,t)\end{array}\right]+\left[\begin{array}{c}{\u03f5}_{\mathit{Hb}O}(r,t)\\ {\u03f5}_{\mathit{Hb}R}(r,t)\end{array}\right]+\left[\begin{array}{c}{\theta}_{\mathit{Hb}O}(r,t)\\ {\theta}_{\mathit{Hb}R}(r,t)\end{array}\right],$## Eq. 17

${\mathit{y}}_{\mathit{Hb}X}={\mathit{X}}_{\mathit{Hb}X}{\mathbf{\beta}}_{\mathit{Hb}X}+{\mathbf{\u03f5}}_{\mathit{Hb}X}+{\mathbf{\theta}}_{\mathit{Hb}X}.$The global trend signal varies smoothly in most cases. Based on this observation, conventional detrending algorithms use filtering to remove the low-frequency trend signal. The problem of this approach, however, is that the hemodynamic signal often has a low-frequency varying component that can be erroneously removed during the filtering. In order to deal with this artifact, in our wavelet detrending, the unknown trend is modeled as a signal restricted in a subspace spanned by coarse scale wavelets.^{16} More specifically, the trend is modeled as:^{16}

## Eq. 18

$\mathbf{\theta}\left(\mathit{t}\right)=a{\theta}_{J}\left[0\right]\Phi \left({2}^{-J}t\right)+\sum _{j={J}_{0}}^{J}\sum _{k=0}^{{2}^{-j}N-1}d{\theta}_{j}\left[k\right]\psi ({2}^{-j}t-k),$Using the discrete wavelet transform (DWT) matrix $\mathit{W}$ defined in Eq. 13, we can represent the wavelet transform of a global trend signal as follows:

## Eq. 19

$\mathit{W}\mathbf{\theta}={[a{\theta}_{J}\left[0\right],{\mathit{d}}_{J},\dots ,{\mathit{d}}_{{J}_{0}},0,\dots ,0]}^{T}.$^{16}

## Eq. 20

$\widehat{\mathbf{\xi}}={\left[{\mathit{A}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{A}\right]}^{-1}{\mathit{A}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{W}\mathit{y},$Note that
$\mathbf{\Sigma}$
in Eq. 20 is the covariance matrix of noise in wavelet domain. Even if the original time series is highly correlated, the array of wavelet coefficients exhibits much less correlation.^{23} This decorrelating (or whitening) feature of the wavelet transform has been studied in Refs. 23, 24, 25. Especially for the fractional Brownian motion (fBM) process,^{26} which is a popular model for
$1\u2215f$
-type noise, the decay rate of the correlation of wavelet coefficients in
$j$
’th scale is derived as:^{24}

## Eq. 21

$E\{d{\theta}_{j}\left[m\right],d{\theta}_{j}\left[n\right]\}=O\left({\mid {2}^{-j}(m-n)\mid}^{2(H-p)}\right),$## Eq. 22

$\mathbf{\Sigma}=\mathrm{diag}\{{\sigma}_{J}^{2},{\sigma}_{J}^{2},\dots ,{\sigma}_{1}^{2}\},$^{23, 27}

## Eq. 23

${\widehat{\sigma}}_{j}^{2}=\text{Median}\{d{\theta}_{j}\left[0\right],\dots ,d{\theta}_{j}[{2}^{-j}N-1]\}\u22150.6745,$^{23, 27}

In the wavelet detrending method for a fMRI time series,^{16} the complexity of the unknown global bias is solely determined by
${J}_{0}$
. In other words, there is an implicit assumption that all coefficients at the same scale are all zero or all nonzero simultaneously. However, in the case of NIRS, the number of data points
$N$
is much larger than that of the fMRI time series, and thus this simple scheme may cause over- or underfitting. Therefore, for a fine-tuned estimate of model order, the wavelet coefficients at the same scale are sorted in order of descending magnitude and are included one by one until a suitable complexity is found by the model order selection criterion. Note that this procedure is similar to MDL thresholding in signal restoration.^{28, 29}

The wavelet detrending method has several advantages over the conventional filtering approach. Note that the conventional filtering cannot prevent the removal of the hemodynamic signal, since it decides the cutoff frequency using only the paradigm repetition frequency, and the nonnegligible amount of the hemodynamic signals can often be filtered out. This is especially true for the event-related paradigm^{30} since it does not have a well-defined cutoff frequency. Compared with the conventional method, the wavelet-based detrending algorithm includes the design matrix
$\mathit{X}$
in the least-squares estimation process described in Eq. 20. Hence, the canonical hemodynamics time series is fully utilized in the estimation process, and the algorithm is more robust. Furthermore, the wavelet-based approach is more effective in removing relatively fast varying trends thanks to the optimality of wavelet transform in describing the transient changes of fast varying signals.^{22}

## 4.3.

### Minimum Description Length Principle for Model Order Selection

In Eq. 20, we have shown a simple one-step method for simultaneous estimation of the global trend
$\mathbf{\theta}$
and the signal strength
$\mathbf{\beta}$
for a *given* hyperparameter
${J}_{0}$
in terms of maximum likelihood estimation. The remaining issue is the determination of the hyperparameter
${J}_{0}$
. Even though
${J}_{0}$
is a single integer valued parameter, it affects the whole behavior of the estimated trend signal, since
${J}_{0}$
determines the number of wavelet coefficients
${n}_{0}$
that describe the trend. More specifically, if
${n}_{0}$
is an inappropriately large number, the hemodynamic response is distorted due to the overfitted trend estimate, whereas a smaller
${n}_{0}$
value may not capture the unknown trend properly. Therefore, the accuracy of the order estimate
${J}_{0}$
(or
${n}_{0}$
) ultimately determines the quality of the trend estimation.

The problem of estimating the order
${J}_{0}$
(or
${n}_{0}$
) is formally called the model order selection problem.^{17, 31, 32} A good model order selection criterion should satisfy two conflicting goals simultaneously: goodness of fit and concision of the model. Balancing these two goals is crucial in preventing the over- or underfitting. Akaike information criterion (AIC),^{33} Schwartz information criterion (SIC),^{32} and minimum description length (MDL) suggested by Rissanen^{17} are the most popular criteria currently available. The best model is then selected as the one that gives the smallest cost among several plausible models.

These three methods assume that the probability density function (pdf) is a function of unknown model order
${n}_{0}$
. Thus, the distance between the true pdf and the estimated pdf induced from the parameter estimate
${\widehat{n}}_{0}$
plays an important role. More specifically, AIC and SIC are closely related to the Kullback-Leibler (K-L) distance,^{34} defined as follows:

## Eq. 24

$I(f,g)=\int f\left(\mathit{y}\right)\mathrm{log}\left[\frac{f\left(\mathit{y}\right)}{g(\mathit{y}\mid {n}_{0})}\right]\mathrm{d}y={E}_{f}\left\{\mathrm{log}\left[\frac{f\left(\mathit{y}\right)}{g(\mathit{y}\mid {n}_{0})}\right]\right\},$^{31}and denoted by ${\mathrm{AIC}}_{C}$ . In many cases, ${\mathrm{AIC}}_{C}$ outperforms the conventional AIC.

^{35}The ${\mathrm{AIC}}_{C}$ and SIC cost functions for detrending are then summarized as:

## Eq. 25

${\mathrm{AIC}}_{C}=-\mathrm{log}\phantom{\rule{0.2em}{0ex}}P(\mathit{y}\mid {n}_{0})+\frac{1}{2}\frac{N+{n}_{0}}{N-{n}_{0}-2},\mathrm{SIC}=-\mathrm{log}\phantom{\rule{0.2em}{0ex}}P(\mathit{y}\mid {n}_{0})+\frac{1}{2}{n}_{0}\phantom{\rule{0.2em}{0ex}}\mathrm{log}\phantom{\rule{0.2em}{0ex}}N,$## Eq. 26

$-\mathrm{log}\phantom{\rule{0.2em}{0ex}}P(\mathit{y}\mid {n}_{0})=-\mathrm{log}\left[{\left(2\pi {\sigma}^{2}\right)}^{-N\u22152}\phantom{\rule{0.2em}{0ex}}\mathrm{exp}(-\frac{{\Vert \mathit{y}-\mathit{X}\mathbf{\beta}-\mathbf{\theta}\Vert}^{2}}{2{\sigma}^{2}})\right],$## Eq. 27

${\widehat{\sigma}}^{2}\left({n}_{0}\right)=\frac{1}{N}{\Vert \mathit{y}-\mathit{X}\mathbf{\beta}-\mathbf{\theta}\Vert}^{2},$## Eq. 28

$-\mathrm{log}\phantom{\rule{0.2em}{0ex}}P(\mathit{y}\mid {n}_{0})=\frac{N}{2}\phantom{\rule{0.2em}{0ex}}\mathrm{log}\phantom{\rule{0.2em}{0ex}}{\widehat{\sigma}}^{2}\left({n}_{0}\right)+\mathit{const},$*const*stands for a term that is not related to the model selection problem. Then, by substituting Eq. 28 into Eq. 25, we can find the optimal model order ${n}_{0}$ that minimizes the cost functions ${\mathrm{AIC}}_{C}$ and SIC, respectively.

The MDL principle follows a different philosophy compared with AIC and SIC. Formally, it is based on the Kolmogorov’s descriptive complexity,^{36, 37} which defines the amount of complexity as the length of the shortest binary computer program that is able to describe an object. In more intuitive terms, the MDL principle is based on the so-called Occam’s razor, interpreted as “If there are many explanations consistent with the observed data, choose the simplest.”^{37} However, it is hard to compute to Kolmogorov’s descriptive complexity.^{37} Rissanen suggested that the description length can be regarded as the number of binary bits used for data transmission between a hypothetical pair of an encoder and a decoder.^{17} This idea is heavily supported by Shannon’s source coding theorem, since the expected description length of data is minimum on average when the true model parameter of the pdf is used.^{37, 38} Note that this interpretation is parallel to K-L distance, which is zero if and only if the model parameter is true.

According to MDL, our task is to measure the total expected codelength to encode the measurement and the model. More specifically, the total code is composed of two parts: one for encoding measurements based on a model and the other for encoding the model itself. Hence, the total code length is described as

## Eq. 29

$\mathrm{MDL}\left({n}_{0}\right)=-{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}P(\mathit{y}\mid {n}_{0})+\mathit{L}\left({n}_{0}\right),$^{39}the code length $\mathit{L}\left({n}_{0}\right)$ for MDL criterion can be written as:

^{28}

## Eq. 30

$\mathit{L}\left({n}_{0}\right)=\frac{1}{2}{n}_{0}\phantom{\rule{0.2em}{0ex}}{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}N+{n}_{0}\phantom{\rule{0.2em}{0ex}}{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}N=\frac{3}{2}{n}_{0}\phantom{\rule{0.2em}{0ex}}{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}N,$^{28}we call this Saito’s MDL.

Note that the code length for encoding positions of coefficients in Saito’s MDL can be written as

## Eq. 31

$(1\u22152){n}_{0}\phantom{\rule{0.2em}{0ex}}{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}N={n}_{0}[-{\mathrm{log}}_{2}\left(\frac{1}{N}\right)],$Therefore, we consider an alternative *a priori* distribution for the wavelet coefficients, whose probability varies across scale. Note that the new code should satisfy Kraft’s inequality, defined as:

^{37, 38}This is a basic constraint for pdf to be used in the MDL framework. Among a number of pdfs that have scale dependent probabilities, we select the universal prior for integers proposed by Rissanen:

^{39}

## Eq. 33

${P}_{u}\left(n\right)={2}^{-{L}_{u}\left(n\right)},\phantom{\rule{1em}{0ex}}n>0,\phantom{\rule{1em}{0ex}}{L}_{u}\left(n\right)={\mathrm{log}}_{2}^{*}\phantom{\rule{0.2em}{0ex}}n+{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}c,c\approx 2.865064,$By extending this concept, it is reasonable to assign the same code length for wavelet coefficients within the same scale. A slight modification is required in order to give an identical probability to coefficients within the scale. Specifically, suppose ${m}_{j}$ denotes the number of wavelet coefficients in the $j$ ’th scale. The code length $\widehat{L}\left(j\right)$ from the modified universal prior corresponding to the $j$ ’th scale is then given by

where ${s}_{j}=1+{\sum}_{k=J}^{j+1}{m}_{k}$ denotes the starting index of the $j$ ’th scale wavelet coefficients, $J$ denotes the coarsest scale, and ${L}_{u}(\cdot )$ is defined in Eq. 33, respectively. This modified universal prior is illustrated in Fig. 1, which exhibits a staircaselike increasing behavior in the code length. The final MDL criterion can therefore be summarized as follows:## Eq. 35

$\mathit{MDL}\left({n}_{0}\right)=\frac{N}{2}\phantom{\rule{0.2em}{0ex}}{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}{\widehat{\sigma}}^{2}\left({n}_{0}\right)+\frac{1}{2}{n}_{0}\phantom{\rule{0.2em}{0ex}}{\mathrm{log}}_{2}\phantom{\rule{0.2em}{0ex}}N+\sum _{j={J}_{0}}^{J}{m}_{j}\widehat{L}\left(j\right),$We have observed that both
${\mathrm{AIC}}_{C}$
and SIC tend to give an overfitted model for the NIRS signal, as described in the experimental results. This is because the number of temporal NIRS sequences is much larger than that of fMRI data, and it is known that the probability of overfitting is more than zero under quite general conditions, as
$N\to \infty $
in these priors.^{35} However, the MDL criterion does not exhibit such overfitting thanks to its asymptotic optimality.

## 4.4.

### Implementation Issues

In order to make at least one wavelet coefficient free of boundary artifacts, the maximum level of decomposition $J$ should allow at least $M$ coefficients at the last decomposition level, where $M$ denotes the support length of the wavelet. Under this constraint, the maximum level of wavelet decomposition is given by:

where $\lfloor X\rfloor $ denotes an operator that truncates $X$ to the nearest integers toward zero. In case the number of wavelet coefficients at the ${J}_{p}$ scale is larger than $M$ , we allow an additional level of wavelet decomposition by boundary extension of the data so that the number of wavelet coefficients at the coarsest level always becomes $M$ . This constraint allows us to use the same form of the modified universal prior of integers regardless of the length of the NIRS time series. More specifically, if the total number of data elongations is ${N}_{ext}=(2M-{m}_{{J}_{p}})\times {2}^{{J}_{p}}$ , where ${m}_{{J}_{p}}=\lfloor N\times {2}^{-{J}_{p}}\rfloor $ denotes the number of coefficients in the ${J}_{p}$ ’th scale, then we have ${m}_{({J}_{p}+1)}=\lfloor N\times {2}^{-({J}_{p}+1)}\rfloor =M$ . Therefore, the maximum decomposition level of the NIRS signal becomes $({J}_{p}+1)$ , and the number of wavelet coefficients at the finest scale is $M$ . For the specific boundary extension, we employ a symmetric extension.^{40}

In order to implement wavelet detrending, the wavelet should be compact. Daubechies wavelets and Symlets^{22} with small orders are therefore reasonable candidates. However, due to the small vanishing moment of these wavelets, the trend estimate often violates the assumption of smoothness. Hence, we selected a CDF
$9\u22157$
biorthogonal filter^{41} with a vanishing moment of 9, which gives a reasonable trade-off between the vanishing moment and the wavelet support. Note that CDF
$9\u22157$
is a standard wavelet filter in lossy compression of JPEG 2000 (Ref. 42), due to the compactness and sufficient vanishing moments.

## 5.

## Experimental Results

We performed NIRS experiments using Oxymon Mk III (Artinis, Netherlands), which has eight laser diodes and four detectors. In this system, two continuous wave lights ( $856\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ and $781\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ ) are emitted at each source fiber. A suitable arrangement using 24 pairs of a source and a detector are illustrated in Fig. 2a . Sources and detectors were attached by optical fibers on the scalp covering the motor cortex, as illustrated in Figs. 2b and 2c. The distance between the source and detector was $3.5\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$ . A 3.0T MRI scanner (ISOL, Korea) was also used to simultaneously measure the BOLD signal. The echo planar imaging (EPI) sequence was used with $\mathrm{TR}\u2215\mathrm{TE}=3000\u221535\phantom{\rule{0.3em}{0ex}}\mathrm{ms}$ , flip $\text{angle}=80\phantom{\rule{0.3em}{0ex}}\mathrm{deg}$ , 35 slices, and $4\text{-}\mathrm{mm}$ slice thickness. In the subsequent anatomical scanning session, T1-weighted structural images were acquired using the same system.

To evaluate the proposed detrending algorithm, nine healthy, right-handed male adults were examined using a right-finger tapping (RFT-1) task. The $21\text{-}\mathrm{s}$ periods of activation were alternated with $30\text{-}\mathrm{s}$ periods of rest. During the activation periods, subjects were instructed to tap right-hand fingers. Total recoding time was $552\phantom{\rule{0.3em}{0ex}}\mathrm{s}$ . In addition, one subject was examined under the same conditions; however, in this time, the subject did not receive any stimulus during the recording period, which we call the baseline. This baseline data is used for a subsequent simulation study. No subject had any history of neurological disorders. All subjects were informed about the whole experiment process. The investigation was approved by the Institutional Review Board of Korea Advanced Institute of Science and Technology (KAIST).

## 5.1.

### Simulation Study

For comparison, two detrending methods based on the conventional filtering^{8} were implemented. First, finite impulse response (FIR) filters were designed using the Kaiser window method, whose cutoff frequencies were appropriately adjusted for each case.^{43} In addition, we also compared our method with DCT-based filtering, a standard detrending technique in SPM.^{15} DCT coefficients that are below the cutoff frequency were declared as trend estimates. Note that we utilized the symmetric extension at both ends of the NIRS data to reduce boundary distortion during filtering.

A simulated NIRS time series was constructed to have a global bias, a simulated task-related time series, and AR(1) noise (auto-regressive noise of order 1). Note that the autoregressive (order 1) plus white noise model is a standard model for serial correlations in fMRI-SPM (Refs. 15, 44) from the empirical perspectives, and the AR(1) model has successfully explained the short-range correlations.^{44} Hence, this simulation also employs the AR(1) model to account for the short-range correlations in NIRS time series. The global bias was extracted from the real NIRS experiment under the baseline condition using a low-pass filter whose stopband edge was
$0.02\phantom{\rule{0.3em}{0ex}}\mathrm{Hz}$
. The hemodynamic response was modeled by the canonical HRF with its derivatives. The extracted trend was normalized to [0,2], and the magnitude of the hemodynamic response was set to 1.05. The AR(1) noise
$\mathbf{\u03f5}$
was generated by
$\mathbf{\u03f5}\left[m\right]=\alpha \cdot \mathbf{\u03f5}[m-1]+\mathit{e}\left[m\right]$
, where
$\alpha $
was set to 0.8, and
$\mathit{e}$
is the vector whose elements are zero mean white Gaussian variables with variance
${\sigma}^{2}=3.6\times {10}^{-3}$
. The noise covariance matrix is then given by:

In Figs. 3a and 3b, the simulated NIRS time series and its components are illustrated. Black lines in Figs. 3c, 3d, 3f show global trend estimates using various detrending methods. The same cutoff frequency for simulating the ground-truth trend was used in Fig. 3d, which represents the case where the frequency content of a global trend is exactly known. In Fig. 3e, a FIR filter with a stopband edge of $0.015\phantom{\rule{0.3em}{0ex}}\mathrm{Hz}$ is used. Figure 3f is the result of DCT-based filtering whose cutoff frequency is $0.015\phantom{\rule{0.3em}{0ex}}\mathrm{Hz}$ . Even in the case where the exact cutoff frequency is known, as illustrated in Fig. 3d, it was hard to distinguish between the hemodynamic signal and the global trend when both signals have the same frequency contents. Indeed, the hemodynamic signal is an extremely low frequency signal considering the sampling frequency of NIRS, and hence it is sometimes similar to the global bias in its frequency content. We can easily see that the Wavelet-MDL-based detrending method (Wavelet-MDL) gives a much closer estimate for the unknown bias, as illustrated in Fig. 3c. For a quantitative analysis, Table 1 shows the $t$ -score using Eq. 11, which confirms that the proposed detrending method shows the best performance in estimating the hemodynamics signals.

## Table 1

t -scores with distinct detrending methods.

Wavelet-MDL | Kaiser-A | Kaiser-B | DCT | |
---|---|---|---|---|

$t$ -score | 73.22 | 16.89 | 47.57 | 45.73 |

Recall that the $\alpha $ value in the AR(1) model denotes the degree of correlation between the neighborhood samples. Even though this paper chose $\alpha =0.8$ to represent relatively strong correlation between the neighboring samples, similar behaviors have been obtained for a wide range of $\alpha $ values, and our Wavelet-MDL framework has been observed to outperform the other methods for all cases.

## 5.2.

### Experimental NIRS Data

We now apply our algorithm to real NIRS measurements. For calculating
$t$
-values, we used NIRS-SPM software, which has been developed by our group.^{11} NIRS-SPM allows the estimation of the temporal correlation and determines a
$p$
-value using the tube formula. For a detailed discussion on the estimation of the temporal correlation,
$p$
-value, and related threshold value, readers can refer to our previous work on this issue.^{11} In NIRS-SPM, the conventional filtering based on DCT is implemented. We set the cutoff frequency as
$1\u221560\phantom{\rule{0.3em}{0ex}}\mathrm{Hz}$
, which is below the frequency of RFT paradigm repetition frequency of
$0.018\phantom{\rule{0.3em}{0ex}}\mathrm{Hz}$
.

In Fig. 4a, the deoxy-hemoglobin (HbR)
$t$
-map from NIRS-SPM using Wavelet-MDL is illustrated. For clear comparison, the magnified figure is illustrated in Fig. 4b, whereas Fig. 4c shows the result of the conventional DCT filtering approach. The temporal covariance was estimated using a prewhitening method.^{11} While the task-related activation is observable in both cases, Wavelet-MDL provides a higher
$t$
-value centered at the target area. The maximum
$t$
-values were 13.95 and 13.16 for Wavelet-MDL and the DCT-based filtering, respectively. The higher
$t$
-values indicate that the proposed method provides a statistically more significant estimate of the activation map.

An oxyhemoglobin time series and trend estimates are illustrated in Fig. 5 . Since the design matrix is explicitly incorporated during the detrending process of Wavelet-MDL, as described in Eq. 20, the oxyhemoglobin signals were almost conserved, as shown in Fig. 5a. However, in the case of the conventional approach, some components of oxyhemoglobin signals were classified as global trends, as illustrated in Fig. 5b. Figure 6 shows another oxyhemoglobin signal that contains a rapidly varying bias. If we use the conventional detrending method, the residual of the fast-varying transient trend can be classified as an oxyhemoglobin signal. However, Wavelet-MDL can estimate the unknown trend even if it has high-frequency transient signal, since it is capable of capturing the transient signal and adjusting the complexity of a global trend automatically, as illustrated in Fig 6a. However, in Fig. 6b, the fast varying trend was not fully captured by the conventional approach, and the hemodynamic signal was damaged.

Last, we compared MDL with ${\mathrm{AIC}}_{C}$ , SIC, and Saito’s MDL. Figures 7b, 7c, 7d show that the estimated trends based on ${\mathrm{AIC}}_{C}$ , SIC, and Saito’s MDL result in overfitting, whereas our algorithm correctly captures the trends. This implies that the modified universal prior of the integer is effective in model order selection.

## 5.3.

### Group Analysis

Group analyses were performed using NIRS-SPM software. NIRS data as well as simultaneously recorded fMRI data from nine subjects were used. Figure 8 shows group activation maps for the HbR signal at $p=0.05$ , which is overlayed on the fMRI activation map at $p=0.005$ . The Wavelet-MDL detrending method gives more specific localization of the target motor cortex compared to the conventional DCT-based high-pass filtering. To quantify the accuracy of localization, we calculate the receiver operating characteristic (ROC) for the spatial correlation of the NIRS activation map with that of fMRI BOLD. An ROC curve is a graphical plot of “sensitivity” versus “1-specificity” for a binary classifier. In ROC, (0,1) is the point of an ideal classifier, and the diagonal line represents the performance of random guessing. Therefore, the area under the ROC curve is a good indicator of the performance of a classifier. We evaluated our detrending method by measuring the spatial correlation between NIRS activation maps and the fMRI BOLD map. The BOLD map with $p=0.005$ was assumed as the ground truth in Fig. 9 . The sensitivity and specificity were calculated by changing the threshold values. The ROC analysis in Fig. 9 shows that the proposed detrending algorithm outperforms the conventional high-pass filtering since the area under the ROC curve is largest for Wavelet-MDL.

## 6.

## Discussion

## 6.1.

### Temporal Covariance in MDL

In Eq. 26, we assume that the noise is uncorrelated, independent, and normally distributed. For the correlated noise, the covariance matrix
$\mathbf{\Sigma}$
should be considered in Eqs. 26, 27, 28. Specifically, consider the negative log-likelihood for measurement in the wavelet domain. Let
$\mathbf{r}\left({n}_{0}\right)$
denote the residual vector for model order
${n}_{0}$
in the wavelet domain, i.e.,
$\mathbf{r}\left({n}_{0}\right)=\mathit{W}(\mathit{y}-\mathit{X}\mathbf{\beta}-\mathbf{\theta})$
. Since the dependency for
$\mathbf{r}$
on the model order
${n}_{0}$
is clear in the context, we omit the letter
${n}_{0}$
in the sequel. If we model the correlated noise using the fractional Brownian motion process^{26} as in the fMRI case,^{16} the pdf of measurement
$\mathit{y}$
in the wavelet domain is given by

## Eq. 37

$p(\mathit{W}\mathit{y}\mid {n}_{0})=\frac{1}{{\left(2\pi \right)}^{2N}{\mid \mathbf{\Sigma}\mid}^{1\u22152}}\phantom{\rule{0.2em}{0ex}}\mathrm{exp}(-\frac{1}{2}{\mathit{r}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{r}),$## Eq. 38

$-\mathrm{log}\phantom{\rule{0.2em}{0ex}}p(\mathit{W}\mathit{y}\mid {n}_{0})=\frac{1}{2}\sum _{j=1}^{J}[{N}_{j}\phantom{\rule{0.2em}{0ex}}\mathrm{log}\phantom{\rule{0.2em}{0ex}}{\sigma}_{j}^{2}\left({n}_{0}\right)+{\sigma}_{j}^{-2}\left({n}_{0}\right){\mathit{r}}_{j}^{T}{\mathit{r}}_{j}]+c,$## Eq. 39

${\widehat{\sigma}}_{j}^{2}\left({n}_{0}\right)=\frac{1}{{N}_{j}}{\mathit{r}}_{j}^{T}{\mathit{r}}_{j},$## Eq. 40

$-\mathrm{log}\phantom{\rule{0.2em}{0ex}}p(\mathit{W}\mathit{y}\mid {n}_{0})=\sum _{j=1}^{J}\frac{{N}_{j}}{2}\phantom{\rule{0.2em}{0ex}}\mathrm{log}\phantom{\rule{0.2em}{0ex}}{\widehat{\sigma}}_{j}^{2}\left({n}_{0}\right)+{c}^{\u2033},$^{24, 45, 46}where $\alpha $ is the spectral decay rate of the correlated noise model and $\tau $ denotes a constant that reflects the overall noise variance. We can estimated $\tau $ in the zeroth, decomposition level, i.e., in the time domain, as follows:

## Eq. 42

${\widehat{\tau}}^{2}\left({n}_{0}\right)={\widehat{\sigma}}_{0}^{2}\left({n}_{0}\right)=\frac{1}{N}{\Vert \mathit{y}-\mathit{X}\mathbf{\beta}-\mathbf{\theta}\left({n}_{0}\right)\Vert}^{2}.$## Eq. 43

$-\mathrm{log}\phantom{\rule{0.2em}{0ex}}p(\mathit{W}\mathit{y}\mid {n}_{0})=\frac{N}{2}\phantom{\rule{0.2em}{0ex}}\mathrm{log}\phantom{\rule{0.2em}{0ex}}{\widehat{\sigma}}_{0}^{2}\left({n}_{0}\right)+{c}^{\u2034},$## 6.2.

### $t$ -Value versus Activation Map

Recall that
$t$
-value is often used to quantify the performance of the detrending algorithms.^{16} However, more correct quantification should be based on the accuracy of the activation map at the same
$p$
-value. For example, consider a case where a fixed
$p$
-value provides an identical threshold value. Then, in obtaining the activation maps, the overall increase in
$t$
-values may imply that the resulting activation map becomes significantly larger. Hence, to guarantee the accurate localization, the higher contrast of
$t$
-value between activated and background region is more important. In this perspective, our group analysis result confirms that wavelet-MDL detrending is more accurate in localizing the activation maps.

## 6.3.

### Precoloring versus Prewhitening

In Ref. 11, we have proposed two different methods to remove the temporal covariance: precoloring and prewhitening. Precoloring tries to remove the temporal correlation by filtering with HRF filter, whereas prewhitening attempts to estimate
$\mathbf{\Sigma}$
based on the assumption that the noise is the AR(1) process. In this perspective, the prewhitening provides more accurate estimation of
$\mathbf{\Sigma}$
as long as the AR(1) assumption is correct. The
$t$
-value in Fig. 4 was hence calculated using prewhitening. However, as demonstrated in Ref. 11, precoloring was more robust in removing the temporal correlation, and the final activation maps from precoloring were observed to be better due to the limited number of measurements compared to its fMRI counterpart.^{11} Hence, this paper calculates the group activation map using the precoloring method, as in. Ref. 11.

## 7.

## Conclusion

We developed a Wavelet-MDL-based detrending method that is robust under motion and physiological variation. In Wavelet-MDL detrending, a wavelet transform is applied to the NIRS time series to decompose it into bias, hemodynamic signal, and noise components in distinct scales. With the MDL criterion using a modified universal prior for the integer, the optimal model order could be easily estimated, and the unknown drift signal in NIRS data was successfully removed. Experimental results demonstrated that the new detrending algorithm outperforms the conventional approaches.

## Acknowledgments

This work was supported by the IT R&D program of MKE/IITA [2008-F-021-01]. K. E. Jang gratefully acknowledges support from a Kim Bo Jung Scholarship. K. E. Jang is currently at Samsung Advanced Institute of Technology.

## References

**,” Science, 198 1264 –1267 (1977). https://doi.org/10.1126/science.929199 0036-8075 Google Scholar**

*Noninvasive, infrared monitoring of cerebral and myocardial oxygen sufficiency and circulatory parameters***,” Neuroimage, 30 (1), 88 –101 (2006). https://doi.org/10.1016/j.neuroimage.2005.09.016 1053-8119 Google Scholar**

*Dynamic physiological modeling for functional diffuse optical tomography***,” Psychophysiology, 40 511 –520 (2003). https://doi.org/10.1111/1469-8986.00053 0048-5772 Google Scholar**

*Functional near-infrared optical imaging: utility and limitations in human brain mapping***,” Med. Biol. Eng. Comput., 26 289 –294 (1988). https://doi.org/10.1007/BF02447083 0140-0118 Google Scholar**

*System for long-term measurement of cerebral blood and tissue oxygenation on newborn infants by near infra-red transillumination***,” Adv. Exp. Med. Biol., 333 9 –20 (1993). 0065-2598 Google Scholar**

*Wavelength dependence of the differential pathlength factor and the log slope in time-resolved tissue spectroscopy***,” Phys. Med. Biol., 40 295 –304 (1995). https://doi.org/10.1088/0031-9155/40/2/007 0031-9155 Google Scholar**

*Optical pathlength measurements on adult head, calf, and forearm and the head of the newborn infant using phase resolved optical spectroscopy***,” Phys. Med. Biol., 47 (12), 2075 –2093 (2002). https://doi.org/10.1088/0031-9155/47/12/306 0031-9155 Google Scholar**

*Maps of optical differential pathlength factor of human adult forehead, somatosensory motor, and occipital regions at multi-wavelengths in NIR***,” Neuroimage, 21 283 –290 (2004). https://doi.org/10.1016/j.neuroimage.2003.09.054 1053-8119 Google Scholar**

*Towards a standard analysis for functional near-infrared imaging***,” Neuroimage, 35 625 –634 (2007). https://doi.org/10.1016/j.neuroimage.2006.11.028 1053-8119 Google Scholar**

*Model-based analysis of rapid event-related functional near-infrared spectroscopy (NIRS) data: a parametric validation study***,” J. Biomed. Opt., 12 1 –13 (2007). https://doi.org/10.1117/1.2804092 1083-3668 Google Scholar**

*Functional optical signal analysis: a software tool for near-infrared spectroscopy data processing incorporating statistical parametric mapping***,” Neuroimage, 44 (2), 428 –447 (2009). https://doi.org/10.1016/j.neuroimage.2008.08.036 1053-8119 Google Scholar**

*NIRS-SPM: statistical parametric mapping for near-infrared spectroscopy***,” Neuroimage, 2 173 –181 (1995). https://doi.org/10.1006/nimg.1995.1023 1053-8119 Google Scholar**

*Analysis of fMRI time-series: revisited again***,” Ann. Probab., 21 (1), 34 –71 (1993). https://doi.org/10.1214/aop/1176989393 0091-1798 Google Scholar**

*Tail probabilities of the maxima of Gaussian random fields***,” Ann. Stat., 27 (3), 925 –942 (1999). https://doi.org/10.1214/aos/1018031263 0090-5364 Google Scholar**

*The geometry of the Hotellings random field with applications to the detection of shape changes***,” IEEE Trans. Med. Imaging, 22 (3), 315 –322 (2003). https://doi.org/10.1109/TMI.2003.809587 0278-0062 Google Scholar**

*Wavelet-based estimation of a semiparametric generalized linear model of fMRI time-series***,” Automatica, 14 (5), 465 –471 (1978). 0005-1098 Google Scholar**

*Modeling by shortest data description***,” Hum. Brain Mapp, 2 (4), 189 –210 (1995). https://doi.org/10.1002/hbm.460020402 1065-9471 Google Scholar**

*Statistical parametric maps in functional imaging: a general linear approach***,” Comput. Biomed. Res., 29 162 –173 (1996). https://doi.org/10.1006/cbmr.1996.0014 0010-4809 Google Scholar**

*AFNI: software for analysis and visualization of functional magnetic resonance neuroimages***,” J. Neurosci., 16 4207 –4221 (1996). 0270-6474 Google Scholar**

*Linear systems analysis of functional magnetic resonance imaging in human V1***,” Neuroimage, 15 83 –97 (2002). https://doi.org/10.1006/nimg.2001.0940 1053-8119 Google Scholar**

*Detecting latency differences in event-related BOLD responses: application to words versus nonwords and initial versus repeated face presentations***,” J. R. Stat. Soc. Ser. B (Methodol.), 59 (2), 319 –351 (1997). https://doi.org/10.1111/1467-9868.00071 0035-9246 Google Scholar**

*Wavelet threshold estimators for data with correlated noise***,” IEEE Trans. Inf. Theory, 38 (2), 910 –917 (1992). https://doi.org/10.1109/18.119751 0018-9448 Google Scholar**

*Wavelet analysis and synthesis of fractional Brownian motion***,” Neuroimage, 37 (4), 1205 –1217 (2007). https://doi.org/10.1016/j.neuroimage.2007.06.011 1053-8119 Google Scholar**

*WSPM: wavelet-based statistical parametric mapping***,” SIAM Rev., 10 (4), 422 –437 (1968). https://doi.org/10.1137/1010093 0036-1445 Google Scholar**

*Fractional Brownian motions, fractional noises, and applications***,” Biometrika, 81 (3), 425 –455 (1994). https://doi.org/10.1093/biomet/81.3.425 0006-3444 Google Scholar**

*Ideal spatial adaptation by wavelet shrinkage***,” Wavelets Geophys., 299 –324 1994). Google Scholar**

*Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion***,” IEEE Trans. Inf. Theory, 46 (7), 2537 –2543 (2000). https://doi.org/10.1109/18.887861 0018-9448 Google Scholar**

*MDL denoising***,” Neuroimage, 7 30 –40 (1998). https://doi.org/10.1006/nimg.1997.0306 1053-8119 Google Scholar**

*Event-related fMRI: characterizing differential responses***,” Biometrika, 76 (2), 297 –307 (1989). https://doi.org/10.1093/biomet/76.2.297 0006-3444 Google Scholar**

*Regression and time series model selection in small samples***,” Ann. Stat., 6 461 –464 (1978). https://doi.org/10.1214/aos/1176344136 0090-5364 Google Scholar**

*Estimating the dimension of a model***,” IEEE Trans. Autom. Control, AC-19 (6), 716 –723 (1974). https://doi.org/10.1109/TAC.1974.1100705 0018-9286 Google Scholar**

*A new look at the statistical model identification***,” Ann. Math. Stat., 22 (1), 79 –86 (1951). https://doi.org/10.1214/aoms/1177729694 0003-4851 Google Scholar**

*On information and sufficiency***,” IEEE Signal Process. Mag., 21 36 –47 (2004). https://doi.org/10.1109/MSP.2004.1311138 1053-5888 Google Scholar**

*Model-order selection: a review of information criterion rules***,” Probl. Inf. Transm., 1 4 –7 (1965). 0032-9460 Google Scholar**

*Three approaches to the quantitative definition of information***,” J. Am. Stat. Assoc., 96 (454), 746 –774 (2001). https://doi.org/10.1198/016214501753168398 0162-1459 Google Scholar**

*Model selection and the principle of minimum description length***,” Ann. Stat., 11 (2), 417 –431 (1983). https://doi.org/10.1214/aos/1176346150 0090-5364 Google Scholar**

*A universal prior for integers and estimation by minimum description length***,” Wavelets in Medicine and Biology, 37 –73 1996). Google Scholar**

*A practical guide to the implementation of the wavelet transform***,” Commun. Pure Appl. Math., 45 (5), 485 –560 (1992). https://doi.org/10.1002/cpa.3160450502 0010-3640 Google Scholar**

*Biorthogonal bases of compactly supported wavelets***,” IEEE Signal Process. Mag., 18 (5), 22 –35 (2001). https://doi.org/10.1109/79.952803 1053-5888 Google Scholar**

*A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000***,” Hum. Brain Mapp, 6 (4), 239—249 (1998). https://doi.org/10.1002/(SICI)1097-0193(1998)6:4<239::AID-HBM4>3.0.CO;2-4 1065-9471 Google Scholar**

*Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI***,” Hum. Brain Mapp, 12 (2), 61 –78 (2001). https://doi.org/10.1002/1097-0193(200102)12:2<61::AID-HBM1004>3.0.CO;2-W 1065-9471 Google Scholar**

*Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains***,” IEEE Trans. Signal Process., 53 (9), 3436 –3448 (2005). https://doi.org/10.1109/TSP.2005.853207 1053-587X Google Scholar**

*Penalized partially linear models using sparse representations with an application to fMRI time series*