## 1.

## Introduction

Near-infrared diffuse correlation spectroscopy (DCS), also known as diffusing-wave spectroscopy (DWS), exploits the well-known spectral window in the near-infrared (NIR, 650 to 900 nm) range, in which relatively low absorption by biological tissue enables deep penetration of light. It is an emerging technique for continuous, noninvasive measurement of blood flow in biological tissues. Over the last decade or so, DCS technology has been developed,^{1–3} extensively validated, and widely employed to noninvasively probe blood flow in deep tissue vasculature, such as brain,^{4–19} muscle,^{20–25} and breast.^{26,27} Compared with other blood flow measurement techniques, such as positron emission tomography (PET), single photon emission computed tomography (SPECT), and xenon-enhanced computed tomography (XeCT), DCS uses nonionizing radiation and needs no contrast agents. In contrast to dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) and arterial spin labeling MRI (ASL-MRI), it does not interfere with commonly used medical devices, such as pacemakers and metal implants.^{28} Therefore, it has also been applied to cancer therapy monitoring^{29–32} and bedside monitoring in clinical settings.^{33}

A typical DCS setup consists of a laser with a long coherence length as the light source, a photon-counting avalanche photodiode (APD) or photomultiplier tube (PMT) as the detector, and a hardware autocorrelator. As one of the critical components, the autocorrelator computes the autocorrelation function of the temporal fluctuations of the light intensity, which has undergone multiple scattering events before emerging from the investigated position on the sample surface. Many commercial correlators are available, such as those manufactured by ALV (Langen, Germany) and by Correlator.com (Bridgewater, New Jersey). To date, most published works have used these hardware autocorrelators for autocorrelation calculations,^{4,5,7,13,20,25,27,32,34} although a few groups have designed a multitau software correlator for dynamic light scattering and validated it by particle size measurement.^{35,36} The hardware correlators use a multitau scheme, so they can operate at a fast sampling speed and offer real-time computation across a wide range of lag times, from several nanoseconds to hours. However, they are relatively costly and inflexible, since the fixed number of bits per channel results in a fixed lag time scale. Meanwhile, the software autocorrelator mentioned earlier was capable of measuring correlation functions over a time scale of $\sim 5\,\mu\mathrm{s}$ in real time,^{35,36} but it could not record the raw photon-count signal and was therefore limited in terms of preprocessing of the raw photon counts and post-measurement sanity checking.

In this paper, we demonstrate an alternative approach to DCS analysis using a software correlator based on the fast Fourier transform (FFT). It is cost-effective, flexible, and can be easily combined with other technologies. Furthermore, the ability to record the raw photon-count signal allows signal preprocessing before the autocorrelation calculation, such as filtering out the noise caused by fluctuations of the laser source. It also leaves room for investigating methods other than autocorrelation to extract flow information from raw signals. For data acquisition, recording, and autocorrelation calculation, we used the graphical programming language LabVIEW (National Instruments), a widely used tool for signal acquisition and processing. The FFT, one of the most efficient algorithms in signal processing, was applied for the autocorrelation calculation. The sampling rate of the system can reach $\sim 400\,\mathrm{kHz}$, which gives a minimum lag time of $\sim 2.5\,\mu\mathrm{s}$, significantly shorter than the decay constant (tens to hundreds of microseconds) in DCS applications. We evaluated the performance of the FFT-based software autocorrelator by comparing it with a simulated hardware autocorrelator that works the same way as a hardware correlator (Correlator.com) does. For validation, an in-house flow phantom experiment, a human arm cuff occlusion experiment, and a photodynamic therapy (PDT) response monitoring experiment were conducted.

This paper is organized as follows. In Sec. 2, we recall the theoretical background of DCS. In Sec. 3, a detailed description of our FFT-based software autocorrelator and a comparison with a simulated hardware autocorrelator are presented. The experimental setups for the phantom and *in vivo* experiments are described in Sec. 4, followed by an interpretation of the experimental results in Sec. 5. The concluding remarks are presented in Sec. 6.

## 2.

## Theoretical Background of DCS

Biological tissue can be characterized by its optical properties, such as the absorption coefficient ${\mu}_{a}$ and the reduced scattering coefficient ${\mu}_{s}^{\prime}$. When diffusing photons scatter from moving scatterers, they experience phase shifts, which cause the speckle pattern at the detector to fluctuate. The motion information below the tissue surface is carried in the electric field of the diffuse light $E(\overrightarrow{r},t)$ and can be extracted from the electric field autocorrelation function, which is defined as ${G}_{1}(\overrightarrow{r},\tau )=\langle E(\overrightarrow{r},t){E}^{*}(\overrightarrow{r},t+\tau )\rangle $. It has been shown that ${G}_{1}(\overrightarrow{r},\tau )$ satisfies a correlation diffusion equation, i.e.,^{1,3,4,37}

## (1)

$$\left[-\frac{1}{3{\mu}_{s}^{\prime}}{\nabla}^{2}+{\mu}_{a}+\frac{1}{3}\alpha {\mu}_{s}^{\prime}{k}_{0}^{2}\langle \mathrm{\Delta}{r}^{2}(\tau )\rangle \right]{G}_{1}(\overrightarrow{r},\tau )=S(\overrightarrow{r}),$$

where ${k}_{0}$ is the wavenumber of the light in the medium, $\alpha $ is the fraction of scattering events that occur from moving scatterers, $\langle \mathrm{\Delta}{r}^{2}(\tau )\rangle $ is the mean square displacement of the moving scatterers in time $\tau $, and $S(\overrightarrow{r})$ is the source term. The Brownian motion model assumes $\langle \mathrm{\Delta}{r}^{2}(\tau )\rangle =6{D}_{B}\tau $,^{38,39} where ${D}_{B}$ is the effective diffusion coefficient of the scatterers. Meanwhile, the random flow model assumes random ballistic motion of the scatterers, with $\langle \mathrm{\Delta}{r}^{2}(\tau )\rangle =\langle {V}^{2}\rangle {\tau}^{2}$ (Refs. 40 and 41), where $\langle {V}^{2}\rangle $ represents the mean square velocity of the scatterers. Experimentally, the diffuse light electric field autocorrelation function is usually derived from the measured normalized intensity autocorrelation ${g}_{2}(\overrightarrow{r},\tau )=\langle I(\overrightarrow{r},t)I(\overrightarrow{r},t+\tau )\rangle /{\langle I\rangle}^{2}$ by using the Siegert relation:^{4,13,42}

## (2)

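$${g}_{2}(\overrightarrow{r},\tau )=1+\beta \frac{{|{G}_{1}(\overrightarrow{r},\tau )|}^{2}}{{\langle I(\overrightarrow{r},t)\rangle}^{2}},$$

where $\beta $ is a coherence factor that depends on the detection optics. In the reflectance (semi-infinite) geometry, the solution of Eq. (1) is^{4,43}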

## (3)

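$${G}_{1}(\rho ,\tau )=\frac{3{\mu}_{s}^{\prime}}{4\pi}\left(\frac{{e}^{-{k}_{D}{r}_{1}}}{{r}_{1}}-\frac{{e}^{-{k}_{D}{r}_{2}}}{{r}_{2}}\right),$$

where $\rho $ is the source-detector separation, ${r}_{1}$ and ${r}_{2}$ are the distances from the detector to the effective isotropic source inside the medium and to its image source, respectively, and ${k}_{D}^{2}=3{\mu}_{s}^{\prime}{\mu}_{a}+\alpha {{\mu}_{s}^{\prime}}^{2}{k}_{0}^{2}\langle \mathrm{\Delta}{r}^{2}(\tau )\rangle $. The Brownian motion model has been reported to describe measured autocorrelation curves in a wide range of tissues, from brain^{4,7–9,11–14,17} to muscle^{20,23} and tumor models.^{27,29,30,32} Therefore, the measurements of ${G}_{1}(\overrightarrow{r},\tau )$ can be fitted to yield a blood flow index ($\mathrm{BFI}=\alpha {D}_{B}$) to parameterize the relative blood flow.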

The BFI and $\beta $ were obtained by minimizing the difference between the analytical model of the normalized autocorrelation in the reflectance geometry, ${g}_{1,m}(\overrightarrow{r},\tau )$, and the measured autocorrelation function ${g}_{1,\mathrm{exp}}(\overrightarrow{r},\tau )$. The objective function ${\chi}^{2}=\sum _{\tau}{[{g}_{1,m}(\overrightarrow{r},\tau )-{g}_{1,\mathrm{exp}}(\overrightarrow{r},\tau )]}^{2}$ was minimized in Matlab (The MathWorks, Inc.). The relative blood flow (rBF), which expresses the BFI relative to its baseline value, is reported in percent.
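The fitting step can be illustrated with a minimal Python sketch. It is not the Matlab routine used in this work: it fits only $\alpha {D}_{B}$ (not $\beta $) to a normalized ${g}_{1}$ curve under the Brownian motion model, and it uses a simple golden-section search in place of a general optimizer. The function names, the refractive index `n`, the effective reflection coefficient `r_eff`, and the boundary terms `z0` and `zb` are assumptions introduced for illustration; the rBF helper assumes rBF is the ratio of BFI to its baseline in percent.

```python
import math

def g1_model(tau, alpha_db, rho=1.5, mu_a=0.03, mu_s=8.0,
             wavelength=785e-7, n=1.4, r_eff=0.493):
    """Normalized g1 = G1(rho, tau) / G1(rho, 0), Brownian model (units: cm, s)."""
    k0 = 2.0 * math.pi * n / wavelength       # wavenumber in the medium (1/cm)
    z0 = 1.0 / mu_s                           # assumed depth of the effective source
    zb = 2.0 * (1.0 + r_eff) / (3.0 * mu_s * (1.0 - r_eff))  # assumed extrapolated boundary
    r1 = math.hypot(rho, z0)                  # detector to effective source
    r2 = math.hypot(rho, z0 + 2.0 * zb)       # detector to image source

    def big_g1(t):
        # k_D^2 = 3 mu_s' mu_a + alpha mu_s'^2 k0^2 <dr^2>, with <dr^2> = 6 D_B t
        k_d = math.sqrt(3.0 * mu_s * mu_a + 6.0 * mu_s ** 2 * k0 ** 2 * alpha_db * t)
        return math.exp(-k_d * r1) / r1 - math.exp(-k_d * r2) / r2

    return big_g1(tau) / big_g1(0.0)

def chi_square(alpha_db, taus, g1_exp):
    """Sum of squared differences between model and measured g1."""
    return sum((g1_model(t, alpha_db) - g) ** 2 for t, g in zip(taus, g1_exp))

def fit_alpha_db(taus, g1_exp, lo=1e-10, hi=1e-6, tol=1e-6):
    """Golden-section search for the minimum of chi^2 over log10(alpha * D_B)."""
    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda x: chi_square(10.0 ** x, taus, g1_exp)
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                       # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 10.0 ** ((a + b) / 2.0)

def relative_blood_flow(bfi, bfi_baseline):
    """rBF in percent, assuming rBF = 100 * BFI / BFI_baseline."""
    return 100.0 * bfi / bfi_baseline
```

The search runs over the logarithm of $\alpha {D}_{B}$ because plausible values span several orders of magnitude; with noisy data a two-parameter fit that also includes $\beta $ would be needed.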

## 3.

## Data Acquisition and Processing

The DCS system consisted of a continuous-wave laser with a long coherence length ($>10\,\mathrm{m}$) operating at 785 nm (DL785-100-S, $\sim 100\,\mathrm{mW}$, CrystaLaser, Reno, Nevada, USA) as the source and a photon-counting APD (SPCM-AQRH-13-FC, PerkinElmer, Vaudreuil-Dorion, Quebec, Canada), whose output was a stream of transistor-transistor logic (TTL) pulses, as the detector. The light from the source was coupled into a multimode optical fiber (200-*μ*m diameter) placed in contact with the sample surface. The detection fiber, a single-mode fiber for 785 nm (9-*μ*m diameter), was located several centimeters away from the source fiber and fed the collected light to the APD. The TTL output of the APD was connected to a 32-bit, eight-channel counter/timer board with a maximum input rate of 80 MHz (PCI-6602, National Instruments). The counter/timer board was connected to a PC (CPU: Intel Core 2 Duo, RAM: 3 Gbyte) through a shielded I/O connector block for DAQ devices (SCB-68, National Instruments).

## 3.1.

### Data Acquisition Principle

The system was controlled by LabVIEW (National Instruments), which is well suited to interactive data acquisition and signal processing. The TTL output of the detector was counted over the sampling time $\tau $ by the counter/timer board. The PCI-6602 has a timebase of 80 MHz, which means that the closest input pulses that can be counted are 12.5 ns ($1/80{,}000{,}000\,\mathrm{s}$) apart. Its counters are 32 bits wide and can count up to 4,294,967,295 (${2}^{32}-1$) before rolling over. The counter can count events continuously and store the data in a buffer. Because of the limited buffer size, a time resolution finer than 2 *μ*s would fill the buffer in less than 1 s. For stable data acquisition, we therefore conducted the experiments with a sampling time of 2.5 *μ*s. In the data acquisition program, a “queue” structure linked the “producer” loop and the “consumer” loop, which were in charge of reading photon counts and writing results to a file, respectively. With this structure, the two loops ran in parallel, so that a higher throughput could be achieved. Simultaneously, another LabVIEW program detected the most recently written photon-count file and calculated the autocorrelation function by the FFT algorithm. The result was then displayed and saved to a specified folder. On our current system, one CPU core was fully occupied by data acquisition, while the other core was used for the autocorrelation calculation.
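The producer/consumer structure described above can be sketched in Python as follows. This is only an analogy to the LabVIEW queue, not the acquisition program itself: the counter read is replaced by a seeded random generator, the file write by an in-memory list, and all function names and parameters are hypothetical.

```python
import queue
import random
import threading

def producer(q, n_blocks, block_size, rng):
    """Read blocks of photon counts (simulated here) and hand them to the queue."""
    for _ in range(n_blocks):
        block = [rng.randint(0, 3) for _ in range(block_size)]  # stand-in for a counter read
        q.put(block)          # blocks if the queue is full, throttling the producer
    q.put(None)               # sentinel: no more data

def consumer(q, sink):
    """Take blocks from the queue and store them (stand-in for writing to a file)."""
    while True:
        block = q.get()
        if block is None:
            break
        sink.extend(block)

def acquire(n_blocks=5, block_size=1000, seed=0):
    """Run the producer and consumer loops in parallel and return the stored counts."""
    q = queue.Queue(maxsize=8)
    sink = []
    rng = random.Random(seed)
    t_prod = threading.Thread(target=producer, args=(q, n_blocks, block_size, rng))
    t_cons = threading.Thread(target=consumer, args=(q, sink))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return sink
```

The queue decouples the two loops, so a slow write does not stall the read as long as the queue has free slots; the first-in, first-out order preserves the time order of the counts.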

## 3.2.

### Data Processing

In this section, the working principle of the hardware correlator board is briefly described, based on the instruction manual of a commonly used commercial product (Flex99R-12) from Correlator.com. The algorithm behind the hardware correlator is then simulated in software, and its performance is compared with that of our new FFT-based approach in a later section.

## 3.2.1.

#### Simulating hardware autocorrelator board

### Inside the hardware autocorrelator board

Typical hardware correlators, such as those manufactured by ALV and Correlator.com, are based on a multitau scheme and have a multiple-tier structure that contains a certain number of registers, as shown in Fig. 1. Taking the Correlator.com board as an example, the first tier has 32 registers, and each higher tier has 16 registers. The incoming photon counts fill the registers of the first tier from the left with a given delay time ${T}_{0}$ (typically 160 ns). When the counts reach the right end, the second tier starts to fill from the left at half the rate, since the sum of the two register values at the right end of the first tier forms the first register value of the second tier. Likewise, the first register value of the $n$th tier is the sum of the last two register values of the ($n-1$)th tier, which means that the registers of the $n$th tier are updated every ${2}^{n-1}{T}_{0}$. The hardware correlator can have up to 31 tiers, which gives $512\,[=(1\times 32)+(30\times 16)]$ registers with 512 different delay times. However, the registers of the 31st tier are updated only every ${2}^{30}{T}_{0}$, which is too long even with a practical ${T}_{0}$ value of $\sim 100\,\mathrm{ns}$, so the later tiers are hardly used in practice.

For each shift that occurs in the register, the unnormalized intensity autocorrelation coefficient for the $i$th register, ${G}_{2}({t}_{i})$, is calculated as the average value of ${n}_{i}\cdot {n}_{0}$, where ${n}_{i}$ is the photon count in the $i$th register, ${n}_{0}$ is the photon count at zero delay time with the same bin width as the $i$th register, and ${t}_{i}$ is the delay time between ${n}_{i}$ and ${n}_{0}$.^{28} The delay time ${t}_{i}$ is calculated as the sum of all the bin widths to the left of the $i$th register. With a few hundred register channels (512 channels for a 31-tier system, for example), delay times ranging from hundreds of nanoseconds to minutes can be covered, which greatly reduces the computational load compared with a linear autocorrelator.
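The register layout described above (32 registers in the first tier, 16 in each later tier, bin width doubling per tier, delay equal to the sum of the bin widths to the left) can be tabulated with a short Python sketch. The function name and its defaults are illustrative assumptions, not part of any correlator's interface.

```python
def register_layout(n_tiers, t0, first_tier=32, later_tiers=16):
    """Return (bin_widths, delay_times) of all registers, in the units of t0."""
    widths = [t0] * first_tier
    for n in range(2, n_tiers + 1):
        # Registers of the n-th tier are 2**(n-1) times wider than those of tier 1.
        widths += [2 ** (n - 1) * t0] * later_tiers
    delays = []
    elapsed = 0.0
    for w in widths:
        delays.append(elapsed)   # delay = sum of all bin widths to the left
        elapsed += w
    return widths, delays
```

For example, `register_layout(31, 160e-9)` returns 512 registers, and the widest registers are updated every $2^{30}\times 160\ \mathrm{ns}\approx 172\ \mathrm{s}$, which illustrates why the last tiers are rarely useful.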

### Software implementation

The above-mentioned hardware correlator algorithm was implemented in Matlab in order to examine its behavior with a known input signal, generated either by simulation or by experiment. The number of tiers, $N$, was given as a variable (normally set to around 10), and a cell array of size $16(N+1)$ [32 for the first tier and $16(N-1)$ for the remaining tiers] was allocated for the register channels. The Matlab script runs a loop in which the register channels of the first tier shift to the right by one register per iteration and a new datum is read into the left-most register. Each time the iteration number reaches a multiple of ${2}^{n-1}$, the registers of the $n$th tier are shifted to the right by one register, after which a new value [the sum of the two register values at the end of the ($n-1$)th tier] is assigned to the left-most register of the $n$th tier. Figure 2 shows a snapshot of the register profile in different tiers, where the register values are plotted on a logarithmic scale to better visualize their wide range.

Once the registers are completely filled, the autocorrelation coefficient between the $i$th register and the 0th register (with the same bin width, as described previously in the hardware correlator section) is calculated. The autocorrelation curve for the whole register set is calculated at each shift event in the last tier, and the curves are averaged over hundreds of trials before being plotted.
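A simplified Python sketch of the multitau idea is given below. It is not the Matlab register-shifting script described above: instead of streaming data through registers, it processes a recorded count sequence in batch, rebinning the data by a factor of two per tier and evaluating lags 0 to 31 in the first tier and lags 16 to 31 in each later tier, which reproduces the same delay times. The function name, defaults, and the symmetric normalization are assumptions made for illustration.

```python
def multitau_g2(counts, n_tiers=4, t0=1.0):
    """Batch multitau estimate of g2; returns (lag_times, g2_values).

    Assumes the counts are not all zero in any overlap window.
    """
    lags, g2 = [], []
    x = list(counts)
    for tier in range(1, n_tiers + 1):
        width = 2 ** (tier - 1) * t0                 # bin width of this tier
        lag_range = range(32) if tier == 1 else range(16, 32)
        for k in lag_range:
            m = len(x) - k                           # number of overlapping pairs
            if m <= 0:
                break
            mean_product = sum(x[j] * x[j + k] for j in range(m)) / m
            mean_start = sum(x[:m]) / m              # mean of the undelayed samples
            mean_delayed = sum(x[k:]) / m            # mean of the delayed samples
            lags.append(k * width)
            g2.append(mean_product / (mean_start * mean_delayed))
        # Next tier: sum neighbouring bins, doubling the bin width.
        x = [x[j] + x[j + 1] for j in range(0, len(x) - 1, 2)]
    return lags, g2
```

Note that the zero-lag channel of the first tier contains the shot-noise contribution of the photon counts and is usually excluded from fitting.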

## 3.2.2.

#### FFT-based software autocorrelator

The unnormalized intensity autocorrelation, ${G}_{2}(\tau )=\langle I(t)I(t+\tau )\rangle $, resembles a convolution, except that a convolution multiplies the first function by the time-reversed second function. Therefore, the same autocorrelation can be obtained by convolving $I(t)$ with its time reversal $I(-t)$. By the convolution theorem, the autocorrelation function can then be obtained from the Fourier transform as follows:

## (4)

$${G}_{2}(\tau )=(I(t)*I(-t))(\tau )={F}^{-1}[F\{I(t)\}F\{I(-t)\}]={F}^{-1}\{\tilde{I}(\upsilon ){\tilde{I}}^{*}(\upsilon )\}={F}^{-1}\{{|\tilde{I}(\upsilon )|}^{2}\},$$

where $\tilde{I}(\upsilon )$ is the Fourier transform of $I(t)$, and $F\{I(-t)\}={\tilde{I}}^{*}(\upsilon )$ because $I(t)$ is real. Since the Fourier transform can be implemented very efficiently by the fast Fourier transform (FFT), one can obtain the linear autocorrelation function in a fast and stable manner using Eq. (4).
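Equation (4) can be illustrated with a dependency-free Python sketch. It is not the LabVIEW implementation used in this work: the small recursive radix-2 FFT is included only to keep the example self-contained (in practice a library FFT would be used), it requires a power-of-two data length, and the function names are hypothetical. The result is the cyclic autocorrelation, normalized by the squared mean, with only the first half of the lags kept.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def g2_fft(counts):
    """Normalized cyclic intensity autocorrelation g2 for lags 0 .. n/2 - 1."""
    n = len(counts)
    spectrum = fft([complex(c) for c in counts])
    power = [abs(s) ** 2 for s in spectrum]              # |I~(v)|^2
    # The inverse transform above is unscaled, so divide by n to get G2 sums.
    cyclic = [v.real / n for v in fft(power, inverse=True)]
    mean = sum(counts) / n
    # Divide by n to turn sums into averages, and by mean^2 to normalize.
    return [c / (n * mean * mean) for c in cyclic[: n // 2]]
```

Because the FFT treats the record as periodic, the second half of the cyclic result mirrors the first, which is why only the first half is returned.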

Figure 3 compares two calculations of the normalized intensity autocorrelation ${g}_{2}(\tau )$, one from the multitau approach simulating the hardware correlator board and the other from the FFT-based linear approach. The data used in Fig. 3(a) were from a flow phantom experiment, acquired by the NI counter board (PCI-6602) at a sampling rate of 400 kHz. For the FFT-based approach, the data size was about $4\times {10}^{5}$ points, covering about 1 s. For the hardware-simulating approach, the delay time was calculated only up to about 30 ms because of the small number of tiers (10 tiers were used), but the result was averaged over 500 autocorrelation calculations, so the total length of data used was about the same. The FFT-based calculation always shows a spike near the highest delay time, which is due to the cyclic nature of the FFT-based result; our visualization in Fig. 3 shows only the first half, thereby removing the unwanted peak. Both curves converge to the value of 1 for long delay times, and they overlap well with each other. For the experimental data in general, we observed that the hardware (multitau) algorithm tends to show larger fluctuations at small delay times, whereas the FFT-based algorithm shows smoother behavior at small delay times owing to its single-tau calculation.

The comparison was also performed using simulated data based on a Markov chain with Gaussian noise, and the two results always overlapped unless the average count value of the data was too low. When the average was low, we were able to regain the equivalence between the two methods by binning the data, thereby increasing the average photon count value. Typical autocorrelation curves for simulated data are shown in Fig. 3(b).

## 4.

## Experiment

## 4.1.

### In-House Flow Phantom Experiment

A phantom, a tissue-simulating object that mimics the properties of human or animal tissue, plays an important role in the development of diagnostic systems and of most physical therapeutic interventions. It can be used for initial testing of a system design, for optimizing the signal-to-noise ratio (SNR) of an existing system, and so on.^{44} To validate and assess our system, a flow phantom was built as shown in Fig. 4.

The phantom-making procedure can be found elsewhere.^{45} Briefly, RTV 12A was the base compound and RTV 12C (Momentive Performance Materials) was the curing agent, while carbon black and ${\mathrm{TiO}}_{2}$ were mixed in to adjust the absorption coefficient and the scattering coefficient, respectively. The optical properties of the solid flow phantom were ${\mu}_{s}^{\prime}=8\,{\mathrm{cm}}^{-1}$ and ${\mu}_{a}=0.03\,{\mathrm{cm}}^{-1}$ (measured by OxiplexTS, ISS Inc., at 830 nm), which satisfy the ${\mu}_{s}^{\prime}\gg {\mu}_{a}$ criterion, so that the photon diffusion equation remains valid.

The schematic of the DCS experimental setup is shown in Fig. 5. Lipofundin N 20% (B. Braun Melsungen AG, Germany), diluted to a concentration of 0.6%, was pumped through the cylindrical tube at flow speeds of $0\,\mathrm{ml}/\mathrm{s}$, $0.05\,\mathrm{ml}/\mathrm{s}$, $0.125\,\mathrm{ml}/\mathrm{s}$, $0.2\,\mathrm{ml}/\mathrm{s}$, and $0.325\,\mathrm{ml}/\mathrm{s}$. The source-detector pair was centered on the solid phantom surface at a distance of 1 cm from the flow tube surface. A reflection geometry was adopted, in which incident light was injected into the phantom by the source fiber and detected 3.0 cm away along the direction of the cylindrical tube.

## 4.2.

### In Vivo Cuff Occlusion Experiment on a Human Arm

After the flow phantom experiment, we conducted a validation study on a human subject. The *in vivo* cuff occlusion experiment was performed on the arm of a 28-year-old healthy male subject. To vary the blood flow speed, a cuff on the subject’s upper arm was constricted for 15 s to produce a temporary arterial occlusion. The source-detector separation was 1.5 cm, and the probes were located on the forearm. The cuff inflation pressure was 200 mmHg.

## 4.3.

### Photodynamic Therapy Response Monitoring

Besides the flow phantom and *in vivo* human arm cuff occlusion experiments, we also applied our instrument to a lab-based PDT experiment for evaluation. Since the antivascular effects of PDT are a key indicator of tissue response to treatment,^{29} we aimed to provide instantaneous hemodynamic feedback on the treatment response. The *in vivo* PDT experiment was approved by the Institutional Animal Care and Use Committee of Singapore Health Services Pte Ltd and carried out at the National Cancer Centre Singapore. A Balb/c mouse bearing xenograft tumors on the flanks was used as the animal model. The mouse received chlorin e6 (Ce6) as the photosensitizer at a dose of $5\,\mathrm{mg}/\mathrm{kg}$. Approximately 4 h after injection, the tumors were irradiated with laser light. The irradiation lasted 2450 s, consisting of alternating 250-s light exposures and 300-s dark intervals. The treatment light for PDT was delivered by a 665-nm medical-grade laser (Biolitec, Germany). The light dose was $100\,\mathrm{J}/{\mathrm{cm}}^{2}$, delivered at a rate of $80\,\mathrm{mW}/{\mathrm{cm}}^{2}$ via an optical fiber. The treatment light was expanded to a 2.5-cm diameter on the tumor surface. DCS measurements were carried out before and immediately after each exposure, as well as at 25 and 50 min after the last treatment. The source-detector separation was 1 cm, which fitted the size of the tumors.

## 5.

## Results and Discussion

## 5.1.

### In-House Flow Phantom Experiment

Figure 6 shows the normalized intensity autocorrelation functions in a semilogarithmic plot for varying flow speeds. The normalized intensity autocorrelation function ${g}_{2}(\tau )$ showed an initial constant plateau and then decayed exponentially with increasing delay time. As the flow speed increased, the decay rate of the autocorrelation curve increased. By the definition of autocorrelation, a finite data length always causes the shifted copy of the data to move out of the data window, so the directly computed autocorrelation function eventually falls to zero. By employing an FFT, however, one obtains the cyclic autocorrelation, which avoids this edge effect caused by the finite data length and also offers a faster calculation. Ideally, zero flow ($0\,\mathrm{ml}/\mathrm{s}$) should not show any decay, but we observed a decay that is mainly attributed to the intrinsic fluctuation of the laser source and the room-temperature Brownian motion of the Lipofundin remaining in the tubing. The autocorrelation calculation took less than 20 ms for 800,000 double-precision data points, which is faster than the data writing speed (the rate at which new files are generated) and enables the autocorrelation curve to be displayed in almost real time. We note from the autocorrelation curves that increasing the flow shortens the plateau region of the curves and makes it difficult to determine the $\beta $ value. This might cause inaccuracy in measuring high flow, but stable plateaus can be obtained for most of the relevant flow ranges *in vivo*.

Since the FFT-based autocorrelation calculation provided smoother starting and ending plateaus, it improved the curve fitting. To compare the Brownian motion and random flow models, we performed curve fitting with both models. The Levenberg-Marquardt algorithm was applied in Matlab (The MathWorks, Inc., *nlinfit* function). As indicated by Fig. 7, which shows the autocorrelation function for a flow speed of $0.325\,\mathrm{ml}/\mathrm{s}$, the Brownian motion model fit the experimental data better than the random flow model did. This result also suggests that our self-designed phantom is able to mimic the capillary flow of the human body. Therefore, we conducted curve fitting using the Brownian motion model, and the blood flow index ($\mathrm{BFI}=\alpha {D}_{B}$) was obtained. In general, the fitting result depends strongly on how the data are sampled. To obtain a fit with uniform weighting on the logarithmic time scale, we sampled the entire autocorrelation function at equal intervals on a base-2 logarithmic scale, as shown in Fig. 7. This subsampling speeds up the fitting without compromising its accuracy.
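The base-2 logarithmic subsampling can be sketched in Python as below. The function name and the `points_per_octave` parameter are illustrative assumptions; the exact sampling density used for Fig. 7 is not specified here.

```python
import math

def log2_sample_indices(n_lags, points_per_octave=4):
    """Lag indices (>= 1) spaced equally on a base-2 logarithmic scale.

    n_lags is the number of points of the linear autocorrelation curve,
    so valid indices run from 0 to n_lags - 1 (requires n_lags >= 2).
    """
    n_octaves = math.log2(n_lags - 1)
    n_points = int(n_octaves * points_per_octave) + 1
    indices = {
        min(n_lags - 1, round(2 ** (i / points_per_octave)))
        for i in range(n_points)
    }
    # A set removes the duplicates that rounding produces at small lags.
    return sorted(indices)
```

With `points_per_octave=1`, for instance, a 1025-point curve is reduced to the 11 lags 1, 2, 4, ..., 1024, so every decade of delay time contributes a similar number of points to the fit.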

To compare the goodness of fit of the FFT-based software autocorrelator with that of the simulated hardware autocorrelator, we calculated the 95% confidence intervals of the fitted parameters using the *nlparci* function in Matlab (The MathWorks, Inc.). As shown in Fig. 8, taking the flow speed of $0\,\mathrm{ml}/\mathrm{s}$ as the baseline, the relative $\alpha {D}_{B}$ and $\sqrt{\alpha \langle {V}^{2}\rangle}$ with their standard deviations at different flow speeds were calculated for both the software and hardware cases. For both the Brownian motion and random flow models, the standard deviations of the fitted parameter for the FFT-based software autocorrelator were always smaller than those for the hardware one. For a given autocorrelator, the parameter fitted with the Brownian motion model had a smaller standard deviation than that fitted with the random flow model, which can also be observed from the individual fits shown in Fig. 7. Moreover, we performed a linear regression of the relative $\alpha {D}_{B}$ and $\sqrt{\alpha \langle {V}^{2}\rangle}$ against flow speed, which showed higher linearity in the software autocorrelator case than in the hardware one. In addition, since the autocorrelation calculation is faster than the data writing, as mentioned previously, our instrument can use the remaining time to perform the data fitting and provide the rBF values in real time.

## 5.2.

### In Vivo Human Arm Cuff Occlusion Experiment

Figure 9(a) shows the normalized intensity autocorrelation function ${g}_{2}(\tau )$ for three stages of the *in vivo* cuff occlusion experiment, namely baseline, cuff inflation, and cuff deflation. The baseline curve was averaged over the first 15 s, the inflation curve over the 15th to 35th seconds, and the deflation curve over only the 35th to 45th seconds. The decay rate of ${g}_{2}(\tau )$ changes significantly between stages. Figure 9(b) shows the rBF, which was obtained by fitting ${g}_{2}(\tau )$ to the solution of the correlation diffusion equation. For the first 15 s, the blood flow was constant, as no constriction was applied. Once the cuff was constricted, the blood flow dropped sharply; when the constriction was released, the blood flow immediately showed an overshoot, followed by a slow decrease to the baseline due to homeostasis.

## 5.3.

### PDT Monitoring Experiment

We plotted the DCS monitoring result during the PDT treatment in Fig. 10. The color bars in the figure represent PDT light intervals.

The mean tumor rBF decreased by $\sim 50\%$ immediately after the whole PDT, and decreased to $\sim 45\%$ of the baseline 60 min after treatment. During each PDT dark interval, the blood flow showed minor fluctuations, which indicate temporary vessel occlusion during the PDT light intervals. These temporary vessel occlusions can be reversible, allowing the flow to recover during the dark intervals. A similar trend was observed using laser Doppler measurements, although different animal models were used in each case.^{46} However, the results after 60 min suggest that permanent vessel occlusion had occurred due to PDT.

## 6.

## Conclusion

In this paper, we have demonstrated a DCS system with an FFT-based software autocorrelator. The operation of the system was controlled by LabVIEW. The software autocorrelator was validated and assessed by an in-house flow phantom experiment, an *in vivo* cuff occlusion experiment on a human arm, and a PDT response monitoring experiment. It is easy to operate and relatively cost-effective. Although the data sampling speed is moderate, the minimum lag time is much shorter than the decay constant, so the extraction of the desired parameter remains valid. Furthermore, the smoother starting and ending plateaus improve the accuracy of the data fitting, and hence of the blood flow index. In addition, the system has the potential to serve as a real-time flow indicator and can be used in future DCS systems in research and clinical settings.

## Acknowledgments

This work is supported by the Singapore Ministry of Education under the Academic Research Fund Tier 1 Grant RG37/07 and partially supported by a SingHealth Foundation Grant (SHF/FG385P/2008). The PDT experiment was performed in collaboration with the National Cancer Centre Singapore. We thank Dr. Hamid Dehghani for useful discussions, and Mr. Bobby Chow and Mr. Peng Zu for helpful technical assistance. We also thank Ms. Hui Jin Toh and Mr. Chuan-Sia Tee of the National Cancer Centre Singapore for assistance with the PDT experiments. Support from Ms. Melda and Dr. Yongjin is also gratefully acknowledged.