## 1.

## Introduction

Ultrasound-modulated optical tomography (UOT), also known as acousto-optic imaging, is an emerging biomedical imaging technique that promises to provide a greater spatial resolution than that provided by diffuse optical imaging.^{1, 2} UOT’s principle is the use of focused ultrasound to change the local refractive index and path length of the tissue in the focus region thereby “tagging” those photons that propagate through this focus region. The “tagged” photons are known to have passed through the ultrasound focus region and can therefore be exploited to provide a highly localized image of the absorption and scattering coefficient^{3, 4} with a spatial resolution determined by the size of the ultrasound focus. Modern ultrasound technology can readily generate focus regions of millimeter scales, and UOT may therefore provide a spatial resolution of a similar order.

UOT is currently a research topic largely confined to the laboratory and has rarely been applied to any major clinical applications. The main reason for this is that the UOT signal is very weak, and its detection poses a major challenge. To this end various detection schemes have been proposed including parallel detection with a CCD camera^{5} or photorefractive crystals.^{6} Another more fundamental challenge is that researchers are still trying to fully understand the underlying mechanisms of the ultrasonic modulation of light. Wang,
^{7, 8} modeled two mechanisms that alter the phase of coherent light under the exposure of ultrasound in a turbid medium.^{9, 10, 11} The first mechanism is the variation of the refractive index of the medium and hence the relative phase of the electric field of light propagating through the medium. The second mechanism is the variation of the relative phase of the electric field due to the displacement of scatterers and, thus, the optical path length. The two mechanisms were simulated with a Monte Carlo (MC) model, which was based on a rather theoretical setting [i.e., ultrasound plane waves], rather than focused ultrasound waves, propagating through a slab. Because of its simplicity, this MC model has been adapted by several researchers to study different applications and theories of UOT (e.g., the measurement of the optical and mechanical properties of tissues^{12} and the nonphase mechanism^{13}). One of the main reasons for the popularity of Wang’s ultrasound-modulated light (UL) model is that it is built on a MC light transport model widely used in biomedical optics, known as MCML, by Wang
^{14} Another advantage is that the MC model is accompanied by an analytical solution^{8, 15} allowing for validation of the simulation. Subsequent MC models by Wang’s group consider more realistic scenarios, allowing optical inhomogenities, and cylindrical^{16} and focused ultrasound^{17} modulation.

In the past few years, MC simulations in general have undergone a dramatic speed improvement unleashed by the availability of low-cost graphics cards utilizing a new architecture known as Compute Unified Device Architecture (CUDA). Using a superset of the C language, CUDA allows the execution of algorithms on highly parallelized graphics processing unit (GPU) hardware. MC simulations perform tens of millions of isolated random walks and are therefore inherently parallelizable. By exploiting the parallel architecture of the GPU, MC simulations of light transport implemented on these platforms report a speed increase of a factor of two to three orders of magnitude in comparison to the same MC simulations implemented on a CPU. To this end, Alerstam implemented a GPU version of MCML, known as CUDAMCML,^{18, 19} while Fang and Boas developed a time-resolved MC light transport model in arbitrary 3-D turbid media on a GPU platform.^{20}

In this paper, we describe and assess the use of GPU to speed up MC simulations of UL that, to the best of our knowledge, has not been reported before. Our GPU MC simulation of UL has been developed based on Wang’s original MC model^{7} by incorporating the simulation of an ultrasound plane wave into the CUDAMCML code. We compare the speed of the GPU implementation against its central processing unit (CPU) counterpart. We also demonstrate the effect of ultrasonic modulation on speckle patterns through animations generated from the results of such MC simulations.

## 2.

## Methods

We have developed two MC simulation codes, one implemented on a CPU and the other on a GPU. Care has been taken to ensure that the two versions impose a similar computational load such that a fair comparison of speed can be performed. The CPU version has been developed by modifying MCML as suggested by Wang.^{7} The basic structure of our GPU code is based on CUDAMCML, which has been thoroughly examined in the literature.^{18, 19} Both CUDAMCML and the original MCML were developed for modeling photon migration, in general, not specifically for coherent light. In this model, each photon packet is characterized by a photon packet weight,
$W$
, which represents the probability of an individual photon reaching the detector after multiple scattering and absorption events. For coherent light, the phase of the electric field of the photon packet is also considered such that each packet is characterized by
$W{e}^{-i\varphi}$
, where
$\varphi $
is the phase of the photon packet as determined by the optical path length.

## 2.1.

### Implementation of Phase Variations

The ultrasonic modulation of light alters the phase,
$\varphi $
, of the packet as it propagates through the turbid medium (it is recognized that alternative UL mechanisms may lead to modulation of the absorption coefficient of the medium and, hence, the weight of the packet, though such effects are not considered here). Leutz and Maret first described the phase variation
${\varphi}_{d,j}\left(t\right)$
caused by changes in the optical path length due to ultrasound-modulated particle displacement after the
$\mathrm{j}\text{th}$
scattering event.^{11} Wang incorporated this phase variation in his MC simulations and also introduced a new mechanism of phase variation,
${\varphi}_{n,j}\left(t\right)$
, which considers the ultrasound-induced change in refractive index over one free path^{7} between scattering events. These two mechanisms have been implemented in our MC simulations; the full definition of
${\varphi}_{d,j}\left(t\right)$
and
${\varphi}_{n,j}\left(t\right)$
can be found in the literature.^{7}

## 2.2.

### Implementation of the Autocorrelation Function

The autocorrelation function is often used in the analysis of optical phases where a coherent light source is used in a turbid medium (e.g., diffuse correlation spectroscopy^{21, 22} and, indeed, UL^{11}). The reason is that optical phases are often randomized in turbid media and the analysis of their statistical properties using second-order statistics, such as the autocorrelation function, is a convenient metric.

In UL, it has been suggested that the autocorrelation function for path length $s$ can be formed by averaging and summing the phase variations $\Delta {\varphi}_{n,j}(t+\tau )={\varphi}_{n,j}(t+\tau )-{\varphi}_{n,j}\left(t\right)$ over $N$ free paths, and $\Delta {\varphi}_{d,j}(t+\tau )={\varphi}_{d,j}(t+\tau )-{\varphi}_{d,j}\left(t\right)$ over $N-1$ scattering events along one whole photon path from the optical source to the detector,

## Eq. 1

$$\u27e8{E}_{s}\left(t\right){E}_{s}^{*}(t+\tau )\u27e9=\u27e8\mathrm{exp}\{-i[\sum _{j=1}^{N}\Delta {\varphi}_{n,j}(t+\tau )+\sum _{j=1}^{N-1}\Delta {\varphi}_{d,j}(t+\tau )]\}\u27e9,$$## Eq. 2

$${G}_{1}\left(\tau \right)=\sum _{m=1}^{M}\phantom{\rule{0.2em}{0ex}}\mathrm{Pr}\left[{s}_{m}\right]\u27e8{E}_{{s}_{m}}\left(t\right){E}_{{s}_{m}}^{*}(t+\tau )\u27e9=\sum _{m=1}^{M}{W}_{m}\u27e8{E}_{{s}_{m}}\left(t\right){E}_{{s}_{m}}^{*}(t+\tau )\u27e9,$$Assuming a sinusoidal ultrasonic excitation, the autocorrelation function
${G}_{1}\left(\tau \right)$
is a cosine waveform from which the maximum variation is given by
$1-{G}_{1}\left(0.5{T}_{a}\right)$
, where
${T}_{a}$
is the acoustic period. The maximum variation of
${G}_{1}\left(\tau \right)$
indicates the strength of the UOT signal,^{15} and the results in Sec. 3 will be presented in terms of this parameter.

Following the same approach as Wang,^{7} the MC simulations were performed on an infinite slab with light illumination on one side and a single-point detector on the other. To speed up the computation, the reciprocity of photon migration was exploited by simulating the injection of photons into the medium at a single point on one side, detecting the exiting photons at all locations on the other.

The parallel architecture of the GPU allows thousands of photon-packet propagations (threads) to be simulated at the same time. The maximum number of simultaneous threads is dependent on how many memory registers are required by the photon propagation algorithm. In the original CUDAMCML code, 26 registers are used in a
$32\text{-bit}$
Windows XP compilation,^{18} which allows a total of 16,128 simultaneous photon-packet propagation threads. Because of the incorporation of ultrasound modulation, our GPU code is more complicated and requires 37 registers; this still allows 10,752 simultaneous photon packet propagation threads. Figure 1
depicts the basic structure of our GPU code. Although the core of the MC code is similar to that of CUDAMCML, our GPU code also involves the calculation and accumulation of phase variations
$\Delta \varphi (t+\tau )=\sum \Delta {\varphi}_{d,j}(t+\tau )+\sum \Delta {\varphi}_{n,j}(t+\tau )$
as photon packets propagate. As a photon packet reaches the detection surface, the final photon weight
$W$
and the real and imaginary parts of the phase variation (i.e.,
$\mathrm{cos}\left[\Delta \varphi (t+\tau )\right]$
and
$\mathrm{sin}\left[\Delta \varphi (t+\tau )\right]$
) are added to the global device memory, which is used to calculate the autocorrelation function, as described in Eq. 2, on completion of the photon propagation simulation.

## 2.3.

### Implementation of the Ultrasound Modulated Speckle Patterns

In practice, the UL signals collectively form a speckle pattern. By exporting the MC simulation results in an appropriate manner, an ultrasound-modulated speckle pattern can be generated in postprocessing. In this scenario, coherent light is applied at a single point on one side of the slab, and the ultrasound-modulated speckle pattern is modeled based on the light exiting on the other side. Despite the differing motivations, this is essentially the same process as discussed. The electric field of the $m\text{th}$ transmitted photon packet is

## Eq. 3

$${E}_{{s}_{m}}(t;{x}_{m},{y}_{m})={W}_{m}\phantom{\rule{0.2em}{0ex}}\mathrm{exp}\{-i[\sum _{j=1}^{N}{\varphi}_{n,j}\left(t\right)+\sum _{j=1}^{N-1}{\varphi}_{d,j}\left(t\right)+\sum _{j=1}^{N-1}{\varphi}_{r,j}-{\mathbf{k}}_{N}\cdot {\mathbf{r}}_{N}]\}.$$*in vacuo*, ${\mathbf{r}}_{\mathbf{j}}$ is the location of the j’th scatterer and ${\mathbf{k}}_{j}$ the optical wave vector for the $j$ ’th free path. Brownian motion is not considered in this model such that scatterers can be regarded as static in the absence of an ultrasound field. The propagation of one million photons was simulated and the $x$ and $y$ coordinates of each exiting photon packet exported alongside their weight and phase. In order to produce a speckle pattern the electric field was spatially discretized by adding the contributions of each photon packet exiting within a certain region,

## Eq. 4

$${E}_{s}(t;{x}_{p},{y}_{q})=\sum _{\begin{array}{c}{x}_{p}-\Delta x\u22152<{x}_{m}\u2a7d{x}_{p}+\Delta x\u22152\\ {y}_{q}-\Delta y\u22152<{y}_{m}\u2a7d{y}_{q}+\Delta y\u22152\end{array}}{E}_{{s}_{m}}(t;{x}_{m},{y}_{m}),$$^{23}

## Eq. 5

$$I(t;{X}_{u},{Y}_{v})={\left|\mathrm{FT}\left\{P({x}_{p},{y}_{q}){E}_{s}(t;{x}_{p},{y}_{q})\right\}\right|}^{2},$$## 3.

## Results

Four aspects of the MC simulation results are presented below, and one million photons were used in each case.

## 3.1.

### Validation of the MC Simulations

Sakadzic and Wang provide an analytical solution for the autocorrelation function^{15} for the same scenario as simulated in this paper; this result has been used here to verify our MC simulations. Figure 2
shows the analytical results against our MC results for maximum variations of
${G}_{1}\left(\tau \right)$
over a range of scattering coefficients
${\mu}_{\mathrm{s}}$
, ultrasound induced displacement amplitudes
$A$
and ultrasound frequencies
${f}_{\mathrm{a}}$
. The following parameters were used in the simulations: wavelength of light *in vacuo*
${\lambda}_{0}=500\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
,
${n}_{0}=1.33$
,
${\mu}_{\mathrm{a}}=0\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$
, anisotropy factor
$g=0$
,
$\eta =0.3211$
(a function of the adiabatic piezo-optical coefficient of the material,
$\partial n\u2215\partial p$
, the density
$\rho $
, and the acoustic velocity of
${v}_{\mathrm{a}}$
:
$\eta =\partial n\u2215\partial p\cdot \rho \cdot {v}_{\mathrm{a}}^{2}$
),^{7} and thickness of the slab
$L=2\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$
. These values were the same as those used by Sakadzic and Wang.^{15} Figure 2 shows agreement between the GPU-based MC results and those produced analytically, thus validating our GPU-based MC code. Similar validation was performed on our CPU-based MC code. It can be seen from Fig. 2 that there is certain discrepancy between the analytical and MC result, which is particularly apparent when
$A$
is large (e.g.,
$A=1.7\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
). Sakadzic and Wang, who derived the analytical solution, noted that an approximation has been made in the derivation of the autocorrelation function and the approximation is only valid when the accumulated phase variation is small
$(\u2aa11)$
.^{15} As
$A$
becomes larger, the phase variations grow and the approximation in the analytical solution becomes less accurate. The MC code presented here does not use any approximation, and hence, a discrepancy is observed.

## 3.2.

### GPU Speedup

Two CPUs and two GPUs were each tested with the corresponding versions of the MC codes. The two CPUs were an Intel Core Quad, Q6700,
$2.66\phantom{\rule{0.3em}{0ex}}\mathrm{GHz}$
(slower) and an Intel Core i7-920 (faster). The two GPUs used were an Nvidia GeForce 9800 GT with 112 cores (slower) and an Nvidia GTS250 with 128 cores (faster). The optical and acoustic parameters used were;
${\lambda}_{0}=500\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
,
${n}_{0}=1.33$
,
${\mu}_{\mathrm{a}}=0\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$
,
$g=0$
,
$\eta =0.3211$
,
$L=2\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$
,
${f}_{\mathrm{a}}=1\phantom{\rule{0.3em}{0ex}}\mathrm{MHz}$
, and
$A=1.7\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
. All other optical and acoustic parameters were the same as those used in a previous study.^{15} The scattering coefficient
${\mu}_{\mathrm{s}}$
was varied over 2, 5, 10, 20, 50, 100, and
$200\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$
. All computation times were taken as a ratio to that of the slower CPU (Core Quad). In comparison to the reference CPU (Core Quad), the faster CPU (Intel Core i7) offered a computational speedup of around 1.57 for all values of
${\mu}_{\mathrm{s}}$
. Figure 3
shows the ratio of the CPU (Core Quad) computation time to the GPU computation times for the same set of parameters. The speedup varied with the choice of
${\mu}_{\mathrm{s}}$
. The GPU version is faster than its CPU counterpart by up to a factor of 125 with
${\mu}_{\mathrm{s}}=20\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$
. As expected, the GTS 250 is, in general, faster than the GeForce 9800 especially for increasing
${\mu}_{\mathrm{s}}$
.

## 3.3.

### Additional Computation Time Due to Incorporation of Ultrasound

Because of the additional complexity introduced by the incorporation of ultrasound propagation the modeling of UL imposes an increased computational load in comparison to the modeling of conventional photon migration. Figure 4
shows the ratio of the GPU computation time with ultrasound to that without ultrasound as a function of
${\mu}_{\mathrm{s}}$
. All other optical and acoustic parameters used were the same as those used in Sec.
3.2. The MC code CUDAMCML developed by Alerstam
^{18} was used to obtain the computation times for conventional photon migration modeling (without ultrasound). The GPU used was the Nvidia GeForce 9800. It can be seen from Fig. 4 that the additional computation effort demanded by the incorporation of ultrasound increased the computation time by a factor of approximately 6 to 17, depending on the value of
${\mu}_{\mathrm{s}}$
.

## 3.4.

### Ultrasound-Modulated Speckle Patterns

Figure 5 is the video of the ultrasound-moudlated speckle pattern generated by postprocessing the MC simulation results, and it has a total of 12 frames played at $4\phantom{\rule{0.3em}{0ex}}\mathrm{fps}$ . Figure 6 depicts the ultrasound-modulated speckle pattern at six points in time over one acoustic cycle $\left(1\phantom{\rule{0.3em}{0ex}}\mu \mathrm{s}\right)$ . The two boxes inset in Figs. 6, 6, 6, 6, 6, 6 highlight two speckle grains that have different phases, i.e., the brightest moment of the speckle grain highlighted by the box on the left happened at (f) $0.83\phantom{\rule{0.3em}{0ex}}\mu \mathrm{s}$ whereas that highlighted by the box on the right happened at (c) $0.332\phantom{\rule{0.3em}{0ex}}\mu \mathrm{s}$ . The simulations were completed in $4\phantom{\rule{0.3em}{0ex}}\mathrm{s}$ using the GPU code compared to $290\phantom{\rule{0.3em}{0ex}}\mathrm{s}$ for the CPU version—a speedup of 72 times.

10.1117/1.3495729.1## 4.

## Discussion and Conclusions

In this paper, we have demonstrated the use of low-cost GPUs ($100–130 (US) for the two GPU cards tested here) to speed up MC simulations of UL. We have validated our GPU-based MC code with an analytical solution over a range of optical and acoustic parameters and shown that the GPU-based simulation is faster than its CPU counterpart by a factor of up to 125. We have also demonstrated the use of the GPU in modeling ultrasound-modulated speckle patterns; in this case, the GPU is faster than the CPU by a factor of up to 72.

Because of the use of plane-wave ultrasound propagation, which offers no spatial information, the GPU-based MC code described here may be of limited practical use in imaging. However, the main contribution of this MC code is to provide a validated and optimized platform on which further extensions can be built. For instance, based on this MC code we have recently developed two new MC models of UL: one with focused ultrasound in a cylindrical geometry^{24} and one with oscillating microbubbles.^{25} We are further developing the MC code for different geometries, including a hemisphere (simulating breast), an embedded tube (simulating blood vessel), and different acousto-optic mechanisms, such as the acoustic radiation force.

It is anticipated that such extensions will inevitably increase the demand for memory and the number of arithmetic operations that may, in turn, reduce the number of threads and hence speed. To this end, our future development will fully exploit various features of the GPU. For instance, rather than calculating nonlinear acoustic pressure during the simulation, one may choose to store representative points of a precalculated pressure distribution and then use the “texture memory”^{26} feature of the GPU to interpolate any points in between.

## 4.1.

### Simulation Efficacy

The curves of Figs. 3 and 4 both demonstrate two distinct trends. Below a particular value of ${\mu}_{\mathrm{s}}$ , (e.g., ${\mu}_{\mathrm{s}}=20\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$ , in Fig. 3, and ${\mu}_{\mathrm{s}}=10\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$ , in Fig. 4) the efficacy of the GPU-based UL simulation increases sharply with ${\mu}_{\mathrm{s}}$ . Above this threshold, its efficacy gradually reduces. These trends are consequences of our GPU implementation rather than the underlying algorithm. Two aspects dominate the observed efficacy as discussed in the following. As ${\mu}_{\mathrm{s}}$ is increased, the first aspect of the implementation leads to an increase in efficacy while the second aspect a decrease. The overall change in efficacy depends on which of these two aspects dominate. The two apects are as follows:

1. On the last step of a photon packet that has reached the detector, a delay of several hundred cycles is introduced by writing the results to the global memory of the GPU.

^{26}As ${\mu}_{\mathrm{s}}$ is increased, the number of these events decreases as a proportion of the total computation for a particular photon packet. This aspect of the implementation leads to an increase in efficacy with ${\mu}_{\mathrm{s}}$ for the GPU-based UL code. In Fig. 3, this trend can be seen in the speedup relative to the CPU UL implementation because the CPU does not incur a performance penalty at this point in the algorithm. In Fig. 4, this trend is seen in the speedup relative to the non-UL GPU code because this code does perform this operation.2. A single simulation on the GPU actually consists of multiple subsimulations that each executes a given number of photon transport steps. This is implemented to avoid a “feature” of the Windows operating system whereby the GPU is deemed to have crashed if an algorithm executes continuously for $>5\phantom{\rule{0.3em}{0ex}}\mathrm{s}$ .

^{19}(It is noted that this restriction can be easily resolved by using one GPU card for processing and one for display.) At the end of each subsimulation, the state of each thread is written to the device’s global memory (this step is necessary so that the state of the photon can be restored in the next subsimulation), incurring the same overhead described in point 1. This aspect of the implementation leads to a decrease in efficacy with ${\mu}_{\mathrm{s}}$ . The CPU UL code does not share this problem, and the reducing efficacy of the GPU UL code is thus demonstrated in the speedup curve of Fig. 3. Although both GPU codes compared in Fig. 4 each have this performance bottleneck, the UL simulation stores a larger amount of state information than its non-UL GPU counterpart; this increased transfer time is therefore seen in the speedup comparison.

## 4.2.

### New Possibilities

The incorporation of ultrasound propagation in photon migration modeling increases the computational effort considerably (e.g., the computation time increased by a factor of at least 6, as shown in Sec. 3.3). The improvement in speed is therefore an important advancement in the modelling of UL. Using such techniques additional acousto-optic mechanisms can be simulated before the simulation speed becomes a limiting factor in such research. Furthermore, numerical approximations (which can be detrimental to both accuracy and the readability of the simulation code) may no longer be required to ensure tolerable computation time. Previously, the simulated geometry, medium, type of acoustic source, and locations of the optical source and detector were often simplified partly due to the issue of computation time. For example, one early scheme (as discussed in Sec. 2.2) required the optical source to be (theoretically) infinitely large and for the detector to be a single point on the other side of the slab. The reciprocity of photon migration was relied on such that the code was sufficiently efficient. With a general speedup in the code, a more realistic scenario can be modeled without incurring an unacceptable increase in computational time.

## 4.3.

### Image Reconstruction

An alternative to MC modeling is the finite element method (FEM), which has been widely adopted in diffuse optical imaging^{27} and has a relatively fast computational speed. The computational speed of forward modeling is especially important in image reconstruction because of the iterative nature of an inverse problem. Sakadzic and Wang have developed a correlation transfer equation for UL^{17} that can potentially be solved using FEM and used for image reconstruction. However, unlike solving the diffusion approximation, as in the case of diffuse optical imaging, solving the correlation transfer equation is not an easy task and may involve some difficult numerical issues. Moreover, the existing correlation transfer equation has been developed for specific acousto-optic mechanisms. This means that the incorporation of an additional mechanism will require a new form of the correlation transfer function to be derived. In comparison, the MC approach is more straightforward and flexible, allowing the easy incorporation of new mechanisms (e.g., the nonphase mechanism^{13}). With the improvement of computational speed using a GPU, forward modeling using MC simulations is now a viable option for UOT image reconstruction. Although a GPU implementation of a carefully designed FEM scheme could potentially outperform its MC counterpart, the absolute time required for the MC implementation would nonetheless remain reasonable [e.g.,
$10\phantom{\rule{0.3em}{0ex}}\mathrm{min}$
with a GPU as compared to
$21\phantom{\rule{0.3em}{0ex}}\mathrm{h}$
with a CPU (assuming a speedup of 125 times)].

## 4.4.

### Ultrasound-Modulated Speckle Pattern

Figures 5 and 6 demonstrate an important characteristic of ultrasound-modulated speckles whereby individual speckle grains are modulated at the ultrasound frequency but with individual absolute phase (i.e., speckle grains blink at the ultrasound frequency but reach the brightest point at different times). In practice, the speckle grains will also be randomly modulated due to Brownian motion and this behavior has often been exploited to estimate particle size within the medium. Brownian motion was not modeled in our MC simulations such that the dynamics of the speckle grains are governed only by the applied ultrasound. We are currently investigating the incorporation of Brownian motion as another mechanism that may affect the UOT signal. One important use of the speckle pattern modeling described here is to help the design of an efficient detection scheme. Speckle patterns can be simulated using different combinations of acoustic and optical parameters, and the results used to test and compare the performance of different detection schemes in a standardized manner.

## Acknowledgments

The authors thank Erik Alerstam, Sava Sakadzic, and Shihong Jiang for the discussion of CUDAMCML, the CPU version of the UL MC codes, and modeling speckle patterns. This research is funded by EPSRC (Grant No. EP/G005036/1).

## References

**,” Dis. Markers, 19 (2–3), 123 –138 (2004). 0278-0240 Google Scholar**

*Ultrasound-mediated biophotonic imaging: a review of acousto-optical tomography and photo-acoustic tomography***,” Acoust. Today, 3 (3), 17 –24 (2007). https://doi.org/10.1121/1.2961154 1557-0215 Google Scholar**

*Illuminating sound: imaging tissue optical properties with ultrasound***,” Opt. Lett., 32 (16), 2285 –2287 (2007). https://doi.org/10.1364/OL.32.002285 0146-9592 Google Scholar**

*Multi-optical-wavelength ultrasound-modulated optical tomography: a phantom study***,” Opt. Lett., 32 (16), 2351 –2353 (2007). https://doi.org/10.1364/OL.32.002351 0146-9592 Google Scholar**

*Imaging optically scattering objects with ultrasound-modulated optical tomography***,” Opt. Lett., 24 (3), 181 –183 (1999). https://doi.org/10.1364/OL.24.000181 0146-9592 Google Scholar**

*Ultrasonic tagging of photon paths in scattering media: parallel speckle modulation processing***,” Opt. Lett., 29 (21), 2509 –2511 (2004). https://doi.org/10.1364/OL.29.002509 0146-9592 Google Scholar**

*Detection of ultrasound-modulated photons in diffuse media using the photorefractive effect***,” Opt. Lett., 26 (15), 1191 –1193 (2001). https://doi.org/10.1364/OL.26.001191 0146-9592 Google Scholar**

*Mechanisms of ultrasonic modulation of multiply scattered coherent light: a Monte Carlo model***,” Phys. Rev. Lett., 87 (4), 043903 (2001). https://doi.org/10.1103/PhysRevLett.87.043903 0031-9007 Google Scholar**

*Mechanisms of ultrasonic modulation of multiply scattered coherent light: an analytic model***,” Proc. Natl. Acad. Sci. U.S.A., 95 (24), 14015 –14019 (1998). https://doi.org/10.1073/pnas.95.24.14015 0027-8424 Google Scholar**

*Ultrasonic tagging of light: theory***,” J. Opt. Soc. Am. A, 14 (5), 1151 –1158 (1997). https://doi.org/10.1364/JOSAA.14.001151 0740-3232 Google Scholar**

*Acousto-optic tomography with multiply scattered light***,” Physica B, 204 (1–4), 14 –19 (1995). https://doi.org/10.1016/0921-4526(94)00238-Q 0921-4526 Google Scholar**

*Ultrasonic modulation of multiply scattered-light***,” J. Biomed. Opt., 11 (3), 34019 (2006). https://doi.org/10.1117/1.2209012 1083-3668 Google Scholar**

*Application of ultrasound-tagged photons for measurement of amplitude of vibration of tissue caused by ultrasound: theory, simulation, and experiments***,” Appl. Opt., 47 (20), 3619 –3630 (2008). https://doi.org/10.1364/AO.47.003619 0003-6935 Google Scholar**

*Modeling of nonphase mechanisms in ultrasonic modulation of light propagation***,” Comput. Methods Programs Biomed., 47 (2), 131 –146 (1995). https://doi.org/10.1016/0169-2607(95)01640-F 0169-2607 Google Scholar**

*MCML—Monte Carlo modeling of light transport in multi-layered tissues***,” Phys. Rev. E, 66 (2), 026603 (2002). https://doi.org/10.1103/PhysRevE.66.026603 1063-651X Google Scholar**

*Ultrasonic modulation of multiply scattered coherent light: an analytical model for anisotropically scattering media***,” Appl. Opt., 43 (6), 1320 –1326 (2004). https://doi.org/10.1364/AO.43.001320 0003-6935 Google Scholar**

*Signal dependence and noise source in ultrasound-modulated optical tomography***,” Phys. Rev. E, 74 (3 Pt 2), 036618 (2006). https://doi.org/10.1103/PhysRevE.74.036618 1063-651X Google Scholar**

*Correlation transfer equation for ultrasound-modulated multiply scattered light***,” J. Biomed. Opt., 13 (6), 060504 (2008). https://doi.org/10.1117/1.3041496 1083-3668 Google Scholar**

*Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration***,” Opt. Express, 17 (22), 20178 –20190 (2009). https://doi.org/10.1364/OE.17.020178 1094-4087 Google Scholar**

*Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units***,” Phys. Med. Biol., 46 (8), 2053 –2065 (2001). https://doi.org/10.1088/0031-9155/46/8/302 0031-9155 Google Scholar**

*In vivo*cerebrovascular measurement combining diffuse near-infrared absorption and correlation spectroscopies**,” Phys. Rev. Lett., 60 (12), 1134 –1137 (1988). https://doi.org/10.1103/PhysRevLett.60.1134 0031-9007 Google Scholar**

*Diffusing wave spectroscopy***,” Proc. SPIE, 7564 756431 (2010). https://doi.org/10.1117/12.842175 0277-786X Google Scholar**

*Optimization of the acousto-optic signal detection in cylindrical geometry***,” Proc. SPIE, 7564 75640 (2010). https://doi.org/10.1117/12.842030 0277-786X Google Scholar**

*Monte Carlo simulations of acousto-optics with microbubbles***,” Med. Phys., 20 (2 I), 299 –310 (1993). https://doi.org/10.1118/1.597069 0094-2405 Google Scholar**

*A finite element approach for modeling photon transport in tissue*