Ultrasound-modulated optical tomography (UOT), also known as acousto-optic imaging, is an emerging biomedical imaging technique that promises to provide a greater spatial resolution than that provided by diffuse optical imaging.1, 2 UOT’s principle is the use of focused ultrasound to change the local refractive index and path length of the tissue in the focus region thereby “tagging” those photons that propagate through this focus region. The “tagged” photons are known to have passed through the ultrasound focus region and can therefore be exploited to provide a highly localized image of the absorption and scattering coefficient3, 4 with a spatial resolution determined by the size of the ultrasound focus. Modern ultrasound technology can readily generate focus regions of millimeter scales, and UOT may therefore provide a spatial resolution of a similar order.
UOT is currently a research topic largely confined to the laboratory and has rarely been applied to any major clinical applications. The main reason for this is that the UOT signal is very weak, and its detection poses a major challenge. To this end various detection schemes have been proposed including parallel detection with a CCD camera5 or photorefractive crystals.6 Another more fundamental challenge is that researchers are still trying to fully understand the underlying mechanisms of the ultrasonic modulation of light. Wang, 7, 8 modeled two mechanisms that alter the phase of coherent light under the exposure of ultrasound in a turbid medium.9, 10, 11 The first mechanism is the variation of the refractive index of the medium and hence the relative phase of the electric field of light propagating through the medium. The second mechanism is the variation of the relative phase of the electric field due to the displacement of scatterers and, thus, the optical path length. The two mechanisms were simulated with a Monte Carlo (MC) model, which was based on a rather theoretical setting [i.e., ultrasound plane waves], rather than focused ultrasound waves, propagating through a slab. Because of its simplicity, this MC model has been adapted by several researchers to study different applications and theories of UOT (e.g., the measurement of the optical and mechanical properties of tissues12 and the nonphase mechanism13). One of the main reasons for the popularity of Wang’s ultrasound-modulated light (UL) model is that it is built on a MC light transport model widely used in biomedical optics, known as MCML, by Wang 14 Another advantage is that the MC model is accompanied by an analytical solution8, 15 allowing for validation of the simulation. Subsequent MC models by Wang’s group consider more realistic scenarios, allowing optical inhomogenities, and cylindrical16 and focused ultrasound17 modulation.
In the past few years, MC simulations in general have undergone a dramatic speed improvement unleashed by the availability of low-cost graphics cards utilizing a new architecture known as Compute Unified Device Architecture (CUDA). Using a superset of the C language, CUDA allows the execution of algorithms on highly parallelized graphics processing unit (GPU) hardware. MC simulations perform tens of millions of isolated random walks and are therefore inherently parallelizable. By exploiting the parallel architecture of the GPU, MC simulations of light transport implemented on these platforms report a speed increase of a factor of two to three orders of magnitude in comparison to the same MC simulations implemented on a CPU. To this end, Alerstam implemented a GPU version of MCML, known as CUDAMCML,18, 19 while Fang and Boas developed a time-resolved MC light transport model in arbitrary 3-D turbid media on a GPU platform.20
In this paper, we describe and assess the use of GPU to speed up MC simulations of UL that, to the best of our knowledge, has not been reported before. Our GPU MC simulation of UL has been developed based on Wang’s original MC model7 by incorporating the simulation of an ultrasound plane wave into the CUDAMCML code. We compare the speed of the GPU implementation against its central processing unit (CPU) counterpart. We also demonstrate the effect of ultrasonic modulation on speckle patterns through animations generated from the results of such MC simulations.
We have developed two MC simulation codes, one implemented on a CPU and the other on a GPU. Care has been taken to ensure that the two versions impose a similar computational load such that a fair comparison of speed can be performed. The CPU version has been developed by modifying MCML as suggested by Wang.7 The basic structure of our GPU code is based on CUDAMCML, which has been thoroughly examined in the literature.18, 19 Both CUDAMCML and the original MCML were developed for modeling photon migration, in general, not specifically for coherent light. In this model, each photon packet is characterized by a photon packet weight, , which represents the probability of an individual photon reaching the detector after multiple scattering and absorption events. For coherent light, the phase of the electric field of the photon packet is also considered such that each packet is characterized by , where is the phase of the photon packet as determined by the optical path length.
Implementation of Phase Variations
The ultrasonic modulation of light alters the phase, , of the packet as it propagates through the turbid medium (it is recognized that alternative UL mechanisms may lead to modulation of the absorption coefficient of the medium and, hence, the weight of the packet, though such effects are not considered here). Leutz and Maret first described the phase variation caused by changes in the optical path length due to ultrasound-modulated particle displacement after the scattering event.11 Wang incorporated this phase variation in his MC simulations and also introduced a new mechanism of phase variation, , which considers the ultrasound-induced change in refractive index over one free path7 between scattering events. These two mechanisms have been implemented in our MC simulations; the full definition of and can be found in the literature.7
Implementation of the Autocorrelation Function
The autocorrelation function is often used in the analysis of optical phases where a coherent light source is used in a turbid medium (e.g., diffuse correlation spectroscopy21, 22 and, indeed, UL11). The reason is that optical phases are often randomized in turbid media and the analysis of their statistical properties using second-order statistics, such as the autocorrelation function, is a convenient metric.
In UL, it has been suggested that the autocorrelation function for path length can be formed by averaging and summing the phase variations over free paths, and over scattering events along one whole photon path from the optical source to the detector,is the electric field of a photon with path length , is the time delay and averaging (in angular brackets) is performed over time and over all photons with path length . The autocorrelation function for all possible path lengths is , where is the probability density function of path length . In discrete form appropriate for MC simulations, the autocorrelation function becomes is the path length for the photon packet, is the probability of detecting a photon packet with a path length of and is the total number of photon packets (typically, million) used in the MC simulations. In the GPU implementation, tens of thousands of photon packets are simultaneously launched in different threads. Each photon packet undergoes a random walk through the medium and may be terminated at different times. The summation of from to in Eq. 2 is performed packet by packet, and the result stored in the global device memory.
Assuming a sinusoidal ultrasonic excitation, the autocorrelation function is a cosine waveform from which the maximum variation is given by , where is the acoustic period. The maximum variation of indicates the strength of the UOT signal,15 and the results in Sec. 3 will be presented in terms of this parameter.
Following the same approach as Wang,7 the MC simulations were performed on an infinite slab with light illumination on one side and a single-point detector on the other. To speed up the computation, the reciprocity of photon migration was exploited by simulating the injection of photons into the medium at a single point on one side, detecting the exiting photons at all locations on the other.
The parallel architecture of the GPU allows thousands of photon-packet propagations (threads) to be simulated at the same time. The maximum number of simultaneous threads is dependent on how many memory registers are required by the photon propagation algorithm. In the original CUDAMCML code, 26 registers are used in a Windows XP compilation,18 which allows a total of 16,128 simultaneous photon-packet propagation threads. Because of the incorporation of ultrasound modulation, our GPU code is more complicated and requires 37 registers; this still allows 10,752 simultaneous photon packet propagation threads. Figure 1 depicts the basic structure of our GPU code. Although the core of the MC code is similar to that of CUDAMCML, our GPU code also involves the calculation and accumulation of phase variations as photon packets propagate. As a photon packet reaches the detection surface, the final photon weight and the real and imaginary parts of the phase variation (i.e., and ) are added to the global device memory, which is used to calculate the autocorrelation function, as described in Eq. 2, on completion of the photon propagation simulation.
Implementation of the Ultrasound Modulated Speckle Patterns
In practice, the UL signals collectively form a speckle pattern. By exporting the MC simulation results in an appropriate manner, an ultrasound-modulated speckle pattern can be generated in postprocessing. In this scenario, coherent light is applied at a single point on one side of the slab, and the ultrasound-modulated speckle pattern is modeled based on the light exiting on the other side. Despite the differing motivations, this is essentially the same process as discussed. The electric field of the transmitted photon packet is, where is the phase change due to the momentum transfer of the photon packet between the ’th and ’th scattering events, is the background refactive index, the optical wavenumber in vacuo, is the location of the j’th scatterer and the optical wave vector for the ’th free path. Brownian motion is not considered in this model such that scatterers can be regarded as static in the absence of an ultrasound field. The propagation of one million photons was simulated and the and coordinates of each exiting photon packet exported alongside their weight and phase. In order to produce a speckle pattern the electric field was spatially discretized by adding the contributions of each photon packet exiting within a certain region, and with and as the indexes of the and coordinates of a region, and and the widths of each region. The speckle pattern was formed by modeling an optical system consisting of a lens with focal length of , a pupil with a radius of , and detectors with numerical aperture of 0.5. The image formed at the focal plane of the lens is thus given by23 is the intensity of the time-varying speckle pattern with and being the indexes of the and coordinates in the focal plane, is a pupil function and FT signifies the 2-D Fourier transform. In this simulation, 13 speckle patterns were generated over one acoustic cycle, the pattern being harmonic in time in the absence of Brownian motion.
Four aspects of the MC simulation results are presented below, and one million photons were used in each case.
Validation of the MC Simulations
Sakadzic and Wang provide an analytical solution for the autocorrelation function15 for the same scenario as simulated in this paper; this result has been used here to verify our MC simulations. Figure 2 shows the analytical results against our MC results for maximum variations of over a range of scattering coefficients , ultrasound induced displacement amplitudes and ultrasound frequencies . The following parameters were used in the simulations: wavelength of light in vacuo , , , anisotropy factor , (a function of the adiabatic piezo-optical coefficient of the material, , the density , and the acoustic velocity of : ),7 and thickness of the slab . These values were the same as those used by Sakadzic and Wang.15 Figure 2 shows agreement between the GPU-based MC results and those produced analytically, thus validating our GPU-based MC code. Similar validation was performed on our CPU-based MC code. It can be seen from Fig. 2 that there is certain discrepancy between the analytical and MC result, which is particularly apparent when is large (e.g., ). Sakadzic and Wang, who derived the analytical solution, noted that an approximation has been made in the derivation of the autocorrelation function and the approximation is only valid when the accumulated phase variation is small .15 As becomes larger, the phase variations grow and the approximation in the analytical solution becomes less accurate. The MC code presented here does not use any approximation, and hence, a discrepancy is observed.
Two CPUs and two GPUs were each tested with the corresponding versions of the MC codes. The two CPUs were an Intel Core Quad, Q6700, (slower) and an Intel Core i7-920 (faster). The two GPUs used were an Nvidia GeForce 9800 GT with 112 cores (slower) and an Nvidia GTS250 with 128 cores (faster). The optical and acoustic parameters used were; , , , , , , , and . All other optical and acoustic parameters were the same as those used in a previous study.15 The scattering coefficient was varied over 2, 5, 10, 20, 50, 100, and . All computation times were taken as a ratio to that of the slower CPU (Core Quad). In comparison to the reference CPU (Core Quad), the faster CPU (Intel Core i7) offered a computational speedup of around 1.57 for all values of . Figure 3 shows the ratio of the CPU (Core Quad) computation time to the GPU computation times for the same set of parameters. The speedup varied with the choice of . The GPU version is faster than its CPU counterpart by up to a factor of 125 with . As expected, the GTS 250 is, in general, faster than the GeForce 9800 especially for increasing .
Additional Computation Time Due to Incorporation of Ultrasound
Because of the additional complexity introduced by the incorporation of ultrasound propagation the modeling of UL imposes an increased computational load in comparison to the modeling of conventional photon migration. Figure 4 shows the ratio of the GPU computation time with ultrasound to that without ultrasound as a function of . All other optical and acoustic parameters used were the same as those used in Sec. 3.2. The MC code CUDAMCML developed by Alerstam 18 was used to obtain the computation times for conventional photon migration modeling (without ultrasound). The GPU used was the Nvidia GeForce 9800. It can be seen from Fig. 4 that the additional computation effort demanded by the incorporation of ultrasound increased the computation time by a factor of approximately 6 to 17, depending on the value of .
Ultrasound-Modulated Speckle Patterns
Figure 5 is the video of the ultrasound-moudlated speckle pattern generated by postprocessing the MC simulation results, and it has a total of 12 frames played at . Figure 6 depicts the ultrasound-modulated speckle pattern at six points in time over one acoustic cycle . The two boxes inset in Figs. 6, 6, 6, 6, 6, 6 highlight two speckle grains that have different phases, i.e., the brightest moment of the speckle grain highlighted by the box on the left happened at (f) whereas that highlighted by the box on the right happened at (c) . The simulations were completed in using the GPU code compared to for the CPU version—a speedup of 72 times.10.1117/1.3495729.1
Discussion and Conclusions
In this paper, we have demonstrated the use of low-cost GPUs ($100–130 (US) for the two GPU cards tested here) to speed up MC simulations of UL. We have validated our GPU-based MC code with an analytical solution over a range of optical and acoustic parameters and shown that the GPU-based simulation is faster than its CPU counterpart by a factor of up to 125. We have also demonstrated the use of the GPU in modeling ultrasound-modulated speckle patterns; in this case, the GPU is faster than the CPU by a factor of up to 72.
Because of the use of plane-wave ultrasound propagation, which offers no spatial information, the GPU-based MC code described here may be of limited practical use in imaging. However, the main contribution of this MC code is to provide a validated and optimized platform on which further extensions can be built. For instance, based on this MC code we have recently developed two new MC models of UL: one with focused ultrasound in a cylindrical geometry24 and one with oscillating microbubbles.25 We are further developing the MC code for different geometries, including a hemisphere (simulating breast), an embedded tube (simulating blood vessel), and different acousto-optic mechanisms, such as the acoustic radiation force.
It is anticipated that such extensions will inevitably increase the demand for memory and the number of arithmetic operations that may, in turn, reduce the number of threads and hence speed. To this end, our future development will fully exploit various features of the GPU. For instance, rather than calculating nonlinear acoustic pressure during the simulation, one may choose to store representative points of a precalculated pressure distribution and then use the “texture memory”26 feature of the GPU to interpolate any points in between.
The curves of Figs. 3 and 4 both demonstrate two distinct trends. Below a particular value of , (e.g., , in Fig. 3, and , in Fig. 4) the efficacy of the GPU-based UL simulation increases sharply with . Above this threshold, its efficacy gradually reduces. These trends are consequences of our GPU implementation rather than the underlying algorithm. Two aspects dominate the observed efficacy as discussed in the following. As is increased, the first aspect of the implementation leads to an increase in efficacy while the second aspect a decrease. The overall change in efficacy depends on which of these two aspects dominate. The two apects are as follows:
1. On the last step of a photon packet that has reached the detector, a delay of several hundred cycles is introduced by writing the results to the global memory of the GPU.26 As is increased, the number of these events decreases as a proportion of the total computation for a particular photon packet. This aspect of the implementation leads to an increase in efficacy with for the GPU-based UL code. In Fig. 3, this trend can be seen in the speedup relative to the CPU UL implementation because the CPU does not incur a performance penalty at this point in the algorithm. In Fig. 4, this trend is seen in the speedup relative to the non-UL GPU code because this code does perform this operation.
2. A single simulation on the GPU actually consists of multiple subsimulations that each executes a given number of photon transport steps. This is implemented to avoid a “feature” of the Windows operating system whereby the GPU is deemed to have crashed if an algorithm executes continuously for .19 (It is noted that this restriction can be easily resolved by using one GPU card for processing and one for display.) At the end of each subsimulation, the state of each thread is written to the device’s global memory (this step is necessary so that the state of the photon can be restored in the next subsimulation), incurring the same overhead described in point 1. This aspect of the implementation leads to a decrease in efficacy with . The CPU UL code does not share this problem, and the reducing efficacy of the GPU UL code is thus demonstrated in the speedup curve of Fig. 3. Although both GPU codes compared in Fig. 4 each have this performance bottleneck, the UL simulation stores a larger amount of state information than its non-UL GPU counterpart; this increased transfer time is therefore seen in the speedup comparison.
The incorporation of ultrasound propagation in photon migration modeling increases the computational effort considerably (e.g., the computation time increased by a factor of at least 6, as shown in Sec. 3.3). The improvement in speed is therefore an important advancement in the modelling of UL. Using such techniques additional acousto-optic mechanisms can be simulated before the simulation speed becomes a limiting factor in such research. Furthermore, numerical approximations (which can be detrimental to both accuracy and the readability of the simulation code) may no longer be required to ensure tolerable computation time. Previously, the simulated geometry, medium, type of acoustic source, and locations of the optical source and detector were often simplified partly due to the issue of computation time. For example, one early scheme (as discussed in Sec. 2.2) required the optical source to be (theoretically) infinitely large and for the detector to be a single point on the other side of the slab. The reciprocity of photon migration was relied on such that the code was sufficiently efficient. With a general speedup in the code, a more realistic scenario can be modeled without incurring an unacceptable increase in computational time.
An alternative to MC modeling is the finite element method (FEM), which has been widely adopted in diffuse optical imaging27 and has a relatively fast computational speed. The computational speed of forward modeling is especially important in image reconstruction because of the iterative nature of an inverse problem. Sakadzic and Wang have developed a correlation transfer equation for UL17 that can potentially be solved using FEM and used for image reconstruction. However, unlike solving the diffusion approximation, as in the case of diffuse optical imaging, solving the correlation transfer equation is not an easy task and may involve some difficult numerical issues. Moreover, the existing correlation transfer equation has been developed for specific acousto-optic mechanisms. This means that the incorporation of an additional mechanism will require a new form of the correlation transfer function to be derived. In comparison, the MC approach is more straightforward and flexible, allowing the easy incorporation of new mechanisms (e.g., the nonphase mechanism13). With the improvement of computational speed using a GPU, forward modeling using MC simulations is now a viable option for UOT image reconstruction. Although a GPU implementation of a carefully designed FEM scheme could potentially outperform its MC counterpart, the absolute time required for the MC implementation would nonetheless remain reasonable [e.g., with a GPU as compared to with a CPU (assuming a speedup of 125 times)].
Ultrasound-Modulated Speckle Pattern
Figures 5 and 6 demonstrate an important characteristic of ultrasound-modulated speckles whereby individual speckle grains are modulated at the ultrasound frequency but with individual absolute phase (i.e., speckle grains blink at the ultrasound frequency but reach the brightest point at different times). In practice, the speckle grains will also be randomly modulated due to Brownian motion and this behavior has often been exploited to estimate particle size within the medium. Brownian motion was not modeled in our MC simulations such that the dynamics of the speckle grains are governed only by the applied ultrasound. We are currently investigating the incorporation of Brownian motion as another mechanism that may affect the UOT signal. One important use of the speckle pattern modeling described here is to help the design of an efficient detection scheme. Speckle patterns can be simulated using different combinations of acoustic and optical parameters, and the results used to test and compare the performance of different detection schemes in a standardized manner.
The authors thank Erik Alerstam, Sava Sakadzic, and Shihong Jiang for the discussion of CUDAMCML, the CPU version of the UL MC codes, and modeling speckle patterns. This research is funded by EPSRC (Grant No. EP/G005036/1).