*in vivo*studies. The results establish that the optimized mesh-based Monte Carlo (mMC) method is a computationally efficient solution for optical tomography studies in terms of both calculation time and memory utilization. The open source code, as part of a new release of mMC, is publicly available at http://mcx.sourceforge.net/mmc/.

## 1.

## Introduction

Optical imaging has been the subject of intense investigation over the past decades, largely due to the fact that light at visible and near-infrared wavelengths is a nonionizing, noninvasive probe with numerous applications in medicine.^{1}^{,}^{2} In fluorescence molecular tomography (FMT), an emerging optical imaging technique for preclinical applications, one can solve for three-dimensional (3-D) *in vivo* fluorescent marker distribution maps using only noncontact surface optical measurements and tomographic image reconstructions.^{3}^{,}^{4} This makes FMT a suitable method for small animal research where fluorophores are designed to label the drugs of interest, and enables the tracking of their delivery process. A successful tomographic reconstruction in FMT typically requires the forward modeling of photon transport inside optically complex tissue. The diffusion equation, an approximation to the more general radiative transport equation (RTE), is usually employed as the forward model for simplicity. However, this approximation becomes invalid when modeling low-scattering and highly absorbing tissue, resulting in potentially inaccurate quantification in small animal imaging where specimens exhibit wide variation in optical properties with possible presence of void regions (lung or bladder, for instance).^{5} Moreover, the diffusion equation cannot accurately represent the propagation of short light pulses in tissue for the portion of photons arriving at the surface early,^{6}^{,}^{7} which has been shown as an effective datatype to improve the resolution in reconstruction when no *a priori* information is available.^{8}9.^{–}^{10}

The Monte Carlo (MC) method is an accurate light propagation modeling approach in dealing with general media such as those in small animals and early photons.^{11}^{,}^{12} This method approximates the RTE solution via random sampling of large numbers of photons, thus it is often used as the gold standard for modeling time-resolved light propagation for all optical properties encountered in these scenarios. Nevertheless, MC-based techniques suffer from low computational efficiency (hours or days of computation)^{13} because numerous photon simulation trials are required to obtain satisfactory statistics. When time-domain data are considered, the photons are split into small time-bins, which leads to an increase in the number of photons to be simulated to achieve reasonable statistics per time window, and make calculations even more time-consuming. Only recently, our group developed a computationally efficient MC-based reconstruction technique for FMT utilizing time-gated data sets.^{14} This voxel-based Monte Carlo (vMC) method allows for computation of functional and fluorescent Jacobians for whole-body tomography within a couple of hours. However, difficulty arises when applying the vMC algorithm, as the commonly employed cubic shape voxels in a regular 3-D grid cannot accurately simulate the curved boundaries. One remedy is to raise the voxel density, but doing so drastically increases the memory and computational burden. This burden is further amplified when using widefield illumination strategies,^{9} where photons interact with a much larger boundary area compared to the conventional punctual light source scheme. Therefore, an MC algorithm that has flexibility in representing the arbitrary domain shape is highly desired.

In this work, we present an optimized software platform to solve the inverse problem in FMT with spatially extended arbitrary sources. The novelty of this approach lies in the accurate boundary representation and rapid calculation when employing the mesh-based Monte Carlo (mMC) method.^{15} Based on a currently available mMC code,^{16} an extended hybrid parallel version was implemented to model arbitrary illumination patterns. A comparison was performed between this method and our previously developed vMC method in terms of quantitative accuracy and computational efficiency for whole-body tomographic reconstruction in time-domain. This method was then validated and evaluated by small animal full-body simulations and experiments in tomographic settings.

## 2.

## Method

## 2.1.

### Widefield Mesh-Based Monte Carlo Method

The mMC method utilizes fast ray-tracing techniques to accelerate calculation and is capable of simulating point illumination on complex geometry.^{15} Here the widefield illumination was developed based on the version 0.8 release of mMC as provided at Ref. 16. In order to simulate a realistic widefield source with spatial intensity variance, the illumination patterns were represented as a two-dimensional (2-D)-grid ($1\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{2}\text{\hspace{0.17em}}\mathrm{pixel}$) images with an intensity value assigned at each grid element [Fig. 1(a)]. The initial positions of the total photons were uniformly distributed over the illumination image by using a uniform random number generator. Any photon falling into one pixel had the pixel’s intensity value assigned as its initial packet weight, allowing for arbitrary illumination settings [Fig. 1(b)]. A simulated photon was then projected along the $z$-axis and entered the mesh at an intersection point on the mesh surface. To identify the surface injection point, we applied a ray-tracing step to obtain its 3-D coordinates by testing all surface triangles using a Havel-Herout ray-tracing algorithm.^{17} In case multiple intersection points were identified [Fig. 1(b) and 1(c)], the intersection with the shortest distance to the initial position [Fig. 1(b)] was selected as the injection point on the surface. After a photon’s injection point was determined, its propagation in the media was simulated following the same rules as in classical MC techniques, until it exited the surface or the total time of interest was reached.

Unfortunately, a complete surface triangle test of this additional ray-tracing step imposed a significant computational overhead, particularly when the surface mesh was dense. For example, the calculation time for the ray-triangle intersection test could be $50\times $ more than that for the photon propagation on a mouse model. In order to accelerate the computation time for the widefield method to match the punctual illumination method, a few algorithmic optimizations were implemented, particularly considering the fact that the $x-y$ coordinate of the injection position is dependent on the photon’s initial position, while the $z$ coordinate is dependent on the intersection of the photon and the surface. First, we confined the initial positions to the region of interest (ROI) and the surface triangles in the ROI were picked out [Fig. 1(c), the dark gray surface]. Second, using the projected triangles on the $x-y$ plane, a subset of triangles was selected and stored in memory. For any launched photon, a culling technique^{18} was then implemented to confine ray-triangle tests only to the triangles with their bounding boxes containing the photon’s $x-y$ coordinate. These operations reduced redundant calculations thus allowed for acceleration in computational time.

## 2.2.

### Inverse Model for Reconstruction

The 3-D distribution of a fluorophore’s effective quantum yield $\eta (r)$ can be obtained by solving an integral equation relating the fluorescence signals at time $t$ and $\eta (r)$:

## (1)

$${U}_{F}({r}_{s},{r}_{d},t)={\int}_{\mathrm{\Omega}}W({r}_{s},{r}_{d},r,t)\eta (r)\mathrm{d}r,$$^{19}thanks to the smaller dataset acquired with widefield illumination strategies.

^{9}

^{,}

^{20}In this method, $W$ is computed by convolving the Green’s functions (the light propagation for impulse sources) and the lifetime decay

^{14}:

## (2)

$$W({r}_{s},{r}_{d},r,t)={\int}_{0}^{t}{e}^{-(t-{t}^{\prime})/\tau}\mathrm{d}{t}^{\prime}{\int}_{0}^{{t}^{\prime}}{G}^{x}({r}_{s},r,{t}^{\prime}-{t}^{\prime \prime})\phantom{\rule{0ex}{0ex}}{G}^{m}(r,{r}_{d},{t}^{\prime \prime})\mathrm{d}{t}^{\prime \prime},$$^{21}and $\tau $ is the lifetime of the fluorophore. The system of linear equations representing the measurements detected at different positions and time can then be solved to obtain the fluorophore distribution.

## 2.3.

### Computational Settings and Efficiency Evaluation

For forward simulations, we incorporated the widefield-related computation in the mMC software platform. In addition to the previously implemented multithreading and single-instruction multiple-data (SIMD) parallelism, we further enhanced the modeling efficiency by integrating Message Passing Interface (MPI)-based parallel computing technique with the code. This allows one to run mMC in a distributed memory system, such as a high-performance cluster. To utilize the maximum computational power of platforms with both multithreads and multinodes, an adaptive hybrid parallelization method using the communication protocol MPI and OpenMP APIs was implemented. The MC approach is highly parallelizable, that is, the large number of photons can be broken up into smaller sets and calculated independently. In terms of distribution, the MPI protocol had photons equally divided and distributed to all nodes, while the OpenMP protocols dynamically set the assignment of photons for the threads on a node. The seed for the random number generator was assigned based on the thread and node ID. A reduce operation was performed to sum up the result to a single node (master node) after all nodes finished computing.

In this particular work, the computational efficiency of the hybrid parallel code was tested using single/dual core CPUs (BlueGene/Opteron on CCNI, RPI). To estimate the scalability of multithreading and multi-CPU calculation, quantitative metrics were computed to measure the performance of the parallel programs. We define the speedup as the ratio of the execution time on a single processor (the sequential version) to that on a multinode cluster. This ratio is defined as

where $n$ is the number of threads/processes used in the parallel simulation, ${t}_{s}$ is the execution time on a single-thread processor, and ${t}_{p}$ is the execution time on a parallel computer. $S(n)$ therefore describes the scalability of the system as the number of threads/processes increases. The efficiency was computed as which describes the fraction of the time that is being used by the processors for the computation.## 2.4.

### Accuracy Comparison

Due to a wider illuminated surface, the simulation result of widefield illumination can be more sensitive to errors of boundary modeling than conventional point illumination used in optical tomography. To quantitatively assess the performance of mMC and vMC dealing with different types of illumination on complex models, a simulation using high-resolution voxel-based model was first set up as the reference. The light propagation profiles using both a mesh-based model and a resampled low resolution voxel-based model with the same number of tetrahedral elements/valid voxels (voxels that are not air) were then computed.

To mimic a preclinical tomographic reconstruction, the high-resolution Digimouse model ($0.1\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{3}$) generated from CT data^{22} was employed to create the mesh model and low-resolution voxelized model (Fig. 2). The software package Iso2mesh^{23} was used to convert the volumetric model to a mesh model containing 4075 node, 22,379 tetrahedron elements. The corresponding low-resolution voxel-based model had a comparable number of 22,433 valid voxels resulting in $1\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{3}$ voxels, which is the typical voxel size in preclinical applications. Notice here the mesh model was homogeneously tesselated with no denser sampling around the boundary or internal organs to assure an impartial comparison with the homogeneously voxelized model. The illumination pattern covered a 5- to 7-cm area with full coverage of the transverse plane. A cross-sectional plot is shown in Fig. 2(b) for mesh- and voxel-based model, respectively. In all simulations above, the numbers of photons were kept at ${10}^{8}$ according to Ref. 19. The optical properties were set to ${\mu}_{a}=0.3\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{cm}}^{-1}$, ${\mu}_{s}^{\prime}=15\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{cm}}^{-1}$, $g=0.9$, $n=1.37$, which are typical values for mouse tissue in the NIR spectral region in the literature.^{24} The time profiles were recorded with a 300 ps gate width and 20 ps time shift between the gates to replicate the RPI imaging platform for optimal experimental performance.^{25}

The time-gated profiles of light propagation using the mesh- and high-resolution models were scaled down to the size of the low-resolution profile, then the percentage error of mesh- and low-resolution models in relation to the high-resolution model was calculated to assess the accuracy quantitatively. Further evaluation of the methods were conducted by calculating the overall root mean square difference (RMS difference) between the results using the two methods and high-resolution standard:

## (5)

$$\mathrm{RMS}(t)=\sqrt{\frac{\sum _{i=1}^{N}{[p(t)-{p}_{0}(t)]}^{2}}{\sum _{i=1}^{N}{p}_{0}{(t)}^{2}}},$$## 2.5.

### Tomographic Reconstruction Settings

## 2.5.1.

#### Simulation reconstructions

The above mentioned models were utilized for a full tomographic reconstruction study. The high-resolution mouse model was employed to generate fluorescence measurements with the pre-segmented kidneys labeled with simulated fluorescent markers (effective quantum yield 0.1; $\tau =0.8\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{ns}$). 64 sliding half-space illumination patterns and 64 point detectors were simulated within the abdomen section (x: 50 to 70 mm, y: 8.5 to 28.5 mm, 400 pixels for each pattern) using 512 single-thread nodes on BlueGene. The detectors were evenly spanned over the region of interest with a 2.857 mm separation [Fig. 3(a)]. A 25% rising gate (early gate) was selected and the Born normalization^{26} of the corresponding time point spread functions (TPSFs) was employed as the datatype for reconstruction herein. The conjugate gradient method was used to solve the inverse problem and the iteration was stopped when the relative error was less than 0.02.

## 2.5.2.

#### Experimental validation

The performance of this method was also evaluated with experimental data for FMT. The data were acquired from an *in vivo* experiment using a time-resolved widefield tomography platform (the details of the system can be found in Ref. 25). The system employed a tunable femtosecond laser as the source and a time-gated ICCD camera as the detector. A pico-projector module was used for source pattern generation, allowing for rapid acquisition of spatially and temporally dense measurements. The experiment was performed on a freshly euthanized mouse with a 13-mm-long tube with 1 mm diameter placed in the thoracic cavity [Fig. 4(a)].^{27} The tube contained 14 pmol of IRDye 800CW R(LiCOR) in 1 *μ*L ethanol. 64 sliding bar-shaped illumination patterns were projected over a $33\times 18\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ area (594 pixels to represent each pattern) on the abdomen with 97 point detectors ($0.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{2}$ areas) measuring the transmitted excitation and emission photons. The detectors were empirically selected from the high-resolution CCD camera images over the area where fluorescence signals were detected, with denser detectors at the highly fluorescing region. We acquired a profile covering 4.6 ns with 40 ps offset between two successive gates. The animal was then subsequently scanned by Micro CT (Scanoco Viva CT40) and the CT images were registered with the optical platform to locate the fluorescent tube.

The surface geometry of the region of interest ($40\times 31\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$) was extracted to generate a homogeneously tesselated model with 92,713 elements and 15,581 nodes for the mesh-based weight matrix calculation. The entire model was assigned the average background properties of the small animal acquired by spectroscopic fitting of the excitation measurements (${\mu}_{a}=0.3\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{cm}}^{-1}$ and ${\mu}_{s}^{\prime}=25\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{cm}}^{-1}$). The results of the forward simulations were stored in both node-based format (photon package weights are accumulated at each node) and element-based format (the result is saved for each element) to calculate the weight matrix. The number of nodes was only 16.8% of that of elements, thus the node-based format had advantages in both storing space and speed for data processing. The reconstructions employing weight matrices using the two formats were compared, to assess the possible loss of accuracy of using node-based format. The early-gate at 25% of the maximum value was selected in reconstruction, similar to the synthetic study. For comparison, a reconstruction employing the same measurement dataset on a voxel-based model with 92,831 non-air isotropic $0.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{3}$ voxels was performed.

## 3.

## Results

## 3.1.

### Computational Efficiency

Our implementation allows the generation of spatially complex illumination patterns over arbitrary oriented surfaces to accurately model noncontact widefield illumination strategies. After the optimizations mentioned in Sec. 2.1, the overall computational overhead of simulating extended time-resolved sources in this complex scenario was only marginally larger (5% to 10%) over a point source configuration, while providing great flexibility to define complex sources.

Figure 5(a) shows the speedup of mMC calculation while using different threading settings with ${10}^{7}$ and ${10}^{8}$ photons. For simulations using a larger number of photons, the speedup curve has a marked improvement toward the ideal curve, implying higher efficiency ($E(16)=93.2\%$ for ${10}^{7}$ photons and $E(16)=99.3\%$ for ${10}^{8}$ photons with single-thread computation setting). The difference between the three threading methods is minimal ($<3\%$) when the same number of photons is employed, with pure threading being the closest to ideal. However, the current generation of multicore hardware has a limitation on total number of threads and thus limits the speedup potential of threading-only technique.

Figure 5(b) depicts the time cost for mMC and vMC using different number of processes under single-thread parallel settings while having similar numbers of elements/valid voxels. Under all available processes settings (128 to 4096), mMC remains roughly four to five times faster than vMC, while for large number of processes (4096) the program approaches its scaling bottleneck.

## 3.2.

### Accuracy of Time-Domain Jacobians

Figure 6(a) and 6(b) shows the forward photon propagation profiles at $y=60\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ for the full field and point illumination, respectively. As expected, mMC fits the boundaries much better compared to vMC for widefield case, while both mMC and vMC matches well to the standard for point source case. Figure 6(c) presents the TPSFs for these two illumination settings at the same detector indicated in Fig. 6(a) and 6(b). Both the results using vMC and mMC methods fit the standard well for the point illumination, while the TPSFs for widefield illumination result in increased error for vMC compared to mMC, especially around the rising part (early gates) of the TPSF. A comparison of the different datatype extracted from the time-resolved measurements at these two detectors are listed in Table 1, with the continuous wave (CW) measurement being a percentage error and the other parameters as the absolute errors in time. The error in the first moment of TPSF (CW measurements) reflects the integrated intensity difference and that in the second and third moments (mean flight time and variance) represent a change in position and shape of the TPSF. Overall, the mMC results are more accurate than the vMC ones, while the inaccuracy at the side detector (detector 1) is greater than that at the center detector (detector 2), due to the fact that the side detector is closer to the edge, thus more affected by the mismatched boundary area. Comparing the widefield and point illumination, the error is greater in the widefield case as expected, as a result of an illumination pattern probing larger boundary area where mismatch boundaries exist.

## Table 1

Error table comparing the low-resolution vMC and mMC with the standard. The CW measurement error is in percentage, the others are in absolute time.

CW(%) | Tmax (ps) | Mean flight time (ps) | Variance (ps) | ||||||
---|---|---|---|---|---|---|---|---|---|

Det 1 | Det 2 | Det 1 | Det 2 | Det 1 | Det 2 | Det 1 | Det 2 | ||

Widefield | mMC | −1.29 | −0.81 | 40 | 20 | 20.81 | 15.94 | −3.93 | −2.19 |

vMC | −3.35 | 1.99 | 80 | 0 | 67.83 | 18.81 | −8.51 | −3.00 | |

Point | mMC | −1.62 | −0.38 | 0 | 20 | −6.96 | 20.93 | 1.55 | −3.43 |

vMC | 2.48 | 0.73 | 0 | 20 | 21.12 | 18.81 | −3.30 | −3.00 |

The contours of continuous wave Jacobians using both the mMC and vMC methods are shown in Fig. 7(a) and 7(b) for widefield and point illumination, respectively. In Fig. 7(c) and 7(d), the Jacobian contour comparison at the 25% rising gate is displayed. In both cases, contours using the mMC and vMC methods match the reference contour well for the point-point SD pair. However, the contour using the mMC method fits the reference much better than that using the vMC method in a widefield-point SD setting, especially around the boundaries where the voxel model experiences increased mismatch to the high-resolution model. An average RMS difference of 10.8% and 9.4% is achieved in the point-point SD pair Jacobians using vMC and mMC, respectively, while the RMS difference is 23.5% and 13.4% for the widefield-point SD pair. The values may be compared to a baseline RMS difference of 2.4% and 3.8%, obtained using two high-resolution simulations with ${10}^{9}$ photons and different random seeds. The Jacobian accuracy comparison in time-domain is shown in Fig. 7(c). For the point-point SD pair, the RMS differences using the two methods do not experience significant change with respect to the changes in gate position. However, for the widefield case, the mMC has a considerably lower RMS difference when compared to the vMC for early gates.

## 3.3.

### Simulation Reconstruction Performance

The reconstruction using mesh-based MC methods is shown in Fig. 8. The kidneys are accurately reconstructed in regard to their original discretization. The maximum reconstructed value for the mesh-based MC is 91.0% of the expected value. The centroids of the two kidneys have position error of 1.02 mm and 0.78 mm, respectively. The maximum reconstructed value for the mMC method is 3.4% more accurate toward the expected value compared to the vMC method. The run time for creating the time-domain mesh-based Jacobian was 3.47 h for $64\times 64$ SD pairs (in total 4096) on 512 nodes.

## 3.4.

### Experimental Comparison

The 50% isovolume of the reconstructed effective quantum yield using the mMC node-based results is shown in Fig. 4(a). This isovolume has a maximum length of 11.3 mm and mean diameter of 2.7 mm. Moreover, sample slices of cross-sections overlaid on the corresponding CT slices demonstrates the accurate localization of the inclusion. The element-based (92,713 elements) and voxel-based (15,581 nodes) reconstructions are also shown for comparison in Fig. 4(b) and 4(c), respectively. The node-based and element-based weight matrices result in very close reconstructions in both quantification ($<2\%$ difference in maximum reconstructed value) and position of the inclusion. The centroid position error of the 50% isovolume using vMC is 2.83 mm while that using mMC is 0.56 mm.

## 4.

## Discussion

In this study, we successfully implemented the widefield illumination for calculating the propagation of light through arbitrary 3-D media in mesh-based models. After implementing an optimized ray-test step for determining the injecting point for each photon, computational time for widefield illumination simulations becomes only marginally longer (5% to 10%) than point-source simulations. This is a remarkable improvement compared to the 50 times increase when no optimization was employed in the widefield illumination.

Moreover, the parallelization of the code is a significant advancement of the computation capacity. The aim of developing a mixed mode MPI/OpenMP code was to utilize the full computation availability of single symmetric multiprocessors (SMPs) or a cluster of SMPs, and achieve a performance improvement over a pure MPI code. As expected, when the number of threads/processes is the same, the most efficient parallelization implementation is using pure multithreading (OpenMP). Although the number of threads on a single SMP is restricted at current time (usually less than eight), larger systems of single SMP could become available in the foreseeable future with 64-bit memory addressing. The efficiency comparison of hybrid code with the pure MPI implementation reveals that a performance improvement is achieved: the overall computational time has reduced and the scaling of the code with increasing thread number is better. Hence a combination of MPI and OpenMP parallelization paradigms provides more efficient parallelization strategy. In addition, the hybrid code makes full use of the memory on each node due to the shared memory mechanism, allowing for the possibility of running larger-scale programs. However, the improvement in time is quite small since the pure MPI parallelization algorithm for forward MC method is optimized and scales very well already, with no significant impairment due to load imbalance or memory limitation. Overall, the three parallelization implementations have excellent scalability with almost linear acceleration when a parallelized platform is available. Thus when more threads/processes are utilized, the faster the simulation will be.

The widefield implementation enables the ability to perform simulations with spatial variation in illumination intensity, which is essential for the emerging pattern light tomography applications.^{9}^{,}^{28}29.^{–}^{30} Compared to the conventional punctual illumination, the widefield light illumination results in more errors on the boundaries where the model mismatch exists due to the larger illumination area, in both the forward propagation and Jacobians. When the time-gated information is considered, the RMS difference comparison represents a significant improvement in accuracy for mMC at early gates, yet with an acceleration of $\sim 5\mathrm{x}$ compared to vMC. This is of importance as the early gates are not modeled accurately by diffusion equation while the incentive of using early gates to increase spatial resolution in FMT is rising. However, this accuracy improvement in Jacobian does not necessarily relate to the reconstruction performance (13.2% discrepancy in the RMS differences for Jacobian at the selected time gate, 3.4% difference in the maximum reconstructed value comparing the mMC and vMC reconstruction). This can be explained by the fact that the disparity mainly exists around the boundaries while the expected fluorescent objects are closer to the center.

In the current version of the code, the time-of-flight between the illumination plane and the object surface was ignored, however this can be easily corrected with a postprocessing. In addition, according to the light propagation velocity in air, the maximum error in time for the typical specimen thickness (around 2 cm) in preclinical applications is $\sim 3\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{ps}$, which is negligible compared to the minimum time interval of 20 ps for the time-resolved platform. Another potential difficulty for *in vivo* assessment may reside in the availability of an anatomical atlas, as high-resolution imaging modalities are required to obtain accurate surfaces for mesh-based reconstructions.

As in a few examples, this code can be used for investigating the fluorophore distribution with time-gated data type while serving as the forward solver for MRI-guided FMT. In this work, the mesh model for the forward and reconstruction processes was confined in homogeneous tissue and uniformly tessellated; however, it is noteworthy that it can also present in a highly heterogeneous medium with arbitrary internal/external boundaries while retaining a small number of nodes and elements, thanks to its flexibility. The representation of mesh-based models for high-resolution 3-D anatomic images can be further optimized (e.g., reducing the nodes while keeping the geometry unchanged) to benefit the accuracy of the reconstructions.

The rationale of similar results using a node- and element-based weight matrix is that these methods render essentially the same spatial profiles of photon propagation, though the number of nodes for both cases are much less than that of tetrahedron elements (15,581 nodes compared to 92,713 elements). Furthermore, in the ill-posed inverse problem, the effective information mainly relies on the measurements but not the image-space spatial distribution when the number of measurements (dependent on the number of SD pairs and time gates selected) is much less than that of the unknowns. Casting nodes as the unknowns in the inverse problem leads to a much smaller unsolved linear system,^{31} which is superior as the memory constraint is usually a burden in reconstructions, especially when the time-resolved data are employed to provide extra information. Due to this advantage in memory utilization, employing nodes instead of elements also allows for the use of denser source-detector pairs, which is crucial for improving resolution of the reconstructed objects.

In addition, the formulation for diffuse optical tomography (DOT) using the MC method is readily available for the adjoint method, thus the mMC method can be easily applied to DOT to quantitatively determine the functional parameters (blood volume and hemoglobin oxygenation) related to the optical properties as well.^{32} As FMT/DOT is becoming frequently used in combination with high-resolution imaging methods such as MRI, CT, x-ray, and ultrasound,^{33} mMC can serve as an effective tool to incorporate high-resolution structural information into FMT/DOT with limited spatial resolution for quantitative functional and molecular studies.

## 5.

## Conclusion

This work focuses on assessing the feasibility and evaluating the computational performance of the widefield mMC algorithm for time-resolved FMT, and the improvements on the quality of models with complex boundaries. We demonstrate that the mMC method is a computationally efficient solution for tomographic use when modeling of general media is required (about five times faster than vMC). Simulations on a complex digital mouse phantom establish that both mMC and vMC match well with the high-resolution vMC output in temporal and spatial profiles for punctual SD settings; however, for widefield illumination, the accuracy for mMC is improved compared to that for vMC. Moreover, with the recent progress of MC simulation using GPU,^{34}^{,}^{35} it is expected that a full reconstruction can be finished with a typical personal computer in the time frame comparable to the classic DE-based models. The current version of this code is available for download as MMC v1.0 at http://mcx.sourceforge.net/mmc/, and more features may be added, such as curved illumination patterns using a surface mesh.

## Acknowledgments

This work was supported by National Institutes of Health (NIH) grant R21 EB013421 and by the National Science Foundation (NSF) grant CBET-1149407. Qianqian Fang wishes to thank the funding support from the Bill & Melinda Gates Foundation (#OPP1035992).