## 1.

## Introduction

Diffuse fluorescent tomography (DFT) is being increasingly used as a noninvasive and highly sensitive method to image the fluorescent properties of tissues that report physiopathological processes in intact organisms, preferably using near-infrared radiation.^{1}2.3.^{–}^{4} And its methodology has been extensively explored within the framework of three measurement techniques in diffuse optics.^{5}^{,}^{6} In comparison with the continuous-wave (CW) and frequency-domain (FD) modes, the time-domain (TD) approach acquires much richer information about the fluorophore distribution in a living tissue and, therefore, offers potential to achieve higher spatial resolution, better recovered contrast, as well as effective separation among multiple properties (yield and lifetime) and compositions.^{7}8.9.^{–}^{10} Gao et al. have developed a finite-element time-difference photon diffusion modeling to solve the time-dependent coupled diffusion equations. Results show that the TD–DFT simultaneous reconstruction of the fluorescent yield and lifetime in a two-dimensional (2-D) circular domain can achieve superior quantitative accuracy, spatial resolution, and noise robustness, than using CW or transformed FD reconstructions.^{11} Zhu et al. have also applied finite element based TD–DFT model to a three-dimensional (3-D) mouse model. Simulative results demonstrate that the incorporation of early and late time-bins can effectively improve the image accuracy.^{12} However, since the photon migration modeling, i.e., the forward calculation, normally requires adequate recursive steps to temporally resolve the relevant physical quantities (boundary flux and interior photon density) at a numerically stable interval, the conventional approach of TD–DFT image reconstruction that explicitly uses the full time-resolved (TR) data with no parallelization potentials usually leads to a lengthy computational process, especially for 3-D scenarios and large-sized tissues, such as those in small animal whole-body imaging or optical mammography.^{13}14.15.^{–}^{16}

Until now, many approaches have been proposed to reduce the computational complexity of the TD–DFT problems.^{17}18.19.^{–}^{20} Holt et al. have proven that proper multiple time gates are important to provide superior image accuracy in TD–DFT when compared to the CW-DFT method.^{17} Furthermore, an overlap time gate is applied by Li to achieve better spatial resolution and fidelity, compared to those using the CW mode and traditional multiple time gate.^{21} Most of the aforementioned approaches are based on a featured-data scheme utilizing either Mellin or Laplace transformation of the time-dependent diffusion model, which has been proven to be computationally simple and demonstrate redundancy reduction.^{19}^{,}^{20} However, since the high-order Mellin or real-number Laplace transformation is prone to be highly correlated, increasingly subjected to noise, and/or mathematically incomplete, simulative and experimental reconstructions with such a scheme both exhibit inferior quantitative performance and poor spatial resolution than directly using the full time-resolved measurements, severely limiting its application in high-fidelity imaging. Therefore, to uncover the potential of TD–DFT for clinically required accuracy and spatial details in an effective way, there is a critical need to effectively use the full profiles of the TR measurements.

Since the TD system is mathematically equivalent to the combination of many FD ones employing a wide range of modulation frequencies, a Fourier transformation can therefore be used to possess the approximate optical information about the fluorophore distribution in a TD measurement mode. Recently, Pulkkinen et al. have employed a truncated Fourier-series approximation to simplify the finite-element solutions of the TD radiative transfer equation. Simulations on a homogeneous rectangular computational domain show that although the method with truncated Fourier-series approximation requires more memory and time to solve the equation of a singular frequency component, than that with the time-difference solution of a single temporal step, the total time of the former with 35 frequency components is only 1/5th that of the latter with 400 temporal steps, which greatly shortens the calculation time.^{22} However, the studies are only for the forward calculations of a 2-D domain. Ziegler et al. have applied the Fourier transformation to the nonlinear DFT reconstruction of a 3-D compressed breast-simulating phantom in a transmission-reflection measurement pattern, where the nearest source–detector distance is 10 mm, and only three angular frequency components are needed to obtain an accurate TR data.^{23} However, the number is severely insufficient for applications with a smaller source–detector distance or larger size, where quite a number of frequency components and a long computation time are required. On that occasion, it is more significant to expand the truncated Fourier-series strategy to the full TR-DFT reconstruction problem, where a great challenge lies in the computational efficiency and severely increased unsuitability of obtaining the accurate anatomical structure and pathological information, due to the rapid growing in the number of the unknowns and the nonlinear dependency of the measurements on the optical properties. Conveniently, a parallelization strategy is necessary and easily utilized in the reconstruction.

To better take advantages of the TR measurements while effectively reducing the lengthy computation time incurred from the requisite temporal steps, we herein propose a parallelized Fourier-series truncated diffusion approximation (DA) to accelerate the full TR-DFT reconstruction. In our scheme, the time-dependent photon density is expanded to a Fourier series, based on which the TD diffusion equation (DE) can be solved through a series of independent FD–DEs at multiple sampling frequencies. Moreover, a combined multicore CPU-based coarse-grain and multithread GPU-based fine-grain parallelization strategy is employed to accelerate the process. The resulting temporal point spread functions (TPSFs) are tested in comparison with those of TD finite-difference (TDFD) solution and Monte Carlo (MC) simulation. With such forward calculations, both the TD and FD inversion procedures are developed based on a block algebraic reconstruction technique (ART). Simulations and phantom experiments are performed to compare with the explicit TD scheme to validate the accuracy and computational resources of the proposed method.

## 2.

## Methods

In general, the goal of DFT reconstruction (referred to as the inverse problem) is to quantitatively derive the distributions of the fluorescent properties in a tissue domain, by fitting the model predictions (referred to as the forward problem) to the diffusive excitation and emission light measurements on the boundary.

## 2.1.

### Fourier-Series Truncated DE for Full TR Data

For the TD forward problem, the model-predicted boundary flux is calculated using the Fick’s law with the Robin boundary condition, under the Galerkin finite-element method (FEM) to transform the continuous DE into an $N$-dimensional discrete problem of finding the nodal field photon density $\mathrm{\Phi}({r}_{i},t)$ at all nodes ${r}_{i}(i=\mathrm{1,2},\cdots ,N)$; the resulting matrix equation can be given by^{24}

## (1)

$$\{\begin{array}{l}[\mathbf{K}({\kappa}_{\nu})+\mathbf{C}({\mu}_{a\nu}c)]{\mathbf{\Phi}}_{\nu}(t)+\mathbf{B}\frac{\partial {\mathbf{\Phi}}_{\nu}(t)}{\partial t}={\mathbf{Q}}_{\nu}(t)-{\mathbf{b}}_{\nu}(t)\\ {\mathbf{\Gamma}}_{\nu}(t)=\frac{c}{2}\frac{1-{R}_{f}}{1+{R}_{f}}{\mathbf{\Phi}}_{\nu}(t)\end{array},$$To reduce the amount of the full TR data that is too large to be handled on the hardware within an acceptable period, one can either use the temporal filters or carry out the calculation in the FD. In our approach, we choose the second strategy by approximating the TD photon density $\mathrm{\Phi}(r,t)$ as a weighted combination of the truncated Fourier series at multiple frequency components:

## (2)

$$\mathrm{\Phi}(r,t)=\sum _{k=-{N}_{\omega}}^{{N}_{\omega}}\mathrm{\Phi}(r,{\omega}_{k})\mathrm{exp}(j{\omega}_{k}t),$$Thus, the TD forward calculation can be obtained by solving a series of uncorrelated FD–DEs.

## 2.2.

### Full TR–DFT Method with Fourier-Series Truncated DE

Before directly using the TR measurement for the image reconstruction, several necessary data preprocessing steps are acquired. First, as only a finite interval of the collected TPSF data is of interest, an identical start–end time is chosen to cover the most effective range of the TPSF curve, for an efficient implementation of the subsequent Fourier transformation. Second, to reduce the amount of calculation and effectively enhance the noise robustness of the reconstructions while still preserving the advantages of the full TR scheme, an overlap-delaying time gate method is employed to deal with the measurements.^{21} Third, a self-normalized data type as described in Ref. 11 is employed in the research to avoid the required scaling-factor calibration and spectroscopic characterization of the experimental system, and effectively reduce the adversity of system fluctuations that may occur between excitation and emission measurements.

To simplify the validation of the proposed method, we only consider the reconstruction of fluorescent yield, whose Jacobian matrix can be efficiently calculated by

## (4)

$$J({\xi}_{d},{\zeta}_{s},{\omega}_{k})\phantom{\rule{0ex}{0ex}}=\frac{c}{W}\sum _{j}{\langle {G}_{m}({\xi}_{d},\mathbf{r},{\omega}_{k})\xb7{\mathrm{\Phi}}_{x}(\mathbf{r},{\zeta}_{s},{\omega}_{k})\xb7\frac{1}{1+i{\omega}_{k}\tau}\rangle}_{{\mathrm{\Omega}}_{n\to j}}V({\mathrm{\Omega}}_{n\to j}),$$^{25}$\tau $ is the fluorescence lifetime.

At last, the block ART technique is employed here for the reconstruction of the fluorescent properties, allowing a group of equations that belong to the same source–detector pair to perform concurrently in each block.^{26}

According to the data processing in the TD or FD, we herein develop two inversion procedures, with the diagrams shown in Fig. 1. Figure 1(a) is the flow chart of the Fourier-series truncated DA-based DFT with time-block ART. In this DFT scheme, the TR measurements ${\widehat{\mathrm{\Gamma}}}_{x}({\xi}_{d},{\xi}_{s},t)$ and ${\widehat{\mathrm{\Gamma}}}_{m}({\xi}_{d},{\xi}_{s},t)$ are preprocessed using the overlap-delayed time-gating method; then its self-normalized data type $\chi ({\xi}_{d},{\xi}_{s},t)$ is ready for the reconstruction. On the other hand, the Green’s function ${G}_{m}({\xi}_{d},\mathbf{r},\omega )$ and the photon density ${\mathrm{\Phi}}_{x}(\mathbf{r},{\zeta}_{s},\omega )$ that are calculated from Eq. (3) are used for producing the Jacobian matrix $J({\xi}_{d},{\zeta}_{s},{\omega}_{k})$ according to Eq. (4). After an inverse Fourier transformation, the TD Jacobian matrix $J({\xi}_{d},{\zeta}_{s},t)$ can be applied to the time-block ART. Another inversion procedure is the Fourier-series truncated DA-based DFT with frequency-block ART, as the flow chart shown in Fig. 1(b). The Fourier-transformed frequency measurements $\chi ({\xi}_{d},{\xi}_{s},{\omega}_{k})$ are herein directly used in the frequency-block ART, with the FD Jacobian matrix $J({\xi}_{d},{\zeta}_{s},{\omega}_{k})$.

## 2.3.

### Selection for the Sampling Frequency

According to the FD sampling theorem, the sampling pulse function is ${\delta}_{{\omega}_{s}}(\omega )=\sum _{n=-\infty}^{\infty}\delta (\omega -n{\omega}_{s})$, where ${\omega}_{s}=(2\pi /{T}_{s})$ (${T}_{s}$ denotes the sampling period) is the sampling frequency.

Since the frequency spectrum of the temporal shaped incident light pulse is continuously decaying in amplitude as a function of frequency and the higher-frequency components always show lower signal-to-noise ratios, the truncated Fourier-series approximation only takes into account the ${N}_{\omega}$ smallest frequencies of the incident light.

The periodicity of the radiance due to truncated Fourier-series approximation is not always physically realistic. According to the diffusion theory, the light–tissue interaction will cause the spreading of the propagating light pulse in temporal and spatial domains. The further away from the light source the pulse is inspected, the more spreading it will generally be, causing narrowing of its Fourier spectrum, which means that with the increase of the source–detector distance, fewer Fourier components are required to obtain the approximate pulse of a certain accuracy.

Therefore, to recover the TPSF curves with a balance of accuracy and computation time, we herein use the maximum measured time ${T}_{m}$ of all the detectors, based on the Nyquist theorem ${T}_{s}>2{T}_{m}$, to decide the sampling rate. In this work, a GPU-accelerated fluorescence MC simulation is used to estimate the maximum measured time ${T}_{m}$.^{27}

## 2.4.

### Parallelization Strategy

Due to the intrinsic parallelization mechanism of the frequency components $\mathrm{\Phi}(r,{\omega}_{k})$, as well as the element-dependent feature of the stiffness matrices $\mathbf{M}$ and $\mathbf{B}$ in the forward calculation and the Jacobian matrix $\mathbf{J}$ in the inverse implementation,^{28} the FD–DEs in Eq. (3) under multiple frequencies can be easily paralleled using a combined multicore CPU and multithread GPU parallelization strategy.

The flow chart of the parallelization strategy for the full TR-DFT reconstruction is illustrated in Fig. 2. First, an ${N}_{\omega}$-core CPU is employed for the coarse-grain parallelization, on which each FD–DE is performed independently. Second, in each FEM solution of FD–DE, the time-consuming calculations of the stiffness matrices $\mathbf{M}$ and $\mathbf{B}$, and the Jacobian matrix $\mathbf{J}$ are concurrently executed using thousands of threads in the GPU, respectively. Furthermore, in the multisource incentive issues, the FD–DE is often solved using the lower upper (LU) decomposition in the real field form. However, due to the lack of structure of the sparse matrix and the dependency among the elements, the LU decomposition is quite time-consuming and unable to be accelerated by optimization of the procedure or other parallelization strategy. For this purpose, we use the following matrix manipulation under a GPU acceleration mechanism:

## (5)

$$\left[\begin{array}{l}\mathrm{Re}{\mathbf{\Phi}}_{\nu}({\omega}_{k})\\ \mathrm{Im}{\mathbf{\Phi}}_{\nu}({\omega}_{k})\end{array}\right]=\left[\begin{array}{l}({\mathbf{B}}^{-1}\mathbf{M})/{\omega}_{k}\mathbf{E}\\ -\mathbf{E}({\mathbf{B}}^{-1}\mathbf{M})/{\omega}_{k}\end{array}\right]{[({\mathbf{MB}}^{-1}\mathbf{M})/{\omega}_{k}+{\omega}_{k}\mathbf{B}]}^{-1}\phantom{\rule{0ex}{0ex}}\left[\begin{array}{l}\mathrm{Re}[{\mathbf{Q}}_{\nu}({\omega}_{k})-{\mathbf{b}}_{\nu}({\omega}_{k})]\\ \mathrm{Im}[{\mathbf{Q}}_{\nu}({\omega}_{k})-{\mathbf{b}}_{\nu}({\omega}_{k})]\end{array}\right],$$The aforementioned accelerated calculations using the multithread GPU are referred to as fine-grain parallelization strategy. The whole GPU is divided into ${N}_{\omega}$ blocks, each corresponding to a frequency component and performing respective calculations, leading to thousands of speed-ups in theory. Overall, the proposed parallelization strategy could effectively accelerate the calculation of the Fourier-series truncated DA.

## 3.

## Results

## 3.1.

### Forward Results from Simulations

To simulate the fluorescent target embedded in the tissue, we consider throughout this study a fluorescent cylindrical region with a radius of $r=3.75\text{\hspace{0.17em}}\text{\hspace{0.17em}}mm$ and a length of $L=40\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, embedded in a cylinder background with a radius of $R=\phantom{\rule{0ex}{0ex}}15\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ and a length of $L=40\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ at the location of ($\rho =\phantom{\rule{0ex}{0ex}}7.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, $\theta =0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{deg}$, $\mathrm{Z}=20\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$), with the top view shown in Fig. 3(a). The optical properties of the background and target are assumed to be the same at both the excitation and emission wavelengths with the values of ${\mu}_{ax,m}^{(B)}={\mu}_{ax,m}^{(T)}=0.004\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$ and ${\mu}_{sx,m}^{(B)}={\mu}_{sx,m}^{(T)}=1.0\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$ ($B$ denotes the background and $T$ denotes the target), which is well justified since the optical properties change slightly with wavelength in the spectral range considered, and the Stocks shift of the fluorescent dye is comparable to the width of its fluorescence band. The fluorescent yield and lifetime are set to be $\eta {\mu}_{af}^{(B)}=1\times {10}^{-5}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$, $\eta {\mu}_{af}^{(T)}=\phantom{\rule{0ex}{0ex}}0.001\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$, and ${\tau}^{(B)}={\tau}^{(T)}=1000\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{ps}$. For the forward model, the excitation re-emission and fluorescent emission are temporally measured at $\theta =0$, 90, and 180 deg of the horizontal plane of $Z=20\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ (shown as D1, D2, and D3 in the figure, respectively), under a $\delta $–shaped exciting laser pulse at a polar angle of $\theta =0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{deg}$. The FEM mesh contains 60,750 triangle elements that join at 11,536 nodes. A workstation with eight 3.60 GHz processors of Intel Core i7-3820, a total RAM of 32 GB, and a GPU card of Nvidia GeForce GTX 660 is used as the execution platform. All the eight CPU cores are used for the parallelized process.

To investigate the accuracy of the parallelized Fourier-series truncated diffusion approximation (PFST–DA) in recovering the full TR data, we compare the normalized TPSF curves of the excitation and emission at the measuring sites, using the proposed PFST–DA, TDFD, and MC methods. According to the prior knowledge of the fluorescence MC simulation, the maximum flight times of the fluorescence signal at the farthest and nearest measuring positions are 8000 and 125 ps, respectively. Thus, the fundamental frequency is $\pi /8000$, and the total number of Fourier components is 64. To achieve an adequate accuracy of the forward calculation while avoiding an excess computational cost,^{11} 400 temporal steps and ${10}^{10}$ photon packages are used for the TDFD solution and the GPU-accelerated fluorescence MC simulation, respectively. Figures 3(b)–3(d) are the normalized TPSF curves of the excitation and emission light at the measuring sites using the three methods. From left to right are the light detected at D1, D2, and D3, respectively. For the excitation light, the max relative errors of the PFST–DA method to the TDFD method and MC simulation at the three detection sites are $<0.24$ and 0.2%, respectively. Although the emission results of the PFST–DA are a little worse than the excitation ones, the values are still within 0.35%. As to the computation time, the one with the PFST–DA scheme is 205.46 s, much less than that of the TDFD scheme by 2655.17 s.

To verify the accuracy of the selection criterion for the sampling frequencies, we assess the influence of the number of sampling frequencies on the forward calculation. A series of ${N}_{\omega}=20$, 40, 64, and 160 sampling frequencies are used for the PFST–DA method with the detection site D2 shown in Fig. 3(a). Compared normalized TPSF curves for the excitation and emission light are shown in Figs. 4(a) and 4(b), respectively. The maximum relative difference between that of ${N}_{\omega}=64$ and that of ${N}_{\omega}=160$ is only 0.08%, which is considerably accurate. It is worth noting that, for assuring an adequate accuracy of the forward calculation without any excess computational cost, the number of the sampling frequencies is enough to be decided according to the selection criterion in Sec. 2.3.

## 3.2.

### Reconstruction Results from Simulations

The performance of the proposed method is evaluated for a cylinder background with a radius of $R=15\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ and a length of $L=40\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, embedded with one and two fluorescence targets, called samples 1 and 2. A total of 64 coaxially aligned source–detector optodes are hierarchically and equally distributed around the cylinder by four layers at the horizontal planes of $Z=5$, 15, 25, and 35 mm. To make the simulated data realistic, the measurements are generated by a GPU-accelerated MC simulation with ${10}^{10}$ photon packages for each source radiation. We compare four reconstruction methods under the ART technique: the CW method with ART, the TDFD method with time-block ART (tTDFD), the PFST–DA method with time-block ART (tPFST–DA), and the PFST–DA method with frequency-block ART (fPFST–DA).

First, sample 1 is employed to compare the reconstruction results in terms of the maximum and the full width at half maximum (FWHM), respectively. The fluorescence target in sample 1 locates at ($\rho =7.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, $\theta =0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{deg}$, $\mathrm{Z}=22.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$) with a radius of $r=2.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ and a length of $l=15\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$. The reconstructed yield images at the horizontal slice of $Z=22\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ and a vertical slice of $Y=0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ using the four reconstruction strategies with a target yield of $\eta {\mu}_{af}^{(T)}=0.001\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$ are shown in Fig. 5; so are the corresponding $X$-profiles and $Z$-profiles. Similarly, Figs. 6(a) and 6(b) and Figs. 6(c) and 6(d) are the corresponding profiles for the target yield of $\eta {\mu}_{af}^{(T)}=0.002$ and $0.003\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$, respectively. Further comparisons among the four reconstruction strategies with respect to the reconstructed peak value and FWHM are shown in Table 1. It can be observed that the locations and sizes recovered by the three full TR methods are more in agreement with the true locations. The peak values of the tPFST–DA method at the three realistic targets are 2.16, 2.05, and 2.02%, a little lower than that with the tTDFD method, whereas in the fPFST–DA method, the values are 3.33, 4.26, and 4.58%, respectively. As for the computational time, it takes $\sim 624.56\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{min}$ for the tTDFD method to reconstruct the yield images, which is $\sim 1.15$ and 9.83 times those with the tPFST–DA and the fPFST–DA methods, respectively, at an approximate number of projection. The results reveal that, compared to the tTDFD method, the two PFST–DA methods can obtain the approximately accurate reconstructed images with much shorter computation time, especially with the fPFST–DA method.

## Table 1

Comparison among the four reconstruction methods for simulation scenarios of sample 1.

Method | Yield | ||||||
---|---|---|---|---|---|---|---|

0.001 (mm−1) | 0.002 (mm−1) | 0.003 (mm−1) | |||||

Peak value (mm−1) | FWHM- X/Z (mm) | Computation time (s) | Peak value (mm−1) | FWHM- X/Z (mm) | Peak value (mm−1) | FWHM- X/Z (mm) | |

CW | $8.0818\times {10}^{-4}$ | 8.5/15.0 | 389.57 | $1.6047\times {10}^{-3}$ | 8.5/15.0 | $2.4012\times {10}^{-3}$ | 8.5/15.0 |

tTDFD | $9.8269\times {10}^{-4}$ | 7.0/14.5 | 40,218.01 | $1.9522\times {10}^{-3}$ | 7.0/14.5 | $2.9217\times {10}^{-3}$ | 7.0/14.5 |

tPFST–DA | $9.6144\times {10}^{-4}$ | 7.5/14.5 | 32,971.28 | $1.9124\times {10}^{-3}$ | 8.0/14.5 | $2.8628\times {10}^{-3}$ | 7.5/14.5 |

fPFST–DA | $9.5001\times {10}^{-4}$ | 8.0/15.5 | 4071.36 | $1.8690\times {10}^{-3}$ | 8.0/15.0 | $2.7878\times {10}^{-3}$ | 7.5/14.5 |

For sample 2, the two fluorescence targets are with the same radius and length as that in sample 1 at the locations of ($\theta =\phantom{\rule{0ex}{0ex}}0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{deg}$, $Z=22.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$) and ($\theta =180\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{deg}$, $Z=12.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$), under a fixed target fluorescent properties of $\eta {\mu}_{af}^{(T)}=0.001\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$. To compare the four reconstruction methods by the spatial resolution, the center-to-center separations (CCS) of the two targets are set as 8, 11, and 14 mm, respectively, with a measure defined as $R=[{(\eta {\mu}_{af})}_{\mathrm{max}}-{(\eta {\mu}_{af})}_{(x=0\text{\hspace{0.17em}}\mathrm{mm})}]/[{(\eta {\mu}_{af})}_{\mathrm{max}}-{(\eta {\mu}_{af})}_{\mathrm{min}}]$ in the $X$-profile of the reconstructed yield image. Figures 7Fig. 8–9 illustrate the reconstructed yield images at the horizontal slice of $Z=18\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ and a vertical slice of $Y=0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, with the corresponding profiles along the $X$ axis and $Z$ axis. Further comparison among the four reconstruction strategies is shown in Table 2. The images in Figs. 7 and 8 reveal that the tTDFD method can best separate the two targets, similar to the tPFST–DA and the fPFST–DA methods, and much better than the CW method. This is because the tTDFD method carries the richest and truest information about the targets. As of Fig. 9 where $\mathrm{CCS}=14\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, the spatial resolution by the fPFST–DA method is the highest, where the spatial resolutions of the two PFST–DA methods are even 0.64 and 3.53% better than that of the tTDFD method, respectively, which may be explained as the reconstruction quality of ART algorithm is related to an optimal iteration number and relaxing factor. In this situation, the parameters are more fit for the fPFST–DA method.

## Table 2

Comparison among the four reconstruction methods for simulation scenarios of sample 2.

Method | CCS | |||||
---|---|---|---|---|---|---|

14 (mm) | 11 (mm) | 8 (mm) | ||||

Peak value left/right (mm−1) | R | Peak value left/right (mm−1) | R | Peak value left/right (mm−1) | R | |

CW | $5.3406\times {10}^{-4}/5.1345\times {10}^{-4}$ | 0.5205 | $4.8363\times {10}^{-4}/4.7913\times {10}^{-4}$ | 0.2053 | $4.9975\times {10}^{-4}/4.9100\times {10}^{-4}$ | 0.1073 |

tTDFD | $8.6000\times {10}^{-4}/7.8663\times {10}^{-4}$ | 0.9294 | $7.0558\times {10}^{-4}/6.4973\times {10}^{-4}$ | 0.4760 | $7.4114\times {10}^{-4}/6.4996\times {10}^{-4}$ | 0.2131 |

tPFST–DA | $8.5501\times {10}^{-4}/8.3843\times {10}^{-4}$ | 0.9358 | $6.6754\times {10}^{-4}/6.8690\times {10}^{-4}$ | 0.4516 | $7.0443\times {10}^{-4}/7.0179\times {10}^{-4}$ | 0.1961 |

fPFST–DA | $8.1069\times {10}^{-4}/8.5000\times {10}^{-4}$ | 0.9647 | $5.6056\times {10}^{-4}/5.7478\times {10}^{-4}$ | 0.4377 | $5.6939\times {10}^{-4}/5.8947\times {10}^{-4}$ | 0.1993 |

## 3.3.

### Reconstruction Results from Experiments

To validate the proposed method, a time-correlated single photon counting (TCSPC) system is employed to perform a series of TD phantom experiments. As shown in Fig. 10, the light source in the setup is a picosecond-pulsed laser system that employs a programmable controller (PDL828, PicoQuant, Germany) to drive 780 nm [the peak excitation wavelength of indocyanine green (ICG)] fiber-coupled 70 ps pulsed laser diode heads (LDH-P-P-780, PicoQuant, Germany). The output beam is distributed to $1662.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$ source fibers by a $1\times 16$ fiber switch for the tissue scanning. The emergent photons are collected by $500\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$ detection fibers placed at 16 sites equally spaced around the phantom boundary on the imaging plane, in which only measurements from the 11 detectors opposite to each of the 16 sources are employed for the image reconstruction. Then the collected photons are delivered to a photomultiplier tube (PMT) photon-counting header (PMC-100, Becker&Hickl, Germany) and temporally resolved by a TCSPC module (SPC-134, Becker&Hickl, Germany). In front of the PMT detector mounts a six-hole filter wheel (FW103B, Thorlabs) that houses four neutral density filters with the optical densities of 1 to 4, respectively, and a bandpass filter (ICG-A Emitter, Semrock). To facilitate the deployment of the fibers and the modeling of the light propagation, the output ends of 16 source fibers and input ends are paired into 16 coaxial optodes.

For the TCSPC setup, the output of a time-to-amplitude converter is sampled into 4096 time bins with a width of $\mathrm{\Delta}{T}_{b}=\phantom{\rule{0ex}{0ex}}17.114\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{ps}$. The acquisition periods of the excitation and emission signals are adjusted to satisfy a suitable signal-to-noise ratio (SNR) respectively, within the scope of the effective counting rate.^{16} To prevent the overflow of the Born-normalized TR data and also assure an adequately high SNR, time ranges of the excitation and emission TPSFs where the sample intensities are $>20\%$ of their respective maximum are decided, and their intersection is chosen for the reconstructions. For the full use of the TR information, the gate interval in the overlap time-gating method is set to the time-bin width of the measured TPSF, while the gating width is set to 100 gate intervals.

To simulate the fluorescent targets embedded in the tissue, a 5-mm-diameter cylindrical hole is drilled into a 30-mm-diameter cylindrical phantom made from polyformaldehyde and filled with a mixture of 1% Intralipid solution and ICG concentration of $\sim 100\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{nM}$, leading to a fluorescent yield of $\eta {\mu}_{af}=0.001\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$. The absorption and the reduced scattering coefficients of the targets are assumed to be equal at both the excitation and emission wavelengths, and determined to be the same as the background ones with ${\mu}_{ax}={\mu}_{am}=0.0038\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$ and ${\mu}_{sx}={\mu}_{sm}=0.978\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-1}$, respectively, by the TR spectroscopy technique,^{19} while the initial background fluorescent yield is empirically set to $\eta {\mu}_{af}=1\times {10}^{-5}{\mathrm{mm}}^{-1}$, and the lifetime is kept as a constant of $\tau =560\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{ps}$ for both the background and targets. Similar to the simulative investigations, 400 temporal steps and 64 sampling frequencies are used for the TDFD solution and the PFST–DA methods, respectively.

Figure 11 shows the reconstructed yield images at the horizontal slice of $Z=30\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ and the corresponding $X$-profiles using the CW, tTDFD, tPFST–DA, and fPFST–DA reconstruction strategies. It is obvious that the locations and quantitative nature of the two targets are more in good agreement with the true targets for the three TD schemes than those with the CW scheme. Further quantitative details about the reconstructed peak value of the two targets and the spatial resolution are shown in Table 3. Compared to the tTDFD method, the maximum relative difference of the peak value by the two PFST–DA methods is $\sim 5.86\%$. As to the spatial resolution, the relative errors are 3.29 and 4.96% for the tPFST–DA and the fPFST–DA methods, respectively. The computation time of the PFST–DA method in the forward calculation is 54.48 s, $<10\%$ that of the tTDFD method with 587.32 s. Although the reconstruction time of the tPFST–DA method is similar to that of the TDFD time of 147.98 min, that of the fPFST–DA method is only 15.31 min, $\sim 10.35\%$ that of the latter. All the observations confirm to those in the simulative investigations.

## Table 3

Comparison among the four reconstruction methods for experimental scenarios.

Method | CW | tTDFD | tPFST–DA | fPFST–DA | |
---|---|---|---|---|---|

Peak value (${\mathrm{mm}}^{-1}$) | Left | $1.7028\times {10}^{-4}$ | $5.1047\times {10}^{-4}$ | $5.2001\times {10}^{-4}$ | $5.1870\times {10}^{-4}$ |

Right | $1.9036\times {10}^{-4}$ | $5.5383\times {10}^{-4}$ | $5.7754\times {10}^{-4}$ | $5.8350\times {10}^{-4}$ | |

Spatial resolution $R$ | 0.2587 | 0.8575 | 0.8293 | 0.8150 | |

Computation time (s) | 132.19 | 9479.11 | 8948.23 | 990.37 |

## 4.

## Discussion and Conclusions

We propose in this paper a parallelized Fourier-series truncated DA for the full TR–DFT image reconstruction. In this computationally efficient scheme, the time-dependent photon density is expanded to a Fourier series and the TD–DE is calculated by solving the independent FD–DEs at multiple sampling frequencies, with the support of a combined multicore CPU-based coarse-grain and multithread GPU-based fine-grain parallelization strategy. Based on this forward model, both the TD and FD inversion procedures are developed under the framework of block ART, referred to as the tPFST–DA and fPFST–DA methods, respectively.

The forward resulting TPSF curves are tested by comparison with those from the TDFD solution and the TD MC simulation. The reconstructions are validated for their quantitativeness and spatial resolution by both simulations and phantom experiments. Results show that the proposed method can generate reconstructions comparable to the explicit TD scheme, with significantly reduced computational time, especially by the fPFST–DA method.

According to the studies done in Ref. 19, the advantage of the truncated Fourier-series method mainly lies in the far fewer number of frequency components needed than the temporal steps in the tTDFD scheme. With the aforementioned accelerating strategy, the truncated Fourier-series scheme is more suitable to be applied to the full TR–DFT reconstruction of 3-D or larger-sized tissues that require more frequency components for an accurate approximation. However, in the experimental measurements, due to the large differentiation of the impulse response function curve of individual fiber switching channel and the fact of manual switching of fibers needed in the current setup, the workload to accurately obtain all the instrument response function curves was significantly increased. Considering the measuring time and system stability, the experimental validations were performed in a 2-D mode.

It should be pointed out that the calculations of stiffness and Jacobian matrices in the TDFD method are executed in the same fine-grain parallelization strategy as the PFST–DA method, while every time step is performed in a recursive way, leading to no coarse-grain parallelization mechanism. Moreover, although there is a potential of using the block iterative algorithms on a parallel architecture, such as a multicore CPU or multithread GPU, the block ART iteration needs ${N}_{\omega}$ times storage space of that in a conventional ART algorithm, to move the Jacobian matrix into and execute the calculation in the GPU device memory, which makes it difficult to effectively execute in the device memory.

## Acknowledgments

The authors acknowledge the funding support from the National Natural Science Foundation of China (81271618, 81371602, 61475115, 61475116, and 81401453), Tianjin Municipal Government of China (13JCZDJC28000, 14JCQNJC14400, and 15JCZDJC31800), the Research Fund for the Doctoral Program of Higher Education of China (20120032110056), and the 111 project (B07014).

## References

## Biography

**Xi Yi** is a PhD of biomedical engineering at the Biomedical Photonic Imaging Laboratory of Tianjin University. Her research is focused on diffuse optical tomography and diffuse fluorescence tomography. She is focused on developing advanced optical imaging methods to improve the quality of large-sized or three-dimensional image reconstructions on the base of certain acceleration method.