## 1. Introduction

Fluorescence molecular tomography (FMT) has been an important tool for preclinical imaging over the past two decades and has attracted considerable research interest.^{1}^{,}^{2} In FMT, fluorescent agents (e.g., fluorophores) are injected into an object such as a small animal. Upon illumination of the object surface by near-infrared laser beams, the fluorophores become excited and emit fluorescence photons, which propagate out of the object surface and are captured by detectors such as charge-coupled device (CCD) cameras. Because photons scatter strongly in tissue, the FMT system matrix is usually ill-conditioned and the reconstruction problem ill-posed. Immense efforts have been devoted to experimentally alleviating the ill-posedness, such as collecting more measurement data with dense noncontact CCD cameras, employing multispectral wavelengths for both excitation and emission,^{3}^{–}^{5} and using different illumination patterns.^{6}^{–}^{8} These approaches indeed improved image resolution. However, they led to the processing of large amounts of three-dimensional (3-D) data, which requires tremendous computational time. On the theoretical side, a plethora of nonlinear regularization methods has also been applied,^{9}^{–}^{17} which again adds computational complexity. The non-negativity constraint in FMT reconstruction is another challenge. This constraint has typically been handled by a backtracking line search strategy,^{14} which is time consuming for large-scale measurement data.

To reduce the computational burden, many ideas from other fields have been introduced to FMT. Fourier transform,^{18} wavelet transform,^{19}^{,}^{20} discrete cosine transform,^{21} and principal component analysis^{22} methods have been adopted to compress the data and reduce its dimension. The ordered subsets (OS) method,^{23} originally introduced to speed up emission/transmission tomography by breaking the large system matrix into smaller blocks, has recently been successfully adopted in FMT through the majorization-minimization (MM) framework.^{14}^{,}^{17}^{,}^{24}^{,}^{25} In contrast with other methods that need a slow line search to handle the non-negativity constraint, the MM framework reduces the optimization to separable one-dimensional problems, so the non-negativity constraint can be enforced straightforwardly in a parallel way.^{26} In particular, the nonuniform multiplicative MM algorithm with OS acceleration (NUMOS) that we proposed recently has been found to significantly improve FMT image reconstruction speed and quality.^{25} Nevertheless, the MM algorithm is a first-order method and typically needs many iterations to converge.

Another direction for reducing the computational cost is to improve the convergence rate of the FMT reconstruction algorithms. Compared with first-order methods, second-order methods such as Newton-type algorithms converge faster; however, computing the Hessian is very demanding. Recently, a class of momentum acceleration techniques from Nesterov has been found to achieve the optimal convergence rate for a class of gradient descent-based methods whose computational complexity is linear.^{27}^{,}^{28} In FMT, Han et al.^{11} recently applied the fast iterative shrinkage thresholding algorithm (FISTA),^{29} which is based on the 1983 version of Nesterov's technique.^{27} Very recently, momentum methods have been combined with OS techniques for even faster convergence in computed tomography (CT).^{30}

In this paper, we propose to incorporate the 2005 version of Nesterov's momentum method^{28} into our NUMOS algorithm for faster and more stable convergence, since the 1983 version is not as stable.^{30} The concept and some preliminary results of this paper appeared earlier in a conference paper.^{31} Note that our work differs from the aforementioned work in CT in two ways: first, the regularization used there is an edge-preserving Huber-type function that promotes local smoothness, whereas our model for FMT uses ${L}^{1}$ to promote sparsity; second, their surrogate functions are either a spatially uniform type or a nonuniform type that depends on a good estimate of the true solution, which is readily available in CT through filtered back-projection reconstruction,^{30} but not in FMT.

The rest of the paper is organized as follows. In Sec. 2, we first introduce the background on FMT forward modeling, regularized reconstruction, the MM framework, the OS technique, and Nesterov's momentum method; we then propose our fast NUMOS (fNUMOS) algorithm. In Sec. 3, we present both numerical simulation and phantom experiment results and compare fNUMOS with other methods. In Sec. 4, we conclude the paper with a summary and some discussion.

## 2. Methods

### 2.1. Forward Modeling and Regularized Fluorescence Molecular Tomography Reconstruction

For FMT in the continuous-wave domain, photon transfer is modeled by the following coupled diffusion equations, along with Robin-type (mixed) boundary conditions:

## Eq. (1)

$$\{\begin{array}{ll}-\nabla \cdot [{D}_{ex}(\mathbf{r})\nabla {\mathrm{\Phi}}_{ex}(\mathbf{r})]+{\mu}_{a,ex}(\mathbf{r}){\mathrm{\Phi}}_{ex}(\mathbf{r})& ={\mathrm{\Delta}}_{s}(\mathbf{r})\\ \mathbf{n}\cdot [{D}_{ex}(\mathbf{r})\nabla {\mathrm{\Phi}}_{ex}(\mathbf{r})]+{\alpha}_{ex}{\mathrm{\Phi}}_{ex}(\mathbf{r})& =0\\ -\nabla \cdot [{D}_{em}(\mathbf{r})\nabla {\mathrm{\Phi}}_{em}(\mathbf{r})]+{\mu}_{a,em}(\mathbf{r}){\mathrm{\Phi}}_{em}(\mathbf{r})& ={\mathrm{\Phi}}_{ex}(\mathbf{r})\mathbf{x}(\mathbf{r})\\ \mathbf{n}\cdot [{D}_{em}(\mathbf{r})\nabla {\mathrm{\Phi}}_{em}(\mathbf{r})]+{\alpha}_{em}{\mathrm{\Phi}}_{em}(\mathbf{r})& =0\end{array}$$

The above equations can be discretized by the finite element method (FEM), leading to the linearized equations:^{4}^{,}^{32}

## Eq. (2)

$$[{K}_{ex}][{\mathrm{\Phi}}_{ex}]=[{\delta}_{s}(\mathbf{r}-{\mathbf{r}}_{s})],\phantom{\rule{0ex}{0ex}}[{K}_{em}][{\mathrm{\Phi}}_{em}]=[{\mathrm{\Phi}}_{ex}][\mathbf{x}].$$

A typical solution of Eq. (4) is obtained by minimizing the following regularized squared data-measurement misfit under the non-negativity constraint:

## Eq. (5)

$$\widehat{\mathbf{x}}=\mathrm{arg}\text{\hspace{0.17em}}\underset{\mathbf{x},\mathbf{x}\ge \mathbf{0}}{\mathrm{min}}\mathrm{\Psi}(\mathbf{x}):=\frac{1}{2}{\parallel A\mathbf{x}-\mathbf{b}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{x}\parallel}_{1}.$$

### 2.2. Majorization-Minimization Algorithm

The MM algorithm, also known as the optimization transfer algorithm, is a general framework for solving minimization problems in which an approximation of the objective function, often referred to as the majorizer or surrogate function, is minimized at every step. The approximate solution produced by the MM algorithm converges to the true solution of Eq. (5) because the problem is convex.^{33} The MM algorithm has well-known advantages in optimization, including avoiding matrix inversions, linearizing an optimization problem, and dealing gracefully with inequality constraints.^{33} Many powerful algorithms can be understood from the MM point of view, especially gradient-based methods such as the iterative shrinkage thresholding algorithm,^{34} the iteratively reweighted ${L}^{1}$ algorithm,^{35} and iteratively reweighted least squares.^{36}

The definition of a surrogate function ${\mathrm{\Psi}}^{sur}(\mathbf{x})$ in the minimization problem includes the following three requirements:^{25}

## Eq. (6)

$$\{\begin{array}{ll}{\mathrm{\Psi}}^{sur}(\mathbf{x})\ge \mathrm{\Psi}(\mathbf{x}),& \text{for all}\text{\hspace{0.17em}\hspace{0.17em}}\mathbf{x};\\ {\mathrm{\Psi}}^{sur}({\mathbf{x}}^{n})=\mathrm{\Psi}({\mathbf{x}}^{n}),& \text{at some point}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{x}}^{n};\\ \nabla {\mathrm{\Psi}}^{sur}({\mathbf{x}}^{n})=\nabla \mathrm{\Psi}({\mathbf{x}}^{n}),& \text{at point}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{x}}^{n}.\end{array}$$

In the popular separable quadratic surrogates (SQS) algorithm,^{37}^{,}^{38} the surrogate function ${\mathrm{\Psi}}^{k}(\mathbf{x};{\mathbf{x}}^{k})$ is chosen based on a Jensen inequality for least squares, which was first proposed by de Pierro.^{39} Adding in the ${L}^{1}$ regularization, we have

## Eq. (7)

$$\mathrm{\Psi}(\mathbf{x})=\frac{1}{2}{\parallel \mathbf{b}-A\mathbf{x}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{x}\parallel}_{1}\le {\mathrm{\Psi}}^{k}(\mathbf{x};{\mathbf{x}}^{k}):=\frac{1}{2}\sum _{i=1}^{{N}_{m}}\sum _{j=1}^{{N}_{n}}{\beta}_{ij}{[{b}_{i}-{(A{\mathbf{x}}^{k})}_{i}-\frac{{a}_{ij}}{{\beta}_{ij}}({x}_{j}-{x}_{j}^{k})]}^{2}+\lambda \sum _{j=1}^{{N}_{n}}{x}_{j}\phantom{\rule{0ex}{0ex}}=\mathrm{\Psi}({\mathbf{x}}^{k})-\sum _{j=1}^{{N}_{n}}\{\sum _{i=1}^{{N}_{m}}[{b}_{i}-{(A{\mathbf{x}}^{k})}_{i}]{a}_{ij}-\lambda \}({x}_{j}-{x}_{j}^{k})+\frac{1}{2}\sum _{j=1}^{{N}_{n}}\left(\sum _{i=1}^{{N}_{m}}\frac{{a}_{ij}^{2}}{{\beta}_{ij}}\right){({x}_{j}-{x}_{j}^{k})}^{2}\phantom{\rule{0ex}{0ex}}=\mathrm{\Psi}({\mathbf{x}}^{k})+\nabla \mathrm{\Psi}({\mathbf{x}}^{k})\cdot (\mathbf{x}-{\mathbf{x}}^{k})+\frac{1}{2}{\parallel \mathbf{x}-{\mathbf{x}}^{k}\parallel}_{{D}^{k}}^{2},$$

where ${D}^{k}$ is the diagonal matrix with entries ${d}_{j}^{k}={\sum }_{i=1}^{{N}_{m}}{a}_{ij}^{2}/{\beta}_{ij}$. Minimizing this separable surrogate component-wise under the non-negativity constraint yields

## Eq. (8)

$${x}_{j}^{k+1}={\{{x}_{j}^{k}-{[\nabla \mathrm{\Psi}({\mathbf{x}}^{k})]}_{j}/{d}_{j}^{k}\}}_{+}.$$

This is a gradient descent-type algorithm^{17} with a different step size ${d}_{j}^{k}$ for each component ${x}_{j}$, where ${\{\cdot \}}_{+}$ denotes projection onto the non-negative values.
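To make the separable update concrete, here is a minimal NumPy sketch of one SQS/MM step in the spirit of Eq. (8), applied to a small synthetic problem; the matrix `A`, data `b`, regularization weight, and Jensen weights `beta` are all illustrative assumptions, not the FMT system itself.

```python
import numpy as np

def sqs_step(A, b, x, lam, beta):
    """One SQS/MM step for 0.5*||Ax - b||^2 + lam*||x||_1 with x >= 0.

    beta[i, j] are the Jensen weights; the separable curvature
    d_j = sum_i a_ij^2 / beta_ij is the step-size denominator of Eq. (8).
    """
    grad = A.T @ (A @ x - b) + lam        # gradient of the objective for x >= 0
    d = np.sum(A**2 / beta, axis=0)       # per-component curvatures d_j^k
    return np.maximum(x - grad / d, 0.0)  # non-negativity enforced component-wise

# Uniform additive weights beta_ij = a_ij / sum_l a_il recover the classical
# SQS denominator d = A^t A 1 that appears in Eq. (11) (A non-negative here).
rng = np.random.default_rng(0)
A = rng.random((20, 10))
x_true = np.zeros(10)
x_true[3] = 1.0
b = A @ x_true
beta = A / A.sum(axis=1, keepdims=True)
x = np.full(10, 0.5)
for _ in range(200):
    x = sqs_step(A, b, x, lam=1e-3, beta=beta)
```

Because each step exactly minimizes a separable quadratic majorizer, the objective decreases monotonically, which is the MM guarantee exploited throughout this section.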

### 2.3. Nonuniform Weighting Parameter ${\beta}_{ij}$

The choice of the weighting parameter ${\beta}_{ij}$ in the Jensen inequality is crucial in determining how well the surrogate approximates the original objective function and how fast the corresponding iterative algorithm [Eq. (8)] converges. Traditionally, the SQS weights were chosen in a uniform additive way, which can be precomputed and needs no iterative updates, leading to the update:^{14}^{,}^{17}^{,}^{37}^{,}^{38}

## Eq. (11)

$${x}_{j,{L}^{1}}^{A,k+1}={[{x}_{j}^{k}+\frac{{({A}^{t}\mathbf{b})}_{j}-{({A}^{t}A{\mathbf{x}}^{k})}_{j}-{\lambda}_{1}}{{({A}^{t}A{\mathbf{1}}_{{N}_{n}})}_{j}}]}_{+}.$$

A nonuniform additive choice of ${\beta}_{ij}$ has also been proposed:^{25}

## Eq. (12)

$${\beta}_{ij}^{gA}=\frac{{a}_{ij}^{q}}{\sum _{k=1}^{{N}_{n}}{a}_{ik}^{q}},\phantom{\rule[-0.0ex]{2em}{0.0ex}}q>0.$$

Uniform weighting, however, ignores the different updating needs at different positions, which can hinder the convergence of the iterations to the true solution.^{40} Nonuniform weighting strategies have recently been reported to improve image quality in CT.^{40}^{,}^{41} In our previous work, we proposed to solve the ${L}^{1}$-regularized least squares problem [Eq. (5)] using the following nonuniform multiplicative-type ${\beta}_{ij}$:^{26}^{,}^{38}^{,}^{42}

## Eq. (13)

$${\beta}_{ij}^{M}=\frac{{a}_{ij}{x}_{j}^{k}}{{(A{\mathbf{x}}^{k})}_{i}},\phantom{\rule[-0.0ex]{2em}{0.0ex}}\text{where}\text{\hspace{0.17em}\hspace{0.17em}}{(A{\mathbf{x}}^{k})}_{i}=\sum _{l=1}^{{N}_{n}}{a}_{il}{x}_{l}^{k},$$

and the associated iterative update equation [Eq. (8)] becomes^{26}

## Eq. (15)

$${x}_{j,{L}^{1}}^{k+1}={x}_{j}^{k}\frac{{[{({A}^{t}\mathbf{b})}_{j}-{\lambda}_{1}]}_{+}}{{({A}^{t}A{\mathbf{x}}^{k})}_{j}}.$$

### 2.4. Ordered Subsets Acceleration Technique

Due to the large scale of 3-D data, tomographic imaging requires substantial computational time. The OS technique was proposed in 1994 for emission/transmission tomography to evenly break down large matrices into smaller blocks, so that a speed-up of the convergence by a factor proportional to nOS, the number of subsets, is possible.^{23}^{,}^{38} The OS technique has recently been successfully applied in FMT, where the sensitivity matrix is divided into ordered subsets.^{14}^{,}^{26} In particular, combining the nonuniform update with this OS technique, we proposed NUMOS (see Algorithm 1) for FMT, which provides a significant speed enhancement over the uniform methods,^{14}^{,}^{17} while maintaining high-quality reconstruction results.^{26}

## Algorithm 1

NUMOS.

Initialization: ${\mathbf{x}}^{0}={x}_{0}*{\mathbf{1}}_{{N}_{n}}$, $B={({A}^{t}\mathbf{b}-\lambda /\mathrm{nOS}*{\mathbf{1}}_{{N}_{n}})}_{+}$

for $k=1$ to ${N}_{\text{max}}/\mathrm{nOS}$ do

Divide $A$ and $B$ into nOS submatrices, ${\{{A}_{i}\}}_{i=1}^{\mathrm{nOS}}$, ${\{{B}_{i}\}}_{i=1}^{\mathrm{nOS}}$, based on a random partition of the ${N}_{d}$ detectors

for $i=1$ to nOS do

$l=(k-1)*\mathrm{nOS}+i$

${\mathbf{x}}^{l}={B}_{i}.*{\mathbf{x}}^{l-1}./({A}_{i}^{t}{A}_{i}{\mathbf{x}}^{l-1})$

end for

end for

where $0<{x}_{0}<1$ is randomly picked, ${N}_{\text{max}}$ is the number of iterations, nOS is the number of subsets, and $.*$ and $./$ denote entry-wise multiplication and division, respectively.
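A compact NumPy sketch of Algorithm 1 on a toy non-negative system follows. We read the partition step as forming, per subset, ${B}_{i}={({A}_{i}^{t}{\mathbf{b}}_{i}-\lambda /\mathrm{nOS})}_{+}$ from the subset rows of the data; that reading, together with the problem sizes, seed, and small safeguard in the denominator, is an assumption for illustration.

```python
import numpy as np

def numos(A, b, lam=1e-3, n_os=4, n_iter=200, x0=0.5, seed=0):
    """Sketch of NUMOS (Algorithm 1): nonuniform multiplicative updates
    cycled over ordered subsets of the detector rows."""
    rng = np.random.default_rng(seed)
    n_det, n = A.shape
    x = np.full(n, x0)
    for _ in range(n_iter // n_os):
        perm = rng.permutation(n_det)            # random partition of detectors
        for rows in np.array_split(perm, n_os):
            Ai, bi = A[rows], b[rows]
            Bi = np.maximum(Ai.T @ bi - lam / n_os, 0.0)
            # multiplicative update of Eq. (15), restricted to the subset
            x = Bi * x / (Ai.T @ (Ai @ x) + 1e-12)
    return x

rng = np.random.default_rng(1)
A = rng.random((30, 8))                          # non-negative system matrix
x_true = np.zeros(8)
x_true[[2, 5]] = 1.0                             # sparse non-negative targets
b = A @ x_true                                   # noise-free measurements
x_rec = numos(A, b)
```

Note that the multiplicative form keeps the iterates non-negative automatically whenever the initial point and the system matrix are non-negative, which is why no explicit projection appears inside the loop.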

However, there are a few known issues with the OS technique. One is that the selection of the subsets can be difficult when the geometry is complicated. In the study of the mouse-shaped numerical phantom, we balanced our selection of subsets by randomly regenerating them at each iteration, which slowed down the acceleration from the OS technique.^{26} Another issue is that, like other block-iterative methods, increasing the number of subsets leads to larger approximation errors, causing the iterative updates to converge to a limit cycle before approaching the minimum.^{43} Different approaches have been proposed to address this limit-cycle issue, but again at the expense of slowing down the overall convergence.^{44}^{–}^{46}

### 2.5. Nesterov's Momentum Acceleration Techniques

Another approach to reducing the computational burden is to design algorithms with faster convergence rates. Recently, a series of momentum techniques from Nesterov^{27}^{,}^{28} have emerged, which use previous iterates to obtain optimal convergence rates for first-order optimization methods. In particular, the FISTA algorithm^{29} was developed based on the 1983 version of Nesterov's technique; its application in FMT will be analyzed in Sec. 4. We chose to focus on the 2005 version of Nesterov's technique^{28} (see Algorithm 2) in this paper, since it provides more stability while requiring little extra calculation compared with the 1983 version.^{27}^{,}^{30} This 2005 technique has attracted considerable attention recently.^{30}^{,}^{47} In particular, Kim et al.^{30} have considered combining it with the OS method in CT.

## Algorithm 2

Nesterov’s algorithm (2005).

Initialization: ${\mathbf{x}}^{0}={\mathbf{v}}^{0}={\mathbf{z}}^{0}$, ${t}^{0}=1$

for $m=1$ to ${N}_{\text{max}}$ do

${t}^{m}=(m+1)/2$

${\mathbf{x}}^{m}={[{\mathbf{z}}^{m-1}-\frac{1}{L}\nabla \mathrm{\Psi}({\mathbf{z}}^{m-1})]}_{+}$

${\mathbf{v}}^{m}={[{\mathbf{z}}^{0}-\frac{1}{L}\sum _{l=1}^{m}{t}^{l-1}*\nabla \mathrm{\Psi}({\mathbf{z}}^{l-1})]}_{+}$

${\mathbf{z}}^{m}=(1-\frac{{t}^{m}}{\sum _{l=0}^{m}{t}^{l}})*{\mathbf{x}}^{m}+\frac{{t}^{m}}{\sum _{l=0}^{m}{t}^{l}}*{\mathbf{v}}^{m}$

end for

where $L$ is the Lipschitz constant for the surrogate function of $\mathrm{\Psi}(\mathbf{z})$ at ${\mathbf{z}}^{m-1}$.
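The updates of Algorithm 2 can be sketched directly in NumPy. The sketch below applies the method to a toy non-negative least squares objective; the objective, problem sizes, and the choice $L={\Vert A\Vert}_{2}^{2}$ are illustrative assumptions rather than the FMT setting.

```python
import numpy as np

def nesterov_2005(grad, L, z0, n_iter=100):
    """Sketch of Nesterov's 2005 method (Algorithm 2) with projection onto x >= 0."""
    z = z0.astype(float).copy()
    g_sum = np.zeros_like(z)       # accumulates sum_l t^{l-1} * grad(z^{l-1})
    t_prev, t_sum = 1.0, 1.0       # t^0 = 1 and the running sum of the t^l
    x = z.copy()
    for m in range(1, n_iter + 1):
        t = (m + 1) / 2.0                         # t^m
        g = grad(z)                               # gradient at z^{m-1}
        x = np.maximum(z - g / L, 0.0)            # x^m: projected gradient step
        g_sum += t_prev * g
        v = np.maximum(z0 - g_sum / L, 0.0)       # v^m: aggregated step from z^0
        t_sum += t
        theta = t / t_sum
        z = (1 - theta) * x + theta * v           # z^m: convex combination
        t_prev = t
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))
x_true = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
b = A @ x_true
L = np.linalg.norm(A, 2) ** 2                     # Lipschitz constant of the gradient
z0 = np.full(5, 0.5)
x_rec = nesterov_2005(lambda v: A.T @ (A @ v - b), L, z0)
```

The `v` sequence is anchored at the initial point `z0` and driven by the weighted sum of all past gradients, which is the feature that distinguishes this 2005 scheme from the 1983 two-sequence version underlying FISTA.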

### 2.6. Proposed Nonuniform Multiplicative Updating Scheme Accelerated by Ordered Subsets and Momentum Method

Nesterov's techniques are based on the gradient method, which, from the MM point of view, is equivalent to using the following surrogate function for $\mathrm{\Psi}(\mathbf{x})$:^{29}

## Eq. (16)

$${\mathrm{\Psi}}^{k}(\mathbf{x};{\mathbf{x}}^{k})=\mathrm{\Psi}({\mathbf{x}}^{k})+\nabla \mathrm{\Psi}({\mathbf{x}}^{k})\cdot (\mathbf{x}-{\mathbf{x}}^{k})+\frac{L}{2}{\parallel \mathbf{x}-{\mathbf{x}}^{k}\parallel}^{2},$$

where $L$ is the Lipschitz constant of $\nabla \mathrm{\Psi}$. The above choice of surrogate function [Eq. (16)] essentially assigns every component ${x}_{j}$ the same iterative step size, which does not take into account the different updating needs of target and background locations.^{40} In this paper, we propose to employ the nonuniform update to satisfy these different updating needs and to combine it with the OS method and the 2005 version of Nesterov's momentum technique for much faster convergence. We refer to this new algorithm as fNUMOS (see Algorithm 3). Note that for fNUMOS, ${\mathbf{x}}_{PM}^{m}$ is a new term we introduce so that the nonuniform technique can be combined with Nesterov's momentum technique. In addition, we followed the method of Tseng^{48} to choose the weighting parameters ${t}^{m}$, which leads to faster convergence than the original ${t}^{m}$ used by Nesterov in Algorithm 2.

## Algorithm 3

fNUMOS.

Initialization: ${\mathbf{x}}^{0}={x}_{0}*{\mathbf{1}}_{{N}_{n}}$, ${\mathbf{z}}^{0}={\mathbf{x}}^{0}$, ${t}^{0}=1$, $B={A}^{t}\mathbf{b}-\lambda /\mathrm{nOS}*{\mathbf{1}}_{{N}_{n}}$

for $k=1$ to ${N}_{\text{max}}/\mathrm{nOS}$ do

Divide $A$ and $B$ into nOS submatrices, ${\{{A}_{i}\}}_{i=1}^{\mathrm{nOS}}$, ${\{{B}_{i}\}}_{i=1}^{\mathrm{nOS}}$, based on a random partition of the ${N}_{d}$ detectors

for $i=1$ to nOS do

$m=(k-1)*\mathrm{nOS}+i$

${t}^{m}=[1+\sqrt{1+4{({t}^{m-1})}^{2}}]/2$

${\mathbf{x}}_{PM}^{m}={B}_{i}.*{\mathbf{z}}^{m-1}./({A}_{i}^{t}{A}_{i}{\mathbf{z}}^{m-1})$

${\mathbf{x}}^{m}={({\mathbf{x}}_{PM}^{m})}_{+}$

${\mathbf{v}}^{m}={[{\mathbf{z}}^{0}-\sum _{l=1}^{m}{t}^{l-1}*({\mathbf{z}}^{l-1}-{\mathbf{x}}_{PM}^{l})]}_{+}$

${\mathbf{z}}^{m}=(1-\frac{{t}^{m}}{\sum _{l=0}^{m}{t}^{l}})*{\mathbf{x}}^{m}+\frac{{t}^{m}}{\sum _{l=0}^{m}{t}^{l}}*{\mathbf{v}}^{m}$

end for

end for

where $0<{x}_{0}<1$ is randomly picked, ${N}_{\text{max}}$ is the number of iterations, nOS is the number of subsets, and $.*$ and $./$ denote entry-wise multiplication and division, respectively; here ${\mathbf{z}}^{l-1}-{\mathbf{x}}_{PM}^{l}$ plays the role of the scaled gradient step $\frac{1}{L}\nabla \mathrm{\Psi}({\mathbf{z}}^{l-1})$ in Algorithm 2.
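Putting the three ingredients together, a NumPy sketch of Algorithm 3 on a toy non-negative system is given below. As in the NUMOS sketch, we read the partition step as forming ${B}_{i}$ from the subset rows of the data, treat ${\mathbf{z}}^{l-1}-{\mathbf{x}}_{PM}^{l}$ as the momentum step by analogy with Algorithm 2, and all problem sizes, seeds, and the small denominator safeguard are illustrative assumptions.

```python
import numpy as np

def fnumos(A, b, lam=1e-3, n_os=4, n_outer=25, x0=0.5, seed=0):
    """Sketch of fNUMOS (Algorithm 3): nonuniform multiplicative update,
    ordered subsets, and Nesterov-style momentum with Tseng's t^m."""
    rng = np.random.default_rng(seed)
    n_det, n = A.shape
    x = np.full(n, x0)
    z = x.copy()
    z0 = x.copy()
    t_prev, t_sum = 1.0, 1.0
    step_sum = np.zeros(n)                 # accumulates t^{l-1} * (z^{l-1} - x_PM^l)
    for _ in range(n_outer):
        perm = rng.permutation(n_det)      # random partition of the detectors
        for rows in np.array_split(perm, n_os):
            Ai, bi = A[rows], b[rows]
            t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev**2)) / 2.0   # Tseng-type t^m
            Bi = np.maximum(Ai.T @ bi - lam / n_os, 0.0)
            x_pm = Bi * z / (Ai.T @ (Ai @ z) + 1e-12)          # multiplicative proposal
            x = np.maximum(x_pm, 0.0)                          # x^m
            step_sum += t_prev * (z - x_pm)
            v = np.maximum(z0 - step_sum, 0.0)                 # v^m, anchored at z^0
            t_sum += t
            theta = t / t_sum
            z = (1 - theta) * x + theta * v                    # z^m
            t_prev = t
    return x

rng = np.random.default_rng(3)
A = rng.random((30, 8))
x_true = np.zeros(8)
x_true[[1, 6]] = 1.0
b = A @ x_true
x_rec = fnumos(A, b)
```

Since `z` is a convex combination of the non-negative sequences `x` and `v`, every iterate stays in the non-negative orthant without an extra line search, which is the property emphasized throughout Sec. 2.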

### 2.7. Selection of Regularization Parameters and Image Quality Metrics

For each type of regularization, we identified the best image that could be reconstructed by searching through a range of values for the regularization parameter within $[0,\mathrm{max}({A}^{t}\mathbf{b})]$. Eleven different criteria for optical tomography were compared in Correia et al.,^{49} and the L-curve method was found to be optimal for finding the best Tikhonov regularization parameter in image deblurring problems. In our study, we employed two criteria: the volume ratio (VR),^{50} defined as the ratio of the reconstructed target volume to the true target volume, which is related to the sparsity of the reconstructed targets; and the Dice similarity coefficient (Dice),^{51} which measures the location accuracy of the reconstructed objects. These criteria are sufficient to evaluate the sizes and positions of the targets. We also calculated the mean squared error (MSE) for the simulation study, where ground truth was available, which measures the difference between the reconstructed and true fluorophore concentrations. To assess image quality, we also computed the contrast-to-noise ratio (CNR),^{52} which measures how well the reconstructed target can be distinguished from the background. The definitions of these metrics are given below.
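As a computational companion to the formal definitions that follow, here is a NumPy sketch of the four metrics; the segmentation rule used to obtain the regions of interest (thresholding at half maximum) and the assumption of uniform node volumes are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def image_metrics(x, x_true, thresh=0.5):
    """Sketch of the image-quality metrics VR, Dice, MSE, and CNR.

    x, x_true: reconstructed and true concentration vectors on the mesh nodes.
    The ROI segmentation rule (thresholding at half maximum) and uniform node
    volumes are assumptions made for illustration.
    """
    roi = x_true > thresh * x_true.max()     # true target region (ROI)
    rroi = x > thresh * x.max()              # reconstructed target region (rROI)
    vr = rroi.sum() / roi.sum()              # volume ratio
    dice = 2.0 * np.sum(rroi & roi) / (rroi.sum() + roi.sum())
    mse = np.mean((x - x_true) ** 2)
    w = roi.mean()                           # omega_ROI: ROI fraction of the domain
    cnr = (x[roi].mean() - x[~roi].mean()) / np.sqrt(
        w * x[roi].var() + (1 - w) * x[~roi].var())
    return vr, dice, mse, cnr
```

A perfect reconstruction gives VR and Dice equal to 1 and MSE equal to 0, while CNR grows as the target separates more cleanly from the background.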

## Eq. (18)

$$\mathrm{Dice}=\frac{2*|r\mathrm{ROI}\cap \mathrm{ROI}|}{|r\mathrm{ROI}|+|\mathrm{ROI}|},$$

## Eq. (20)

$$\mathrm{CNR}=\frac{\text{Mean}({x}_{\mathrm{ROI}})-\text{Mean}({x}_{\mathrm{ROB}})}{\sqrt{{\omega}_{\mathrm{ROI}}\mathrm{Var}({x}_{\mathrm{ROI}})+(1-{\omega}_{\mathrm{ROI}})\mathrm{Var}({x}_{\mathrm{ROB}})}},$$

where ROB denotes the region of background and ${\omega}_{\mathrm{ROI}}$ is the weight of the ROI relative to the whole region.

## 3. Results

### 3.1. Numerical Simulation

To validate our algorithm for FMT in small animals, we simulated a mouse by first obtaining the surface mesh of the Digimouse,^{53} and then using Tetgen^{54} to regenerate a uniform internal mesh with a total of 32,332 nodes and 161,439 tetrahedral elements. We then simulated two capillary tubes at the center of the mouse trunk, each with a diameter of 2 mm and a length of 20 mm. For simplicity, we assigned the fluorophore concentration to be 1 for all nodes inside the two tubes and 0 outside. We selected 60 internal nodes as laser source points, uniformly distributed on five rings around the trunk, and set all 4020 surface nodes covering the trunk to be detectors. The simulated tissue optical properties were ${\mu}_{a}=0.007\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{-1}$ and ${\mu}_{s}^{\prime}=0.72\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{-1}$ at both the excitation wavelength (650 nm) and the emission wavelength (700 nm). White Gaussian noise was added to the simulated measurement data so that the signal-to-noise ratio (SNR) of the measurement data was 1.

For the reconstruction, we employed ${L}^{1}$ regularization since the targets are sparse. We started the iterations from the same randomly picked uniform initial point ${\mathbf{x}}_{0}$ and reconstructed the fluorophore distribution using five different algorithms: (a) uniform update, (b) nonuniform update, (c) NUMOS with $\mathrm{nOS}=24$, (d) fNUMOS with $\mathrm{nOS}=1$, and (e) fNUMOS with $\mathrm{nOS}=24$, so that we could see how each part of the proposed algorithm contributes to faster convergence while maintaining a high-quality result. For the uniform update, we chose the regularization parameter ${\lambda}_{1}=6.5\times {10}^{-4}$, and for the nonuniform update, we set ${\lambda}_{1}=1\times {10}^{-3}$ as before.^{17}^{,}^{26} The maximum number of iterations was set to 2000.

From Fig. 1(a) and Table 1, we can see that even with 2000 iterations, the uniform method is still far from a close-to-truth result: the reconstructed targets are very large ($\mathrm{VR}=6.84$ and $\mathrm{Dice}=0.20$), with very low image intensities, relatively low CNR, and large MSE. We see a clear boost in image quality in Fig. 1(b) when we use the nonuniform update, i.e., NUMOS with $\mathrm{nOS}=1$. From Table 1, we see that NUMOS1 used only 1310 iterations to produce a much better result, with VR about $7\times $ smaller, Dice about $3\times $ higher, CNR $2\times $ higher, and MSE about $2\times $ smaller. When the nonuniform update was combined with $\mathrm{nOS}=24$ or with the Nesterov technique, less computational time was needed to obtain an image of similar quality. In addition, fNUMOS1 was around $3\times $ faster than NUMOS24, using approximately 35 and 107 s, respectively, which implies that the Nesterov momentum acceleration is more effective than the OS acceleration. Finally, when we combined both the OS and Nesterov techniques with the nonuniform update, only 5 iterations and 10 s were needed to reach a high-quality image, as shown in Fig. 1(e), which implies a speed-up of about $10\times $ over the NUMOS algorithm we previously studied.^{26} From the profile plots in Fig. 2, we can also verify that the quality of the reconstructed image from the uniform method is poor, while the images from the NUMOS and fNUMOS methods are much better.

## Table 1

Comparison of VR, Dice, CNR, MSE, Time (s), and number of iterations for the FMT reconstruction of the simulated mouse using different algorithms.

Method | VR | Dice | CNR | MSE | Time (s) | Iterations
---|---|---|---|---|---|---
Uniform | 6.84 | 0.20 | 4.83 | $2.94\mathrm{E}-3$ | 563.46 | 2000
NUMOS1 | 1.01 | 0.58 | 9.94 | $1.74\mathrm{E}-3$ | 491.28 | 1310
NUMOS24 | 1.01 | 0.58 | 9.81 | $1.80\mathrm{E}-3$ | 106.97 | 53
fNUMOS1 | 1.02 | 0.59 | 9.54 | $1.69\mathrm{E}-3$ | 35.13 | 121
fNUMOS24 | 1.01 | 0.59 | 10.27 | $1.70\mathrm{E}-3$ | 10.02 | 5

### 3.2. Phantom Experiment

Next, we used a set of data from cubic phantom experiments to validate the acceleration effects of our proposed fNUMOS method for FMT. The cubic phantom was of dimension $32\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}\times 32\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}\times 29\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$ and was composed of 1% intralipid, 2% agar, and water in the background. We inserted as targets two capillary tubes of length 12 mm and diameter 1 mm, into which both 6.5 mm DiD (D307, Invitrogen Corporation) fluorescence dye solution and uniformly distributed ${}^{18}[\mathrm{F}]$-fluoro-2-deoxy-d-glucose (FDG) at an activity level of $100\text{\hspace{0.17em}\hspace{0.17em}}\mu \mathrm{Ci}$ were injected. The cubic surface was extracted first and then the FEM mesh was generated, consisting of 8690 nodes and 47,581 tetrahedral elements.^{5}^{,}^{55} For the optical imaging, a laser at a wavelength of 650 nm scanned the front surface of the phantom at 20 illumination nodes. Measurements were collected at 1057 detector nodes using a conical mirror system and a CCD camera.^{4}^{,}^{17} The emission was filtered at a wavelength of 700 nm. The tissue optical properties were ${\mu}_{a}=0.0022\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{-1}$ and ${\mu}_{s}^{\prime}=1.10\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{-1}$ at both the 650- and 700-nm wavelengths. Details of the simultaneous positron emission tomography (PET) imaging can be found in Li et al.^{4}^{,}^{5}^{,}^{55} We thresholded the PET images at 20% of the maximum FDG concentration to identify the positions of the capillary tubes.

For the regularized reconstruction, we again compared the results from the five different algorithms (illustrated in Fig. 3), and we empirically chose the ${L}^{1}$ regularization parameter ${\lambda}_{1}$ to be $9\times {10}^{3}$ for the uniform method and $5\times {10}^{3}$ for the nonuniform algorithms. We set the maximum number of iterations to ${N}_{\text{max}}=768$ for the nonuniform algorithms, whereas for the uniform case, we allowed up to 5000 iterations for better results. We computed the VR, Dice, and CNR for the reconstructed images. The true locations of the targets were obtained from the PET image that was acquired simultaneously. Note that the associated intensity information of the true image was not available, so we could not compute the MSE here. From Table 2, we see that with $\mathrm{nOS}=24$, fNUMOS reached a result with a VR of 1.02, a Dice of 0.42, and a CNR of 8.64 after four iterations, taking only 0.63 s. To obtain an image of similar quality with the Nesterov momentum technique only, fNUMOS with $\mathrm{nOS}=1$ took 84 iterations and 1.34 s; with the OS acceleration alone, NUMOS with $\mathrm{nOS}=24$ took 32 iterations and 4.85 s; without either of the two acceleration techniques, NUMOS with $\mathrm{nOS}=1$ took 640 iterations and 10.42 s. Thus, for the phantom experiment, fNUMOS can be about $8\times $ faster than the NUMOS algorithm we proposed before.^{26} Similar to the simulation results, we can also verify from the profile plots in Fig. 4 that the quality of the reconstructed image from the uniform method is poor, while those from the NUMOS and fNUMOS methods are much better.

## Table 2

Comparison of VR, Dice, CNR, Time (s), and number of iterations for the FMT reconstruction of cubic phantom using different algorithms.

Method | VR | Dice | CNR | Time (s) | Iterations
---|---|---|---|---|---
Uniform | 2.50 | 0.33 | 6.62 | 78.25 | 5000
NUMOS1 | 1.13 | 0.41 | 7.93 | 10.42 | 640
NUMOS24 | 1.02 | 0.39 | 8.20 | 4.85 | 32
fNUMOS1 | 1.06 | 0.41 | 8.08 | 1.34 | 84
fNUMOS24 | 1.02 | 0.42 | 8.64 | 0.63 | 4

## 4. Discussion and Conclusion

In this paper, we proposed fNUMOS, which aims to solve FMT with ${L}^{1}$ regularization at a very fast speed while maintaining reconstruction accuracy, by incorporating Nesterov's momentum technique into a nonuniform updating scheme accelerated by the OS method (with a relatively small number of subsets). Our iterative updating scheme for all reconstruction methods described in Sec. 2 was based on a separable MM framework, which allowed for highly vectorized code that was automatically parallelized by MATLAB 2013b. We conducted our computations on a dual Intel Xeon E5-2680 v2 CPU workstation with 20 cores and 128 GB of memory. Using both numerical simulation and phantom data, we compared fNUMOS with other algorithms. We found that fNUMOS performed best in the sense that the reconstructed targets were more localized [smaller VR, higher Dice, and smaller MSE (for the simulation case only)] than those from the other methods. fNUMOS was found to be robust to high noise levels (i.e., $\mathrm{SNR}=1$) and had a significantly faster convergence speed. In our numerical simulation study, we found a significant speed gain of about $10\times $ from the momentum technique in comparison with the NUMOS algorithm we studied before.^{26} For the cubic phantom experiment data, with a smaller system matrix, the speed gain was about $8\times $. This implies that fNUMOS has great potential for obtaining higher spatial resolution when a finer mesh is used.

For a thorough comparison of how the image quality metrics VR and Dice change with iteration time for the uniform and nonuniform algorithms, with or without the momentum or ordered subsets acceleration techniques, we evaluated all eight scenarios in two groups: (1) Uniform1, $\mathrm{f}+\text{Uniform}1$, NUMOS1, and fNUMOS1; and (2) Uniform24, $\mathrm{f}+\text{Uniform}24$, NUMOS24, and fNUMOS24. Results for the simulated mouse data are shown in Figs. 5 and 6 for $\mathrm{nOS}=1$ and $\mathrm{nOS}=24$, respectively. In both figures, we clearly see that the nonuniform update, NUMOS1, performs much better than the uniform method, Uniform1 (blue/star versus magenta/sphere curves), and is even better than the uniform method accelerated by the momentum technique, $\mathrm{f}+\text{Uniform}1$ (blue/star versus green/diamond curves). Across Figs. 5 and 6, we can reaffirm that the momentum technique is faster than the OS technique in accelerating convergence (red/plus curve in Fig. 5 versus blue/star curve in Fig. 6). Finally, when both acceleration techniques are combined with the nonuniform update, the proposed fNUMOS24 achieves a high-quality reconstruction in the shortest amount of time. For the phantom experimental data, the comparison results are plotted in Figs. 7 and 8. We see trends similar to the simulated case, and fNUMOS24 again performs best, producing a high-quality image within a second.

For the popular FISTA algorithm that has been adopted in FMT,^{11} we already compared it with the NUMOS1 algorithm proposed in our previous work using data from phantom experiments.^{26} We found that, even excluding the computational time for the largest eigenvalue of the system matrix needed for the $L$ in Eq. (16), FISTA was still slower than NUMOS, and the final reconstruction results were not as good either.^{26} For the numerically simulated mouse with a large system matrix, we were able to compute the largest eigenvalue on our workstation, which alone cost about 30 min. We also explored the backtracking version of FISTA^{29} to estimate $L$, and we found that it tended to reach a large $L$ value that slowed down the convergence. As we pointed out in Sec. 2.6, the slower convergence and poorer image quality are due to the fact that choosing a single $L$ essentially assigns a uniform updating step size to every location; thus it is not surprising that FISTA was not as efficient as our nonuniform methods, including the earlier NUMOS and the newly proposed fNUMOS.

We noticed that the OS method ($\mathrm{nOS}=24$) had limited capability in increasing the speed of the nonuniform-type algorithms: around $4\times $ for the numerically simulated mouse and around $2\times $ for the cubic phantom. This limit in the speed increase is mainly due to the choice of subsets being random at every iteration, which greatly increases the total cost. We also see that for experimental data of smaller size, the overhead cost of randomization takes a larger portion of the total computational time, so the speed-up is not as significant as for the larger simulated data. It would be of great interest to identify an optimal deterministic way of selecting the subsets, so that we may potentially obtain a speed increase closer to the ideal factor of nOS.

On the choice of the best ${L}^{1}$ regularization parameter, we swept through a range of values within $[0,\mathrm{max}({A}^{t}\mathbf{b})]$. We found that the ${L}^{1}$ regularization parameter can actually take values in a range over which images of similar quality (in terms of VR, Dice, and CNR) can be reconstructed, although the corresponding number of iterations needed varies accordingly. Given the significant reduction in computational time, we may further investigate the optimal way to determine an appropriate regularization parameter.

Regarding the stopping criteria for our iterative algorithms, we found that since the convergence speeds of the different algorithms vary, the stopping criteria need to be selected differently. In addition, because our subsets are chosen in a randomized fashion, the errors between consecutive iterations also possess randomness, so no simple stopping criterion exists. We plan to further investigate the appropriate stopping criterion for each algorithm in the future, especially once we can identify a deterministic selection of the subsets.

In summary, we have investigated the accelerating effects of a Nesterov-type momentum technique on the nonuniform updating scheme, with and without the OS method. We have obtained high-quality images using around $10\times $ less time with the proposed method than with some existing cutting-edge methods. Our next step will be to apply the proposed method to *in vivo* experiments.

## Acknowledgments

We gratefully acknowledge support from the University of California, Merced, Startup Fund.

## References

E. J. Candès, M. B. Wakin, and S. P. Boyd, "Enhancing sparsity by reweighted ${\ell}_{1}$ minimization," J. Fourier Anal. Appl. 14(5–6), 877–905 (2008). http://dx.doi.org/10.1007/s00041-008-9045-x