## 1. Introduction

Optical tomography^{1}^{–}^{8} is known as a safer alternative to x-ray tomography. Usually, tomography consists of a light source generating penetrative light and a detector capturing the light, which allows one to estimate the inside of the object through which the light is passing. The most important application is x-ray computed tomography (CT), where x rays are used due to their penetrative property. The balance between the radiation exposure of the human body and the quality of the obtained results has been debated since the early days when x-ray CT was invented. Therefore, there is an urgent demand for a safer medical tomography, such as optical tomography.

Modeling the behavior of light plays an important role in optical tomography. At the mesoscale, where the wavelength of light is close to the scale of tissue structures, the radiative transport equation (RTE) is used to describe the behavior of light scattering.^{5}^{,}^{9} At the macroscale,^{6} the time-independent or time-dependent RTE is often approximated with a diffusion equation.

Similarly, the computer graphics community has used the time-independent RTE and, in contrast to the (surface) rendering equation,^{10}^{,}^{11} often calls it the volume rendering equation (VRE).^{10}^{,}^{12}

## (1)

$$(\omega \cdot \nabla )L(x,\omega )=-{\sigma}_{t}(x)L(x,\omega )+{\sigma}_{s}(x){\int}_{{S}^{2}}{f}_{p}(x,\omega ,{\omega}^{\prime})L(x,{\omega}^{\prime})\mathrm{d}{\omega}^{\prime},$$

where ${\sigma}_{t}$ is the extinction coefficient, ${\sigma}_{s}$ is the scattering coefficient, and ${f}_{p}$ is the phase function.^{13}^{,}^{14}

The path integral, which can be considered as a discrete version of the continuous Feynman path integral,^{15}^{,}^{16} has recently been employed to solve the VRE in an efficient way with Monte Carlo integration, such as Metropolis light transport^{17}^{,}^{18} or bidirectional path tracing.^{19}

In this paper, we propose an optical tomography method that uses a path integral as a forward model and solves a nonlinear inverse problem minimizing the discrepancy between measurements and model predictions in a least-squares sense. To the best of our knowledge, the discretized path integral has not been used in optical tomography before. In our work, we simplify the path integral with some assumptions. The path integral, as the name suggests, gathers (or integrates) the contributions of all possible paths of light.^{17}^{,}^{18}^{,}^{20}^{–}^{23} We approximate the integral over an infinite number of paths with the sum of a finite number of paths, discretize the continuous medium into voxels of a regular grid, and discretize continuous light paths into discrete ones (i.e., polylines). We deal with anisotropic scattering having a peak in the forward direction, which is different from other discretization methods using discrete ordinates or spherical harmonics.^{13}^{,}^{24}^{,}^{25} In this work, we focus on estimating the spatially varying extinction coefficient ${\sigma}_{t}(x)$ at each discretized voxel location of the medium while fixing the scattering properties (e.g., scattering coefficients ${\sigma}_{s}$ and phase functions ${f}_{p}$). By separating the scattering properties from our problem, we formulate optical tomography as an optimization problem with inequality constraints solved by an interior point method.

An interior point method^{26} is an iterative method for solving an optimization problem with inequality constraints describing a feasible region in which the optimal solution must reside. To this end, a series of unconstrained optimization problems is constructed by combining the constraints with the original objective function; each is solved by an ordinary gradient-based (Quasi-Newton) method.

To summarize our contribution, we reformulate the problem of optical tomography by combining a path integral with several simplifying assumptions to model the light transport in the participating media. This paper is an extension of our previous conference version^{27}^{,}^{28} with additional theoretical background and additional experiments and discussions, and is structured as follows. In Sec. 2, we briefly review previous work related to path integrals and optical tomography. In Sec. 3, we describe how to model the light transport in participating media and turn optical tomography into an optimization problem. In Sec. 4, we show how to solve the optimization problems. Section 5 reports some simulation results, and Sec. 6 concludes the paper.

## 2. Related Work

In this section, we briefly review related work on optical tomography and path integrals in computer graphics.

Optical tomography^{4}^{,}^{5} (or inverse transport,^{6}^{,}^{7} inverse scattering,^{29} scattering tomography^{30}^{,}^{31}) is a problem in medical imaging using light sources to reconstruct the optical properties of tissue from measurements (time-dependent or stationary, angular-dependent or independent) at the surface boundary. Analytically solving the RTE [Eq. (1)] with boundary conditions is difficult, however, and approximations, such as discrete ordinates and $N$’th-order spherical harmonics (${P}_{N}$ approximation), are often used and solved numerically by, for example, finite element methods (FEM) or finite difference methods. The famous diffusion approximation^{5}^{,}^{6} (DA) is a ${P}_{1}$ (thus first-order) approximation under the assumption that the phase function is isotropic. The DA is an approximation to the RTE at a macroscopic scale when scattering is large, absorption is low, and scattering is not highly peaked. Diffuse optical tomography (DOT) is based on the DA and today represents the frontier of optical tomography^{32}^{,}^{33} with many clinical applications.^{34} The DA, however, does not often hold in realistic participating (scattering) media; absorption may not be small compared to scattering, and the shapes of the phase functions can be highly peaked in the forward direction, which is often modeled by the Henyey-Greenstein,^{35} Schlick,^{36} or Mie and Rayleigh phase functions.^{10}^{,}^{12}^{,}^{37}^{,}^{38} Experimental evidence^{39} also suggests a highly peaked shape of the phase functions in biological media. DOT works, but is still limited; therefore, other methods have also been studied for cases when the DA does not hold.

Statistical Monte Carlo methods are used for media in which these assumptions do not hold; however, they are computationally intensive and inefficient for solving the forward problem,^{4}^{–}^{7}^{,}^{34} i.e., solving the RTE with given parameters. Therefore, Monte Carlo based approaches have been used for estimating spatially constant (not varying) parameters in homogeneous media, such as paper,^{40}^{,}^{41} clouds,^{42} liquids,^{43} plastics,^{44} or uniform material samples.^{45} Another difficulty of Monte Carlo based inverse methods is that an analytical forward model prediction, which is needed when we want to minimize the difference between the prediction and measurements, is hard to obtain except for very special structures.^{46}^{,}^{47} A gradient based least-squares approach has been proposed, but only for spatially constant parameter estimation,^{40}^{,}^{41}^{,}^{48} while model-free approaches have relied on genetic algorithms,^{42}^{,}^{44} numerical perturbation,^{49}^{,}^{50} voting,^{51} or even simple backprojection.^{52} One of the contributions of the current paper is to enable a gradient based optimization approach for estimating spatially varying parameters, which is extensible by using many optimization methods.

Similar to optical tomography, modeling light transport plays a very important role in computer graphics. Our own work on optical tomography is inspired by Monte Carlo based statistical methods. In the last two decades, methods based on path integrals^{17}^{–}^{19}^{,}^{53}^{–}^{55} have provided models of light transport for efficient volume rendering. For solving the RTE, a path integral has been used in forward problem solvers,^{16}^{,}^{56}^{,}^{57} and it has also been applied to optical tomography, but under the diffusion assumption.^{58}^{,}^{59} Our proposed method is based on a path integral that explicitly expresses the forward model prediction, which is very suitable for solving the inverse problem with gradient based methods. This is an advantage of our method over existing methods because the paths used in the forward model can be generated by either a deterministic or a statistical (Monte Carlo) method. To achieve an efficient forward model, we introduce a simplified layered scattering model that uses a limited number of deterministic paths instead of Monte Carlo simulated ones.

## 3. Method: Forward Problem

We deal with the following optical tomography problem [this is a conceptual formulation and the actual problem is shown in Eq. (29)].

## (2)

$$\underset{{\mathit{\sigma}}_{t}}{\mathrm{min}}\sum _{i,j}{|{I}_{ij}-{P}_{ij}({\mathit{\sigma}}_{t})|}^{2},$$

where ${I}_{ij}$ denotes the measurements and ${P}_{ij}({\mathit{\sigma}}_{t})$ the corresponding model predictions.

### 3.1. Forward Model

In the forward problem, as we mentioned before, we use a path integral to build a mathematical model for the light transport. Here, we follow the notation developed in the computer graphics literature^{17}^{,}^{23}^{,}^{53}^{,}^{60} to introduce the path integral. Sections 3.2 to 3.6 will show the simplified model we propose.

Given a space ${\mathfrak{R}}^{3}$, a light source is located at ${x}_{0}\in {\mathfrak{R}}^{3}$ and a detector at ${x}_{M+1}\in {\mathfrak{R}}^{3}$, and in between them is the participating media $\nu \subset {\mathfrak{R}}^{3}$ with boundary $\partial \nu $ and interior volume ${\nu}_{0}:=\nu \setminus \partial \nu $. A light path $\tilde{x}$ connecting ${x}_{0}$ and ${x}_{M+1}$ consists of $M+2$ vertices ${x}_{m}\in {\mathfrak{R}}^{3}$ for $m=0,1,\dots ,M+1$, denoted by $\tilde{x}={x}_{0},{x}_{1},\cdots ,{x}_{M},{x}_{M+1}$. Thus, absorption, scattering, or reflection events happen at ${x}_{1},\dots ,{x}_{M}$. The set of all paths of length $M$ is denoted by ${\mathrm{\Omega}}_{M}$. The path space $\mathrm{\Omega}$ is the countable set of all paths ${\mathrm{\Omega}}_{M}$ of finite length.

A direction is denoted by $\omega \in {S}^{2}$, where ${S}^{2}$ is a unit sphere in ${\mathfrak{R}}^{3}$. A unit vector ${\omega}_{{x}_{m},{x}_{m+1}}$ is the direction from vertex ${x}_{m}$ to vertex ${x}_{m+1}$ in a path $\tilde{x}$.

Veach^{20} introduced a framework representing the rendering equation in the form of a path integral for scenes without participating media (i.e., no scattering), and later, Pauly et al.^{17} extended it to the volume rendering equation with scattering. The amount of light $I$ observed by the detector is given by the path integral over the path space, $I={\sum}_{M}{\int}_{{\mathrm{\Omega}}_{M}}f(\tilde{x})\mathrm{d}\mu (\tilde{x})$, with the differential measure $d\mu $ and the path contribution $f$ defined as follows.

## (5)

$$d\mu (\tilde{x})=\prod _{m=0}^{M+1}d\mu ({x}_{m}),\phantom{\rule[-0.0ex]{2em}{0.0ex}}d\mu ({x}_{m})=\{\begin{array}{ll}dA({x}_{m}),& {x}_{m}\in \partial \nu \\ dV({x}_{m}),& {x}_{m}\in {\nu}_{0}\end{array},$$

## (6)

$$f(\tilde{x})={L}_{e}({x}_{0},{x}_{1})G({x}_{0},{x}_{1})[\prod _{m=1}^{M}{f}_{f}({x}_{m-1},{x}_{m},{x}_{m+1})G({x}_{m},{x}_{m+1})]{W}_{e}({x}_{M},{x}_{M+1}),$$

## (7)

$${f}_{f}({x}_{m-1},{x}_{m},{x}_{m+1})=\{\begin{array}{ll}{f}_{s}({x}_{m-1},{x}_{m},{x}_{m+1}),& {x}_{m}\in \partial \nu \\ {\sigma}_{s}({x}_{m}){f}_{p}({x}_{m-1},{x}_{m},{x}_{m+1}),& {x}_{m}\in {\nu}_{0}\end{array}.$$

Here, the bidirectional scattering distribution function ${f}_{s}({x}_{m-1},{x}_{m},{x}_{m+1})$ is used for locations on the surface of objects, and the scattering coefficient ${\sigma}_{s}({x}_{m})$ at ${x}_{m}$ and the phase function ${f}_{p}({x}_{m-1},{x}_{m},{x}_{m+1})$ are used for those inside the medium. $G({x}_{m},{x}_{m+1})$ is a generalized geometric term.

The generalized geometric term is the product of a purely geometric term and the transmittance

## (8)

$$G({x}_{m},{x}_{m+1})=g({x}_{m},{x}_{m+1})T({x}_{m},{x}_{m+1}),$$

where $g({x}_{m},{x}_{m+1})$ is a geometric term.

## (9)

$$g({x}_{m},{x}_{m+1})=\{\begin{array}{ll}\frac{|{\mathit{n}}_{g}({x}_{m})\cdot {\omega}_{{x}_{m},{x}_{m+1}}|}{{\Vert {x}_{m}-{x}_{m+1}\Vert}^{2}},& {x}_{m}\in \partial \nu \\ \frac{1}{{\Vert {x}_{m}-{x}_{m+1}\Vert}^{2}},& {x}_{m}\in {\nu}_{0}\end{array},$$

The transmittance $T({x}_{m},{x}_{m+1})$ is

## (10)

$$T({x}_{m},{x}_{m+1})=\{\begin{array}{ll}{e}^{-\tau ({x}_{m},{x}_{m+1})},& \{{x}_{m},{x}_{m+1}\}\subset {\nu}_{0}\cup \partial \nu \\ 0,& \text{otherwise}\end{array},$$

where the optical thickness $\tau ({x}_{m},{x}_{m+1})$ is

## (11)

$$\tau ({x}_{m},{x}_{m+1})={\int}_{0}^{1}{\sigma}_{t}[(1-s){x}_{m}+s{x}_{m+1}]\mathrm{d}s.$$

Putting it all together, we have a path integral as the following infinite sum of all possible path contributions.

## (12)

$$I=\sum _{M=2}^{\infty}\sum _{k\in {\mathrm{\Omega}}_{M}}{L}_{e}({x}_{0},{x}_{1})G({x}_{0},{x}_{1})[\prod _{m=1}^{M}{f}_{f}({x}_{m-1},{x}_{m},{x}_{m+1})G({x}_{m},{x}_{m+1})]{W}_{e}({x}_{M},{x}_{M+1})\prod _{m=0}^{M+1}d\mu ({x}_{m}).$$

Note that all vertices $\{{x}_{m}\}$ depend on a path $k$; different paths have different sets of vertices. In the equation above, however, we omit the path index $k$ for simplicity. Later, we will again use $k$ as the path index.

### 3.2. Assumptions on the Path Integral Formulation

As our target is optical tomography, we restrict the model to deal with the inside of the participating media. To do so, we assume that the light source ${x}_{0}$ and detector ${x}_{M+1}$ are located on the surface and the other vertices ${x}_{1},{x}_{2},\dots ,{x}_{M}$ are inside the medium, that is, ${x}_{0},{x}_{M+1}\in \partial \nu $ and ${x}_{1},\dots ,{x}_{M}\in {\nu}_{0}$. Then the transmittance is simplified as $T({x}_{m},{x}_{m+1})={e}^{-\tau ({x}_{m},{x}_{m+1})}$ for every segment.

Furthermore, we assume that the observations are ideal and the camera response function is the identity, ${W}_{e}({x}_{M},{x}_{M+1})=1$.

Apart from the assumptions above, we rewrite the geometric term and the differential measure. The definitions above use area measures $dA({x}_{m})$ and volume measures $dV({x}_{m})$ along with the squared-distance geometric term;^{17}^{,}^{23}^{,}^{53} however, steradian measures $d\omega ({x}_{m})$ with the identity geometric term are equivalent and also widely used.^{10}^{,}^{12}^{,}^{60}

Therefore, we employ the steradian measures and rewrite the measure as follows:

## (16)

$$d{\mu}_{k}({x}_{m})=\{\begin{array}{ll}dA({x}_{0}),& m=0\\ d\omega ({x}_{m}),& m=1,\dots ,M+1\end{array}.$$

Now, Eq. (12) is rewritten with these measures as Eq. (17).

### 3.3. Discretization of the Forward Model

For numerical computation, we first discretize the medium into voxels of a regular grid, where each voxel has its own extinction coefficient ${\sigma}_{t}[b]$ ($b$ is the index of the voxel) as shown in Fig. 1.

With this voxelization, the paths of light are also divided into segments, as explained below. First, we explain the integral [Eq. (11)] along a single segment ${x}_{m},{x}_{m+1}$ of a path $\tilde{x}$. It describes the attenuation of light along the segment due to the extinction coefficients of the voxels involved. Because of the discretization of the medium, Eq. (11) can be written as a sum of voxel-wise multiplications.

## (18)

$$\tau ({x}_{m},{x}_{m+1})={\int}_{0}^{1}{\sigma}_{t}[(1-s){x}_{m}+s{x}_{m+1}]\mathrm{d}s=\sum _{b\in {\mathcal{B}}_{{x}_{m},{x}_{m+1}}}{\sigma}_{t}[b]{d}_{{x}_{m},{x}_{m+1}}[b]={\mathit{\sigma}}_{t}^{T}{\mathit{d}}_{{x}_{m},{x}_{m+1}}.$$

For the second equality, $b$ is the index over the set ${\mathcal{B}}_{{x}_{m},{x}_{m+1}}$ of all voxels intersected by the segment ${x}_{m}{x}_{m+1}$, and ${d}_{{x}_{m},{x}_{m+1}}[b]$ is the length of the part of the segment ${x}_{m}{x}_{m+1}$ passing through voxel $b$. This is illustrated in Fig. 1(c). The extinction coefficient ${\sigma}_{t}$ is now a piece-wise constant function because of the voxelization; the integral then turns into a sum (the idea that this integral can be turned into a sum has been discussed before,^{61} however, not in the context of tomography).

This simplifies the computation; however, the sum over a set ${\mathcal{B}}_{{x}_{m},{x}_{m+1}}$ is not preferable in terms of implementation and optimization. We propose here to use a vector representation of both extinction coefficients and segment lengths, which is the third equality of the above equation. The first vector ${\mathit{\sigma}}_{t}$ stores the values of the extinction coefficients ${\sigma}_{t}[b]$ of all voxels. This vector can be generated by serializing the voxels on the grid in a certain order. The second vector ${\mathit{d}}_{{x}_{m},{x}_{m+1}}$ contains the values of the lengths ${d}_{{x}_{m},{x}_{m+1}}[b]$ for all voxels. We should note that this vector is very sparse; most of the voxels have no intersection with the segment ${x}_{m},{x}_{m+1}$. Hence, only a few elements in ${\mathit{d}}_{{x}_{m},{x}_{m+1}}$ have nonzero values, and the other elements are zero because those voxels $b$ have no intersection and ${d}_{{x}_{m},{x}_{m+1}}[b]=0$.
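As a concrete sketch of Eq. (18), the snippet below (our illustrative code, not the paper's implementation) approximates the sparse length vector ${\mathit{d}}_{{x}_{m},{x}_{m+1}}$ of a segment on a small 2-D grid by dense midpoint sampling (an exact voxel traversal algorithm would be used in practice) and evaluates $\tau ={\mathit{\sigma}}_{t}^{T}\mathit{d}$; the grid size, segment endpoints, and uniform coefficients are made-up values. It also checks that per-segment optical thicknesses add, so products of transmittances collapse into a single exponential, anticipating the path construction described next.

```python
import numpy as np

def segment_lengths(p0, p1, shape, n_samples=200_000):
    """Approximate d[b]: the length of segment p0 -> p1 inside each unit
    voxel of a (rows, cols) grid, by dense midpoint sampling."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    seg_len = float(np.linalg.norm(p1 - p0))
    t = (np.arange(n_samples) + 0.5) / n_samples      # midpoints in (0, 1)
    idx = np.floor(p0 + t[:, None] * (p1 - p0)).astype(int)
    ok = (idx[:, 0] >= 0) & (idx[:, 0] < shape[0]) \
       & (idx[:, 1] >= 0) & (idx[:, 1] < shape[1])
    d = np.zeros(shape)
    # each sample carries an equal share of the segment length
    np.add.at(d, (idx[ok, 0], idx[ok, 1]), seg_len / n_samples)
    return d.ravel()                                  # serialize the grid into a vector

shape = (4, 4)
sigma_t = np.full(16, 0.2)                            # uniform extinction coefficients
d1 = segment_lengths((0.5, 0.5), (2.5, 1.5), shape)   # segment x_m -> x_{m+1}
d2 = segment_lengths((2.5, 1.5), (3.5, 1.5), shape)   # next segment of the path
tau1, tau2 = sigma_t @ d1, sigma_t @ d2               # Eq. (18): tau = sigma_t^T d
print(round(tau1, 3))                                 # 0.2 * sqrt(5) ~ 0.447
# Segment vectors add, so segment transmittances multiply into one exponential:
print(np.isclose(np.exp(-tau1) * np.exp(-tau2), np.exp(-sigma_t @ (d1 + d2))))  # True
```

Only a handful of entries of `d1` and `d2` are nonzero, which is exactly the sparsity exploited in the text.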

This sparsity of the vector facilitates the construction of a whole path $\tilde{x}$ because path segments can be added as follows:

## (19)

$${\mathit{D}}_{k}=\sum _{m=0}^{M}{\mathit{d}}_{{x}_{m},{x}_{m+1}},$$

where ${\mathit{D}}_{k}$ is the vector of a complete path $k$ of length $M+2$; the $b$’th element is the total length of the parts of the path passing through voxel $b$. This notation simplifies a part of Eq. (17) as follows:

## (20)

$$\prod _{m=0}^{M}T({x}_{m},{x}_{m+1})=\prod _{m=0}^{M}{e}^{-\tau ({x}_{m},{x}_{m+1})}={e}^{-\sum _{m=0}^{M}\tau ({x}_{m},{x}_{m+1})}={e}^{-\sum _{m=0}^{M}{\mathit{\sigma}}_{t}^{T}{\mathit{d}}_{{x}_{m},{x}_{m+1}}}={e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{k}}.$$

Using this notation to rewrite Eq. (17), we have

## (21)

$$I=\sum _{M=2}^{\infty}{L}_{e}({x}_{0},{x}_{1})\sum _{k\in {\mathrm{\Omega}}_{M}}{H}_{k}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{k}}={L}_{e}({x}_{0},{x}_{1})\sum _{k\in \mathrm{\Omega}}{H}_{k}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{k}},$$

where ${H}_{k}$ collects the remaining (scattering and measure) factors of the contribution of path $k$ [Eq. (22)].

### 3.4. Two-Dimensional Layered Model of Forward Scattering

As a first attempt, we design a two-dimensional (2-D) layered grid, instead of the three-dimensional (3-D) one. Since we voxelize the medium into a regular grid, the 2-D medium consists of parallel layers. Hereafter, a 3-D direction $\omega $ between vertices is written as a 2-D direction $\theta $ and a steradian measure $d\omega $ as an angular measure $d\theta $.

As shown in Fig. 2, we assume a particular layered scattering model with the following properties. First, the vertices ${x}_{1},\cdots ,{x}_{M}$ of a path $\tilde{x}$ are located at the centers of voxels. The light source ${x}_{0}$ is located on the boundary of the top surface of the voxels in the top layer. Similarly, the detector ${x}_{M+1}$ is located on the boundary of the bottom surface of the voxels in the bottom layer. Second, the directions ${\theta}_{{x}_{0},{x}_{1}}$ and ${\theta}_{{x}_{M},{x}_{M+1}}$ at the beginning and end of a path are perpendicular to the boundary. This means that scattering begins at ${x}_{1}$ and ends at ${x}_{M}$. Third, forward scattering happens layer by layer. More specifically, light is scattered at the center of a voxel in a layer and then travels to the center of a voxel in the next (lower) layer. Scattering is assumed to happen every time the light traverses a voxel center; even if the next voxel is just below the current voxel and the path segment is straight, it is regarded as scattering. Fourth, the scattering coefficient is uniform, ${\sigma}_{s}(x)={\sigma}_{s}$.

By ignoring paths exiting from the sides of the grid, the number of all possible paths is ${N}^{M}$, where $M$ is the number of layers and $N$ is the number of voxels in one layer.

### 3.5. Approximating the Phase Function with a Gaussian

We use a Gaussian model ${f}_{p}(\theta ,{\sigma}^{2})$ as an approximation of the phase function

## (23)

$${f}_{p}({x}_{m-1},{x}_{m},{x}_{m+1})\equiv {f}_{p}({\theta}_{m},{\sigma}^{2})=\frac{1}{\sqrt{2\pi {\sigma}^{2}}}\mathrm{exp}\left(\frac{-{\theta}_{m}^{2}}{2{\sigma}^{2}}\right),\phantom{\rule[-0.0ex]{2em}{0.0ex}}-\frac{\pi}{2}<{\theta}_{m}<\frac{\pi}{2},$$

for the following two reasons.

First, existing phase function models^{10}^{,}^{12}^{,}^{35}^{–}^{38} are designed for 3-D scattering, not for 2-D. This means that those functions are normalized for integrals over the unit sphere ${S}^{2}$: ${\int}_{{S}^{2}}{f}_{p}(\omega )\mathrm{d}\omega =1$. Most of the phase functions assume isotropy (rotational symmetry), and hence, the function has a form taking angle $\theta $ as an argument; however, ${\int}_{-\pi}^{\pi}{f}_{p}(\theta )\mathrm{d}\theta \ne 1$. These functions, therefore, are not adequate for our case.

Second, our assumption of layer-wise forward scattering does not allow scattering to happen backwards or sideways, and the Gaussian model is suitable for this. As shown in Fig. 3, the Gaussian model has the form of forward-only scattering (no backwards or sideways) in a reasonable range of ${\sigma}^{2}$, and it is almost normalized; ${\int}_{-\pi /2}^{\pi /2}{f}_{p}(\theta ,{\sigma}^{2})\mathrm{d}\theta \approx 1$. Other 2-D phase functions exist that are not forward-only. For example, Heino et al.^{62} introduced a 2-D analog of the Henyey-Greenstein phase function,^{35} shown in Fig. 3. Although the parameters are different, the two functions in Fig. 3 have similar shapes. The most important difference is that Heino’s function has backward scattering, but our Gaussian model does not. Modeling more realistic scattering than the layer-wise forward scattering introduced here would require Heino’s or the Henyey-Greenstein phase function.

We should note one further simplification in our layer-wise forward scattering model. The angle ${\theta}_{m}$ in the phase function is usually defined between ${\theta}_{{x}_{m-1},{x}_{m}}$ and ${\theta}_{{x}_{m},{x}_{m+1}}$, that is, the difference of directions changed by the scattering event. Instead of dealing with such an exact difference of directions, we use the angle between ${\theta}_{{x}_{m},{x}_{m+1}}$ and the vertical (downward) direction for efficiency of computation. This assumption enables us to discretize the Gaussian phase function much more easily. Since ${f}_{p}(\theta )$ integrates to (approximately) one, such a normalization can be discretized with a sum as follows:

## (24)

$${\int}_{-\pi /2}^{\pi /2}{f}_{p}(\theta ,{\sigma}^{2})\mathrm{d}\theta \approx \sum _{b\in {\mathcal{B}}_{n}}{f}_{p}({\theta}_{b},{\sigma}^{2})\mathrm{\Delta}{\theta}_{b}\approx 1,$$

where ${\mathcal{B}}_{n}$ is the set of voxels in the next layer, ${\theta}_{b}$ is the direction toward voxel $b$, and $\mathrm{\Delta}{\theta}_{b}$ is the angular width subtended by voxel $b$. The above equation can be considered as the energy distribution from a voxel in one layer to the voxels in the next layer. For a voxel $b$ at direction ${\theta}_{b}$, the value of ${f}_{p}({\theta}_{b},{\sigma}^{2})\mathrm{\Delta}{\theta}_{b}$ describes what percentage of the energy will be scattered to this voxel. Figure 5 shows plots of the values corresponding to two phase functions with different parameters. We can see that, due to forward scattering, most of the energy is concentrated in the voxel just below, and a small part goes to the adjacent voxels.
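The discretization in Eq. (24) can be checked numerically. The sketch below assumes unit voxels, unit layer spacing, a normalized Gaussian, and ${\sigma}^{2}=0.4$ (a value used later in the paper); none of these choices are taken from the paper's actual implementation.

```python
import numpy as np

def f_p(theta, var):
    # Normalized Gaussian phase model (our normalization assumption)
    return np.exp(-theta**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

var = 0.4                                  # sigma^2
N = 21                                     # voxels in the next layer (assumed)
offsets = np.arange(N) - N // 2            # horizontal offsets of voxel centers
theta_b = np.arctan2(offsets, 1.0)         # angle toward each voxel center
edges = np.arctan2(np.arange(N + 1) - N / 2, 1.0)   # angles to voxel borders
dtheta_b = np.diff(edges)                  # angular width subtended by each voxel

weights = f_p(theta_b, var) * dtheta_b     # energy share per voxel, Eq. (24)
print(weights.argmax() == N // 2)          # True: most energy goes straight down
print(weights.sum())                       # close to 1, as Eq. (24) requires
```

The central voxel dominates and the weights fall off quickly to the sides, matching the behavior shown in Fig. 5.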

The contribution ${H}_{k}$ in Eq. (22) now needs to be rewritten so that it deals with the Gaussian phase function and the discretized energy distribution discussed above. First, we reorder the measure

## (25)

$${H}_{k}=dA({x}_{0})d\theta ({x}_{M+1})\prod _{m=1}^{M}{f}_{f}({x}_{m-1},{x}_{m},{x}_{m+1})d\theta ({x}_{m})$$

## (26)

$$=dA({x}_{0})d\theta ({x}_{1})\prod _{m=1}^{M}{f}_{f}({x}_{m-1},{x}_{m},{x}_{m+1})d\theta ({x}_{m+1}),$$

and then discretize the angular measures to obtain

## (27)

$${H}_{k}=dA({x}_{0})\mathrm{\Delta}{\theta}_{{x}_{0},{x}_{1}}{\sigma}_{s}^{M}\prod _{m=1}^{M}{f}_{p}({\theta}_{{x}_{m},{x}_{m+1}},{\sigma}^{2})\mathrm{\Delta}{\theta}_{{x}_{m},{x}_{m+1}}.$$

Note that the factor $dA({x}_{0})\mathrm{\Delta}{\theta}_{{x}_{0},{x}_{1}}{\sigma}_{s}^{M}$ is common to all paths because we assumed that the grid is uniform, so that $dA({x}_{0})$ is constant, the direction ${\theta}_{{x}_{0},{x}_{1}}$ (or ${\omega}_{{x}_{0},{x}_{1}}$) is perpendicular to the top surface, and ${\sigma}_{s}$ is constant.

### 3.6. Observation Model

Suppose the 2-D layered medium is an $M\times N$ grid; it has $M$ layers, each of which is made of $N$ voxels. We now construct an observation model of the light transport between a light source and a detector: emitting light into each of the voxels at the top layer, and capturing light from each of the voxels at the bottom layer. More specifically, let $i\in {\mathcal{B}}_{1}$ and $j\in {\mathcal{B}}_{M}$ be the voxel indices of the light source and detector locations, respectively. By restricting the light paths to only those connecting $i$ and $j$, the observed light ${I}_{ij}$ is written as follows:

## (28)

$${I}_{ij}={I}_{0}\sum _{k=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}},$$

In the above equation, $k$ indexes the light paths, which share the same $i$ and $j$. Due to the layered scattering model in the $M\times N$ grid, the number of different paths between $i$ and $j$ is ${N}_{ij}={N}^{M-2}$. This is, however, too large even for small $N$ and $M$, e.g., $N=M=10$. Therefore, we exclude paths having small contributions from the computation. This is done by simple thresholding while computing ${H}_{ijk}$, as shown in Algorithm 1. This results in generating fewer paths, ${N}_{ij}\le {N}^{M-2}$. For example, there are ${N}_{ij}=742$ paths for $N=M=20$ with ${\sigma}^{2}=0.4$ when $\mathrm{th}=0.001$, which enables us to reduce the computation cost.

## Algorithm 1

Computing the contribution ${H}_{ijk}$ and omitting low-contribution paths by thresholding.

Input: Threshold $\mathrm{th}$, path $\tilde{x}={x}_{0},\cdots ,{x}_{M+1}$.

Output: Contribution ${H}_{ijk}$.

1 ${H}_{ijk}=1$;

2 for $m=1$ to $M$ do

3  ${H}_{ijk}={H}_{ijk}\text{\hspace{0.17em}}{f}_{p}({\theta}_{{x}_{m},{x}_{m+1}},{\sigma}^{2})\mathrm{\Delta}{\theta}_{{x}_{m},{x}_{m+1}}$;

4  if ${H}_{ijk}\le \mathrm{th}$ then

5   omit this path;

6   stop;

7 accept this path;

8 return ${H}_{ijk}$;
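Algorithm 1 can be embedded in a depth-first enumeration of the layered paths of Sec. 3.4. The sketch below is our illustrative reimplementation under assumed geometry (unit voxels, unit layer spacing, normalized Gaussian); the grid size and threshold are made-up choices, and `enumerate_paths` is our own helper, not a function from the paper.

```python
import math

def f_p(theta, var):
    # Normalized Gaussian phase model (our normalization assumption)
    return math.exp(-theta**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_weights(N, var):
    """Energy distribution from one voxel to the N voxels of the next
    layer (Eq. 24), assuming unit voxels and unit layer spacing."""
    w = {}
    for off in range(-(N - 1), N):
        lo = math.atan2(off - 0.5, 1.0)
        hi = math.atan2(off + 0.5, 1.0)
        w[off] = f_p(math.atan2(off, 1.0), var) * (hi - lo)
    return w

def enumerate_paths(N, M, i, j, var, th):
    """Voxel-index paths from source voxel i (top layer) to detector
    voxel j (bottom layer) whose contribution H_ijk exceeds th."""
    w = gaussian_weights(N, var)
    paths = []

    def walk(layer, col, H, trail):
        if H <= th:                        # Algorithm 1: omit this path
            return
        if layer == M - 1:
            if col == j:                   # keep only paths ending at the detector
                paths.append((trail, H))
            return
        for nxt in range(N):               # scatter to any voxel of the next layer
            walk(layer + 1, nxt, H * w[nxt - col], trail + (nxt,))

    walk(0, i, 1.0, (i,))
    return paths

paths = enumerate_paths(N=5, M=4, i=2, j=2, var=0.4, th=1e-3)
print(len(paths) <= 5 ** (4 - 2))          # True: thresholding gives N_ij <= N^(M-2)
```

Checking the threshold inside the recursion prunes whole subtrees early, which is where the reduction in computation cost comes from.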

## 4. Method: Inverse Problem

Next, we propose a method for the inverse problem of the forward model [Eq. (28)] to estimate the extinction coefficients of the 2-D layered model. As we mentioned before, we fix the light paths and assume that the scattering coefficients and parameters of the Gaussian phase function are uniform and known in advance.

### 4.1. Cost Function

In the $M\times N$ 2-D layered medium described in Sec. 3.6, we assumed a configuration of a light source and detector similar to the left-most one shown in Fig. 6; the light source is located above the medium and the detector below it, and the observed light is ${I}_{ij}$, where $i,j$ are the voxel indices of the light source and detector locations. By sliding the light source and the detector, we can obtain ${N}^{2}$ observations, resulting in the following least-squares problem:

## (29)

$$\underset{{\sigma}_{t}}{\mathrm{min}}\text{\hspace{0.17em}}{f}_{0},{f}_{0}=\sum _{i=1}^{N}\sum _{j=1}^{N}{|{I}_{ij}-{I}_{0}\sum _{k=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}}|}^{2},$$

Furthermore, as shown in Fig. 6, we have four configurations of light sources and detectors obtained by changing their positions. This gives us four different sets of observations ${I}_{ij}$ and paths $ijk$. These four different sets lead to four objective functions (${f}_{T2B}$, ${f}_{L2R}$, ${f}_{B2T}$, ${f}_{R2L}$), as shown in Fig. 6. Since the four objective functions share the same variables ${\mathit{\sigma}}_{t}$, we can use all of them at the same time by adding them to form a new single objective function ${f}_{0}$, at the expense of a fourfold increase in computation cost.

### 4.2. Optimization Problem with Inequality Constraints

Since the inverse problem [Eq. (31)] is nonlinear, we employ an interior point method,^{26} an iterative optimization algorithm for problems with constraints. Here, we first review several key points in optimization; then we will develop an algorithm to solve Eq. (31) along with the required first- and second-order derivatives of the cost function.

#### 4.2.1. Unconstrained problem: Quasi-Newton

First, we review optimization without constraints, which is used inside the interior point method. The general form of unconstrained optimization is

## (32)

$$\underset{{\mathit{\sigma}}_{t}}{\mathrm{min}}\text{\hspace{0.17em}}f({\mathit{\sigma}}_{t}),$$

To solve it, an iterative procedure begins with an initial guess ${\mathit{\sigma}}_{t}^{0}$ and generates a sequence ${\{{\mathit{\sigma}}_{t}^{k}\}}_{k=0}^{\infty}$. It stops when the change of solutions is small enough. The information about the function $f$ at ${\mathit{\sigma}}_{t}^{k}$, or even at the previous estimates ${\mathit{\sigma}}_{t}^{0},{\mathit{\sigma}}_{t}^{1},\cdots ,{\mathit{\sigma}}_{t}^{k-1}$, is used to calculate a direction ${\mathit{p}}_{k}$ to move in with a step size ${\alpha}_{k}$. A line search is often used to determine the step size by searching along the direction starting from ${\mathit{\sigma}}_{t}^{k}$ for the ${\mathit{\sigma}}_{t}^{k+1}$ with the least value of the objective function

## (33)

$$\underset{{\alpha}_{k}>0}{\mathrm{min}}\text{\hspace{0.17em}}f({\mathit{\sigma}}_{t}^{k}+{\alpha}_{k}{\mathit{p}}_{k}).$$

Once we find the step size, the estimate is updated as ${\mathit{\sigma}}_{t}^{k+1}\leftarrow {\mathit{\sigma}}_{t}^{k}+{\alpha}_{k}{\mathit{p}}_{k}$. For Newton’s method, the direction is ${\mathit{p}}_{k}=-{B}_{k}\nabla f({\mathit{\sigma}}_{t}^{k})$, where ${B}_{k}={\nabla}^{2}f{({\mathit{\sigma}}_{t}^{k})}^{-1}$ is the inverse of the Hessian.

Newton’s method is well known for its second-order convergence and accuracy. However, when the dimension of the problem is large, calculating the Hessian and its inverse is computationally expensive. Therefore, Quasi-Newton methods are often used, in which the inverse Hessian is approximated by incremental updates in order to reduce the computation cost. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update rule is well known.^{63}

## (36)

$${B}_{k}=(I-\frac{\mathit{s}{\mathit{y}}^{T}}{{\mathit{y}}^{T}\mathit{s}}){B}_{k-1}(I-\frac{\mathit{y}{\mathit{s}}^{T}}{{\mathit{y}}^{T}\mathit{s}})+\frac{\mathit{s}{\mathit{s}}^{T}}{{\mathit{y}}^{T}\mathit{s}}.$$

Here, $\mathit{s}={\mathit{\sigma}}_{t}^{k}-{\mathit{\sigma}}_{t}^{k-1}$ and $\mathit{y}=\nabla f({\mathit{\sigma}}_{t}^{k})-\nabla f({\mathit{\sigma}}_{t}^{k-1})$. When the conditions ${\mathit{y}}^{T}\mathit{s}>0$ and ${B}_{0}\succ 0$ (where $\succ 0$ means positive definite) are satisfied, the BFGS update guarantees the positive definiteness of ${B}_{k}$. Algorithm 2 shows the Quasi-Newton method.

## Algorithm 2

The Quasi-Newton method with BFGS update rule.

Input: A feasible initial solution ${\mathit{\sigma}}_{t}^{0}$, and ${B}_{0}\succ 0$.

Result: An estimate ${\mathit{\sigma}}_{t}^{\star}$.

1 repeat

2  Compute the Quasi-Newton direction: ${\mathit{p}}^{k}=-{B}_{k}\nabla f({\mathit{\sigma}}_{t}^{k})$.

3  Find step length ${\alpha}_{k}$ with line search.

4  Update estimate ${\mathit{\sigma}}_{t}^{k+1}\leftarrow {\mathit{\sigma}}_{t}^{k}+{\alpha}_{k}{\mathit{p}}^{k}$.

5  Update ${B}_{k}$ with BFGS.

6 until convergence;
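Algorithm 2 can be sketched compactly on a toy quadratic (not the tomography cost); the decrease-only backtracking line search, the gradient-based stopping rule, and all numerical values below are illustrative simplifications.

```python
import numpy as np

A = np.diag([1.0, 10.0])                 # toy objective: f(x) = 0.5 x^T A x
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([5.0, 1.0])                 # initial solution
B = np.eye(2)                            # B_0 > 0
for _ in range(50):
    p = -B @ grad(x)                     # Quasi-Newton direction (line 2)
    alpha = 1.0                          # decrease-only backtracking (line 3)
    for _ in range(60):
        if f(x + alpha * p) < f(x):
            break
        alpha /= 2
    x_new = x + alpha * p                # update the estimate (line 4)
    s, y = x_new - x, grad(x_new) - grad(x)
    if y @ s > 0:                        # BFGS update keeps B positive definite
        rho = 1.0 / (y @ s)
        V = np.eye(2) - rho * np.outer(s, y)
        B = V @ B @ V.T + rho * np.outer(s, s)
    x = x_new
    if np.linalg.norm(grad(x)) < 1e-10:  # convergence (line 6)
        break

print(np.allclose(x, 0.0, atol=1e-6))    # True: the minimizer of the quadratic
```

After a few iterations `B` approaches the true inverse Hessian `diag(1, 0.1)`, which is exactly the behavior the incremental BFGS approximation is designed to achieve.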

#### 4.2.2. Constrained problem: interior point

Here we introduce a constrained optimization with inequality constraints of the form

## (37)

$$\underset{{\mathit{\sigma}}_{t}}{\mathrm{min}}\text{\hspace{0.17em}}{f}_{0}({\mathit{\sigma}}_{t})\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}\hspace{0.17em}}{f}_{i}({\mathit{\sigma}}_{t})\le 0,\phantom{\rule[-0.0ex]{1em}{0.0ex}}i=1,\dots ,m,$$

The idea is to approximate it as an unconstrained problem. Using an indicator function, we can first rewrite Eq. (37) as

## (38)

$$\underset{{\mathit{\sigma}}_{t}}{\mathrm{min}}\text{\hspace{0.17em}}{f}_{0}({\mathit{\sigma}}_{t})+\sum _{i=1}^{m}I[{f}_{i}({\mathit{\sigma}}_{t})],$$

where $I[u]$ is the indicator function taking the value 0 for $u\le 0$ and $\infty $ otherwise. Equation (38) now has no inequality constraints, but it is not differentiable due to $I$.

The barrier method^{26} is an interior point method that introduces a logarithmic barrier function to approximate the indicator function $I$ as follows:

## (41)

$$\underset{{\mathit{\sigma}}_{t}}{\mathrm{min}}\text{\hspace{0.17em}}{f}_{0}({\mathit{\sigma}}_{t})+\sum _{i=1}^{m}-(1/t)\mathrm{log}[-{f}_{i}({\mathit{\sigma}}_{t})],$$

or, equivalently, after multiplying the objective by $t>0$,

## (42)

$$\underset{{\mathit{\sigma}}_{t}}{\mathrm{min}}\text{\hspace{0.17em}}t{f}_{0}({\mathit{\sigma}}_{t})-\sum _{i=1}^{m}\mathrm{log}[-{f}_{i}({\mathit{\sigma}}_{t})].$$

The barrier method solves Eq. (42) iteratively by increasing the parameter $t$. In the limit $t\to \infty $, the above problem coincides with the original problem [Eq. (38)].
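The effect of increasing $t$ can be illustrated on a one-dimensional toy problem; the objective, the bounds, and the schedule for $t$ below are made-up choices, and the inner minimization is done by brute-force grid search instead of Quasi-Newton.

```python
import numpy as np

f0 = lambda x: (x - 2.0) ** 2              # toy objective; its unconstrained
                                           # optimum x = 2 lies outside [0, 1]
grid = np.linspace(1e-6, 1 - 1e-6, 200001) # strictly feasible grid points

def barrier_min(t):
    # Eq. (42) in 1-D: minimize t*f0(x) - log(x) - log(1 - x)
    vals = t * f0(grid) - np.log(grid) - np.log(1.0 - grid)
    return grid[np.argmin(vals)]

for t in [1, 10, 100, 1000]:
    print(t, round(float(barrier_min(t)), 3))
# As t grows, the minimizer approaches the constrained optimum x = 1
# while always staying strictly inside the feasible region.
```

This strictly interior trajectory of minimizers is what gives interior point methods their name.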

### 4.3. Algorithm for Solving the Inverse Problem

Algorithm 3 shows our algorithm, which uses a barrier method with Quasi-Newton for solving the inverse problem. We should mention the following parts where we have modified the original algorithm.^{26}

## Algorithm 3

Barrier method of interior point with Quasi-Newton solver.

Data: Parameters $\mu >1$, $\epsilon >0$, and $t={t}_{\mathrm{init}}>0$.

Input: A feasible initial estimate ${\mathit{\sigma}}_{t}^{0}$, and $B\succ 0$.

Result: An estimate ${\mathit{\sigma}}_{t}^{\star}$.

1 while $(2MN/t)\ge \epsilon $ do // outer loop: barrier method

2 $t\leftarrow \mu t$.

3 Set a log-barriered cost function $f(t)=t{f}_{0}-\sum _{b}\{\mathrm{log}({\sigma}_{t}[b])+\mathrm{log}(u-{\sigma}_{t}[b])\}$.

4 $k\leftarrow 0$, ${B}_{k}\leftarrow B$, ${\mathit{\sigma}}_{t}^{k}\leftarrow {\mathit{\sigma}}_{t}$.

5 repeat // inner loop: Quasi-Newton

6 Compute the Quasi-Newton direction: ${\mathit{p}}^{k}=-{B}_{k}\nabla f({\mathit{\sigma}}_{t}^{k})$.

7 Find step length ${\alpha}_{k}$ with line search.

8 while ${\mathit{\sigma}}_{t}^{k}+{\alpha}_{k}{\mathit{p}}^{k}$ is not feasible do

9 Halve the step size: ${\alpha}_{k}\leftarrow {\alpha}_{k}/2$.

10 Update estimate ${\mathit{\sigma}}_{t}^{k+1}\leftarrow {\mathit{\sigma}}_{t}^{k}+{\alpha}_{k}{\mathit{p}}^{k}$.

11 $\mathit{s}={\mathit{\sigma}}_{t}^{k+1}-{\mathit{\sigma}}_{t}^{k}$.

12 $\mathit{y}=\nabla f({\mathit{\sigma}}_{t}^{k+1})-\nabla f({\mathit{\sigma}}_{t}^{k})$.

13 if ${\mathit{y}}^{T}\mathit{s}>0$ then

14 Update ${B}_{k+1}$ with BFGS [Eq. (36)].

15 else

16 Reset ${B}_{k+1}\leftarrow ({\mathit{y}}^{T}\mathit{s}/{\mathit{y}}^{T}\mathit{y})I$.

17 $k\leftarrow k+1$.

18 until $(1/2)\nabla f{({\mathit{\sigma}}_{t}^{k+1})}^{T}{B}_{k+1}\nabla f({\mathit{\sigma}}_{t}^{k+1})\le \epsilon $;

19 $B\leftarrow {B}_{k+1}$, ${\mathit{\sigma}}_{t}\leftarrow {\mathit{\sigma}}_{t}^{k}$.

Warm start: For each inner loop, the Quasi-Newton method needs an initial guess of the inverse Hessian ${B}_{0}$. Instead of fixing ${B}_{0}$ for every inner loop, we reuse the ${B}_{k}$ of the last inner loop to accelerate the convergence (shown in lines 4 and 19 in Algorithm 3).

Checking feasibility: Since the Quasi-Newton step and the line search do not take the constraints into account, the next estimate ${\mathit{\sigma}}_{t}^{k+1}$ may violate them; in our case, each element ${\sigma}_{t}^{k+1}[b]$ of ${\mathit{\sigma}}_{t}^{k+1}$ must lie inside $[0,u]$ after the step size has been determined. Therefore, in line 8, we check the feasibility of the estimate ${\mathit{\sigma}}_{t}^{k+1}$ for the current step size ${\alpha}_{k}$. If it exceeds the boundary of the feasible region, we pull the estimate back by halving the step size, and halve it again as long as the estimate remains infeasible. Why not simply set the step size so that ${\mathit{\sigma}}_{t}^{k+1}$ lies exactly on the boundary? The reason is the log-barrier: if ${\mathit{\sigma}}_{t}^{k+1}$ is on the boundary, i.e., ${\sigma}_{t}^{k+1}[b]$ is either 0 or $u$, then $\mathrm{log}({\sigma}_{t}[b])$ or $\mathrm{log}(u-{\sigma}_{t}[b])$ diverges, which causes numerical instability. Hence the halving procedure described above is needed.

Checking for positive definiteness: The BFGS update rules guarantee ${B}_{k}$ to be positive definite if ${\mathit{y}}^{T}\mathit{s}>0$ and $B\succ 0$ are satisfied. While the latter is satisfied by giving an appropriate initial guess, the former depends on the updates at each iteration. If it is not satisfied, then the BFGS updates are no longer valid, and we reset the inverse Hessian ${B}_{k}$ to a scaled identity^{63} at line 16.
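The curvature-checked update can be sketched as follows in NumPy. The update formula is the standard BFGS inverse-Hessian update [cf. Eq. (36)]; note that the fallback below takes the absolute value of ${\mathit{y}}^{T}\mathit{s}$ in the scaled-identity reset, an assumption of this sketch so that the reset itself stays positive definite.

```python
import numpy as np

def bfgs_inverse_update(B, s, y):
    """BFGS update of the inverse Hessian approximation B [cf. Eq. (36)].

    s = sigma^{k+1} - sigma^k,  y = grad f(sigma^{k+1}) - grad f(sigma^k).
    Positive definiteness of the update requires y^T s > 0 (line 13 of
    Algorithm 3); otherwise we reset to a scaled identity. Taking |y^T s|
    in the reset is an assumption of this sketch.
    """
    ys = y @ s
    if ys > 1e-12:
        rho = 1.0 / ys
        V = np.eye(len(s)) - rho * np.outer(s, y)
        return V @ B @ V.T + rho * np.outer(s, s)
    return (abs(ys) / (y @ y) + 1e-12) * np.eye(len(s))

# Sanity check: with curvature pairs from an SPD matrix A (y = A s, so
# y^T s > 0), B satisfies the secant equation and stays positive definite.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)
B = np.eye(4)
for _ in range(20):
    s = rng.standard_normal(4)
    y = A @ s
    B = bfgs_inverse_update(B, s, y)
print(np.allclose(B @ y, s), np.all(np.linalg.eigvalsh(B) > 0))  # True True
```

The secant equation $B_{k+1}\mathit{y}=\mathit{s}$ holds exactly after each update, which is what makes $B_{k}$ a useful inverse-Hessian surrogate in the Quasi-Newton direction of line 6.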

## 4.3.1.

#### Jacobian

Here, we derive the Jacobian of the objective function ${f}_{0}$ in Eq. (29). The Jacobian of the objective function ${f}_{0}$ in Eq. (31) can be derived in the same manner.

We first rewrite the objective function ${f}_{0}$ as follows:

## (43)

$${f}_{0}=\sum _{i=1}^{N}\sum _{j=1}^{N}{|{I}_{ij}-{I}_{0}\sum _{k=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}}|}^{2}$$

## (44)

$$=\sum _{i=1}^{N}\sum _{j=1}^{N}({I}_{ij}^{2}-2{I}_{ij}{I}_{0}\sum _{k=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}}+{I}_{0}^{2}\sum _{k=1}^{{N}_{ij}}\sum _{l=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}}{H}_{ijl}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijl}})$$

## (45)

$$=\sum _{i=1}^{N}\sum _{j=1}^{N}[{I}_{ij}^{2}-2{I}_{ij}{I}_{0}\sum _{k=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}}+{I}_{0}^{2}\sum _{k=1}^{{N}_{ij}}\sum _{l=1}^{{N}_{ij}}{H}_{ijk}{H}_{ijl}{e}^{-{\mathit{\sigma}}_{t}^{T}({\mathit{D}}_{ijk}+{\mathit{D}}_{ijl})}],$$

and differentiate it with respect to ${\mathit{\sigma}}_{t}$:

## (46)

$$\frac{\partial {f}_{0}}{\partial {\sigma}_{t}}=\sum _{i=1}^{N}\sum _{j=1}^{N}[2{I}_{ij}{I}_{0}\sum _{k=1}^{{N}_{ij}}{H}_{ijk}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ijk}}{\mathit{D}}_{ijk}-{I}_{0}^{2}\sum _{k=1}^{{N}_{ij}}\sum _{l=1}^{{N}_{ij}}{H}_{ijk}{H}_{ijl}{e}^{-{\mathit{\sigma}}_{t}^{T}({\mathit{D}}_{ijk}+{\mathit{D}}_{ijl})}({\mathit{D}}_{ijk}+{\mathit{D}}_{ijl})].$$

To simplify the equation, we use the following notation:

## (47)

$$E=\left[\begin{array}{c}{e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ij1}}\\ {e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ij2}}\\ \vdots \\ {e}^{-{\mathit{\sigma}}_{t}^{T}{\mathit{D}}_{ij{N}_{ij}}}\end{array}\right],\phantom{\rule[-0.0ex]{2em}{0.0ex}}H=\left[\begin{array}{c}{H}_{ij1}\\ {H}_{ij2}\\ \vdots \\ {H}_{ij{N}_{ij}}\end{array}\right],$$

## (48)

$${\mathit{D}}_{ij}=\left[\begin{array}{c}{\mathit{D}}_{ij1}\\ {\mathit{D}}_{ij2}\\ \vdots \\ {\mathit{D}}_{ij{N}_{ij}}\end{array}\right],\phantom{\rule[-0.0ex]{2em}{0.0ex}}{\tilde{\mathit{D}}}_{ij}=\left[\begin{array}{cccc}{\mathit{D}}_{ij1}+{\mathit{D}}_{ij1}& {\mathit{D}}_{ij1}+{\mathit{D}}_{ij2}& \cdots & {\mathit{D}}_{ij1}+{\mathit{D}}_{ij{N}_{ij}}\\ {\mathit{D}}_{ij2}+{\mathit{D}}_{ij1}& {\mathit{D}}_{ij2}+{\mathit{D}}_{ij2}& \cdots & {\mathit{D}}_{ij2}+{\mathit{D}}_{ij{N}_{ij}}\\ \vdots & \vdots & \cdots & \vdots \\ {\mathit{D}}_{ij{N}_{ij}}+{\mathit{D}}_{ij1}& {\mathit{D}}_{ij{N}_{ij}}+{\mathit{D}}_{ij2}& \cdots & {\mathit{D}}_{ij{N}_{ij}}+{\mathit{D}}_{ij{N}_{ij}}\end{array}\right].$$

Now, ${f}_{0}$ and the gradient can be represented as

## (49)

$${f}_{0}=\sum _{i=1}^{N}\sum _{j=1}^{N}[{I}_{ij}^{2}-2{I}_{ij}{I}_{0}{E}^{T}H+{I}_{0}^{2}{({E}^{T}H)}^{2}],$$

## (50)

$$\frac{\partial {f}_{0}}{\partial {\sigma}_{t}}=\sum _{i=1}^{N}\sum _{j=1}^{N}(2{I}_{ij}{I}_{0}\mathrm{sum}[(E\times H)\otimes {\mathit{D}}_{ij}]-{I}_{0}^{2}\mathrm{sum}\{[(E\times H){(E\times H)}^{T}]\otimes {\tilde{\mathit{D}}}_{ij}\}),$$

where $\mathrm{sum}[\xb7]$ sums all elements of its argument, $\times $ denotes the element-wise product of vectors, and $\otimes $ denotes the element-wise product of a matrix of scalars $A$ and a matrix of vectors $B$:

## (51)

$$A=\left[\begin{array}{cccc}{a}_{11}& {a}_{12}& \cdots & {a}_{1m}\\ {a}_{21}& {a}_{22}& \cdots & {a}_{2m}\\ \vdots & \vdots & \cdots & \vdots \\ {a}_{n1}& {a}_{n2}& \cdots & {a}_{nm}\end{array}\right],\phantom{\rule[-0.0ex]{2em}{0.0ex}}B=\left[\begin{array}{cccc}{\mathit{b}}_{11}& {\mathit{b}}_{12}& \cdots & {\mathit{b}}_{1m}\\ {\mathit{b}}_{21}& {\mathit{b}}_{22}& \cdots & {\mathit{b}}_{2m}\\ \vdots & \vdots & \cdots & \vdots \\ {\mathit{b}}_{n1}& {\mathit{b}}_{n2}& \cdots & {\mathit{b}}_{nm}\end{array}\right],$$

## (52)

$$A\otimes B=\left[\begin{array}{cccc}{a}_{11}{\mathit{b}}_{11}& {a}_{12}{\mathit{b}}_{12}& \cdots & {a}_{1m}{\mathit{b}}_{1m}\\ {a}_{21}{\mathit{b}}_{21}& {a}_{22}{\mathit{b}}_{22}& \cdots & {a}_{2m}{\mathit{b}}_{2m}\\ \vdots & \vdots & \cdots & \vdots \\ {a}_{n1}{\mathit{b}}_{n1}& {a}_{n2}{\mathit{b}}_{n2}& \cdots & {a}_{nm}{\mathit{b}}_{nm}\end{array}\right].$$

## 5.

## Numerical Simulations

In this section, we report the results obtained by numerical simulations using the proposed model.

The following parameters have been used in Algorithm 3: ${t}_{\mathrm{init}}=1.0$, $\mu =1.5$, $\epsilon ={10}^{-2}$. For the line search, the range for the step size was ${\alpha}_{k}\in [0,100]$. For the initial guess, we used $B=I$, ${\mathit{\sigma}}_{t}^{0}=0$. For the 2-D layered medium, the grid size was set to $N=M=20$ with square voxels of size 1 (mm), i.e., the medium is $20\text{\hspace{0.17em}\hspace{0.17em}}(\mathrm{mm})\times 20\text{\hspace{0.17em}\hspace{0.17em}}(\mathrm{mm})$, and $dA=1$ (mm). The values of the extinction coefficients are set between 1.05 and 1.55 (${\mathrm{mm}}^{-1}$), and the upper bound in Eq. (30) is set to $u=2.0$ (${\mathrm{mm}}^{-1}$). The parameter of the Gaussian phase function is 0.2 or 0.4, and the scattering coefficient is set to ${\sigma}_{s}=1$ (${\mathrm{mm}}^{-1}$). The threshold for excluding low contribution paths is $\mathrm{th}=0.001$.

The ground truth and the estimated extinction coefficients are shown in Fig. 7. The matrix plots in the top row of the figure represent five different media [from (a) to (e)] used for the simulation. Each voxel $b$ is shaded in gray according to the values of the extinction coefficient ${\sigma}_{t}[b]$, and darker gray represents larger values of ${\sigma}_{t}[b]$. Also, the values of ${\sigma}_{t}[b]$ are displayed at each voxel. In the same manner, the middle and bottom rows show the estimated results when the following values of the parameter of the Gaussian phase function were used: ${\sigma}^{2}=0.2$ and 0.4. Figure 8 shows the observations ${I}_{ij}$ in a matrix form, from which the extinction coefficients are estimated. Each element in these plots is now an observation ${I}_{ij}$. We can see observations with higher values (shown in darker shades of gray in the plots) on the diagonal. The observations obtained for ${\sigma}^{2}=0.4$ seem to be fainter than those obtained for ${\sigma}^{2}=0.2$ due to the larger amount of scattering.

The left-most column, Fig. 7(a), shows the simplest case: the medium has almost homogeneous extinction coefficients of value 1.05 (voxels shaded in light gray) except for a few voxels with much higher coefficients of 1.2 (voxels shaded in dark gray), which means that those voxels absorb much more light than the others. The coefficients are estimated reasonably well, as shown in the middle and bottom rows, and the root mean squared error (RMSE) shown in Table 1 is small, with a relative error of $0.0075/1.05\approx 0.7\%$ with respect to the background coefficient value. The other media, shown in columns (b) to (e), have more complex distributions of the extinction coefficients. We summarize the quality of the estimated results in terms of RMSE in Table 1; numbers in brackets are relative errors of RMSE with respect to the background extinction coefficient value (i.e., 1.05). Computation time is also shown in Table 1. Note that our method is currently implemented in MATLAB® and could be further accelerated with a C++ implementation.

## Table 1

Root mean squared errors (RMSEs) and computation time for the numerical simulations for five different types of media [(a) to (e)] with a grid size of 20×20, for two different Gaussian phase function parameter values. Numbers in the brackets are relative errors of RMSE to the background extinction coefficient values (i.e., 1.05).

| | | (a) | (b) | (c) | (d) | (e) |
| --- | --- | --- | --- | --- | --- | --- |
| RMSE | ${\sigma}^{2}=0.2$ | 0.0067506 (0.643%) | 0.014253 (1.36%) | 0.017771 (1.69%) | 0.016220 (1.54%) | 0.057692 (5.49%) |
| | ${\sigma}^{2}=0.4$ | 0.0075305 (0.717%) | 0.014369 (1.37%) | 0.017704 (1.69%) | 0.015692 (1.49%) | 0.058464 (5.57%) |
| Computation time (s) | ${\sigma}^{2}=0.2$ | 142 | 113 | 297 | 190 | 269 |
| | ${\sigma}^{2}=0.4$ | 127 | 110 | 186 | 156 | 267 |

The values of the cost function ${f}_{0}$ over the iterations of the outer loop in Algorithm 3 are shown in Fig. 9 for each medium. These curves show that the proposed method effectively minimizes the original objective function [Eq. (31)] for all five types of media. Figure 10 demonstrates how the log-barriered cost function $f$ in Algorithm 3 evolves over the iterations of the inner loop; the horizontal axis accumulates all inner iterations of the Quasi-Newton method. Each inner loop successively minimizes the log-barriered function, and the warm start (reusing the Hessian approximation from the previous outer loop) may help reduce the jumps in value between successive inner loops.

## 5.1.

### Comparison Results

We compare our method to a standard DOT with FEM (Refs. 64 and 65) using different optimization methods implemented in the Electrical Impedance Tomography and Diffuse Optical Tomography Reconstruction Software (EIDORS).^{64}^{,}^{65} The ground truth used in this comparison is shown in the top row of Figs. 11(a)–11(e): an $N=M=24$ medium of size $24\text{\hspace{0.17em}\hspace{0.17em}}(\mathrm{mm})\times 24\text{\hspace{0.17em}\hspace{0.17em}}(\mathrm{mm})$ with extinction coefficient distributions almost the same as those shown in Figs. 7(a)–7(e).

For solving DOT with EIDORS, we used $24\times 24\times 2=1152$ triangular elements (i.e., each voxel is divided into two triangles), and for the boundary condition, we placed 16 light sources and 16 detectors at equal intervals around the medium. We chose two solvers: the Gauss-Newton (GN) method and the primal-dual (PD) interior point method. We used ${\mathit{\sigma}}_{t}^{0}=0$ as the initial guess for both our method and EIDORS.

The results obtained by our method (${\sigma}^{2}=0.4$) and by DOT with GN and PD are shown in Fig. 11. The results of the proposed method are shown in the second row and are similar to those in the third row of Fig. 7. The third row in Fig. 11 shows the results for DOT with GN; such blurred results are typical of DOT estimation due to its diffusion approximation. The last row shows the results for DOT with PD, which look better than those for DOT with GN but still tend to overestimate the areas with high coefficient values.

We summarize RMSE values and computation time for each method in Table 2 in the same format as Table 1. RMSE values of our method are two to five times smaller than those of DOT, and this demonstrates that the proposed method can achieve much more accurate results.

The current disadvantage of our method is its large computational cost: it takes up to 1000 times longer than DOT. We plan to reduce this cost by optimizing the code in C++ and by adopting other solvers.

## Table 2

RMSEs and computation time for the numerical simulations for five different types of media [(a) to (e)] with grid size of 24×24, for the proposed method and diffuse optical tomography (DOT) with two solvers. Numbers in the brackets are relative errors of RMSE to the background extinction coefficient values (i.e., 1.05).

| | | (a) | (b) | (c) | (d) | (e) |
| --- | --- | --- | --- | --- | --- | --- |
| RMSE | Ours (${\sigma}^{2}=0.4$) | 0.007662 (0.730%) | 0.01244 (1.18%) | 0.026602 (2.53%) | 0.021442 (2.04%) | 0.051152 (4.87%) |
| | DOT (Gauss-Newton) | 0.053037 (5.05%) | 0.060597 (5.77%) | 0.7605 (7.53%) | 0.059534 (5.67%) | 0.0855 (8.14%) |
| | DOT (primal-dual) | 0.052466 (5.25%) | 0.0626 (5.97%) | 0.081081 (8.11%) | 0.066042 (6.60%) | 0.080798 (8.08%) |
| Computation time (s) | Ours (${\sigma}^{2}=0.4$) | 257 | 217 | 382 | 306 | 504 |
| | DOT (Gauss-Newton) | 0.397 | 0.390 | 0.407 | 0.404 | 0.453 |
| | DOT (primal-dual) | 1.11 | 1.09 | 1.14 | 1.08 | 1.15 |

## 6.

## Conclusion with Discussion

In this paper, we have proposed a path-integral-based approach to optical tomography for multiple scattering in discretized participating media. Assuming that the scattering coefficients and the phase function are known and uniform, the extinction coefficient at each voxel of a 2-D layered medium is estimated using an interior point method. Numerical simulations demonstrate that the proposed framework works better than DOT in a simplified experimental setup, although its computational cost needs to be reduced.

There are many directions for further research, including relaxing the assumption of a 2-D layered scattering model to more realistic scattering with other phase functions, using paths generated by Monte Carlo-based statistical methods, extending the formulation to a full 3-D scattering model, and solving the issues discussed below.

Limitations—stability and uniqueness: The current formulation presented in this paper estimates only the extinction coefficients; the scattering coefficients and phase function parameters are assumed to be known and uniform. This is one of the limitations of the proposed method; however, it is a limitation common to optical tomography. It is known that the scattering and absorption coefficients cannot be separated from stationary measurements of light intensity,^{34} and the solutions are not unique. Moreover, given stationary measurements without angular information, the problem becomes ill-posed^{6}^{,}^{7} and hence unstable. To overcome this limitation, we need to extend the current formulation to measurements that enable stability and uniqueness, such as time-dependent, frequency-dependent, or angle-dependent measurements.

Computational cost: A large part of the computational cost of the proposed method comes from the forward model prediction [Eq. (28)], which appears in the gradient computation [Eq. (7)]. It depends on the number of paths ${N}_{ij}$; we currently use about 700 paths out of ${20}^{18}$ possible paths, and for each path, we need to compute the path vectors ${\mathit{D}}_{ijk}$ and ${\mathit{D}}_{ijk}+{\mathit{D}}_{ijl}$ and the factors ${H}_{ijk}$. A possible acceleration is to precompute these variables, at the price of a trade-off with storage cost. Each ${\mathit{D}}_{ijk}$ has dimension $20\times 20=400$, each pair $ij$ has about 700 vectors ${\mathit{D}}_{ijk}$, and the number of pairs $ij$ (hence observations) is $20\times 20=400$. In total, $\sim 450\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{MB}$ of memory would be required to store all ${\mathit{D}}_{ijk}$, even with single-precision floating-point numbers. Fortunately, these vectors are inherently sparse, and we store them as sparse matrices. However, the storage grows linearly with the number of paths ${N}_{ij}$ and quadratically with the grid size $\mathrm{max}(N,M)$. Therefore, we plan to consider more efficient implementations.
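The sparse storage scheme can be sketched as follows. The code builds synthetic sparse path vectors ${\mathit{D}}_{ijk}$ for a single source-detector pair and evaluates the forward prediction and the corresponding single-pair gradient term; all sizes, path lengths, and factors below are synthetic placeholders, not the paper's actual path data.

```python
import numpy as np
from scipy import sparse

# Synthetic setup: a 20x20 grid flattened to 400 voxels and 700 paths for
# one source-detector pair (i, j); all values are illustrative placeholders.
rng = np.random.default_rng(0)
n_vox, n_paths = 400, 700

# Row k of D holds the traversal lengths D_ijk of path k; each discrete
# path touches only a few voxels, so the rows are sparse.
rows, cols, vals = [], [], []
for k in range(n_paths):
    touched = rng.choice(n_vox, size=25, replace=False)
    rows.extend([k] * touched.size)
    cols.extend(touched)
    vals.extend(rng.uniform(0.05, 0.15, size=touched.size))
D = sparse.csr_matrix((vals, (rows, cols)), shape=(n_paths, n_vox))

H = rng.uniform(0.0, 1e-2, size=n_paths)   # path factors H_ijk (synthetic)
sigma_t = np.full(n_vox, 1.05)             # extinction coefficients
I0, I_obs = 1.0, 0.3                       # source intensity, observation

# Forward prediction for the pair: I0 * sum_k H_ijk * exp(-sigma_t^T D_ijk);
# the exponentials form the vector E of Eq. (47) via one sparse mat-vec.
E = np.exp(-(D @ sigma_t))
I_pred = I0 * (H @ E)

# Gradient of the single-pair residual |I_obs - I_pred|^2 w.r.t. sigma_t,
# matching the structure of Eq. (46) restricted to one pair (i, j):
# D.T @ (H * E) accumulates sum_k H_ijk e^{-sigma^T D_ijk} D_ijk.
grad = 2.0 * (I_obs - I_pred) * I0 * (D.T @ (H * E))
print(grad.shape)   # -> (400,)
```

Because the transmittance and gradient reduce to sparse matrix-vector products, precomputed ${\mathit{D}}_{ijk}$ rows can be reused across all iterations of the solver, which is the trade-off between computation and storage discussed above.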

## Acknowledgments

This research is supported in part by a grant from the Japan Society for the Promotion of Science (JSPS) through the Funding Program for Next Generation World-Leading Researchers (NEXT Program) initiated by the Council for Science and Technology Policy (CSTP), and by JSPS KAKENHI Grant Number 26280061.

## References

## Biography

**Bingzhi Yuan** received his BE degree in software engineering from the Beijing University of Posts and Telecommunications, China, and his ME degree in engineering from Hiroshima University, Japan, in 2010 and 2013, respectively. Currently, he is a PhD student at Hiroshima University.

**Toru Tamaki** received his BE, ME, and PhD degrees in information engineering from Nagoya University, Japan, in 1996, 1998, and 2001, respectively. After being an assistant professor at Niigata University, Japan, from 2001 to 2005, he is currently an associate professor in the Department of Information Engineering, Graduate School of Engineering, Hiroshima University, Japan. His research interests include computer vision and image recognition.

**Yasuhiro Mukaigawa** received his ME and PhD degrees from the University of Tsukuba in 1994 and 1997, respectively. He became a research associate at Okayama University in 1997, an assistant professor at the University of Tsukuba in 2003, an associate professor at Osaka University in 2004, and a professor at Nara Institute of Science and Technology (NAIST) in 2014. His current research interests include photometric analysis and computational photography.

**Hiroyuki Kubo** received his ME and PhD degrees from Waseda University in 2008 and 2012, respectively. Since 2014, he has been an assistant professor at NAIST, Japan.

**Bisser Raytchev** received his PhD in informatics from Tsukuba University, Japan, in 2000. After being a research associate at NTT Communication Science Labs and AIST, he is presently an assistant professor in the Department of Information Engineering, Hiroshima University, Japan. His current research interests include computer vision, pattern recognition, high-dimensional data analysis, and image processing.

**Kazufumi Kaneda** is a professor in the Department of Information Engineering at Hiroshima University. He received his BE, ME, and DE degrees from Hiroshima University, Japan, in 1982, 1984, and 1991, respectively. In 1986, he joined Hiroshima University. He was a visiting researcher at Brigham Young University from 1991 to 1992. His research interests include computer graphics and scientific visualization.