Open Access
14 August 2020 Efficient inversion strategies for estimating optical properties with Monte Carlo radiative transport models
Callum M. Macdonald, Simon Arridge, Samuel Powell
Author Affiliations +
Abstract

Significance: Indirect imaging problems in biomedical optics generally require repeated evaluation of forward models of radiative transport, for which Monte Carlo is accurate yet computationally costly. We develop an approach to reduce this bottleneck, which has significant implications for quantitative tomographic imaging in a variety of medical and industrial applications.

Aim: Our aim is to enable computationally efficient image reconstruction in (hybrid) diffuse optical modalities using stochastic forward models.

Approach: Using Monte Carlo, we compute a fully stochastic gradient of an objective function for a given imaging problem. Leveraging techniques from the machine learning community, we then adaptively control the accuracy of this gradient throughout the iterative inversion scheme to substantially reduce computational resources at each step.

Results: For example problems of quantitative photoacoustic tomography and ultrasound-modulated optical tomography, we demonstrate that solutions are attainable using a total computational expense that is comparable to (or less than) that which is required for a single high-accuracy forward run of the same Monte Carlo model.

Conclusions: This approach demonstrates significant computational savings when approaching the full nonlinear inverse problem of optical property estimation using stochastic methods.

1.

Introduction

Inverse problems arise in many areas within biomedical optics, both for global characterization of optical properties of media and for image reconstruction, among other applications.1 Inverse problems are often considered as optimization problems, solved by deriving the gradient of an objective function and iteratively descending through the solution space. This process requires repeated solutions of forward and corresponding adjoint problems that are often computationally demanding in their own right. If the forward problem is given by the solution to a partial differential equation (PDE), then one appealing approach is to solve the forward and inverse problems simultaneously so that the forward problem is only approximately solved at intermediate stages in the algorithm (i.e., before it has finally converged); this approach (which has its basis in optimal control) is known as PDE-constrained optimization.24 In this work, we seek an equivalent framework for the case where approximate noisy solutions to the forward model can (or must) be sought by stochastic methods.

The application of stochastic methods for the solution of PDEs is particularly pertinent in problems involving diffuse optics, since the “gold standard” method of solving the radiative transfer equation (RTE)—which is the most generally applicable description of the underlying physics—is to use stochastic (Monte Carlo) techniques;5 their use in such applications parallels their extensive employment in other fields such as neutron physics.6 While approximations to the RTE (such as diffusion) which permit deterministic solutions are available, these are often not valid in many cases such as in small domains, close to sources and boundaries, and in regions with weak scattering or strong absorption. Analytical solutions to the RTE itself are known for some geometries, such as infinite space7 and layered media,8 but such expressions are not readily available for general domains. The practicality of Monte Carlo techniques has been significantly boosted by recent advances in computational hardware developments, particularly in the application of parallelization.9,10 Other approaches to improve their computational performance have been explored, such as the introduction of perturbation techniques11 or variance reduction techniques.12,13 Consequently, even when the aforementioned approximations to the RTE are reasonable, Monte Carlo solutions may offer an attractive alternative to the use of deterministic techniques such as the finite element method, when the complexity of the geometry or probe requires a high-density discretization of the spatial domain.

With both deterministic and stochastic solvers, the computational cost of the forward model typically remains the limiting factor in image reconstruction procedures. However, stochastic methods have a particular quality distinct from deterministic methods: one may arbitrarily trade computational expense against noise in the estimated solution without bias. In the case of diffuse optics, this trade-off is mediated through the number of virtual photons simulated by the Monte Carlo model for a given problem. This fact naturally leads one to consider how much noise can be tolerated during the solution of the inverse problem, and if a strategy can be found by which to approach this solution with the least work.

Parallels can be drawn between this problem and large-scale machine learning, where the requirement is to find the global minimum of a loss function expressed through fitting a model to a very large set of training data. The recent growth in this field has led to significant developments in optimization methods using stochastic subsets, especially the use of approximate gradients at intermediate steps, which is a technique known as stochastic-gradient descent (SGD). At the heart of this issue is the interplay between optimization and randomness, and the fact that attaining highly accurate estimations of the gradient at each step in SGD can come at a high cost when dealing with large datasets. However, if we can accept certain levels of randomness in our gradient computation, then each step in the gradient descent (GD) can be achieved at a lower computational cost. Returning to the context of biomedical optics, we may be able to accept a “noisy” low-cost forward model computation (which would otherwise be undesirable in the PDE-constrained approach) and simulate fewer photon trajectories during the earlier stages of the inversion process, leading to an overall accuracy versus computation time benefit. Thus, the topic of how to most efficiently utilize finite-sized datasets in machine learning is relevant to the deployment of Monte Carlo-based solvers in biomedical optics.

In this study, we attempt to translate these recent insights from SGD in machine learning into practical suggestions to improve the use of Monte Carlo methods in inverse problems that arise in biomedical optics. To do this, we employ a fully stochastic computation of an objective gradient using forward and adjoint models of the RTE solved by the Monte Carlo method. This allows for the full nonlinear inverse problem to be approached. In our demonstration problems, the inverse problem can be approximately solved using a total computational expenditure which is similar or less than that which would typically be dedicated to a single high-quality (low variance) solution of the forward imaging problem.

This paper is organized as follows. First, we outline some key aspects of GD in Sec. 2, including what appropriate metrics can be used to quantify acceptable levels of variance in the computation of subgradients via a stochastic process (i.e., Monte Carlo), and what step sizes to use to allow convergence. In Sec. 3, we describe the example problems for evaluating the improvements of SGD in a biomedical context, including details of the Monte Carlo forward/adjoint models and gradient calculations. In Sec. 4, we apply these ideas to two different coupled-physics imaging (CPI) modalities, namely, quantitative photoacoustic tomography (QPAT) and ultrasound-modulated optical tomography (UMOT).14,15 Both of these problems are nonlinear and entail the RTE for an accurate description, but exhibit different degrees of ill-posedness and resolution; thus they serve to demonstrate the generality of our approach. We evaluate the performance of various Monte Carlo inversions using simulated QPAT and UMOT data in Sec. 4, and discuss what practical lessons can be taken from this in Sec. 5.

2.

Modeling and Inversion Problems in Optical Tomography

A common problem in biomedical optics involves finding the internal distribution of some optical properties x within a medium using various measurements made around and/or within the medium yobs. To do this, we can employ some forward model of the underlying physical problem A, which produces an output y, given some estimate of the internal properties x,

Eq. (1)

y=Ax,
where in this case, the forward model A could represent the RTE and all relevant aspects of the optical setup (geometry of sources and detectors). In cases where A is not directly invertible, then to solve for an unknown distribution of properties x, we can formulate a cost function as a measure of the quality of an estimate. This could for example be the L2-norm of the residual between the real measured data, yobs, and our forward modeled data, y.

Eq. (2)

F(x)=12yobsy2=12yobsAx2.
From this point, the problem now becomes one of minimization, where we will qualify our solution x* as that which minimizes the cost function, x*=argminxF(x). It is worthy to note that the ground-truth parameters xtrue may differ from the minimizer x* leading to reconstruction error. This minimization problem can be approached via iterative GD, where we start with some estimate x0, and each successive iterate, xn, is determined by subtracting a (scaled) gradient of our cost function F (relative to the internal optical properties) from the previous iterate

Eq. (3)

xn=xn1αnF(xn1),
where αn is the step size which scales the update term. If we have access to some computation or set of computations (sometimes referred to as a “first-order oracle”) which we can call to compute F(xn1) and F(xn1), then this algorithm can be implemented and is said to converge if limnF(xn)=0. In practice, the descent may be terminated early once the cost function reaches some acceptable value, for example, when the norm of the difference between observed and model data is of the same order as measurement noise, a criterion known as the discrepancy principle.16

2.1.

Stochastic-Gradient Descent

In a stochastic setting, for instance, when our forward model A is a Monte Carlo model of radiative transport, then the true cost F and gradient F given in Eq. (3) are not directly available. Instead, we may only have access to estimates of the cost function and gradient (provided by a stochastic first-order oracle). In Sec. 3, we detail the nature of these stochastic Monte Carlo computations in the radiative transport setting. In the interest of generality, for now, we simply assume such models exist, and we can make a call to a “stochastic oracle” to attain FSn and FSn which we assume are nonbiased approximations., i.e.,

Eq. (4)

E[FSn(xn)]=F(xn),E[FSn(xn)]=F(xn),
where E denotes the mean (expected) value for scalar quantities or the mean (expected) vector for vector quantities such as the gradient. Here, Sn denotes the n’th “sample” used in the computation. The meaning of sample here depends on the application. For example, in machine learning, this may refer to a particular training example (or group of training examples) to be used during one learning iteration.17 In Monte Carlo modeling of radiative transport, the sample refers to the set of virtual photons (and their associated random number seeds) that are initiated in the simulation to represent an optical source, which are subsequently used to estimate F(xn) and F(xn). The stochastic version of gradient descent (SGD) thus attempts to minimize a sampled objective function, FSn, by updating the previous iterate with a scaled sampled gradient.

Eq. (5)

xn=xn1αnFSn(xn1).
As with any computation, a call to a stochastic oracle at each iteration comes with a certain computational cost. The particular cost may depend on a number of factors, including the sample size, |Sn|. This is one of the reasons why the study of SGD is of such importance in modern machine learning, where training datasets may be of an enormous size, which means that computing a gradient based on all available data at each iteration could be very costly. Rather, individual samples (|Sn|=1) or batches of samples (|Sn|>1) may be used instead at each iteration. While this degrades the quality of any individual gradient estimate compared to using all available data, if the variance of these estimates is maintained below an acceptable value, the overall trade-off may be net positive. What this means in a Monte Carlo radiative transport context is that we may be able to allow low-quality gradient estimations (simulating only a small number photons) for a large part of the inversion process when estimating optical properties, saving on per iteration computational resources, and leading to an overall efficiency improvement. This is in contrast to typical implementations of iterative Monte Carlo solvers in the biomedical optics community, where each iteration is computed with large numbers of photons that are deemed sufficient to produce “stable” and “smooth” (low variance) forward model data.1824 In some cases, where a linearized approximation is assumed for the inverse problem, the cost of rerunning the forward model can be avoided using techniques such as perturbation Monte Carlo (PMC) methods.11,25,26 However, for the full nonlinear problem, although PMC can be used for calculation of the problem Jacobian, this has to be recomputed at each iteration of, for example, a Gauss–Newton optimization scheme.27

In this study, if we are to accept a level of variance and imperfection in our forward/adjoint models, this of course raises the question of how much variance is acceptable in order for SGD to be successful? Furthermore, what measure of the variance is the best indicator in terms of efficiency/performance for common Monte Carlo solvers? To begin to answer this, it is important to first note that fixed-step SGD does not in general converge to a solution.28,29 That is, if αn is fixed for all n, eventually there will come a point where the next update of the estimate [with the term αnFSn(xn1)] will reliably “undo” the work of the prior step, which will effectively halt the descent. The point at which this occurs depends on the variance of FSn. We can see this by rewriting the sampled stochastic gradient estimate as

Eq. (6)

FSn(xn)=F(xn)+ϵSn(xn),
where ϵ is a random vector with E[ϵSn(xn)]=0 for all n. As GD progresses successfully, the “true” gradient F will eventually begin reducing in size as we near the minimum. Once the magnitude of the true gradient reduces to a point at which it is comparable to the randomness of ϵSn, the problem arises. The larger the expected magnitude of ϵSn, the sooner the minimization of the cost function reaches this limiting scenario, where further iterations will only lead to a random walk about this point.

To prevent this from happening, we may take one of two actions (or a combination thereof): (i) reduce the step size at each iteration such that we can avoid “backtracking” in the descent, more on this in Sec. 2.3 or (ii) gradually improve the accuracy of our sampled gradient such that the variance of the sampled gradient remains below some threshold value compared to the norm of the true gradient F. In other words, we may wish to ensure the inequality

Eq. (7)

Vtot2(xn)E[ϵSn(xn)2]F(xn)2γtot2,γtot>0,
where γtot is a positive coefficient describing the acceptable threshold. The aforementioned inequality is known as the “norm test.”30 It is worthy to note that, since for any vector of random variables the variance of its length is the sum of the variances parallel and orthogonal to any fixed vector, this test equally penalizes the components of randomness parallel and perpendicular to the true gradient. Recent studies, however, have demonstrated that controlling the component of randomness parallel to F is potentially a more relevant objective, as the component of the sampled gradient orthogonal to the true gradient is zero in expectation. An alternative measure of acceptable variance in FSn has thus been introduced as the “inner product test,”30 which only aims to restrict the component of variance in the sampled gradient parallel to the true gradient F.

Eq. (8)

V2(xn)E[ϵSn(xn),F(xn)2]F(xn)4γ2,γ>0.
This inner product test imposes a less restrictive limitation of the overall variance in the sampled gradients, particularly in cases where the variance may be higher in directions orthogonal to the true gradient than in the direction parallel to F. However, either of these metrics will be able to exploit the fact that an increased expected error, E[ϵSn], will correlate to a cheaper computation of the estimated gradient. Thus, setting larger values of γtot or γ in the inequalities will correspond to cheaper computational requirements for each step, but also a more pronounced random walk component to the GD. In many cases, it may be found that the penalty paid by increasing the random walk component is acceptable (up to a point) compared to the penalty paid in computational cost for reducing the expected norm of ϵ to a negligible value. For example, using Monte Carlo RTE simulations to compute F with a negligible level of variance (i.e., setting γtot1) may take billions of simulated photons at each step. Whereas, it may be possible to compute a gradient that passes the norm test or inner product with larger values of γtot or γ with many orders of magnitude less photons, particularly during the early stages of GD, where we may be far from the minimum. The ideal choice of γtot or γ will depend on the specific application.

2.2.

Adaptive Sample Size

We have discussed two different measures of the variance in the sampled gradient FSn that we wish to investigate in the context of Monte Carlo estimation of media properties, viz., the norm test [Eq. (7)] and the inner product test [Eq. (8)]. To satisfy the inequalities defining these tests as the GD progresses, we will be required to reduce the variance in the sampled gradients FSn whenever the norm test or inner product test fail. This can be done by increasing the sample size (number of photons used |Sn|) when making a call to the stochastic oracle. Two practical considerations are still required: first, how to compute the “true” gradient F, which is needed to evaluate the norm test and inner product test; and second, by how much we should increase the sample size in a situation where one of the tests fails?

The true gradient F is only calculable in the limit that an infinite number of photons are used in the Monte Carlo model. This limit can equivalently be represented as an average over independent repeated outputs of the sampled gradient

Eq. (9)

F(xn)=limNrep1Nrepj=1NrepFSj(xn),|Sj|=|Sn|  j.
Using a finite value of Nrep in the evaluation of Eq. (9) provides an approximation to the true gradient, and when this is used to compute the norm and inner product tests [Eqs. (7) and (8)], the inequalities will fail before they would if Nrep=, thus acting as a conservative approximation. It is noted that this method of approximating the true gradient is computationally taxing. However, in practice, the inner product test and norm tests can still be conducted efficiently if they are only computed occasionally (not at every iteration) of the descent. For example, using Nrep=100 repeated computations of the sampled gradient to conduct the tests once every 100 iterations (thus only updating our sample size every 100 iterations) would only double the total number of simulated photons required for the inversion. In this study, we evaluate these metrics once every 10 iterations using Nrep=100 repeated sampled gradients. While this is a significant computational burden, we do so in this study as we are interested in assessing the best case scenarios for such methods. It is worthy to note that although we compute the aforementioned approximation to the true gradient to evaluate the inner product and norm tests, we only ever update our estimate using the sampled gradient.

In terms of increasing the sample size in the event where the inner product and/or norm tests fail, this can be done in a number of ways. A simple method we will employ in this study is to scale the current sample size by some factor κ(n) to increase the number of photons used in the next iteration

Eq. (10)

|Sn+1|=κ(n)|Sn|.
One option for κ(n) is to use the same factor by which the variance exceeds our imposed limit at a given point in the descent. For instance, upon failure of the inner product test for a chosen value of γ, we can increase the sample size on the next iteration using κ(n)=V2(xn)/γ2. However, we also investigate other forms of κ(n) in Sec. 4, which better cope with statistical variations that can lead to overestimating the required sample size increase.

2.3.

Step Size

In cases where we are not taking actions to bound the error in the sampled gradient (such as enforcing successful outcomes of an inner product test or norm test), fixed-step SGD may only converge to a region around the solution. Reducing the step size sufficiently at each step is usually required to allow convergence.31 However, it can be shown that if we are bounding the error in the sampled gradient, e.g., by increasing the sample size, then fixed-step SGD may converge so long as the following is satisfied for all n:30

Eq. (11)

αn1(1+γtot2)L,
where L is the Lipschitz constant for F. The Lipschitz constant for a functional F is a measure of its rate of change with respect to its parameter and can be defined, for example, as the smallest constant such that 2FLId, where Id is the identity matrix, and we assume that F is twice continuously differentiable. It can also be interpreted as the largest eigenvalue of the Hessian of F.32 As intuition may indicate, when the sample size (e.g., number of simulated photons) increases toward the maximum number of samples |Sn||Smax| (|Smax|= in the case of Monte Carlo RTE simulations), the expected error in the sampled gradient approaches zero, |ϵSn|0, as do the measures of variance in the sampled gradients (Vtot20, V20), as defined in Eqs. (7) and (8). In other words, as the stochasticity in the problem reduces to zero, we approach the classical step size of the deterministic problem given by α=1L.32

In this study, we aim to satisfy the aforementioned step size criteria for an assumed value of the Lipschitz constant L, which we will choose conservatively depending on the particular scenario. However, as we are primarily interested in reaching the best possible solution for a given allocation of computational resources, convergence to a region around the unique solution may be sufficient for our purposes. For this reason, we will also investigate larger step size criteria, which violate Eq. (11), yet exhibit good performance in our scenarios of interest.

Taking the above considerations into account, we present a basic method for SGD using adaptive sample sizes in Algorithm 1 (simplified from Ref. 30). The algorithm imposes a limit on the total number of photons to be simulated using Monte Carlo transport models throughout the entire descent, Nph.

Algorithm 1

Inversion using Monte Carlo sampled gradients with adaptive sample size.

Choose initial photon sample size |S1|, and desired value of γ or γtot
whilei=1n|Si|<Nphdo
if run test? then
  compute sampled gradient, FSn, and approximate true gradient, F [using Eq. (9)]
  check norm test (or) inner product test is satisfied
  if test fail then
   increase sample size on next iteration |Sn+1|=κ(n)|Sn|
  else
   set |Sn+1|=|Sn|
  end if
else
  compute sampled gradient only FSn
  set |Sn+1|=|Sn|
end if
 update xn+1=xnαnFSn
end while

3.

Stochastic Forward and Adjoint Models

In this section, we cover the computation of the stochastic forward model and stochastic gradient approximation, referred to as the first-order stochastic oracle. We will cover the basic radiative transport forward problem, and the gradient computations involved in our example problems of absorption estimation in QPAT and UMOT. The specific details of these models are not required to understand the main premise of this paper, but serve as a demonstration in a context familiar to many in the biomedical optics community, where Monte Carlo models of optical transport are employed to estimate medium properties.

3.1.

Forward Model

For any optical source Q(r,s^), either incident on a medium or present within it, we wish to model the resulting radiance, ϕ(r,s^), describing the radiant flux at each position r, and in each direction s^. This can be achieved using the RTE.

Eq. (12)

[s^·+μa(r)+μs(r)]Tμa,μsϕ(r,s^)=μs(r)S2p(s^,s^)Sμsϕ(r,s^)ds^+Q(r,s^),
where T and S denote the attenuation and scattering operators, respectively, which together compose the RTE operator, L. For notational convenience, we assume that Eq. (12) is combined with appropriate boundary conditions, which we do not write explicitly here; see Ref. 33 for more details. Here, μa is the absorption coefficient, μs is the scattering coefficient, and p is the scattering phase function. Using the defined operators, Eq. (12) can be rewritten in a more compact form:

Eq. (13)

Lμa,μsϕ=(Tμa,μsSμs)ϕ=Q.
To obtain (stochastic) estimates of the radiance resulting from a given source, and thus to obtain an estimate of any derived data function y(ϕ), we can implement a Monte Carlo solver, LMC1. In this study, we have adapted a hardware-accelerated version (utilizing graphics processing units) of the commonly employed “Monte Carlo multilayer” program used to simulate radiative transport within a layered planar medium.34,35 The basic operation of this program is unchanged from the original release. Simulated photons are initiated by sampling from a given source function, Q, and scattering/absorption events are pseudorandomly generated along each photon’s trajectory until either: (i) the photon leaves the domain or (ii) the photon drops its weight below some threshold value. In this study, the scattering directions are sampled from the Henyey–Greenstein scattering phase function. The expected accuracy of the computed radiance using Monte Carlo solvers LMC1 depends on the total number of photons used, i.e., the sample size |Sn|. As |Sn|, the radiance approaches the deterministic solution of the RTE. Importantly, however, Monte Carlo models allow an estimate of the radiance to be achieved with any number of photons with |Sn|1. The expected computational requirements (number of floating point operations) of the Monte Carlo solver LMC1 also scales with the number of photons simulated, and it is this trade-off between accuracy of the forward model (and corresponding adjoint model) and computational cost that we will be investigating.

3.2.

Gradient Computation: Adjoint Model

To compute the gradient of our cost function F with respect to the optical properties of the medium, we make use of an adjoint RTE model. Although direct methods of finding the derivative of a Monte Carlo method can also be developed,12 adjoint methods have more applicability in general, and also allow closer comparison with optimization techniques used in machine learning. For further details of forward and adjoint methods in the RTE, we refer to Ref. 36; for specific details of CPI problems, we refer to Ref. 37. We first consider a change to Eq. (12) where μaμa+μaδ, μsμs+μsδ, for the same source Q, which results in a change in radiance ϕϕ+ϕδ. This implies

Eq. (14)

(Tμa+μaδ,μs+μsδSμs+μsδ)(ϕ+ϕδ)=(Tμa,μsSμs)ϕ,(Tμa,μsSμs)ϕδ=(μaδ+μsδSμsδ)ϕ,

Eq. (15)

Lμa,μsϕδ=(μaδ+μsδSμsδ)Lμaδ,μsδδϕ.
We also define the fluence, Φ, as the angular integral of the radiance:

Eq. (16)

Φ(r)=S2ϕ(r,s^)ds^.
To proceed beyond this point, we must now consider the specific form of the data function relevant to a particular modality of interest. We begin with the first of our two example modalities, QPAT.

3.2.1.

QPAT case

In QPAT, the medium is illuminated with a pulsed optical source, Q (see Fig. 1). The distributed optical energy is absorbed at various points within the sample, giving rise to internal acoustic waves. These acoustic waves can be detected at the surface of the medium by a sensor and processed to locate the initial pressure distribution p0 within the medium.3840 This internal pressure distribution is related to the spatial distribution of absorbed optical energy, h, where

Eq. (17)

h(r)=μa(r)Φ(r),
and where Φ is the optical fluence of Eq. (16). We have omitted the Grüneisen parameter for clarity of exposition, though this parameter can be included in practice. Assuming that we can recover the absorbed optical energy, h, the problem remains to find the distribution of μa(r) within the medium.41,42 It is worth noting that although the optical source is pulsed, it is acceptable to use a continuous-wave (time-independent) model to describe ϕ and Φ because the time scale of the acoustic wave propagation is orders of magnitude slower than the optical propagation.43 First, restating our cost function in terms of the QPAT data function, h, we have

Eq. (18)

FQPAT=12Ω(hobsh)2dr=12hobsh,hobshL2(Ω).

Fig. 1

Setup for QPAT.

JBO_25_8_085002_f001.png

We then write the Fréchet derivative of FQPAT as

Eq. (19)

DFQPAT=hobsh,DhμaδL2(Ω),
where μaδ is a small change in absorption. In this paper, we will neglect changes in scattering, however, the below formalism is still general for the gradient with respect to absorption. The gradient term with respect to scattering coefficient is described in Ref. 42 and will be included in future investigations. Writing the Fréchet derivative of h as

Eq. (20)

Dh=Φ+μa·DΦ,
and defining Φδ=DΦμaδ, we arrive at

Eq. (21)

DFQPAT=Φ(hobsh),μaδL2(Ω)μa(hobsh),ΦδL2(Ω).
Next, we define the adjoint radiance, ϕ*, as the solution to

Eq. (22)

L*ϕ*=μa(hobsh)
where the right-hand side describes the “adjoint source” which is isotropic in s^. We then substitute the above into Eq. (21) to give

Eq. (23)

DFQPAT=Φ(hobsh),μaδL2(Ω)L*ϕ*,ϕδL2(Ω×Sn1),
where we exploited the fact that the right-hand side of Eq. (22) does not depend on direction. Using the definition of the adjoint operator, and the fact that the change in radiance is zero on the boundary Ω yields

Eq. (24)

DFQPAT=Φ(hobsh),μaδL2(Ω)ϕ*,LϕδL2(Ω×Sn1).

Finally, we make use of the perturbation expression Eq. (15), while again, here, we neglect any change in scattering. This gives

Eq. (25)

DFQPAT=Φ(hobsh),μaδL2(Ω)+ϕ*ϕ,μaδL2(Ω×Sn1),
allowing us to define the (absorption) gradient as in Eq. (33) of Ref. 42

Eq. (26)

FQPATμa=FQPAT=Φ(hobsh)+Sn1ϕ*ϕds^

To compute a stochastic approximation of this gradient, we can thus use the forward model Monte Carlo solver LMC1 to provide estimates of ϕ and Φ, and an adjoint Monte Carlo solver LMC1* to produce ϕ* from an adjoint source term Qadj=μa(hobsh), as defined in Eq. (22). Due to the symmetry of the problem, the adjoint solver is identical to the forward solver and follows the same basic operating principles. The only difference is that here the adjoint source Qadj=μa(hobsh) may in fact be negative in some locations. This is handled by splitting the source term into two parts, one purely positive, Qadj+, and one purely negative, Qadj. Two simulations are then run (where the total number of photons to be used is split between the two simulations accordingly), and the results summed to produce ϕ*. Algorithm 2 describes the basic operation for computing a sampled gradient, FSn, for QPAT using the above derivation. This will be used in conjunction with Algorithm 1 to conduct an inversion with adaptive sample size for each iterate, |Sn|.

3.2.2.

UMOT case

Referring to Fig. 2, in UMOT, we have an optical light source Qq incident on a medium, as well as an optical detector Jm. In addition, an ultrasound source is incident on the medium, where the focus η(r) is scanned through the sample.44,45 Assuming for simplicity an ideal (delta function) ultrasound focus, the data of interest in this case are found to be of the form46

Eq. (27)

b(r)=η(r)Φq(r)Φm(r),
where Φq is the fluence resulting from the optical source Qq, and Φm is the resulting fluence from a virtual source Qm which is reciprocal to the detector Jm.46 From this point, we proceed in similar fashion as in Sec. 3.2.1, where now our data fitting error is given by

Eq. (28)

FUMOT=12Ω(bobsb)2dr=12bobsb,bobsbL2(Ω),
and its Fréchet derivative as

Eq. (29)

DFUMOT=bobsb,DbμaδL2(Ω).
In this case, the Fréchet derivative of b becomes

Eq. (30)

Db=ηΦq·DΦm+ηΦm·DΦq,
leading to

Eq. (31)

DFUMOT=ηΦq(bobsb),ΦmδL2(Ω)ηΦm(bobsb),ΦqδL2(Ω).
Here, we need to define two adjoint radiances, ϕ*,1 and ϕ*,2, as the solution to

Eq. (32)

L*ϕ*,1=ηΦq(bobsb),

Eq. (33)

L*ϕ*,2=ηΦm(bobsb),
and substituting into Eq. (31) to give

Eq. (34)

DFUMOT=L*ϕ*,1,ϕmδL2(Ω×Sn1)L*ϕ*,2,ϕqδL2(Ω×Sn1);
by the same arguments as for QPAT we get

Eq. (35)

DFUMOT=ϕ*,1,LϕmδL2(Ω×Sn1)ϕ*,2,LϕqδL2(Ω×Sn1).
Again using the perturbation expression [Eq. (15)], we have

Eq. (36)

DFUMOT=ϕ*,1ϕm,μaδL2(Ω×Sn1)+ϕ*,2ϕq,μaδL2(Ω×Sn1),
allowing us to define the (absorption) gradient as

Eq. (37)

FUMOTμa=FUMOT=Sn1(ϕ*,1ϕm+ϕ*,2ϕq)ds^.

Fig. 2

Setup for UMOT in the transmission geometry.

JBO_25_8_085002_f002.png

Thus, similar to the QPAT case, here we are able to compute a stochastic approximation of this gradient using the forward model Monte Carlo solver LMC1 to provide ϕq and ϕm from our two sources, and an adjoint Monte Carlo solver LMC1* to produce ϕ*,1 and ϕ*,2 from the adjoint source terms Qadj1=ηϕq(bobsb) and Qadj2=ηϕm(bobsb), as defined in Eqs. (32) and (33). Here as well, adjoint source terms are split into two parts, one purely positive, Qadj+, and one purely negative, Qadj, with the photon budget being split accordingly. Algorithm 3 describes the basic operation for computing a sampled gradient, FSn, for UMOT using the above derivation. This will be used in conjunction with Algorithm 1 to conduct an inversion with adaptive sample size for each iterate, |Sn|.

Algorithm 2

Monte Carlo sampled QPAT gradient.

1. Compute LMC1Qϕ, Φ, using |Sn|/2 photons
2. Construct internal adjoint source Qadj=μa(hobsh)
3. Compute LMC1*Qadjϕ*, Φ*, using |Sn|/2 photons
4. Use Eq. (26) to compute gradient FSn

3.3.

Fluence Monte Carlo

It should be noted that numerous Monte Carlo radiative transport solvers do not explicitly output the radiance, as this requires additional programming to store the angular ordinates at each location. Commonly, only the fluence will be available, which is the angular integral of the radiance Eq. (16). In such cases, the aforementioned integrals for the gradients of interest Eqs. (26) and (37) can be computed under the assumption of approximately angularly isotropic radiances, where for example ϕ*ϕds^ becomes Φ*Φ. The accuracy of this approximation of course depends on the true angular dependence of the radiances, where the approximation is poorest in regions close to directional light sources, but improves further away. The higher the scattering asymmetry g of the medium, the slower the approximation improves as a function of distance from these sources. In many cases, however, this is a satisfactory assumption and is employed in the below example cases.

4.

Results

In this section, we present the results of a number of investigations using our two example problems of QPAT and UMOT. We will demonstrate the implementation of the forward-adjoint Monte Carlo solvers described above, along with adaptive sampling strategies to estimate the absorption coefficient of a medium via SGD. Here, we investigate media with a semi-infinite slab geometry, with numerous layers in the z direction having different optical properties, but otherwise homogeneous in the x and y directions. The application to layered geometry in this demonstration was chosen for simplicity to provide an easily recognizable setting to test these adaptive sampling methods. Furthermore, while apparently simplistic, layered geometries are still of practical interest for applications including instrument calibration and validation, and the imaging of biological structures with small curvature but significant heterogeneity in depth. The latter example includes studies such as functional (cognitive) imaging when localized to small activation regions. Application of these new methods in more complicated 3D geometries will be carried out in future work. Each of the medium layers can be described in terms of thickness, scattering coefficient, absorption coefficient, (background) refractive index, and scattering asymmetry parameter. We will assume all parameters of the layered medium are known a priori with the exception of the absorption coefficient, which we will attempt to solve for. For the examples in this study, we set the total slab thickness to 2 cm, and the inversion is conducted with a resolution of 0.25 mm, (80 layers). The true “measured” data in all problems are generated using a single forward model Monte Carlo simulation using a large sample size of 109 photons. With this sample size, the variance of the measured forward data hobs, bobs is found to be negligible in this setup, and as such can be treated as effectively equivalent to the deterministic solution of the RTE.

To conduct an inversion, we stipulate a total photon budget, Nph, for which all combined sample sizes in the descent must not exceed, i.e., n|Sn|Nph. Once the total photon budget is expended, we terminate the descent. This is to emulate an imposed restriction on computational resources required to reach a solution. While each iteration (involving forward and adjoint runs of the Monte Carlo) has a nonzero computational overhead, optimization of these Monte Carlo programs for repeated iteration (such as employed in Ref. 23) allows this overhead to become negligibly small. This means that the required computational resources of the inversion (and therefore required computation time) are proportional to the total number of simulated photons used throughout the descent, i.e., the photon budget Nph. The inversions are carried out using Algorithm 1, along with Algorithms 2 and 3 to compute the gradients for QPAT and UMOT, respectively. In Algorithm 1, we will compute the metrics Vtot2 and V2 and conduct the norm test and inner product test once every 10 iterations to evaluate the quality of our computed gradients (using Nrep=100 independent repeated samples of the gradient), and to update the step size and sample size. It is worthy to note that as this is an investigation of how such methods might perform in best case scenarios, we do not include the photons used to compute these metrics as counting against the total allowed photon budget.

Algorithm 3

Monte Carlo sampled UMOT gradient.

1. Compute LMC1Qqϕq, Φq, and LMC1Qmϕm, Φm, each using |Sn|/4 photons
2. Construct internal adjoint sources Qadj1=ηΦq(bobsb) and Qadj2=ηΦm(bobsb)
3. Compute LMC1*Qadj1ϕ*,1, Φ*,1, and LMC1*Qadj1ϕ*,2, Φ*,2, each using |Sn|/4 photons
4. Use Eq. (37) to compute FSn

There are three different strategies we have employed to control the step size and sample size as the inversion progresses, see Table 1 for a summary. Strategy 1 uses a fixed-step size as described in Eq. (11) for a chosen value of γtot. The sample size is adaptive and attempts to enforce successful outcomes of the norm test (Vtot2γtot2), by increasing the sample size when the norm test is violated. In the event of a violation of this inequality, the fractional increase in the sample size is equivalent to the factor by which the norm test fails, Vtot2/γtot2. Strategy 2 uses an adaptive step size which still satisfies Eq. (11); however, it selects the largest step size possible for this criterion each time the metrics are evaluated. In this strategy, the sample size is also adaptive and attempts to enforce successful outcomes of the inner product test (V2γ2) by increasing the sample size when the inner product test is violated. In the event of a violation of this inequality, the fractional increase in the sample size is equivalent to the factor by which the inner product test fails, V2/γ2. In strategy 3, we attempt to accelerate the descent using a larger adaptive step size with Vtot in the denominator in place of Vtot2. Upon failure of the inner product test, the sample size is increased by fraction V/γ, and differs from strategy 2 to reduce the speed at which the photon budget is depleted. This is an attempt to reduce premature increase of the sample size caused by volatility in the computation of the norm and inner product metrics.

Table 1

Table showing the different inversion strategies used. Strategy 1 has a constant step size, with adaptive sample size. Strategies 2 and 3 both have adaptive step sizes and adaptive sample sizes. It is worthy to note that in accordance with Algorithm 1, the sample size is only increased upon a failure of the relevant test. If the test passes, then |Sn+1|=|Sn|.

StrategyStep size, αnSample size, |Sn+1|=κ(n)|Sn|
11(1+γtot2)L|Sn+1|=Vtot2γtot2|Sn|
21(1+Vtot2)L|Sn+1|=V2γ2|Sn|
31(1+Vtot)L|Sn+1|=Vγ|Sn|

Finally, we introduce an error function for the estimated absorption distribution, μa, as

Eq. (38)

Fμa=12μatrueμa2.
This metric would not be available under normal circumstances (as we do not know the ground truth μatrue), however it is useful to monitor in terms of the underlying performance of each strategy. Furthermore, as we will see, the sampled data cost function FSn is itself heavily dependent on the number of photons (sample size) used in the forward Monte Carlo and is thus not an ideal indicator of proximity to the true solution.

4.1.

QPAT

We begin with our example QPAT problem. The starting sample size in all cases shown is |S1|=200 photons per iteration (100 for each forward run, and 100 for each adjoint run in accordance with Algorithm 2), and the total photon budget for the inversion was set to Nph=2×106 photons. The Lipschitz constant was set at L=2.5, as this displayed stable descent in our test problems using large photon budgets (low-variance case). The initial estimate of the absorption distribution in the medium is μa=0.2  cm1 in all layers. The scattering coefficient of all layers was set to μs=40  cm1, and the scattering asymmetry parameter was set to g=0.9. The ground-truth absorption μatrue is shown in Fig. 3(a), along with the final retrieved absorption distributions obtained via strategies 1, 2, and 3 using the stated values of γtot and γ. Figure 3(b) shows the corresponding measured data and the final forward modeled data for each of the strategies. Figure 4 shows the outcome of each strategy in terms of the sampled data cost function FSn and the absorption error function Fμa. It can be seen that the ranking of these methods in terms of the lowest achieved value of the sampled cost function FSn does not correlate directly to the best outcomes in terms of the error in the estimated absorption Fμa. This is due to the above-mentioned dependence of the sampled data cost function on the sample size used in the forward model, where for example the case of strategy 2 only appears to perform poorly in terms of FSn due to its small sample size used throughout the inversion. This is more clearly shown in Fig. 3(b), where the final forward modeled data from strategy 2 are noisier than the other strategies due to the low sample size at the end of the inversion, where this noise would clearly impact the sampled cost function. It is worthy to note that the relevant step sizes and sample sizes for each of these three examples are shown in Fig. 5. Before finding the best parameter for strategy 1, we trialed a range of values of γtot over the range (0.1, 200). With lower values, the adaptive sample size was required to increase rapidly to maintain high-quality (low variance) sample gradients. This resulted in the photon budget being depleted early, terminating the descent after around 100 iterations, which did not perform well. Too large a value of γ and the norm test never failed, meaning the sample size was never required to increase and the inversion progressed for the maximum 10,000 iterations permitted by the photon budget. However, as strategy 1 has a fixed value of γtot2 in the denominator of the step size, large values also result in step sizes that were too small to perform well. A value of γtot=4 was found to strike a balance between these two extremes and was the best performer using strategy 1. Strategy 2 has an adaptive step size which selects the largest possible step size that still satisfies Eq. (11), instead of selecting a constant step size that accounts for the worst case scenario, as in strategy 1. For this reason, we found that the largest value of γ=20 was the best performer for this strategy, where the photon budget remained at 200 photons for each of the 10,000 iterations. For strategy 3, the best performer was a value of γ=10, where larger values appeared to allow too much variance in the gradient, leading to unstable descents. In all strategies, the recovered absorption distribution matched the ground-truth absorption more closely in the regions of the sample closest to the light source at z=0. This is due to the decay of the fluence as a function of depth, as we can see the QPAT signal is highest at shallow depths in Fig. 3(b). The deeper regions of the sample were the last to approach the ground truth in each of the three strategies.

Fig. 3

QPAT inversion: (a) ground-truth absorption distribution, μatrue, and estimated absorption distribution, μa, at the point where the photon budget is expended, using each of the three strategies with the stated values of γtot or γ. (b) Associated measured data from ground-truth medium and simulated forward data at the end of the inversion using each strategy.

JBO_25_8_085002_f003.png

Fig. 4

QPAT inversion: (a) sampled cost function, FSn, as a function of iteration, n. (b) Error in absorption estimate, Fμa, as a function of iteration, n.

JBO_25_8_085002_f004.png

Fig. 5

QPAT inversion: (a) step sizes, αn, as a function of iteration, n. (b) Adaptive sample size, |Sn|, as a function of iteration.

JBO_25_8_085002_f005.png

From Fig. 6, we see the values of our two metrics V and Vtot. In all cases, both measures of the variance begin at low values, indicating that even with low numbers of photons being simulated, the computed gradients are of reasonable quality, likely due to the poor initial first guess being far from the true solution. Each of the measures of variance increase as the inversion progresses until they begin to violate the norm test or inner product test depending on the strategy. It is seen that the strategy 1 example attempts to keep Vtot4, however, due to some level of variation in the metrics themselves, this condition can be seen to be violated regularly, requiring regular updates to the sample size. For strategy 2, the imposed limit of V20 is never violated, and thus the sample size is never required to increased. We also see that strategy 3 manages to keep V10 for the majority of the descent.

Fig. 6

QPAT inversion: (a) V as a function of iteration and (b) Vtot as a function of iteration.

JBO_25_8_085002_f006.png

In addition to these experiments shown in Figs. 36, we also trialed a number of other conditions including media with isotropic scatterers (i.e., with g=0), various scattering coefficients, and various initial estimates of the absorption. In all cases explored, the methods showed similar behavior as above, but with some differences in the ideal values of γtot and γ for each strategy. The outcomes of a range of these experiments are shown in Table 2 for various problem parameters. Strategy 3 was used in all cases in the table, with the same Lipschitz constant (L=2.5), starting sample size (|S1|=200 photons), photon budget (Nph=2×106 photons), and ground-truth absorption distribution μatrue as used in the above examples. The final attained values of the sampled data cost function FSn and absorption error Fμa are similar in all cases with the exception of the high asymmetry and low scattering case (g=0.9 and μs=4  cm1). In this case, the reduced scattering coefficient is only μs=μs(1g)=0.4  cm1, meaning much lower overall attenuation of the light through the sample. This results in a more uniform data function, h, where the simulated photons probe the domain more uniformly, and allows the problem to converge significantly faster than in the higher attenuating cases demonstrated in Figs. 36. It is also worth noting that in the regime with low scattering and high scattering asymmetry, it is generally problematic for the performance of approximate transport models such as the diffusion approximation, and the results here highlight the flexibility of RTE based approaches, and the efficiency of the proposed adaptive sampling techniques.

Table 2

Final outcomes of QPAT inversions with various medium optical properties and starting values of μa. Values of FSn and Fμa are the final values at the end of each inversion after the stated number of iterations. In each case, strategy 3 was employed, with a starting sample size of |S1|=200 photons per iteration, and a total photon budget of Nph=2×106 photons. Slab thickness is 2 cm in all cases, with the same ground-truth μatrue distribution as shown in Fig. 3(a).

Starting μa (cm−1)
0.010.21.0
Medium propertiesg=0.9μs=40  cm1γ=20, 10,000 iterations FSn=2.35×103Fμa=3.26×105γ=10, 4476 iterations FSn=4.20×104Fμa=7.65×106γ=10, 6533 iterations FSn=3.38×104Fμa=1.01×105
g=0.9μs=4  cm1γ=5, 3819 iterations FSn=5.30×104Fμa=2.3×107γ=5, 2579 iterations FSn=9.31×105Fμa=3.56×107γ=5, 2834 iterations FSn=1.06×104Fμa=2.19×107
g=0μs=4  cm1γ=20, 10,000 iterations FSn=4.01×103Fμa=8.36×105γ=5, 2056 iterations FSn=1.39×104Fμa=3.44×105γ=10, 10,000 iterations FSn=2.92×103Fμa=8.72×105

Finally, interesting behavior was observed when using certain initial guesses of the absorption. An example of this is shown in Fig. 7, where we show the resulting cost functions for a starting estimate of μa=1  cm1 (significantly overestimating the absorption at all depths), and medium properties of g=0.9 and μs=40  cm1. In this case, we see that the descent appears to encounter local minima in the data cost function FSn at various points during the descent, depending on the particular strategy used. However, the algorithm manages to escape these local minima and converge to a better solution. This is seen to be the case for all three strategies shown in Fig. 7.

Fig. 7

QPAT inversion: with initial estimate of μa=1.0  cm1: (a) sampled cost function, FSn, as a function of iteration, n. (b) Error in absorption estimate, Fμa, as a function of iteration, n.

JBO_25_8_085002_f007.png

4.2.

UMOT

Next, we demonstrate similar experiments performed using the UMOT modality described in Sec. 3.2.2 for the transmission geometry. In this setup, we used the same medium slab size as the QPAT example, and the same optical properties apart from the absorption distribution. The starting sample size in all cases shown is |S1|=4000 photons per iteration, 1000 for each forward run (per each of the two sources), and 1000 for each of the two adjoint sources as outlined in Algorithm 3. The total photon budget for the inversion was set to Nph=4×108 photons. The Lipschitz constant was set at L=50, as this displayed stable descent in our test problems using large photon budgets (low-variance case). The initial estimate of the absorption distribution in the medium is μa=0.1  cm1 in all layers. The ground-truth absorption μatrue is shown in Fig. 8(a), along with the final retrieved absorption distributions obtained via strategies 1, 2, and 3 using the stated values of γtot and γ. Figure 8(b) shows the true measured UMOT data, bobs, along with the forward modeled data from the final estimated medium for each strategy. Figure 9 shows the outcome of each strategy in terms of the sampled data cost function, FSn, and the absorption error function, Fμa. The relevant step sizes and sample sizes for each of these three examples are shown in Fig. 10, and the values of the metrics measuring the variance in the sampled gradients are presented in Fig. 11. Similar to the QPAT modality, we found that strategy 3 performed the best in terms of the final achieved value of the error in the absorption estimate Fμa.

Fig. 8

UMOT inversion: (a) ground-truth absorption distribution, μatrue, and recovered absorption distribution μa using each of the three strategies with the stated values of γtot or γ. (b) Associated measured data from ground-truth medium and simulated forward data at the end of the inversion using each strategy.

JBO_25_8_085002_f008.png

Fig. 9

UMOT inversion: (a) sampled cost function, FSn, as a function of iteration, n. (b) Error in absorption estimate, Fμa, as a function of iteration, n.

JBO_25_8_085002_f009.png

Fig. 10

UMOT inversion: (a) step sizes, αn, as a function of iteration, n. (b) Adaptive sample size, |Sn|, as a function of iteration.

JBO_25_8_085002_f010.png

Fig. 11

UMOT inversion: (a) V as a function of iteration and (b) Vtot as a function of iteration.

JBO_25_8_085002_f011.png

In addition to the results shown in Figs. 811, in Table 3 we present a summary of results for a range of different medium optical parameters and starting estimates of the absorption. In all cases, strategy 3 was used, and the starting photon budget was the same as in the previous UMOT examples (|S1|=4000 photons), with a total photon budget of Nph=4×108 photons. For each of the inversions presented in this table, we conducted the inner product test once every 50 iterations, using Nrep=50 repeated evaluations of the gradient. The resulting inversions display similar error in these cases to the above examples where we used Nrep=100 repeated evaluations of the sampled gradient once every 10 iterations to run the inner product test. This demonstrates that the described methods can still be successful when dedicating fewer computational resources to the inner product or norm test metrics, which control the adaptive sample size and step size.

Table 3

Final outcomes of UMOT inversions with various medium optical properties and starting values of μa. Values of FSn and Fμa are the final values at the end of each inversion after the stated number of iterations. In each case, strategy 3 was employed, with a starting sample size of |S1|=4000 photons per iteration, and a total photon budget of Nph=4×108 photons. Slab thickness is 2 cm in all cases, with the same ground-truth μatrue distribution as shown in Fig. 8(a).

Starting μa (cm−1)
0.010.11.0
Medium propertiesg=0.9μs=40  cm1γ=15, 11,412 iterations FSn=1.88×103Fμa=4.23×105γ=10, 3495 iterations FSn=2.50×104Fμa=2.20×105γ=10, 7823 iterations FSn=5.69×104Fμa=9.37×105
g=0.9μs=4  cm1γ=15, 19,017 iterations FSn=4.71×103Fμa=8.19×105γ=15, 15,813 iterations FSn=5.48×103Fμa=2.07×105γ=15, 14,249 iterations FSn=4.08×103Fμa=3.36×105
g=0μs=4  cm1γ=15, 11,594 iterations FSn=1.65×103Fμa=7.13×105γ=15, 7304 iterations FSn=5.87×104Fμa=4.17×105γ=15, 13,981 iterations FSn=1.05×103Fμa=7.98×105

5.

Discussion and Conclusions

The results shown in Sec. 4 demonstrate that the adaptive sampling strategies performed well in both our example problems of QPAT and UMOT. We were able to achieve low error estimates of the medium absorption using a total computational expenditure that was either comparable to or significantly lower than the resources required to simulate a single low-variance run of the forward problem. In each demonstration, the adaptive sampling strategies maintained low photon numbers throughout the early stages of the inversion. Photon numbers were only increased when required to keep the variance in the gradients below the stipulated limits. These adaptive sampling strategies thus enabled significant computational savings compared to a naïve implementation, which might seek to use low-variance (high quality) computations of the gradient at every iteration. For instance, if we were to use a constant step size of 1/L and the same number of photons per iteration as that which was used to generate the measured data (109 photons), then we find we still required hundreds of iterations to reach a similar quality estimate of the absorption as seen in the above problems. This means that the computational requirements of the low-variance approach would be proportional to Nph=1011 photons. Comparing this to Nph=2×106 photons used in the QPAT examples or Nph=4×108 photons used in the UMOT examples, the required computational resources/time to attain our solutions with these adaptive sampling methods is multiple orders of magnitude lower compared to the naïve low-variance approach.

In this work, we have emphasized the similarities between our approach and that of SGD, as employed in the context of machine learning. However, it should be noted that there are significant differences between the two settings. In machine learning, the measured data are assumed to consist of a large number of samples to be fit to a deterministic model to minimize a suitable loss function, and each stochastic gradient is generated by a random subset of these data forming the descent direction of a subfunction. The same method has also been applied in alternative image reconstruction techniques where the data can be more naturally considered as consisting of a large number of random samples from some underlying distribution, for example, in positron emission tomography.47 By contrast, our image reconstruction approach considers the complete measured data on each iteration, with stochasticity arising from the approximation within the forward model: we are effectively subsampling the gradient in terms of the parameter space, rather than data space. This is to say that at each iteration we utilize a subset of some notionally complete model, rather than of the data. The motivation by which each approach is employed is consistent: stochasticity is intentionally introduced to whichever part of the objective function introduces the greatest computational demand. This suggests a third possible approach, where the computational load of the (sub) gradient computation can be lowered through some stochastic division of both the data and the model; this might be relevant in imaging modalities with discrete counting data, such as time-domain and/or dynamic diffuse optical tomography.

Our work suggests a number of interesting future developments:

  • In the examples shown here the “observed” data were effectively “noise free” by virtue of running the forward Monte Carlo on a very large number of photons. Thus an interesting topic for further study will be to evaluate these methods on noisy forward data, wherein the data fitting term should not be iterated to convergence, but where regularization should be introduced either by early stopping (i.e., by setting a minimum threshold for the data error) or by adding an explicit penalty term.

  • Related to the previous point, we further note that our objective function employed a least-squares data fitting term in this study. Depending upon the nature of the noise in the data and that of the stochastic forward model, more suitable metrics may include the Kullback–Leibler discrepancy (for Poisson likelihood) or a generalized measure of the distance between samples of probability distributions (Wasserstein distance48).

  • Our results demonstrate a consistent tendency for the adaptive sampling method to exhibit a geometric increase in the sample size as the descent progresses. This suggests that our adaptive approach could be employed to find a particular set of sampling parameters that perform well in a given regime, including the starting photon budget |S1|, rate of increase of the sample size κ(n), and rate of change of the step size αn. If a suitable set of such parameters could be found, they could help determine a fully prescribed sampling strategy. Once calibrated for a given problem of choice, this would avoid the need to explicitly compute the variance of the sampled gradients during the descent, and lead to even greater efficiency and speed in the inverse problem.

  • Further topics of interest include more advanced methods of variance reduction (e.g., recursive gradients49); adaptive estimates of the Lipschitz constant as described in Ref. 30; alternative optimization strategies such as back-tracking line-search, or primal dual methods;50 the use of preconditioning and/or second-order optimization methods;51 and an in-depth comparison of these nonlinear adaptive models to the alternative approaches such as PMC.27

In summary, we have successfully demonstrated a means by which stochastic forward models, not directly amenable to standard variational methods for optimization, can be employed efficiently in nonlinear image reconstruction. We expect this concept to lead to many new directions of research in optical image reconstruction.

Disclosures

No conflicts of interest, financial or otherwise, are declared by the authors.

Acknowledgments

This work was funded by the Engineering and Physical Sciences Research Council under Grant Nos. EP/N032055/1 and EP/N025946/1. S. Powell further acknowledges the Royal Academy of Engineering fellowship, RF1516/15/33. The authors would like to thank Robert Twyman and Kris Thielemans for helpful discussions on stochastic gradient and data subset methods.

References

1. 

S. R. Arridge and J. Schotland, “Optical tomography: forward and inverse problems,” Inverse Prob., 25 (12), 123010 (2009). https://doi.org/10.1088/0266-5611/25/12/123010 INPEEY 0266-5611 Google Scholar

2. 

L. T. Biegler et al., “Large-scale PDE-constrained optimization: an introduction,” Large-Scale PDE-Constrained Optimization, 3 –13 Springer, Berlin, Heidelberg (2003). Google Scholar

3. 

K. Bredies et al., Control and Optimization with PDE Constraints, Springer, Basel (2013). Google Scholar

4. 

J. C. De los Reyes, Numerical PDE-Constrained Optimization, Springer, Heidelberg (2015). Google Scholar

5. 

C. Zhu and Q. Liu, “Review of Monte Carlo modeling of light transport in tissues,” J. Biomed. Opt., 18 (5), 050902 (2013). https://doi.org/10.1117/1.JBO.18.5.050902 JBOPFO 1083-3668 Google Scholar

6. 

I. Lux and L. Koblinger, Monte Carlo Particle Transport Methods: Neutron and Photon Calculations, CRC Press, Boca Raton (1991). Google Scholar

7. 

A. Liemert and A. Kienle, “Analytical solution of the radiative transfer equation for infinite-space fluence,” Phys. Rev. A, 83 015804 (2011). https://doi.org/10.1103/PhysRevA.83.015804 Google Scholar

8. 

A. Liemert, D. Reitzle and A. Kienle, “Analytical solutions of the radiative transport equation for turbid and fluorescent layered media,” Sci. Rep., 7 3819 (2017). https://doi.org/10.1038/s41598-017-02979-4 SRCEC3 2045-2322 Google Scholar

9. 

W. C. Lo et al., “Hardware acceleration of a Monte Carlo simulation for photodynamic treatment planning,” J. Biomed. Opt., 14 (1), 014019 (2009). https://doi.org/10.1117/1.3080134 JBOPFO 1083-3668 Google Scholar

10. 

N. Ren et al., “GPU-based Monte Carlo simulation for light propagation in complex heterogeneous tissues,” Opt. Express, 18 (7), 6811 –6823 (2010). https://doi.org/10.1364/OE.18.006811 OPEXFF 1094-4087 Google Scholar

11. 

C. K. Hayakawa et al., “Perturbation Monte Carlo methods to solve inverse photon migration problems in heterogeneous tissues,” Opt. Lett., 26 (17), 1335 –1337 (2001). https://doi.org/10.1364/OL.26.001335 OPLEDP 0146-9592 Google Scholar

12. 

R. Graaff et al., “Condensed Monte Carlo simulations for the description of light transport,” Appl. Opt., 32 (4), 426 –434 (1993). https://doi.org/10.1364/AO.32.000426 APOPAI 0003-6935 Google Scholar

13. 

I. T. Lima, A. Kalra and S. S. Sherif, “Improved importance sampling for Monte Carlo simulation of time-domain optical coherence tomography,” Biomed. Opt. Express, 2 (5), 1069 –1081 (2011). https://doi.org/10.1364/BOE.2.001069 BOEICL 2156-7085 Google Scholar

14. 

H. Ammari, Mathematical Modeling in Biomedical Imaging II: Optical, Ultrasound, and Opto-Acoustic Tomographies, Springer, Heidelberg (2011). Google Scholar

15. 

S. R. Arridge and O. Scherzer, “Imaging from coupled physics,” Inverse Prob., 28 (8), 080201 (2012). https://doi.org/10.1088/0266-5611/28/8/080201 INPEEY 0266-5611 Google Scholar

16. 

V. A. Morozov, “On the solution of functional equations by the method of regularization,” Sov. Math. Doklady, 7 414 –417 (1966). 0197-6788 Google Scholar

17. 

R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” in Adv. Neural Inf. Process. Syst., 315 –323 (2013). Google Scholar

18. 

M. Friebel et al., “Determination of optical properties of human blood in the spectral range 250 to 1100 nm using Monte Carlo simulations with hematocrit-dependent effective scattering phase functions,” J. Biomed. Opt., 11 (3), 034021 (2006). https://doi.org/10.1117/1.2203659 JBOPFO 1083-3668 Google Scholar

19. 

I. Yaroslavsky et al., “Inverse hybrid technique for determining the optical properties of turbid media from integrating-sphere measurements,” Appl. Opt., 35 (34), 6797 –6809 (1996). https://doi.org/10.1364/AO.35.006797 APOPAI 0003-6935 Google Scholar

20. 

Q. Fang and D. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units,” Opt. Express, 17 (22), 20178 –20190 (2009). https://doi.org/10.1364/OE.17.020178 OPEXFF 1094-4087 Google Scholar

21. 

Q. Fang, “Mesh-based Monte Carlo method using fast ray-tracing in Plücker coordinates,” Biomed. Opt. Express, 1 165 (2010). https://doi.org/10.1364/BOE.1.000165 BOEICL 2156-7085 Google Scholar

22. 

J. Buchmann et al., “Three-dimensional quantitative photoacoustic tomography using an adjoint radiance Monte Carlo model and gradient descent,” J. Biomed. Opt., 24 (6), 066001 (2019). https://doi.org/10.1117/1.JBO.24.6.066001 JBOPFO 1083-3668 Google Scholar

23. 

R. Hochuli et al., “Quantitative photoacoustic tomography using forward and adjoint Monte Carlo models of radiance,” J. Biomed. Opt., 21 (12), 126004 (2016). https://doi.org/10.1117/1.JBO.21.12.126004 JBOPFO 1083-3668 Google Scholar

24. 

A. A. Leino, A. Pulkkinen and T. Tarvainen, “ValoMC: a Monte Carlo software and MATLAB toolbox for simulating light transport in biological tissue,” OSA Continuum, 2 (3), 957 –972 (2019). https://doi.org/10.1364/OSAC.2.000957 Google Scholar

25. 

U. Tricoli et al., “Reciprocity relation for the vector radiative transport equation and its application to diffuse optical tomography with polarized light,” Opt. Lett., 42 (2), 362 –365 (2017). https://doi.org/10.1364/OL.42.000362 OPLEDP 0146-9592 Google Scholar

26. 

R. Yao, X. Intes and Q. Fang, “A direct approach to compute Jacobians for diffuse optical tomography using perturbation Monte Carlo-based photon ‘replay’,” Biomed. Opt. Express, 9 (10), 4588 –4603 (2018). https://doi.org/10.1364/BOE.9.004588 BOEICL 2156-7085 Google Scholar

27. 

A. Leino et al., “Perturbation Monte Carlo method for quantitative photoacoustic tomography,” IEEE Trans. Med. Imaging, (2020). https://doi.org/10.1109/TMI.2020.2983129 ITMID4 0278-0062 Google Scholar

28. 

L. Bottou, F. E. Curtis and J. Nocedal, “Optimization methods for large-scale machine learning,” SIAM Rev., 60 (2), 223 –311 (2018). https://doi.org/10.1137/16M1080173 SIREAD 0036-1445 Google Scholar

29. 

D. Newton, F. Yousefian and R. Pasupathy, “Stochastic gradient descent: recent trends,” Recent Advances in Optimization and Modeling of Contemporary Problems, 193 –220 INFORMS, Catonsville, Maryland (2018). Google Scholar

30. 

R. Bollapragada, R. Byrd and J. Nocedal, “Adaptive sampling strategies for stochastic optimization,” SIAM J. Optim., 28 (4), 3312 –3343 (2018). https://doi.org/10.1137/17M1154679 SJOPE8 1095-7189 Google Scholar

31. 

H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math. Stat., 22 400 –407 (1951). https://doi.org/10.1214/aoms/1177729586 Google Scholar

32. 

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer Science & Business Media, New York (2013). Google Scholar

33. 

S. R. Arridge, “Optical tomography in medical imaging,” Inverse Prob., 15 (2), R41 –R93 (1999). https://doi.org/10.1088/0266-5611/15/2/022 INPEEY 0266-5611 Google Scholar

35. 

L. Wang, S. L. Jacques and L. Zheng, “MCML—Monte Carlo modeling of light transport in multi-layered tissues,” Comput. Methods Programs Biomed., 47 (2), 131 –146 (1995). https://doi.org/10.1016/0169-2607(95)01640-F CMPBEK 0169-2607 Google Scholar

36. 

G. Bal, “Inverse transport theory and applications,” Inverse Prob., 25 (5), 053001 (2009). https://doi.org/10.1088/0266-5611/25/5/053001 INPEEY 0266-5611 Google Scholar

37. 

G. Bal, “Hybrid inverse problems and internal functionals,” 60 325 –368 (2013). Google Scholar

38. 

L. V. Wang, “Multiscale photoacoustic microscopy and computed tomography,” Nat. Photonics, 3 503 –509 (2009). https://doi.org/10.1038/nphoton.2009.157 NPAHBY 1749-4885 Google Scholar

39. 

P. Beard, “Biomedical photoacoustic imaging,” Interface Focus, 1 602 –631 (2011). https://doi.org/10.1098/rsfs.2011.0028 Google Scholar

40. 

L. Nie and X. Chen, “Structural and functional photoacoustic molecular tomography aided by emerging contrast agents,” Chem. Soc. Rev., 43 (20), 7132 –7170 (2014). https://doi.org/10.1039/C4CS00086B CSRVBR 0306-0012 Google Scholar

41. 

B. Cox et al., “Quantitative spectroscopic photoacoustic imaging: a review,” J. Biomed. Opt., 17 (6), 061202 (2012). https://doi.org/10.1117/1.JBO.17.6.061202 JBOPFO 1083-3668 Google Scholar

42. 

T. Saratoon et al., “A gradient-based method for quantitative photoacoustic tomography using the radiative transfer equation,” Inverse Prob., 29 075006 (2013). https://doi.org/10.1088/0266-5611/29/7/075006 INPEEY 0266-5611 Google Scholar

43. 

K. Wang, M. Anastasio, “Photoacoustic and thermoacoustic tomography: image formation principles,” Handbook of Mathematical Methods in Imaging, 781 –815 Springer, New York (2011). Google Scholar

44. 

H. Ammari et al., “A reconstruction algorithm for ultrasound-modulated diffuse optical tomography,” Proc. Am. Math. Soc., 142 (9), 3221 –3236 (2014). https://doi.org/10.1090/S0002-9939-2014-12090-9 PAMYAR 0002-9939 Google Scholar

45. 

F. J. Chung and J. C. Schotland, “Inverse transport and acousto-optic imaging,” SIAM J. Math. Anal., 49 4704 –4721 (2017). https://doi.org/10.1137/16M1104767 SJMAAH 0036-1410 Google Scholar

46. 

S. Powell, S. R. Arridge and T. S. Leung, “Gradient-based quantitative image reconstruction in ultrasound-modulated optical tomography: first harmonic measurement type in a linearised diffusion formulation,” IEEE Trans. Med. Imaging, 35 (2), 456 –467 (2016). https://doi.org/10.1109/TMI.2015.2478742 ITMID4 0278-0062 Google Scholar

47. 

K. Thielemans and S. Arridge, “Adaptive adjustment of the number of subsets during iterative image reconstruction,” in IEEE Nuclear Sci. Symp. and Med. Imaging Conf., 1 –2 (2016). https://doi.org/10.1109/NSSMIC.2015.7582225 Google Scholar

48. 

C. Villani, Optimal Transport: Old and New, 338 Springer, Berlin (2009). Google Scholar

49. 

L. Nguyen et al., “SARAH: a novel method for machine learning problems using stochastic recursive gradient,” in Int. Conf. Mach. Learn., 34 (2017). Google Scholar

50. 

A. Chambolle et al., “Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications,” SIAM J. Optim., 28 (4), 2783 –2808 (2018). https://doi.org/10.1137/17M1134834 SJOPE8 1095-7189 Google Scholar

51. 

P. Moritz, R. Nishihara and M. I. Jordan, “A linearly-convergent stochastic L-BFGS algorithm,” in Proc. 19th Int. Conf. Artif. Intell. and Stat., 249 –258 (2016). Google Scholar

Biographies of the authors are not available.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Callum M. Macdonald, Simon Arridge, and Samuel Powell "Efficient inversion strategies for estimating optical properties with Monte Carlo radiative transport models," Journal of Biomedical Optics 25(8), 085002 (14 August 2020). https://doi.org/10.1117/1.JBO.25.8.085002
Received: 14 April 2020; Accepted: 23 July 2020; Published: 14 August 2020
Lens.org Logo
CITATIONS
Cited by 6 scholarly publications.
Advertisement
Advertisement
KEYWORDS
Monte Carlo methods

Photons

Tin

Absorption

Data modeling

Optical properties

Stochastic processes

Back to Top