*in silico*three-dimensional phantom images for different radiance approximations. The scattering coefficient is assumed to be homogeneous and known

*a priori*.

## 1.

## Introduction

Biomedical photoacoustic (PA) tomography is a hybrid soft-tissue imaging modality that combines the high spatial resolution of ultrasound with the high contrast and specificity of optical imaging techniques.^{1}^{–}^{3} It relies on the generation of acoustic waves inside the tissue, which result from the absorption of intensity-modulated light, such as laser pulses or frequency chirps, by the tissue chromophores. From time-resolved PA signals recorded at multiple measurement points around the object, three-dimensional (3-D) data sets of the initial pressure distribution (i.e., PA images) can be calculated using acoustic reconstruction algorithms. Quantitative PA tomography (QPAT) aims to exploit the wavelength dependence of the image intensity to recover the local concentrations of endogenous tissue chromophores and exogenous contrast agents from which functional parameters, such as blood oxygen saturation, can be derived. To relate the PA image intensity to local chromophore concentrations, computational models of the physical processes during the image generation in conjunction with inversion schemes represent one approach to QPAT.^{4}^{,}^{5} A major challenge in QPAT is the unknown light fluence in the tissue,^{5}^{–}^{7} which is a nonlinear function of the concentrations and the scattering coefficient. Its effects on PA images have been described as spectral coloring and structural distortion.^{5} For an accurate quantification of concentrations and their ratios (e.g., blood oxygenation), the wavelength-dependent fluence distribution has to be accounted for.

Commonly used fluence correction methods include the application of empirical correction factors^{8} or simple models under the assumption of homogeneous optical properties.^{9} This is deemed sufficient to recover the absorption coefficient distribution from which maps of the chromophore concentrations can then be calculated using linear matrix inversions. The main limitation of these methods is the reliance on *a priori* knowledge of the distribution and wavelength dependence of the fluence, i.e., $\mathrm{\varphi}(\overrightarrow{x},\lambda )$. For *in vivo* images, this assumption is often invalid and can lead to significant quantification errors especially at greater depth. Alternative approaches include data-driven methods. In the work reported by Tzoumas et al.^{10} the wavelength-dependent fluence is written on the basis of eigenspectra obtained using a principle component analysis of *in silico* training data, and Kirchner et al.^{11} calculated the fluence maps by applying deep learning and local context encoding to a large number of training data. While these methods have the potential to offer fast inversions, they require large training sets and may thus lack generality.

Model-based inversions, incorporating light transport models to predict the fluence as a function of the spatial distribution of the absorption and scattering coefficients, remain the most promising approach to QPAT. The initial pressure distribution is obtained by multiplying the fluence with the distribution of the absorption coefficient and the Grüneisen parameter, and PA image data sets can be obtained using acoustic propagation models. The difference between the model output and the measured data, i.e., the objective function, is minimized by iteratively updating the absorption and scattering coefficients during the inversion until convergence is reached. To overcome the nonuniqueness that arises from the use of single-wavelength images,^{5} multi-illumination approaches^{12}^{,}^{13} or multiwavelength image acquisition, in combination with *a priori* knowledge of the wavelength dependence of the absorption and scattering coefficients,^{14}^{,}^{15} are employed.

A major challenge in high-resolution 3-D QPAT is the large number of variables ($>{10}^{6}$). While gradient-free methods can be applied to small-scale problems,^{16} inversions of a larger scale (tens of variables and more) quickly become computationally unfeasible. Gradient-based methods have the potential to overcome these limitations. They have been implemented using the adjoint formalism,^{17}^{–}^{19} which is applied using a finite element model of the diffusion approximation (DA) to the inversion of measured 3-D phantom images.^{20} While the DA is valid in the diffusive regime and can be implemented efficiently,^{21}^{,}^{22} high-resolution PAT can cover depths in the ballistic and quasiballistic regimes, where the DA may not be valid.^{23} Methods that aim to solve the radiative transfer equation (RTE) directly are computationally expensive and have only been demonstrated in two-dimensional (2-D) images so far.^{24}^{–}^{27} Monte Carlo (MC) models have recently gained attention^{28}^{–}^{31} due to their highly parallelized architecture and advances in graphics-processing units and have already been applied to 2-D QPAT inversions^{19} and in initial studies with limited parameter space in 3-D.^{16}^{,}^{32} In this paper, a method for inverting multiwavelength 3-D images based on an adjoint formulation of a radiance MC model is demonstrated *in silico*. The challenges that are addressed in this work are (1) the optimization of the objective function using inherently noisy gradients, (2) accounting for the effect of the concentration-dependent Grüneisen parameter, and (3) the representation of the radiance in terms of spherical harmonics. The capability of this approach to recover absolute chromophore concentrations and their ratios, e.g., blood oxygen saturation (blood ${\mathrm{sO}}_{2}$), from high-resolution 3-D image data sets is demonstrated.

## 2.

## Methods

The forward model of the generation of the initial pressure shown in tomographic PA images is described in Sec. 2.1. The adjoint formalism^{26} with which the gradients of the objective function are calculated is described in Sec. 2.2. The approximation of the radiance field as a finite sum of spherical harmonics^{31} within an MC light transport model is described in Secs. 2.3 and 2.4. The numerical phantom and the simulation parameters are outlined in Secs. 2.5 and 2.6, respectively. To reduce the impact of the inherent MC noise on the parameter update and to maximize the convergence speed of the gradient descent, an adaptive moment estimation (Adam) optimization algorithm^{33} is employed (Sec. 2.7).

## 2.1.

### Forward Model

Assuming that the effects of the limited detection aperture and acoustic propagation can be neglected, the image intensity represents the initial pressure distribution, ${p}_{0}$, which is given as

## Eq. (1)

$${p}_{0}(\overrightarrow{r},\lambda )=\mathrm{\Gamma}(\overrightarrow{r})H(\overrightarrow{r}),$$## Eq. (2)

$$H(\overrightarrow{r})={\mu}_{a}(\overrightarrow{r},\lambda )\mathrm{\varphi}(\overrightarrow{r},\lambda ),$$## Eq. (3)

$$\mathrm{\varphi}(\overrightarrow{r})=\int \varphi (\overrightarrow{r},{\widehat{s}}^{\prime})\mathrm{d}{\widehat{s}}^{\prime}.$$The absorption coefficient is related to the chromophore concentrations via the specific absorption coefficient, ${\alpha}_{k}(\lambda )$, i.e.,

## Eq. (4)

$${\mu}_{a}(\lambda ,\overrightarrow{r})=\sum _{k}^{{N}_{k}}{c}_{k}(\overrightarrow{r}){\alpha}_{k}(\lambda ),$$^{15}

^{,}

^{34}

^{,}

^{35}i.e.,

## Eq. (5)

$$\mathrm{\Gamma}(\overrightarrow{r})={\mathrm{\Gamma}}_{\text{water}}[1+\sum _{k}{\beta}_{k}{c}_{k}(\overrightarrow{r})],$$^{5}This involves the launch of photons (typically represented as packets of energy

^{36}) according to a predefined source distribution. Their propagation within the domain is determined by the optical coefficients ${\mu}_{a}(\overrightarrow{r})$ and ${\mu}_{s}(\overrightarrow{r})$, the scattering phase function $\mathrm{\Theta}(\widehat{s},{\widehat{s}}^{\prime},\overrightarrow{r})$, and the refractive index $n$. The deposition of energy when a photon traverses a voxel is determined by the absorption coefficient ${\mu}_{a}$. The angular dependence of scattering events is described by the Henyey–Greenstein phase function.

^{37}

## 2.2.

### Adjoint-Assisted Optimization

Assuming Gaussian noise on the measured data, an estimate of the chromophore distributions is found by minimizing the objective function $\epsilon $, which is given as

## Eq. (6)

$$\epsilon =\sum _{l}^{{N}_{\lambda}}{\int}_{\mathrm{\Omega}}\frac{1}{2}{[{p}_{0}^{m}({\lambda}_{l},\overrightarrow{r})-{p}_{0}({\lambda}_{l},\overrightarrow{r})]}^{2}\mathrm{d}\mathrm{\Omega},$$To find the chromophore concentration maps, ${c}_{k}(\overrightarrow{r})$, the derivative of $\epsilon $ with respect to ${c}_{k}$ at any position ${\overrightarrow{r}}_{i}$ is required, i.e., $\frac{\partial \epsilon}{\partial {c}_{i}}\equiv \frac{\partial \epsilon}{\partial {c}_{k}({\overrightarrow{r}}_{i})}$. For the sake of brevity, only one chromophore will be considered in the remaining description, i.e., $\frac{\partial}{\partial {c}_{i}}\equiv \frac{\partial}{\partial {c}_{1}({\overrightarrow{r}}_{i})}$. The objective function $\epsilon $ represents the sum of the objective functions, ${\epsilon}_{{\lambda}_{l}}$, at each excitation wavelength, which is given as $\epsilon =\sum _{l}^{{N}_{\lambda}}{\epsilon}_{{\lambda}_{l}}$. For simplicity, only one wavelength, ${\lambda}_{l}$, will be considered to describe the derivative:

## Eq. (7)

$$\frac{\partial {\epsilon}_{{\lambda}_{l}}}{\partial {c}_{i}}=-{\int}_{\mathrm{\Omega}}[{p}_{0}^{m}(\overrightarrow{r})-{p}_{0}(\overrightarrow{r})]\frac{\partial {p}_{0}}{\partial {c}_{i}}\mathrm{d}\mathrm{\Omega}.$$Since ${p}_{0}=\mathrm{\Gamma}{\mu}_{a}\mathrm{\varphi}$, after applying the chain rule, $\frac{\partial {p}_{0}}{\partial {c}_{i}}$ becomes

## Eq. (8)

$$\frac{\partial {p}_{0}}{\partial {c}_{i}}=\frac{\partial \mathrm{\Gamma}}{\partial {c}_{i}}{\mu}_{a}\mathrm{\varphi}+\mathrm{\Gamma}\frac{\partial {\mu}_{a}}{\partial {c}_{i}}\mathrm{\varphi}+\mathrm{\Gamma}{\mu}_{a}\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}$$## Eq. (9)

$$={\beta}_{k}\delta (\overrightarrow{r}-{\overrightarrow{r}}_{i})H(\overrightarrow{r})+\mathrm{\Gamma}(\overrightarrow{r}){\alpha}_{k}({\lambda}_{l})\delta (\overrightarrow{r}-{\overrightarrow{r}}_{i})\mathrm{\varphi}(\overrightarrow{r})+\mathrm{\Gamma}(\overrightarrow{r}){\mu}_{a}(\overrightarrow{r})\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}},$$^{17}

^{,}

^{26}Briefly, the adjoint approach defines a source term ${q}^{*}$ for the adjoint radiance ${\varphi}^{*}$

## Eq. (10)

$${q}^{*}(\overrightarrow{r},\lambda )=\mathrm{\Gamma}(\overrightarrow{r}){\mu}_{a}(\overrightarrow{r},\lambda )[{p}_{0}^{m}(\overrightarrow{r},\lambda )-{p}_{0}(\overrightarrow{r},\lambda )],$$## Eq. (11)

$${\int}_{\mathrm{\Omega}}\mathrm{\Gamma}(\overrightarrow{r}){\mu}_{a}(\overrightarrow{r})({p}_{0}^{m}-{p}_{0})\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}\mathrm{d}\mathrm{\Omega}=-{\alpha}_{k}({\lambda}_{l}){[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}{V}_{\mathrm{vox}},$$## Eq. (12)

$$\frac{\partial {\epsilon}_{{\lambda}_{l}}}{\partial {c}_{i}}=-[{p}_{0}^{m}({\overrightarrow{r}}_{i})-{p}_{0}({\overrightarrow{r}}_{i})]{V}_{\mathrm{vox}}[{\beta}_{k}H({\overrightarrow{r}}_{i})+\mathrm{\Gamma}({\overrightarrow{r}}_{i}){\alpha}_{k}\mathrm{\varphi}({\overrightarrow{r}}_{i})]+{\alpha}_{k}{[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}{V}_{\mathrm{vox}}.$$The adjoint model and its derivation are described in general and in more detail in Secs. 6 and 7.

## 2.3.

### Radiance Approximation

As the radiance $\varphi (\overrightarrow{s})$ and the adjoint radiance ${\varphi}^{*}(\overrightarrow{s})$ are functions of solid angle and are defined on the surface of the unit sphere, both quantities can be expressed on the basis of spherical harmonic functions. The expansion of the radiance into spherical harmonics is based on previous work^{31} and is inspired by the ${P}_{n}$ approximations, described in Ref. 22 (similar to a Fourier expansion in 1-D or 2-D) and is outlined in detail in Sec. 8. Using a finite expansion of ${[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}$ in real spherical harmonics, the gradient equation is given as

## Eq. (13)

$$\frac{\partial {\epsilon}_{{\lambda}_{l}}}{\partial {c}_{i}}={V}_{\mathrm{vox}}\{-[{p}_{0}^{m}({\overrightarrow{r}}_{i})-{p}_{0}({\overrightarrow{r}}_{i})][{\beta}_{k}H({\overrightarrow{r}}_{i})+\mathrm{\Gamma}({\overrightarrow{r}}_{i}){\alpha}_{k}\mathrm{\varphi}({\overrightarrow{r}}_{i})]+{\alpha}_{k}\sum _{l=0}^{{N}_{L}}\sum _{m=-l}^{l}{\psi}_{lm}({\overrightarrow{r}}_{i}){\psi}_{lm}^{*}({\overrightarrow{r}}_{i})\},$$The last term of the gradient in Eq. (13) contains an expansion of the radiance and the adjoint radiance in spherical harmonics

## Eq. (14)

$${\alpha}_{k}\sum _{l=0}^{{N}_{L}}\sum _{m=-l}^{l}{\psi}_{lm}({\overrightarrow{r}}_{i}){\psi}_{lm}^{*}({\overrightarrow{r}}_{i}).$$*a priori*up to which value of ${N}_{L}$ the corresponding coefficients needs to be stored to approximate the radiance with sufficient accuracy. This has been investigated by evaluating the inversion scheme for three different configurations: (1) ${N}_{L}=0$, i.e., using only the fluence, (2) for ${N}_{L}=4$, i.e., the most accurate representation of the radiance, and (3) omitting the radiance, i.e., ${\psi}_{lm}={\psi}_{lm}^{*}=0$ for all $l$, $m$, thus neglecting the gradient term provided by the adjoint radiance. All other parameters remain the same during the inversions.

## 2.4.

### Radiance Monte Carlo Simulations

Most MC simulators provide only the light fluence, which is the radiance integrated over all directions and time. To satisfy Eq. (12), a radiance MC algorithm (RMC)^{19}^{,}^{31} is used. To obtain the radiance $\varphi (\overrightarrow{r})$, the directional information of the photon traversing a voxel at position $\overrightarrow{r}$ is stored by depositing the photon weight into the relevant spherical harmonics coefficients ${\psi}_{lm}(\overrightarrow{r})$. The RMC code is implemented in the Julia programming language,^{38} which controls and dispatches the execution of kernels on both the CPU (written in Julia), and the GPU (written in NVIDIA’s CUDA language).^{31} Because the definition of the adjoint model does not change the dynamics of photon propagation, the same RMC simulation code provides the adjoint radiance ${\varphi}^{*}(\overrightarrow{r})$.

## 2.5.

### In Silico Phantom

An MC model has been used to calculate the multiwavelength 3-D PA images that represented measured data and are referred to as reference images or reference data throughout this paper. The domain of the model is divided into subvolumes (SVs) that represented simplified anatomical structures, such as a subcutaneous tumor and a number of discrete blood vessels. The depths of the structures are similar to those observed in *in vivo* images acquired using a Fabry–Perot scanner with a planar detection geometry.^{39}^{–}^{42} Absorption is assumed to originate from three chromophores, i.e., oxyhemoglobin ${(\mathrm{HbO}}_{2})$, deoxyhemoglobin (HHb), and methylene blue (Mb), as an exogenous contrast agent. It should be noted that the method can potentially be applied to any number of chromophores. The computational model of the phantom consists of nine different SVs, each with homogeneous optical properties (Fig. 1). This includes six tube-like structures to mimic blood vessels, a tumor consisting of an ellipsoidal rim and core SV, and the background. The tubes are positioned adjacent to the tumor at depths of 1.5 and 7.5 mm, have a circular cross section with a radius of 0.4 mm, and are filled with HHb and ${\mathrm{HbO}}_{2}$. The blood oxygen saturation (${\mathrm{sO}}_{2}$) is defined as the ratio of the concentration of oxyhemoglobin and the total hemoglobin concentration

## Eq. (15)

$${\mathrm{sO}}_{2}=\frac{{c}_{{\mathrm{HbO}}_{2}}}{{c}_{{\mathrm{HbO}}_{2}}+{c}_{\mathrm{HHb}}}.$$The tube-like structures are assumed to contain blood ${\mathrm{sO}}_{2}$ ranging from 75% to 98% to represent the typical values found in veins and arteries.^{43} The total hemoglobin concentration is 2.3 mM.^{44} Two concentric ellipsoids represent the tumor at a depth ranging from 3.0 mm to 6.0 mm. The outer (inner) ellipsoid’s axes are $a=4.5(2.5)\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$, $b=3.5(2.0)\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$, and $c=2.0(0.8)\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$, respectively. The tumor SVs contained 20% blood (0.46 mM).^{45} The tumor shell has an ${\mathrm{sO}}_{2}$ of 80%, whereas that of the core is 40% to mimic necrotic tissue. The tumor also contains an exogenous contrast agent, Mb, at a concentration of $10\text{\hspace{0.17em}\hspace{0.17em}}\mu \mathrm{M}$. The background SV contains a blood volume fraction of 1.5% with an ${\mathrm{sO}}_{2}$ of 60%. Other parameters, such as the scattering anisotropy ($g=0.9$), the refractive index (${n}_{i}=1.33$ inside the domain, ${n}_{e}=1.5$ outside of the domain), and ${\mu}_{s}(\overrightarrow{r},\lambda )$ are held constant and uniform across the domain. The absorption spectra of HHb, ${\mathrm{HbO}}_{2}$, and Mb are shown in Fig. 2. The wavelength dependence of the reduced scattering coefficient ${\mu}_{s}^{\prime}(\lambda )={\mu}_{s}(\lambda )(1-g)$ is approximated using ${\mu}_{s}(\lambda )=6.65\xb7{10}^{3}\xb7{\lambda}_{+}^{-1.317}\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{-1}$ with ${\lambda}_{+}=\lambda /\mathrm{nm}$, which resulted in a ${\mu}_{s}^{\prime}$ of $\sim 1\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{-1}$ at $\lambda =800\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{nm}$. The Grüneisen parameter of water is set to 0.124. The coefficient ${\beta}_{k}$ describing the total hemoglobin concentration dependence of the Grüneisen parameter is set to ${\beta}_{\mathrm{Hb},\mathrm{HbO}}=0.02146\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{L}/\mathrm{mmol}$.^{35} It is assumed that the Mb concentration distribution does not change the Grüneisen parameter (${\beta}_{\mathrm{Mb}}=0$). The remaining parts of the volume are assumed to be filled with materials, such as water and lipids, whose absorption is considered negligible at the selected excitation wavelengths.

## 2.6.

### Monte Carlo Simulation Parameters

To obtain reference images, i.e., data sets that represent measured multiwavelength PA images, the domain was discretized into $200\times 200\times 100$ (i.e., $4\times {10}^{6}$) isotropic voxels of size ${V}_{\mathrm{vox}}={10}^{-3}\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{3}$ yielding a total volume of $20\times 20\times 10\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{mm}}^{3}$. The source profile was a 2-D Gaussian function with a width of $\sigma =4\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$. About $2\xb7{10}^{9}\text{\hspace{0.17em}\hspace{0.17em}}\text{photon}$ packets were used in the MC simulation of the light fluence. The angle-dependent radiance was not calculated. Reference image data sets were calculated for three different excitation wavelengths that coincided with the absorption peaks of Mb (664 nm), HHb (758 nm), and the isosbestic point of hemoglobin absorption (798 nm). Gaussian noise ($\sigma =0.1\%$ of the maximum image intensity) was added to the reference data, resulting in negative image intensities in regions of low ${p}_{0}$.

The domain discretization used during the inversion was identical to that used for the reference data set, which may raise the question of whether this constitutes a so-called inverse crime. In MC models, the discretization is used merely as a basis for sampling physical quantities while photon packets can propagate freely in continuous space. This is in contrast to other methods, such as finite elements, where the discretization has a direct impact on the accuracy of the solution. Taking also into account the stochastic nature of MC models, it can be concluded that using identical discretizations does not constitute an inverse crime.

During an inversion, $1\times {10}^{7}\text{\hspace{0.17em}\hspace{0.17em}}\text{photon}$ packets were used for the calculation of the radiance and fluence; $5\times {10}^{6}\text{\hspace{0.17em}\hspace{0.17em}}\text{photons}$ were used for the calculation of the corresponding adjoint quantities. The typical running time for one inversion iteration, including the adjoint model with ${N}_{L}=4$, was 84 s on a high-end consumer GPU (NVIDIA GeForce Titan X Pascal). This was reduced to 17 s when the radiance term in Eq. (13) was neglected, i.e., ${N}_{L}=0$ and no adjoint MC runs. Since $k=3$ independent chromophore concentrations were associated with each voxel, the model contained a total of 12 million variables.

## 2.7.

### Gradient-Based Optimization

The gradient-based optimization was initialized assuming a homogeneous ${c}_{\mathrm{HHb}}={c}_{\mathrm{HbO}}=0.023\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mM}$, i.e., ${\mathrm{sO}}_{2}=50\%$, and ${c}_{\mathrm{Mb}}=0.0\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mM}$. Owing to the stochastic nature of MC models, the gradients [Eq. (13)] were subjected to noise. To compensate for this, the Adam optimization algorithm^{33} was employed. It was developed for the first-order optimization of noisy objective functions in high-dimensional parameter spaces and can be seen as an extension of the momentum algorithm.^{47}^{,}^{48} The Adam algorithm calculated an exponential moving average of the gradient (first moment) and the squared gradient (second moment) using the decay rates ${\beta}_{1,\text{Adam}}$ and ${\beta}_{2,\text{Adam}}$ with an additional bias-correction step. The final update was calculated by multiplying the step-size parameter ${\gamma}_{\text{Adam}}$ with the first-order moment divided by the square root of the second-order moment. The detailed description can be found in Ref. 33. This algorithm was found to dramatically increase the convergence speed, compared to standard gradient descent. Its efficiency depended on the values of a set of parameters, including the step-size ${\gamma}_{\text{Adam}}$, the decay rates ${\beta}_{1,\text{Adam}}$ and ${\beta}_{2,\text{Adam}}$, and an additional ${\epsilon}_{\text{Adam}}$, which avoided division by zero. The decay rates and $\epsilon $ were set to recommended default values (${\beta}_{1,\text{Adam}}=0.9$, ${\beta}_{2,\text{Adam}}=0.999$, and ${\epsilon}_{\text{Adam}}={10}^{-8}$), and the step size was assigned different values depending on the chromophore according to Eqs. (16) and (17).

To ensure fast convergence for all chromophores, the step size for Mb is set to be significantly smaller than that of $\mathrm{HHb}/{\mathrm{HbO}}_{2}$ since Mb concentrations are in the range of $\mu \mathrm{M}$ while those of $\mathrm{HHb}/{\mathrm{HbO}}_{2}$ are in the range of mM. The chromophore-dependent step size is calculated as follows:

## Eq. (16)

$${\gamma}_{\text{chrom}}={\gamma}_{\mathrm{ref}}\frac{\sum _{\lambda}{\alpha}_{\text{chrom},\lambda}\xb7{c}_{\mathrm{max},\mathrm{ref}}}{\sum _{\lambda}{\alpha}_{\mathrm{ref},\lambda}\xb7{c}_{\mathrm{max},\text{chrom}}},$$*ad hoc*to ${\gamma}_{\mathrm{ref}}={\gamma}_{{\mathrm{HbO}}_{2}}=100$; ${\alpha}_{\text{chrom}/\mathrm{ref},\lambda}$ is the specific absorption coefficient of the respective chromophore at wavelength $\lambda $; and ${c}_{\mathrm{max},\mathrm{ref}/\text{chrom}}$ is the anticipated maximum concentration of the respective chromophore, which is set to physiological reasonable values (${c}_{\mathrm{max},\mathrm{HbO}}={c}_{\mathrm{max},\mathrm{HHb}}=2.3\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mM}$, ${c}_{\mathrm{max},\mathrm{Mb}}=30\text{\hspace{0.17em}\hspace{0.17em}}\mu \mathrm{M}$).

The gradient is also expressed as a function of fluence in Eq. (13), either directly or via $H$ and $\mathrm{\Delta}p$. This would result in slow convergence in regions of low fluence. To compensate for this, a spatially dependent step size is used, which increases the step size by normalizing it by the mean fluence over all wavelengths:

## Eq. (17)

$$\gamma {(\overrightarrow{r})}_{\text{chrom},\text{scaled}}=\frac{{\gamma}_{\text{chrom}}}{{\tilde{\mathrm{\varphi}}}_{\text{norm}}(\overrightarrow{r})+{\epsilon}_{\mathrm{\varphi}}}\mathrm{.}$$## Eq. (18)

$${\tilde{\mathrm{\varphi}}}_{\text{norm}}(\overrightarrow{r})=\sum _{\lambda}\mathrm{\varphi}(\overrightarrow{r},\lambda )/\sum _{\lambda}\mathrm{\varphi}({\overrightarrow{r}}_{\mathrm{max}},\lambda ),$$A single iteration of the gradient-based update consisted of the execution of the MC model and its adjoint counterpart for all three wavelengths. The inversion was run for 1500 iterations to investigate the convergence. After each iteration, the updated concentrations for HHb, ${\mathrm{HbO}}_{2}$, and Mb were limited to a reasonable range of values (also known as projected gradient descent) to avoid spurious overshooting or undershooting that could lead to physiologically unrealistic concentrations. To compensate for the effects of noise in low-absorbing regions (i.e., negative ${p}_{0}$ in reference images), negative chromophore concentrations, and hence negative ${\mu}_{a}$ values, were allowed during the gradient descent. To ensure stability, the range of negative concentration values was limited to a tenth of the maximum positive concentrations ($-0.35$ to 3.5 mM for ${c}_{\mathrm{HHb}}$ and ${c}_{{\mathrm{HbO}}_{2}}$, $-0.02$ to 0.2 mM for ${c}_{\mathrm{Mb}}$). The 3-D blood ${\mathrm{sO}}_{2}$ maps were calculated from the recovered ${c}_{{\mathrm{HbO}}_{2}}$ and ${c}_{\mathrm{HHb}}$ images. The scattering distribution was assumed to be known *a priori*.

To verify that the inversion scheme is valid over a range of physiologically plausible parameter values, multiwavelength reference images were calculated and inverted for different ${\mathrm{sO}}_{2}$ values in the inner tumor region and the background. Two scenarios were chosen, each comprising five different combination of ${\mathrm{sO}}_{2}$ values. First, as the tumor core (being completely enclosed by the rim) may be assumed to be most strongly affected by spectral coloring, its ${\mathrm{sO}}_{2}$ value was varied from 10% to 90% in 20% increments (SV 6), whereas all other parameters remained fixed. Second, the ${\mathrm{sO}}_{2}$ value of the background (SV ID 1) was varied from 10% to 90% in 20% increments.

## 3.

## Results

The accuracy of the recovered chromophore concentration and ${\mathrm{sO}}_{2}$ maps are reported in Secs. 3.1 and 3.2. In Sec. 3.3, the results obtained using multiple inversions of image data sets, in which the chromophore concentrations and their ratios are varied, are reported.

## 3.1.

### Absolute Concentrations

Examples of 3-D volume-rendered images of the absolute concentrations of ${\mathrm{HbO}}_{2}$, HHb, and Mb recovered after 1500 iterations are shown in Figs. 3(a)–3(c). The color scales are thresholded to render the background with its comparatively low chromophore concentrations transparent. To reduce the effect of the added Gaussian noise on the rendered images (particularly in regions of low fluence), a 3-D median filter is applied (non-iterative, edge- and face-connected). The error functional as a function of the number of iterations is shown in Fig. 3(d). The value of the error functional cannot reach zero due to the inherent MC noise of the forward model used during the inversion. The dashed orange line indicates the minimum value of the error functional that is reached with $1\times {10}^{7}\text{\hspace{0.17em}\hspace{0.17em}}\text{photons}$, and the green dotted line indicates the total noise level consisting of MC noise and Gaussian noise added to the reference data. The MC noise is obtained from forward calculations with prior knowledge of the correct chromophore distributions.

In Fig. 4, cross-sectional $xz$-images of the true and recovered concentration of the three chromophores and the absolute error at $y=10\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$ (center plane) are shown. Excellent agreement is found in regions of high fluence [e.g., corresponding to white, light blue, and light red pixels in Figs. 4(c), 4(f), and 4(i)]. By contrast, significant quantification errors [corresponding to yellow and green pixels in Figs. 4(c), 4(f), and 4(i)] are observed in regions of low fluence. The recovered ${c}_{\mathrm{Mb}}$ appears to exhibit larger quantification errors in the background compared to ${c}_{{\mathrm{HbO}}_{2}}$ and ${c}_{\mathrm{HHb}}$.

Table 1 contains the true and recovered concentrations averaged over all voxels of each SV defined in Sec. 2.5. The brackets indicate the standard deviation (concise notation). The recovered ${c}_{\mathrm{HHb}}$ and ${c}_{{\mathrm{HbO}}_{2}}$ in SV 2 to 9 are in excellent agreement with the true values. While the recovered ${c}_{\mathrm{Mb}}$ are generally in good agreement with the true values, SV 2 to 4 and SV 7 to 9 exhibit negative and small concentrations with large standard deviations. The background (SV 1) also exhibits large errors and standard deviations due to the low signal-to-noise ratio (SNR). In the background region closer to the light source [$140\times 140\times 70$ central voxels inside the $200\times 200\times 100$ image domain, illustrated in Fig. 4(a)] where the SNR is greater, the recovered concentrations are in good agreement with the true values (SV 1* in Table 1).

## Table 1

True and recovered (inversion) absolute chromophore concentrations and blood sO2 in the SVs of the phantom. The concentrations represent the average value over all voxels within each SV; the values in brackets indicate the standard deviations (concise notation). SV 1—background, SV 1*—background close to the source, SV 5 and SV 6—outer and inner tumor SVs, respectively, SV 2 to 4 and SV 7 to 9—blood-filled tubes.

HHb (mM) | HbO2 (mM) | Mb (μM) | sO2 (%) | |||||
---|---|---|---|---|---|---|---|---|

SV | True | Inversion | True | Inversion | True | Inversion | True | Inversion |

1 | 0.0138 | 0.0087(1169) | 0.0207 | 0.0320(212) | 0 | 0.2(58) | 60 | 78.3 |

1^{*} | 0.0138 | 0.0138(219) | 0.0207 | 0.0205(384) | 0 | $-0.002(922)$ | 60 | 59.8 |

2 | 0.506 | 0.503(70) | 1.79 | 1.78(11) | 0 | $-0.08(266)$ | 78 | 77.9 |

3 | 0.115 | 0.117(63) | 2.18 | 2.16(11) | 0 | $-0.04(244)$ | 95 | 94.9 |

4 | 0.046 | 0.047(31) | 2.25 | 2.25(6) | 0 | $-0.05(106)$ | 98 | 97.9 |

5 | 0.092 | 0.092(18) | 0.368 | 0.367(31) | 10 | 9.76(99) | 80 | 79.9 |

6 | 0.276 | 0.275(18) | 0.184 | 0.183(29) | 10 | 9.51(100) | 40 | 39.8 |

7 | 0.575 | 0.560(128) | 1.73 | 1.71(20) | 0 | $-0.35(510)$ | 75 | 75.3 |

8 | 0.230 | 0.229(90) | 2.07 | 2.04(14) | 0 | $-0.41(374)$ | 90 | 89.9 |

9 | 0.460 | 0.450(125) | 1.84 | 1.82(20) | 0 | $-0.37(509)$ | 80 | 80.0 |

As described in Sec. 2.4, the inversion is implemented using an approximation of the radiance based on spherical harmonics of varying degree ${N}_{L}$, including the omission of the adjoint term. The inversions are found to converge to the same final values while propagating along different routes. Convergence is reached irrespective of the radiance approximation (see Fig. 5).

## 3.2.

### Blood Oxygen Saturation

Figure 6 shows the cross-sectional $xz$-images ($y=10\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$) of the known and recovered blood ${\mathrm{sO}}_{2}$ together with the absolute error. The accuracy is clearly affected by noise as shown in the difference image in Fig. 6(c). While most voxels in SV 2 to 9 exhibit an error within $\pm 5\%$ ${\mathrm{sO}}_{2}$, it is noticeably larger for objects at greater depth. Concentrations in the background SV 1 are also affected by noise, particularly in regions further away from the source. However, the accuracy improves near the source where SNR is increased. The average values of blood ${\mathrm{sO}}_{2}$ for each SV are also calculated and are summarized in Table 1. Blood ${\mathrm{sO}}_{2}$ in SV 2 to 9, i.e., corresponding to blood-filled tubes and the tumor SVs, are found to lie within 0.3% of the true values (i.e., while ${\mathrm{sO}}_{2}$ errors of individual voxels can be quite large due to noise, averaging over lots of voxels greatly improves accuracy). The average blood ${\mathrm{sO}}_{2}$ for SV 1 (i.e., the entire background SV) is found to show significant errors (18.3% ${\mathrm{sO}}_{2}$) and is attributed mainly to the adverse effects of noise in low fluence regions. By contrast, the inversion results are accurate for the reduced background SV 1* due to higher SNR.

## 3.3.

### Validation over a Range of Blood Oxygen Saturation

The inversion scheme is validated on image data sets where ${\mathrm{sO}}_{2}$ is varied over a range of physiologically plausible parameter values (Sec. 2.7). The inversions are computed without including the gradient term of the radiance and ${N}_{L}=0$ in order to minimize the computation time. To obtain the final ${\mathrm{sO}}_{2}$ value for each image data set, the inversion is run for 1500 iterations after which the average ${\mathrm{sO}}_{2}$ is obtained from the SVs. Figure 7(a) shows the true and recovered ${\mathrm{sO}}_{2}$ values for all SVs and all image data sets together with the line of unity (dashed line) and a $\pm 5\%$ error interval (dotted lines). All recovered ${\mathrm{sO}}_{2}$ values are in good agreement with the known values and exhibit an average error below 0.3% ${\mathrm{sO}}_{2}$. Figure 7(b) shows the difference between true and recovered ${\mathrm{sO}}_{2}$ for all SVs and image data sets sorted by SV. Only the results corresponding to the reduced background SV 1* are shown as this region exhibits sufficient SNR.

## 4.

## Discussion

The 3-D maps of absolute concentrations of ${\mathrm{HbO}}_{2}$, HHb and Mb, and the resulting blood ${\mathrm{sO}}_{2}$ recovered using a gradient-based MC inversion scheme showed excellent agreement with the true values. To achieve the best possible match of noise-affected PA images and the model, the inversion scheme was implemented without a non-negativity constraint for the chromophore concentrations. Even though negative concentrations were physiologically implausible, it was found that incorporating a non-negativity constraint greatly affected the recovered average concentrations and blood ${\mathrm{sO}}_{2}$ values. However, negative concentrations can lead to negative absorption coefficients. In the MC model, it led to the photon packets gaining weight as they traversed a voxel with negative absorption. If too many voxels exhibited negative ${\mu}_{a}$, unstable inversions can be observed as the photon weight diverges. While this was occasionally observed in this study, it was found that a reduction of the step size and an increase in the number of iterations remedied this problem. The inversion scheme described in this paper included an expression of the radiance and its adjoint in a basis of spherical harmonics. The influence of the adjoint formalism and the spherical harmonics approximation on the accuracy and convergence of the inversion was evaluated under the assumption that the scattering coefficient was known *a priori*. It was found that neither accuracy nor convergence speed were affected by the radiance term and its adjoint, i.e., the last term in Eq. (13). This was also observed when the radiance term was omitted (Fig. 5) and was confirmed by the relative magnitudes of the individual gradient terms. The radiance term (irrespective of the spherical harmonic approximations) was always significantly smaller than the remaining terms of Eq. (13). Omitting the computation of the adjoint radiance resulted in a major increase in computational speed. However, from the limited investigation presented here, it can only be concluded that the adjoint term may be neglected if the scattering coefficient was known. If the recovery of the scattering coefficient was of interest, a radiance approximation to a minimum of ${N}_{L}=1\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{deg}$ may be necessary.^{31}

The gradient-based inversion was found to benefit greatly from optimization algorithms in which parameters, such as the step sizes and the exponential decay rates of the Adam algorithm, were predefined to enable a fast and accurate convergence. A potential drawback of such methods was the need to test several sets of these parameters prior to an inversion to assess whether they have a positive impact on the convergence speed. Within the scope of this study, only minor and *ad hoc* parameter tuning was conducted. A more thorough investigation, including the development of automated parameter selection algorithms, may yield significantly faster convergence.

The chromophore-dependent step size and the fluence-dependent spatial step size scaling [Eqs. (16) and (17)] proved to be vital for achieving convergence. Without chromophore-dependent step sizes, Mb concentrations diverged to the upper and lower fit limits. Similarly, the fluence-dependent spatial step size scaling was crucial for achieving fast convergence in the regions of low fluence.

The selection of the excitation wavelengths could also be optimized further to improve inversion accuracy and convergence speed.^{49} However, such a study would exceed the scope of this paper. Despite potentially suboptimal excitation wavelengths, the inversion is shown to recover blood ${\mathrm{sO}}_{2}$ over a wide range [Fig. 7(a)] with high accuracy ($<1\%$ error in ${\mathrm{sO}}_{2}$) across the domain.

Gradient-based methods do not guarantee convergence to a global minimum, especially when the inversion is adversely affected by a noisy gradient. While the Adam optimization algorithm (compared to, for example, standard or momentum gradient descent) has been shown to greatly reduce the likelihood of finishing the inversion in a local minimum or on a saddle point, such a result cannot be ruled out entirely. It should also be noted that the application of this method to measured PA images does not require their segmentation into subregions. While this makes the method generally valid, some form of image segmentation may still be advantageous as it would reduce the number of variables and the risk of convergence to a local minimum and increase convergence speed. Moreover, explicit regularization of the objective function could further improve the accuracy and speed of the convergence.

While the general methodology of a gradient-based inversion using an adjoint formulation of an MC model has been demonstrated *in silico*, the application of this approach to experimental 3-D PA images, especially those acquired *in vivo*, requires further investigation. One of the perhaps most critical points is the recovery of the scattering coefficient as it is likely to have an impact on the importance of the radiance approximation using spherical harmonics. Other issues, such as the choice of inversion parameters and the selection of optimal excitation wavelengths, are also important in the translation of QPAT methods toward applications in the medical and life sciences.

## 5.

## Conclusions

An inversion scheme for recovering absolute chromophore concentrations and their ratios, such as blood ${\mathrm{sO}}_{2}$ from 3-D multiwavelength PA images was developed and validated *in silico*. The scheme was based on an adjoint formulation of an MC light-transport model and allowed an approximation of the radiance using spherical harmonics. It was found that the adjoint radiance was not required to obtain accurate inversion results, provided the scattering coefficient was constant. The speed of convergence was increased by incorporating the Adam optimization algorithm, chromophore-dependent step sizes, and fluence-dependent step-size scaling. This work represented an important step in the development of robust and generally applicable methods for quantitative functional and molecular PA imaging.

## 6.

## Appendix A: Definition of the Adjoint Model

The idea of the adjoint formalism is to define nonphysical quantities, an adjoint source ${q}^{*}(\overrightarrow{r},\lambda )$, adjoint radiance ${\varphi}^{*}(\overrightarrow{r},\overrightarrow{s},\lambda )$, and an adjoint fluence ${\mathrm{\varphi}}^{*}(\overrightarrow{r},\lambda )={\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{r},\overrightarrow{s},\lambda )\mathrm{d}\overrightarrow{s}$ that help in replacing the integral term containing the unknown $\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}$ in the definition of the gradient. In our case the gradient equation is

## Eq. (19)

$$\frac{\partial {\epsilon}_{{\lambda}_{l}}}{\partial {c}_{i}}=-[{p}_{0}^{m}({\overrightarrow{r}}_{i})-{p}_{0}({\overrightarrow{r}}_{i})]{V}_{\mathrm{vox}}[{\beta}_{k}H({\overrightarrow{r}}_{i})+\mathrm{\Gamma}({\overrightarrow{r}}_{i}){\alpha}_{k}\mathrm{\varphi}({\overrightarrow{r}}_{i})]+{\int}_{\mathrm{\Omega}}({p}_{0}^{m}-{p}_{0})\mathrm{\Gamma}(\overrightarrow{r}){\mu}_{a}(\overrightarrow{r})\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}\mathrm{d}\mathrm{\Omega}.$$The adjoint approach has been used in the context of PAT earlier, see e.g., Refs. 1718.–19, 26, and 27.

As a first step, the adjoint source term is defined as

## Eq. (20)

$${q}^{*}(\overrightarrow{r},\lambda )=[{p}_{0}^{m}(\overrightarrow{r},\lambda )-{p}_{0}(\overrightarrow{r},\lambda )]\mathrm{\Gamma}(\overrightarrow{r}){\mu}_{a}(\overrightarrow{r},\lambda ).$$Since the approach is targeted for multispectral QPAT, each wavelength requires its own definition of an adjoint source based on the difference between modeled ${p}_{0}(\overrightarrow{r},\lambda )$ and measured data ${p}_{0}^{m}(\overrightarrow{r},\lambda )$. We only denote one wavelength here and omit the dependence on $\lambda $ for the sake of brevity.

The adjoint source is usually defined as the “prefactor” of the unknown term that is to be replaced and contains the error between modeled and measured data. By defining the behavior of the adjoint radiance ${\varphi}^{*}$ and adjoint fluence ${\mathrm{\varphi}}^{*}$, a relationship between these and the desired $\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}$ can be established, which is outlined in the following derivation. One important aspect of the definition of the adjoint quantity is to leave the equations underlying the development similar to the ones of their physical counterpart, which is in this context the time-independent RTE.

The RTE is given as

## Eq. (21)

$$[\overrightarrow{s}\xb7\nabla +{\mu}_{a}(\overrightarrow{r})+{\mu}_{s}(\overrightarrow{r})]\varphi (\overrightarrow{r},\overrightarrow{s})-{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\varphi (\overrightarrow{r},{\overrightarrow{s}}^{\prime})\mathrm{d}{\overrightarrow{s}}^{\prime}=q(\overrightarrow{r},\overrightarrow{s}).$$Similarly, we define the adjoint RTE (ARTE) as

## Eq. (22)

$$[-\overrightarrow{s}\xb7\nabla +{\mu}_{a}(\overrightarrow{r})+{\mu}_{s}(\overrightarrow{r})]{\varphi}^{*}(\overrightarrow{r},\overrightarrow{s})-{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime}){\varphi}^{*}(\overrightarrow{r},{\overrightarrow{s}}^{\prime})\mathrm{d}{\overrightarrow{s}}^{\prime}={q}^{*}(\overrightarrow{r},\overrightarrow{s}).$$One advantage of defining the adjoint radiance in that way is that the propagation dynamics are identical to that of the normal radiance defined by the RTE since the left-hand side of both the RTE and the ARTE are practically identical, the only difference being the negative sign in the ARTE indicating a change of direction in light propagation. One way to interpret the negative sign is to follow the propagation of photons in the opposite direction, which does not affect the photon’s movement and the final results in terms of energy deposit. Thus, the mechanisms for absorption and scattering remain the same as in the RTE. Because the light transport is dominated by scattering and absorption and not whether photons move in the forward or backward direction, light propagation can be seen as reciprocal and hence the numerical framework implementing the ARTE is unaffected by the additional negative sign. Thus, the same simulation code as for the RTE can be used for the ARTE. Only the difference in the source distributions needs to be taken into account, with the adjoint source being 3-D, whereas normal source distributions are usually 2-D.

It is important to note that the adjoint fluence and adjoint radiance have different physical units as their physical counterparts. The adjoint radiance’s unit is ${\mathrm{J}/(\mathrm{m}}^{3}\text{\hspace{0.17em}}\mathrm{sr})$ and the adjoint fluence’s unit is ${\mathrm{J}/(\mathrm{m}}^{3})$.

## 7.

## Appendix B: Adjoint-Assisted Derivation of the Gradient

Using the above definition of the adjoint source in Eq. (20), the substitution of the unknown term $\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}$

## Eq. (23)

$${\int}_{\mathrm{\Omega}}({p}_{0}^{m}-{p}_{0})\mathrm{\Gamma}(\overrightarrow{r}){\mu}_{a}(\overrightarrow{r})\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}\mathrm{d}\mathrm{\Omega}=-{\alpha}_{k}{[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}{V}_{\mathrm{vox}},$$The derivation follows the ideas presented in previous works, in particular Ref. 27. In Ref. 27, $\frac{\partial \mathrm{RTE}}{\partial {c}_{i}}$ is combined with the ARTE

## Eq. (24)

$${\varphi}^{*}\xb7\frac{\partial \mathrm{RTE}}{\partial {c}_{i}}-\frac{\partial \varphi}{\partial {c}_{i}}\xb7\mathrm{ARTE}.$$The term $\frac{\partial \mathrm{RTE}}{\partial {c}_{i}}$ is

## Eq. (25)

$$[\overrightarrow{s}\xb7\nabla +{\mu}_{a}(\overrightarrow{r})+{\mu}_{s}(\overrightarrow{r})]\frac{\partial \varphi (\overrightarrow{r},\overrightarrow{s})}{\partial {c}_{i}}+\frac{\partial {\mu}_{a}}{\partial {c}_{i}}\varphi (\overrightarrow{r},\overrightarrow{s})-{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\frac{\partial \varphi (\overrightarrow{r},{\overrightarrow{s}}^{\prime})}{\partial {c}_{i}}\mathrm{d}{\overrightarrow{s}}^{\prime}=0,$$## Eq. (26)

$$(\overrightarrow{s}\xb7\nabla +{\mu}_{a}(\overrightarrow{r})+{\mu}_{s}(\overrightarrow{r}))\frac{\partial \varphi (\overrightarrow{r},\overrightarrow{s})}{\partial {c}_{i}}-{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\frac{\partial \varphi (\overrightarrow{r},{\overrightarrow{s}}^{\prime})}{\partial {c}_{i}}\mathrm{d}{\overrightarrow{s}}^{\prime}=-\alpha (\overrightarrow{r})\varphi (\overrightarrow{r},\overrightarrow{s})\delta (\overrightarrow{r}-{\overrightarrow{r}}_{i}).$$The combination of the ARTE and the RTE from Eq. (24) is (for brevity we omit the dependency on $\overrightarrow{r}$ and $\overrightarrow{s}$ for the moment)

## Eq. (27)

$${\varphi}^{*}(\overrightarrow{s}\xb7\nabla +{\mu}_{a}+{\mu}_{s})\frac{\partial \varphi}{\partial {c}_{i}}-{\varphi}^{*}{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\frac{\partial \varphi}{\partial {c}_{i}}\mathrm{d}{\overrightarrow{s}}^{\prime}\phantom{\rule{0ex}{0ex}}-\frac{\partial \varphi}{\partial {c}_{i}}(-\overrightarrow{s}\xb7\nabla +{\mu}_{a}+{\mu}_{s}){\varphi}^{*}+\frac{\partial \varphi}{\partial {c}_{i}}{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime}){\varphi}^{*}\mathrm{d}{\overrightarrow{s}}^{\prime}\phantom{\rule{0ex}{0ex}}=-\alpha {\varphi}^{*}\varphi \delta (\overrightarrow{r}-{\overrightarrow{r}}_{i})-\frac{\partial \varphi}{\partial {c}_{i}}{q}^{*}.$$The left-hand side can be simplified and can be given as

## Eq. (28)

$${\varphi}^{*}(\overrightarrow{s}\xb7\nabla )\frac{\partial \varphi}{\partial {c}_{i}}+\frac{\partial \varphi}{\partial {c}_{i}}(\overrightarrow{s}\xb7\nabla ){\varphi}^{*}-{\varphi}^{*}{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\frac{\partial \varphi}{\partial {c}_{i}}\mathrm{d}{\overrightarrow{s}}^{\prime}+\frac{\partial \varphi}{\partial {c}_{i}}{\mu}_{s}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime}){\varphi}^{*}\mathrm{d}{\overrightarrow{s}}^{\prime}=-\alpha {\varphi}^{*}\varphi \delta (\overrightarrow{r}-{\overrightarrow{r}}_{i})-\frac{\partial \varphi}{\partial {c}_{i}}{q}^{*}.$$The next steps of the derivation are from now on identical to Eqs. (4.11 ff) in Ref. 27. The left-hand side of Eq. (28) equates to zero, which can be seen after integrating first over all angles $s\in {S}^{n-1}$ and over the volume $\mathrm{\Omega}$ with surface $\partial \mathrm{\Omega}$:

## Eq. (29)

$${\int}_{\mathrm{\Omega}}{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s}\xb7\nabla )\frac{\partial \varphi}{\partial {c}_{i}}\mathrm{d}\overrightarrow{s}\mathrm{d}\mathrm{\Omega}+{\int}_{\mathrm{\Omega}}{\int}_{{S}^{2}}\frac{\partial \varphi}{\partial {c}_{i}}(\overrightarrow{s}\xb7\nabla ){\varphi}^{*}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}\phantom{\rule{0ex}{0ex}}-{\int}_{\mathrm{\Omega}}{\mu}_{s}{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s}){\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\frac{\partial \varphi ({\overrightarrow{s}}^{\prime})}{\partial {c}_{i}}\mathrm{d}{\overrightarrow{s}}^{\prime}\text{\hspace{0.17em}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}\phantom{\rule{0ex}{0ex}}+{\int}_{\mathrm{\Omega}}{\mu}_{s}{\int}_{{S}^{2}}\frac{\partial \varphi (\overrightarrow{s})}{\partial {c}_{i}}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime}){\varphi}^{*}({\overrightarrow{s}}^{\prime})\mathrm{d}{\overrightarrow{s}}^{\prime}\text{\hspace{0.17em}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}\phantom{\rule{0ex}{0ex}}=-{\int}_{\mathrm{\Omega}}\alpha \delta (\overrightarrow{r}-{\overrightarrow{r}}_{i}){\int}_{{S}^{2}}{\varphi}^{*}\varphi \mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}-{\int}_{\mathrm{\Omega}}{q}^{*}{\int}_{{S}^{2}}\frac{\partial \varphi}{\partial {c}_{i}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}.$$To see that the left-hand side equates to zero the volume integral with the terms involving $\nabla $ can be transformed into a surface integral using this form of the divergence theorem:

## Eq. (30)

$${\int}_{\mathrm{\Omega}}ab\xb7\nabla c\mathrm{d}\mathrm{\Omega}+{\int}_{\mathrm{\Omega}}cb\xb7\nabla a\mathrm{d}\mathrm{\Omega}={\int}_{\partial \mathrm{\Omega}}b\xb7\widehat{n}ac\mathrm{d}\mathrm{\Omega},$$## Eq. (31)

$$a={\varphi}^{*}(\overrightarrow{s}),\phantom{\rule[-0.0ex]{1em}{0.0ex}}b=\overrightarrow{s}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{and}\phantom{\rule[-0.0ex]{1em}{0.0ex}}c=\frac{\partial \varphi (\overrightarrow{s})}{\partial {c}_{i}}.$$Hence, using the divergence theorem, the first two terms in Eq. (28) are replaced by a single term

By definition, both ${\varphi}^{*}\to 0$ and $\partial \mathrm{\varphi}/\partial {c}_{i}\to 0$ on the boundary of the volume $\partial \mathrm{\Omega}$. Thus, the integrand on the right-hand side and hence the integral equate to zero.

This reduces Eq. (28) to

## Eq. (32)

$$-{\int}_{\mathrm{\Omega}}{\mu}_{s}{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s}){\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime})\frac{\partial \varphi ({\overrightarrow{s}}^{\prime})}{\partial {c}_{i}}\mathrm{d}{\overrightarrow{s}}^{\prime}\text{\hspace{0.17em}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}\phantom{\rule{0ex}{0ex}}+{\int}_{\mathrm{\Omega}}{\mu}_{s}{\int}_{{S}^{2}}\frac{\partial \varphi (\overrightarrow{s})}{\partial {c}_{i}}{\int}_{{S}^{2}}\mathrm{\Theta}(\overrightarrow{s},{\overrightarrow{s}}^{\prime}){\varphi}^{*}({\overrightarrow{s}}^{\prime})\mathrm{d}{\overrightarrow{s}}^{\prime}\text{\hspace{0.17em}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}\phantom{\rule{0ex}{0ex}}=-{\int}_{\mathrm{\Omega}}\alpha \delta (\overrightarrow{r}-{\overrightarrow{r}}_{i}){\int}_{{S}^{2}}{\varphi}^{*}\varphi \mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}-{\int}_{\mathrm{\Omega}}{q}^{*}{\int}_{{S}^{2}}\frac{\partial \varphi}{\partial {c}_{i}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}.$$Because we can assume that all functions are integrable, the terms on the left-hand side of Eq. (32) can be rearranged after changing the order of integration, yielding

Hence, the left-hand side of Eq. (32) equates to zero, which leaves us with

## Eq. (33)

$${\int}_{\mathrm{\Omega}}{q}^{*}{\int}_{{S}^{2}}\frac{\partial \varphi}{\partial {c}_{i}}\mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}=-{\int}_{\mathrm{\Omega}}\alpha \delta (\overrightarrow{r}-{\overrightarrow{r}}_{i}){\int}_{{S}^{2}}{\varphi}^{*}\varphi \mathrm{d}\overrightarrow{s}\text{\hspace{0.17em}}\mathrm{d}\mathrm{\Omega}.$$The fluence $\mathrm{\varphi}$ is by definition the integral of the time-integrated radiance $\varphi (\overrightarrow{s})$ over all directions $\overrightarrow{s}$, that is,

## Eq. (34)

$$\mathrm{\varphi}(\overrightarrow{r})={\int}_{{S}^{2}}\varphi (\overrightarrow{r},\overrightarrow{s})\mathrm{d}\overrightarrow{s}.$$Applying the derivative with respect to ${c}_{i}$ yields

## Eq. (35)

$$\frac{\partial \mathrm{\varphi}(\overrightarrow{r})}{\partial {c}_{i}}={\int}_{{S}^{2}}\frac{\partial \varphi (\overrightarrow{r},\overrightarrow{s})}{\partial {c}_{i}}\mathrm{d}\overrightarrow{s},$$## Eq. (36)

$${\int}_{\mathrm{\Omega}}{q}^{*}\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}\mathrm{d}\mathrm{\Omega}=-\alpha ({\overrightarrow{r}}_{i}){[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}{V}_{\mathrm{vox}}.$$Because we have defined the adjoint source term as ${q}^{*}=({p}_{0}^{m}-{p}_{0})\mathrm{\Gamma}{\mu}_{a}$ the left-hand side of Eq. (36) is exactly the last term in Eq. (19)

## Eq. (37)

$${\int}_{\mathrm{\Omega}}({p}_{0}^{m}-{p}_{0})\mathrm{\Gamma}{\mu}_{a}\frac{\partial \mathrm{\varphi}}{\partial {c}_{i}}\mathrm{d}\mathrm{\Omega}=-\alpha ({\overrightarrow{r}}_{i}){[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}{V}_{\mathrm{vox}}.$$Inserting this result into the error-gradient Eq. (19) provides us with the subgradient term obtained using one wavelength

## Eq. (38)

$$\frac{\partial {\epsilon}_{{\lambda}_{l}}}{\partial {c}_{i}}=-[{p}_{0}^{m}({\overrightarrow{r}}_{i})-{p}_{0}({\overrightarrow{r}}_{i})]\alpha ({\lambda}_{l})\mathrm{\varphi}({\overrightarrow{r}}_{i}){V}_{\mathrm{vox}}+\alpha ({\overrightarrow{r}}_{i}){[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}{V}_{\mathrm{vox}}.$$In summary, the adjoint formalism enables the update of the concentration distribution using only terms obtained from running the forward model implemented by the RMC algorithm. The following section will focus on the question as to how the term ${\varphi}^{*}\varphi $ can be computed and approximated.

## 8.

## Appendix C: Radiance Approximations

To simulate the radiance, some discretization over angle is required. One option is to use a set of piecewise constant basis functions. However, due to the importance of ballistic and quasi-ballistic light propagation in PAT, a high number of discretization orders would be required to capture the directionality of the radiance in regions near the source. This would result in very large memory demands in finite-element implementations (see Refs. 29, 50, and 51, for more details on different approaches for estimating the radiance, their limitations, and suggested solutions. For a summary thereof, see Ref. 52). Inspired by the ${P}_{n}$ approximation^{22} and continuing the work presented by Refs. 19 and 52 we approximate the radiance in 3-D using spherical harmonics as basic functions, as in Ref. 31. Instead of discretizing the angular domain into segments, the radiance field $\varphi $ at any position $\overrightarrow{r}$ can be expanded using a series of spherical harmonics^{53}^{,}^{54}

## Eq. (39)

$$\varphi (\overrightarrow{r},\overrightarrow{s})=\sum _{l=0}^{{N}_{L}=\infty}\sum _{m=-l}^{l}{\psi}_{lm}(\overrightarrow{r}){Y}_{lm}(\overrightarrow{r},\overrightarrow{s}),$$## Eq. (40)

$${Y}_{lm}=\{\begin{array}{ll}\sqrt{2}\sqrt{\frac{2l+1}{4\pi}}\sqrt{\frac{(l-|m|)!}{(l+|m|)!}}{P}_{l}^{|m|}\text{\hspace{0.17em}}\mathrm{cos}(\theta )\mathrm{sin}(|m|\varphi )& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}m<0,\\ \sqrt{\frac{2l+1}{4\pi}}{P}_{l}^{m}[\mathrm{cos}(\theta )]& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}m=0,\\ \sqrt{2}\sqrt{\frac{2l+1}{4\pi}}\sqrt{\frac{(l-m)!}{(l+m)!}}{P}_{l}^{m}\text{\hspace{0.17em}}\mathrm{cos}(\theta )\mathrm{cos}(m\varphi )& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}m>0,\end{array}$$The advantage of expressing ${\varphi}^{*}\varphi $ in a spherical harmonics expansion lies in the fact that the ${Y}_{lm}$ forms an orthonormal basis, i.e.,

## Eq. (41)

$${\int}_{{S}^{2}}{Y}_{lm}\xb7{Y}_{{l}^{\prime}{m}^{\prime}}\mathrm{d}\overrightarrow{s}={\delta}_{l{l}^{\prime}}{\delta}_{m{m}^{\prime}}.$$Using this orthonormality condition greatly simplifies the term ${[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}$ from Eq. (38), when the radiance is expressed in spherical harmonics

## Eq. (42)

$${[{\int}_{{S}^{2}}{\varphi}^{*}(\overrightarrow{s})\varphi (\overrightarrow{s})\mathrm{d}\overrightarrow{s}]}_{\overrightarrow{r}={\overrightarrow{r}}_{i}}={\int}_{{S}^{2}}[\sum _{l=0}^{{N}_{L}}\sum _{m=-l}^{l}{\psi}_{lm}^{*}({\overrightarrow{r}}_{i}){Y}_{lm}^{*}({\overrightarrow{r}}_{i},\overrightarrow{s})][\sum _{{l}^{\prime}=0}^{{N}_{L}}\sum _{{m}^{\prime}=-{l}^{\prime}}^{{l}^{\prime}}{\psi}_{{l}^{\prime}{m}^{\prime}}({\overrightarrow{r}}_{i}){Y}_{{l}^{\prime}{m}^{\prime}}({\overrightarrow{r}}_{i},\overrightarrow{s})]\mathrm{d}\overrightarrow{s},$$## Eq. (43)

$$=[\sum _{{l}^{\prime}=0}^{{N}_{L}}\sum _{{m}^{\prime}=-{l}^{\prime}}^{{l}^{\prime}}{\psi}_{{l}^{\prime}{m}^{\prime}}({\overrightarrow{r}}_{i})]\left[\genfrac{}{}{0ex}{}{\underset{\u23df}{{\int}_{{S}^{2}}{Y}_{lm}^{*}({\overrightarrow{r}}_{i},\overrightarrow{s}){Y}_{{l}^{\prime}{m}^{\prime}}({\overrightarrow{r}}_{i},\overrightarrow{s})\mathrm{d}\overrightarrow{s}}}{{\delta}_{l{l}^{\prime}}{\delta}_{m{m}^{\prime}}}\right]$$## Eq. (44)

$$=\sum _{l=0}^{{N}_{L}}\sum _{m=-l}^{l}{\psi}_{lm}({\overrightarrow{r}}_{i}){\psi}_{lm}^{*}({\overrightarrow{r}}_{i}).$$Hence, with this approximation the gradient to update the distribution of chromophore $k$ finally becomes

## Eq. (45)

$$\frac{\partial {\epsilon}_{{\lambda}_{l}}}{\partial {c}_{i}}={V}_{\mathrm{vox}}\{-[{p}_{0}^{m}({\overrightarrow{r}}_{i})-{p}_{0}({\overrightarrow{r}}_{i})][{\beta}_{k}H({\overrightarrow{r}}_{i})+\mathrm{\Gamma}({\overrightarrow{r}}_{i}){\alpha}_{k}\mathrm{\varphi}({\overrightarrow{r}}_{i})]+{\alpha}_{k}\sum _{l=0}^{{N}_{L}}\sum _{m=-l}^{l}{\psi}_{lm}({\overrightarrow{r}}_{i}){\psi}_{lm}^{*}({\overrightarrow{r}}_{i})\}.$$## 9.

## Appendix D: Discretization

To solve the given equations numerically, the bases in which the data and model output are represented must be defined. Assuming a sampling of continuous fields in a point-wise basis ${\mathrm{\Psi}}_{j}(\overrightarrow{r})=\delta (\overrightarrow{r}-{\overrightarrow{r}}_{i})$, as shown in Ref. 18. Hence, the data projected onto this basis becomes a vector of coefficients ${\overrightarrow{p}}_{0}^{m}$

## Eq. (46)

$${h}_{j}=\u27e8{\mathrm{\Psi}}_{j},{p}_{0}^{m}(\overrightarrow{r})\u27e9={p}_{0}^{m}({\overrightarrow{r}}_{j}).$$Equally, all other continuous fields are discretized. Transforming an integral of any integrable function $f(\overrightarrow{r})$ over the continuous domain $\mathrm{\Omega}$ into discretized space introduces a volume element $\mathrm{d}\mathrm{\Omega}={V}_{\mathrm{vox}}$

## Acknowledgments

The authors would like to thank Simon Arridge and Ben Cox for valuable discussions. This project was funded by the Deutsche Forschungsgemeinschaft (DFG project grants LA3273/1-1, PR1226/5-1), the Royal Academy of Engineering (RAEng Fellowship RF1516\15\33), and the Engineering and Physical Sciences Research Council (EPSRC grant EP/N032055/1). We acknowledge financial support within the funding programme Open Access Publishing by the German Research Foundation (DFG).