Three-dimensional quantitative photoacoustic tomography using an adjoint radiance Monte Carlo model and gradient descent

Abstract. Quantitative photoacoustic tomography aims to recover maps of the local concentrations of tissue chromophores from multispectral images. While model-based inversion schemes are promising approaches, major challenges to their practical implementation include the unknown fluence distribution and the scale of the inverse problem. We describe an inversion scheme based on a radiance Monte Carlo model and an adjoint-assisted gradient optimization that incorporates fluence-dependent step sizes and adaptive moment estimation. The inversion is shown to recover absolute chromophore concentrations, blood oxygen saturation, and the Grüneisen parameter from in silico three-dimensional phantom images for different radiance approximations. The scattering coefficient is assumed to be homogeneous and known a priori.


Introduction
Biomedical photoacoustic (PA) tomography is a hybrid soft-tissue imaging modality that combines the high spatial resolution of ultrasound with the high contrast and specificity of optical imaging techniques. [1][2][3] It relies on the generation of acoustic waves inside the tissue, which result from the absorption of intensity-modulated light, such as laser pulses or frequency chirps, by the tissue chromophores. From time-resolved PA signals recorded at multiple measurement points around the object, three-dimensional (3-D) data sets of the initial pressure distribution (i.e., PA images) can be calculated using acoustic reconstruction algorithms. Quantitative PA tomography (QPAT) aims to exploit the wavelength dependence of the image intensity to recover the local concentrations of endogenous tissue chromophores and exogenous contrast agents from which functional parameters, such as blood oxygen saturation, can be derived. To relate the PA image intensity to local chromophore concentrations, computational models of the physical processes during the image generation in conjunction with inversion schemes represent one approach to QPAT. 4,5 A major challenge in QPAT is the unknown light fluence in the tissue, [5][6][7] which is a nonlinear function of the concentrations and the scattering coefficient. Its effects on PA images have been described as spectral coloring and structural distortion. 5 For an accurate quantification of concentrations and their ratios (e.g., blood oxygenation), the wavelength-dependent fluence distribution has to be accounted for.
Commonly used fluence correction methods include the application of empirical correction factors 8 or simple models under the assumption of homogeneous optical properties. 9 This is deemed sufficient to recover the absorption coefficient distribution from which maps of the chromophore concentrations can then be calculated using linear matrix inversions. The main limitation of these methods is the reliance on a priori knowledge of the distribution and wavelength dependence of the fluence, i.e., Φðx; λÞ. For in vivo images, this assumption is often invalid and can lead to significant quantification errors especially at greater depth. Alternative approaches include datadriven methods. In the work reported by Tzoumas et al. 10 the wavelength-dependent fluence is written on the basis of eigenspectra obtained using a principle component analysis of in silico training data, and Kirchner et al. 11 calculated the fluence maps by applying deep learning and local context encoding to a large number of training data. While these methods have the potential to offer fast inversions, they require large training sets and may thus lack generality.
Model-based inversions, incorporating light transport models to predict the fluence as a function of the spatial distribution of the absorption and scattering coefficients, remain the most promising approach to QPAT. The initial pressure distribution is obtained by multiplying the fluence with the distribution of the absorption coefficient and the Grüneisen parameter, and PA image data sets can be obtained using acoustic propagation models. The difference between the model output and the measured data, i.e., the objective function, is minimized by iteratively updating the absorption and scattering coefficients during the inversion until convergence is reached. To overcome the nonuniqueness that arises from the use of single-wavelength images, 5 multi-illumination approaches 12,13 or multiwavelength image acquisition, in combination with a priori knowledge of the wavelength dependence of the absorption and scattering coefficients, 14,15 are employed.
A major challenge in high-resolution 3-D QPAT is the large number of variables (>10 6 ). While gradient-free methods can be applied to small-scale problems, 16 inversions of a larger scale (tens of variables and more) quickly become computationally unfeasible. Gradient-based methods have the potential to overcome these limitations. They have been implemented using the adjoint formalism, [17][18][19] which is applied using a finite element model of the diffusion approximation (DA) to the inversion of measured 3-D phantom images. 20 While the DA is valid in the diffusive regime and can be implemented efficiently, 21,22 highresolution PAT can cover depths in the ballistic and quasiballistic regimes, where the DA may not be valid. 23 Methods that aim to solve the radiative transfer equation (RTE) directly are computationally expensive and have only been demonstrated in twodimensional (2-D) images so far. [24][25][26][27] Monte Carlo (MC) models have recently gained attention 28-31 due to their highly parallelized architecture and advances in graphics-processing units and have already been applied to 2-D QPAT inversions 19 and in initial studies with limited parameter space in 3-D. 16,32 In this paper, a method for inverting multiwavelength 3-D images based on an adjoint formulation of a radiance MC model is demonstrated in silico. The challenges that are addressed in this work are (1) the optimization of the objective function using inherently noisy gradients, (2) accounting for the effect of the concentration-dependent Grüneisen parameter, and (3) the representation of the radiance in terms of spherical harmonics. The capability of this approach to recover absolute chromophore concentrations and their ratios, e.g., blood oxygen saturation (blood sO 2 ), from high-resolution 3-D image data sets is demonstrated.

Methods
The forward model of the generation of the initial pressure shown in tomographic PA images is described in Sec. 2.1. The adjoint formalism 26 with which the gradients of the objective function are calculated is described in Sec. 2.2. The approximation of the radiance field as a finite sum of spherical harmonics 31 within an MC light transport model is described in Secs. 2.3 and 2.4. The numerical phantom and the simulation parameters are outlined in Secs. 2.5 and 2.6, respectively. To reduce the impact of the inherent MC noise on the parameter update and to maximize the convergence speed of the gradient descent, an adaptive moment estimation (Adam) optimization algorithm 33 is employed (Sec. 2.7).

Forward Model
Assuming that the effects of the limited detection aperture and acoustic propagation can be neglected, the image intensity represents the initial pressure distribution, p 0 , which is given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 6 3 ; 1 9 3 p 0 ðr; λÞ ¼ ΓðrÞHðrÞ; (1) where Γ is the Grüneisen parameter, which describes the PA efficiency; H is the absorbed optical energy density;r is the spatial coordinate; and λ is the excitation wavelength. The absorbed energy density is defined as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 6 3 ; 1 1 8 HðrÞ ¼ μ a ðr; λÞΦðr; λÞ; where μ a is the absorption coefficient and Φ is the light fluence, which is the radiance ϕ integrated over all angles: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 3 2 6 ; 7 5 2 The absorption coefficient is related to the chromophore concentrations via the specific absorption coefficient, α k ðλÞ, i.e., E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 3 2 6 ; 6 9 6 μ a ðλ;rÞ ¼ X N k k c k ðrÞα k ðλÞ; (4) where N k is the number of chromophores and k indicates the chromophore type. The Grüneisen parameter Γ is assumed to be linearly dependent on chromophore concentrations, 15,34,35 i.e., E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 3 2 6 ; 6 0 2 where β k is an empirical and chromophore-specific coefficient. The MC method is chosen for modeling the light fluence as it provides an accurate approximation of the radiative transport equation for superficial (1 to 2 cm), high-resolution QPAT. 5 This involves the launch of photons (typically represented as packets of energy 36 ) according to a predefined source distribution. Their propagation within the domain is determined by the optical coefficients μ a ðrÞ and μ s ðrÞ, the scattering phase function Θðŝ;ŝ 0 ;rÞ, and the refractive index n. The deposition of energy when a photon traverses a voxel is determined by the absorption coefficient μ a . The angular dependence of scattering events is described by the Henyey-Greenstein phase function. 37

Adjoint-Assisted Optimization
Assuming Gaussian noise on the measured data, an estimate of the chromophore distributions is found by minimizing the objective function ε, which is given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 3 2 6 ; 3 5 9 ε ¼ X N λ l Z Ω 1 2 ½p m 0 ðλ l ;rÞ − p 0 ðλ l ;rÞ 2 dΩ; (6) where Ω is the imaged volume domain, p m 0 ðλ l ;rÞ is the measured PA image at wavelength λ l , p 0 is the PA image obtained from the forward model, and N λ is the number of excitation wavelengths.
To find the chromophore concentration maps, c k ðrÞ, the derivative of ε with respect to c k at any positionr i is required, i.e., ∂ε ∂c i ≡ ∂ε ∂c k ðr i Þ . For the sake of brevity, only one chromophore will be considered in the remaining description, i.e., ∂ ∂c i ≡ ∂ ∂c 1 ðr i Þ . The objective function ε represents the sum of the objective functions, ε λ l , at each excitation wavelength, which is given as ε ¼ P N λ l ε λ l . For simplicity, only one wavelength, λ l , will be considered to describe the derivative: Since p 0 ¼ Γμ a Φ, after applying the chain rule, ∂p 0 ∂c i becomes E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 8 ; 3 2 6 ; 1 0 9 ¼ β k δðr −r i ÞHðrÞ þ ΓðrÞα k ðλ l Þδðr −r i ÞΦðrÞ þ ΓðrÞμ a ðrÞ ∂Φ ∂c i ; (9) where δðr −r i Þ is the Dirac delta function. The gradient of the fluence with respect to chromophore concentration at a particular position, ∂Φ ∂c i , is generally unknown, and one can make use of the adjoint formalism. 17,26 Briefly, the adjoint approach defines a source term q Ã for the adjoint radiance ϕ Ã E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 0 ; 6 3 ; 6 4 4 q Ã ðr; λÞ ¼ ΓðrÞμ a ðr; λÞ½p m 0 ðr; λÞ − p 0 ðr; λÞ; (10) in which the adjoint radiance is expressed in terms of the difference between the measured images and the modeled images of p 0 . Using this adjoint source definition enables the substitution of the term containing the unknown ∂Φ ∂c i in Eq. (8) with a term containing the radiance ϕ and its adjoint counterpart ϕ Ã E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 1 ; 6 3 ; 5 5 6 where ∫ S 2 is the integral over solid angles, and S 2 is the unit sphere. The term V vox is the result of the discretization of the data using a piecewise constant basis to sample (see Sec. 9). The gradient required to update the concentration of one chromophore at one wavelength is therefore given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 2 ; 6 3 ; 4 2 8 The adjoint model and its derivation are described in general and in more detail in Secs. 6 and 7.

Radiance Approximation
As the radiance ϕðsÞ and the adjoint radiance ϕ Ã ðsÞ are functions of solid angle and are defined on the surface of the unit sphere, both quantities can be expressed on the basis of spherical harmonic functions. The expansion of the radiance into spherical harmonics is based on previous work 31 and is inspired by the P n approximations, described in Ref. 22 (similar to a Fourier expansion in 1-D or 2-D) and is outlined in detail in Sec. 8. Using a finite expansion of ½∫ S 2 ϕ Ã ðsÞϕðsÞdsr ¼r i in real spherical harmonics, the gradient equation is given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 3 ; 6 3 ; 2 0 3 where ψ lm ðr i Þ is the radiance field approximated by the spherical harmonics function of degree l, order m at positionr i and ψ Ã lm its adjoint counterpart. Equation (13) is implemented in a gradient-based optimization scheme (described in Sec. 2.7), which updates the concentrations iteratively to minimize the mismatch between measured and modeled data [Eq. (6)].
The last term of the gradient in Eq. (13) contains an expansion of the radiance and the adjoint radiance in spherical harmonics E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 4 ; 3 2 6 ; 7 1 9 α k X N L l¼0 X l m¼−l ψ lm ðr i Þψ Ã lm ðr i Þ: (14) Here, N L ¼ ∞ would give the most accurate radiance approximation, but due to constraints with respect to computation time and memory, the degree of the spherical harmonics, N L , needs to be finite. However, it is not clear a priori up to which value of N L the corresponding coefficients needs to be stored to approximate the radiance with sufficient accuracy. This has been investigated by evaluating the inversion scheme for three different configurations: (1) N L ¼ 0, i.e., using only the fluence, (2) for N L ¼ 4, i.e., the most accurate representation of the radiance, and (3) omitting the radiance, i.e., ψ lm ¼ ψ Ã lm ¼ 0 for all l, m, thus neglecting the gradient term provided by the adjoint radiance. All other parameters remain the same during the inversions.

Radiance Monte Carlo Simulations
Most MC simulators provide only the light fluence, which is the radiance integrated over all directions and time. To satisfy Eq. (12), a radiance MC algorithm (RMC) 19,31 is used. To obtain the radiance ϕðrÞ, the directional information of the photon traversing a voxel at positionr is stored by depositing the photon weight into the relevant spherical harmonics coefficients ψ lm ðrÞ. The RMC code is implemented in the Julia programming language, 38 which controls and dispatches the execution of kernels on both the CPU (written in Julia), and the GPU (written in NVIDIA's CUDA language). 31 Because the definition of the adjoint model does not change the dynamics of photon propagation, the same RMC simulation code provides the adjoint radiance ϕ Ã ðrÞ.

In Silico Phantom
An MC model has been used to calculate the multiwavelength 3-D PA images that represented measured data and are referred to as reference images or reference data throughout this paper. The domain of the model is divided into subvolumes (SVs) that represented simplified anatomical structures, such as a subcutaneous tumor and a number of discrete blood vessels. The depths of the structures are similar to those observed in in vivo images acquired using a Fabry-Perot scanner with a planar detection geometry. [39][40][41][42] Absorption is assumed to originate from three chromophores, i.e., oxyhemoglobin ðHbO 2 Þ, deoxyhemoglobin (HHb), and methylene blue (Mb), as an exogenous contrast agent. It should be noted that the method can potentially be applied to any number of chromophores. The computational model of the phantom consists of nine different SVs, each with homogeneous optical properties (Fig. 1). This includes six tube-like structures to mimic blood vessels, a tumor consisting of an ellipsoidal rim and core SV, and the background. The tubes are positioned adjacent to the tumor at depths of 1.5 and 7.5 mm, have a circular cross section with a radius of 0.4 mm, and are filled with HHb and HbO 2 . The blood oxygen saturation (sO 2 ) is defined as the ratio of the concentration of oxyhemoglobin and the total hemoglobin concentration The tube-like structures are assumed to contain blood sO 2 ranging from 75% to 98% to represent the typical values found in veins and arteries. 43 The total hemoglobin concentration is 2.3 mM. 44 Two concentric ellipsoids represent the tumor at a depth ranging from 3.0 mm to 6.0 mm. The outer (inner) ellipsoid's axes are a ¼ 4.5ð2.5Þ mm, b ¼ 3.5ð2.0Þ mm, and c ¼ 2.0ð0.8Þ mm, respectively. The tumor SVs contained 20% blood (0.46 mM). 45 The tumor shell has an sO 2 of 80%, whereas that of the core is 40% to mimic necrotic tissue. The tumor also contains an exogenous contrast agent, Mb, at a concentration of 10 μM. The background SV contains a blood volume fraction of 1.5% with an sO 2 of 60%. Other parameters, such as the scattering anisotropy (g ¼ 0.9), the refractive index (n i ¼ 1.33 inside the domain, n e ¼ 1.5 outside of the domain), and μ s ðr; λÞ are held constant and uniform across the domain. The absorption spectra of HHb, HbO 2 , and Mb are shown in Fig. 2. The wavelength dependence of the reduced scattering coefficient μ 0 s ðλÞ ¼ μ s ðλÞð1 − gÞ is approximated using The Grüneisen parameter of water is set to 0.124. The coefficient β k describing the total hemoglobin concentration dependence of the Grüneisen parameter is set to β Hb;HbO ¼ 0.02146 L∕mmol. 35 It is assumed that the Mb concentration distribution does not change the Grüneisen parameter (β Mb ¼ 0). The remaining parts of the volume are assumed to be filled with materials, such as water and lipids, whose absorption is considered negligible at the selected excitation wavelengths.

Monte Carlo Simulation Parameters
To obtain reference images, i.e., data sets that represent measured multiwavelength PA images, the domain was discretized into 200 × 200 × 100 (i.e., 4 × 10 6 ) isotropic voxels of size V vox ¼ 10 −3 mm 3 yielding a total volume of 20 × 20 × 10 mm 3 . The source profile was a 2-D Gaussian function with a width of σ ¼ 4 mm. About 2 · 10 9 photon packets were used in the MC simulation of the light fluence. The angle-dependent radiance was not calculated. Reference image data sets were calculated for three different excitation wavelengths that coincided with the absorption peaks of Mb (664 nm), HHb (758 nm), and the isosbestic point of hemoglobin absorption (798 nm). Gaussian noise (σ ¼ 0.1% of the maximum image intensity) was added to the reference data, resulting in negative image intensities in regions of low p 0 .
The domain discretization used during the inversion was identical to that used for the reference data set, which may raise the question of whether this constitutes a so-called inverse crime. In MC models, the discretization is used merely as a basis for sampling physical quantities while photon packets can propagate freely in continuous space. This is in contrast to other methods, such as finite elements, where the discretization has a direct impact on the accuracy of the solution. Taking also into account the stochastic nature of MC models, it can be concluded that using identical discretizations does not constitute an inverse crime.
During an inversion, 1 × 10 7 photon packets were used for the calculation of the radiance and fluence; 5 × 10 6 photons were used for the calculation of the corresponding adjoint quantities. The typical running time for one inversion iteration, including the adjoint model with N L ¼ 4, was 84 s on a high-end consumer GPU (NVIDIA GeForce Titan X Pascal). This was reduced to 17 s when the radiance term in Eq. (13) was neglected, i.e.,  N L ¼ 0 and no adjoint MC runs. Since k ¼ 3 independent chromophore concentrations were associated with each voxel, the model contained a total of 12 million variables.

Gradient-Based Optimization
The gradient-based optimization was initialized assuming a homogeneous c HHb ¼ c HbO ¼ 0.023 mM, i.e., sO 2 ¼ 50%, and c Mb ¼ 0.0 mM. Owing to the stochastic nature of MC models, the gradients [Eq. (13)] were subjected to noise. To compensate for this, the Adam optimization algorithm 33 was employed. It was developed for the first-order optimization of noisy objective functions in high-dimensional parameter spaces and can be seen as an extension of the momentum algorithm. 47,48 The Adam algorithm calculated an exponential moving average of the gradient (first moment) and the squared gradient (second moment) using the decay rates β 1;Adam and β 2;Adam with an additional biascorrection step. The final update was calculated by multiplying the step-size parameter γ Adam with the first-order moment divided by the square root of the second-order moment. The detailed description can be found in Ref. 33. This algorithm was found to dramatically increase the convergence speed, compared to standard gradient descent. Its efficiency depended on the values of a set of parameters, including the step-size γ Adam , the decay rates β 1;Adam and β 2;Adam , and an additional ε Adam , which avoided division by zero. The decay rates and ε were set to recommended default values (β 1;Adam ¼ 0.9, β 2;Adam ¼ 0.999, and ε Adam ¼ 10 −8 ), and the step size was assigned different values depending on the chromophore according to Eqs. (16) and (17).
To ensure fast convergence for all chromophores, the step size for Mb is set to be significantly smaller than that of HHb∕HbO 2 since Mb concentrations are in the range of μM while those of HHb∕HbO 2 are in the range of mM. The chromophore-dependent step size is calculated as follows: where γ ref is the step size of a reference chromophore, the value of which was set ad hoc to γ ref ¼ γ HbO 2 ¼ 100; α chrom∕ref;λ is the specific absorption coefficient of the respective chromophore at wavelength λ; and c max;ref∕chrom is the anticipated maximum concentration of the respective chromophore, which is set to physiological reasonable values (c max; The gradient is also expressed as a function of fluence in Eq. (13), either directly or via H and Δp. This would result in slow convergence in regions of low fluence. To compensate for this, a spatially dependent step size is used, which increases the step size by normalizing it by the mean fluence over all wavelengths: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 7 ; 6 3 ; 1 7 4 γðrÞ chrom;scaled ¼ Note thatΦ norm ðrÞ is the normalized mean fluence: wherer max ¼ arg maxr P λ Φðr; λÞ represents the location where the total fluence is at a maximum. The parameter ε Φ avoids division by zero and determines the maximum change in the step size in regions of low fluence. In this study, ε Φ ¼ 10 −4 leads to a sufficient speed-up in convergence for the deepest tubes. The optimization of the step size with respect to the different chromophores and the local fluence is often referred to as preconditioning.
A single iteration of the gradient-based update consisted of the execution of the MC model and its adjoint counterpart for all three wavelengths. The inversion was run for 1500 iterations to investigate the convergence. After each iteration, the updated concentrations for HHb, HbO 2 , and Mb were limited to a reasonable range of values (also known as projected gradient descent) to avoid spurious overshooting or undershooting that could lead to physiologically unrealistic concentrations. To compensate for the effects of noise in low-absorbing regions (i.e., negative p 0 in reference images), negative chromophore concentrations, and hence negative μ a values, were allowed during the gradient descent. To ensure stability, the range of negative concentration values was limited to a tenth of the maximum positive concentrations (−0.35 to 3.5 mM for c HHb and c HbO 2 , −0.02 to 0.2 mM for c Mb ). The 3-D blood sO 2 maps were calculated from the recovered c HbO 2 and c HHb images. The scattering distribution was assumed to be known a priori.
To verify that the inversion scheme is valid over a range of physiologically plausible parameter values, multiwavelength reference images were calculated and inverted for different sO 2 values in the inner tumor region and the background. Two scenarios were chosen, each comprising five different combination of sO 2 values. First, as the tumor core (being completely enclosed by the rim) may be assumed to be most strongly affected by spectral coloring, its sO 2 value was varied from 10% to 90% in 20% increments (SV 6), whereas all other parameters remained fixed. Second, the sO 2 value of the background (SV ID 1) was varied from 10% to 90% in 20% increments.

Results
The accuracy of the recovered chromophore concentration and sO 2 maps are reported in Secs. 3.1 and 3.2. In Sec. 3.3, the results obtained using multiple inversions of image data sets, in which the chromophore concentrations and their ratios are varied, are reported.

Absolute Concentrations
Examples of 3-D volume-rendered images of the absolute concentrations of HbO 2 , HHb, and Mb recovered after 1500 iterations are shown in Figs. 3(a)-3(c). The color scales are thresholded to render the background with its comparatively low chromophore concentrations transparent. To reduce the effect of the added Gaussian noise on the rendered images (particularly in regions of low fluence), a 3-D median filter is applied (noniterative, edge-and face-connected). The error functional as a function of the number of iterations is shown in Fig. 3(d). The value of the error functional cannot reach zero due to the inherent MC noise of the forward model used during the inversion. The dashed orange line indicates the minimum value of the error functional that is reached with 1 × 10 7 photons, and the green dotted line indicates the total noise level consisting of MC noise and Gaussian noise added to the reference data. The MC noise is obtained from forward calculations with prior knowledge of the correct chromophore distributions.
In Fig. 4, cross-sectional xz-images of the true and recovered concentration of the three chromophores and the absolute error    Table 1 contains the true and recovered concentrations averaged over all voxels of each SV defined in Sec. 2.5. The brackets indicate the standard deviation (concise notation). The recovered c HHb and c HbO 2 in SV 2 to 9 are in excellent agreement with the true values. While the recovered c Mb are generally in good agreement with the true values, SV 2 to 4 and SV 7 to 9 exhibit negative and small concentrations with large standard deviations. The background (SV 1) also exhibits large errors and standard deviations due to the low signal-to-noise ratio (SNR). In the background region closer to the light source [140 × 140 × 70 central voxels inside the 200 × 200 × 100 image domain, illustrated in Fig. 4(a)] where the SNR is greater, the recovered concentrations are in good agreement with the true values (SV 1* in Table 1).
As described in Sec. 2.4, the inversion is implemented using an approximation of the radiance based on spherical harmonics of varying degree N L , including the omission of the adjoint term. The inversions are found to converge to the same final values while propagating along different routes. Convergence is reached irrespective of the radiance approximation (see Fig. 5). Figure 6 shows the cross-sectional xz-images (y ¼ 10 mm) of the known and recovered blood sO 2 together with the absolute error. The accuracy is clearly affected by noise as shown in the difference image in Fig. 6(c). While most voxels in SV 2 to 9 exhibit an error within AE5% sO 2 , it is noticeably larger for objects at greater depth. Concentrations in the background SV 1 are also affected by noise, particularly in regions further away from the source. However, the accuracy improves near the source where SNR is increased. The average values of blood sO 2 for each SV are also calculated and are summarized in Table 1. Blood sO 2 in SV 2 to 9, i.e., corresponding to blood-filled tubes and the tumor SVs, are found to lie within 0.3% of the true values (i.e., while sO 2 errors of individual voxels can be quite large due to noise, averaging over lots of voxels greatly improves accuracy). The average blood sO 2 for SV 1 (i.e., the entire background SV) is found to show significant errors (18.3% sO 2 ) and is attributed mainly to the adverse effects of noise in low fluence regions. By contrast, the inversion results are accurate for the reduced background SV 1* due to higher SNR.

Validation over a Range of Blood Oxygen Saturation
The inversion scheme is validated on image data sets where sO 2 is varied over a range of physiologically plausible parameter values (Sec. 2.7). The inversions are computed without including the gradient term of the radiance and N L ¼ 0 in order to minimize the computation time. To obtain the final sO 2 value for each image data set, the inversion is run for 1500 iterations after which the average sO 2 is obtained from the SVs.  Figure 7(b) shows the difference between true and recovered sO 2 for all SVs and image data sets sorted by SV. Only the results corresponding to the reduced background SV 1* are shown as this region exhibits sufficient SNR. Table 1 True and recovered (inversion) absolute chromophore concentrations and blood sO 2 in the SVs of the phantom. The concentrations represent the average value over all voxels within each SV; the values in brackets indicate the standard deviations (concise notation). SV 1-background, SV 1 * -background close to the source, SV 5 and SV 6-outer and inner tumor SVs, respectively, SV 2 to 4 and SV 7 to 9-blood-filled tubes.

Discussion
The 3-D maps of absolute concentrations of HbO 2 , HHb and Mb, and the resulting blood sO 2 recovered using a gradientbased MC inversion scheme showed excellent agreement with the true values. To achieve the best possible match of noise-affected PA images and the model, the inversion scheme was implemented without a non-negativity constraint for the chromophore concentrations. Even though negative concentrations were physiologically implausible, it was found that incorporating a non-negativity constraint greatly affected the   6 The 2-D cross-sectional yz-images of the true (a) and recovered sO 2 (b), together with the absolute error (c) at the center of the phantom (y ¼ 10 mm). The images on the left and in the center share the same color bar. In the right color bar, voxels exceeding (dropping below) an absolute difference between true and recovered sO 2 of 15% are shown in green (yellow).
Journal of Biomedical Optics 066001-8 June 2019 • Vol. 24 (6) recovered average concentrations and blood sO 2 values. However, negative concentrations can lead to negative absorption coefficients. In the MC model, it led to the photon packets gaining weight as they traversed a voxel with negative absorption. If too many voxels exhibited negative μ a , unstable inversions can be observed as the photon weight diverges. While this was occasionally observed in this study, it was found that a reduction of the step size and an increase in the number of iterations remedied this problem. The inversion scheme described in this paper included an expression of the radiance and its adjoint in a basis of spherical harmonics. The influence of the adjoint formalism and the spherical harmonics approximation on the accuracy and convergence of the inversion was evaluated under the assumption that the scattering coefficient was known a priori. It was found that neither accuracy nor convergence speed were affected by the radiance term and its adjoint, i.e., the last term in Eq. (13). This was also observed when the radiance term was omitted (Fig. 5) and was confirmed by the relative magnitudes of the individual gradient terms. The radiance term (irrespective of the spherical harmonic approximations) was always significantly smaller than the remaining terms of Eq. (13). Omitting the computation of the adjoint radiance resulted in a major increase in computational speed. However, from the limited investigation presented here, it can only be concluded that the adjoint term may be neglected if the scattering coefficient was known. If the recovery of the scattering coefficient was of interest, a radiance approximation to a minimum of N L ¼ 1 deg may be necessary. 31 The gradient-based inversion was found to benefit greatly from optimization algorithms in which parameters, such as the step sizes and the exponential decay rates of the Adam algorithm, were predefined to enable a fast and accurate convergence. A potential drawback of such methods was the need to test several sets of these parameters prior to an inversion to assess whether they have a positive impact on the convergence speed. Within the scope of this study, only minor and ad hoc parameter tuning was conducted. A more thorough investigation, including the development of automated parameter selection algorithms, may yield significantly faster convergence.
The chromophore-dependent step size and the fluence-dependent spatial step size scaling [Eqs. (16) and (17)] proved to be vital for achieving convergence. Without chromophore-dependent step sizes, Mb concentrations diverged to the upper and lower fit limits. Similarly, the fluence-dependent spatial step size scaling was crucial for achieving fast convergence in the regions of low fluence.
The selection of the excitation wavelengths could also be optimized further to improve inversion accuracy and convergence speed. 49 However, such a study would exceed the scope of this paper. Despite potentially suboptimal excitation wavelengths, the inversion is shown to recover blood sO 2 over a wide range [ Fig. 7(a)] with high accuracy (<1% error in sO 2 ) across the domain.
Gradient-based methods do not guarantee convergence to a global minimum, especially when the inversion is adversely affected by a noisy gradient. While the Adam optimization algorithm (compared to, for example, standard or momentum gradient descent) has been shown to greatly reduce the likelihood of finishing the inversion in a local minimum or on a saddle point, such a result cannot be ruled out entirely. It should also be noted that the application of this method to measured PA images does not require their segmentation into subregions. While this makes the method generally valid, some form of image segmentation may still be advantageous as it would reduce the number of variables and the risk of convergence to a local minimum and increase convergence speed. Moreover, explicit regularization of the objective function could further improve the accuracy and speed of the convergence.
While the general methodology of a gradient-based inversion using an adjoint formulation of an MC model has been demonstrated in silico, the application of this approach to experimental 3-D PA images, especially those acquired in vivo, requires further investigation. One of the perhaps most critical points is the recovery of the scattering coefficient as it is likely to have an impact on the importance of the radiance approximation using spherical harmonics. Other issues, such as the choice of inversion parameters and the selection of optimal excitation wavelengths, are also important in the translation of QPAT methods toward applications in the medical and life sciences.

Conclusions
An inversion scheme for recovering absolute chromophore concentrations and their ratios, such as blood sO 2 from 3-D 0 (a) (b) Fig. 7 Validation of the inversion scheme over a range of blood sO 2 . (a) Average recovered sO 2 as a function of the true sO 2 of all SVs and image data sets together with the line of unity (dashed line) and a AE5% error interval (dotted lines). In the five image data sets, the sO 2 in the inner tumor material varies from 10% to 90% in 20% increments. In another five image data sets, the background sO 2 ranges from 10% to 90% (20% increments). (b) Box-and-whisker-plot of the absolute difference between true and recovered sO 2 for each SV.
Journal of Biomedical Optics 066001-9 June 2019 • Vol. 24 (6) multiwavelength PA images was developed and validated in silico. The scheme was based on an adjoint formulation of an MC light-transport model and allowed an approximation of the radiance using spherical harmonics. It was found that the adjoint radiance was not required to obtain accurate inversion results, provided the scattering coefficient was constant. The speed of convergence was increased by incorporating the Adam optimization algorithm, chromophore-dependent step sizes, and fluence-dependent step-size scaling. This work represented an important step in the development of robust and generally applicable methods for quantitative functional and molecular PA imaging.

Appendix A: Definition of the Adjoint Model
The idea of the adjoint formalism is to define nonphysical quantities, an adjoint source q Ã ðr; λÞ, adjoint radiance ϕ Ã ðr;s; λÞ, and an adjoint fluence Φ Ã ðr; λÞ ¼ ∫ S 2 ϕ Ã ðr;s; λÞds that help in replacing the integral term containing the unknown ∂Φ ∂c i in the definition of the gradient. In our case the gradient equation is E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 9 ; 6 3 ; 5 3 7 The adjoint approach has been used in the context of PAT earlier, see e.g., Refs. 17-19, 26, and 27.
As a first step, the adjoint source term is defined as Since the approach is targeted for multispectral QPAT, each wavelength requires its own definition of an adjoint source based on the difference between modeled p 0 ðr; λÞ and measured data p m 0 ðr; λÞ. We only denote one wavelength here and omit the dependence on λ for the sake of brevity.
The adjoint source is usually defined as the "prefactor" of the unknown term that is to be replaced and contains the error between modeled and measured data. By defining the behavior of the adjoint radiance ϕ Ã and adjoint fluence Φ Ã , a relationship between these and the desired ∂Φ ∂c i can be established, which is outlined in the following derivation. One important aspect of the definition of the adjoint quantity is to leave the equations underlying the development similar to the ones of their physical counterpart, which is in this context the time-independent RTE.
The RTE is given as Similarly, we define the adjoint RTE (ARTE) as Θðs;s 0 Þϕ Ã ðr;s 0 Þds 0 ¼ q Ã ðr;sÞ: One advantage of defining the adjoint radiance in that way is that the propagation dynamics are identical to that of the normal radiance defined by the RTE since the left-hand side of both the RTE and the ARTE are practically identical, the only difference being the negative sign in the ARTE indicating a change of direction in light propagation. One way to interpret the negative sign is to follow the propagation of photons in the opposite direction, which does not affect the photon's movement and the final results in terms of energy deposit. Thus, the mechanisms for absorption and scattering remain the same as in the RTE. Because the light transport is dominated by scattering and absorption and not whether photons move in the forward or backward direction, light propagation can be seen as reciprocal and hence the numerical framework implementing the ARTE is unaffected by the additional negative sign. Thus, the same simulation code as for the RTE can be used for the ARTE. Only the difference in the source distributions needs to be taken into account, with the adjoint source being 3-D, whereas normal source distributions are usually 2-D.
It is important to note that the adjoint fluence and adjoint radiance have different physical units as their physical counterparts. The adjoint radiance's unit is J∕ðm 3 srÞ and the adjoint fluence's unit is J∕ðm 3 Þ.
can be derived. The derivation follows the ideas presented in previous works, in particular Ref. 27. In Ref. 27, ∂RTE ∂c i is combined with the ARTE E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 4 ; 3 2 6 ; 3 5 1 The term ∂RTE ∂c i is E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 5 ; 3 2 6 ; 2 9 7 ½s · ∇ þ μ a ðrÞ þ μ s ðrÞ ∂ϕðr;sÞ ∂c i þ ∂μ a ∂c i ϕðr;sÞ Θðs;s 0 Þ ∂ϕðr;s 0 Þ ∂c i ds 0 ¼ 0; as ∂q ∂c i ¼ 0 since the external light source does not depend on chromophore concentrations. The basic idea underlying the following steps is to rearrange all the terms in Eq. (24) so that all terms on the left-hand side can be set to zero after integrating over space and angles. First, we insert ∂μ a ∂c i ¼ αδðr −r i Þ and rearrange Eq.
The combination of the ARTE and the RTE from Eq. (24) is (for brevity we omit the dependency onr ands for the moment) Θðs;s 0 Þϕ Ã ds 0 The left-hand side can be simplified and can be given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 8 ; 6 3 ; 6 5 2 Θðs;s 0 Þϕ Ã ds 0 The next steps of the derivation are from now on identical to Eqs. (4.11 ff) in Ref. 27. The left-hand side of Eq. (28) equates to zero, which can be seen after integrating first over all angles s ∈ S n−1 and over the volume Ω with surface ∂Ω: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 9 ; 6 3 ; 5 0 9 Θðs;s 0 Þϕ Ã ðs 0 Þds 0 ds dΩ To see that the left-hand side equates to zero the volume integral with the terms involving ∇ can be transformed into a surface integral using this form of the divergence theorem: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 3 0 ; 6 3 ; 3 3 1 along with the following substitutions: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 3 1 ; 6 3 ; 2 7 7 a ¼ ϕ Ã ðsÞ; b ¼s and c ¼ ∂ϕðsÞ Hence, using the divergence theorem, the first two terms in Eq. (28) are replaced by a single term E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 7 ; 6 3 ; 2 0 9 ðs ·nÞϕ Ã ðsÞ ∂ϕðsÞ ∂c i ds dΩ: By definition, both ϕ Ã → 0 and ∂Φ∕∂c i → 0 on the boundary of the volume ∂Ω. Thus, the integrand on the right-hand side and hence the integral equate to zero. This reduces Eq. (28) to E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 3 2 ; 3 2 6 ; 7 4 Θðs;s 0 Þϕ Ã ðs 0 Þds 0 ds dΩ Because we can assume that all functions are integrable, the terms on the left-hand side of Eq. (32) can be rearranged after changing the order of integration, yielding The fluence Φ is by definition the integral of the timeintegrated radiance ϕðsÞ over all directionss, that is, Inserting this result into the error-gradient Eq. (19) provides us with the subgradient term obtained using one wavelength In summary, the adjoint formalism enables the update of the concentration distribution using only terms obtained from Journal of Biomedical Optics 066001-11 June 2019 • Vol. 24 (6) running the forward model implemented by the RMC algorithm.
The following section will focus on the question as to how the term ϕ Ã ϕ can be computed and approximated.

Appendix C: Radiance Approximations
To simulate the radiance, some discretization over angle is required. One option is to use a set of piecewise constant basis functions. However, due to the importance of ballistic and quasi-ballistic light propagation in PAT, a high number of discretization orders would be required to capture the directionality of the radiance in regions near the source. This would result in very large memory demands in finite-element implementations (see Refs. 29, 50, and 51, for more details on different approaches for estimating the radiance, their limitations, and suggested solutions. For a summary thereof, see Ref. 52). Inspired by the P n approximation 22 and continuing the work presented by Refs. 19 and 52 we approximate the radiance in 3-D using spherical harmonics as basic functions, as in Ref. 31. Instead of discretizing the angular domain into segments, the radiance field ϕ at any positionr can be expanded using a series of spherical harmonics 53,54 where ψ lm ðrÞ are the coefficients corresponding to the real spherical harmonics Y lm ðr;sÞ, expressed as where l is the degree of the spherical harmonic, m is the order, and P m l is the associated Legendre polynomials. The coefficient ψ lm ðrÞ scales the total weight deposited by all simulated photons at position (voxel)r for the associated spherical harmonic.
The advantage of expressing ϕ Ã ϕ in a spherical harmonics expansion lies in the fact that the Y lm forms an orthonormal basis, i.e.,