Reconstruction-classification method for quantitative photoacoustic tomography

We propose a combined reconstruction-classification method for simultaneously recovering absorption and scattering in turbid media from images of absorbed optical energy. This method exploits knowledge that optical parameters are determined by a limited number of classes to iteratively improve their estimates. Numerical experiments show that the proposed approach allows for accurate recovery of absorption and scattering in two and three dimensions, and delivers superior image quality with respect to traditional reconstruction-only approaches.


Introduction
Photoacoustic tomography (PAT) is an emerging technique for in vivo imaging of soft biological tissue [1]. This hybrid modality uses ultrasound to detect optical contrast, combining the high resolution of acoustic methods with the spectroscopic capability of optical imaging. To generate a PA image, a short laser pulse is shone into the object, the ultrasonic waves emitted following the heating of the tissue are measured, and an image of the absorbed optical energy field is recovered. Whereas purely optical methods suffer from poor spatial resolution, acoustic waves propagate with minimal scattering and PAT can achieve 100 micron resolution at depths of several centimetres. However, PA images provide only qualitative information about the tissue, and are not directly related to tissue morphology and functionality. The principal difficulty is that the PA image is the product of both the optical absorption coefficient (which is directly related to underlying tissue composition) and the light distribution (which is not). This severely restricts the range of applications for which PAT is suitable.
Quantitative photoacoustic tomography (QPAT) aims to provide clinically valuable images of the optical absorption and scattering coefficients, or chromophore (light-absorbing molecule) concentrations, from conventional PA images via an image reconstruction method [2]. A model of light propagation is required to relate the absorbed optical energy to the light fluence and tissue parameters. The primary challenge of QPAT is solving the non-linear imaging problem. In particular, recovering the scattering coefficient is especially difficult due to its weak dependence on the absorbed energy density.
In this paper, we develop a method for solving the image reconstruction problem for QPAT by alternating reconstruction and segmentation steps in an automated iterative process. We introduce a probabilistic model that describes optical properties in terms of a limited number of optically distinct classes, which may correspond to tissues or chromophores. These are identified and characterized by a classification, or segmentation, algorithm. This approach allows for the use of information retrieved by the classification in the reconstruction stage, and vice versa. The aim of the reconstruction is to choose solutions for which the image parameters take values close to a finite set of discrete points. The aim of the classification algorithm is to progressively improve the parametric optical model, and correct for errors in the initial assumptions. Multinomial models have been employed previously in the related fields of Diffuse Optical Tomography [3] and Electrical Impedance Tomography [4]. For QPAT, the main advantage is that this approach enables accurate recovery of both the absorption and scattering coefficients, simultaneously.

Quantitative photoacoustic imaging
A conventional PAT image is proportional to the absorbed optical energy

H(r) = Γ(r) µ_a(r) φ(r; µ_a, µ_s),

where r is a position vector within the domain Ω, µ_a and µ_s are the optical absorption and reduced scattering coefficients, φ is the optical fluence, and Γ is the Grüneisen parameter. The Grüneisen parameter represents the efficiency with which the tissue converts heat into acoustic pressure, and is often taken to be constant, Γ(r) = 1 ∀r ∈ Ω. The fluence depends on the optical parameters and on the illumination pattern over the whole domain. The problem of recovering the optical parameters µ_a, µ_s from a conventional PAT image is known as the quantitative problem. The optical absorption µ_a is of particular interest because it is fundamentally related to underlying tissue physiology and functionality, and encodes clinically useful information such as tissue oxygenation levels and chromophore concentrations. In contrast, the absorbed energy density H depends nontrivially on the optical absorption, and is not directly related to tissue morphology because it is distorted, structurally and spectrally, by the non-uniform light fluence.
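As a toy illustration (not from the paper), the relation H = Γ µ_a φ shows why a PA image is only qualitative: two regions with identical absorption but different fluence produce different image values. All numbers below are illustrative.

```python
import numpy as np

# Toy illustration of why H alone is only qualitative: with
# H(r) = Gamma * mu_a(r) * phi(r), two regions with identical absorption
# but different fluence produce different image values.
Gamma = 1.0                        # Grueneisen parameter, taken constant
mu_a = np.array([0.02, 0.02])      # same absorption in both regions (mm^-1)
phi = np.array([1.0, 0.4])         # deeper region receives less light (a.u.)

H = Gamma * mu_a * phi             # absorbed energy density (a.u.)
contrast = H[0] / H[1]             # apparent contrast despite equal mu_a
```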

The diffusion model of light transport
In order to recover the optical parameters µ_a, µ_s, a model of light propagation within the tissue is required. For highly scattering media, and far from boundaries and sources, a low-order spherical harmonic approximation to the radiative transfer equation is suitable. The diffusion approximation is given by [5]

(µ_a(r) − ∇ · κ(r)∇) φ(r) = q(r), (2)

where q(r) is an isotropic source term and κ = 1/(3µ_s) is the diffusion coefficient. We set Robin boundary conditions

φ(r) + 2Aκ(r) n̂ · ∇φ(r) = 0, r ∈ ∂Ω,

where A accounts for the refractive index mismatch at the boundary and n̂ is the outward normal.
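A minimal numerical sketch of the diffusion approximation, using a 1D finite-difference discretization with one-sided Robin boundary rows. The paper itself uses a FEM discretization via Toast++, so this is only an illustration under assumed homogeneous parameters.

```python
import numpy as np

# 1D finite-difference sketch of the diffusion approximation
#   (mu_a - d/dx kappa d/dx) phi = q,  Robin BC: phi + 2*A*kappa*dphi/dn = 0.
# Illustrative stand-in for the paper's FEM/Toast++ solver; parameters are
# the homogeneous background values used in the numerical experiments.
N, L = 101, 25.0                 # nodes, domain length (mm)
h = L / (N - 1)
mu_a, mu_s = 0.01, 1.0           # absorption, reduced scattering (mm^-1)
kappa = 1.0 / (3.0 * mu_s)       # diffusion coefficient
A = 1.0                          # boundary term (index-matched assumption)

K = np.zeros((N, N))
q = np.zeros(N)
for i in range(1, N - 1):        # interior rows: mu_a*phi - kappa*phi''
    K[i, i - 1] = K[i, i + 1] = -kappa / h**2
    K[i, i] = mu_a + 2.0 * kappa / h**2
# Robin rows (outward normal derivative approximated one-sidedly)
K[0, 0] = 1.0 + 2.0 * A * kappa / h
K[0, 1] = -2.0 * A * kappa / h
K[-1, -1] = 1.0 + 2.0 * A * kappa / h
K[-1, -2] = -2.0 * A * kappa / h
q[1] = 1.0                       # isotropic source just inside the boundary

phi = np.linalg.solve(K, q)      # fluence, positive and decaying with depth
```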

Minimization-based QPAT imaging
In this paper, we adopt a gradient-based minimization approach to image reconstruction. Typically, both µ_a and µ_s are unknown and need to be recovered simultaneously from the absorbed energy density. An objective function is defined, which measures the distance between the conventional PAT image H_m and the data predicted by the model for the current estimates H(µ_a, µ_s):

F(µ_a, µ_s) = ½ ∫_Ω (H_m(r) − H(r; µ_a, µ_s))² dr. (4)
In order to treat the problem for a generic geometry, the Finite Element Method (FEM) is employed, whereby a weak formulation of the diffusion approximation (2) is considered. A discretization of the domain is defined, and the fluence and optical parameters are expressed in terms of piecewise linear basis functions u_i(r): χ ≈ Σ_i χ_i u_i(r) for χ ∈ {µ_a, µ_s, φ}, where χ_i are nodal coefficients and i = 1, ..., N.
We assume that the data d_m is the absorbed energy density H_m projected onto a particular basis Ψ_j,

d_{m,j} = ∫_Ω Ψ_j(r) H_m(r) dr.

Choices for Ψ_j include: 1. Point sampling Ψ_j(r) = δ(r − r_j), 2. Piecewise-linear sampling Ψ_j = u_j, 3. Sinc sampling Ψ_j = sinc(|r − r_j|).
Substituting into the objective function (4) leads to the discrete form of the objective function

F(x) = ½ Σ_j (d_{m,j} − H_j(x))², (7)

where x = (µ_a, µ_s) denotes the vector of nodal coefficients. If a single illumination source is used and both absorption and scattering are undetermined, the problem is ill posed [2]. In this study, the non-uniqueness of the solution was removed by using multiple illumination patterns [6]; the objective function must then be summed over the number of sources. In the following, we omit this sum for ease of notation. Prior information regarding the solution can be included by adding a regularization term:

F(x) = ½ Σ_j (d_{m,j} − H_j(x))² + τ R(x). (8)

In the Bayesian framework, an image is obtained by maximizing the posterior probability of the parameters, given the data:

x̂ = arg max_x p(x|d_m) ∝ p(d_m|x) p(x).

Under this interpretation, the regularization term R is given by the negative log of the prior probability distribution,

R(x) = −log p(x). (10)
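The regularized objective can be sketched as follows. This is an illustrative stand-in, not the paper's code: a random linear operator `J` replaces the FEM forward solver, and `L` is an assumed prior weighting matrix.

```python
import numpy as np

# Sketch of the discretized, regularized objective: F(x) = 0.5*||d_m - H(x)||^2
# + tau * 0.5*||L (x - x0)||^2, where the second term is the negative
# log-prior for a Gaussian prior centred at x0.  J is an illustrative linear
# stand-in for the FEM forward model.
def objective(H_model, d_m, x, x0, L, tau):
    residual = d_m - H_model(x)
    data_term = 0.5 * residual @ residual
    reg_term = 0.5 * np.sum((L @ (x - x0)) ** 2)   # negative log-prior R(x)
    return data_term + tau * reg_term

rng = np.random.default_rng(0)
J = rng.normal(size=(8, 4))
H_model = lambda x: J @ x
x_true = np.array([1.0, 2.0, 0.5, 1.5])
d_m = H_model(x_true)

F_at_truth = objective(H_model, d_m, x_true, x_true, np.eye(4), tau=1e-2)
F_perturbed = objective(H_model, d_m, x_true + 0.1, x_true, np.eye(4), tau=1e-2)
```

The objective vanishes at the true parameters with noise-free data and grows for any perturbation.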

Gradient calculations
Cox et al. [7] have shown that, for the continuous case, the gradient of (4) with respect to µ_a at position r_0 is given by

∂F/∂µ_a(r_0) = [H(r_0) − H_m(r_0)] φ(r_0) + φ(r_0) φ*(r_0),

where φ* is the adjoint light field. In the following, we derive the expression for the gradient in the discrete case. The sampled forward model can be expressed as a vector H = {H_j}, j = 1, ..., N, with

H_j = µ_a^T C_j φ, (12)

where C_j is a sparse matrix with entries

(C_j)_{ik} = ∫_Ω Ψ_j(r) u_i(r) u_k(r) dr,

which are nonzero where the supports of the basis functions Ψ_j(r), u_i(r), u_k(r) overlap. Taking the derivative of (7) with respect to µ_{a,i}, we have

∂F/∂µ_{a,i} = −Σ_j (d_{m,j} − H_j) ∂H_j/∂µ_{a,i}. (13)

Using the expression for the absorbed energy density (12),

∂H_j/∂µ_{a,i} = e_i^T C_j φ + µ_a^T C_j ∂φ/∂µ_{a,i},

where e_i is a vector of zeros with a single 1 in position i. Substituting into (13) gives

∂F/∂µ_{a,i} = −Σ_j (d_{m,j} − H_j) [ e_i^T C_j φ + µ_a^T C_j ∂φ/∂µ_{a,i} ]. (15)

The first term in equation (15) is

−Σ_j (d_{m,j} − H_j) e_i^T C_j φ = −(d_m − H)^T E_i φ,

where E_i is given by a reordering of C_j, (E_i)_{jk} = (C_j)_{ik}. Note that while C_j is symmetric, in general E_i is not.
It remains to determine ∂φ/∂µ_{a,i}. The discrete form of the DA model (2) assumes the form [8]

K(µ_a, µ_s) φ = Q, (18)

where K is the FEM system matrix and Q is the discretized source term. Taking the derivative of equation (18) with respect to the ith coefficient of µ_a,

K ∂φ/∂µ_{a,i} = −M_i φ,

where M_i = ∂K/∂µ_{a,i}, with entries (M_i)_{kl} = ∫_Ω u_i u_k u_l dr, is given by the derivative of the system matrix. We define the adjoint field φ* as the solution to the equation

K^T φ* = Q*, (23)

where Q* = Σ_j (d_{m,j} − H_j) C_j µ_a is the adjoint source. Taking the inner product of φ* with the derivative of (18), and substituting into (15), gives the expression for the derivative with respect to µ_{a,i}:

∂F/∂µ_{a,i} = −(d_m − H)^T E_i φ + φ*^T M_i φ.

The derivative with respect to µ_{s,i} can be derived analogously:

∂F/∂µ_{s,i} = φ*^T D_i φ,

where D_i = ∂K/∂µ_{s,i}, with entries (D_i)_{kl} = (∂κ_i/∂µ_{s,i}) ∫_Ω u_i ∇u_k · ∇u_l dr and ∂κ_i/∂µ_{s,i} = −1/(3µ_{s,i}²). Note that calculation of the gradient only requires two runs of the forward model. The forward problem was solved using the Toast++ software package [8].
Choosing point sampling Ψ_j(r) = δ(r − r_j) gives simply C_j = E_i = I. In this study, we chose piecewise-linear sampling, Ψ_j = u_j.
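The adjoint trick above, where one extra solve with the transposed system replaces N forward solves, can be checked on a tiny surrogate problem. This sketch is illustrative, not the paper's FEM system: a lumped-mass matrix `h*diag(mu)` stands in for the mass matrix, and the data are point-sampled, H = mu * phi. The adjoint gradient is verified against finite differences.

```python
import numpy as np

# Adjoint-gradient sketch on a tiny surrogate of the discrete problem:
# K(mu) phi = Q with K = S + h*diag(mu) (lumped mass), point-sampled data
# H = mu * phi, and F = 0.5*||H_m - H||^2.  One adjoint solve with K^T
# replaces N forward solves, as in the derivation above.
def forward(mu, S, h, Q):
    K = S + h * np.diag(mu)
    phi = np.linalg.solve(K, Q)
    return phi, mu * phi, K

def gradient_adjoint(mu, S, h, Q, H_m):
    phi, H, K = forward(mu, S, h, Q)
    r = H - H_m                               # residual
    phi_adj = np.linalg.solve(K.T, mu * r)    # single adjoint solve
    return r * phi - h * phi * phi_adj        # dF/dmu_i

N, h = 6, 1.0
S = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 1D stiffness-like
Q = np.ones(N)
mu_true = np.full(N, 0.1); mu_true[2] = 0.3
H_m = forward(mu_true, S, h, Q)[1]

mu0 = np.full(N, 0.1)                         # homogeneous initial guess
g = gradient_adjoint(mu0, S, h, Q, H_m)

def F(mu):                                    # objective, for the FD check
    H = forward(mu, S, h, Q)[1]
    return 0.5 * np.sum((H - H_m) ** 2)

eps = 1e-6
g_fd = np.array([(F(mu0 + eps * np.eye(N)[i]) - F(mu0 - eps * np.eye(N)[i]))
                 / (2 * eps) for i in range(N)])
```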

Reconstruction-classification method for QPAT
A reconstruction-classification scheme is devised, which enables the recovery of µ_a and µ_s by approaching the image reconstruction and segmentation problems simultaneously. At each reconstruction step, we minimize a regularized objective function, where the regularization term is given by a mixture model. At each classification step, the result of the previous reconstruction step is employed to update the class parameters of the multinomial model. We alternate between reconstruction and classification steps for a fixed number of iterations.

Mixture model for µ a and µ s
In this section we introduce a probability model for µ_a and µ_s, which encodes prior knowledge about the optical parameters and allows us to bias the solution of the imaging problem accordingly. We assume that an array of labels ζ_i can be determined for each node, such that ζ_{ij} = 1 if the ith node belongs to the jth class and ζ_{ij} = 0 otherwise. The labels constitute hidden variables on which the image parameters are dependent. For each class j = 1, ..., J, a mean vector m_j = (μ̄_{a,j}, μ̄_{s,j}) ∈ R² is defined, and the closeness of the optical parameters to the mean values is described by a covariance matrix Σ_j ∈ R^{2×2}. We assume that if ζ_{ij} = 1, the probability distribution for x_i = (µ_{a,i}, µ_{s,i}) is given by a multivariate Gaussian distribution,

p(x_i | ζ_{ij} = 1, θ_j) = N(x_i; m_j, Σ_j),

where θ_j indicates the set of class parameters (m_j, Σ_j).
The prior probability distribution of the class properties θ_j is given by the conjugate prior to the Gaussian distribution. Prior information about the distribution of the class means or covariances can be encoded by choosing the parameters of the conjugate prior accordingly. Using a non-informative prior for the class means, we have p(m_j) ∝ 1. The conjugate prior distribution for the covariance of a normal distribution is the inverse Wishart distribution,

p(Σ_j) ∝ |Σ_j|^{−(ν_j+d+1)/2} exp(−½ tr(Γ_j Σ_j^{−1})),

where d is the dimension of the domain, ν_j indicates the number of degrees of freedom, and Γ_j is a scaling matrix. If the prior is non-informative, then ν_j = 0 and Γ_j = 0, and the probability distribution of the class parameters becomes

p(θ_j) ∝ |Σ_j|^{−(d+1)/2},

which is known as the Jeffreys prior.
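The inverse Wishart log-density (up to its normalizing constant) and its Jeffreys limit can be sketched as follows; the covariance and hyperparameter values are illustrative placeholders, not the paper's settings.

```python
import numpy as np

# Unnormalized inverse Wishart log-density for a class covariance:
# log p(Sigma) = -(nu+d+1)/2 * log|Sigma| - 0.5*tr(Gamma @ inv(Sigma)) + const.
# With nu = 0 and Gamma = 0 this reduces to the Jeffreys prior
# |Sigma|^(-(d+1)/2).  All parameter values below are illustrative.
def log_inv_wishart(Sigma, nu, Gamma):
    d = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return (-0.5 * (nu + d + 1) * logdet
            - 0.5 * np.trace(Gamma @ np.linalg.inv(Sigma)))

Sigma = np.diag([1e-6, 1e-2])          # tight absorption, looser scattering
jeffreys = log_inv_wishart(Sigma, nu=0, Gamma=np.zeros((2, 2)))
informative = log_inv_wishart(Sigma, nu=10, Gamma=Sigma)
```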
The probability that the set of labels ζ_i = {ζ_{i1}, ..., ζ_{ij}, ..., ζ_{iJ}} is assigned to the ith node is given by a multinomial distribution,

p(ζ_i | λ) = Π_{j=1}^J λ_j^{ζ_{ij}},

where λ_j is the overall probability that a node is assigned to the jth class. Therefore the joint probability for (x_i, ζ_i) is given by the product

p(x_i, ζ_i | θ, λ) = Π_{j=1}^J [λ_j N(x_i; m_j, Σ_j)]^{ζ_{ij}}.

By marginalizing over all possible values of the indicator variables ζ_{ij}, a mixture of Gaussians model for the optical parameters is obtained:

p(x_i | θ, λ) = Σ_{j=1}^J λ_j N(x_i; m_j, Σ_j). (38)

Finally, for independent nodes the prior of the image is given by

p(x | θ, λ) = Π_{i=1}^N p(x_i | θ, λ). (39)
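The mixture prior can be evaluated directly. This sketch uses illustrative class parameters (two assumed classes roughly matching the background and one perturbation of the later numerical experiments), not values from the paper.

```python
import numpy as np

# Sketch of the mixture-of-Gaussians prior over nodal values
# x_i = (mu_a_i, mu_s_i):  p(x_i) = sum_j lambda_j N(x_i; m_j, Sigma_j).
# Class parameters are illustrative placeholders.
def gauss(x, m, S):
    d = x - m
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(S))
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm

means = [np.array([0.01, 1.0]),        # background class
         np.array([0.02, 1.5])]        # perturbation class
covs = [np.diag([1e-6, 1e-2])] * 2
lam = np.array([0.8, 0.2])             # class probabilities lambda_j

def prior_pdf(x):
    return sum(l * gauss(x, m, S) for l, m, S in zip(lam, means, covs))

p_bg = prior_pdf(np.array([0.01, 1.0]))     # at the background mean
p_mid = prior_pdf(np.array([0.015, 1.25]))  # halfway between the classes
```

The prior is peaked at the class means and penalizes values between them, which is exactly the bias the reconstruction step exploits.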

Reconstruction step
The objective function takes the form of equation (8), where at iteration t of the reconstruction-classification algorithm the regularization is given by (equations (10) and (39))

τ R^t(x) = (τ/2) ||L_x (x − m^t)||², (42)

where τ is a regularization parameter and m^t is the image of class means obtained by fixing the labels to the maximum a posteriori estimate, given the results of the previous iteration, which is calculated in the classification step (see section 3.1.2). The weighting matrix L_x is the Cholesky decomposition of Σ_x^{−1}, where Σ_x ∈ R^{2N×2N} is a sparse matrix whose ith 2 × 2 block along the diagonal is Σ_j if the ith node belongs to the jth class. In order to sphere the solution space, that is, to render the space dimensionless, we performed a change of variables µ_a → µ_a/µ_{a,0} and µ_s → µ_s/µ_{s,0}, where (µ_{a,0}, µ_{s,0}) is the initial guess for the optical parameters (in this study, we initialized to the homogeneous background). Given the size of the problem, we chose a gradient-based optimization method in order to reduce memory use and computational expense [9]. The minimization was performed using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method [10], with a storage memory of 6 iterations.
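A single reconstruction step can be sketched as below. This is an illustrative stand-in, not the paper's implementation: a random linear operator replaces the FEM forward model, the class-mean image and covariance are placeholders, and SciPy's L-BFGS-B routine stands in for L-BFGS (with `maxcor=6` mirroring the storage memory of 6 iterations).

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of one reconstruction step: minimize
#   F(x) = 0.5*||J x - d_m||^2 + tau * 0.5*||L (x - m)||^2
# with L-BFGS.  J is an illustrative linear stand-in for the FEM forward
# model, m the class-mean image with labels held fixed, and L the Cholesky
# factor of the inverse class covariance.
rng = np.random.default_rng(1)
J = rng.normal(size=(20, 6))
x_true = np.array([1.0, 2.0, 1.0, 0.5, 1.5, 1.0])
d_m = J @ x_true                       # noise-free synthetic data
m = np.ones(6)                         # class-mean image (fixed labels)
Sigma_inv = 2.0 * np.eye(6)            # inverse class covariance (illustrative)
L = np.linalg.cholesky(Sigma_inv)
tau = 1e-6                             # regularization parameter

def objective_and_grad(x):
    r = J @ x - d_m
    reg = L @ (x - m)
    f = 0.5 * r @ r + tau * 0.5 * reg @ reg
    g = J.T @ r + tau * (L.T @ reg)
    return f, g

x0 = np.ones(6)                        # homogeneous initial guess
res = minimize(objective_and_grad, x0, jac=True, method="L-BFGS-B",
               options={"maxcor": 6})  # storage memory of 6 iterations
```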

Classification
The purpose of the classification step is to update the multinomial model, using the result of the previous reconstruction step. First, the expected values of the labels ζ^{t+1} are computed for the current class parameters (θ^t, λ^t) and image x^t = (µ_a^t, µ_s^t) (E-step). Then the model parameters are updated by maximizing the posterior probability (M-step):

p(θ, λ | x^t) ∝ p(x^t | θ, λ) p(θ, λ). (43)

E-step:
The responsibility r^t_{ij} is a measure of the probability that the ith node is assigned to the jth class. Using Bayes' theorem and the Gaussian mixture model (38), we have

r^t_{ij} = λ^t_j N(x^t_i; m^t_j, Σ^t_j) / Σ_k λ^t_k N(x^t_i; m^t_k, Σ^t_k).

The expectation for the indicator values is

E[ζ_{ij}] = r^t_{ij}.

Therefore the MAP estimate for the labels is

ζ^{t+1}_{ij} = 1 if j = arg max_k r^t_{ik}, and 0 otherwise,

which can be used in equation (42).
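The E-step can be sketched as follows; the class parameters and the three test nodes are illustrative.

```python
import numpy as np

# E-step sketch: responsibilities via Bayes' theorem on the Gaussian mixture,
# then MAP labels by argmax over classes.  Parameters are illustrative.
def gauss(x, m, S):
    d = x - m
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(S))
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm

def e_step(X, means, covs, lam):
    R = np.array([[l * gauss(x, m, S) for l, m, S in zip(lam, means, covs)]
                  for x in X])
    R /= R.sum(axis=1, keepdims=True)   # normalize rows: responsibilities
    return R, R.argmax(axis=1)          # MAP estimate of the labels

X = np.array([[0.010, 1.00], [0.011, 1.05], [0.020, 1.50]])  # (mu_a, mu_s)
means = [np.array([0.01, 1.0]), np.array([0.02, 1.5])]
covs = [np.diag([1e-6, 1e-2])] * 2
lam = np.array([0.5, 0.5])
R, labels = e_step(X, means, covs, lam)
```

The first two nodes are assigned to the background-like class and the third to the perturbation-like class.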

Class means initialization
The number of classes J and the class means m_j were initialized by automatically segmenting the result of the first reconstruction step and averaging over the segmented areas. To segment the image (for example, see figure 1a), we looked at a binned histogram of the µ_a image and chose the value µ_{a,h} for which the number of occurrences was highest (figure 1c, column 1). We found the first node index h at which the value µ_{a,h} occurs, and identified the corresponding scattering value µ_{s,h}. Having chosen a covariance matrix Σ_h, we computed a map of the multivariate normal probability of the (µ_a, µ_s) images, with mean (µ_{a,h}, µ_{s,h}) (figure 1c, column 2). Then we selected a tolerance level tol_h at which to truncate the probability map, and assigned all nodes with probability higher than the tolerance to the same class as node h (figure 1c, column 3). We repeated this process on the remaining nodes until all nodes were classified. Thus the number of classes was set to the number of iterations, and the average of the optical parameters over each class was used to initialize the class means (figure 1b).
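The initialization loop above can be sketched as follows. The toy "image" of 100 nodes, and the choices of Σ_h and tol_h, are illustrative, not the paper's data.

```python
import numpy as np

# Sketch of the class-means initialization: seed a class at a node whose mu_a
# falls in the most-occupied histogram bin, grow the class by thresholding a
# Gaussian probability map, average over the class, and repeat on the
# remaining nodes.  Sigma_h and tol_h are illustrative choices.
def init_class_means(mu_a, mu_s, Sigma_h, tol_h):
    Sinv = np.linalg.inv(Sigma_h)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(Sigma_h))
    remaining = np.arange(mu_a.size)
    means = []
    while remaining.size:
        counts, edges = np.histogram(mu_a[remaining], bins=10)
        idx = np.clip(np.digitize(mu_a[remaining], edges) - 1,
                      0, len(counts) - 1)           # bin index per node
        b = np.bincount(idx, minlength=len(counts)).argmax()  # modal bin
        h = remaining[idx == b][0]                  # first node in modal bin
        d = np.stack([mu_a[remaining] - mu_a[h],
                      mu_s[remaining] - mu_s[h]], axis=1)
        p = np.exp(-0.5 * np.einsum("ij,jk,ik->i", d, Sinv, d)) / norm
        member = p > tol_h                          # nodes joining this class
        means.append((mu_a[remaining[member]].mean(),
                      mu_s[remaining[member]].mean()))
        remaining = remaining[~member]              # repeat on the rest
    return means

# Toy "image": 80 background nodes and 20 perturbation nodes
mu_a = np.concatenate([np.full(80, 0.01), np.full(20, 0.02)])
mu_s = np.concatenate([np.full(80, 1.00), np.full(20, 1.50)])
means = init_class_means(mu_a, mu_s, np.diag([1e-6, 1e-2]), tol_h=1e-5)
```

The largest (background) class is found first, and the number of classes equals the number of loop iterations.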

Visualization of the results
Results obtained using the reconstruction-classification method are displayed alongside scatter plots of the nodal values recovered in the 2D feature space (µ_a, µ_s) (for example, see the final column in figure 4). The positions of the class means m_j = (μ̄_{a,j}, μ̄_{s,j}) are identified by a cross, and the class covariances Σ_j are represented by ellipses. These are colour coded by class, and are indicative of the clustering of image nodal values around the class means.
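The covariance ellipses can be computed from an eigendecomposition of Σ_j; the example matrix and the 2-standard-deviation scaling below are illustrative.

```python
import numpy as np

# Sketch of rendering a class covariance Sigma_j as an ellipse in the
# (mu_a, mu_s) feature space: axes along the eigenvectors of Sigma_j,
# semi-axis lengths proportional to sqrt of the eigenvalues.  n_std is an
# illustrative confidence scaling.
def covariance_ellipse(Sigma, n_std=2.0):
    vals, vecs = np.linalg.eigh(Sigma)            # eigenvalues, ascending
    order = vals.argsort()[::-1]                  # sort descending
    vals, vecs = vals[order], vecs[:, order]
    angle = np.arctan2(vecs[1, 0], vecs[0, 0])    # orientation of major axis
    width, height = 2.0 * n_std * np.sqrt(vals)   # full axis lengths
    return width, height, angle

Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])
w, h, ang = covariance_ellipse(Sigma)
```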

2D validation and reconstruction
We chose a numerical phantom defined on a 2D circular mesh with 1331 nodes and radius 25 mm. Four illumination sources were placed on the boundary at angles 0, π/2, π and 3π/2 rad. In all cases the illumination profile was a normalized Gaussian with radius (distance from the centre at which the profile drops to 1/e) of 6 mm. The background optical parameters were set to µ_a = 0.01 mm⁻¹ and µ_s = 1 mm⁻¹. Two circular perturbations of radius 6 mm were added at positions (6 mm, 10 mm) and (−6 mm, −10 mm) (figure 3a). The values of the perturbations were µ_a = 0.02 mm⁻¹, µ_s = 1.5 mm⁻¹ and µ_a = 0.03 mm⁻¹, µ_s = 1.25 mm⁻¹, respectively. The absorbed energy field was simulated for each illumination and 1% white Gaussian noise was added (figure 3b). The class covariances were all initialized to the same matrix (53), in which the first variable is the absorption and the second the reduced scattering. The parameters of the Jeffreys prior were set to Γ_j = Σ_j ∀j, ν(1) = 1 for the background class, and ν(2, 3) = 10 for the perturbation classes. The number of classes and optical parameters were initialized using the class means initialization method (section 3.2) with tol_h = 10⁻⁵ and Σ_h = Σ_j (53), and the labels were initialized to 1 for the background class and zero for all other classes. The tolerance of the L-BFGS algorithm was set to tol = 10⁻¹¹ and the total number of reconstruction-classification iterations was set to MaxIt = 10 (figure 4). The regularization parameter τ = 10⁻¹⁰ was chosen by inspection. For comparison, images were reconstructed without introducing a prior (figure 5); these images were obtained by minimizing (7) using the L-BFGS method with tol = 10⁻¹².

3D validation and reconstruction
We chose a 3D phantom analogous to the 2D case, defined on a cylinder with 27084 nodes, radius 25 mm and height 25 mm. Two spherical inclusions of radius 6 mm were placed at (6 mm, 10 mm, 0 mm) and (−6 mm, −10 mm, 0 mm) (figure 6a). Illumination sources were Gaussian in the xy-plane and constant along the z-axis, with radius 6 mm and length 25 mm (figures 6b, 6c). PAT images were simulated for 4 illuminations at the cardinal points, and 1% noise was added to the absorbed energy density.

[Figure 1 caption: first column, binned histogram of µ_a, with the modal value indicated by a red cross; second column, probability density function with mean (µ_{a,h}, µ_{s,h}) and covariance Σ_h; third column, labels identifying nodes with probability density higher than the tolerance value tol_h. Each row corresponds to an iteration and a distinct class, so in this case J = 3.]

Summary of findings
We applied the proposed reconstruction-classification algorithm to a 2D numerical phantom with 3 tissues, a background and 2 perturbations (figure 3). The optical absorption was recovered reliably within a small number of iterations, and the scattering was recovered with sufficient accuracy after approximately 10 iterations (figure 4). We compared the optical model with images obtained by the reconstruction-classification method, and by a traditional reconstruction-only (no regularization) method (figure 5). We found that the reconstruction-classification method delivered superior image quality, particularly with regard to the scattering parameter. We applied the reconstruction-classification algorithm to a much larger 3D problem (figure 6) and observed results similar to the 2D case (figure 7).

Choice of parameters
The parametric optical model and classification algorithm introduce a number of parameters which require tuning by the user. In addition to the regularization parameter, the parameters of the Jeffreys prior Γ and ν and the initial guess of the class variances Σ_j must be set before performing the classification. However, their significance is fairly intuitive, and with experience of a certain type of problem the choice of parameters becomes natural. Visualizing the class covariance matrix Σ_j as an ellipse, changing the value of Γ varies its eccentricity, and changing ν varies the length of its axes. Further, given that in the first iteration the optical absorption is recovered with greater accuracy than the scattering, it is preferable to initialize the variance of the former to a smaller value than that of the latter, indicating greater confidence in the imaging solution.

Initialization of the class means
The purpose of the means initialization scheme is to increase the automation of the method, so that minimal user intervention and no prior knowledge of the number of tissues or their optical properties is required. The algorithm simply performs a segmentation of the image, and then takes averages over the segmented areas to initialize the class properties (figure 1). Alternative segmentation techniques could have been employed; however, the advantage of the proposed approach is that it directly exploits the mixture of Gaussians model to identify the tissues. Our choice of a node h with µ_a belonging to the bin with the maximum number of occurrences leads to the background tissue being identified first, followed by the perturbation tissues. The choice of the node index h could have been randomized, so that tissues are identified in random order. This approach is equally valid; however, we found that in cases where tissue values were close together (such as after a single reconstruction-classification iteration) it was preferable to identify the largest classes first, because the mean was estimated with greater accuracy for the classes with a larger number of samples. Further, for a given image and tolerance level, our choice renders the result of the segmentation process unique and reproducible.

Recovery of the scattering
From the comparison with the reconstruction-only case with no regularization (figure 5), it is evident that the introduction of the parametric prior enables better recovery of the scattering. The inconsistency between the quality of the recovered absorption and scattering parameters in the non-regularized case is due to the weaker dependence of the scattering on the absorbed energy density. This results in the scattering gradient being approximately an order of magnitude smaller than the absorption gradient. Although the problem can be mitigated by sphering the solution space, variations in the data due to the scattering often fall below the noise floor. In the reconstruction-classification case, the absorption is typically recovered with good accuracy within a small number of iterations. Thus, the absorption takes values very close to the class means (resulting in small clusters), and the variance along the µ_a direction converges to a small value. Given that the regularization term is weighted by the inverse of the covariance matrix, the dependence of the absorption gradient on the data becomes weaker at each iteration, until its magnitude is comparable to or smaller than that of the scattering. In the iterations that follow, the descent of the data term of the objective function is primarily due to updating the scattering, which converges to the correct values.

Computational demands
Computational performance was found to be strongly dependent on the problem size. In the 2D case with 1331 nodes (figure 4), the total reconstruction time (10 outer reconstruction-classification iterations) using Matlab on a 16-core PC with 128 GiB RAM was only 77 seconds. In the 3D case with 27084 nodes (figure 7), the total reconstruction time increased linearly with the number of nodes, and on the same workstation was approximately 3.7 hours. The increase in computation time was mostly due to much longer processing times for the L-BFGS algorithm in the reconstruction step.

Experimental application
In experimental situations, prior information on tissue properties may be available, such as knowledge of the characteristic optical absorption and scattering spectra of chromophores of interest. These may be obtained from the literature [12], or gained through tissue sample measurements. This information could be used in one of two ways. Firstly, a library of typical chromophores could be used to initialize the class parameters, instead of the proposed class means initialization method. The classification process could then perform the function of correcting for uncertainty, errors or local variations in the real optical properties with respect to the prior information. Alternatively, it could be used to label the chromophores found by the segmentation process, and identify these as particular tissues, such as 'oxygenated blood' or 'fat', on the basis of the closeness of the recovered means to the characteristic properties.

Additional priors
In this study we assumed independence between nodal values; however, the mixture of Gaussians model could be used in conjunction with a spatial prior. Knowledge of smoothness or sparsity properties of the solution could be employed to introduce a homogeneous spatial regularizer such as first-order Tikhonov [13] or Total Variation [6,14]. Knowledge of structural information, such as that provided by an alternative imaging method or anatomical library, could be exploited by introducing a spatially varying probability map for the optical properties.

Conclusions
In this paper, we proposed a novel method for performing image reconstruction in QPAT. We introduced a parametric class model for the optical parameters, and implemented a minimization-based reconstruction algorithm. We suggested an automated method by which to initialize the parameters of the class model, and proposed a classification algorithm by which to progressively update and improve those parameters after each reconstruction step. We demonstrated through 2D and 3D numerical examples that the reconstruction-classification method allows for the simultaneous recovery of optical absorption and scattering. In particular, we found that this approach delivered superior accuracy in the recovery of the scattering with respect to traditional gradient-based reconstruction.