## 1. Introduction

Photoacoustic tomography (PAT) is an emerging technique for *in vivo* imaging of soft biological tissue.^{1} This hybrid modality uses ultrasound to detect optical contrast, combining the high resolution of acoustic methods with the spectroscopic capability of optical imaging. To generate a PA image, a short laser pulse is directed into the object, the ultrasonic waves emitted following the heating of the tissue are measured, and an image of the absorbed optical energy field is recovered. Whereas purely optical methods suffer from poor spatial resolution, acoustic waves propagate with minimal scattering, and PAT can achieve $100\ \mu\mathrm{m}$ resolution at depths of several centimeters. However, PA images provide only qualitative information about the tissue and are not directly related to tissue morphology and functionality. The principal difficulty is that the PA image is the product of both the optical absorption coefficient (which is directly related to the underlying tissue composition) and the light distribution (which is not). This severely restricts the range of applications for which PAT is suitable.

Quantitative photoacoustic tomography (QPAT) aims to provide clinically valuable images of the optical absorption and scattering coefficients, or of the concentrations of chromophores (light-absorbing molecules), from conventional PA images via an image reconstruction method.^{2}^{,}^{3} A model of light propagation is required to relate the absorbed optical energy to the light fluence and the tissue parameters. The primary challenge of QPAT is solving this nonlinear imaging problem. In particular, recovering the scattering coefficient is especially difficult due to the weak dependence of the absorbed energy density on scattering.

In this paper, we develop a method for solving the image reconstruction problem of QPAT by alternating reconstruction and segmentation steps in an automated iterative process. We introduce a probabilistic model that describes the optical properties in terms of a limited number of optically distinct classes, which may correspond to tissues or chromophores. These are identified and characterized by a classification, or segmentation, algorithm. This approach allows information retrieved by the classification to be used in the reconstruction stage and vice versa. The aim of the reconstruction is to choose solutions for which the image parameters take values close to a finite set of discrete points. The aim of the classification algorithm is to progressively improve the parametric optical model and correct for errors in the initial assumptions. Multinomial models have been employed previously in the related fields of diffuse optical tomography^{4} and electrical impedance tomography.^{5} For QPAT, the main advantage is that this approach enables accurate recovery of both the absorption and scattering coefficients simultaneously.

## 2. Numerical Methods

### 2.1. Quantitative Photoacoustic Imaging

A conventional PAT image is proportional to the absorbed optical energy

## (1)

$$H(\mathbf{r})=\hat{\Gamma}(\mathbf{r})\,\mu_a(\mathbf{r})\,\phi[\mu_a(\mathbf{r}),\mu_s'(\mathbf{r})]\quad \mathbf{r}\in\Omega,$$

where $\hat{\Gamma}$ is the Grüneisen parameter describing the efficiency of the conversion of absorbed energy to pressure, $\mu_a$ is the optical absorption coefficient, $\mu_s'$ is the reduced scattering coefficient, and $\phi$ is the light fluence in the tissue domain $\Omega$.

### 2.2. Diffusion Model of Light Transport

In order to recover the optical parameters $(\mu_a,\mu_s')$, a model of light propagation within the tissue is required. For highly scattering media, and at distances far from boundaries and sources, a low-order spherical harmonic approximation to the radiative transfer equation is suitable. The diffusion approximation (DA) is given by^{6}

## (2)

$$[\mu_a(\mathbf{r})-\nabla\cdot\kappa(\mathbf{r})\nabla]\phi(\mathbf{r})=q_0(\mathbf{r})\quad \mathbf{r}\in\Omega,$$

where $\kappa=1/[3(\mu_a+\mu_s')]$ is the diffusion coefficient and $q_0$ is the source term. We set Robin boundary conditions

## (3)

$$\phi(\mathbf{r})+\frac{1}{2A}\kappa(\mathbf{r})\,\hat{n}\cdot\nabla\phi(\mathbf{r})=0\quad \mathbf{r}\in\partial\Omega,$$

where $\hat{n}$ is the outward normal to the boundary $\partial\Omega$ and $A$ accounts for the refractive index mismatch at the tissue surface.

### 2.3. Minimization-Based Quantitative Photoacoustic Tomography Imaging

In this paper, we adopt a gradient-based minimization approach to image reconstruction. Typically, both ${\mu}_{a}$ and ${\mu}_{s}^{\prime}$ are unknown and need to be recovered simultaneously from the absorbed energy density. An objective function is defined, which measures the distance between the conventional PAT image ${H}^{m}$ and the data predicted by the model for the current estimates $H({\mu}_{a},{\mu}_{s}^{\prime})$

## (4)

$$\mathcal{E}=\frac{1}{2}\int_{\Omega}[H^m-H(\mu_a,\mu_s')]^2\,\mathrm{d}\Omega.$$

We assume that the data $\mathbf{d}^m$ are the absorbed energy density $H^m$ projected onto a particular basis $\{\Psi_j\}$,

## (5)

$$\mathbf{d}^m=\{d_j^m,\ j=1,\dots,N\},\qquad d_j^m=\int_{\Omega}H^m(\mathbf{r})\Psi_j(\mathbf{r})\,\mathrm{d}\Omega=\langle\Psi_j,H^m\rangle.$$

Possible choices for the sampling basis include:

1. Point sampling $\Psi_j(\mathbf{r})=\delta(\mathbf{r}-\mathbf{r}_j)$,

2. Piecewise-linear sampling $\Psi_j=u_j$,

3. Sinc sampling $\Psi_j=\mathrm{sinc}(|\mathbf{r}-\mathbf{r}_j|)$.
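As a toy illustration (a hypothetical 1-D grid rather than the paper's 2-D/3-D finite-element meshes), the first two projections can be sketched as:

```python
import numpy as np

r = np.linspace(0.0, 1.0, 101)             # nodal positions r_j (toy grid)
h = r[1] - r[0]                            # grid spacing
H = np.exp(-(r - 0.5) ** 2 / 0.02)         # toy absorbed energy density

# 1. Point sampling: Psi_j = delta(r - r_j), so d_j = H(r_j).
d_point = H.copy()

# 2. Piecewise-linear sampling: Psi_j = u_j (hat function at node j),
#    d_j = <u_j, H>, approximated with the trapezoidal rule.
def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def hat(j):
    # piecewise-linear "hat" centered on node j, support of width 2h
    return np.clip(1.0 - np.abs(r - r[j]) / h, 0.0, None)

d_linear = np.array([trapezoid(H * hat(j), r) for j in range(len(r))])
# On this grid u_j is 1 at node j and 0 at every other node, so the rule
# reduces to d_j = h * H_j at interior nodes (a "mass-lumped" view).
```

This makes the practical difference concrete: point sampling returns nodal values directly, whereas the piecewise-linear projection weights $H$ by the local basis support.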

Substituting into Eq. (4) leads to the discrete form of the objective function

## (6)

$$\mathcal{E}=\frac{1}{2}\sum_j[d_j^m-\langle\Psi_j,H(\mu_a,\mu_s')\rangle]^2=\frac{1}{2}\sum_j[d_j^m-\langle\Psi_j,\mu_a\phi\rangle]^2.$$

If a single illumination source is used and both absorption and scattering are undetermined, the problem is ill posed.^{2} In this study, the nonuniqueness of the solution was removed by using multiple illumination patterns,^{7}^{–}^{9} so the objective function must be summed over the number of sources. In the following, we omit this sum for ease of notation. Prior information regarding the solution can be included by adding a regularization term

## (7)

$$\mathcal{E}=\frac{1}{2}\sum_j(d_j^m-\langle\Psi_j,\mu_a\phi\rangle)^2+\mathcal{R}(\boldsymbol{\mu}_a,\boldsymbol{\mu}_s').$$

In the Bayesian framework, the posterior probability of the optical parameters given the measured data is

## (8)

$$\mathrm{p}(\boldsymbol{\mu}_a,\boldsymbol{\mu}_s'|\mathbf{d}^m)\propto\mathrm{p}(\mathbf{d}^m|\boldsymbol{\mu}_a,\boldsymbol{\mu}_s')\,\mathrm{p}(\boldsymbol{\mu}_a,\boldsymbol{\mu}_s'),$$

and the regularization term corresponds to the negative logarithm of the prior

## (9)

$$\mathcal{R}(\boldsymbol{\mu}_a,\boldsymbol{\mu}_s')=-\log\,\mathrm{p}(\boldsymbol{\mu}_a,\boldsymbol{\mu}_s').$$

### 2.4. Gradient Calculations

Cox et al.^{10} have shown that, for the continuous case, the gradient of Eq. (4) with respect to ${\mu}_{a}$ at position ${\mathit{r}}^{0}$ is given by

## (10)

$$\left.\frac{\partial\mathcal{E}}{\partial\mu_a}\right|_{\mathbf{r}^0}=-\phi(H^m-H)\big|_{\mathbf{r}^0}+\phi\cdot\phi^*\big|_{\mathbf{r}^0},$$

where the adjoint field $\phi^*$ satisfies

## (11)

$$[\mu_a-\nabla\cdot\kappa(\mathbf{r})\nabla]\phi^*(\mathbf{r})=\mu_a(H^m-H).$$

In the following, we derive the expression for the gradient in the discrete case. The sampled forward model can be expressed as a vector $\mathbf{H}=\{H_j,\ j=1,\dots,N\}$

## (12)

$$H_j=\int_{\Omega}H(\mathbf{r})\Psi_j(\mathbf{r})\,\mathrm{d}\Omega=\langle\Psi_j,H\rangle=\sum_{ik}\mu_{a\,i}\phi_k\int_{\Omega}\Psi_j(\mathbf{r})u_i(\mathbf{r})u_k(\mathbf{r})\,\mathrm{d}\Omega=\boldsymbol{\phi}^{\mathrm{T}}\mathbf{C}^j\boldsymbol{\mu}_a,$$

where $\mu_a$ and $\phi$ have been expanded in the nodal basis $\{u_i\}$ and $C^j_{ki}=\int_{\Omega}\Psi_j u_i u_k\,\mathrm{d}\Omega$. The gradient of the discrete objective function [Eq. (6)] with respect to the nodal value $\mu_{a\,i}$ is

## (13)

$$\frac{\partial\mathcal{E}}{\partial\mu_{a\,i}}=-\sum_j\left(\frac{\partial H_j}{\partial\mu_{a\,i}}\right)(d_j^m-H_j).$$

Differentiating Eq. (12) gives

## (14)

$$\frac{\partial H_j}{\partial\mu_{a\,i}}=\mathbf{e}_i^{\mathrm{T}}\mathbf{C}^j\boldsymbol{\phi}+\boldsymbol{\mu}_a^{\mathrm{T}}\mathbf{C}^j\frac{\partial\boldsymbol{\phi}}{\partial\mu_{a\,i}},$$

where $\mathbf{e}_i$ is the $i$'th canonical unit vector, so that

## (15)

$$\frac{\partial\mathcal{E}}{\partial\mu_{a\,i}}=-\sum_j\left(\mathbf{e}_i^{\mathrm{T}}\mathbf{C}^j\boldsymbol{\phi}+\boldsymbol{\mu}_a^{\mathrm{T}}\mathbf{C}^j\frac{\partial\boldsymbol{\phi}}{\partial\mu_{a\,i}}\right)(d_j^m-H_j).$$

The first term can be rewritten as

## (16)

$$\sum_j\mathbf{e}_i^{\mathrm{T}}\mathbf{C}^j\boldsymbol{\phi}\,(d_j^m-H_j)=\sum_{j,k}\phi_k(d_j^m-H_j)\int_{\Omega}\Psi_j(\mathbf{r})u_i(\mathbf{r})u_k(\mathbf{r})\,\mathrm{d}\Omega=\boldsymbol{\phi}^{\mathrm{T}}\mathbf{E}^i(\mathbf{d}^m-\mathbf{H}),$$

where

## (17)

$$E^i_{kj}=\int_{\Omega}\Psi_j(\mathbf{r})u_i(\mathbf{r})u_k(\mathbf{r})\,\mathrm{d}\Omega.$$

It remains to determine $\partial\boldsymbol{\phi}/\partial\mu_{a\,i}$. The discrete form of the DA model [Eq. (2)] assumes the form^{11}

## (18)

$$(\mathbf{M}+\mathbf{K}+\mathbf{F})\boldsymbol{\phi}=\mathbf{Q},$$

where

## (19)

$$M_{jk}=\sum_i\mu_{a\,i}\int_{\Omega}u_iu_ju_k\,\mathrm{d}\Omega,$$

## (20)

$$K_{jk}=\sum_i\kappa_i\int_{\Omega}u_i\,\nabla u_j\cdot\nabla u_k\,\mathrm{d}\Omega,$$

and $\mathbf{F}$ and $\mathbf{Q}$ arise from the boundary condition [Eq. (3)] and the source term, respectively. Differentiating the discrete system with respect to $\mu_{a\,i}$ gives

## (23)

$$(\mathbf{M}+\mathbf{K}+\mathbf{F})\frac{\partial\boldsymbol{\phi}}{\partial\mu_{a\,i}}=-\mathbf{V}_{\mu_a}^i\boldsymbol{\phi},$$

where

## (24)

$$V^i_{\mu_a,jk}=\int_{\Omega}u_iu_ju_k\,\mathrm{d}\Omega.$$

Defining the adjoint field $\boldsymbol{\phi}^*$ as the solution of the system

## (25)

$$(\mathbf{M}+\mathbf{K}+\mathbf{F})\,\boldsymbol{\phi}^*=\mathbf{Q}^*,$$

with adjoint source

## (26)

$$\mathbf{Q}^*=\sum_j\boldsymbol{\mu}_a^{\mathrm{T}}\mathbf{C}^j(d_j^m-H_j),$$

the second term in Eq. (15) becomes

## (27)

$$\sum_j\boldsymbol{\mu}_a^{\mathrm{T}}\mathbf{C}^j\frac{\partial\boldsymbol{\phi}}{\partial\mu_{a\,i}}(d_j^m-H_j)=-\boldsymbol{\phi}^{\mathrm{T}}\mathbf{V}_{\mu_a}^i\boldsymbol{\phi}^*.$$

Combining Eqs. (16) and (27), the gradient with respect to the absorption is

## (28)

$$\frac{\partial\mathcal{E}}{\partial\mu_{a\,i}}=\boldsymbol{\phi}^{\mathrm{T}}[\mathbf{V}_{\mu_a}^i\boldsymbol{\phi}^*-\mathbf{E}^i(\mathbf{d}^m-\mathbf{H})].$$

An analogous derivation gives the gradient with respect to the scattering

## (29)

$$\frac{\partial\mathcal{E}}{\partial\mu_{s\,i}'}=-\frac{\partial\kappa_i}{\partial\mu_{s\,i}'}\,\boldsymbol{\phi}^{\mathrm{T}}\mathbf{V}_{\mu_s'}^i\boldsymbol{\phi}^*,$$

where^{11}

## (30)

$$V^i_{\mu_s',jk}=\int_{\Omega}u_i\,\nabla u_j\cdot\nabla u_k\,\mathrm{d}\Omega.$$

Choosing point sampling $\Psi_j(\mathbf{r})=\delta(\mathbf{r}-\mathbf{r}_j)$ gives simply $\mathbf{C}^j=\mathbf{E}^i=\mathbf{I}$. In this study, we chose piecewise-linear sampling $\Psi_j=u_j$, so that $\mathbf{C}^j=\mathbf{E}^i=\mathbf{V}_{\mu_a}^i$ and

## (31)

$$\frac{\partial\mathcal{E}}{\partial\mu_{a\,i}}=\boldsymbol{\phi}^{\mathrm{T}}\mathbf{V}_{\mu_a}^i[\boldsymbol{\phi}^*-(\mathbf{d}^m-\mathbf{H})].$$
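The adjoint gradient computation can be checked numerically on a small stand-in system. In the sketch below, a random symmetric third-order tensor plays the role of the overlap integrals $\int_\Omega u_iu_ju_k\,\mathrm{d}\Omega$, so the same matrices serve as $\mathbf{C}^j$, $\mathbf{E}^i$, and $\mathbf{V}^i_{\mu_a}$ (as for piecewise-linear sampling), and a diagonal matrix stands in for $\mathbf{K}+\mathbf{F}$; all values are assumed for illustration. One forward and one adjoint solve yield the gradient, which is compared with finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Symmetric tensor standing in for \int u_i u_j u_k dOmega (toy values,
# scaled so the system matrix stays diagonally dominant).
T = 0.1 * rng.random((N, N, N))
T = (T + T.transpose(0, 2, 1) + T.transpose(1, 0, 2) +
     T.transpose(1, 2, 0) + T.transpose(2, 0, 1) + T.transpose(2, 1, 0)) / 6
V = [T[i] for i in range(N)]              # V^i, each symmetric

K = 5.0 * np.eye(N)                       # stands in for K + F
Q = rng.random(N)                         # discretized source

def forward(mu):
    A = sum(m * Vi for m, Vi in zip(mu, V)) + K   # A = M(mu_a) + K + F
    phi = np.linalg.solve(A, Q)                    # forward solve
    H = np.array([phi @ Vj @ mu for Vj in V])      # H_j = phi^T C^j mu_a
    return A, phi, H

mu_true = 0.5 + 0.1 * rng.random(N)
d = forward(mu_true)[2]                            # synthetic data
mu = np.full(N, 0.5)                               # current estimate

A, phi, H = forward(mu)
res = d - H
Q_star = sum(r * (Vj @ mu) for r, Vj in zip(res, V))   # adjoint source
phi_star = np.linalg.solve(A, Q_star)                  # adjoint solve (A symmetric)
grad = (np.array([phi @ Vi @ phi_star for Vi in V])    # phi^T V^i phi*
        - sum(r * (Vj @ phi) for r, Vj in zip(res, V)))  # - phi^T E^i (d - H)

def objective(mu):
    return 0.5 * np.sum((d - forward(mu)[2]) ** 2)

eps = 1e-6
fd = np.array([(objective(mu + eps * np.eye(N)[i])
                - objective(mu - eps * np.eye(N)[i])) / (2 * eps)
               for i in range(N)])
```

The agreement between `grad` and `fd` mirrors the structure of the gradient expression: one forward and one adjoint linear solve replace $N$ forward solves.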

## 3. Reconstruction-Classification Method for Quantitative Photoacoustic Tomography

A reconstruction-classification scheme is devised that enables the recovery of $\mu_a$ and $\mu_s'$ by approaching the image reconstruction and segmentation problems simultaneously. At each reconstruction step, we minimize a regularized objective function, where the regularization term is given by a mixture model. At each classification step, the result of the previous reconstruction step is employed to update the class parameters for the multinomial model. We alternate between reconstruction and classification steps for a fixed number of iterations (Fig. 1).
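The alternating procedure can be sketched at a high level as follows (a deliberately minimal, hypothetical 1-D toy with hard class assignments and means as the only class parameters; it illustrates the control flow of Fig. 1 only, not the regularized reconstruction and EM classification detailed in Secs. 3.1.1 and 3.1.2):

```python
import numpy as np

# Toy alternation: reconstruct() pulls each nodal value toward its data
# and its assigned class mean; classify() reassigns labels and updates
# the class means from the current image.

def reconstruct(d, means, labels, tau=0.2):
    # regularized update: compromise between data and assigned class mean
    return (d + tau * means[labels]) / (1.0 + tau)

def classify(x, means):
    # E-step-like hard assignment, M-step-like mean update
    labels = np.argmin(np.abs(x[:, None] - means[None, :]), axis=1)
    new_means = np.array([x[labels == j].mean() if np.any(labels == j)
                          else means[j] for j in range(len(means))])
    return labels, new_means

d = np.array([0.010, 0.011, 0.009, 0.030, 0.031, 0.029])  # two "tissues"
means = np.array([d.min(), d.max()])                      # crude init
labels = np.zeros(len(d), dtype=int)
for t in range(10):                                       # alternate steps
    x = reconstruct(d, means, labels)
    labels, means = classify(x, means)
```

Even in this toy, the two steps reinforce each other: the class means sharpen the image values, and the sharpened image improves the class means.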

### 3.1. Mixture Model for $\mu_a$ and $\mu_s'$

In this section, we introduce a probability model for ${\mu}_{a}$ and ${\mu}_{s}^{\prime}$, which encodes prior knowledge about the optical parameters and allows us to bias the solution of the imaging problem accordingly. We assume that an array of labels ${\mathit{\zeta}}_{i}$ can be determined for each node, such that

## (32)

$$\zeta_{ij}=\begin{cases}1 & \text{if the $i$'th node is assigned to the $j$'th class},\\ 0 & \text{otherwise}.\end{cases}$$

We assume that if $\zeta_{ij}=1$, the probability distribution for $\mathbf{x}_i=(\mu_{a\,i},\mu_{s\,i}')$ is given by a multivariate Gaussian distribution

## (33)

$$\mathrm{p}(\mathbf{x}_i|\boldsymbol{\theta}_j)=\mathcal{N}(\mathbf{m}_j,\Sigma_j),$$

where $\boldsymbol{\theta}_j=(\mathbf{m}_j,\Sigma_j)$ denotes the mean and covariance of the $j$'th class. The prior probability distribution of the class properties $\boldsymbol{\theta}_j$ is given by the conjugate prior to the Gaussian distribution. Prior information about the distribution of the class means or covariances can be encoded by choosing the parameters of the conjugate prior accordingly. Using a noninformative prior for the class means, we have $\mathrm{p}(\mathbf{m}_j)\propto 1$. The conjugate prior for the mean and covariance of a normal distribution is the normal inverse Wishart (NIW) distribution, which with a flat prior on the means reduces to an inverse Wishart density on $\Sigma_j$:

## (34)

$$\mathrm{NIW}(\nu_j,\Gamma_j)=|\Sigma_j|^{-(\nu_j+d+1)/2}\,\exp\left[-\frac{1}{2}\,\mathrm{Tr}(\Gamma_j\Sigma_j^{-1})\right],$$

where $\nu_j$ and $\Gamma_j$ are the degrees of freedom and the scale matrix of the prior, and $d$ is the dimension of the feature space. The probability that the set of labels $\boldsymbol{\zeta}_i=\{\zeta_{i1},\dots,\zeta_{ij},\dots,\zeta_{iJ}\}$ is assigned to the $i$'th node is given by a multinomial distribution

## (35)

$$\mathrm{p}(\boldsymbol{\zeta}_i|\boldsymbol{\lambda})=\prod_j\lambda_j^{\zeta_{ij}},$$

where $\lambda_j$ is the overall probability that a node is assigned to the $j$'th class. Therefore, the joint probability for $(\mathbf{x}_i,\boldsymbol{\zeta}_i)$ is given by the product

## (37)

$$\mathrm{p}(\mathbf{x}_i,\boldsymbol{\zeta}_i|\boldsymbol{\theta},\boldsymbol{\lambda})=\mathrm{p}(\mathbf{x}_i|\boldsymbol{\zeta}_i,\boldsymbol{\theta})\,\mathrm{p}(\boldsymbol{\zeta}_i|\boldsymbol{\lambda})=\prod_j[\lambda_j\,\mathrm{p}(\mathbf{x}_i|\theta_j)]^{\zeta_{ij}}.$$

Marginalizing over the labels gives the mixture model

## (38)

$$\mathrm{p}(\mathbf{x}_i|\boldsymbol{\theta},\boldsymbol{\lambda})=\int_{\zeta_i}\mathrm{p}(\mathbf{x}_i,\boldsymbol{\zeta}_i|\boldsymbol{\theta},\boldsymbol{\lambda})\,\mathrm{d}\zeta_i=\sum_j\lambda_j\,\mathrm{p}(\mathbf{x}_i|\theta_j).$$

Assuming independence between nodal values, the prior probability of the entire image $\mathbf{x}$ is

## (39)

$$\mathrm{p}(\mathbf{x}|\boldsymbol{\theta},\boldsymbol{\lambda})=\prod_i\sum_j\lambda_j\,\mathrm{p}(\mathbf{x}_i|\theta_j).$$

#### 3.1.1. Reconstruction step

The objective function takes the form of Eq. (7), where at iteration $t$ of the reconstruction-classification algorithm, the regularization is given by Eqs. (9) and (39)

## (40)

$$\mathcal{R}^t(\boldsymbol{\mu}_a,\boldsymbol{\mu}_s')=-\log\,\mathrm{p}(\mathbf{x}|\boldsymbol{\theta}^t,\boldsymbol{\lambda}^t)=-\log\,\mathcal{N}(\bar{\mathbf{x}},\Sigma_{\bar{x}})=\frac{\tau}{2}\,\|\mathbf{L}_{\bar{x}}(\mathbf{x}-\bar{\mathbf{x}})\|^2,$$

where $\tau$ is a regularization weight, $\mathbf{L}_{\bar{x}}$ is a matrix square root satisfying $\mathbf{L}_{\bar{x}}^{\mathrm{T}}\mathbf{L}_{\bar{x}}\propto\Sigma_{\bar{x}}^{-1}$, and the prior mean at the $i$'th node is the mean of the class to which the node is currently assigned

## (41)

$$\bar{\mathbf{x}}_i=\left.\sum_j\zeta_{ij}\,\mathbf{m}_j\right|_{\mathrm{MAP}(\zeta)}=\mathbf{m}_{j'}\in\mathbb{R}^2,$$

where the labels are given by the maximum a posteriori (MAP) estimate

## (42)

$$\mathrm{MAP}(\boldsymbol{\zeta})=\arg\max_{\zeta}\,\mathrm{p}(\boldsymbol{\zeta}|\mathbf{x}^{t-1},\boldsymbol{\theta}^{t-1},\boldsymbol{\lambda}^{t-1}).$$

In order to sphere the solution space, that is, to render the space dimensionless, we performed a change of variables $\boldsymbol{\mu}_a\to\boldsymbol{\mu}_a/\mu_{a\,0}$ and $\boldsymbol{\mu}_s'\to\boldsymbol{\mu}_s'/\mu_{s\,0}'$, where $(\mu_{a\,0},\mu_{s\,0}')$ is the initial guess for the optical parameters (in this study, we initialized to the homogeneous background). Given the size of the problem, we chose a gradient-based optimization method in order to reduce memory use and computational expense.^{12} The minimization was performed using the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method,^{13} with a storage memory of six iterations.

#### 3.1.2. Classification

The purpose of the classification step is to update the multinomial model using the result of the previous reconstruction step. First, the expected values of the labels ${\mathit{\zeta}}^{t+1}$ are computed for the current class parameters $({\mathit{\theta}}^{t},{\mathit{\lambda}}^{t})$ and image ${\mathit{x}}^{t}=({\mathit{\mu}}_{\mathit{a}}^{t},{\mathit{\mu}}_{\mathit{s}}^{\prime t})$ (E-step). Then the model parameters are updated by maximizing the posterior probability (M-step)

## (43)

$$\mathrm{p}(\boldsymbol{\theta},\boldsymbol{\lambda}|\mathbf{x}^t)\propto\mathrm{p}(\mathbf{x}^t|\boldsymbol{\theta},\boldsymbol{\lambda})\,\mathrm{p}(\boldsymbol{\theta},\boldsymbol{\lambda}).$$

E-step: The “responsibility” $r_{ij}^t$ is a measure of the probability that the $i$'th node is assigned to the $j$'th class. Using Bayes’ theorem and the Gaussian mixture model [Eq. (38)], we have

## (44)

$$\mathrm{p}(\zeta_{ij}=1|\mathbf{x}_i^t,\boldsymbol{\theta}^t,\boldsymbol{\lambda}^t)=\frac{\mathrm{p}(\mathbf{x}_i|\zeta_{ij}=1,\boldsymbol{\theta}^t)\,\mathrm{p}(\zeta_{ij}=1)}{\mathrm{p}(\mathbf{x}_i|\boldsymbol{\theta},\boldsymbol{\lambda})}=\frac{\lambda_j^t\,\mathrm{p}(\mathbf{x}_i^t|\theta_j^t)}{\sum_j\lambda_j^t\,\mathrm{p}(\mathbf{x}_i^t|\theta_j^t)}=r_{ij}^t.$$

The expectation for the indicator values is

## (45)

$$E(\zeta_{ij}|\mathbf{x}_i^t,\boldsymbol{\theta}^t,\boldsymbol{\lambda}^t)=0\times\mathrm{p}(\zeta_{ij}=0|\mathbf{x}_i^t,\boldsymbol{\theta}^t,\boldsymbol{\lambda}^t)+1\times\mathrm{p}(\zeta_{ij}=1|\mathbf{x}_i^t,\boldsymbol{\theta}^t,\boldsymbol{\lambda}^t)=r_{ij}^t.$$

Therefore, the MAP estimate for the labels assigns each node to the class with the highest responsibility, which can be used in Eq. (42). M-step: The parameters $(\boldsymbol{\theta},\boldsymbol{\lambda})$ are chosen in order to maximize the log posterior

## (47)

$$(\boldsymbol{\theta}^{t+1},\boldsymbol{\lambda}^{t+1})=\arg\max_{(\boldsymbol{\theta},\boldsymbol{\lambda})}\,\log\,\mathrm{p}(\mathbf{x}^t|\boldsymbol{\theta},\boldsymbol{\lambda})+\log\,\mathrm{p}(\boldsymbol{\theta},\boldsymbol{\lambda}).$$

Averaging over all possible values of $\boldsymbol{\zeta}$ with respect to the current responsibilities [Eq. (44)] and using Jensen’s inequality^{14} gives

## (48)

$$\log\,\mathrm{p}(\mathbf{x}^t|\boldsymbol{\theta},\boldsymbol{\lambda})+\log\,\mathrm{p}(\boldsymbol{\theta},\boldsymbol{\lambda})\geq\int_{\zeta}\mathrm{p}(\boldsymbol{\zeta}|\mathbf{x}^t)\,\log\,\mathrm{p}(\mathbf{x}^t,\boldsymbol{\zeta}|\boldsymbol{\theta},\boldsymbol{\lambda})\,\mathrm{d}\zeta+\log\,\mathrm{p}(\boldsymbol{\theta},\boldsymbol{\lambda}).$$

Ignoring terms that do not depend on $(\boldsymbol{\theta},\boldsymbol{\lambda})$, we obtain a lower bound for the log posterior

## (49)

$$\begin{split}\mathcal{B}(\boldsymbol{\theta},\boldsymbol{\lambda})&=\sum_i\sum_j r_{ij}^t\,\log[\lambda_j\,\mathrm{p}(\mathbf{x}_i|\theta_j)]+\log\,\mathrm{p}(\boldsymbol{\lambda})+\log\,\mathrm{p}(\boldsymbol{\theta})\\&=\sum_i\sum_j r_{ij}^t\left[\log\,\lambda_j-\frac{1}{2}\,\log|\Sigma_j|-\frac{1}{2}(\mathbf{x}_i-\mathbf{m}_j)^{\mathrm{T}}\Sigma_j^{-1}(\mathbf{x}_i-\mathbf{m}_j)\right]\\&\quad+\sum_j\left[(\alpha_j-1)\log\,\lambda_j-\frac{\nu_j+d+1}{2}\,\log|\Sigma_j|-\frac{1}{2}\,\mathrm{Tr}(\Gamma_j\Sigma_j^{-1})\right].\end{split}$$

Maximizing $\mathcal{B}(\boldsymbol{\theta},\boldsymbol{\lambda})$ subject to $\sum_j\lambda_j=1$ and using noninformative priors, we obtain the update rules for the model parameters.
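In the noninformative-prior limit, maximizing the first term of $\mathcal{B}$ yields the familiar responsibility-weighted mixture updates. A minimal 1-D sketch of the E- and M-steps (hypothetical data; the MAP version would additionally fold $\nu_j$ and $\Gamma_j$ into the covariance update) is:

```python
import numpy as np

def gaussian(x, m, s2):
    # 1-D normal density
    return np.exp(-0.5 * (x - m) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)

def classification_step(x, means, variances, lam):
    # E-step: responsibilities r_ij, as in Eq. (44)
    p = lam[None, :] * gaussian(x[:, None], means[None, :], variances[None, :])
    r = p / p.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted updates (noninformative-prior limit)
    nk = r.sum(axis=0)
    lam_new = nk / len(x)
    means_new = (r * x[:, None]).sum(axis=0) / nk
    variances_new = (r * (x[:, None] - means_new[None, :]) ** 2).sum(axis=0) / nk
    return means_new, variances_new, lam_new

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.01, 0.001, 200),   # "background" nodes
                    rng.normal(0.03, 0.002, 100)])  # "perturbation" nodes
means = np.array([0.0, 0.05])                       # rough initial means
variances = np.array([1e-4, 1e-4])
lam = np.array([0.5, 0.5])
for _ in range(30):                                 # iterate E- and M-steps
    means, variances, lam = classification_step(x, means, variances, lam)
```

After a few iterations, the class means settle near the two tissue values and $\lambda$ approaches the class proportions, mirroring how the classification step refines the mixture model between reconstructions.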

### 3.2. Class Means Initialization

The number of classes $J$ and the class means ${\mathit{m}}_{j}$ were initialized by automatically segmenting the result of the first reconstruction step and averaging over the segmented areas. To segment the image [e.g., see Fig. 2(a)], we looked at a binned histogram of the image of ${\mathit{\mu}}_{\mathit{a}}$ and chose the value ${\mu}_{a\text{\hspace{0.17em}}h}$ for which the number of occurrences was highest [Fig. 2(c), column 1]. We found the first node index $h$ for which the value ${\mu}_{a\text{\hspace{0.17em}}h}$ occurs, and identified the corresponding scattering value ${\mu}_{s\text{\hspace{0.17em}}h}^{\prime}$. Having chosen a covariance matrix ${\mathrm{\Sigma}}_{h}$, we computed a map of the multivariate normal probability of the $({\mathit{\mu}}_{\mathit{a}},{\mathit{\mu}}_{\mathit{s}}^{\prime})$ images, with mean $({\mu}_{a\text{\hspace{0.17em}}h},{\mu}_{s\text{\hspace{0.17em}}h}^{\prime})$ [Fig. 2(c), column 2]. A suitable choice for ${\mathrm{\Sigma}}_{h}$ is the initial covariance of the classes. Then we selected a tolerance level ${\mathrm{tol}}_{h}$ at which to truncate the probability map, and selected all nodes with probability higher than the tolerance as belonging to the same class as node $h$ [Fig. 2(c), column 3]. We repeated this process on the remaining nodes until all nodes were classified. Thus, the number of classes was set to the number of iterations, and the average of the optical parameters over each class was used to initialize the class means [Fig. 2(b)].
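The initialization loop can be sketched as follows (a hypothetical helper operating on flat arrays of nodal values; `Sigma_h` and `tol` play the roles of the covariance and tolerance described above):

```python
import numpy as np

def init_class_means(mu_a, mu_s, Sigma_h=np.diag([1e-6, 1e-1]),
                     tol=1e-7, n_bins=20):
    # Repeatedly: find the modal mu_a bin among unassigned nodes, seed a
    # class from its first node, and claim all nodes whose multivariate
    # normal probability around that seed exceeds the tolerance.
    unassigned = np.ones(len(mu_a), dtype=bool)
    inv = np.linalg.inv(Sigma_h)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma_h)))
    means = []
    while unassigned.any():
        idx = np.flatnonzero(unassigned)
        counts, edges = np.histogram(mu_a[idx], bins=n_bins)
        b = np.argmax(counts)                     # modal bin
        in_bin = idx[(mu_a[idx] >= edges[b]) & (mu_a[idx] <= edges[b + 1])]
        h = in_bin[0]                             # first node in the bin
        # probability map centered on node h with covariance Sigma_h
        dx = np.stack([mu_a - mu_a[h], mu_s - mu_s[h]], axis=1)
        p = norm * np.exp(-0.5 * np.einsum('ni,ij,nj->n', dx, inv, dx))
        members = unassigned & (p > tol)          # truncate at tolerance
        means.append([mu_a[members].mean(), mu_s[members].mean()])
        unassigned &= ~members
    return np.array(means)

# Demo on two noiseless "tissues": the larger class is identified first.
mu_a = np.concatenate([np.full(50, 0.01), np.full(20, 0.03)])
mu_s = np.concatenate([np.full(50, 1.0), np.full(20, 1.5)])
means = init_class_means(mu_a, mu_s)
```

As in the text, the number of classes falls out of the loop itself, and the largest tissue (here, the background) is segmented first.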

### 3.3. Visualization of the Results

Results obtained using the reconstruction-classification method are displayed alongside scatter plots of the nodal values recovered in the two-dimensional (2-D) feature space $(\mu_a,\mu_s')$ [e.g., see Fig. 2(c), final column, and Fig. 4]. The positions of the class means $\mathbf{m}_j=(\bar{\mu}_{a\,j},\bar{\mu}_{s\,j}')$ are identified by a cross, and the class covariances $\Sigma_j$ are represented by ellipses. These are color-coded by class, and are indicative of the clustering of image nodal values around the class means.
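For reference, the ellipse drawn for a class can be obtained from $\Sigma_j$ by eigendecomposition (sketch with a hypothetical covariance; in practice the half-axes are often scaled to a chosen confidence level):

```python
import numpy as np

# The eigenvectors of Sigma_j give the ellipse axis directions, and the
# square roots of the eigenvalues give the 1-sigma half-axis lengths.
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                     # hypothetical covariance
evals, evecs = np.linalg.eigh(Sigma)               # ascending eigenvalues
half_axes = np.sqrt(evals)                         # minor, major half-axes
# orientation of the major axis, folded into [0, 180) degrees
angle = np.degrees(np.arctan2(evecs[1, -1], evecs[0, -1])) % 180.0
```

These three quantities (center, half-axes, angle) are all a plotting routine needs to draw the covariance ellipse at the class mean.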

## 4. Results

### 4.1. Two-Dimensional Validation and Reconstruction

We chose a numerical phantom defined on a 2-D circular mesh with 1331 nodes and radius 25 mm. Four illumination sources were placed on the boundary at angles 0, $\pi/2$, $\pi$, and $3\pi/2\ \mathrm{rad}$. In all cases, the illumination profile was a normalized Gaussian with radius (distance from the center at which the profile drops to $1/e$) of 6 mm. The background optical parameters were set to $\mu_a=0.01\ \mathrm{mm}^{-1}$ and $\mu_s'=1\ \mathrm{mm}^{-1}$. Two circular perturbations of radius 6 mm were added at positions (6 mm, 10 mm) and ($-6$ mm, $-10$ mm) [Fig. 3(a)]. The values of the perturbations were $\mu_a=0.02\ \mathrm{mm}^{-1}$, $\mu_s'=1.5\ \mathrm{mm}^{-1}$ and $\mu_a=0.03\ \mathrm{mm}^{-1}$, $\mu_s'=1.25\ \mathrm{mm}^{-1}$, respectively. The absorbed energy field was simulated for each illumination, and 1% white Gaussian noise was added [Fig. 3(b)]. The class covariances were initialized to

## (53)

$$\Sigma_j=\begin{pmatrix}10^{-6}&0\\0&10^{-1}\end{pmatrix}\quad\forall\ j=1,\dots,3.$$

### 4.2. Three-Dimensional Validation and Reconstruction

We chose a three-dimensional (3-D) phantom analogous to the 2-D case, defined on a cylinder with 27,084 nodes, radius 25 mm, and height 25 mm. Two spherical inclusions of radius 6 mm were placed at (6, 10, 0) mm and ($-6$, $-10$, 0) mm [Fig. 6(a)]. Illumination sources were Gaussian in the $xy$-plane and constant along the $z$-axis, with radius 6 mm and length 25 mm [Figs. 6(b) and 6(c)]. PAT images were simulated for four illuminations at the cardinal points, and 1% noise was added to the absorbed energy [Fig. 6(d)]. The optical, covariance, and reconstruction parameters were set to the same values used in the 2-D case. The class initialization parameters were set to $\mathrm{tol}_h=10^{-7}$ and $\Sigma_h=\Sigma_j$. Images were reconstructed by performing 10 iterations of the reconstruction-classification method (Fig. 7).

## 5. Discussion

### 5.1. Summary of Findings

We applied the proposed reconstruction-classification algorithm to a 2-D numerical phantom with three tissues: a background and two perturbations (Fig. 3). The optical absorption was recovered reliably within a small number of iterations, and the scattering was recovered with sufficient accuracy after approximately 10 iterations (Fig. 4). We compared the optical model with images obtained by the reconstruction-classification method and by a traditional reconstruction-only (no regularization) method (Fig. 5). We found that the reconstruction-classification method delivered superior image quality, particularly with regard to the scattering parameter. We applied the reconstruction-classification algorithm to a much larger 3-D problem (Fig. 6) and observed results (Fig. 7) similar to those in the 2-D case.

### 5.2. Choice of Parameters

The parametric optical model and classification algorithm introduce a number of parameters that require tuning by the user. In addition to the regularization parameter, the parameters of the Jeffreys prior, $\Gamma$ and $\nu$, and the initial guess of the class variances $\Sigma_j$ must be set before performing the classification. However, their significance is fairly intuitive, and with experience of a certain type of problem, the choice of parameters becomes natural. If the class covariance matrix $\Sigma_j$ is visualized as an ellipse, changing the value of $\Gamma$ varies its eccentricity, and changing $\nu$ varies the length of its axes. Further, given that in the first iteration the optical absorption is recovered with superior accuracy to the scattering, it is preferable to initialize the variance of the former to a smaller value than that of the latter, indicating greater confidence in the imaging solution.

### 5.3. Initialization of the Class Means

The purpose of the means initialization scheme is to increase the automation of the method, so that minimal user intervention is required and no prior knowledge of the number of tissues or their optical properties is needed. The algorithm simply performs a segmentation of the image, then takes averages over the segmented areas to initialize the class properties (Fig. 1). Alternative segmentation techniques could have been employed; however, the advantage of the proposed approach is that it directly exploits the mixture-of-Gaussians model to identify the tissues. Our choice to investigate a node $h$ with $\mu_a$ belonging to the bin with the maximum number of occurrences leads to the background tissue being identified first, followed by the perturbation tissues. The choice of the node index $h$ could have been randomized so that tissues would be identified in random order. This approach is equally valid; however, we found that in cases where tissue values were close together (such as after a single reconstruction-classification iteration), it was preferable to identify the largest classes first, because the mean was estimated with greater accuracy for the classes with a larger number of samples. Further, for a given image and tolerance level, our choice renders the result of the segmentation process unique and reproducible.

### 5.4. Recovery of the Scattering

From the comparison with the reconstruction-only case with no regularization (Fig. 5), it is evident that the introduction of the parametric prior enables better recovery of the scattering. The inconsistency between the quality of the recovered absorption and scattering parameters in the nonregularized case is due to the weaker dependence of the absorbed energy density on the scattering than on the absorption. This results in the scattering gradient being approximately an order of magnitude smaller than the absorption gradient. Although the problem can be mitigated by sphering the solution space, variations in the data due to the scattering often fall below the noise floor. In the reconstruction-classification case, the absorption is typically recovered with good accuracy within a small number of iterations. Thus, the absorption takes values very close to the class means (resulting in small clusters), and the variance along the $\mu_a$ direction converges to a small value. Given that the regularization term is weighted by the inverse of the covariance matrix, the dependence of the absorption gradient on the data becomes weaker at each iteration, until its magnitude is comparable to or smaller than that of the scattering. In the iterations that follow, the descent of the data term of the objective function is primarily due to updates to the scattering, which converges to the correct values.

### 5.5. Computational Demands

Computational performance was found to be strongly dependent on the problem size. In the 2-D case with 1331 nodes (Fig. 4), the total reconstruction time (10 outer reconstruction-classification iterations) using MATLAB on a 16-processor PC with 128 GB RAM was only 77 s. In the 3-D case with 27,084 nodes (Fig. 7), the total reconstruction time on the same workstation grew to approximately 3.7 h. The increase in computation time was mostly due to much longer processing times for the L-BFGS algorithm in the reconstruction step.

### 5.6. Experimental Application

In experimental situations, prior information on tissue properties may be available, such as knowledge of the characteristic optical absorption and scattering spectra of chromophores of interest. These may be obtained from the literature^{15} or through tissue sample measurements. This information could be used in one of two ways. First, a library of typical chromophores could be used to initialize the class parameters instead of the proposed class means initialization method. The classification process could then perform the function of correcting for uncertainty, errors, or local variations in the real optical properties with respect to the prior information. Alternatively, the library could be used to label the chromophores found by the segmentation process and identify them as particular tissue types, e.g., “oxygenated blood” or “fat,” on the basis of the closeness of the recovered means to the characteristic properties.

### 5.7. Additional Priors

In this study, we assumed independence between nodal values; however, the mixture-of-Gaussians model could be used in conjunction with a spatial prior. Knowledge of smoothness or sparsity properties of the solution could be employed to introduce a homogeneous spatial regularizer, such as first-order Tikhonov^{16} or total variation.^{7}^{,}^{17} Knowledge of structural information, such as that provided by an alternative imaging method or an anatomical library, could be exploited by introducing a spatially varying probability map for the optical properties.

## 6. Conclusions

In this paper, we proposed a method for performing image reconstruction in QPAT. We introduced a parametric class model for the optical parameters and implemented a minimization-based reconstruction algorithm. We suggested an automated method by which to initialize the parameters of the class model and proposed a classification algorithm by which to progressively update and improve those parameters after each reconstruction step. We demonstrated through 2-D and 3-D numerical examples that the reconstruction-classification method allows for the simultaneous recovery of optical absorption and scattering. In particular, we found that this approach delivered superior accuracy in the recovery of the scattering with respect to traditional gradient-based reconstruction.