## 1. Introduction

Fluorescence diffusion optical tomography (FDOT) is one of the newer imaging techniques with promising application potential in medicine. FDOT provides the possibility of functional imaging, i.e., it not only visualizes anatomical structures but also provides information about physiological states and processes.

FDOT utilizes the ability of fluorescent dyes to absorb light in a certain wavelength range and to emit photons at a longer wavelength. The excitation light is injected into the sample through a set of sources. A source can either be in contact with the sample’s surface (e.g., a waveguide) or deliver the light in a contactless manner using collimated or divergent light beams. The excitation light is scattered and absorbed as it spreads through the tissue. At sites where a fluorophore is present and active (e.g., inside a tumor), part of the absorbed light is re-emitted at another wavelength. This secondary light is again scattered in the tissue, and the part that reaches the boundary can be measured by photon detectors.

Due to the diffuse propagation of the photons in tissue,^{1} light emerging from the fluorescent dye spreads widely before it reaches the boundary. This is in contrast to other established imaging techniques such as x-ray imaging, where the rays travel through the sample of interest along nearly straight lines. The photon diffusion has to be captured by a suitable forward model, which forms the basis of the reconstruction algorithm that seeks to determine the fluorophore distribution from boundary measurements.

The reconstructed results usually improve when the number of sensors is increased. However, this holds only up to a certain point, as the diffuse nature of photon propagation inherently limits the independence of the information provided by different sensors and hence the obtainable resolution. On the other hand, the cost of the detector hardware, the acquisition time, the computational effort, and the memory needed for reconstruction all increase with the number of source/detector combinations. The goal is to find a good compromise between image quality and the feasibility of both hardware and reconstruction.

Graves^{2} investigated how the number of sources and detectors and their mutual distance influence the reconstruction. Later, the method was extended by Lasser,^{3} who applied it to $360\text{-}\mathrm{deg}$ projection tomography.

This paper presents a different approach to adapting the optode configuration of a fluorescence tomography system: rather than merely comparing different optode configurations, it provides an information measure for every single optode and therefore offers greater flexibility. Furthermore, it shows how the adaptation can be modified to focus the reconstruction on a given region of interest.

## 2. Methods

### 2.1. Forward Model

One of the most accurate ways to model light propagation is to use the Boltzmann transport equation from the kinetic theory of gases. The photons are then treated like independent gas particles, which leads to the radiative transfer equation. Unfortunately, the photon intensity in the radiative transfer equation is a field that depends on the spatial coordinates and on the direction (i.e., two angles) in which the photons travel. Its discretization therefore has a huge number of degrees of freedom and requires extensive computing power and memory.

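Therefore, it is common to use an approximation of the transfer equation known as the diffusion equation.^{4} Including a spatially variable fluorophore concentration $c$, the diffusion equation reads

$$-\nabla \cdot [\kappa(x,\lambda)\nabla \phi(x)] + \left[\mu_{a,i}(x,\lambda) + c(x)\,\epsilon(\lambda) + \frac{i\omega}{v}\right]\phi(x) = q(x), \tag{1}$$

where $\phi$ is the photon density, $\kappa$ the diffusion coefficient, $\mu_{a,i}$ the intrinsic absorption coefficient of the tissue, $\epsilon$ the extinction coefficient of the fluorophore, $\omega$ the modulation frequency, $v$ the speed of light in the medium, and $q$ the source term. On the boundary $\partial\Omega$, the Robin condition

$$\phi(x) + 2R\,\kappa(x,\lambda)\,\frac{\partial \phi(x)}{\partial n} = 0 \tag{2}$$

is imposed, where the parameter $R$ accounts for internal reflection at the boundary.

For fluorescence applications, two diffusion equations can be coupled: one describing the propagation of the excitation photons ($\lambda = \lambda_{\mathit{ex}}$, $\phi = \phi_{\mathit{ex}}$) and one describing the emission field ($\lambda = \lambda_{\mathit{em}}$, $\phi = \phi_{\mathit{em}}$). We prefer to write this in an operator (or matrix-like) notation, where $A_{\mathit{ex}}$ and $A_{\mathit{em}}$ describe the propagation of the excitation and emission field, respectively:

$$\begin{aligned} A_{\mathit{ex}}(c)\,\phi_{\mathit{ex}} &= q,\\ A_{\mathit{em}}(c)\,\phi_{\mathit{em}} &= B(c)\,\phi_{\mathit{ex}}, \end{aligned} \tag{3}$$

where the coupling operator $B(c)$ converts absorbed excitation light into fluorescence emission:

$$B(c)\,\phi_{\mathit{ex}}(x) = \frac{Q}{1 - i\omega\tau}\,c(x)\,\epsilon_{\mathit{ex}}\,\phi_{\mathit{ex}}(x), \tag{4}$$

with the quantum yield $Q$ and the fluorescence lifetime $\tau$.^{5}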

Although more elaborate detector models (e.g., Ref. 6) could be used, in this paper, a measurement $d$ is defined as the number of photons leaving the sample at a certain point ${x}_{D}$ per unit time:

$$d := -v\int_{\partial\Omega}\delta(x - x_D)\,\kappa_{\mathit{em}}(x)\,\frac{\partial \phi_{\mathit{em}}(x)}{\partial n}\,\mathrm{d}x \overset{(2)}{=} v\int_{\partial\Omega}\frac{1}{2R}\,\delta(x - x_D)\,\phi_{\mathit{em}}(x)\,\mathrm{d}x. \tag{5}$$

### 2.2. Sensitivity

In order to solve the inverse problem, i.e., the reconstruction of the distribution of the fluorophore’s concentration $c\left(x\right)$ from measurements on the boundary, it is necessary to know the influence of a change in the concentration distribution on the measurements. In other words, the so-called sensitivity, given by the derivative of the system 3 with respect to $c\left(x\right)$ , is needed. Since the measurement $d$ is a linear functional of the emitted photon density ${\phi}_{\mathit{em}}$ , it suffices to calculate the derivative of ${\phi}_{\mathit{em}}$ with respect to $c$ . To this end, we write the coupled system of partial differential equations describing the dependence of ${\phi}_{\mathit{em}}$ (and ${\phi}_{\mathit{ex}}$ ) on $c$ as a nonlinear operator equation $F[c,\phi \left(c\right)]=0$ , where $\phi =({\phi}_{\mathit{ex}},{\phi}_{\mathit{em}})$ and

$$F:(c,\phi) \mapsto \begin{cases} A_{\mathit{ex}}(c)\,\phi_{\mathit{ex}} - q,\\ A_{\mathit{em}}(c)\,\phi_{\mathit{em}} - B(c)\,\phi_{\mathit{ex}}. \end{cases} \tag{6}$$

The implicit function theorem^{7} states that $\phi(c)$ is Fréchet-differentiable with respect to $c$ and that the derivative $\phi'(c)$ satisfies

$$\partial_\phi F[c,\phi(c)]\,\phi'(c) = -\partial_c F[c,\phi(c)], \tag{7}$$

where the derivative with respect to $\phi$ acts on a variation $\delta\phi = (\delta\phi_{\mathit{ex}}, \delta\phi_{\mathit{em}})$ as

$$\partial_\phi F[c,\phi(c)]\,\delta\phi = \begin{cases} A_{\mathit{ex}}(c)\,\delta\phi_{\mathit{ex}},\\ A_{\mathit{em}}(c)\,\delta\phi_{\mathit{em}} - B(c)\,\delta\phi_{\mathit{ex}}. \end{cases} \tag{8}$$

It remains to calculate the Fréchet derivative $\partial_c F[c,\phi(c)]$ acting on the variation $\delta c$. Taking the derivative of the system in Eq. 3 with respect to $c$ and setting

$$\kappa'_{\mathit{ex}} = \frac{\epsilon_{\mathit{ex}}}{3(\mu_{a,i,\mathit{ex}} + c\,\epsilon_{\mathit{ex}} + \mu'_s)^2} \tag{9}$$

(and analogously $\kappa'_{\mathit{em}}$; note that $\kappa' = -\partial\kappa/\partial c$ for $\kappa = 1/[3(\mu_{a,i} + c\,\epsilon + \mu'_s)]$ with the reduced scattering coefficient $\mu'_s$), we obtain, with one row for the interior and one for the boundary part of each equation,

$$\partial_c F[c,\phi(c)]\,\delta c = \begin{cases} -\nabla\cdot(-\kappa'_{\mathit{ex}}\,\delta c\,\nabla\phi_{\mathit{ex}}) + \delta c\,\epsilon_{\mathit{ex}}\phi_{\mathit{ex}} & \text{in } \Omega,\\ -\kappa'_{\mathit{ex}}\,\delta c\,\partial_n\phi_{\mathit{ex}} & \text{on } \partial\Omega,\\ -\nabla\cdot(-\kappa'_{\mathit{em}}\,\delta c\,\nabla\phi_{\mathit{em}}) + \delta c\,\epsilon_{\mathit{em}}\phi_{\mathit{em}} - \frac{Q}{1 - i\omega\tau}\,\delta c\,\epsilon_{\mathit{ex}}\phi_{\mathit{ex}} & \text{in } \Omega,\\ -\kappa'_{\mathit{em}}\,\delta c\,\partial_n\phi_{\mathit{em}} & \text{on } \partial\Omega. \end{cases} \tag{10}$$

Therefore, in order to calculate the sensitivity of the measurement $d$ for a given $c$ with respect to a perturbation $\delta c$, we first compute $\phi_{\mathit{ex}}(c)$ and $\phi_{\mathit{em}}(c)$ as the solution of Eq. 3 and then solve the boundary value problem

$$\begin{cases} -\nabla\cdot(\kappa_{\mathit{ex}}\nabla\delta\phi_{\mathit{ex}}) + \left(\mu_{a,i,\mathit{ex}} + c\,\epsilon_{\mathit{ex}} + \frac{i\omega}{v}\right)\delta\phi_{\mathit{ex}} = -\nabla\cdot(\kappa'_{\mathit{ex}}\,\delta c\,\nabla\phi_{\mathit{ex}}) - \delta c\,\epsilon_{\mathit{ex}}\phi_{\mathit{ex}} & \text{in } \Omega,\\ \delta\phi_{\mathit{ex}} + 2R\,\kappa_{\mathit{ex}}\,\partial_n\delta\phi_{\mathit{ex}} = 2R\,\kappa'_{\mathit{ex}}\,\delta c\,\partial_n\phi_{\mathit{ex}} & \text{on } \partial\Omega, \end{cases} \tag{11}$$

for $\delta\phi_{\mathit{ex}}$, followed by

$$\begin{cases} -\nabla\cdot(\kappa_{\mathit{em}}\nabla\delta\phi_{\mathit{em}}) + \left(\mu_{a,i,\mathit{em}} + c\,\epsilon_{\mathit{em}} + \frac{i\omega}{v}\right)\delta\phi_{\mathit{em}} = -\nabla\cdot(\kappa'_{\mathit{em}}\,\delta c\,\nabla\phi_{\mathit{em}}) - \delta c\,\epsilon_{\mathit{em}}\phi_{\mathit{em}} + \frac{Q\,\epsilon_{\mathit{ex}}}{1 - i\omega\tau}\left(\delta c\,\phi_{\mathit{ex}} + c\,\delta\phi_{\mathit{ex}}\right) & \text{in } \Omega,\\ \delta\phi_{\mathit{em}} + 2R\,\kappa_{\mathit{em}}\,\partial_n\delta\phi_{\mathit{em}} = 2R\,\kappa'_{\mathit{em}}\,\delta c\,\partial_n\phi_{\mathit{em}} & \text{on } \partial\Omega, \end{cases} \tag{12}$$

for $\delta\phi_{\mathit{em}}$. The change of the measurement is finally

$$\delta d = v\int_{\partial\Omega}\frac{1}{2R}\,\delta(x - x_D)\,\delta\phi_{\mathit{em}}(x)\,\mathrm{d}x. \tag{13}$$

In a finite element context, discretizing the concentration with piecewise-constant ansatz functions in Eqs. 3, 11, and 12 leads to the Jacobian or sensitivity matrix, which is denoted by $J$ in this paper. The element $J_{ij}$ describes the effect of a concentration change in the $j$th finite element on the $i$th measurement.
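To make the sensitivity computation concrete, the following Python sketch applies Eqs. 3 to 5 and 11 to 13 to a one-dimensional slab and checks the resulting Jacobian rows against central finite differences. It is an illustration under simplifying assumptions, not the authors' finite element implementation: the geometry is 1D with a lumped vertex-based discretization, concentrations are nodal rather than element-wise, the dependence of $\kappa$ on $c$ (the $\kappa'$ terms) is neglected, and the values of $v$, $\omega$, $Q$, $\tau$, and the grid are assumed; only $\mu'_s$, $\mu_{a,i}$, $c\cdot\epsilon$, and $R$ are taken from Table 1.

```python
import numpy as np

# 1D slab with N nodes; a sketch, not the paper's 3D finite element model.
N, L = 121, 60.0                      # number of nodes, slab thickness in mm (assumed)
h = L / (N - 1)
w = np.full(N, h)                     # lumped nodal weights (node "volumes")
w[[0, -1]] = h / 2
R = 2.51                              # boundary parameter from Table 1
v = 0.214                             # speed of light in tissue in mm/ps (assumed)
omega, Q, tau = 2 * np.pi * 1e-4, 0.016, 500.0   # rad/ps, quantum yield, ps (assumed)
beta = Q / (1 - 1j * omega * tau)     # coupling factor of Eq. 4
EX = {"mus": 0.275, "mua": 0.036, "eps": 0.0835}  # Table 1; eps per unit concentration
EM = {"mus": 0.235, "mua": 0.029, "eps": 0.0281}


def system_matrix(c, p):
    """Lumped discretization of Eq. 1 with the Robin condition of Eq. 2 (kappa independent of c)."""
    kappa = 1.0 / (3.0 * (p["mua"] + p["mus"]))
    main = np.full(N, 2 * kappa / h)
    main[[0, -1]] = kappa / h + 1 / (2 * R)          # boundary rows carry the Robin term
    off = np.full(N - 1, kappa / h)
    A = np.diag(main) - np.diag(off, 1) - np.diag(off, -1)
    return A + np.diag(w * (p["mua"] + c * p["eps"] + 1j * omega / v))


def forward(c, src):
    """Solve Eq. 3 for a unit point source at boundary node src."""
    q = np.zeros(N)
    q[src] = 1.0
    A_ex, A_em = system_matrix(c, EX), system_matrix(c, EM)
    phi_ex = np.linalg.solve(A_ex, q)
    phi_em = np.linalg.solve(A_em, beta * EX["eps"] * w * c * phi_ex)  # B(c) phi_ex
    return A_ex, A_em, phi_ex, phi_em


def measurement(phi_em, det):
    """Eq. 5: d = v / (2R) * phi_em at the detector node."""
    return v / (2 * R) * phi_em[det]


def jacobian(c, pairs):
    """One row per (source, detector) pair: discrete Eqs. 11-13 without kappa'."""
    rows = []
    for src, det in pairs:
        A_ex, A_em, phi_ex, phi_em = forward(c, src)
        # Eq. 11: A_ex dphi_ex = -(dA_ex/dc_j) phi_ex, one column per node j
        dphi_ex = np.linalg.solve(A_ex, -np.diag(EX["eps"] * w * phi_ex))
        # Eq. 12: A_em dphi_em = -(dA_em/dc_j) phi_em + d(B phi_ex)/dc_j
        rhs = -np.diag(EM["eps"] * w * phi_em) + beta * EX["eps"] * (
            np.diag(w * phi_ex) + (w * c)[:, None] * dphi_ex)
        dphi_em = np.linalg.solve(A_em, rhs)
        rows.append(measurement(dphi_em, det))      # Eq. 13, one row of J
    return np.array(rows)


c = np.zeros(N)
c[50:60] = 1.0                               # hypothetical fluorescent inclusion
pairs = [(0, N - 1), (N - 1, 0), (0, 0)]     # transmission and reflection geometries
J = jacobian(c, pairs)

# Central finite differences as an independent check of J
step = 1e-6
J_fd = np.empty_like(J)
for j in range(N):
    cp, cm = c.copy(), c.copy()
    cp[j] += step
    cm[j] -= step
    for k, (src, det) in enumerate(pairs):
        d_plus = measurement(forward(cp, src)[3], det)
        d_minus = measurement(forward(cm, src)[3], det)
        J_fd[k, j] = (d_plus - d_minus) / (2 * step)
```

Because both `J` and `J_fd` are derived from the same discrete model, they should agree up to finite-difference error; in a real 3D code, the same two linear solves per source (Eqs. 11 and 12) replace one forward solve per element.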

In certain applications, it is feasible to operate with difference measurements. A measurement $d_0$ is made with a baseline concentration $c_0$, and a second measurement $d_1$ is performed after the concentration distribution has changed to $c_1$. If the difference in concentration is small, the following linearization (with $J$ evaluated at $c_0$) can be used for reconstruction:

$$\Delta d \approx J\,\Delta c, \tag{14}$$

where $\Delta d = d_1 - d_0$ is a vector of difference measurements, and $\Delta c = c_1 - c_0$ is the vector of concentration changes in the finite elements. This formulation will be used throughout this paper. However, the $\Delta$ is omitted from now on, and all measurements and concentrations are understood as differences from a base state.

## 3. Adaptation of the Measurement Setup

Entropy-based optimization methods have a long history in image processing and reconstruction^{8} and have been applied to various fields of tomography, as can be seen from Refs. 9, 10, 11, 12, 13, to name just a few. The basic idea is to treat the unknown parameter (in our case, the fluorophore concentration) as a random variable with a certain probability density. One then seeks, for example, the parameter distribution with maximum entropy.

The optimization approach followed in this paper is based on the idea that the different measurements should be as independent as possible, i.e., every measurement should provide new information for the inverse problem. One way to quantify this independence is the mutual information (MI).

Let $\mathcal{M}$ denote the set of all measurement indices, i.e., each element of $\mathcal{M}$ uniquely defines one source/detector pair. Further, let $\mathcal{S}_i \subset \mathcal{M}$ be the indices of those measurements that are made with the $i$th source. Without loss of generality, it can be assumed that the measurements are ordered such that, for one fixed source $i$, the sensitivity matrix and the measurements can be partitioned as

$$J = \begin{pmatrix} J_1\\ J_2 \end{pmatrix}, \qquad \begin{pmatrix} d_1\\ d_2 \end{pmatrix} = \begin{pmatrix} J_1\\ J_2 \end{pmatrix} c, \tag{15}$$

where $J_1$ and $d_1$ belong to the measurements made with all other sources (indices in $\mathcal{M}\setminus\mathcal{S}_i$), and $J_2$ and $d_2$ belong to the measurements made with source $i$ (indices in $\mathcal{S}_i$).

In order to calculate the entropy, one has to interpret the model parameters as random variables and make assumptions about their probability distribution. For the sake of simplicity, it is assumed that the model parameters (i.e., the concentrations in the finite elements) are independent and normally distributed with equal variance $\sigma_c$. This renders the model covariance matrix diagonal: $\mathrm{cov}(c) = \sigma_c I$. The data covariance matrix follows from the relation $d = Jc$:

$$\mathrm{cov}(d) = \mathrm{cov}(Jc) = J\,\mathrm{cov}(c)\,J^T = \sigma_c\,J J^T, \tag{16}$$

or, in partitioned form,

$$\mathrm{cov}\begin{pmatrix} d_1\\ d_2 \end{pmatrix} = \sigma_c \begin{pmatrix} J_1 J_1^T & J_1 J_2^T\\ J_2 J_1^T & J_2 J_2^T \end{pmatrix}. \tag{17}$$

The entropy (or uncertainty) of the full data is given by the entropy of the multivariate normal distribution^{14} as

$$H(d) = \frac{1}{2}\log\left\{(2\pi e)^{M}\det[\mathrm{cov}(d)]\right\}, \tag{18}$$

where $M = |\mathcal{M}|$ is the total number of measurements. Analogously, the entropy of $d_1$ and the conditional entropy of $d_1$ given $d_2$ are

$$H(d_1) = \frac{1}{2}\log\left\{(2\pi e)^{M - |\mathcal{S}_i|}\det[\mathrm{cov}(d_1)]\right\}, \tag{19}$$

$$H(d_1 \mid d_2) = \frac{1}{2}\log\left\{(2\pi e)^{M - |\mathcal{S}_i|}\det[\mathrm{cov}(d_1 \mid d_2)]\right\}, \tag{20}$$

with the conditional covariance

$$\mathrm{cov}(d_1 \mid d_2) = \sigma_c\left[J_1 J_1^T - J_1 J_2^T (J_2 J_2^T)^{-1} J_2 J_1^T\right]. \tag{21}$$

As a remark, we note that the term $J_2^\dagger := J_2^T (J_2 J_2^T)^{-1}$ appearing in Eq. 21 is the pseudo-inverse of $J_2$ (provided $J_2$ has full row rank), so that Eq. 21 can be written as $\sigma_c J_1 (I - J_2^\dagger J_2) J_1^T$. If $J_2^\dagger J_2 = I$ were fulfilled exactly, the conditional covariance of $d_1$ would be zero. In such a case, the measurement data $d_2$ would already contain all the information about the model parameters, and $d_1$ could be predicted from them. Furthermore, all model parameters could be reconstructed exactly from the measurements $d_2$ alone.
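Equation 21 and the pseudo-inverse remark can be checked numerically. The Python sketch below uses random stand-in matrices (an assumption; real FDOT Jacobians are much larger and far more ill-conditioned) and compares the Schur-complement form of Eq. 21 with the pseudo-inverse form; a square, invertible $J_2$ illustrates the limiting case $J_2^\dagger J_2 = I$.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma_c = 0.5                        # model variance (illustrative value)
J1 = rng.standard_normal((5, 30))    # rows: measurements without source i
J2 = rng.standard_normal((3, 30))    # rows: measurements with source i (full row rank)

# Conditional covariance, Eq. 21 (Schur complement of J2 J2^T)
cond_cov = sigma_c * (J1 @ J1.T - J1 @ J2.T @ np.linalg.solve(J2 @ J2.T, J2 @ J1.T))

# Same quantity via the pseudo-inverse: sigma_c * J1 (I - J2^+ J2) J1^T
J2_pinv = J2.T @ np.linalg.inv(J2 @ J2.T)
cond_cov_pinv = sigma_c * J1 @ (np.eye(30) - J2_pinv @ J2) @ J1.T

# Limiting case: square, invertible J2, so J2^+ J2 = I and d1 is fully predictable
J2_sq = rng.standard_normal((30, 30))
cond_cov_sq = J1 @ J1.T - J1 @ J2_sq.T @ np.linalg.solve(J2_sq @ J2_sq.T, J2_sq @ J1.T)
```

For full-row-rank $J_2$, the explicit formula agrees with NumPy's general `np.linalg.pinv`, which is why the remark's definition of $J_2^\dagger$ is safe to use here.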

Now, a source can be removed safely if the conditional entropy $H(d_1 \mid d_2)$ is low compared with $H(d_1)$, because in that case, measuring $d_2$ significantly decreases the uncertainty in $d_1$. In other words, the information in the measurements $d_2$ (made with the source optode under test) is largely shared with the measurements $d_1$ (made without this source). This can also be expressed by introducing the mutual information $\mathrm{MI}(d_1, d_2) := H(d_1) - H(d_1 \mid d_2)$, which quantifies the reduction of uncertainty in $d_1$ when $d_2$ is known beforehand (and, since MI is symmetric, vice versa). If the measurements $d_2$ of the currently considered source have a high mutual information with all other measurements, the source can be removed from the pool, as its information is also present in $d_1$ to a large extent.

Writing the negative mutual information with Eqs. 19, 20, and 21 gives

$$-\mathrm{MI}(d_1, d_2) = \frac{1}{2}\log\left\{\det\left[J_1 J_1^T - J_1 J_2^T (J_2 J_2^T)^{-1} J_2 J_1^T\right] \big/ \det(J_1 J_1^T)\right\}. \tag{22}$$

Using

$$\det{}^{-1}(J_1 J_1^T) = \det\left[(J_1 J_1^T)^{-1/2}\right]\det\left[(J_1 J_1^T)^{-1/2}\right], \tag{23}$$

we obtain

$$-\mathrm{MI}(d_1, d_2) = \frac{1}{2}\log\left\{\det\left[I - (J_1 J_1^T)^{-1/2} J_1 J_2^T (J_2 J_2^T)^{-1} J_2 J_1^T (J_1 J_1^T)^{-1/2}\right]\right\}. \tag{24}$$

A major drawback of this technique is that computing the mutual information requires significant effort because of the matrix inversions and determinants in Eq. 24, and this computation has to be repeated for every candidate source in every iteration of the optode elimination. This renders such an approach computationally infeasible, which is why an alternative method was implemented instead.
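The following Python sketch evaluates the negative mutual information in three equivalent ways: as the entropy difference from Eqs. 19 and 20, as the determinant ratio of Eq. 22, and in the normalized form of Eq. 24. The matrix sizes, the random stand-in matrices, and $\sigma_c$ are assumptions chosen only for the demonstration; `np.linalg.slogdet` is used to avoid overflow in the determinants.

```python
import numpy as np


def inv_sqrt_spd(A):
    """A^(-1/2) for a symmetric positive-definite matrix via eigendecomposition."""
    evals, V = np.linalg.eigh(A)
    return (V / np.sqrt(evals)) @ V.T


def gaussian_entropy(cov):
    """Entropy of a multivariate normal distribution (pattern of Eq. 18)."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)


rng = np.random.default_rng(5)
sigma_c = 2.0
J1 = rng.standard_normal((6, 40))    # measurements with the other sources
J2 = rng.standard_normal((4, 40))    # measurements with source i

P = J1 @ J1.T
G = J1 @ J2.T @ np.linalg.solve(J2 @ J2.T, J2 @ J1.T)
S = P - G                                            # bracket of Eq. 21

neg_mi_entropy = gaussian_entropy(sigma_c * S) - gaussian_entropy(sigma_c * P)  # Eqs. 19, 20
neg_mi_eq22 = 0.5 * (np.linalg.slogdet(S)[1] - np.linalg.slogdet(P)[1])
P_isqrt = inv_sqrt_spd(P)
neg_mi_eq24 = 0.5 * np.linalg.slogdet(np.eye(6) - P_isqrt @ G @ P_isqrt)[1]
```

All three values should coincide, and $-\mathrm{MI} \le 0$ because conditioning never increases the entropy of a Gaussian.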

### 3.1. Redundancy Reduction

A different method was originally developed by Michelini and Lomax and published by Curtis.^{16} They quantified the independence of two measurements by computing the normalized inner product, i.e., the cosine of the angle, between the corresponding rows of the sensitivity matrix $J$. The algorithm then has to find the set of measurements that is closest to an orthogonal set.

Using the same notation as in the previous section, the square of the cosine of the angle between two measurements, one made with source $i$ and one made with another source, is given by the term

$$\frac{(j_m, j_n)^2}{\|j_m\|^2\,\|j_n\|^2}, \qquad m \in \mathcal{S}_i, \quad n \in \mathcal{M}\setminus\mathcal{S}_i, \tag{25}$$

where $j_m$ denotes the $m$th row of $J$ and $(\cdot,\cdot)$ the Euclidean inner product. Now, consider the average squared cosine between all measurements made with source $i$ and those made with the other sources. This leads to the expression

$$r_i := \frac{1}{|\mathcal{M}\setminus\mathcal{S}_i|}\sum_{n \in \mathcal{M}\setminus\mathcal{S}_i}\frac{1}{|\mathcal{S}_i|}\sum_{m \in \mathcal{S}_i}\frac{(j_m, j_n)^2}{\|j_m\|^2\,\|j_n\|^2} = \frac{1}{|\mathcal{M}\setminus\mathcal{S}_i|\,|\mathcal{S}_i|}\sum_{n \in \mathcal{M}\setminus\mathcal{S}_i}\sum_{m \in \mathcal{S}_i}\frac{(j_m, j_n)^2}{\|j_m\|^2\,\|j_n\|^2}, \tag{26}$$

which we call the *redundancy* of source $i$. The quantity

$$q_i := 1 - r_i = 1 - \frac{1}{|\mathcal{M}\setminus\mathcal{S}_i|\,|\mathcal{S}_i|}\sum_{n \in \mathcal{M}\setminus\mathcal{S}_i}\sum_{m \in \mathcal{S}_i}\frac{(j_m, j_n)^2}{\|j_m\|^2\,\|j_n\|^2} \tag{27}$$

then serves as a quality measure of source $i$: the more redundant a source, the lower its quality $q_i$.

The optimization algorithm starts with a set of feasible optodes. It then iteratively calculates the quality measure for every source and removes the one with the lowest measure from the optode pool. This is repeated until a given stopping criterion is met. The optodes left in the pool are considered to be the best source optodes for the given geometry. The same procedure can be applied to the set of detectors. The measurement setup is then found by combining the sets of best sources and best detectors.
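The elimination loop described above can be sketched in Python with NumPy. This is a minimal illustration under assumptions, not the authors' implementation: `J` has one row per source/detector pair, the hypothetical array `src_of_row` maps each row to its source, the stopping criterion is a target number of sources, and the random `J` in the usage lines stands in for a real FDOT Jacobian.

```python
import numpy as np


def source_quality(J, src_of_row, sources):
    """Quality q_i = 1 - r_i (Eq. 27) for every source still in the pool."""
    rows = np.isin(src_of_row, sources)          # measurements of the remaining sources
    Jn = J[rows] / np.linalg.norm(J[rows], axis=1, keepdims=True)  # unit-length rows
    cos2 = (Jn @ Jn.T) ** 2                      # squared cosines between all row pairs
    owner = src_of_row[rows]
    q = {}
    for s in sources:
        mine = owner == s                        # rows in S_i
        # mean over m in S_i and n in M \ S_i of the squared cosine (Eq. 26)
        q[s] = 1.0 - cos2[np.ix_(mine, ~mine)].mean()
    return q


def eliminate_sources(J, src_of_row, n_keep):
    """Greedy backward elimination: repeatedly drop the source with the lowest q_i."""
    pool = list(np.unique(src_of_row))
    removal_order = []
    while len(pool) > n_keep:
        q = source_quality(J, src_of_row, pool)
        worst = min(pool, key=lambda s: q[s])    # most redundant source
        pool.remove(worst)
        removal_order.append(worst)
    return pool, removal_order


# Usage with a random stand-in Jacobian: 6 sources x 4 detectors, 50 elements
rng = np.random.default_rng(0)
J = rng.standard_normal((24, 50))
src_of_row = np.repeat(np.arange(6), 4)          # rows 0-3 -> source 0, and so on
best, order = eliminate_sources(J, src_of_row, n_keep=3)
```

The list `order` corresponds to the optode rank discussed in the results (the iteration in which each source leaves the pool); running the same function with a detector-to-row map would adapt the detectors.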

#### 3.1.1. Geometric Averaging

In the previous section, the single-optode redundancies were averaged intuitively using the arithmetic mean. However, other averages are conceivable as well, for example, one based on the geometric mean:

$$q_{g,i} := \left\{\prod_{n \in \mathcal{M}\setminus\mathcal{S}_i}\left[1 - \frac{1}{|\mathcal{S}_i|}\sum_{m \in \mathcal{S}_i}\frac{(j_m, j_n)^2}{\|j_m\|^2\,\|j_n\|^2}\right]\right\}^{1/|\mathcal{M}\setminus\mathcal{S}_i|}. \tag{28}$$

### 3.2. Relation to Entropy Optimization

In this section, a link between the mutual information optimization and the redundancy minimization technique is established. To relate the redundancy to entropy, Eq. 26 has to be brought into a matrix formulation. Using the partitioned sensitivity defined in Eq. 15 and the notation

$$\mathrm{diag}^{-1/2}(J J^T) = \begin{pmatrix} \frac{1}{\|j_1\|} & & \\ & \ddots & \\ & & \frac{1}{\|j_M\|} \end{pmatrix}, \tag{29}$$

the redundancy can be written as

$$r_i = \frac{1}{|\mathcal{M}\setminus\mathcal{S}_i|\,|\mathcal{S}_i|}\sum_{k=1}^{|\mathcal{M}\setminus\mathcal{S}_i|}\sum_{l=1}^{|\mathcal{S}_i|}\left[\mathrm{diag}^{-1/2}(J_1 J_1^T)\,J_1 J_2^T\,\mathrm{diag}^{-1/2}(J_2 J_2^T)\right]_{k,l}^2. \tag{30}$$

Since the sum of the squared entries of a matrix $C$ equals $\mathrm{tr}(C C^T)$, this becomes

$$r_i = \frac{1}{|\mathcal{M}\setminus\mathcal{S}_i|\,|\mathcal{S}_i|}\,\mathrm{tr}\left[\mathrm{diag}^{-1/2}(J_1 J_1^T)\,J_1 J_2^T\,\mathrm{diag}^{-1}(J_2 J_2^T)\,J_2 J_1^T\,\mathrm{diag}^{-1/2}(J_1 J_1^T)\right], \tag{31}$$

and hence

$$q_i = \frac{1}{|\mathcal{M}\setminus\mathcal{S}_i|}\,\mathrm{tr}\left[I - \frac{1}{|\mathcal{S}_i|}\,\mathrm{diag}^{-1/2}(J_1 J_1^T)\,J_1 J_2^T\,\mathrm{diag}^{-1}(J_2 J_2^T)\,J_2 J_1^T\,\mathrm{diag}^{-1/2}(J_1 J_1^T)\right]. \tag{32}$$

A similar derivation can be carried out for the geometric quality measure introduced in Eq. 28. The final result is

$$q_{g,i} = \left\{\det\,\mathrm{diag}\left[I - \frac{1}{|\mathcal{S}_i|}\,\mathrm{diag}^{-1/2}(J_1 J_1^T)\,J_1 J_2^T\,\mathrm{diag}^{-1}(J_2 J_2^T)\,J_2 J_1^T\,\mathrm{diag}^{-1/2}(J_1 J_1^T)\right]\right\}^{1/|\mathcal{M}\setminus\mathcal{S}_i|}. \tag{33}$$

Comparing Eqs. 32, 33, and 24, one notices interesting similarities in the structure of these equations, although they are not the same. The mutual information formula (Eq. 24) combines the whole conditional covariance matrix into a single quality measure using the determinant. The other two equations, based on the redundancy, operate only on the diagonal parts of the matrix (the variances) and neglect the covariances completely.

The original formulation based on the arithmetic mean (Eq. 32) has the disadvantage that, in general, there is no strong relationship between the trace of a (symmetric positive-definite) covariance matrix and its determinant. In fact, it is easy to construct two setups for which the trace increases while the determinant decreases.
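A minimal illustration of this point uses two hypothetical $2\times 2$ diagonal covariance matrices (chosen only for illustration, not taken from any optode setup):

$$C_1 = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}:\ \mathrm{tr}\,C_1 = 2,\ \det C_1 = 1; \qquad C_2 = \begin{pmatrix} 2.5 & 0\\ 0 & 0.1 \end{pmatrix}:\ \mathrm{tr}\,C_2 = 2.6,\ \det C_2 = 0.25.$$

Going from $C_1$ to $C_2$, the trace grows by 30% while the determinant drops by a factor of four, so a trace-based criterion and a determinant-based (entropy) criterion can rank the two setups in opposite order.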

The newly introduced geometric averaging (Eq. 33) resembles the entropy optimization much more closely, since it multiplies the diagonal entries, i.e., it takes the determinant of the diagonal part. By the Cauchy-Schwarz inequality $\mathrm{cov}^2(x,y) \leqslant \mathrm{var}(x)\,\mathrm{var}(y)$, the covariances are bounded by the variances, so a reduction in the variances also reduces the possible magnitude of the covariances. Thus, we can argue that a decrease in the geometrically weighted redundancy quality measure $q_{g,i}$ tends to decrease $(-\mathrm{MI})$ and therefore to increase the mutual information. In other words, an optode that is highly redundant is likely to exhibit a high mutual information between the measurements associated with that optode and all other measurements.
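The equivalence of the summation forms (Eqs. 27 and 28) and the matrix forms (Eqs. 32 and 33) can be checked numerically. The following Python sketch assumes a random stand-in for `J` (not an FDOT Jacobian) and an arbitrary choice of which rows form $\mathcal{S}_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.standard_normal((12, 40))                 # stand-in sensitivity matrix
in_S = np.zeros(12, dtype=bool)
in_S[:3] = True                                   # rows measured with source i (S_i)
J1, J2 = J[~in_S], J[in_S]                        # Eq. 15: J1 <-> M\S_i, J2 <-> S_i
n1, n2 = len(J1), len(J2)

# Summation forms: squared cosine for every pair (n in M\S_i, m in S_i)
cos2 = np.array([[(jn @ jm) ** 2 / ((jn @ jn) * (jm @ jm)) for jm in J2] for jn in J1])
inner = 1.0 - cos2.mean(axis=1)                   # bracket of Eq. 28, one entry per n
q_sum = 1.0 - cos2.mean()                         # Eq. 27
qg_sum = np.prod(inner) ** (1.0 / n1)             # Eq. 28

# Matrix forms built from diag^{-1/2}(J1 J1^T) and diag^{-1}(J2 J2^T)
D1 = np.diag(1.0 / np.sqrt(np.diag(J1 @ J1.T)))
D2 = np.diag(1.0 / np.diag(J2 @ J2.T))
P = np.eye(n1) - D1 @ J1 @ J2.T @ D2 @ J2 @ J1.T @ D1 / n2
q_mat = np.trace(P) / n1                          # Eq. 32
qg_mat = np.prod(np.diag(P)) ** (1.0 / n1)        # Eq. 33: determinant of the diagonal part
```

By the inequality of arithmetic and geometric means, the geometric measure never exceeds the arithmetic one ($q_{g,i} \leq q_i$), because both average the same bracket terms of Eq. 28.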

### 3.3. Focusing

In certain applications, it can be advantageous to bias the arrangement of the optodes in order to reach a higher sensitivity in a specified region, which usually goes along with a higher resolution and/or a better contrast-to-noise ratio there. This can be used, for example, to focus the reconstruction on certain organs.

A simple approach to achieve focusing is to weight the columns of the sensitivity matrix with a predefined weighting mask $f$:

$$J_F = J\,\mathrm{diag}(f), \tag{34}$$

where $J_F$ denotes the focused sensitivity matrix. This matrix is then used in the adaptation algorithm described earlier. In the simplest case, $f$ is a binary vector whose entries are one in the region of interest and zero everywhere else. Smooth variations of the mask are possible as well. In general, we can assume that $0 \leqslant f_i \leqslant 1$.
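Column weighting by a binary mask can be sketched in a few lines of Python. The element centroids below are hypothetical random points in a cylinder of radius 30 mm and height 90 mm, assumed to be centered at $z = 0$; the masks follow the definitions of regions A and C given in the results, and the random `J` is only a stand-in.

```python
import numpy as np

rng = np.random.default_rng(2)
n_elem = 200
# Hypothetical element centroids (mm) inside a cylinder, assumed centered at z = 0
r = 30.0 * np.sqrt(rng.random(n_elem))
ang = 2 * np.pi * rng.random(n_elem)
centroids = np.column_stack([r * np.cos(ang), r * np.sin(ang), rng.uniform(-45, 45, n_elem)])
J = rng.standard_normal((16, n_elem))             # stand-in sensitivity matrix

x, z = centroids[:, 0], centroids[:, 2]
f_A = ((z > 10) & (z < 20)).astype(float)                  # region A: 10 mm < z < 20 mm
f_C = ((z > -20) & (z < -10) & (x > 0)).astype(float)      # region C: half-cylinder slice


def focus(J, f):
    """Weight column j of J by f[j]; equivalent to J @ diag(f) in Eq. 34."""
    return J * f[np.newaxis, :]


J_F = focus(J, f_C)
```

Broadcasting avoids forming the dense $\mathrm{diag}(f)$ matrix, which matters for meshes with tens of thousands of elements; a smooth mask can be passed in the same way.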

## 4. Results

### 4.1. Adapted Configurations

The optode adaptation was performed on a cylinder with a height of $90\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ and a radius of $30\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ , which mimics a small animal. The values of the optical properties can be found in Table 1 . Equations and estimates of these parameters can be found in Refs. 5, 17, 18.

## Table 1

Values of optical parameters used for the forward simulation (Refs. 5, 17, 18).

| | μs′ (mm⁻¹) | μa,i (mm⁻¹) | c·ε (mm⁻¹) | R |
|---|---|---|---|---|
| Excitation | 0.275 | 0.036 | $83.5\times 10^{-3}$ | 2.51 |
| Emission | 0.235 | 0.029 | $28.1\times 10^{-3}$ | 2.51 |

A regular grid with 48 source and 48 detector nodes was specified as the initial pool of feasible optode positions. The optodes were arranged in a zigzag-like pattern on six rings with a spacing of $10\ \mathrm{mm}$ (see Fig. 1). The adaptation algorithm uses the desired number of sources and detectors as its stopping criterion; both were set to eight.

Three different focus regions were chosen to demonstrate the focusing capability. Region A consists of the voxels in the cylinder slice given by $10\phantom{\rule{0.3em}{0ex}}\mathrm{mm}<z<20\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ , region B is another slice defined by $-7.5\phantom{\rule{0.3em}{0ex}}\mathrm{mm}<z<7.5\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ , and region C is the half-cylinder slice $-20\phantom{\rule{0.3em}{0ex}}\mathrm{mm}<z<-10\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ and $x>0\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ . The adaptation was performed on a finite element mesh with approximately 30,000 elements.

Looking at the outcome of the adaptation procedure, one notices that the algorithm tends to concentrate the final sources and detectors near the focus regions, which is desired because the sensitivity is usually highest close to the optodes. The result of adapting to the uppermost focus region in Fig. 1 could also have been suggested intuitively. On the other hand, the best configuration in Fig. 1 is symmetric around the cylinder’s axis but asymmetric with respect to its midplane.

We also compared the geometric averaging method with the original formulation published in Ref. 16. The result of the adaptation using arithmetic averaging is largely equivalent to that of the geometric averaging method, so only the differences are indicated in Fig. 1. Focusing on regions A and B resulted in exactly the same optode set. Only when focusing on the half-cylinder slice (region C) did two detectors change their location; their positions in Fig. 1 are labeled G for the geometric and A for the original arithmetic averaging. This result is expected, as the geometric averaging method replaces only one of the sums by a product, so there should not be a dramatic difference in the best optode configuration. From these outcomes, one can conclude that both the geometric and the arithmetic adaptation methods result in optode configurations that are meaningful and comprehensible.

Figure 2 shows the rank of the optodes for the geometric averaging method, i.e., the iteration in which each optode was removed from the pool of feasible optodes. The lighter the color, the longer the optode remained in the feasible pool. There is a general tendency to remove optodes far from the focus region early in the adaptation process, as can be seen for the lower optode rings or for rings 1 and 6 in Fig. 2. However, this is no longer the case when the region of interest is the half-cylinder slice: here, the first optodes to be removed are those on ring 2 opposite the focus region, and some optodes on ring 5 stay longer in the feasible pool. This may be explained by the fact that off-plane information can improve the resolution in the direction of the cylinder axis.

### 4.2. Comparison of Reconstructed Images

To provide evidence that the adapted arrangements improve reconstruction results, simple symmetrical optode arrangements were compared with the results of the adaptation algorithms. To quantify the reconstructed images objectively, a reconstruction with the full set of optodes was used as the gold standard, as this is the best reconstruction achievable under the given circumstances.

First, four fluorescent inclusions with a diameter of $5\ \mathrm{mm}$ were placed inside focus region B. Every second source and detector optode on rings 3 and 4 was chosen as an intuitive arrangement. This configuration resulted in a relative reconstruction error of 46% compared with the reconstruction with all optodes. The reconstruction with the best set of optodes for region B (which is the same for both adaptation methods) yielded a relative error of 42%.

For the second test case, a single $5\text{-}\mathrm{mm}$ sphere was placed inside focus region C. For the intuitive arrangement (all optodes on ring 2), the relative error with respect to the best possible reconstruction was 25.6%. The optode set adapted with geometric averaging resulted in a relative error of 16.9%, and the best configuration from the adaptation with arithmetic averaging yielded an error of 17.4%.
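The text does not state which norm underlies the relative error; a common choice, assumed in this Python sketch, is the relative Euclidean norm between a reconstruction and the full-pool reference. The concentration vectors are made-up toy values, not data from the paper.

```python
import numpy as np


def relative_error(c_rec, c_ref):
    """Assumed definition: ||c_rec - c_ref||_2 / ||c_ref||_2."""
    c_rec, c_ref = np.asarray(c_rec, float), np.asarray(c_ref, float)
    return np.linalg.norm(c_rec - c_ref) / np.linalg.norm(c_ref)


# Toy element-wise concentrations: reference (full optode pool) and two candidate setups
c_ref = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
c_intuitive = np.array([0.2, 0.6, 1.5, 0.9, 0.3])
c_adapted = np.array([0.1, 0.8, 1.8, 1.0, 0.1])
print(relative_error(c_intuitive, c_ref), relative_error(c_adapted, c_ref))
```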

For the focus region A, the adapted optode configurations are identical to the intuitive one.

As an objective quality measure for the sensitivity matrix of the adapted optodes, its singular values or its condition number can be used. The largest and smallest singular values together with the ratio between them are listed in Table 2. The full optode configuration has a very high ratio of about $6\cdot 10^9$ and is thus ill-conditioned. The focused designs show a singular value (SV) ratio that is reduced by a factor of roughly $10^4$ to $10^6$. This is exactly what the redundancy minimization algorithm intends, as removing nonorthogonal rows from the sensitivity matrix improves its conditioning. As the adaptation methods with geometric and arithmetic averaging resulted in nearly identical optode configurations, the difference in singular values when focusing on region C is rather small.

## Table 2

Singular value (SV) analysis for the full sensitivity matrix and the adapted configurations with focusing on different regions. The table lists the largest and smallest singular values as well as the ratio between them (the condition number), which is a measure of stability for matrix inversion.

| Averaging | Design | Max SV | Min SV | Ratio |
|---|---|---|---|---|
| | Full pool of optodes | $3.98\times 10^{-5}$ | $6.19\times 10^{-15}$ | $6.42\times 10^{9}$ |
| Geometric | Focus on region A | $3.60\times 10^{-5}$ | $1.57\times 10^{-10}$ | $2.29\times 10^{5}$ |
| Geometric | Focus on region B | $3.26\times 10^{-5}$ | $4.24\times 10^{-9}$ | $7.67\times 10^{3}$ |
| Geometric | Focus on region C | $3.52\times 10^{-5}$ | $2.51\times 10^{-9}$ | $1.40\times 10^{4}$ |
| Arithmetic | Focus on region A | $3.60\times 10^{-5}$ | $1.57\times 10^{-10}$ | $2.29\times 10^{5}$ |
| Arithmetic | Focus on region B | $3.26\times 10^{-5}$ | $4.24\times 10^{-9}$ | $7.67\times 10^{3}$ |
| Arithmetic | Focus on region C | $3.51\times 10^{-5}$ | $2.31\times 10^{-9}$ | $1.52\times 10^{4}$ |
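The quantities in Table 2 can be obtained from a singular value decomposition. The Python sketch below uses a random stand-in matrix whose rows come in nearly parallel pairs (an assumption meant to mimic redundant measurements, not the FDOT Jacobian) and shows how dropping the redundant rows lowers the SV ratio.

```python
import numpy as np


def sv_summary(J):
    """Largest and smallest singular values and their ratio (condition number)."""
    s = np.linalg.svd(J, compute_uv=False)        # sorted in descending order
    return s[0], s[-1], s[0] / s[-1]


rng = np.random.default_rng(6)
base = rng.standard_normal((8, 60))
# Stand-in for a full pool: each base row plus a nearly parallel (redundant) copy
J_full = np.vstack([base, base + 1e-4 * rng.standard_normal((8, 60))])
J_reduced = base                                  # redundant copies removed

print(sv_summary(J_full))
print(sv_summary(J_reduced))
```

For a nonsquare matrix, this ratio is the 2-norm condition number that `np.linalg.cond` also returns.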

### 4.3. Robustness

The robustness of the adaptation method was tested by a Monte Carlo simulation. First, the best adapted configurations using either the arithmetic or the geometric averaging were defined as reference sets. In each run of the Monte Carlo simulation, the optodes in the initial pool were shifted randomly by up to $\pm 1\ \mathrm{mm}$ along the $z$ axis and along the cylinder perimeter (i.e., 10% of the distance between two optodes in the $z$ direction and 15% along the perimeter), after which the adaptation procedure was applied to the shifted optodes. Out of 50 Monte Carlo trials, Table 3 lists the number of trials for which the adapted set differed from the reference set (obtained with the unshifted initial optode pool) by at most $n_{\mathrm{miss}}$ optodes, for $n_{\mathrm{miss}} = 0, \ldots, 4$.

## Table 3

Number of Monte Carlo trials with shifted initial optode positions that resulted in an adapted optode set that did not differ from the reference set by more than nmiss (cumulated).

Averaging | Focus | $n_{\mathrm{miss}}=0$ | $\le 1$ | $\le 2$ | $\le 3$ | $\le 4$ |
---|---|---|---|---|---|---|
Geometric | Region A | 50 | 50 | 50 | 50 | 50 |
Geometric | Region B | 11 | 19 | 30 | 36 | 43 |
Geometric | Region C | 2 | 25 | 37 | 46 | 49 |
Arithmetic | Region A | 50 | 50 | 50 | 50 | 50 |
Arithmetic | Region B | 11 | 23 | 34 | 40 | 45 |
Arithmetic | Region C | 27 | 47 | 50 | 50 | 50 |

Shifting the optodes has no influence when focusing on region A: both averaging methods return the reference solution in every Monte Carlo trial. In the other two cases, where the best solution is less obvious, the stability decreases somewhat. This is especially true when focusing on region B. One reason is that in this setup, the best set of optodes can be rotated about the cylinder's axis or mirrored at the $xy$-plane, and the resulting configuration should be equivalent to the reference solution with respect to redundancy in the measurement data.

For Fig. 3, the number of times each optode appeared in the final set was counted, encoded in the marker size, and drawn on the cylinder surface. The larger an optode's circle (source) or square (detector), the more often this optode appears in the outcome of the adaptation process and the more stable it is. The reference result, for which the initial pool of optodes was left unshifted, is marked with an $x$. Clearly, the optodes are always placed near the focus region.
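The frequency encoding can be sketched as follows. This is an illustration only: the linear mapping from frequency to marker size and the `max_size` parameter are assumptions, not the scaling used for Fig. 3.

```python
from collections import Counter

def optode_frequencies(adapted_sets):
    """Count how often each optode appears in the adapted sets of all trials."""
    counts = Counter()
    for optodes in adapted_sets:
        counts.update(set(optodes))  # each optode counted at most once per trial
    return counts

def marker_sizes(counts, n_trials, max_size=100.0):
    """Map selection frequency to a marker size (linear scaling is an assumption)."""
    return {optode: max_size * c / n_trials for optode, c in counts.items()}
```

An optode selected in every trial receives `max_size`; rarely selected optodes shrink proportionally.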

Finally, Table 4 shows the mean and standard deviation of the quality measure from Eqs. 27 and 28 for the best and worst source and detector optodes. The standard deviations are small, which indicates that even when the adaptation method does not find the optodes of the reference set, the resulting optodes still have a high quality, i.e., their measurements exhibit little redundancy.
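A small sketch of how such best/worst statistics can be aggregated over Monte Carlo runs, assuming the per-optode quality values of Eqs. 27 and 28 are already available for each run (they are not computed here). Whether the sample or population standard deviation was used is not stated; the sample version is an assumption.

```python
import numpy as np

def best_worst_stats(quality_per_run):
    """quality_per_run: one 1-D array per Monte Carlo run holding the quality
    measure of every selected optode of one kind (e.g., all selected sources).
    Returns ((mean, std) of the per-run best, (mean, std) of the per-run worst)."""
    best = np.array([np.max(q) for q in quality_per_run])
    worst = np.array([np.min(q) for q in quality_per_run])
    # Sample standard deviation (ddof=1) is an assumption.
    return (best.mean(), best.std(ddof=1)), (worst.mean(), worst.std(ddof=1))
```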

## Table 4

Mean and standard deviation of the quality measure for the best and worst source and detector optodes over all Monte Carlo runs.

Averaging | Focus | Best source | Worst source | Best detector | Worst detector |
---|---|---|---|---|---|
Geometric | Region A | $0.9549\pm 0.0016$ | $0.9482\pm 0.0013$ | $0.9113\pm 0.0019$ | $0.9025\pm 0.0022$ |
Geometric | Region B | $0.9544\pm 0.0035$ | $0.9393\pm 0.0033$ | $0.9109\pm 0.0018$ | $0.9003\pm 0.0018$ |
Geometric | Region C | $0.9094\pm 0.0069$ | $0.8404\pm 0.0062$ | $0.8554\pm 0.0064$ | $0.7786\pm 0.0061$ |
Arithmetic | Region A | $0.9565\pm 0.0015$ | $0.9503\pm 0.0012$ | $0.9148\pm 0.0017$ | $0.9069\pm 0.0020$ |
Arithmetic | Region B | $0.9553\pm 0.0031$ | $0.9420\pm 0.0033$ | $0.9140\pm 0.0022$ | $0.9037\pm 0.0019$ |
Arithmetic | Region C | $0.9146\pm 0.0047$ | $0.8527\pm 0.0055$ | $0.8645\pm 0.0046$ | $0.7940\pm 0.0045$ |

## 4.4.

### Off-Focus Signal Suppression

Figure 4 shows example reconstructions based on synthetic measurement data for three spherical perturbations with a diameter of $5\,\mathrm{mm}$, using the three different focusing strategies together with geometric averaging; the figure also shows the exact locations of the perturbations. The optical properties are the same as listed in Table 1. The synthetic data were generated using a finer finite element mesh, and 5% noise was added. Every configuration suppresses the signal from outside its focus region quite effectively. The reconstruction is best for the lower sphere because the optode configuration was adapted to a smaller focus volume there.
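The text does not specify the noise model behind "5% noise". One common reading, used in the sketch below purely as an assumption, is zero-mean Gaussian noise whose standard deviation is 5% of each noise-free measurement.

```python
import numpy as np

def add_relative_noise(data, level=0.05, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `level` times each
    noise-free value (an assumed noise model, not taken from the paper)."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    return data * (1.0 + level * rng.standard_normal(data.shape))
```

An additive model with a fixed standard deviation would be an equally plausible alternative; the choice affects how strongly weak boundary signals are corrupted.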

## 5.

## Discussion

The locations of the sources and detectors are critical design parameters for FDOT hardware. A configuration determines both the sensitivity of the measurements in a given region and the obtainable resolution. The method presented in this paper is based on a simulation model and can thus be used before any hardware is built.

To our knowledge, comparisons of different hardware configurations for fluorescence tomography have previously been reported by Graves^{2} and Lasser.^{3} Their approaches were based on an SV analysis of the so-called weight matrix. This matrix is essentially the sensitivity matrix used in this paper, but it was obtained using the normalized Born approximation.^{19}

In contrast to these previous methods, the adaptation algorithm used herein operates on the complete derivative of the diffusion approximation, Eq. 7. Therefore, it also accounts for the change in the excitation field caused by a perturbation of the fluorophore distribution, which is neglected in the commonly used first-order Born approximations.

The SV analysis implemented by Graves and Lasser requires a singular value decomposition (SVD) of the sensitivity matrix, which demands a large amount of computation time. As an example, computing a single SVD of a matrix of size $48\cdot 48\times 30{,}000$ takes about $50\,\mathrm{min}$ on a dual-core processor. Therefore, SV optimization is limited to 2-D applications or to rather simple 3-D geometries that can be modeled with fewer finite elements. The redundancy reduction algorithm does not suffer from these limitations, as it needs to compute only inner products of rows of the sensitivity matrix. The matrix is built efficiently using an implicit function formulation that requires only one additional finite element solution per optode. Furthermore, both the assembly of the sensitivity matrix and the calculation of the inner products can be parallelized with moderate effort (the basic principle can be found in Ref. 20, for example), which would accelerate the method further.
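To illustrate why row inner products are cheap and easy to parallelize, the sketch below assembles all pairwise inner products of the rows of a matrix (the Gram matrix $J J^{T}$) block by block. For the $2304\times 30{,}000$ example above, a dense double-precision matrix occupies about 0.55 GB. The function name and block size are hypothetical, and the redundancy measure of this paper may need only a subset of these products.

```python
import numpy as np

def row_inner_products(J, block_rows=256):
    """All pairwise inner products of the rows of J (the Gram matrix J @ J.T),
    assembled block by block. Each block depends only on J, so the loop
    iterations are independent and could be distributed over workers."""
    m = J.shape[0]
    G = np.empty((m, m), dtype=J.dtype)
    for start in range(0, m, block_rows):
        stop = min(start + block_rows, m)
        G[start:stop] = J[start:stop] @ J.T
    return G
```

Unlike an SVD, no factorization is involved; the work is plain matrix products that map well onto BLAS and onto several cores.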

The algorithm we implemented starts with a complete optode arrangement and iteratively discards the optode with the lowest quality measure. At each step, however, the algorithm cannot determine whether it would be advantageous later to keep the worst optode and dismiss another one from the pool instead. Thus, the drawback of this simple top-down strategy is that it cannot guarantee finding the global minimum. A full search, on the other hand, is a nondeterministic polynomial time (NP) problem and would require the evaluation of around ${10}^{18}$ different configurations for the rather simple geometry demonstrated earlier, which is computationally not feasible. A viable alternative could be stochastic methods, which have a long tradition in numerical optimization. A recent promising approach formulates the problem as a distributed control problem with sparsity constraints,^{21, 22} which computes an “optode field” that is nonzero only at discrete points. These points could be taken as the optimal locations for the optodes. In addition, the solution would indicate the optimal strength of the sources.
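The top-down strategy can be sketched as a greedy backward elimination. The quality function here is a hypothetical stand-in (one minus the largest absolute cosine similarity between an optode's sensitivity vector and those of the other remaining optodes), not the measure of Eqs. 27 and 28, and each optode is simplified to a single sensitivity vector.

```python
import numpy as np

def cosine_quality(rows: np.ndarray, selected: list[int]) -> dict[int, float]:
    """Hypothetical quality: 1 - max |cosine similarity| to any other selected row."""
    U = rows[selected] / np.linalg.norm(rows[selected], axis=1, keepdims=True)
    C = np.abs(U @ U.T)
    np.fill_diagonal(C, 0.0)  # ignore self-similarity
    return {o: 1.0 - C[i].max() for i, o in enumerate(selected)}

def backward_elimination(rows, n_keep, quality=cosine_quality):
    """Start from the full pool and repeatedly drop the lowest-quality optode."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    selected = list(range(rows.shape[0]))
    while len(selected) > n_keep:
        q = quality(rows, selected)  # recomputed after every removal
        worst = min(selected, key=q.__getitem__)
        selected.remove(worst)
    return selected
```

Because each removal is final, the loop is exactly the greedy scheme described above and shares its limitation: it cannot revisit an earlier decision.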

A great advantage of redundancy minimization is that it provides a quality measure for every single optode rather than comparing complete configurations, as SV analysis does. This makes it possible to start from a good configuration, which could even be obtained with a completely different method, and to refine the arrangement further by removing optodes with a poor quality measure.

To our knowledge, the possibility of choosing an optode configuration such that the reconstruction is focused on an *a priori* given region inside the tissue has not been investigated for fluorescence tomography before. Similar strategies have been reported for other diffusion-limited imaging modalities such as seismic tomography^{16} and magnetic induction tomography.^{23}

Figure 4 demonstrates the effect of focusing on the reconstruction: the best result is obtained when the optode configuration is adapted to a small volume around the field of view. In all three reconstructions, concentration changes outside the focus region are suppressed very well. This feature might prove advantageous in certain cases, for example, if the autofluorescence signal from other organs needs to be damped.

In Sec. 3.2, we attempted to link the redundancy measure to entropy and mutual information. The three quality criteria given in Eqs. 32, 33, and 24 are similar in structure, although not identical. The mutual information criterion operates on the full covariance matrix, whereas the redundancy criteria use only its diagonal part.

The MI optimization is computationally expensive because it requires matrix decompositions, matrix inversions, and the calculation of determinants. The redundancy reduction is much more efficient, as it mainly relies on inner products of matrix rows. However, this reduction in numerical effort comes at the cost of neglecting the off-diagonal matrix entries.
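The trade-off between the full covariance matrix and its diagonal can be illustrated generically with the differential entropy of a Gaussian, $h = \tfrac{1}{2}\left[n \log(2\pi e) + \log\det\Sigma\right]$. This is not a reproduction of Eqs. 32, 33, and 24; it only shows that the full version needs a determinant (a factorization), whereas the diagonal version needs only the variances, which for a hypothetical model $\Sigma = J J^{T} + \sigma^2 I$ are squared row norms of $J$ plus $\sigma^2$. By Hadamard's inequality, $\det\Sigma \le \prod_i \Sigma_{ii}$, so dropping the off-diagonal entries can only overestimate the entropy.

```python
import numpy as np

def gaussian_entropy(cov, diagonal_only=False):
    """Differential entropy of a zero-mean Gaussian with covariance `cov`.
    diagonal_only=True ignores all off-diagonal (cross-correlation) entries."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    if diagonal_only:
        logdet = np.sum(np.log(np.diag(cov)))  # O(n): variances only
    else:
        sign, logdet = np.linalg.slogdet(cov)  # O(n^3): needs a factorization
        if sign <= 0:
            raise ValueError("covariance must be positive definite")
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)
```

The two values coincide exactly when the measurements are uncorrelated, i.e., when the neglected off-diagonal entries are zero.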

In this paper, one of the simplest stopping criteria, the final number of sources and detectors, was chosen. The algorithm is flexible enough to include more elaborate criteria easily; for example, it could be desirable to specify the minimum required resolution or the minimum contrast-to-noise ratio. In further work, the feasibility of dynamic stopping criteria will also be investigated. The principle is to calculate an initial image quality criterion (e.g., the resolution) and then iteratively remove optodes from the feasible pool, stopping as soon as the image quality decreases significantly.
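A dynamic stopping criterion of this kind might look like the sketch below. Both callables are hypothetical placeholders: `drop_worst` stands for the optode selection step (e.g., lowest quality measure), and `image_quality` for a positive figure of merit where larger is better (e.g., a contrast-to-noise ratio). The relative tolerance that defines a "significant" decrease is an assumption.

```python
def eliminate_until_quality_drops(pool, drop_worst, image_quality,
                                  rel_tol=0.05, min_size=1):
    """Remove optodes one at a time and stop before the image quality falls
    more than rel_tol below its initial value (quality assumed positive).
    drop_worst(selected) -> optode to discard next
    image_quality(selected) -> scalar figure of merit, larger is better"""
    selected = list(pool)
    q0 = image_quality(selected)
    while len(selected) > min_size:
        candidate = list(selected)
        candidate.remove(drop_worst(selected))
        if image_quality(candidate) < (1.0 - rel_tol) * q0:
            break  # significant loss: keep the previous configuration
        selected = candidate
    return selected
```

Each candidate removal costs one image-quality evaluation, so the criterion should be cheap to compute compared with a full reconstruction.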

## 6.

## Conclusion

We have presented an algorithm to adaptively remove source and detector locations for fluorescence diffusion optical tomography. We also demonstrated that the resulting design can be biased toward sensitivity in a given region. In contrast to previously reported algorithms, the current formulation has a strong connection to entropy optimization.

## Acknowledgments

This work was supported by Project F3207-N18 granted by the Austrian Science Fund. The authors want to thank the referees for their helpful comments, which led to a significant improvement in the revision of this paper.

## References

*Applied Mathematical Sciences*, Springer-Verlag, New York (1995).