## 1.

## Introduction

## 1.1.

### Image Reconstruction and Regularization

Optical topography uses diffuse reflectance measurements on the surface of an object to derive its internal optical properties.^{1, 2} It can be used to measure functional brain activity, because the optical properties depend on the concentration of oxyhemoglobin and deoxyhemoglobin. The first optical topography images, such as those obtained using the Hitachi ETG-100 system,^{3} were obtained by interpolating the measurements into the image space. However, it is becoming increasingly common for optical topography images to be reconstructed by solving the inverse problem. Boas
^{4, 5} has shown that this provides improved spatial resolution and quantitative accuracy compared to interpolation. However, the inverse problem is ill-posed, which means that there is not a single, well-behaved solution. In order to obtain a meaningful solution, the problem must be regularized, for example, using Tikhonov regularization.

Tikhonov regularization requires a regularization parameter, here called $\lambda $ . This is most often determined heuristically, by subjectively selecting a value of $\lambda $ that appears by eye to give the “best” image. A number of more objective methods have been proposed for selecting $\lambda $ in optical topography and other inverse problems. In this paper, we review 11 methods that have been proposed for selecting $\lambda $ and apply them to the optical topography inverse problem. We concentrate on functional imaging of brain activity, where the measured changes are typically small and measurements are available before and after a small change in the optical properties. Under these conditions, the nonlinear image reconstruction problem can be linearized using the Rytov approximation. However, the problem remains highly ill-posed and underdetermined.

The above methods were initially applied to a deblurring problem, which is an ill-posed problem for which the solution is known. Thereafter, the same methods were applied to experimental optical topography data.

## 1.2.

### Regularization of Inverse Problems

Deblurring and image reconstruction are both discrete ill-posed inverse problems of the form $A\mathbf{x}=\mathbf{b}$ , where $\mathbf{b}$ is the data vector (length $m$ ), $\mathbf{x}$ is the vector of unknown parameters (length $n$ ) and A is a matrix of size $m\times n$ . For the deblurring problem, $\mathbf{b}$ is the blurred image, $\mathbf{x}$ is the original image, and A represents the blurring matrix. For the optical topography problem, $A$ is the sensitivity matrix, which maps the changes in the measured data $\mathbf{b}$ to the changes in the optical properties $\mathbf{x}$ .

The least-squares solution
$\widehat{\mathbf{x}}$
is simply
${\mathrm{min}}_{x}{\Vert A\mathbf{x}-\mathbf{b}\Vert}_{2}$
, but this is highly affected by noise and must therefore be regularized. A common method, and the one we choose to use here, is the zero-order Tikhonov regularization^{6, 7, 8}

## 1.

*data norm*. The norm ${\Vert \mathbf{x}\Vert}_{2}^{2}$ is a measure of the noise in the image and is sometimes called the regularized norm or solution norm. Here it is called the

*image norm*.

If $\lambda $ is increased, then the contribution of the image norm to the solution is increased and the solution becomes less sensitive to perturbations in the data. A smaller $\lambda $ emphasises the contribution from the data norm, effectively assuming that the quality of the data is good; thus, the solution is allowed to conform more closely to the measured data. In the case of Tikhonov regularization, $\lambda $ governs the level of smoothness enforced in the image.

A third norm, the *predictive norm*, is given by
${\Vert A{\mathbf{x}}_{\mathbf{\lambda}}-A{\mathbf{x}}_{\mathbf{\text{exact}}}\Vert}_{2}^{2}$
, where
${\mathbf{x}}_{\mathbf{\text{exact}}}$
is the exact solution. It requires knowledge of the exact solution, which in most real cases is unknown, or knowledge of the noise statistics.^{9}

We briefly mention the singular value decomposition (SVD), a method for reducing a matrix into constituent parts, which also provides a way of analyzing the ill-posedness of a problem.^{8, 10} The SVD of matrix
$A=US{V}^{T}={\sum}_{i=1}^{n}{u}_{i}{\sigma}_{i}{v}_{i}^{T}$
, where
$U$
and
$V$
are square, orthonormal matrices and
$S$
is a
$m\times n$
diagonal (though nonsquare) matrix. The components of
$S$
,
${\sigma}_{i}$
, are known as the *singular values* and are arranged in order of decreasing magnitude. The problem can be regularized either by setting all
${\sigma}_{i}<\lambda =0$
, or by weighting them, for example, by a factor
${f}_{i}={\sigma}_{i}^{2}\u2215({\sigma}_{i}^{2}+{\lambda}^{2})$
which is equivalent to Tikhonov regularization.

The rate at which
${\sigma}_{i}$
decreases is an indication of the ill-posedness of the problem. One measure of this is given by the discrete Picard condition (DPC).^{11}

DEFINITION 1.1. *The data vector*
$b$
*satisfies the discrete Picard condition if the data space coefficients*
$\mid {u}_{i}^{T}\mathbf{b}\mid $
*on average decay to zero faster than the singular values*
${\sigma}_{i}$
.

If the DPC is violated for a given problem, then one should question the validity of the solution. In ill-posed problems, we find that the DPC holds initially and then fails at some point ${i}_{\mathrm{DPC}}$ , where the data become dominated by errors. If this is the case, and if the regularization parameter $\lambda $ is accurately selected, then the regularized solution should provide a valid solution. Examining ${i}_{\mathrm{DPC}}$ provides a method of characterizing the ill-posedness of the problem.

## 2.

## Methods for Selecting Regularization Parameter

## 2.1.

### Criteria

We are seeking a method for selecting the regularization parameter for optical topography. We begin by reviewing a number of methods that have been proposed in the literature, but reject some immediately because they are not suitable for optical topography. Our criteria are as follows:

1. The method should not require any subjective input from the user.

2. The method should only require knowledge that is available during clinical optical topography. For example, it should not require knowledge of the size of the feature being examined.

3. The method should not assume particular features in the image. For example, it should not assume there is a single, spatially isolated change.

## 2.2.

### Heuristic Method

The most straightforward—and most widely used—method for selecting
$\lambda $
is to examine solutions for a range of
$\lambda $
heuristically by eye and to select the one that results in the most acceptable reconstruction. This method is subjective and nonrepeatable.^{12} A common variant is to take
$\lambda $
as being equal to the noise present in the data.^{13} We retain this method in our analysis as a measure against which to compare other, more objective methods.

## 2.3.

### Methods that Optimize Data and Image

## 2.3.1.

#### L-curve

The L-curve is probably the most commonly employed objective method for finding the regularization parameter when solving ill-posed problems.^{8, 14, 15, 16} It is a log-log plot, for different
$\lambda $
, of the image norm against the data norm. We take the value of
$\lambda $
, which corresponds to the point of maximum curvature on the graph, which is normally the point on the graph that is nearest to the origin and so mutually minimizes both the image norm and the data norm.

## 2.3.2.

#### Fixed noise figure (NF)

The noise figure (NF) is the ratio of the signal-to-noise ratio in the measurements to the signal-to-noise ratio in the image.^{12, 17, 18} The regularization parameter is found by plotting NF as a function of
$\lambda $
, and we select the NF whose
$\lambda $
returns the most acceptable solution. This method replaces the selection of a regularization parameter as in the heuristic method by the selection of a fixed NF value. The ratio of the signal-to-noise ratios is more constant across different experimental setups than the heuristic method (i.e., when a fixed NF value is used the regularization parameter can be different for the different image reconstructions. However, we require a fully objective measure and therefore exclude this method from further analysis.

## 2.4.

### Methods that Optimize the Data

## 2.4.1.

#### Generalized cross-validation (GCV)

GCV is based on the principle that, if a data point is omitted, then we should be able to estimate the missing data value from the regularized solution obtained from this reduced data set.^{19, 20} We minimize (GCV)
$\left(\lambda \right)={\Vert A{\mathbf{x}}_{\mathbf{\lambda}}-\mathbf{b}\Vert}_{2}^{2}\u2215{\left[\text{trace}(I-A{A}_{\lambda}^{\u2020})\right]}^{2}$
, where
${A}_{\lambda}^{\u2020}$
is the Tikhonov regularized pseudoinverse of
$A$
. The numerator is the data norm, and the denominator is inversely related to the number of singular values used in the regularized solution. Minimizing this then favors low data norms while penalizing solutions that require many singular values. GCV therefore finds the
$\lambda $
that provides a solution that can fit the data using the smallest possible number of parameters, thereby minimizing the contribution from small singular values.

## 2.4.2.

#### Unbiased predictive risk estimator (UPRE)

The UPRE method seeks to minimize the predictive risk.^{9} The data noise is assumed to be random white noise of known variance. The accuracy of the method therefore depends on the accuracy of the estimate of noise. Furthermore, because the UPRE is an unbiased estimator, its expected value is the same as the expected value of the predictive risk, but it does not necessarily change with
$\lambda $
in the same way as the predictive risk. We cannot therefore guarantee that the solution error is small.

## 2.4.3.

#### Discrepancy principle (DP)

This method selects the
$\lambda $
for which the data norm is equal to the data variance.^{9} Like the UPRE, it depends on our knowledge of the noise statistics: if the data variance is unknown and must be estimated, the method may not necessarily return the optimal regularization parameter. If we do have knowledge of the data variance, then we may see improved performance as this additional information is used.

## 2.4.4.

#### Normalized cumulative periodogram (NCP)

This method favors the regularization parameter for which the residual vector resembles white noise.^{21} It is derived from the periodogram, which is the power spectrum of the residual and is obtained by taking the squares of the absolute values of the discrete Fourier transform for half the residual vector length. The NCP is the cumulative periodogram normalized by the sum of its elements. If the residuals are pure white noise, then the NCP is a straight line; hence, the selected regularization parameter is the one that minimizes the distance of the NCP to a straight line.^{21}

## 2.5.

### Methods that Optimize the Image

## 2.5.1.

#### F-slope

The f-slope is a plot of the image norm against
$\mathrm{ln}(1\u2215\lambda )$
.^{22} We select the
$\lambda $
at the flattest part of the curve, which corresponds to the smallest difference between adjacent solution norms. This method only analyses the image norm and not the data norm.

## 2.5.2.

#### Quasi-optimality criterion (QOC)

The regularization parameter is found by minimizing
${Q}_{\lambda}={\Vert {\lambda}^{2}[d{\mathbf{x}}_{\mathbf{\lambda}}\u2215d\left({\lambda}^{2}\right)]\Vert}_{2}$
.^{8, 23} In an iterative method, this minimizes the difference between the current and previous solutions. In a noniterative approach, QOC minimizes the update to the initial guess.

## 2.5.3.

#### Full width half maximum (FWHM)

The FWHM of the region of contrast is calculated for different regularization parameters.^{24} This method is only applicable to images that contain a single isolated region, and thus, we reject it. Adler and Guardo^{17} proposed an alternative method of defining FWHM known as the blur radius as a method to select
$\lambda $
. This method has the same disadvantages as FWHM; thus, we reject it also.

## 2.5.4.

#### Contrast-to-noise ratio (CNR)

The CNR is plotted as a function of the regularization parameter, where contrast is the ratio of the peak value of the image, after background subtraction, to the background value, and noise is the image norm.^{24} We seek the regularization parameter that maximizes CNR.

Regińska^{25} has shown that the minimum of
${\Psi}_{\alpha}\left(\lambda \right)={\Vert \mathbf{x}\Vert}_{2}\cdot {\Vert A{\mathbf{x}}_{\mathbf{\lambda}}-\mathbf{b}\Vert}_{2}^{\alpha}$
, where
$\alpha >0$
, is similar to the point that minimizes the L-curve. We propose a modification to the CNR method where, rather than maximizing CNR, we maximize CNR.
${\Psi}_{\alpha}^{-1}$
, which simultaneously optimizes both the image norm and the data norm.

## 3.

## Problem 1: Deblurring

## 3.1.

### Method

Deblurring is an example of an ill-posed inverse problem. The function *blur* from the Matlab package “Regularization Tools”^{26} was used to generate the matrix
$A$
, the original image
$x$
(Fig. 1
), with dimensions
$50\times 50\phantom{\rule{0.3em}{0ex}}\text{pixels}$
, and the corresponding blurred image
$b$
to which we added different levels of Gaussian noise, from 5 to 40% and for 500 noise realisations each (Fig. 2
). The smoothing matrix
$A$
is chosen to have properties that make it computationally efficient to handle: it is the Kronecker product of a Toeplitz matrix
$T$
with itself:
$A=T\otimes T$
. The Toeplitz matrix contains in its diagonal elements of a Gaussian point-spread function with variance
${\sigma}^{2}$
, which models the blurring effect, and is a banded matrix, with band
$l$
that defines the number of diagonals, from the main diagonal, which are stored in the matrix
$T$
.^{9, 27} For this test, we have set
$\sigma =3$
and
$\text{band}=5$
.

One convenient property of matrix $A$ is that its SVD depends only on the SVD of the initial Toeplitz matrices. Therefore, if the SVD of $T$ is $T={U}_{b}{S}_{b}{V}_{b}^{T}$ , the regularized solution $\widehat{x}$ is given by

## Eq. 2

$\widehat{x}={V}_{b}\frac{{S}_{t}\left({U}_{b}^{T}b{U}_{b}\right)}{{S}_{t}^{2}+{\lambda}^{2}}{V}_{b}^{T},$The discrete Picard condition is examined in Fig. 3 . The value of $i$ at which ${\sigma}_{i}$ begins to decay more slowly than $\mid {U}_{i}^{T}\mathbf{b}\mid $ is shown by the vertical line. This point is emphasized by examining $\mid {U}_{i}^{T}\mathbf{b}\mid \u2215{\sigma}_{i}$ whose gradient turns positive at the same point. This shows that the DPC is satisfied for ${i}_{\mathrm{DPC}}\lesssim 695$ . For higher values of $i$ , the DPC is no longer satisfied and $\mid {U}_{i}^{T}\mathbf{b}\mid $ reaches the noise level. Because the DPC is at least partially satisfied, it means that we can expect to find a solution that approximately recovers the real solution.

The original image is known; thus, we can find the regularized solution ${\mathbf{x}}_{\lambda}$ that is closest to the real solution ${\mathbf{x}}_{\mathbf{exact}}$ . This allows us to define the optimal regularization parameter ${\lambda}_{\mathrm{opt}}$ to be the one that minimizes the relative error

## Eq. 3

$\epsilon =\frac{{\Vert {x}_{\lambda}-{x}_{\text{exact}}\Vert}_{2}}{{\Vert {x}_{\text{exact}}\Vert}_{2}}.$Figure 4 shows a plot of $\epsilon $ against $\lambda $ , for one realization of 5% Gaussian noise. The minimum occurs for ${\lambda}_{\mathrm{opt}}=0.038$ . The corresponding deblurred and denoised image for ${\lambda}_{\mathrm{opt}}$ is shown in Fig. 5 .

## 3.2.

### Results

The mean optimal regularization parameters, and corresponding standard deviations, for the ten selection methods tested are shown for different noise levels in Table 1 . The corresponding errors calculated from Eq. 3 using the mean values in Table 1 are shown in Table 2 and in Fig. 6 . The regularization parameter was chosen from a set of 1000 logarithmically spaced points between ${10}^{-4}$ and ${10}^{-0.3}$ .

## Table 1

Regularization parameters λ obtained using the selection methods for the deblurring problem with different noise levels.

Method | 5% | 10% | 15% | 20% |
---|---|---|---|---|

Optimal | $0.039\pm 2.6\%$ | $0.066\pm 3.3\%$ | $0.090\pm 4.0\%$ | $0.111\pm 4.2\%$ |

Heuristic | $0.050\pm 60.0\%$ | $0.125\pm 60.0\%$ | $0.130\pm 53.8\%$ | $0.190\pm 57.9\%$ |

L-curve | $0.030\pm 2.7\%$ | $0.065\pm 1.7\%$ | $0.090\pm 1.6\%$ | $0.110\pm 1.5\%$ |

GCV | $0.030\pm 4.0\%$ | $0.047\pm 5.1\%$ | $0.060\pm 5.6\%$ | $0.071\pm 5.8\%$ |

UPRE | $0.030\pm 3.3\%$ | $0.048\pm 4.4\%$ | $0.062\pm 4.4\%$ | $0.073\pm 4.8\%$ |

DP | $0.054\pm 3.2\%$ | $0.087\pm 3.2\%$ | $0.113\pm 3.3\%$ | $0.133\pm 3.1\%$ |

NCP | $0.056\pm 6.4\%$ | $0.088\pm 7.6\%$ | $0.113\pm 8.5\%$ | $0.134\pm 8.7\%$ |

f-slope | $0.070\pm 1.7\%$ | $0.089\pm 1.7\%$ | $0.104\pm 1.8\%$ | $0.116\pm 1.7\%$ |

QOC | $0.121\pm 1.5\%$ | $0.142\pm 1.7\%$ | $0.160\pm 1.8\%$ | $0.176\pm 1.7\%$ |

CNR | $0.014\pm 35.0\%$ | $0.018\pm 57.2\%$ | $0.025\pm 60.0\%$ | $0.033\pm 61.8\%$ |

$\mathrm{CNR}\cdot {\Psi}^{-1}$ | $0.022\pm 3.6\%$ | $0.049\pm 4.5\%$ | $0.077\pm 13.2\%$ | $0.105\pm 11.3\%$ |

Method | 25% | 30% | 35% | 40% |

Optimal | $0.131\pm 4.5\%$ | $0.148\pm 4.2\%$ | $0.164\pm 4.3\%$ | $0.178\pm 4.0\%$ |

Heuristic | $0.190\pm 57.9\%$ | $0.300\pm 66.7\%$ | $0.300\pm 66.7\%$ | $0.350\pm 71.4\%$ |

L-curve | $0.128\pm 1.5\%$ | $0.145\pm 1.5\%$ | $0.159\pm 1.5\%$ | $0.172\pm 1.3\%$ |

GCV | $0.081\pm 6.1\%$ | $0.090\pm 5.7\%$ | $0.099\pm 5.8\%$ | $0.108\pm 5.0\%$ |

UPRE | $0.083\pm 5.1\%$ | $0.092\pm 4.8\%$ | $0.101\pm 4.9\%$ | $0.109\pm 5.0\%$ |

DP | $0.151\pm 3.2\%$ | $0.167\pm 3.1\%$ | $0.181\pm 3.2\%$ | $0.195\pm 3.1\%$ |

NCP | $0.152\pm 9.1\%$ | $0.168\pm 9.6\%$ | $0.183\pm 9.3\%$ | $0.195\pm 9.7\%$ |

f-slope | $0.127\pm 1.8\%$ | $0.138\pm 1.7\%$ | $0.147\pm 1.8\%$ | $0.155\pm 1.7\%$ |

QOC | $0.189\pm 1.7\%$ | $0.202\pm 1.7\%$ | $0.213\pm 1.6\%$ | $0.223\pm 1.7\%$ |

CNR | $0.039\pm 55.4\%$ | $0.044\pm 58.7\%$ | $0.049\pm 53.4\%$ | $0.051\pm 56.4\%$ |

$\mathrm{CNR}\cdot {\Psi}^{-1}$ | $0.129\pm 9.7\%$ | $0.154\pm 2.4\%$ | $0.183\pm 2.8\%$ | $0.190\pm 2.8\%$ |

## Table 2

Relative error for the deblurring problem with different noise levels.

Method | 5% | 10% | 15% | 20% |
---|---|---|---|---|

Optimal | $0.387\pm 1.0\%$ | $0.432\pm 1.2\%$ | $0.454\pm 1.1\%$ | $0.468\pm 1.1\%$ |

Heuristic | $0.393\pm 0.8\%$ | $0.457\pm 0.5\%$ | $0.463\pm 0.7\%$ | $0.485\pm 0.6\%$ |

L-curve | $0.395\pm 1.5\%$ | $0.432\pm 1.2\%$ | $0.454\pm 1.1\%$ | $0.468\pm 1.1\%$ |

GCV | $0.395\pm 1.5\%$ | $0.433\pm 1.2\%$ | $0.477\pm 1.7\%$ | $0.499\pm 1.8\%$ |

UPRE | $0.395\pm 1.5\%$ | $0.432\pm 1.2\%$ | $0.473\pm 1.7\%$ | $0.496\pm 1.8\%$ |

DP | $0.400\pm 0.8\%$ | $0.438\pm 0.9\%$ | $0.458\pm 0.9\%$ | $0.471\pm 0.9\%$ |

NCP | $0.400\pm 0.8\%$ | $0.439\pm 0.7\%$ | $0.456\pm 1.1\%$ | $0.471\pm 0.9\%$ |

f-slope | $0.414\pm 0.5\%$ | $0.439\pm 0.9\%$ | $0.456\pm 0.9\%$ | $0.469\pm 1.1\%$ |

QOC | $0.451\pm 0.2\%$ | $0.463\pm 0.4\%$ | $0.473\pm 0.5\%$ | $0.482\pm 0.6\%$ |

CNR | $0.578\pm 2.8\%$ | $0.853\pm 2.3\%$ | $0.878\pm 2.4\%$ | $0.884\pm 2.4\%$ |

$\mathrm{CNR}\cdot {\Psi}^{-1}$ | $0.403\pm 1.5\%$ | $0.456\pm 1.1\%$ | $0.457\pm 0.9\%$ | $0.472\pm 0.9\%$ |

Method | 25% | 30% | 35% | 40% |

Optimal | $0.479\pm 1.0\%$ | $0.488\pm 1.0\%$ | $0.495\pm 1.2\%$ | $0.502\pm 1.0\%$ |

Heuristic | $0.489\pm 0.6\%$ | $0.525\pm 0.6\%$ | $0.525\pm 0.6\%$ | $0.544\pm 0.6\%$ |

L-curve | $0.479\pm 1.0\%$ | $0.488\pm 1.0\%$ | $0.495\pm 1.2\%$ | $0.502\pm 1.2\%$ |

GCV | $0.513\pm 1.9\%$ | $0.526\pm 1.6\%$ | $0.534\pm 1.2\%$ | $0.541\pm 1.9\%$ |

UPRE | $0.510\pm 2.0\%$ | $0.520\pm 1.7\%$ | $0.528\pm 1.9\%$ | $0.541\pm 1.9\%$ |

DP | $0.481\pm 0.8\%$ | $0.489\pm 0.8\%$ | $0.496\pm 1.0\%$ | $0.503\pm 1.0\%$ |

NCP | $0.481\pm 0.8\%$ | $0.490\pm 0.8\%$ | $0.496\pm 1.0\%$ | $0.503\pm 1.0\%$ |

f-slope | $0.480\pm 1.3\%$ | $0.488\pm 1.2\%$ | $0.497\pm 1.2\%$ | $0.504\pm 1.2\%$ |

QOC | $0.489\pm 0.6\%$ | $0.495\pm 0.8\%$ | $0.501\pm 0.8\%$ | $0.506\pm 1.0\%$ |

CNR | $0.895\pm 2.6\%$ | $0.903\pm 2.6\%$ | $0.935\pm 2.7\%$ | $0.987\pm 2.5\%$ |

$\mathrm{CNR}\cdot {\Psi}^{-1}$ | $0.481\pm 0.8\%$ | $0.489\pm 0.8\%$ | $0.498\pm 1.0\%$ | $0.502\pm 1.0\%$ |

## 3.3.

### Discussion

## 3.3.1.

#### Heuristic method

It was not easy to identify a single $\lambda $ with the heuristic method, and we instead selected a range of $\lambda $ that provide acceptable results. In every case, the range included ${\lambda}_{\mathrm{opt}}$ . The corresponding ranges of errors are quoted in Table 2. Note that for better comparison with the other methods, we chose to display its central value and error, where the latter gives the range limits. The range of $\lambda $ included ${\lambda}_{\mathrm{opt}}$ ; however, the regularization parameter errors are consistently higher than for most of the other methods, illustrating the irreproducibility of the heuristic method.

## 3.3.2.

#### L-curve

The L-curve (plotted Fig. 7 for a 5% noise realization) did generally exhibit a single, easily identifiable point of maximum curvature. The predicted $\lambda $ agree closely with ${\lambda}_{\mathrm{opt}}$ .

Hanke^{28} showed that the L-curve method fails to find
$\lambda $
for very ill-posed solutions. When the singular values decay very rapidly to zero,
${\lambda}_{\mathrm{opt}}$
may occur before the data norm starts increasing, because a large number of singular values have to be included before the data norm increases significantly. Vogel^{29} has shown another nonconvergence of
$\lambda $
, which occurs when the regularized solution fails to converge to the true solution as the dimensions of the problem tend toward infinity. We do not see evidence of either of these two concerns. For the deblurring problem, an image reconstructed with 5% noise was slightly undersmoothed, with a visible noise component. However, all the results were acceptable, as confirmed by the small relative errors (Table 2) and by the small regularization parameter error (Table 1). The latter reflects the small sensitivity of this method to perturbations of similar nature in the data.

## 3.3.3.

#### GCV

The minimum value for this function was easily identifiable, although it was located on a relatively flat region of the curve. The selected $\lambda $ was similar to ${\lambda}_{\mathrm{opt}}$ at low noise levels but was rather low at higher noise levels.

The GCV method has been shown to perform well in different situations. However, it can sometimes be difficult to locate its minimum as the function may be flat near
${\lambda}_{\mathrm{opt}}$
or display multiple minima.^{30} Hansen and O’Leary^{15} presented an example where GCV failed to find
$\lambda $
due to the flatness of the curve. However, we were able to locate a single minimum for all our test cases from 5 to 40% noise.

## 3.3.4.

#### UPRE

The UPRE function did exhibit an easily identified minimum. However, to calculate UPRE it is necessary to find the SVD of the blur matrix, which, although it was a reasonable calculation in this case due to the carefully chosen blurring matrix, in general, could be prohibitively computationally intensive. The performance of UPRE appears to be similar to that of GCV, with good estimates of
$\lambda $
at low noise levels but an underestimated
$\lambda $
at higher noise levels. Vogel^{9} used a two-dimensional deblurring problem with added white noise to compare these two methods, and found that the UPRE and GCV methods were similar.

## 3.3.5.

#### DP

This method appeared to overestimate values of
$\lambda $
, confirming the results of Thompson
^{31} and Hansen.^{8}

## 3.3.6.

#### NCP

This method and the DP have similar performances, as expected, because they both select the regularization parameter such that the data norm is equal to the noise variance. However, this method seems to be more sensitive to the different noise realizations, in particular for noise levels above
$25\%$
. Hansen
^{21} found that the NCP gives better results than the GCV method, which is confirmed here at higher noise levels.

## 3.3.7.

#### f-Slope

The f-slope curve clearly shows a flat part, which is less sensitive to perturbations. This method appears to overestimate $\lambda $ at low noise levels but selects $\lambda $ close to ${\lambda}_{\mathrm{opt}}$ at higher noise levels.

It has been claimed^{22} that the f-slope method can perform better than the L-curve method, particularly when little regularization is needed. However, this work mainly looked at much smaller
$\lambda $
than in our deblurring problem (as low as
${10}^{-15}$
) and it is unclear how the results translate to our more ill-posed problem. Using f-slope, we were able to reconstruct an image that is close to the true image when the noise in the data is equal to or higher than 15% but for smaller noise levels, it tended to overregularize the solution.

## 3.3.8.

#### QOC

The regularization parameter $\lambda $ selected by this method was invariably higher than ${\lambda}_{\mathrm{opt}}$ . However, the performance of QOC appeared to improve at higher noise levels.

Hansen^{8} compared the L-curve, the GCV, and the QOC methods by applying them to six test problems with different data perturbations but with the same noise level. He found that the QOC method generally oversmooths and that there may be a problem with local minima. We saw the same oversmoothing effect but did not observe local minima.

## 3.3.9.

#### CNR

There was a single clear global maximum. However, for all noise levels, this method found $\lambda <{\lambda}_{\mathrm{opt}}$ , and generated much greater relative errors than for any of the other methods, and the errors associated with $\lambda $ are as large as those corresponding to the heuristic method.

## 3.3.10.

#### CNR $\cdot {\Psi}^{-1}$

The addition of the $\Psi $ parameter led to $\lambda $ which were closer to ${\lambda}_{\mathrm{opt}}$ than for CNR alone. The method still tended to underestimate $\lambda $ for low noise levels, but for higher noise levels, the selection of $\lambda $ was good. Note that the regularization parameter uncertainty is particularly large for noise in the data between 15 and 25%.

## 4.

## Problem 2: Optical Topography

## 4.1.

### Method

Data were obtained with the University College of London (UCL) optical topography system with light sources at
$770\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
^{32}using a test object consisting of absorbing targets within a tank filled with a solution with tissue-equivalent optical properties (the absorption coefficient
${\mu}_{\mathrm{a}}$
was
$0.01\phantom{\rule{0.3em}{0ex}}{\mathrm{mm}}^{-1}$
and the reduced scattering coefficient
${\mu}_{\mathrm{s}}^{\prime}$
was
$1\phantom{\rule{0.3em}{0ex}}{\mathrm{mm}}^{-1}$
). A cylindrical absorbing target (radius
$5\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$
and height
$10\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$
) was made with the same
${\mu}_{\mathrm{s}}^{\prime}$
as the solution but with twice the background absorption
$({\mu}_{\mathrm{a}}=0.02\phantom{\rule{0.3em}{0ex}}{\mathrm{mm}}^{-1})$
.

The optical topography array consisted of eight sources and eight detectors from which 64 measurements were made.^{33} The array was placed on one of the walls of the tank, which is made of epoxy resin with the same optical properties as the solution, and is
$2\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$
thick. The target was positioned in the center of the array, which should provide higher sensitivity and resolution, and at a depth of
$10\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$
. Data were collected for
$20\phantom{\rule{0.3em}{0ex}}\mathrm{s}$
and averaged to reduce the noise, and a baseline measurement was acquired with no target present.

The software package TOAST (temporal optical absorption and scattering tomography), which has been developed by Prof. S. R. Arridge and Dr. M. Schweiger at the UCL, models the propagations of light in highly scattering media and was used to generate the sensitivity matrix
$A$
. Finally, images were reconstructed using a 3D linear model.^{13}

## 4.2.

### Results

The results for all the methods are summarized in Table 3 . The $\lambda $ values used are the same as before.

## Table 3

Regularization parameters for experimental data using different selection methods.

Method | λ |
---|---|

Heuristic | [0.004, 0.015] |

L-curve | 0.0066 |

GCV | 0.0085 |

UPRE | 0.0071 |

DP | 0.0061 |

NCP | 0.0032 |

f-slope | — |

QOC | — |

CNR | 0.0051 |

$\mathrm{CNR}\cdot {\Psi}^{-1}$ | 0.0068 |

${\lambda}_{\u03f5{\mu}_{\alpha}}$ | 0.0071 |

## 4.2.1.

#### DPC

Figure 8 shows that the DPC was partially satisfied, and thus, regularization should be able to give a stable solution. The vertical line represents ${i}_{\mathrm{DPC}}$ , where for $i<{i}_{\mathrm{DPC}}$ the data space coefficients $\mid {u}_{i}^{T}\mathbf{b}\mid $ decay faster than the singular values ${\sigma}_{i}$ , and for $i>{i}_{\mathrm{DPC}}$ the data space coefficients reach a level determined by the perturbations in the data; thus the DPC is no longer satisfied.

## 4.3.

### Discussion

## 4.3.1.

#### Heuristic method

The range of $\lambda $ that generated acceptable images, which were relatively noise-free, with adequate spatial localization, and which placed the target in the middle of the image was $\lambda \u220a[0.004,0.015]$ .

For this phantom study, we know the optical properties and target position with an accuracy of 5–10%. Figure 9 shows the absorption coefficient error $\epsilon {\mu}_{\mathrm{a}}$ , which is the difference between the maximum ${\mu}_{\mathrm{a}}$ , calculated from the reconstructed images for regularization parameters found with the heuristic method in steps of 0.001, and the target ${\mu}_{\mathrm{a}}$ . We refer to the point where the error is zero, ${\lambda}_{\u03f5{\mu}_{\mathrm{a}}}=0.0071$ , as the optimal regularization parameter. An image reconstructed using $\lambda ={\lambda}_{\u03f5{\mu}_{\mathrm{a}}}=0.0071$ , at $10\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ depth, is shown in Fig. 10 . The target size in the image, measured by the FWHM, is $10\pm 3.5\times 14\pm 3.5\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ , which is not too different from the real target dimensions. The target size is approximately constant for the selected range of regularization parameters.

The values of $\lambda $ found by the other selection methods are compared to ${\lambda}_{\u03f5{\mu}_{\mathrm{a}}}$ below. Table 3 shows all $\lambda $ values.

## 4.3.2.

#### L-curve

The L-curve did not exhibit a pronounced corner, but it was still possible to calculate the point of maximum curvature. The failure to find a sharp corner was due to the high ill-posedness of this problem (as shown by the exponential decay of singular values in Figure 8). However,
$\lambda $
at the point of maximum curvature of the L-curve was closer than any of the other methods to
${\lambda}_{\u03f5{\mu}_{\mathrm{a}}}$
. The L-curve method has been used previously in optical imaging, for simulated and real data, and its results are considered to generate acceptable images.^{34, 35, 36}

## 4.3.3.

#### GCV

The value for
$\lambda $
found using GCV was the highest of all the methods. Hansen^{16} found that the GCV method fails to compute a useful solution when errors are highly correlated. In the presence of uncorrelated errors, this method gives a slightly overregularized
$\lambda $
. Our result could be explained by a low presence of correlated errors.

## 4.3.4.

#### UPRE

UPRE found a very shallow minimum at $\lambda ={\lambda}_{\u03f5{\mu}_{\mathrm{a}}}$ . The noise variance was assumed to be the point at which $i={i}_{\mathrm{DPC}}$ in Fig. 8. At this point, ${u}_{i}^{T}b$ reaches the noise floor and no longer decreases.

## 4.3.5.

#### DP

The L-curve, GCV, DP, and UPRE methods have previously been compared for simulated, phantom, and clinical data in electrical impedance tomography.^{37} All the methods were successful for simulated and clinical data, whereas for data acquired on a test phantom, the DP and UPRE methods failed to converge. Overall, the preferred method was the GCV. For optical topography data, both DP and UPRE methods converged and the predicted
$\lambda $
agreed with the values found heuristically.

## 4.3.6.

#### NCP

It is necessary to preprocess data so that all the variables have zero mean and unit variance. In optical topography, we use the difference imaging approach, where the baseline is subtracted from the data; hence, all the measurements should have zero mean, but different variance. Whitening could be accomplished by dividing each measurement by the respective standard deviation. This method returns a very low
$\lambda $
, which is not included in the values found heuristically. If we set the Kolmogorov–Smirnoff limits^{38} to a significance level of 5%, which is equivalent to a 95% confidence level, and choose the largest regularization parameter, then we obtain
$\lambda =0.0056$
, which is a more reasonable result.

## 4.3.7.

#### f-slope

The solution norm monotonically increased with
$\mathrm{ln}(1\u2215\lambda )$
; thus, it was not possible to identify the minimum slope. To our knowledge the f-slope method has not been tested previously on real data, only for testing models where it has a good performance.^{22} This method failed when applied to our optical topography problem, probably due to its ill-posedness.

## 4.3.8.

#### QOC

The QOC monotonically decreased with $\mathrm{log}\left(\lambda \right)$ and failed to show a reliable minimum, and consequently, this method failed.

## 4.3.9.

#### CNR

To calculate the contrast, it is necessary to know ${\mu}_{{\mathrm{a}}_{\mathrm{max}}}$ and ${\mu}_{{\mathrm{a}}_{\mathrm{bkg}}}$ . ${\mu}_{{\mathrm{a}}_{\mathrm{max}}}$ . The value of ${\mu}_{{\text{a}}_{\text{max}}}$ was taken to be the mean of the pixel of maximum intensity in the image and its eight nearest neighbors in the $xy$ plane. The value of ${\mu}_{{\mathrm{a}}_{\mathrm{bkg}}}$ was the average of $24\phantom{\rule{0.3em}{0ex}}\text{pixels}$ furthest from the peak in the image. This method gave the lowest $\lambda $ of all the methods, and artifacts were present in reconstructed images.

## 4.3.10.

#### CNR ${\Psi}^{-1}$

As before, including $\Psi $ gave a much more realistic estimate of $\lambda $ than CNR alone, which was very close to ${\lambda}_{\u03f5{\mu}_{\mathrm{a}}}$ .

The regularization parameter selection methods were tested on two further data sets, obtained using the same liquid phantom, but targets with different absorption coefficient and the results were consistent with those shown here.

## 5.

## Conclusion

Optimizing $\lambda $ is critical when reconstructing diffuse optical images because it controls the smoothness of the regularized solution and balances the influence of the noise present in the image against its accuracy. We have examined a number of methods for selecting $\lambda $ . Diffuse optical imaging is a challenging ill-posed and underdetermined problem, and the correct target image is unknown, making it difficult to validate the image quality. We therefore initially tested the methods by applying them to a simpler ill-posed problem, the deblurring problem, where we can control the amount of blur and noise applied to the data and where the exact solution is known (and so ${\lambda}_{\mathrm{opt}}$ can be found). We believe that if a method fails to produce a good regularized solution for the deblurring problem, we cannot rely on it for optical topography. However, if a good solution is found for the deblurring problem, then that method will not necessarily be reliable for the more demanding optical problem.

In Sec. 2.1, we list three criteria that a selection method must meet. On the basis of these criteria, we reject the heuristic method, the fixed noise figure method, and methods related to optimizing the FWHM of the image. All the remaining methods performed acceptably well for the deblurring problem. However, the f-slope and QOC failed to converge for the more demanding optical topography problem.

Of the remaining methods, the L-curve consistently demonstrated the lowest error on the deblurring problem (see Tables 1, 2). It is easy to implement and simultaneously minimizes both the data norm and the image norm. On the other hand, the DP and UPRE methods require knowledge of the noise variance, which may not always be available. However, the DPC seems to provide a good estimate of the noise level in optical topography. The NCP method has the advantage of selecting
$\lambda $
automatically; hence, it does not directly require an estimate of the noise variance. Nevertheless, this method is sensitive to error fluctuations and only gives reasonable results for the optical topography problem under certain assumptions. DP, UPRE, NCP, and GCV only consider the data norm. The CNR method gave poor results but was much more successful when
$\Psi $
was minimized simultaneously. The use of
$\Psi $
in this way, as proposed by Regińska^{25} and developed further here, could be applied to other methods and may be worth further examination. However, here we conclude that L-curve is the optimal selection method for optical topography.

Thus far, we have studied ideal or almost ideal cases, whereas *in vivo* studies suffer further sources of error. These include motion artifacts, changes in the contact between the optodes and the skin, which can result in intensity fluctuations in the collected data, and detection of light that has not passed through the investigated medium. All these effects produce correlated errors, which are not necessarily apparent in the raw data. The L-curve has been shown to perform well in the presence of these type of errors.^{15, 16} However, as mentioned previously, Hanke^{28} showed that the L-curve method may fail to find a good regularization parameter when the solutions are very smooth. Under these circumstances, GCV should be investigated as an alternative to L-curve.

## Acknowledgments

The work has been supported by a scholarship awarded to T.C. by Fundação para a Ciência e a Tecnologia, Portugal. Thanks to Dr. Simon Arridge, from the Department of Computer Science, for useful discussions.

## References

**,” Psychophysiology, 40 (4), 501 –510 (2003). https://doi.org/10.1111/1469-8986.00052 0048-5772 Google Scholar**

*Advances in optical imaging of the newborn infant brain***,” J. Biomed. Opt., 12 (6), 062104 (2007). https://doi.org/10.1117/1.2804899 1083-3668 Google Scholar**

*Progress of near-infrared spectroscopy and topography for brain and muscle clinical applications***,” J. Biomed. Opt., 4 (4), 414 –417 (1999). https://doi.org/10.1117/1.429940 1083-3668 Google Scholar**

*Measurement system for noninvasive dynamic optical topography***,” Neuroimage, 13 (1), 76 –90 (2001). https://doi.org/10.1006/nimg.2000.0674 1053-8119 Google Scholar**

*The accuracy of near infrared spectroscopy and imaging during focal changes in cerebral hemodynamics***,” Opt. Lett., 29 (13), 1506 –1508 (2004). https://doi.org/10.1364/OL.29.001506 0146-9592 Google Scholar**

*Improving the diffuse optical imaging spatial resolution of the cerebral hemodynamic response to brain activation in humans***,” SIAM J. Matrix Anal. Appl., 21 (1), 185 –194 (1999). https://doi.org/10.1137/S0895479897326432 0895-4798 Google Scholar**

*Tikhonov regularization and total least squares***,” BIT Numer. Math., 30 (4), 658 –672 (1990). https://doi.org/10.1007/BF01933214 Google Scholar**

*The discrete Picard condition of discrete ill-posed problems***,” Physiol. Meas, 27 S65 –S79 (2006). https://doi.org/10.1088/0967-3334/27/5/S06 0967-3334 Google Scholar**

*Objective selection of hyperparameter for EIT***,” Neuroimage, 30 521 –528 (2006). https://doi.org/10.1016/j.neuroimage.2005.08.059 1053-8119 Google Scholar**

*Three-dimensional whole-head optical tomography of passive motor evoked responses in the neonate***,” Computational Inverse Problems in Electrocardiology, 119 –142 (2001) Google Scholar**

*The L-curve and its use in the numerical treatment of inverse problems***,” SIAM J. Sci. Comput. (USA), 14 (6), 1487 –1503 (1993). https://doi.org/10.1137/0914086 1064-8275 Google Scholar**

*The use of the L-curve in the regularization of discrete ill-posed problems***,” SIAM Rev., 34 (4), 561 –580 (1992). https://doi.org/10.1137/1034115 0036-1445 Google Scholar**

*Analysis of discrete ill-posed problems by means of the L-curve***,” IEEE Trans. Med. Imaging, 15 (2), 170 –179 (1996). https://doi.org/10.1109/42.491418 0278-0062 Google Scholar**

*Electrical impedance tomography: regularized imaging and contrast detection***,” Physiol. Meas, 25 227 –238 (2004). https://doi.org/10.1088/0967-3334/25/1/028 0967-3334 Google Scholar**

*Accounting for erroneous electrode data in electrical impedance tomography***,” Technometrics, 21 (2), 215 –223 (1979). https://doi.org/10.2307/1268518 0040-1706 Google Scholar**

*Generalized-cross validation as a method for choosing a good ridge parameter***,” BIT Numer. Math., 46 41 –59 (2006). https://doi.org/10.1007/s10543-006-0042-7 Google Scholar**

*Exploiting residual information in the parameter choice for discrete ill-posed problems***,” Electron. Trans. Numer. Anal., 16 107 –128 (2003). 1097-4067 Google Scholar**

*A parameter choice method for Tikhonov regularization***,” BIT Numer. Math., 41 (5), 1049 –1058 (2001). https://doi.org/10.1023/A:1021949530676 Google Scholar**

*Regularization using QR factorization and the estimation of the optimal parameter***,” Med. Phys., 30 (2), 235 –247 (2003). https://doi.org/10.1118/1.1534109 0094-2405 Google Scholar**

*Three-dimensional diffuse optical tomography in the parallel plane transmission geometry: evaluation of a hybrid frequency domain/continuous wave clinical system for breast imaging***,” SIAM J. Sci. Comput. (USA), 17 (3), 740 –749 (1996). https://doi.org/10.1137/S1064827593252672 1064-8275 Google Scholar**

*A regularization parameter in discrete ill-posed problems***,” Numer. Algorithms, 6 (1–2), 1 –35 (1994). https://doi.org/10.1007/BF02149761 1017-1398 Google Scholar**

*Regularization tools: a Matlab package for analysis and solution of discrete ill-posed problems***,” Numer. Algorithms, 29 (56), 323 –378 (2002). https://doi.org/10.1023/A:1015222829062 1017-1398 Google Scholar**

*Deconvolution and regularization with Toeplitz matrices***,” BIT Numer. Math., 36 (2), 287 –301 (1996). https://doi.org/10.1007/BF01731984 Google Scholar**

*Limitations of the L-curve method in ill-posed problems***,” Inverse Probl., 12 (4), 535 –547 (1996). https://doi.org/10.1088/0266-5611/12/4/013 0266-5611 Google Scholar**

*Nonconvergence of the L-curve regularization parameter selection method***,” J. Stat. Comput. Simul., 33 (4), 199 –216 (1989). https://doi.org/10.1080/00949658908811198 0094-9655 Google Scholar**

*A cautionary note about crossvalidatory choice***,” IEEE Trans. Pattern Anal. Mach. Intell., 13 (4), 326 –339 (1991). https://doi.org/10.1109/34.88568 0162-8828 Google Scholar**

*A study of methods of choosing the smoothing parameter in image restoration by regularization***,” Rev. Sci. Instrum., 76 (9), 093705.1-5 (2005). https://doi.org/10.1063/1.2038567 0034-6748 Google Scholar**

*A frequency multiplexed near infrared topography system for imaging functional activation in the brain***,” Phys. Med. Biol., 52 6849 –6864 (2007). https://doi.org/10.1088/0031-9155/52/23/005 0031-9155 Google Scholar**

*Depth dependent changes in cerebral haemodynamics during face perception in infants***,” Proc. SPIE, The International Society for OPtical Engineering, 55 –66 (2000) Google Scholar**

*Comparison of linear reconstruction techniques for 3D DPDW imaging of absorption coefficient***,” Opt. Express, 14 1125 –1144 (2006). https://doi.org/10.1364/OE.14.001125 1094-4087 Google Scholar**

*Diffuse optical correlation tomography of cerebral blood flow during cortical spreading depression in rat brain***,” J. Biomed. Opt., 11 (6), 064019 (2006). https://doi.org/10.1117/1.2400703 1083-3668 Google Scholar**

*Improving performance of reflectance diffuse optical imaging using a multicentered mode***,” Physiol. Meas, 29 1319 –1334 (2009). 0967-3334 Google Scholar**

*Comparison of methods for optimal choice of the regularization parameter for linear electrical impedance tomography of brain function*