Selection of Regularization Parameter for Optical Topography
Abstract
The choice of the regularization parameter has a profound effect on the solution of ill-posed inverse problems such as optical topography. We review 11 different methods for selecting the Tikhonov regularization parameter that have been described previously in the literature. We test them on two trial problems, deblurring and optical topography, and conclude that the L-curve method is the method of choice, though in particularly ill-posed problems, generalized cross-validation may provide an alternative.

1. Introduction

1.1. Image Reconstruction and Regularization

Optical topography uses diffuse reflectance measurements on the surface of an object to derive its internal optical properties.1, 2 It can be used to measure functional brain activity, because the optical properties depend on the concentration of oxyhemoglobin and deoxyhemoglobin. The first optical topography images, such as those obtained using the Hitachi ETG-100 system,3 were obtained by interpolating the measurements into the image space. However, it is becoming increasingly common for optical topography images to be reconstructed by solving the inverse problem. Boas et al.4, 5 have shown that this provides improved spatial resolution and quantitative accuracy compared to interpolation. However, the inverse problem is ill-posed, which means that there is not a single, well-behaved solution. In order to obtain a meaningful solution, the problem must be regularized, for example, using Tikhonov regularization.

Tikhonov regularization requires a regularization parameter, here called λ . This is most often determined heuristically, by subjectively selecting a value of λ that appears by eye to give the “best” image. A number of more objective methods have been proposed for selecting λ in optical topography and other inverse problems. In this paper, we review 11 methods that have been proposed for selecting λ and apply them to the optical topography inverse problem. We concentrate on functional imaging of brain activity, where the measured changes are typically small and measurements are available before and after a small change in the optical properties. Under these conditions, the nonlinear image reconstruction problem can be linearized using the Rytov approximation. However, the problem remains highly ill-posed and underdetermined.

The above methods were initially applied to a deblurring problem, which is an ill-posed problem for which the solution is known. Thereafter, the same methods were applied to experimental optical topography data.

1.2. Regularization of Inverse Problems

Deblurring and image reconstruction are both discrete ill-posed inverse problems of the form Ax = b, where b is the data vector (length m), x is the vector of unknown parameters (length n), and A is a matrix of size m × n. For the deblurring problem, b is the blurred image, x is the original image, and A represents the blurring matrix. For the optical topography problem, A is the sensitivity matrix, which maps the changes in the optical properties x to the changes in the measured data b.

The least-squares solution is x̂ = argmin_x ‖Ax − b‖₂², but it is highly sensitive to noise and must therefore be regularized. A common method, and the one we choose to use here, is zero-order Tikhonov regularization6, 7, 8

Eq. 1

$$x_\lambda = \arg\min_x \left\{ \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2 \right\} = (A^T A + \lambda^2 I)^{-1} A^T b,$$

where ‖Ax − b‖₂ is a measure of the difference between the measured data b and the data that would be obtained if the solution image were used to simulate data. It is sometimes called the least-squares error or the residual norm; we choose to call it the data norm. The norm ‖x‖₂² is a measure of the noise in the image and is sometimes called the regularized norm or solution norm. Here it is called the image norm.

If λ is increased, then the contribution of the image norm to the solution is increased and the solution becomes less sensitive to perturbations in the data. A smaller λ emphasizes the contribution from the data norm, effectively assuming that the quality of the data is good; the solution is then allowed to conform more closely to the measured data. In the case of Tikhonov regularization, λ governs the level of smoothness enforced in the image.
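As a concrete illustration of Eq. 1, the following sketch (ours, not the authors' code) computes x_λ by solving the regularized normal equations in Python/NumPy; the matrix, data, and λ values are arbitrary toy assumptions.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Zero-order Tikhonov solution x_lam = (A^T A + lam^2 I)^{-1} A^T b (Eq. 1)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)

# Toy example: a mildly ill-conditioned system with noisy data.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40)) @ np.diag(0.9 ** np.arange(40))
x_true = rng.standard_normal(40)
b = A @ x_true + 0.01 * rng.standard_normal(50)

for lam in (1e-4, 1e-2, 1e0):
    x = tikhonov_solve(A, b, lam)
    data_norm = np.linalg.norm(A @ x - b)   # residual (data) norm
    image_norm = np.linalg.norm(x)          # solution (image) norm
    print(f"lam={lam:g}  data norm={data_norm:.3f}  image norm={image_norm:.3f}")
```

Increasing λ raises the data norm and lowers the image norm, which is the trade-off the selection methods below try to balance.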

A third norm, the predictive norm, is given by ‖Ax_λ − Ax_exact‖₂², where x_exact is the exact solution. Evaluating it requires knowledge of the exact solution, which in most real cases is unknown, or knowledge of the noise statistics.9

We briefly mention the singular value decomposition (SVD), a method for factoring a matrix into constituent parts, which also provides a way of analyzing the ill-posedness of a problem.8, 10 The SVD of the matrix A is

$$A = USV^T = \sum_{i=1}^{n} u_i \sigma_i v_i^T,$$

where U and V are square orthonormal matrices and S is an m × n diagonal (though nonsquare) matrix. The diagonal components of S, the σ_i, are known as the singular values and are arranged in order of decreasing magnitude. The problem can be regularized either by setting every σ_i < λ to zero (truncated SVD), or by weighting the singular values, for example, by a factor f_i = σ_i²/(σ_i² + λ²), which is equivalent to Tikhonov regularization.
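The SVD filtering view can be stated in a few lines. The sketch below (again illustrative, not from the paper) implements both options: truncating singular values below λ, and applying the Tikhonov filter factors f_i; the latter reproduces the solution of Eq. 1 up to rounding.

```python
import numpy as np

def tikhonov_svd(A, b, lam):
    """Tikhonov solution from the SVD: x = sum_i f_i (u_i^T b / sigma_i) v_i."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + lam**2)            # Tikhonov filter factors
    return Vt.T @ (f * (U.T @ b) / s)

def tsvd(A, b, lam):
    """Truncated SVD: discard components with sigma_i < lam."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s >= lam
    return Vt.T[:, keep] @ ((U.T @ b)[keep] / s[keep])
```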

The rate at which the σ_i decrease is an indication of the ill-posedness of the problem. One measure of this is given by the discrete Picard condition (DPC).11

DEFINITION 1.1. The data vector b satisfies the discrete Picard condition if the data space coefficients u_i^T b on average decay to zero faster than the singular values σ_i.

If the DPC is violated for a given problem, then one should question the validity of the solution. In ill-posed problems, we find that the DPC holds initially and then fails at some point i_DPC, where the data become dominated by errors. If this is the case, and if the regularization parameter λ is accurately selected, then the regularized solution should provide a valid solution. Examining i_DPC provides a method of characterizing the ill-posedness of the problem.
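A minimal way to inspect the DPC numerically, assuming the problem is small enough that the full SVD is affordable (our sketch):

```python
import numpy as np

def picard_quantities(A, b):
    """Return sigma_i, |u_i^T b|, and their ratio for a Picard plot.

    Plotting all three on a log scale shows where |u_i^T b| stops decaying
    faster than sigma_i, i.e., the index i_DPC where noise takes over.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = np.abs(U.T @ b)       # data-space coefficients |u_i^T b|
    return s, beta, beta / s     # beta/s are the solution coefficients
```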

2. Methods for Selecting Regularization Parameter

2.1. Criteria

We are seeking a method for selecting the regularization parameter for optical topography. We begin by reviewing a number of methods that have been proposed in the literature, but reject some immediately because they are not suitable for optical topography. Our criteria are as follows:

1. The method should not require any subjective input from the user.

2. The method should only require knowledge that is available during clinical optical topography. For example, it should not require knowledge of the size of the feature being examined.

3. The method should not assume particular features in the image. For example, it should not assume there is a single, spatially isolated change.

2.2. Heuristic Method

The most straightforward—and most widely used—method for selecting λ is to examine solutions for a range of λ heuristically by eye and to select the one that results in the most acceptable reconstruction. This method is subjective and nonrepeatable.12 A common variant is to take λ as being equal to the noise present in the data.13 We retain this method in our analysis as a measure against which to compare other, more objective methods.

2.3. Methods that Optimize Data and Image

2.3.1. L-curve

The L-curve is probably the most commonly employed objective method for finding the regularization parameter when solving ill-posed problems.8, 14, 15, 16 It is a log-log plot, for different λ, of the image norm against the data norm. We take the value of λ that corresponds to the point of maximum curvature on the graph, which is normally the point nearest the origin and so mutually minimizes both the image norm and the data norm.
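A rough sketch of the corner-finding step (our finite-difference version; Hansen's Regularization Tools uses a more careful spline-based curvature estimate):

```python
import numpy as np

def lcurve_corner(A, b, lams):
    """Pick lambda at the point of maximum curvature of the log-log L-curve."""
    lams = np.asarray(lams)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    rho, eta = [], []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        x = Vt.T @ (f * beta / s)
        rho.append(np.log(np.linalg.norm(A @ x - b)))  # log data norm
        eta.append(np.log(np.linalg.norm(x)))          # log image norm
    rho, eta = np.array(rho), np.array(eta)
    # Curvature of the curve (rho(t), eta(t)) parametrized by t = log(lambda):
    # kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2).
    t = np.log(lams)
    dr, de = np.gradient(rho, t), np.gradient(eta, t)
    ddr, dde = np.gradient(dr, t), np.gradient(de, t)
    kappa = (dr * dde - de * ddr) / (dr**2 + de**2) ** 1.5
    return lams[np.argmax(kappa)]
```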

2.3.2. Fixed Noise Figure (NF)

The noise figure (NF) is the ratio of the signal-to-noise ratio in the measurements to the signal-to-noise ratio in the image.12, 17, 18 The regularization parameter is found by plotting NF as a function of λ and selecting the λ that corresponds to a fixed NF value giving the most acceptable solution. This method thus replaces the selection of a regularization parameter, as in the heuristic method, with the selection of a fixed NF value; the ratio of signal-to-noise ratios is more constant across different experimental setups (i.e., for a fixed NF value, the regularization parameter can differ between image reconstructions). However, we require a fully objective measure and therefore exclude this method from further analysis.

2.4. Methods that Optimize the Data

2.4.1. Generalized Cross-Validation (GCV)

GCV is based on the principle that, if a data point is omitted, then we should be able to estimate the missing value from the regularized solution obtained from the reduced data set.19, 20 We minimize

$$\mathrm{GCV}(\lambda) = \frac{\|Ax_\lambda - b\|_2^2}{[\operatorname{trace}(I - AA_\lambda)]^2},$$

where A_λ is the Tikhonov-regularized pseudoinverse of A. The numerator is the data norm, and the denominator is inversely related to the number of singular values used in the regularized solution. Minimizing this favors low data norms while penalizing solutions that require many singular values. GCV therefore finds the λ that provides a solution fitting the data using the smallest possible number of parameters, thereby minimizing the contribution from small singular values.
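In terms of the SVD and the Tikhonov filter factors f_i, trace(I − AA_λ) = m − Σ_i f_i, which gives a cheap way to evaluate the GCV function on a λ grid. An illustrative sketch:

```python
import numpy as np

def gcv(A, b, lams):
    """Evaluate GCV(lam) on a grid via the SVD and return the minimizer."""
    m = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    # Part of b outside the range of A contributes a constant to the residual.
    b_perp2 = np.linalg.norm(b) ** 2 - np.linalg.norm(beta) ** 2
    g = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        resid2 = np.sum(((1 - f) * beta) ** 2) + b_perp2   # ||A x_lam - b||^2
        g.append(resid2 / (m - np.sum(f)) ** 2)
    g = np.array(g)
    return lams[np.argmin(g)], g
```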

2.4.2. Unbiased Predictive Risk Estimator (UPRE)

The UPRE method seeks to minimize the predictive risk.9 The data noise is assumed to be random white noise of known variance. The accuracy of the method therefore depends on the accuracy of the estimate of noise. Furthermore, because the UPRE is an unbiased estimator, its expected value is the same as the expected value of the predictive risk, but it does not necessarily change with λ in the same way as the predictive risk. We cannot therefore guarantee that the solution error is small.
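A sketch of one common form of the UPRE function, following Vogel9 (our Python transcription, with sigma2 the assumed known noise variance):

```python
import numpy as np

def upre(A, b, lams, sigma2):
    """Evaluate UPRE(lam) = ||r_lam||^2/m + (2*sigma2/m)*trace(A A_lam) - sigma2."""
    m = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    b_perp2 = np.linalg.norm(b) ** 2 - np.linalg.norm(beta) ** 2
    vals = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        resid2 = np.sum(((1 - f) * beta) ** 2) + b_perp2
        vals.append(resid2 / m + 2 * sigma2 * np.sum(f) / m - sigma2)
    vals = np.array(vals)
    return lams[np.argmin(vals)], vals
```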

2.4.3. Discrepancy Principle (DP)

This method selects the λ for which the data norm is equal to the data variance.9 Like the UPRE, it depends on our knowledge of the noise statistics: if the data variance is unknown and must be estimated, the method may not necessarily return the optimal regularization parameter. If we do have knowledge of the data variance, then we may see improved performance as this additional information is used.
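Because the data norm increases monotonically with λ, the discrepancy-principle λ can be found by one-dimensional root finding. A sketch (ours), where delta2 is the assumed noise level to match, e.g., m·σ² for white noise, and the bracketing interval is an assumption the caller must check:

```python
import numpy as np
from scipy.optimize import brentq

def discrepancy_lambda(A, b, delta2, lo=1e-8, hi=1e2):
    """Find lambda where the squared data norm equals the noise level delta2."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    b_perp2 = np.linalg.norm(b) ** 2 - np.linalg.norm(beta) ** 2

    def gap(lam):
        f = s**2 / (s**2 + lam**2)
        return np.sum(((1 - f) * beta) ** 2) + b_perp2 - delta2

    # gap is monotone increasing in lambda, so a sign change in [lo, hi]
    # guarantees a unique root.
    return brentq(gap, lo, hi)
```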

2.4.4. Normalized Cumulative Periodogram (NCP)

This method favors the regularization parameter for which the residual vector resembles white noise.21 It is derived from the periodogram, which is the power spectrum of the residual and is obtained by taking the squares of the absolute values of the discrete Fourier transform for half the residual vector length. The NCP is the cumulative periodogram normalized by the sum of its elements. If the residuals are pure white noise, then the NCP is a straight line; hence, the selected regularization parameter is the one that minimizes the distance of the NCP to a straight line.21
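A sketch of the NCP distance for a given residual vector (our reading of the construction in Hansen et al.21):

```python
import numpy as np

def ncp_distance(residual):
    """Distance of the normalized cumulative periodogram from a straight line.

    White-noise residuals have a flat power spectrum, so their NCP is close
    to the line c_k = k/q; the selected lambda minimizes this distance.
    """
    q = len(residual) // 2
    power = np.abs(np.fft.fft(residual))[1 : q + 1] ** 2   # periodogram (DC term dropped)
    ncp = np.cumsum(power) / np.sum(power)
    line = np.arange(1, q + 1) / q
    return np.linalg.norm(ncp - line)
```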

2.5. Methods that Optimize the Image

2.5.1. F-slope

The f-slope is a plot of the image norm against ln(1/λ).22 We select the λ at the flattest part of the curve, which corresponds to the smallest difference between adjacent solution norms. This method analyzes only the image norm and not the data norm.
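A sketch of the selection rule (ours): compute ‖x_λ‖₂ on a λ grid and pick the λ where its slope against ln(1/λ) is smallest in magnitude.

```python
import numpy as np

def fslope_lambda(A, b, lams):
    """Select lambda where ||x_lam|| changes least against ln(1/lam)."""
    lams = np.asarray(lams)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    eta = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        eta.append(np.linalg.norm(f * beta / s))   # ||x_lam||_2 from the SVD
    t = np.log(1.0 / lams)
    slope = np.abs(np.gradient(np.array(eta), t))
    return lams[np.argmin(slope)]
```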

2.5.2. Quasi-Optimality Criterion (QOC)

The regularization parameter is found by minimizing8, 23

$$Q_\lambda = \left\| \lambda^2 \frac{dx_\lambda}{d(\lambda^2)} \right\|_2^2 .$$

In an iterative method, this minimizes the difference between the current and previous solutions. In a noniterative approach, QOC minimizes the update to the initial guess.
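For Tikhonov filter factors the derivative has a closed form, λ² dx_λ/d(λ²) = −Σ_i f_i(1 − f_i)(u_i^T b / σ_i) v_i, which makes QOC cheap to evaluate. A sketch (ours; only the norm matters, so the sign is irrelevant):

```python
import numpy as np

def qoc_lambda(A, b, lams):
    """Quasi-optimality: minimize ||lam^2 dx_lam/d(lam^2)|| over a lambda grid."""
    lams = np.asarray(lams)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    q = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        q.append(np.linalg.norm(f * (1 - f) * beta / s))
    return lams[np.argmin(q)]
```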

2.5.3. Full Width at Half Maximum (FWHM)

The FWHM of the region of contrast is calculated for different regularization parameters.24 This method is only applicable to images that contain a single isolated region, and thus, we reject it. Adler and Guardo17 proposed an alternative method of defining FWHM known as the blur radius as a method to select λ . This method has the same disadvantages as FWHM; thus, we reject it also.

2.5.4. Contrast-to-Noise Ratio (CNR)

The CNR is plotted as a function of the regularization parameter, where contrast is the ratio of the peak value of the image, after background subtraction, to the background value, and noise is the image norm.24 We seek the regularization parameter that maximizes CNR.

Regińska25 has shown that the minimum of Ψ_α(λ) = ‖x_λ‖₂ ‖Ax_λ − b‖₂^α, where α > 0, is close to the point selected by the L-curve method. We propose a modification to the CNR method where, rather than maximizing CNR, we maximize CNR·Ψ_α⁻¹, which simultaneously optimizes both the image norm and the data norm.
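A sketch of the combined criterion (ours): evaluate Ψ_α on a λ grid and divide a user-supplied CNR estimate by it. The cnr_of callback is a placeholder standing in for the contrast/noise definition of Sec. 2.5.4.

```python
import numpy as np

def reginska_psi(A, b, lams, alpha=1.0):
    """Reginska's Psi_alpha(lam) = ||x_lam|| * ||A x_lam - b||^alpha on a grid."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    b_perp2 = np.linalg.norm(b) ** 2 - np.linalg.norm(beta) ** 2
    psi, xs = [], []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        x = Vt.T @ (f * beta / s)
        resid = np.sqrt(np.sum(((1 - f) * beta) ** 2) + b_perp2)
        psi.append(np.linalg.norm(x) * resid**alpha)
        xs.append(x)
    return np.array(psi), xs

def cnr_psi_lambda(A, b, lams, cnr_of):
    """Maximize CNR(x_lam) / Psi(lam); cnr_of is a user-supplied CNR estimator."""
    lams = np.asarray(lams)
    psi, xs = reginska_psi(A, b, lams)
    score = np.array([cnr_of(x) for x in xs]) / psi
    return lams[np.argmax(score)]
```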

3. Problem 1: Deblurring

3.1. Method

Deblurring is an example of an ill-posed inverse problem. The function blur from the Matlab package "Regularization Tools"26 was used to generate the matrix A, the original image x (Fig. 1), with dimensions 50 × 50 pixels, and the corresponding blurred image b, to which we added different levels of Gaussian noise, from 5 to 40%, with 500 noise realizations each (Fig. 2). The smoothing matrix A is chosen to have properties that make it computationally efficient to handle: it is the Kronecker product of a Toeplitz matrix T with itself, A = T ⊗ T. The Toeplitz matrix contains in its diagonals the elements of a Gaussian point-spread function with variance σ², which models the blurring effect, and is banded, with a parameter band that defines the number of diagonals away from the main diagonal that are stored in T.9, 27 For this test, we set σ = 3 and band = 5.
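For readers without the Matlab toolbox, a NumPy transcription of this construction (ours; it mirrors the blur test problem up to normalization):

```python
import numpy as np
from scipy.linalg import toeplitz

def blur_matrix(n=50, band=5, sigma=3.0):
    """Build A = kron(T, T) with T a banded, symmetric Gaussian Toeplitz matrix,
    mirroring the 'blur' test problem from Hansen's Regularization Tools."""
    z = np.zeros(n)
    z[:band] = np.exp(-(np.arange(band) ** 2) / (2 * sigma**2))
    T = toeplitz(z) / (sigma * np.sqrt(2 * np.pi))
    return np.kron(T, T)

A = blur_matrix()           # 2500 x 2500 for a 50 x 50 image
# b = A @ x.ravel() gives the blurred image as a stacked vector.
```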

Fig. 1

Original test image.


Fig. 2

Blurred image with 5% added Gaussian noise.


One convenient property of the matrix A is that its SVD depends only on the SVD of the underlying Toeplitz matrix. Therefore, if the SVD of T is T = U_b S_b V_b^T, the regularized solution x̂ is given by

Eq. 2

$$\hat{x} = V_b \left[ \frac{S_t \circ (U_b^T\, b\, U_b)}{S_t^2 + \lambda^2} \right] V_b^T,$$

where S_t = diag(S_b) diag(S_b)^T, the blurred image b is reshaped into its matrix form, ∘ denotes the elementwise product, and the division is also elementwise.
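Eq. 2 reduces the 2500 × 2500 problem to operations on 50 × 50 matrices. A direct transcription (our sketch):

```python
import numpy as np

def deblur_tikhonov(T, B, lam):
    """Tikhonov deblurring for A = kron(T, T) using only the SVD of T (Eq. 2).

    B is the blurred image in matrix (not stacked-vector) form.
    """
    Ub, sb, Vbt = np.linalg.svd(T)
    St = np.outer(sb, sb)                          # S_t = diag(S_b) diag(S_b)^T
    coeff = St * (Ub.T @ B @ Ub) / (St**2 + lam**2)  # elementwise filter
    return Vbt.T @ coeff @ Vbt
```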

The discrete Picard condition is examined in Fig. 3. The value of i at which σ_i begins to decay more slowly than u_i^T b is shown by the vertical line. This point is emphasized by examining u_i^T b / σ_i, whose gradient turns positive at the same point. This shows that the DPC is satisfied for i up to i_DPC ≈ 695. For higher values of i, the DPC is no longer satisfied and u_i^T b reaches the noise level. Because the DPC is at least partially satisfied, we can expect to find a solution that approximately recovers the real solution.

Fig. 3

Discrete Picard condition for the deblurring problem, where the vertical dashed line marks the beginning of u_i^T b < σ_i and the horizontal line represents the noise level. The DPC is satisfied for i up to i_DPC ≈ 695.


The original image is known; thus, we can find the regularized solution x_λ that is closest to the real solution x_exact. This allows us to define the optimal regularization parameter λ_opt as the one that minimizes the relative error

Eq. 3

$$\varepsilon = \frac{\|x_\lambda - x_{\mathrm{exact}}\|_2}{\|x_{\mathrm{exact}}\|_2}.$$

Figure 4 shows a plot of ε against λ for one realization of 5% Gaussian noise. The minimum occurs at λ_opt = 0.038. The corresponding deblurred and denoised image for λ_opt is shown in Fig. 5.
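Since x_exact is known here, λ_opt can be found by brute force over a logarithmic λ grid (the grid used in Sec. 3.2 has 1000 points between 10⁻⁴ and 10^0.3). A sketch (ours):

```python
import numpy as np

def optimal_lambda(A, b, x_exact, lams):
    """Grid-search the lambda minimizing the relative error of Eq. 3
    (possible only when x_exact is known, as in this test problem)."""
    lams = np.asarray(lams)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    errs = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        x = Vt.T @ (f * beta / s)
        errs.append(np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact))
    errs = np.array(errs)
    return lams[np.argmin(errs)], errs

lams = np.logspace(-4, 0.3, 1000)
```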

Fig. 4

Relative error of the regularized solution for a single realization of 5% noise. The λ_opt value is marked by the cross.


Fig. 5

Reconstructed image with λ_opt = 0.038.


3.2. Results

The mean optimal regularization parameters, and corresponding standard deviations, for the ten selection methods tested are shown for different noise levels in Table 1. The corresponding errors, calculated from Eq. 3 using the mean values in Table 1, are shown in Table 2 and in Fig. 6. The regularization parameter was chosen from a set of 1000 logarithmically spaced points between 10⁻⁴ and 10^0.3.

Fig. 6

Relative error for the deblurring problem as a function of noise level (values from Table 2).

Table 1

Regularization parameters λ obtained using the selection methods for the deblurring problem with different noise levels.

Method     5%            10%           15%           20%
Optimal    0.039±2.6%    0.066±3.3%    0.090±4.0%    0.111±4.2%
Heuristic  0.050±60.0%   0.125±60.0%   0.130±53.8%   0.190±57.9%
L-curve    0.030±2.7%    0.065±1.7%    0.090±1.6%    0.110±1.5%
GCV        0.030±4.0%    0.047±5.1%    0.060±5.6%    0.071±5.8%
UPRE       0.030±3.3%    0.048±4.4%    0.062±4.4%    0.073±4.8%
DP         0.054±3.2%    0.087±3.2%    0.113±3.3%    0.133±3.1%
NCP        0.056±6.4%    0.088±7.6%    0.113±8.5%    0.134±8.7%
f-slope    0.070±1.7%    0.089±1.7%    0.104±1.8%    0.116±1.7%
QOC        0.121±1.5%    0.142±1.7%    0.160±1.8%    0.176±1.7%
CNR        0.014±35.0%   0.018±57.2%   0.025±60.0%   0.033±61.8%
CNR·Ψ⁻¹    0.022±3.6%    0.049±4.5%    0.077±13.2%   0.105±11.3%

Method     25%           30%           35%           40%
Optimal    0.131±4.5%    0.148±4.2%    0.164±4.3%    0.178±4.0%
Heuristic  0.190±57.9%   0.300±66.7%   0.300±66.7%   0.350±71.4%
L-curve    0.128±1.5%    0.145±1.5%    0.159±1.5%    0.172±1.3%
GCV        0.081±6.1%    0.090±5.7%    0.099±5.8%    0.108±5.0%
UPRE       0.083±5.1%    0.092±4.8%    0.101±4.9%    0.109±5.0%
DP         0.151±3.2%    0.167±3.1%    0.181±3.2%    0.195±3.1%
NCP        0.152±9.1%    0.168±9.6%    0.183±9.3%    0.195±9.7%
f-slope    0.127±1.8%    0.138±1.7%    0.147±1.8%    0.155±1.7%
QOC        0.189±1.7%    0.202±1.7%    0.213±1.6%    0.223±1.7%
CNR        0.039±55.4%   0.044±58.7%   0.049±53.4%   0.051±56.4%
CNR·Ψ⁻¹    0.129±9.7%    0.154±2.4%    0.183±2.8%    0.190±2.8%

Table 2

Relative error for the deblurring problem with different noise levels.

Method     5%            10%           15%           20%
Optimal    0.387±1.0%    0.432±1.2%    0.454±1.1%    0.468±1.1%
Heuristic  0.393±0.8%    0.457±0.5%    0.463±0.7%    0.485±0.6%
L-curve    0.395±1.5%    0.432±1.2%    0.454±1.1%    0.468±1.1%
GCV        0.395±1.5%    0.433±1.2%    0.477±1.7%    0.499±1.8%
UPRE       0.395±1.5%    0.432±1.2%    0.473±1.7%    0.496±1.8%
DP         0.400±0.8%    0.438±0.9%    0.458±0.9%    0.471±0.9%
NCP        0.400±0.8%    0.439±0.7%    0.456±1.1%    0.471±0.9%
f-slope    0.414±0.5%    0.439±0.9%    0.456±0.9%    0.469±1.1%
QOC        0.451±0.2%    0.463±0.4%    0.473±0.5%    0.482±0.6%
CNR        0.578±2.8%    0.853±2.3%    0.878±2.4%    0.884±2.4%
CNR·Ψ⁻¹    0.403±1.5%    0.456±1.1%    0.457±0.9%    0.472±0.9%

Method     25%           30%           35%           40%
Optimal    0.479±1.0%    0.488±1.0%    0.495±1.2%    0.502±1.0%
Heuristic  0.489±0.6%    0.525±0.6%    0.525±0.6%    0.544±0.6%
L-curve    0.479±1.0%    0.488±1.0%    0.495±1.2%    0.502±1.2%
GCV        0.513±1.9%    0.526±1.6%    0.534±1.2%    0.541±1.9%
UPRE       0.510±2.0%    0.520±1.7%    0.528±1.9%    0.541±1.9%
DP         0.481±0.8%    0.489±0.8%    0.496±1.0%    0.503±1.0%
NCP        0.481±0.8%    0.490±0.8%    0.496±1.0%    0.503±1.0%
f-slope    0.480±1.3%    0.488±1.2%    0.497±1.2%    0.504±1.2%
QOC        0.489±0.6%    0.495±0.8%    0.501±0.8%    0.506±1.0%
CNR        0.895±2.6%    0.903±2.6%    0.935±2.7%    0.987±2.5%
CNR·Ψ⁻¹    0.481±0.8%    0.489±0.8%    0.498±1.0%    0.502±1.0%

3.3. Discussion

3.3.1. Heuristic Method

It was not easy to identify a single λ with the heuristic method, and we instead selected a range of λ values that provided acceptable results. In every case, the range included λ_opt. The corresponding ranges of errors are quoted in Table 2; for better comparison with the other methods, we display the central value of each range together with an error giving the range limits. Although the range included λ_opt, the regularization parameter errors are consistently higher than for most of the other methods, illustrating the irreproducibility of the heuristic method.

3.3.2. L-curve

The L-curve (plotted in Fig. 7 for a 5% noise realization) generally exhibited a single, easily identifiable point of maximum curvature. The predicted values of λ agree closely with λ_opt.

Fig. 7

L-curve for deblurring with 5% noise.

034044_1_053903jbo7.jpg

Hanke28 showed that the L-curve method fails to find a good λ when the solution is very smooth. When the singular values decay very rapidly to zero, λ_opt may occur before the data norm starts increasing, because a large number of singular values have to be included before the data norm increases significantly. Vogel29 demonstrated another nonconvergence of λ, which occurs when the regularized solution fails to converge to the true solution as the dimensions of the problem tend toward infinity. We see no evidence of either of these two concerns. For the deblurring problem, an image reconstructed with 5% noise was slightly undersmoothed, with a visible noise component. However, all the results were acceptable, as confirmed by the small relative errors (Table 2) and by the small regularization parameter error (Table 1). The latter reflects the low sensitivity of this method to perturbations of a similar nature in the data.

3.3.3. GCV

The minimum value for this function was easily identifiable, although it was located on a relatively flat region of the curve. The selected λ was similar to λ_opt at low noise levels but rather low at higher noise levels.

The GCV method has been shown to perform well in different situations. However, it can sometimes be difficult to locate its minimum, as the function may be flat near λ_opt or display multiple minima.30 Hansen and O'Leary15 presented an example where GCV failed to find λ due to the flatness of the curve. However, we were able to locate a single minimum for all our test cases from 5 to 40% noise.

3.3.4. UPRE

The UPRE function exhibited an easily identified minimum. However, calculating UPRE requires the SVD of the blur matrix, which was a reasonable computation here because of the carefully chosen blurring matrix but in general could be prohibitively expensive. The performance of UPRE appears similar to that of GCV, with good estimates of λ at low noise levels but an underestimated λ at higher noise levels. Vogel9 used a two-dimensional deblurring problem with added white noise to compare these two methods and also found that UPRE and GCV behave similarly.

3.3.5. DP

This method appeared to overestimate λ, confirming the results of Thompson et al.31 and Hansen.8

3.3.6. NCP

This method and the DP perform similarly, as expected, because both select the regularization parameter such that the data norm matches the noise variance. However, the NCP seems to be more sensitive to the different noise realizations, in particular for noise levels above 25%. Hansen et al.21 found that the NCP gives better results than the GCV method, which is confirmed here at higher noise levels.

3.3.7. f-Slope

The f-slope curve clearly shows a flat part, which is less sensitive to perturbations. This method appears to overestimate λ at low noise levels but selects λ close to λ_opt at higher noise levels.

It has been claimed22 that the f-slope method can perform better than the L-curve method, particularly when little regularization is needed. However, that work mainly examined much smaller λ than arise in our deblurring problem (as low as 10⁻¹⁵), and it is unclear how the results translate to our more ill-posed problem. Using f-slope, we were able to reconstruct an image close to the true image when the noise in the data was 15% or higher, but for smaller noise levels it tended to overregularize the solution.

3.3.8. QOC

The regularization parameter λ selected by this method was invariably higher than λ_opt. However, the performance of QOC appeared to improve at higher noise levels.

Hansen8 compared the L-curve, the GCV, and the QOC methods by applying them to six test problems with different data perturbations but with the same noise level. He found that the QOC method generally oversmooths and that there may be a problem with local minima. We saw the same oversmoothing effect but did not observe local minima.

3.3.9. CNR

There was a single clear global maximum. However, for all noise levels this method found λ < λ_opt and generated much greater relative errors than any of the other methods, and the uncertainties associated with λ are as large as those of the heuristic method.

3.3.10. CNR·Ψ⁻¹

The addition of the Ψ factor led to values of λ closer to λ_opt than CNR alone. The method still tended to underestimate λ at low noise levels, but at higher noise levels the selection of λ was good. Note that the regularization parameter uncertainty is particularly large for noise levels between 15 and 25%.

4. Problem 2: Optical Topography

4.1. Method

Data were obtained with the University College London (UCL) optical topography system,32 with light sources at 770 nm, using a test object consisting of absorbing targets within a tank filled with a solution with tissue-equivalent optical properties (absorption coefficient μ_a = 0.01 mm⁻¹ and reduced scattering coefficient μ_s′ = 1 mm⁻¹). A cylindrical absorbing target (radius 5 mm and height 10 mm) was made with the same μ_s′ as the solution but with twice the background absorption (μ_a = 0.02 mm⁻¹).

The optical topography array consisted of eight sources and eight detectors, from which 64 measurements were made.33 The array was placed on one of the walls of the tank, which is made of epoxy resin with the same optical properties as the solution and is 2 mm thick. The target was positioned at the center of the array, where sensitivity and resolution should be highest, at a depth of 10 mm. Data were collected for 20 s and averaged to reduce the noise, and a baseline measurement was acquired with no target present.

The software package TOAST (temporal optical absorption and scattering tomography), developed by Prof. S. R. Arridge and Dr. M. Schweiger at UCL, models the propagation of light in highly scattering media and was used to generate the sensitivity matrix A. Finally, images were reconstructed using a 3D linear model.13

4.2. Results

The results for all the methods are summarized in Table 3 . The λ values used are the same as before.

Table 3

Regularization parameters for experimental data using different selection methods.

Method     λ
Heuristic  [0.004, 0.015]
L-curve    0.0066
GCV        0.0085
UPRE       0.0071
DP         0.0061
NCP        0.0032
f-slope    failed (see Sec. 4.3.7)
QOC        failed (see Sec. 4.3.8)
CNR        0.0051
CNR·Ψ⁻¹    0.0068
λ_εμa      0.0071

4.2.1. DPC

Figure 8 shows that the DPC was partially satisfied, and thus regularization should be able to give a stable solution. The vertical line represents i_DPC: for i < i_DPC the data space coefficients u_i^T b decay faster than the singular values σ_i, and for i > i_DPC the data space coefficients reach a level determined by the perturbations in the data, so the DPC is no longer satisfied.

Fig. 8

Discrete Picard condition. There are 64 singular values σ_i. The horizontal dashed line represents the noise level, and the vertical line marks i_DPC ≈ 27, beyond which (i > i_DPC) the DPC is no longer satisfied.


4.3. Discussion

4.3.1. Heuristic Method

The range of λ that generated acceptable images, which were relatively noise-free, with adequate spatial localization, and with the target placed in the middle of the image, was λ ∈ [0.004, 0.015].

For this phantom study, we know the optical properties and target position with an accuracy of 5 to 10%. Figure 9 shows the absorption coefficient error ε_μa, which is the difference between the maximum μ_a calculated from the reconstructed images, for regularization parameters found with the heuristic method in steps of 0.001, and the target μ_a. We refer to the point where the error is zero, λ_εμa = 0.0071, as the optimal regularization parameter. An image reconstructed using λ = λ_εμa = 0.0071, at 10 mm depth, is shown in Fig. 10. The target size in the image, measured by the FWHM, is (10 ± 3.5) × (14 ± 3.5) mm, which is not too different from the real target dimensions. The target size is approximately constant over the selected range of regularization parameters.

Fig. 9

Absorption coefficient error for the heuristic method regularization parameters. The cross indicates the point where the error is zero.


Fig. 10

Reconstructed optical topography image, using λ_εμa = 0.0071.


The values of λ found by the other selection methods are compared to λ_εμa below; Table 3 lists all the values.

4.3.2. L-curve

The L-curve did not exhibit a pronounced corner, but it was still possible to calculate the point of maximum curvature. The absence of a sharp corner reflects the severe ill-posedness of this problem (seen in the exponential decay of the singular values in Fig. 8). Nevertheless, the λ at the point of maximum curvature of the L-curve was closer to λ_εμa than that of any other method. The L-curve method has been used previously in optical imaging, for simulated and real data, and is considered to generate acceptable images.34, 35, 36

4.3.3. GCV

The value of λ found using GCV was the highest of all the methods. Hansen16 found that the GCV method fails to compute a useful solution when errors are highly correlated; in the presence of uncorrelated errors, it gives a slightly overregularized λ. Our result could be explained by a low level of correlated errors.

4.3.4. UPRE

UPRE found a very shallow minimum at λ = λ_εμa. The noise variance was taken to be the level reached at i = i_DPC in Fig. 8, the point at which u_i^T b reaches the noise floor and no longer decreases.

4.3.5. DP

The L-curve, GCV, DP, and UPRE methods have previously been compared for simulated, phantom, and clinical data in electrical impedance tomography.37 All the methods were successful for simulated and clinical data, whereas for data acquired on a test phantom, the DP and UPRE methods failed to converge. Overall, the preferred method was the GCV. For optical topography data, both DP and UPRE methods converged and the predicted λ agreed with the values found heuristically.

4.3.6. NCP

It is necessary to preprocess the data so that all the variables have zero mean and unit variance. In optical topography, we use the difference imaging approach, where the baseline is subtracted from the data; hence, all the measurements should have zero mean but different variances. Whitening can be accomplished by dividing each measurement by its standard deviation. This method returned a very low λ, which lies outside the range found heuristically. If we instead set the Kolmogorov–Smirnov limits38 at a significance level of 5%, equivalent to a 95% confidence level, and choose the largest regularization parameter consistent with them, we obtain λ = 0.0056, which is a more reasonable result.

4.3.7. f-slope

The solution norm increased monotonically with ln(1/λ); thus, it was not possible to identify a minimum slope. To our knowledge, the f-slope method has not previously been tested on real data, only on test models, where it performed well.22 The method failed when applied to our optical topography problem, probably because of its ill-posedness.

4.3.8. QOC

The QOC decreased monotonically with log(λ) and failed to show a reliable minimum; consequently, this method failed.

4.3.9. CNR

To calculate the contrast, it is necessary to know μ_a^max and μ_a^bkg. The value of μ_a^max was taken to be the mean of the pixel of maximum intensity in the image and its eight nearest neighbors in the xy plane. The value of μ_a^bkg was the average of the 24 pixels furthest from the peak in the image. This method gave the lowest λ of all the methods, and artifacts were present in the reconstructed images.

4.3.10. CNR·Ψ⁻¹

As before, including Ψ gave a much more realistic estimate of λ than CNR alone, very close to λ_εμa.

The regularization parameter selection methods were also tested on two further data sets, obtained using the same liquid phantom but with targets of different absorption coefficients; the results were consistent with those shown here.

5. Conclusion

Optimizing λ is critical when reconstructing diffuse optical images because it controls the smoothness of the regularized solution and balances the influence of the noise present in the image against its accuracy. We have examined a number of methods for selecting λ. Diffuse optical imaging is a challenging ill-posed and underdetermined problem, and the correct target image is unknown, making it difficult to validate image quality. We therefore initially tested the methods on a simpler ill-posed problem, deblurring, where we can control the amount of blur and noise applied to the data and where the exact solution is known (so that λ_opt can be found). We believe that if a method fails to produce a good regularized solution for the deblurring problem, we cannot rely on it for optical topography; conversely, a method that produces a good solution for the deblurring problem will not necessarily be reliable for the more demanding optical problem.

In Sec. 2.1, we list three criteria that a selection method must meet. On the basis of these criteria, we reject the heuristic method, the fixed noise figure method, and methods related to optimizing the FWHM of the image. All the remaining methods performed acceptably well for the deblurring problem. However, the f-slope and QOC failed to converge for the more demanding optical topography problem.

Of the remaining methods, the L-curve consistently demonstrated the lowest error on the deblurring problem (see Tables 1, 2). It is easy to implement and simultaneously minimizes both the data norm and the image norm. The DP and UPRE methods, by contrast, require knowledge of the noise variance, which may not always be available, although the DPC seems to provide a good estimate of the noise level in optical topography. The NCP method has the advantage of selecting λ automatically and hence does not directly require an estimate of the noise variance; nevertheless, it is sensitive to error fluctuations and only gives reasonable results for the optical topography problem under certain assumptions. DP, UPRE, NCP, and GCV consider only the data norm. The CNR method gave poor results but was much more successful when Ψ was minimized simultaneously. The use of Ψ in this way, as proposed by Regińska25 and developed further here, could be applied to other methods and may be worth further examination. Overall, we conclude that the L-curve is the optimal selection method for optical topography.

Thus far, we have studied ideal or almost ideal cases, whereas in vivo studies suffer further sources of error. These include motion artifacts, changes in the contact between the optodes and the skin, which can result in intensity fluctuations in the collected data, and detection of light that has not passed through the investigated medium. All these effects produce correlated errors, which are not necessarily apparent in the raw data. The L-curve has been shown to perform well in the presence of these types of error.15, 16 However, as mentioned previously, Hanke28 showed that the L-curve method may fail to find a good regularization parameter when the solutions are very smooth. Under these circumstances, GCV should be investigated as an alternative to the L-curve.

Acknowledgments

The work has been supported by a scholarship awarded to T.C. by Fundação para a Ciência e a Tecnologia, Portugal. Thanks to Dr. Simon Arridge, from the Department of Computer Science, for useful discussions.

References

1. J. C. Hebden, "Advances in optical imaging of the newborn infant brain," Psychophysiology, 40(4), 501–510 (2003). https://doi.org/10.1111/1469-8986.00052

2. M. Wolf, M. Ferrari, and V. Quaresima, "Progress of near-infrared spectroscopy and topography for brain and muscle clinical applications," J. Biomed. Opt., 12(6), 062104 (2007). https://doi.org/10.1117/1.2804899

3. Y. Yamashita, A. Maki, and H. Koizumi, "Measurement system for noninvasive dynamic optical topography," J. Biomed. Opt., 4(4), 414–417 (1999). https://doi.org/10.1117/1.429940

4. D. A. Boas, T. Gaudette, G. Strangman, X. Cheng, J. J. Marota, and J. B. Mandeville, "The accuracy of near infrared spectroscopy and imaging during focal changes in cerebral hemodynamics," Neuroimage, 13(1), 76–90 (2001). https://doi.org/10.1006/nimg.2000.0674

5. D. A. Boas, K. Chen, D. Grebert, and M. A. Franceschini, "Improving the diffuse optical imaging spatial resolution of the cerebral hemodynamic response to brain activation in humans," Opt. Lett., 29(13), 1506–1508 (2004). https://doi.org/10.1364/OL.29.001506

6. A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-Posed Problems, Winston, New York (1977).

7. G. H. Golub, P. C. Hansen, and D. P. O'Leary, "Tikhonov regularization and total least squares," SIAM J. Matrix Anal. Appl., 21(1), 185–194 (1999). https://doi.org/10.1137/S0895479897326432

8. P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia (1998).

9. C. R. Vogel, Computational Methods for Inverse Problems, SIAM, Philadelphia (2002).

10. G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore (1996).

11. P. C. Hansen, "The discrete Picard condition of discrete ill-posed problems," BIT Numer. Math., 30(4), 658–672 (1990). https://doi.org/10.1007/BF01933214

12. B. M. Graham and A. Adler, "Objective selection of hyperparameter for EIT," Physiol. Meas., 27, S65–S79 (2006). https://doi.org/10.1088/0967-3334/27/5/S06

13. A. P. Gibson, T. Austin, N. L. Everdell, M. Schweiger, S. R. Arridge, J. H. Meek, J. S. Wyatt, D. T. Delpy, and J. C. Hebden, "Three-dimensional whole-head optical tomography of passive motor evoked responses in the neonate," Neuroimage, 30, 521–528 (2006). https://doi.org/10.1016/j.neuroimage.2005.08.059

14. P. C. Hansen, "The L-curve and its use in the numerical treatment of inverse problems," in Computational Inverse Problems in Electrocardiology, 119–142 (2001).

15. P. C. Hansen and D. P. O'Leary, "The use of the L-curve in the regularization of discrete ill-posed problems," SIAM J. Sci. Comput., 14(6), 1487–1503 (1993). https://doi.org/10.1137/0914086

16. P. C. Hansen, "Analysis of discrete ill-posed problems by means of the L-curve," SIAM Rev., 34(4), 561–580 (1992). https://doi.org/10.1137/1034115

17. A. Adler and R. Guardo, "Electrical impedance tomography: regularized imaging and contrast detection," IEEE Trans. Med. Imaging, 15(2), 170–179 (1996). https://doi.org/10.1109/42.491418

18. A. Adler, "Accounting for erroneous electrode data in electrical impedance tomography," Physiol. Meas., 25, 227–238 (2004). https://doi.org/10.1088/0967-3334/25/1/028

19. G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia (1990).

20. G. Golub, M. Heath, and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics, 21(2), 215–223 (1979). https://doi.org/10.2307/1268518

21. P. C. Hansen, M. E. Kilmer, and R. H. Kjeldsen, "Exploiting residual information in the parameter choice for discrete ill-posed problems," BIT Numer. Math., 46, 41–59 (2006). https://doi.org/10.1007/s10543-006-0042-7

22. L. Wu, "A parameter choice method for Tikhonov regularization," Electron. Trans. Numer. Anal., 16, 107–128 (2003).

23. T. Kitagawa, S. Nakata, and Y. Hosoda, "Regularization using QR factorization and the estimation of the optimal parameter," BIT Numer. Math., 41(5), 1049–1058 (2001). https://doi.org/10.1023/A:1021949530676

24. J. P. Culver, R. Choe, M. J. Holboke, L. Zubkov, T. Durduran, A. Slemp, V. Ntziachristos, B. Chance, and A. G. Yodh, "Three-dimensional diffuse optical tomography in the parallel plane transmission geometry: evaluation of a hybrid frequency domain/continuous wave clinical system for breast imaging," Med. Phys., 30(2), 235–247 (2003). https://doi.org/10.1118/1.1534109

25. T. Regińska, "A regularization parameter in discrete ill-posed problems," SIAM J. Sci. Comput., 17(3), 740–749 (1996). https://doi.org/10.1137/S1064827593252672

26. P. C. Hansen, "Regularization tools: a Matlab package for analysis and solution of discrete ill-posed problems," Numer. Algorithms, 6(1–2), 1–35 (1994). https://doi.org/10.1007/BF02149761

27. P. C. Hansen, "Deconvolution and regularization with Toeplitz matrices," Numer. Algorithms, 29, 323–378 (2002). https://doi.org/10.1023/A:1015222829062

28. M. Hanke, "Limitations of the L-curve method in ill-posed problems," BIT Numer. Math., 36(2), 287–301 (1996). https://doi.org/10.1007/BF01731984

29. C. Vogel, "Non-convergence of the L-curve regularization parameter selection method," Inverse Probl., 12(4), 535–547 (1996). https://doi.org/10.1088/0266-5611/12/4/013

30. A. M. Thomson, J. W. Kay, and D. M. Titterington, "A cautionary note about crossvalidatory choice," J. Stat. Comput. Simul., 33(4), 199–216 (1989). https://doi.org/10.1080/00949658908811198

31. A. M. Thompson, J. C. Brown, J. W. Kay, and D. M. Titterington, "A study of methods of choosing the smoothing parameter in image restoration by regularization," IEEE Trans. Pattern Anal. Mach. Intell., 13(4), 326–339 (1991). https://doi.org/10.1109/34.88568

32. N. Everdell, A. Gibson, I. Tullis, T. Vaithianathan, J. Hebden, and D. Delpy, "A frequency multiplexed near infrared topography system for imaging functional activation in the brain," Rev. Sci. Instrum., 76(9), 093705 (2005). https://doi.org/10.1063/1.2038567

33. A. Blasi, S. Fox, N. C. Everdell, A. Volein, L. Tucker, G. Csibra, A. P. Gibson, J. C. Hebden, M. Johnson, and C. Elwell, "Depth dependent changes in cerebral haemodynamics during face perception in infants," Phys. Med. Biol., 52, 6849–6864 (2007). https://doi.org/10.1088/0031-9155/52/23/005

34. R. J. Gaudette, D. A. Boas, D. H. Brooks, C. A. DiMarzio, M. E. Kilmer, and E. L. Miller, "Comparison of linear reconstruction techniques for 3D DPDW imaging of absorption coefficient," Proc. SPIE, 55–66 (2000).

35. C. Zhou, G. Yu, F. Daisuke, J. H. Greenberg, A. G. Yodh, and T. Durduran, "Diffuse optical correlation tomography of cerebral blood flow during cortical spreading depression in rat brain," Opt. Express, 14, 1125–1144 (2006). https://doi.org/10.1364/OE.14.001125

36. Q. Zhao, L. Ji, and T. Jiang, "Improving performance of reflectance diffuse optical imaging using a multicentered mode," J. Biomed. Opt., 11(6), 064019 (2006). https://doi.org/10.1117/1.2400703

37. J. P. Abascal, S. Arridge, R. Bayford, and D. Holder, "Comparison of methods for optimal choice of the regularization parameter for linear electrical impedance tomography of brain function," Physiol. Meas., 29, 1319–1334 (2009).

38. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd ed., Springer, New York (1991).
© 2009 Society of Photo-Optical Instrumentation Engineers (SPIE)
Teresa M. Correia, Adam P. Gibson, Martin Schweiger, and Jeremy C. Hebden "Selection of regularization parameter for optical topography," Journal of Biomedical Optics 14(3), 034044 (1 May 2009). https://doi.org/10.1117/1.3156839
Published: 1 May 2009
Keywords: Inverse optics; Biomedical optics; Inverse problems; Optical properties; Absorption; Medical physics; Signal to noise ratio
Back to Top