Image Reconstruction and Regularization
Optical topography uses diffuse reflectance measurements on the surface of an object to derive its internal optical properties.1, 2 It can be used to measure functional brain activity, because the optical properties depend on the concentration of oxyhemoglobin and deoxyhemoglobin. The first optical topography images, such as those obtained using the Hitachi ETG-100 system,3 were obtained by interpolating the measurements into the image space. However, it is becoming increasingly common for optical topography images to be reconstructed by solving the inverse problem. Boas 4, 5 has shown that this provides improved spatial resolution and quantitative accuracy compared to interpolation. However, the inverse problem is ill-posed, which means that there is not a single, well-behaved solution. In order to obtain a meaningful solution, the problem must be regularized, for example, using Tikhonov regularization.
Tikhonov regularization requires a regularization parameter, here called λ. This is most often determined heuristically, by subjectively selecting the value of λ that appears by eye to give the “best” image. A number of more objective methods have been proposed for selecting λ in optical topography and other inverse problems. In this paper, we review 11 methods that have been proposed for selecting λ and apply them to the optical topography inverse problem. We concentrate on functional imaging of brain activity, where the measured changes are typically small and measurements are available before and after a small change in the optical properties. Under these conditions, the nonlinear image reconstruction problem can be linearized using the Rytov approximation. However, the problem remains highly ill-posed and underdetermined.
The above methods were initially applied to a deblurring problem, which is an ill-posed problem for which the solution is known. Thereafter, the same methods were applied to experimental optical topography data.
Regularization of Inverse Problems
Deblurring and image reconstruction are both discrete ill-posed inverse problems of the form Ax = b, where b is the data vector (length m), x is the vector of unknown parameters (length n), and A is an m × n matrix. For the deblurring problem, b is the blurred image, x is the original image, and A represents the blurring matrix. For the optical topography problem, A is the sensitivity matrix, which maps the changes in the optical properties x to the changes in the measured data b.
The least-squares solution is simply x = arg min_x ‖Ax − b‖², but this is highly affected by noise and must therefore be regularized. A common method, and the one we choose to use here, is zero-order Tikhonov regularization,6, 7, 8 which selects the solution

x_λ = arg min_x {‖Ax − b‖² + λ²‖x‖²}.

The norm ‖Ax_λ − b‖ is a measure of the difference between the measured data and the data that would be obtained if the solution image were used to simulate data. It is sometimes called the least-squares error or the residual norm. We choose to call it the data norm. The norm ‖x_λ‖ is a measure of the noise in the image and is sometimes called the regularized norm or solution norm. Here it is called the image norm.
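As a concrete illustration, the Tikhonov solution and its two norms can be computed in a few lines. This is a minimal numpy sketch on a small synthetic ill-conditioned matrix, not the deblurring or topography operators used later:

```python
import numpy as np

def tikhonov(A, b, lam):
    # Zero-order Tikhonov: x_lam = argmin ||A x - b||^2 + lam^2 ||x||^2,
    # obtained by solving the normal equations (A^T A + lam^2 I) x = A^T b.
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)

# Synthetic ill-conditioned problem: singular values decay over 8 decades.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10)) @ np.diag(np.logspace(0, -8, 10))
b = A @ np.ones(10) + 1e-6 * rng.standard_normal(20)

x_lam = tikhonov(A, b, lam=1e-4)
data_norm = np.linalg.norm(A @ x_lam - b)   # residual ("data") norm
image_norm = np.linalg.norm(x_lam)          # solution ("image") norm
```

Increasing lam shrinks the image norm and grows the data norm; this trade-off is exactly what the selection methods reviewed below try to balance.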
If λ is increased, then the contribution of the image norm to the solution is increased and the solution becomes less sensitive to perturbations in the data. A smaller λ emphasizes the contribution from the data norm, effectively assuming that the quality of the data is good; thus, the solution is allowed to conform more closely to the measured data. In the case of Tikhonov regularization, λ governs the level of smoothness enforced in the image.
A third norm, the predictive norm, is given by ‖Ax_λ − Ax_exact‖, where x_exact is the exact solution. It requires knowledge of the exact solution, which in most real cases is unknown, or knowledge of the noise statistics.9
We briefly mention the singular value decomposition (SVD), a method for reducing a matrix into constituent parts, which also provides a way of analyzing the ill-posedness of a problem.8, 10 The SVD of the matrix is A = UΣVᵀ, where U and V are square, orthonormal matrices and Σ is a diagonal (though nonsquare) matrix. The diagonal components of Σ, the σ_i, are known as the singular values and are arranged in order of decreasing magnitude. The problem can be regularized either by setting all but the largest σ_i to zero (truncated SVD), or by weighting them, for example, by the filter factors f_i = σ_i²/(σ_i² + λ²), which is equivalent to Tikhonov regularization.
The rate at which σ_i decreases is an indication of the ill-posedness of the problem. One measure of this is given by the discrete Picard condition (DPC).11
DEFINITION 1.1. The data vector b satisfies the discrete Picard condition if the data-space coefficients |u_iᵀb| on average decay to zero faster than the singular values σ_i.
If the DPC is violated for a given problem, then one should question the validity of the solution. In ill-posed problems, we find that the DPC holds initially and then fails at some index, beyond which the data become dominated by errors. If this is the case, and if the regularization parameter is accurately selected, then the regularized solution should provide a valid solution. Examining the ratio of the data-space coefficients |u_iᵀb| to the singular values σ_i provides a method of characterizing the ill-posedness of the problem.
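The DPC can be checked numerically by comparing the decay of |u_iᵀb| with that of σ_i. The sketch below uses a toy matrix with a prescribed SVD; the construction is purely illustrative, not one of the operators used in this paper:

```python
import numpy as np

def picard(A, b):
    # Return the singular values sigma_i and data-space coefficients |u_i^T b|.
    # The DPC holds while |u_i^T b| decays, on average, faster than sigma_i,
    # i.e. while the ratio |u_i^T b| / sigma_i remains bounded.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return s, np.abs(U.T @ b)

# Toy operator with singular values spanning 10 decades.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((50, 50)))
V, _ = np.linalg.qr(rng.standard_normal((50, 50)))
s_true = np.logspace(0, -10, 50)
A = U @ np.diag(s_true) @ V.T

x = V @ np.sqrt(s_true)          # smooth solution: |u_i^T b| ~ sigma_i^{3/2}
b_clean = A @ x                  # satisfies the DPC at every index
b_noisy = b_clean + 1e-6 * rng.standard_normal(50)  # fails at the noise floor

s, c_clean = picard(A, b_clean)
s, c_noisy = picard(A, b_noisy)
```

For b_noisy the ratio c/s stays bounded only up to the index where the coefficients level off at the noise floor, after which it grows rapidly.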
Methods for Selecting the Regularization Parameter
We are seeking a method for selecting the regularization parameter for optical topography. We begin by reviewing a number of methods that have been proposed in the literature, but reject some immediately because they are not suitable for optical topography. Our criteria are as follows:
1. The method should not require any subjective input from the user.
2. The method should only require knowledge that is available during clinical optical topography. For example, it should not require knowledge of the size of the feature being examined.
3. The method should not assume particular features in the image. For example, it should not assume there is a single, spatially isolated change.
The most straightforward—and most widely used—method for selecting λ is to examine solutions for a range of λ heuristically by eye and to select the one that results in the most acceptable reconstruction. This method is subjective and nonrepeatable.12 A common variant is to take λ as being equal to the noise present in the data.13 We retain this method in our analysis as a measure against which to compare other, more objective methods.
Methods that Optimize Data and Image
L-curve
The L-curve is probably the most commonly employed objective method for finding the regularization parameter when solving ill-posed problems.8, 14, 15, 16 It is a log-log plot, for different λ, of the image norm against the data norm. We take the value of λ that corresponds to the point of maximum curvature on the graph, which is normally the point on the graph that is nearest to the origin and so mutually minimizes both the image norm and the data norm.
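A minimal implementation sketch, assuming numpy: the two norms are evaluated cheaply from the SVD filter factors, curvature is taken by finite differences, and the λ grid (a 1-D array), matrix, and data are whatever the caller supplies:

```python
import numpy as np

def lcurve_corner(A, b, lambdas):
    # L-curve: log-log plot of image norm ||x_lam|| vs data norm ||A x_lam - b||.
    # Both norms come from the SVD filter factors; the lam-independent part of
    # the residual is omitted, since it only shifts rho by a constant.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lambdas[:, None]**2)       # filter factors, one row per lam
    rho = np.linalg.norm((1 - f) * beta, axis=1)  # data norms
    eta = np.linalg.norm(f * beta / s, axis=1)    # image norms
    x, y = np.log(rho), np.log(eta)
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    with np.errstate(divide="ignore", invalid="ignore"):
        kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    kappa = np.nan_to_num(kappa, nan=-np.inf)     # guard flat segments
    return lambdas[np.argmax(kappa)], rho, eta
```

Along the returned curves, rho grows and eta shrinks monotonically with λ, which is what gives the plot its characteristic L shape.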
Fixed noise figure (NF)
The noise figure (NF) is the ratio of the signal-to-noise ratio in the measurements to the signal-to-noise ratio in the image.12, 17, 18 The regularization parameter is found by plotting NF as a function of λ and selecting the NF value whose corresponding λ returns the most acceptable solution. This method replaces the selection of a regularization parameter, as in the heuristic method, by the selection of a fixed NF value. The ratio of the signal-to-noise ratios is more constant across different experimental setups than the heuristic λ (i.e., when a fixed NF value is used, the regularization parameter can differ between the different image reconstructions). However, we require a fully objective measure and therefore exclude this method from further analysis.
Methods that Optimize the Data
Generalized cross-validation (GCV)
GCV is based on the principle that, if a data point is omitted, then we should be able to estimate the missing data value from the regularized solution obtained from the reduced data set.19, 20 We minimize

GCV(λ) = ‖Ax_λ − b‖² / [trace(I − AA_λ^#)]²,

where A_λ^# is the Tikhonov regularized pseudoinverse of A. The numerator is the squared data norm, and the denominator is inversely related to the number of singular values used in the regularized solution. Minimizing this then favors low data norms while penalizing solutions that require many singular values. GCV therefore finds the λ that provides a solution that can fit the data using the smallest possible number of parameters, thereby minimizing the contribution from small singular values.
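Using the SVD, trace(I − AA_λ^#) reduces to m minus the sum of the filter factors, so the GCV curve can be evaluated directly. A sketch under the same numpy conventions as above; the problem data are placeholders supplied by the caller:

```python
import numpy as np

def gcv_curve(A, b, lambdas):
    # GCV(lam) = ||A x_lam - b||^2 / [trace(I - A A_lam^#)]^2.
    # Via the SVD, the trace equals m - sum_i f_i, with f_i the filter factors.
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    resid0 = np.sum(b**2) - np.sum(beta**2)        # residual outside range(A)
    f = s**2 / (s**2 + lambdas[:, None]**2)
    rho2 = np.sum(((1 - f) * beta) ** 2, axis=1) + resid0
    return rho2 / (m - np.sum(f, axis=1)) ** 2

# The selected parameter is the minimizer:
# lam_gcv = lambdas[np.argmin(gcv_curve(A, b, lambdas))]
```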
Unbiased predictive risk estimator (UPRE)
The UPRE method seeks to minimize the predictive risk.9 The data noise is assumed to be random white noise of known variance. The accuracy of the method therefore depends on the accuracy of the estimate of noise. Furthermore, because the UPRE is an unbiased estimator, its expected value is the same as the expected value of the predictive risk, but it does not necessarily change with in the same way as the predictive risk. We cannot therefore guarantee that the solution error is small.
Discrepancy principle (DP)
This method selects the for which the data norm is equal to the data variance.9 Like the UPRE, it depends on our knowledge of the noise statistics: if the data variance is unknown and must be estimated, the method may not necessarily return the optimal regularization parameter. If we do have knowledge of the data variance, then we may see improved performance as this additional information is used.
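Because the residual norm is a monotone function of λ, the discrepancy principle reduces to a one-dimensional root-finding (or grid) search. A sketch, where noise_norm is the caller's estimate of the expected norm of the data noise and lambdas is a 1-D numpy grid:

```python
import numpy as np

def discrepancy_lambda(A, b, noise_norm, lambdas):
    # Pick the lam whose residual norm ||A x_lam - b|| is closest to the
    # expected noise norm; the residual is monotone in lam, so a grid
    # search (or bisection) suffices.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    resid0 = max(np.sum(b**2) - np.sum(beta**2), 0.0)  # part of b outside range(A)
    w = lambdas[:, None]**2 / (s**2 + lambdas[:, None]**2)  # 1 - filter factors
    rho = np.sqrt(np.sum((w * beta) ** 2, axis=1) + resid0)
    return lambdas[np.argmin(np.abs(rho - noise_norm))]
```

Overestimating the noise norm pushes the selected λ upward, i.e., toward more regularization, which matches the behavior described in the text.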
Normalized cumulative periodogram (NCP)
This method favors the regularization parameter for which the residual vector resembles white noise.21 It is derived from the periodogram, which is the power spectrum of the residual and is obtained by taking the squares of the absolute values of the discrete Fourier transform for half the residual vector length. The NCP is the cumulative periodogram normalized by the sum of its elements. If the residuals are pure white noise, then the NCP is a straight line; hence, the selected regularization parameter is the one that minimizes the distance of the NCP to a straight line.21
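The NCP distance from the white-noise line can be computed directly from the residual vector. An illustrative numpy sketch; the Kolmogorov–Smirnov-style maximum deviation used here is one common choice of distance:

```python
import numpy as np

def ncp_distance(residual):
    # Periodogram: squared DFT magnitudes over half the spectrum (DC dropped).
    # The NCP is its normalized cumulative sum; for white noise it hugs the
    # straight line c_k = k / q, so the max deviation from that line is small.
    p = np.abs(np.fft.rfft(residual))[1:] ** 2
    ncp = np.cumsum(p) / np.sum(p)
    line = np.arange(1, len(p) + 1) / len(p)
    return np.max(np.abs(ncp - line))

# The selected lam minimizes ncp_distance(A @ x_lam - b) over the grid.
rng = np.random.default_rng(6)
white = rng.standard_normal(1024)                       # white-noise residual
tone = np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)   # low-frequency residual
```

The distance for the white residual should be much smaller than for the structured one, so minimizing it steers λ toward residuals that look like noise.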
Methods that Optimize the Image
f-slope
The f-slope is a plot of the image norm against λ.22 We select the λ at the flattest part of the curve, which corresponds to the smallest difference between adjacent solution norms. This method only analyzes the image norm and not the data norm.
Quasi-optimality criterion (QOC)
The regularization parameter is found by minimizing the update ‖x_{λ_{k+1}} − x_{λ_k}‖ between the solutions at consecutive values of a sequence of regularization parameters λ_k.8, 23 In an iterative method, this minimizes the difference between the current and previous solutions. In a noniterative approach, QOC minimizes the update to the initial guess.
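In discrete form, the criterion can be sketched as follows, assuming a geometric λ grid; the solutions are computed through the SVD filter factors, as elsewhere in this section:

```python
import numpy as np

def qoc_lambda(A, b, lambdas):
    # Quasi-optimality (discrete form): over a geometric sequence lam_k,
    # choose the lam minimizing the update ||x_{k+1} - x_k|| between
    # consecutive Tikhonov solutions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    xs = [Vt.T @ (s / (s**2 + lam**2) * beta) for lam in lambdas]
    updates = [np.linalg.norm(x1 - x0) for x0, x1 in zip(xs, xs[1:])]
    return lambdas[int(np.argmin(updates))]
```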
Full width half maximum (FWHM)
The FWHM of the region of contrast is calculated for different regularization parameters.24 This method is only applicable to images that contain a single isolated region, and thus, we reject it. Adler and Guardo17 proposed an alternative method of defining FWHM known as the blur radius as a method to select . This method has the same disadvantages as FWHM; thus, we reject it also.
Contrast-to-noise ratio (CNR)
The CNR is plotted as a function of the regularization parameter, where contrast is the ratio of the peak value of the image, after background subtraction, to the background value, and noise is the image norm.24 We seek the regularization parameter that maximizes CNR.
Regińska25 has shown that the minimum of Ψ(λ) = ρ(λ)η(λ), where ρ(λ) is the data norm and η(λ) is the image norm, is similar to the point selected by the L-curve. We propose a modification to the CNR method where, rather than maximizing CNR, we maximize CNR/Ψ(λ), which simultaneously optimizes both the image norm and the data norm.
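The norm product is cheap to evaluate alongside CNR. A sketch, formed from the same SVD quantities used for the L-curve; the cnr() in the usage comment is a placeholder for the contrast-to-noise computation of the previous subsection, not a function defined here:

```python
import numpy as np

def reginska_psi(A, b, lambdas):
    # Psi(lam) = rho(lam) * eta(lam): the product of the data norm and the
    # image norm. Its minimizer approximates the L-curve corner, so dividing
    # CNR by Psi rewards contrast while still balancing the two norms.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lambdas[:, None]**2)
    rho = np.linalg.norm((1 - f) * beta, axis=1)   # data norms
    eta = np.linalg.norm(f * beta / s, axis=1)     # image norms
    return rho * eta

# Hypothetical usage, with cnr() supplied by the caller:
# lam_best = lambdas[np.argmax(cnr(lambdas) / reginska_psi(A, b, lambdas))]
```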
Problem 1: Deblurring
Deblurring is an example of an ill-posed inverse problem. The function blur from the Matlab package “Regularization Tools”26 was used to generate the matrix A, the original image x (Fig. 1), and the corresponding blurred image b, to which we added different levels of Gaussian noise, from 5 to 40%, with 500 noise realizations each (Fig. 2). The smoothing matrix A is chosen to have properties that make it computationally efficient to handle: it is the Kronecker product of a Toeplitz matrix T with itself, A = T ⊗ T. The Toeplitz matrix contains in its diagonals the elements of a Gaussian point-spread function, whose variance models the strength of the blurring effect, and is banded, with a band parameter that defines the number of diagonals away from the main diagonal that are stored in the matrix.9, 27 For this test, the variance and the band were held fixed.
One convenient property of the matrix A = T ⊗ T is that its SVD depends only on the SVD of the Toeplitz matrix. If the SVD of T is T = UΣVᵀ, then the SVD of A is (U ⊗ U)(Σ ⊗ Σ)(V ⊗ V)ᵀ, up to a reordering of the singular values, from which the regularized solution follows directly.
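This Kronecker structure is easy to verify numerically. A sketch; the Gaussian variance and bandwidth below are illustrative values, not necessarily those used in this paper:

```python
import numpy as np

def gaussian_toeplitz(n, sigma, band):
    # Banded Toeplitz matrix of a Gaussian point-spread function:
    # entry (i, j) is exp(-(i-j)^2 / (2 sigma^2)) when |i - j| < band, else 0.
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    return np.where(d < band, np.exp(-d**2 / (2 * sigma**2)), 0.0)

# A = kron(T, T): the singular values of A are all pairwise products of the
# singular values of T, so only the small SVD of T is ever needed.
T = gaussian_toeplitz(8, sigma=0.7, band=3)
A = np.kron(T, T)
sT = np.linalg.svd(T, compute_uv=False)
sA = np.linalg.svd(A, compute_uv=False)
```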
The discrete Picard condition is examined in Fig. 3. The value of the index i at which the data-space coefficients |u_iᵀb| begin to decay more slowly than the singular values σ_i is shown by the vertical line. This point is emphasized by examining the ratio |u_iᵀb|/σ_i, whose gradient turns positive at the same point. This shows that the DPC is satisfied below this index. For higher values of i, the DPC is no longer satisfied and |u_iᵀb| reaches the noise level. Because the DPC is at least partially satisfied, we can expect to find a solution that approximately recovers the real solution.
The original image is known; thus, we can find the regularized solution x_λ that is closest to the real solution x_exact. This allows us to define the optimal regularization parameter λ_opt to be the one that minimizes the relative error ‖x_λ − x_exact‖/‖x_exact‖.
The mean optimal regularization parameters, and corresponding standard deviations, for the ten selection methods tested are shown for different noise levels in Table 1. The corresponding errors calculated from Eq. 3 using the mean values in Table 1 are shown in Table 2 and in Fig. 6. The regularization parameter was chosen from a set of 1000 logarithmically spaced points.
Regularization parameters λ obtained using the selection methods for the deblurring problem with different noise levels.
Relative error for the deblurring problem with different noise levels.
It was not easy to identify a single λ with the heuristic method, and we instead selected a range of λ that provided acceptable results. In every case, the range included the optimal λ. The corresponding ranges of errors are quoted in Table 2. Note that, for better comparison with the other methods, we chose to display the central value of the range and an error giving the range limits. Although the range included the optimal λ, the regularization parameter errors are consistently higher than for most of the other methods, illustrating the irreproducibility of the heuristic method.
The L-curve (plotted in Fig. 7 for a 5% noise realization) generally exhibited a single, easily identifiable point of maximum curvature. The predicted λ agreed closely with the optimal value.
Hanke28 showed that the L-curve method fails to find the optimal λ for very ill-posed problems. When the singular values decay very rapidly to zero, the corner may occur before the data norm starts increasing, because a large number of singular values have to be included before the data norm increases significantly. Vogel29 has demonstrated another nonconvergence of the L-curve method, which occurs when the regularized solution fails to converge to the true solution as the dimensions of the problem tend toward infinity. We do not see evidence of either of these two concerns. For the deblurring problem, an image reconstructed with 5% noise was slightly undersmoothed, with a visible noise component. However, all the results were acceptable, as confirmed by the small relative errors (Table 2) and by the small regularization parameter error (Table 1). The latter reflects the small sensitivity of this method to perturbations of a similar nature in the data.
The minimum value of the GCV function was easily identifiable, although it was located on a relatively flat region of the curve. The selected λ was similar to the optimal value at low noise levels but was rather low at higher noise levels.
The GCV method has been shown to perform well in different situations. However, it can sometimes be difficult to locate its minimum as the function may be flat near or display multiple minima.30 Hansen and O’Leary15 presented an example where GCV failed to find due to the flatness of the curve. However, we were able to locate a single minimum for all our test cases from 5 to 40% noise.
The UPRE function did exhibit an easily identified minimum. However, to calculate UPRE it is necessary to find the SVD of the blur matrix, which, although a reasonable calculation in this case due to the carefully chosen blurring matrix, could in general be prohibitively computationally intensive. The performance of UPRE appears to be similar to that of GCV, with good estimates of λ at low noise levels but an underestimated λ at higher noise levels. Vogel9 used a two-dimensional deblurring problem with added white noise to compare these two methods, and found that the UPRE and GCV methods were similar.
This method and the DP have similar performances, as expected, because both select the regularization parameter such that the data norm matches the noise variance. However, this method seems to be more sensitive to the different noise realizations, in particular at higher noise levels. Hansen21 found that the NCP gives better results than the GCV method, which is confirmed here at higher noise levels.
The f-slope curve clearly shows a flat part, which is less sensitive to perturbations. This method appears to overestimate λ at low noise levels but selects λ close to the optimal value at higher noise levels.
It has been claimed22 that the f-slope method can perform better than the L-curve method, particularly when little regularization is needed. However, that work mainly examined regularization parameters much smaller than those in our deblurring problem, and it is unclear how the results translate to our more ill-posed problem. Using f-slope, we were able to reconstruct an image close to the true image when the noise in the data was 15% or higher, but for smaller noise levels it tended to overregularize the solution.
The regularization parameter selected by this method was invariably higher than the optimal value. However, the performance of QOC appeared to improve at higher noise levels.
Hansen8 compared the L-curve, the GCV, and the QOC methods by applying them to six test problems with different data perturbations but with the same noise level. He found that the QOC method generally oversmooths and that there may be a problem with local minima. We saw the same oversmoothing effect but did not observe local minima.
There was a single clear global maximum. However, for all noise levels, this method found a λ far from the optimal value and generated much greater relative errors than any of the other methods; the errors associated with λ are as large as those corresponding to the heuristic method.
The addition of the norm-product term Ψ led to values of λ that were closer to the optimal value than for CNR alone. The method still tended to underestimate λ at low noise levels, but at higher noise levels, the selection of λ was good. Note that the regularization parameter uncertainty is particularly large for noise levels between 15 and 25%.
Problem 2: Optical Topography
Data were obtained with the University College London (UCL) optical topography system32 using a test object consisting of absorbing targets within a tank filled with a solution with tissue-equivalent optical properties (absorption and reduced scattering coefficients typical of tissue). A cylindrical absorbing target was made with the same reduced scattering coefficient as the solution but with twice the background absorption coefficient.
The optical topography array consisted of eight sources and eight detectors, from which 64 measurements were made.33 The array was placed on one of the walls of the tank, which is made of epoxy resin with the same optical properties as the solution. The target was positioned in the center of the array, which should provide the highest sensitivity and resolution. Data were collected and averaged to reduce the noise, and a baseline measurement was acquired with no target present.
The software package TOAST (temporal optical absorption and scattering tomography), developed by Prof. S. R. Arridge and Dr. M. Schweiger at UCL, models the propagation of light in highly scattering media and was used to generate the sensitivity matrix A. Finally, images were reconstructed using a 3-D linear model.13
The results for all the methods are summarized in Table 3. The values of λ tested are the same as before.
Regularization parameters for experimental data using different selection methods.
Figure 8 shows that the DPC was partially satisfied, and thus, regularization should be able to give a stable solution. The vertical line represents the index at which the DPC begins to fail: below it, the data-space coefficients |u_iᵀb| decay faster than the singular values σ_i; above it, the data-space coefficients reach a level determined by the perturbations in the data, and the DPC is no longer satisfied.
The range of λ that generated acceptable images (relatively noise-free, with adequate spatial localization, and with the target placed in the middle of the image) was identified heuristically.
For this phantom study, we know the optical properties and target position with an accuracy of 5–10%. Figure 9 shows the absorption coefficient error, defined as the difference between the maximum absorption coefficient calculated from the reconstructed images, for regularization parameters found with the heuristic method in steps of 0.001, and the target absorption coefficient. We refer to the point where this error is zero, λ_opt, as the optimal regularization parameter. An image reconstructed using λ_opt, at the target depth, is shown in Fig. 10. The target size in the image, measured by its FWHM, is not too different from the real target dimensions and is approximately constant over the selected range of regularization parameters.
The values of λ found by the other selection methods are compared to the optimal value below. Table 3 lists all the values.
The L-curve did not exhibit a pronounced corner, but it was still possible to calculate the point of maximum curvature. The failure to find a sharp corner was due to the high ill-posedness of this problem (as shown by the exponential decay of the singular values in Fig. 8). However, the λ at the point of maximum curvature of the L-curve was closer to the optimal value than that found by any of the other methods. The L-curve method has been used previously in optical imaging, for simulated and real data, and its results are considered to generate acceptable images.34, 35, 36
The value of λ found using GCV was the highest of all the methods. Hansen16 found that the GCV method fails to compute a useful solution when errors are highly correlated; in the presence of uncorrelated errors, the method gives a slightly overregularized solution. Our result could be explained by a low level of correlated errors.
UPRE exhibited a very shallow minimum. The noise variance was estimated from the point in Fig. 8 at which the data-space coefficients |u_iᵀb| reach the noise floor and no longer decrease.
The L-curve, GCV, DP, and UPRE methods have previously been compared for simulated, phantom, and clinical data in electrical impedance tomography.37 All the methods were successful for simulated and clinical data, whereas for data acquired on a test phantom, the DP and UPRE methods failed to converge. Overall, the preferred method was the GCV. For optical topography data, both DP and UPRE methods converged and the predicted agreed with the values found heuristically.
Before applying the NCP method, it is necessary to preprocess the data so that all the variables have zero mean and unit variance. In optical topography, we use the difference imaging approach, where the baseline is subtracted from the data; hence, all the measurements should have zero mean but different variances. Whitening can be accomplished by dividing each measurement by its standard deviation. This method returned a very low λ, outside the range of values found heuristically. If we instead set the Kolmogorov–Smirnov limits38 to a significance level of 5%, which is equivalent to a 95% confidence level, and choose the largest regularization parameter within those limits, then we obtain a more reasonable result.
The solution norm monotonically increased with λ; thus, it was not possible to identify a minimum slope. To our knowledge, the f-slope method has not previously been tested on real data, only on test models, where it performs well.22 The method failed when applied to our optical topography problem, probably because of the problem's ill-posedness.
The QOC function monotonically decreased with λ and failed to show a reliable minimum; consequently, this method failed.
To calculate the contrast, it is necessary to know the peak and background values of the image. The peak value was taken to be the mean of the pixel of maximum intensity in the image and its eight nearest neighbors in the plane. The background value was the average of the pixels furthest from the peak in the image. This method gave the lowest λ of all the methods, and artifacts were present in the reconstructed images.
As before, including the norm-product term Ψ gave a much more realistic estimate of λ than CNR alone, very close to the optimal value.
The regularization parameter selection methods were tested on two further data sets, obtained using the same liquid phantom but with targets of different absorption coefficients, and the results were consistent with those shown here.
Optimizing λ is critical when reconstructing diffuse optical images because it controls the smoothness of the regularized solution and balances the influence of the noise present in the image against its accuracy. We have examined a number of methods for selecting λ. Diffuse optical imaging is a challenging ill-posed and underdetermined problem, and the correct target image is unknown, making it difficult to validate the image quality. We therefore initially tested the methods by applying them to a simpler ill-posed problem, the deblurring problem, where we can control the amount of blur and noise applied to the data and where the exact solution is known (so the optimal λ can be found). We believe that if a method fails to produce a good regularized solution for the deblurring problem, we cannot rely on it for optical topography. Conversely, a method that finds a good solution for the deblurring problem is not necessarily reliable for the more demanding optical problem.
In Sec. 2.1, we list three criteria that a selection method must meet. On the basis of these criteria, we reject the heuristic method, the fixed noise figure method, and methods related to optimizing the FWHM of the image. All the remaining methods performed acceptably well for the deblurring problem. However, the f-slope and QOC failed to converge for the more demanding optical topography problem.
Of the remaining methods, the L-curve consistently demonstrated the lowest error on the deblurring problem (see Tables 1, 2). It is easy to implement and simultaneously minimizes both the data norm and the image norm. On the other hand, the DP and UPRE methods require knowledge of the noise variance, which may not always be available; however, the DPC seems to provide a good estimate of the noise level in optical topography. The NCP method selects the parameter automatically and hence does not directly require an estimate of the noise variance. Nevertheless, it is sensitive to error fluctuations and only gives reasonable results for the optical topography problem under certain assumptions. DP, UPRE, NCP, and GCV only consider the data norm. The CNR method gave poor results but was much more successful when the product of the data and image norms was simultaneously minimized. The use of this product, as proposed by Regińska25 and developed further here, could be applied to other methods and may be worth further examination. However, here we conclude that the L-curve is the optimal selection method for optical topography.
Thus far, we have studied ideal or almost ideal cases, whereas in vivo studies suffer further sources of error. These include motion artifacts, changes in the contact between the optodes and the skin, which can result in intensity fluctuations in the collected data, and detection of light that has not passed through the investigated medium. All these effects produce correlated errors, which are not necessarily apparent in the raw data. The L-curve has been shown to perform well in the presence of these types of errors.15, 16 However, as mentioned previously, Hanke28 showed that the L-curve method may fail to find a good regularization parameter when the solutions are very smooth. Under these circumstances, GCV should be investigated as an alternative to the L-curve.
The work has been supported by a scholarship awarded to T.C. by Fundação para a Ciência e a Tecnologia, Portugal. Thanks to Dr. Simon Arridge, from the Department of Computer Science, for useful discussions.