## 1.

## Introduction

Objective assessment of image quality through task-specific metrics has a long history in medical imaging and is regarded by many as ultimately being the most meaningful approach to medical image evaluation.^{1}2.3.^{–}^{4} However, the application of task-based assessment to x-ray computed tomography (CT) is recent relative to its application to planar imaging modalities and nuclear medicine. One reason for this delay is that metrics based on the Hotelling observer (HO),^{3} such as those considered in this work, involve the image covariance matrix, and in CT, this matrix is often extremely large (well over ${10}^{9}$ elements), poorly conditioned, and possesses few, if any, simplifying structural properties. In order to address the challenge of large dimensionality, efficient channels have been proposed^{5}6.^{–}^{7} which essentially constitute a transformation of the image into a new basis, where the number of basis functions is substantially less than the number of image pixels. Another common means of circumventing the dimensionality problem is to assume noise stationarity, so that HO metrics can be obtained with relative computational efficiency through discrete Fourier transform (DFT) operations (see Sec. 2). Meanwhile, in order to address the somewhat unpredictable structure of the image covariance, various estimation strategies have been proposed which rely on samples of noisy images in order to construct an estimate for the image covariance when an analytic formulation of image covariance is impossible or infeasible.^{8}^{,}^{9}

Building on previous work,^{10} we propose a formalism, which, in certain cases, may be more appropriate than the use of efficient channels or estimation techniques, while still addressing the issues of dimensionality and structural complexity of the image covariance matrix. Our approach differs from alternative methods in that the resulting metrics are nonstochastic, and we impose no restrictions on the structure of the signal or on the nature of the noise correlations. Instead, similar to Refs. 11–12.13, we only assume that the relevant noise correlations and signal extent can be restricted to a region of interest (ROI) in the image. In order to investigate the impact and validity of various assumptions that simplify the computation of HO metrics, we herein compare the results of parameter optimization using our proposed method to two alternatives: a set of efficient [Laguerre-Gauss (LG)] channels and a DFT domain approach. Each of these approaches corresponds to a different assumption regarding the signal, the image noise, or both. We then also investigate two statistical estimation strategies for HO metrics and compare these results with our proposed method.

The specific context in which our formalism is intended to be applied is detection or classification tasks where the object of interest is small (on the order of several pixels), and the reconstruction algorithm is a direct, linear algorithm, such as filtered backprojection (FBP). In this case, we hypothesize that most of the relevant information for performing the detection or classification task is likely to be contained in pixels within a small ROI surrounding the signal. For direct linear algorithms, an analytic form of the image covariance matrix for a small ROI can be constructed and its elements can be stored directly in computer memory so long as the ROI contains at most several thousand pixels. While we consider only two-dimensional (2-D) reconstruction in this work, the extension to three-dimensional (3-D) reconstruction is straightforward so long as the corresponding 3-D ROI contains only several thousand voxels.

Since the metrics evaluated in this work are task-specific, we restrict ourselves to two relevant tasks in a specific CT application, namely dedicated breast CT. The tasks considered are microcalcification detection and the Rayleigh discrimination task, which measures the ability of an observer to distinguish between a single object and two smaller objects. The specific metric we investigate is the HO efficiency, which is a summary scalar metric describing the preservation of relevant information from the projection data measurements to the reconstructed image. The mathematical construction of this metric is described in Sec. 2, along with a description of the two imaging tasks considered. Section 3 demonstrates the use of the proposed ROI HO method for the optimization of reconstruction filter width in the FBP algorithm. Section 3 also contains results for the application of a channelized HO (CHO), a DFT-domain HO, and two estimation approaches to the same parameter optimization. Finally, a discussion and brief conclusion are included in Sec. 4.

## 2.

## Methods

## 2.1.

### Task Modeling

Two tasks are considered in this work. The first is a Rayleigh resolution task, wherein an observer must classify an image as either an image of a single 1.4 mm bar convolved with a Gaussian with a full width at half maximum (FWHM) of 0.53 mm or two distinct Gaussians of $\mathrm{FWHM}=0.53\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$, separated by a 0.8 mm trough. These two signals were modeled on a discrete grid with pixels an order of magnitude smaller than the detector pixels backprojected to the center of the field of view (FOV). The two signals are shown in Fig. 1.

The second task is a signal detection task, where we model a microcalcification as a Gaussian with FWHM of $80\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$. Here the two classes into which an observer classifies the image are a signal-present class and a signal-absent class. For both tasks, we operate under the signal-known-exactly-but-variable paradigm,^{3} where three signal locations are considered: the center of the FOV, the position (2 cm, 2 cm), and the position (4 cm, 4 cm); however, results among the three signal locations are not substantially different.

## 2.2.

### Breast CT Simulation

Since the tasks considered in this work involve small signals, all of the relevant geometric and scanning parameters used in the breast CT simulations are based on Ref. 14, the purpose of which is to characterize spatial resolution properties of a dedicated breast CT scanner. In a separate submitted work,^{15} we fully explore parameter ranges for this system and compare with other work in optimization of breast CT systems. However, we here investigate only a few parameter settings and instead focus on HO performance estimation methods. To summarize, we simulate a 40 cm flat panel detector, rebinned to a sampling of 1024 detector bins. The source-to-isocenter distance is 45.8 cm, and the source-to-detector distance is 87.8 cm. The FOV is restricted to 16.59 cm, and this is sampled on a $512\times 512$ image pixel grid, for an image pixel size of 0.324 mm. Meanwhile, the detector element size, backprojected into the center of the FOV, is $\sim 0.2\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$.

The simulated x-ray spectrum corresponds to an 80 kVp setting with added Be and Al filtration of 0.8 and 2.5 mm, respectively. Methods from Refs. 1617.–18 were used in simulation of the x-ray spectrum. Finite detector bin size is modeled by subsampling each detector bin at 16 evenly spaced locations. The peak attenuation value for both types of signals considered is that of calcium. The attenuation of the background medium is the mean attenuation of a breast composed of 50% adipose tissue and 50% glandular tissue, as determined in Ref. 19. The numerical phantom diameter used is 14 cm, and the total photon fluence at isocenter necessary to achieve the same dose as two-view mammography for this diameter ($\approx 2\times {10}^{8}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{-2}$) is obtained from Ref. 20. We restrict the dose in our simulation to the corresponding dose from mammography since breast CT is being investigated as an alternative screening modality. In order to determine the 2-D fluence necessary for our fan-beam simulations, we assume a slice thickness of 1 mm. The total number of incident photons in our fan-beam simulations is then ${\overline{N}}_{0}=4.17\times {10}^{10}$. For the range of parameters investigated here, 50 projection views were sufficient to ensure that angular undersampling artifacts remained farther from the signal of interest than the typical correlation length of the image noise. For this reason, the HO optimization was insensitive to these artifacts. We, therefore, used only 50 projection views in order to further minimize the computation burden of the optimization. The total fluence is divided equally among these 50 projection views. The reconstruction algorithm used is the FBP algorithm with a Hanning filter. The optimization of the Hanning filter width is the system optimization task demonstrated in this work.

## 2.3.

### Model Observer Method

For each of the approaches to system optimization considered, we employ the HO (Ref. 3) and its associated metric, the HO signal-to-noise ratio (SNR). The HO quantifies the degree to which an ideal observer with full knowledge of the relevant distributions can classify an image as belonging to one of two classes (e.g., signal-absent or signal-present), using only linear operations. Let us first consider the performance of the HO in the line-integral data domain (i.e., the projection data after applying the negative logarithm). Hereafter, bold font indicates a random variable, and bars over variables indicate population averages. Given a data vector $\mathit{g}$, the HO will classify an image based on the outcome of a scalar value $\mathit{t}$, computed as

If we take $\mathrm{\Delta}\overline{g}$ to be the mean difference between data belonging to class 1 and data belonging to class 2, then the Hotelling template, ${w}_{g}$, is defined as

where ${K}_{g}$ is the data covariance matrix, whose $(i,j)$’th component is given by $\mathrm{Cov}({\mathit{g}}_{i},{\mathit{g}}_{j})$. We take ${K}_{g}$ to be diagonal, with diagonal elements given by^{21}where ${\overline{N}}_{0}$ is the average total number of incident photons in a blank scan. Further, throughout this work, we consider the data noise to be Gaussian. Since the matrix ${K}_{g}$ is clearly signal dependent, in all further discussions, we consider ${K}_{g}$ to be the average covariance matrix between the two signal classes. Under the assumption of equal class prevalence, the resulting HO is still the optimal linear observer, but may not be equivalent to the ideal observer.

^{3}We then have that the HO SNR in the data domain (${\mathrm{SNR}}_{g}$) is given by

In order to evaluate the optimal reconstruction filter width for the task considered, we consider classification in the reconstructed image domain rather than in the projection data domain. In other words, the reconstructed image in vector form is given by $\mathit{y}=A\mathit{g}$. The matrix $A$ denotes the FBP reconstruction operator. Since this operator is linear and discrete-to-discrete, its action is equivalent to left-multiplication with a matrix. We then denote this matrix as $A$. The HO SNR in the image domain (${\mathrm{SNR}}_{y}$) is then given by

where ${K}_{y}{w}_{y}=\mathrm{\Delta}\overline{y}$, ${K}_{y}=A{K}_{g}{A}^{T}$, ${w}_{y}$ is the Hotelling template in the image domain, and $\mathrm{\Delta}\overline{y}$ is the mean difference in the reconstructed image between the two hypotheses. Finally, we define the HO efficiency as $\epsilon ={({\mathrm{SNR}}_{y}/{\mathrm{SNR}}_{g})}^{2}$.We consider $\epsilon $ a more useful parameter for algorithm optimization since for many CT applications, one would not operate at the threshold for observer performance within which SNR is a meaningful metric. Instead, $\epsilon $ is a useful metric for algorithm and system evaluation that remains independent of the difficulty of a given task, while still retaining the exactness of computation inherent in HO performance calculation through the exact covariance matrix ${K}_{y}$. A further motivation for selection of the efficiency metric over more conventional metrics, such as SNR or area under the ROC curve (AUC),^{22} is that we are interested in performing system component optimization rather than full system evaluation. A single value of SNR or AUC carries little information regarding how much one can hope to improve performance by optimizing a single component. Rather, we would prefer a metric like efficiency, which relates the quality of the input to the system component to the quality of the component’s output. Here, the component that we optimize is a parameter of the reconstruction algorithm, so that the efficiency values are a reflection of how well the reconstruction preserves information relevant to the classification task.

In order to allow for direct manipulation of the image covariance matrix ${K}_{y}$, we restrict the reconstruction to an ROI image. As stated previously, 50 projection views are simulated. This angular sampling is sufficient for accurate reconstruction in the immediate vicinity of the signal; however, undersampling artifacts are visible elsewhere in the full image. We, therefore, restrict the ROI size based on the radius from the signal at which these artifacts begin to appear. Specifically, we consider the signal energy in the reconstructed image within an annulus of radius $r$ and width $2\mathrm{\Delta}r$, given by

## (6)

$${E}_{y}(r)={\int}_{0}^{2\pi}\mathrm{d}\theta {\int}_{r-\mathrm{\Delta}r}^{r+\mathrm{\Delta}r}{[\mathrm{\Delta}\overline{y}({r}^{\prime},\theta )]}^{2}{r}^{\prime}\text{\hspace{0.17em}}\mathrm{d}{r}^{\prime}.$$The integrand of Eq. (6) is shown in Fig. 2(a) for the microcalcification signal and a single Hanning filter width. In order to select the ROI size, we approximate the above integral at evenly spaced values of $r$ and set the ROI radius to the value of $r$ that minimizes ${E}_{y}(r)$. An example of the function ${E}_{y}(r)$ is shown in Fig. 2(b), corresponding to the image shown in Fig. 2(a).

For the ROI sizes investigated here, ranging from $4\times 4$ to $40\times 40$, the reconstruction matrix $A$ ranges in size from $16\times 4100$ to $1600\times 4100$. Meanwhile, the covariance matrix ${K}_{y}$ ranges from $16\times 16$ to $1600\times 1600$. The method proposed remains numerically feasible for ROI sizes up to roughly $100\times 100$. Hereafter, we refer to this approach as ROI-HO.

## 2.4.

### Approximation Strategies

One can show (see, for example, Ref. 3, Sec. 7.4.4) that if one assumes that the matrix ${K}_{y}$ is circulant, the equation ${K}_{y}{w}_{y}=\mathrm{\Delta}\overline{y}$ can readily be solved by application of the DFT operator. The assumption that ${K}_{y}$ is Toeplitz corresponds to an assumption of noise stationarity, and the more restrictive assumption of a circulant ${K}_{y}$ further implies cyclic correlations at the boundaries of the image. In order to test the validity of these assumptions, and, hence, of the DFT approach in determining HO performance for the present task, we extract a single row of the matrix ${K}_{y}$, corresponding to a pixel near the center of the signal location, and base a circulant approximation of ${K}_{y}$ on this single row. We then repeat the optimization of filter width for microcalcification detection and Rayleigh discrimination under this approximation.

An alternative approach suggested by Gallas and Barrett^{3}^{,}^{5} is the use of efficient LG channels for estimation of HO performance. These channels are discretizations of circularly symmetric basis functions, formed as the product of an exponential function and the Laguerre polynomials. They form an orthonormal basis for all circularly symmetric square-integrable functions in ${\mathbb{R}}^{2}$, and hence, they are an appropriate choice of channels for the task of detecting a circularly symmetric microcalcification. We, therefore, also apply 50 LG channels to the optimization of filter width for the microcalcification detection task. Rationale for this number of channels, as well as for the scale factor that determines the width of the Gaussian envelope of the LG channels is given in Sec. 3.

## 2.5.

### Estimation Strategies

Yet another approach to the estimation of HO metrics is to train and then test a linear classifier on samples of noisy images from each of the classes being considered.^{23}^{,}^{24} Using this approach, a set of training images from each class is used to compute a sample covariance matrix, along with sample mean images corresponding to each class. These are used to compute a template estimate, which is then applied to a series of testing images. The outcomes of the test statistic for each testing image are then recorded and a Mann-Whitney U statistic is computed on the test statistic values, yielding an estimate of HO AUC,^{22} which can be related to HO SNR through the equation

^{25}

For this approach, the authors of Ref. 5 suggest that, as a rule of thumb, 10 to 100 images are required for each row of the covariance matrix. For the case of a $30\times 30$ ROI, this corresponds to anywhere between $\sim 9000$ to 90,000 images. While this is infeasible for a parameter optimization study in general, we demonstrate the outcome of this approach for image training and testing sets ranging from 500 to 3000 images in order to demonstrate the general trend of results obtained using this method. Due to the large number of independent noisy data sets required, determination of HO performance for each reconstruction filter width is performed using the same sets of noisy data. Two approaches, commonly termed the hold-out method and resubstitution are applied. In the first case, the training and testing phase use separate, independent sets of noisy data. In the second, the HO is tested on the same images with which it was trained. These two approaches yield estimates with negative and positive biases, respectively.^{23}^{,}^{24}

A more feasible approach based on sample images can be taken when prior knowledge of the mean images under each hypothesis is exploited in order to reduce the bias and variance of the estimator of HO performance, thereby reducing the necessary number of sample images. The construction of an unbiased estimator of HO SNR in this case is detailed in Ref. 9. We demonstrate the application of this approach using 300 and 700 noisy image samples (far fewer than the number of samples required for training and testing) and compare to the previous methods. Two scenarios are considered: (1) the same 300 or 700 noisy data realizations are used to reconstruct noisy images at each filter width considered and (2) a separate set of 300 or 700 data realizations is used to create noisy images at each filter width.

## 3.

## Results

The results of applying the ROI-HO for microcalcification detection and Rayleigh discrimination are shown in Fig. 3 for a range of Hanning filter widths. The ROI-HO is noticeably sensitive to the reconstruction filter width, showing a clear maximum in performance for Hanning windows in the range of $0.75{\nu}_{N}$ to $0.825{\nu}_{N}$ for each task. However, this result should not be interpreted as giving a universally optimal filter width, but rather as a demonstration of the sensitivity of HO efficiency to relevant reconstruction algorithm parameters.

Next, we consider the use of channels for estimating HO performance in microcalcification detection. As with the ROI-HO, we restrict the CHO to an ROI, however, due to the Gaussian envelope that modulates the LG channels, this only has an effect for the smallest ROIs used. For the CHO using LG channels, results correspond solely to the microcalcification task, since the Rayleigh task involves a signal that is not radially symmetric. Figures 4 and 5 demonstrate the dependence of CHO efficiency on the number of channels and the scale factor that modulates the width of the Gaussian envelope of the LG functions. For the majority of filter widths considered, the CHO efficiency estimate is completely stable above $\sim 50$ channels, while the optimum scale factor of the channels (full width at half maximum of the Gaussian) is roughly seven times the width of the microcalcification diameter. The remainder of the results presented corresponds to these CHO parameters. In our case, however, the dependence of the CHO performance estimates upon each of these parameters is weak, so that fewer channels or a slightly different scale factor could likely produce comparable results.

Figure 6 compares three approaches to optimization of the reconstruction filter width parameter, namely the ROI-HO approach proposed in this work, the CHO with 50 LG channels, and the DFT-domain approach (labeled FHO). Since each of the methods utilizes the same ROI images, the ROI-HO that computes the exact HO performance should be taken as the ground truth for the sake of evaluating the alternative approaches. Clearly, for microcalcification detection, the CHO is an excellent approximation of the true HO. Further, the CHO allows for greater computation efficiency, since, in general, the ROIs used contained $>50\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{pixels}$. However, the efficiency of this CHO comes with a loss of generality, as it is not applicable to general signals lacking radial symmetry, as in the Rayleigh discrimination task.

Meanwhile, the DFT-domain HO was capable of reproducing the general trend of the HO results; however, the quantitative reliability of this method varied widely depending on the specific parameter setting used. For example, while wider filter widths allowed for accurate computation of detection task performance with the DFT-domain approach, the estimates of Rayleigh task performance using this method could vary from the HO by as much as 40%, as seen in the right side of Fig. 6. While this may not impact the optimization of filter width substantially, if absolute HO performance with a fixed filter width of, say, $0.625{\nu}_{N}$ were of interest, the DFT-domain HO could be misleading.

Figure 7 demonstrates the use of noisy sample images to estimate HO efficiency when the mean image under each hypothesis is known. In our case, since we consider only linear image reconstruction algorithms, the mean image is produced simply by reconstructing an image without noise. The left and right plots correspond, respectively, to the use of independent noisy data sets for each filter width and the reuse of the same simulated data for each filter width. The variability seen in the left-hand figure provides intuition as to the variance of the efficiency estimates, while the figure on the right illustrates that, although subject to the same variability, reusing the same data realizations correlates the estimates for different filter widths, potentially allowing for more reliable rank ordering of parameter settings. In other words, the estimated curves on the right could undergo vertical translation due to the variance of the estimator, but are more likely to preserve their shape than the curves shown on the left.

Figure 9 illustrates the use of linear classifier training and testing in order to estimate HO performance. In this case, prior knowledge of the mean images is not exploited, and sample estimates of the images are computed instead. The left-hand plot corresponds to the hold-out approach, where independent image sets are used for the training and testing phases, while the right-hand plot is generated using resubstitution. As seen in the figure, these two approaches introduce negative and positive biases in the estimates, respectively. Note that thousands of images are required in order to construct quantitatively meaningful estimates. However, the general trend of the results can be seen for comparatively few samples. This suggests that for certain tasks, such as rank-ordering of only a few options for algorithm implementation, the training and testing approach could be useful if one lacks an adequate model of the image covariance or class means.

## 4.

## Conclusions

We have demonstrated that for classification tasks involving small signals in CT, the HO performance computed within an ROI constitutes an efficient and objectively meaningful approach to reconstruction algorithm and system optimization. Specifically, we have employed the HO to optimize reconstruction algorithm filter width for two tasks: microcalcification detection in dedicated breast CT and Rayleigh discrimination.

We performed a comparison of this approach with several alternatives for objective assessment. In broad terms, these alternatives fall into two categories: methods based on computation of statistical estimates and methods based on approximations regarding the signal and/or image noise. While our proposed methodology assumes that information beyond a small ROI is irrelevant to the classification task, the CHO imposes assumptions of rotational symmetry on the signal and Hotelling template, and the Fourier-domain approach assumes stationary noise. Strictly speaking, none of these assumptions are valid in CT for the tasks considered here; however, each approach possesses strengths and weaknesses. The Fourier-domain HO is the most computationally efficient method for large ROIs, but is less accurate than the ROI-HO or CHO approach. Meanwhile, the CHO with LG channels is limited to cases when the signal of interest and Hotelling template possess radial symmetry.

The statistical estimation approaches investigated possess an attractive degree of flexibility, in that they do not rely on any assumptions regarding the image covariance matrix or, in the case of the training/testing approach, the signals themselves. However, this flexibility comes at the expense of computational efficiency, as this family of approaches requires excessively large numbers of noisy samples, making any extensive system or algorithm optimization prohibitively time-consuming. Finally, while we have illustrated some of the trade-offs, benefits, and shortcomings of several methods for objective assessment in CT, ultimately future work will be necessary to ensure that the proposed methodology yields metrics that correlate with improved performance of humans for the given tasks.

## Acknowledgments

The authors would like to thank Brandon Gallas and Adam Wunderlich for helpful discussions regarding this work. This work was supported in part by NIH R01 Grant Nos. EB000225, CA158446, CA182264, EB018102 and NIH F31 Grant No. EB018742. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

## References

## Biography

**Adrian A. Sanchez** received his BS degree in physics from the University of Tennessee, Knoxville, in 2010, and is currently a PhD candidate at The University of Chicago in the Committee on Medical Physics, graduate program in medical physics with an anticipated graduation in 2015. His primary research interests are currently tomographic image reconstruction and objective image quality assessment.

**Emil Y. Sidky** received his BS degree in physics and mathematics from the University of Wisconsin, Madison, in 1987 and his PhD degree in physics from The University of Chicago in 1993. He held academic positions in physics at the University of Copenhagen and Kansas State University. He joined the University of Chicago in 2001, where he is currently a research associate professor. His current interests are in CT image reconstruction and large-scale optimization.

**Xiaochuan Pan** received his BS degree from Beijing University, Beijing, China, in 1982, his MS degree from the Institute of Physics, Chinese Academy of Sciences, Beijing, in 1986, and the master’s and PhD degrees from The University of Chicago in 1988 and 1991, respectively, all in physics. He is currently a professor with the Department of Radiology, The University of Chicago. His research interests include physics, algorithms, and applications of tomographic imaging.