## 1.

## Introduction

Medical images are routinely used to identify lesions within the human body. These images are read by radiologists, who generally make the final judgment about the presence of a lesion. Image quality is crucial for the radiologist to make the best possible judgment and may be improved by optimizing system designs and acquisition protocols. However, it is expensive and time-consuming to conduct human-observer studies for this purpose at every stage of developmental research. This has led to the development of mathematical model observers as surrogates for humans in diagnostic imaging studies. Of particular interest are model observers that can predict human performance in clinically realistic tasks involving lesion search. Such tasks should probe how quantum and anatomical noise in the images affect observer performance. Anatomical noise is background structure that masquerades as a lesion or obscures actual lesions. Together, quantum and anatomical noise comprise the image texture.

Popular model observers such as the channelized Hotelling (CH) observer^{1} operate based on prior information in the form of image statistics. The statistics include covariance matrices that characterize the noise in an image set. The level of prior information afforded an observer is dictated by several well-known task paradigms. For signal-known-exactly detection tasks, the observer need only decide lesion presence at a fixed location. Information about the background can be either exact [background-known-exactly (BKE)] or statistical [background-known-statistically (BKS)]. The task of lesion detection and localization is signal-known-statistically (SKS) in nature (e.g., the shape of the lesion profile may be known although its location is variable), but may additionally be either BKE or BKS. These search tasks can be performed by scanning versions of the CH observer and other Hotelling-type models.^{2}

With a BKE search task, a scanning model observer has knowledge of the quantum-mean background corresponding to the given test image. This knowledge greatly reduces the impact of anatomical noise on task performance. For BKS search tasks, the scanning CH observer computes image class means and covariances that account for anatomical variations. This computation can be extensive since the covariance matrices must be computed at all possible lesion locations. Moreover, it is unclear whether, given the covariance information, the scanning CH observer can adequately quantify how human observers respond to anatomical noise in individual images.

As an alternate approach, we have been investigating observer models which use less statistical knowledge.^{3}^{,}^{4} These models perform distinct target search and analysis procedures, thereby simulating the visual-search (VS) process that trained radiologists engage in to interpret medical images.^{5} Our VS model observer applies a feature-guided search that identifies target-like structure for subsequent analysis. For this study, the observer identified hot “blobs” (regions of elevated tracer uptake) as candidates, which may be actual lesions or false positives. Statistical information about the image background was only supplied for the candidate analysis, at which point some nonconspicuous lesions may have been ignored. The SKS task paradigm is a key constraint for the model, as many of the complex processes of human vision and cognition are not accounted for. The effects of image texture are nonetheless intrinsic to search performance with this relatively simple model.

We tested the VS and scanning CH observers as competing ways of predicting human-observer performance in a realistic search task involving both quantum and anatomical noise. The task was detection and localization of pelvic lesions in simulated In-111 planar nuclear medicine images. A localization receiver operating characteristic (LROC) study examined how observer performance was affected by the resolution-versus-sensitivity trade-offs among a family of medium-energy imaging collimators. The human observers in our study read a subset of the images considered by the model observers. Comparison was also made with a channelized nonprewhitening (CNPW) observer that performed both BKE and BKS search tasks.

Human observers are often modeled as ideal observers with added inefficiencies.^{6} In our study, the CH and VS observers were applied with and without internal noise that was added to the observers’ decision variables. In addition, the VS model was tested with a perceptual search threshold^{7} that rejected relatively nonsuspicious search candidates. Various covariance-based internal-noise models have been proposed for the CH observer,^{8} but the motivations for choosing one model or another are often unclear. The inefficiencies that were tested for the VS observer may afford a better intuition.

## 2.

## Model Observers

## 2.1.

### Framework for Localization Receiver Operating Characteristic Studies

All search tasks in our study used two-dimensional (2-D) slices extracted from reconstructed three-dimensional (3-D) volumes. A test image of size $N\times N$ is represented by the ${N}^{2}\times 1$ vector $\mathbf{g}$. This image could contain a single lesion within a region of interest (ROI) $\mathrm{\Omega}$. With the lesion centered at pixel $j\in \mathrm{\Omega}$, the test image can be regarded as

## (1)

$$\mathbf{g}=\mathbf{b}+{\mathbf{s}}_{j}+\mathbf{n},$$

with anatomical background $\mathbf{b}$, lesion image ${\mathbf{s}}_{j}$, and zero-mean quantum noise $\mathbf{n}$. Lesion-absent images comprise only $\mathbf{b}$ and $\mathbf{n}$. The background can vary from image to image. We denote the quantum-mean lesion-present image for fixed location $j$ as ${\langle \mathbf{g}\rangle}_{\mathbf{n}|\mathbf{b},j}=\mathbf{b}+{\mathbf{s}}_{j}$, where the bracket notation indicates an average over $\mathbf{n}$ with $\mathbf{b}$ and $j$ fixed. The corresponding mean lesion-absent image is ${\langle \mathbf{g}\rangle}_{\mathbf{n}|\mathbf{b},0}=\mathbf{b}$. With additional averaging over anatomical background, we have the mean images ${\langle \mathbf{g}\rangle}_{\mathbf{n},\mathbf{b}|0}=\overline{\mathbf{b}}$ and ${\langle \mathbf{g}\rangle}_{\mathbf{n},\mathbf{b}|j}=\overline{\mathbf{b}}+{\mathbf{s}}_{j}$. Thus, $\overline{\mathbf{b}}$ is the mean background image computed over all anatomical realizations of $\mathbf{b}$.

Given test image $\mathbf{g}$ in an LROC study, a model observer first computes a perception measurement ${\lambda}_{j}$ for every location $j$ within $\mathrm{\Omega}$, and then reports the most suspicious lesion location $r$ and a confidence rating $\lambda $ according to the formulas

## (2)

$$\lambda =\underset{j\in \mathrm{\Omega}}{\mathrm{max}}\,{\lambda}_{j},$$

## (3)

$$r=\underset{j\in \mathrm{\Omega}}{\mathrm{arg\,max}}\,{\lambda}_{j}.$$

## 2.2.

### Scanning Observers for Detection-Localization Tasks

The Hotelling-type scanning observers in this work computed linear perception measurements for all $j\in \mathrm{\Omega}$ using the general formula

## (4)

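$${\lambda}_{j}^{\mathrm{obs}}={({\mathbf{w}}_{j}^{\mathrm{obs}})}^{t}[\mathbf{g}-{\mathbf{c}}_{j}],$$

where ${\mathbf{w}}_{j}^{\mathrm{obs}}$ is the observer's scanning template and ${\mathbf{c}}_{j}$ is a reference image. When the quantum-mean background $\mathbf{b}$ is known, the reference image is

## (5)

$${\mathbf{c}}_{j}=\frac{{\langle \mathbf{g}\rangle}_{\mathbf{n}|\mathbf{b},0}+{\langle \mathbf{g}\rangle}_{\mathbf{n}|\mathbf{b},j}}{2}=\mathbf{b}+\frac{1}{2}{\mathbf{s}}_{j}.$$^{9}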

For this work, the lesion profile was assumed to be shift-invariant. In fact, the lesion profile in our test images changed slightly with location due to attenuation effects, but the effect was largely imperceptible. We thus computed a location-averaged profile $\overline{\mathbf{s}}$ for use by the model observers. The subscript on ${\overline{\mathbf{s}}}_{j}$ implies a shift of the profile to the $j$th location. How this mean profile was computed is described in Sec. 3.5.

## 2.3.

### The Scanning Channelized Hotelling Observer

For BKS search tasks, the scanning CH observer uses the reference image

## (6)

$${\mathbf{c}}_{j}=\frac{{\langle \mathbf{g}\rangle}_{\mathbf{n},\mathbf{b}|0}+{\langle \mathbf{g}\rangle}_{\mathbf{n},\mathbf{b}|j}}{2}=\overline{\mathbf{b}}+\frac{1}{2}{\overline{\mathbf{s}}}_{j},$$

in which we have accounted for the shift-invariant lesion profile. The scanning template for the CH observer takes the form

## (7)

$${\mathbf{w}}_{j}^{CH}={\mathbf{U}}_{j}{\mathbf{K}}_{j}^{-1}{\mathbf{U}}_{j}^{t}{\overline{\mathbf{s}}}_{j},$$

where the columns of ${\mathbf{U}}_{j}$ are the channels shifted to location $j$ and ${\mathbf{K}}_{j}$ is the channel covariance matrix

## (8)

$${\mathbf{K}}_{j}={\langle {\mathbf{U}}_{j}^{t}(\mathbf{g}-\overline{\mathbf{b}}){(\mathbf{g}-\overline{\mathbf{b}})}^{t}{\mathbf{U}}_{j}\rangle}_{\mathbf{n},\mathbf{b}|0}.$$

When ${\mathbf{K}}_{j}$ includes the anatomical variations, the scanning CH observer operates under an SKS-BKS task paradigm. In that case, ${\mathbf{K}}_{j}$ can be decomposed into a sum of quantum-noise and anatomical-noise components.^{10} A mean quantum-noise covariance ${\mathbf{K}}_{j}^{\mathrm{quant}}$ is

## (9)

$${\mathbf{K}}_{j}^{\mathrm{quant}}={\langle {\langle {\mathbf{U}}_{j}^{t}(\mathbf{g}-\mathbf{b}){(\mathbf{g}-\mathbf{b})}^{t}{\mathbf{U}}_{j}\rangle}_{\mathbf{n}|\mathbf{b},0}\rangle}_{\mathbf{b}}.$$

The quantum-mean backgrounds are also used to compute the anatomical-noise covariance via the formula

## (10)

$${\mathbf{K}}_{j}^{\mathrm{anat}}={\langle {\mathbf{U}}_{j}^{t}(\mathbf{b}-\overline{\mathbf{b}}){(\mathbf{b}-\overline{\mathbf{b}})}^{t}{\mathbf{U}}_{j}\rangle}_{\mathbf{b}}.$$

## 2.4.

### Visual-Search Observers

The VS observer^{3} offers an alternative approach to accounting for anatomical noise in model observer performance of lesion-search tasks. Based on the VS paradigm proposed by Kundel et al.^{5} for how radiologists read images, the VS observer combines a front-end search for suspicious candidate locations with subsequent analysis of just those candidates. Lesion detection in nuclear medicine is frequently a hot-spot search, which our basic VS observer performs through segmentation of the test image into blobs. As background knowledge is not incorporated into the search, a given blob may be a lesion or an artifact of the image texture. At the same time, actual lesions may be masked by the texture. The search process is thereby implicitly affected by both quantum and anatomical noise. The pixel with the maximum greyscale intensity in a given blob constitutes the blob focal point. The set of focal points within the ROI $\mathrm{\Omega}$ comprises a relatively small set of candidate locations for the observer.

The search is followed by a directed analysis of these candidate locations. Various discriminants may be used for this analysis. Many of our earlier studies^{3}^{,}^{11} have used the scanning CNPW discriminant. In scanning mode, the CNPW observer can be viewed as a nonprewhitening approximation to the CH observer, with the shift-invariant template

## (11)

$${\mathbf{w}}_{j}^{\mathrm{CNPW}}={\mathbf{U}}_{j}{\mathbf{U}}_{j}^{t}{\overline{\mathbf{s}}}_{j}.$$

The CNPW discriminant was also tested separately as a scanning observer in our study, being applied for both BKE and BKS tasks. The BKE task with ${\mathbf{c}}_{j}=\mathbf{b}$ represented lesion detection in reconstructed quantum noise. The reference image for the BKS task was ${\mathbf{c}}_{j}=\overline{\mathbf{b}}$. We shall use the notation CNPW-BKE and CNPW-BKS to differentiate these two observer/task combinations.
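The scanning CNPW-BKE computation can be illustrated with a minimal sketch. It assumes circular shifts for moving the centered template to each location, and it uses three Gaussian profiles as stand-in channels; the channel shapes, image size, lesion width, and function names (`gaussian_2d`, `shift_to`, `cnpw_scan`) are hypothetical choices for illustration, not the settings of this study.

```python
import numpy as np

def gaussian_2d(n, sigma):
    """Centered, unit-peak 2-D Gaussian on an n x n grid."""
    ax = np.arange(n) - n // 2
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def shift_to(profile, loc):
    """Circularly shift a centered profile so that its center lands on loc."""
    n = profile.shape[0]
    return np.roll(profile, (loc[0] - n // 2, loc[1] - n // 2), axis=(0, 1))

def cnpw_scan(g, c, channels, s_bar, roi):
    """Scan the CNPW template over the ROI; return (rating, location)."""
    # Columns of U are the centered channel profiles (N^2 x number of channels).
    U = np.stack([ch.ravel() for ch in channels], axis=1)
    # Centered template w = U U^t s_bar; shifting it gives w_j (shift invariance).
    w_center = (U @ (U.T @ s_bar.ravel())).reshape(s_bar.shape)
    diff = g - c
    lam = {loc: float(np.sum(shift_to(w_center, loc) * diff)) for loc in roi}
    r = max(lam, key=lam.get)   # most suspicious location
    return lam[r], r            # confidence rating and location

n = 32
channels = [gaussian_2d(n, s) for s in (1.5, 3.0, 6.0)]  # hypothetical channels
s_bar = gaussian_2d(n, 2.0)                              # hypothetical lesion profile
b = np.full((n, n), 5.0)                                 # known background (BKE)
true_loc = (12, 20)
g = b + shift_to(s_bar, true_loc)                        # noise-free test image
roi = [(i, j) for i in range(8, 24) for j in range(8, 24)]
rating, r = cnpw_scan(g, b, channels, s_bar, roi)
print(rating, r)
```

In this noise-free example the reported location coincides with the planted lesion. Replacing `b` with a mean background in the call would correspond to the BKS variant of the same scan.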

## 3.

## Methods

## 3.1.

### Model-Observer Specifications

Based on previous work,^{2}^{,}^{12} a set of three difference-of-Gaussian channels was selected for both the CH and CNPW observers. Defined in the frequency domain, the $i$th channel ${\mathbf{u}}_{i}$ has elements

## (12)

$${\mathit{u}}_{i}(\mathit{\xi})=\mathrm{exp}[-{\left(\frac{\parallel \mathit{\xi}\parallel}{{2}^{i+1}{\sigma}_{0}}\right)}^{2}]-\mathrm{exp}[-{\left(\frac{\parallel \mathit{\xi}\parallel}{{2}^{i}{\sigma}_{0}}\right)}^{2}],\quad i=0,1,2.$$

The image segmentation for the VS observer was performed with a watershed algorithm, a computationally efficient routine that is available in many software packages. A less-efficient gradient-ascent segmentation method yielded similar results in previous VS observer studies.^{4} With the watershed algorithm, a 2-D test image is viewed as a topographic landscape with holes at the local minima.^{13} When the landscape is lowered into a tank of water, water starts to fill the “catchment basins” around the holes. Dams (or watersheds) are constructed at points where water from two or more catchment basins meets. Once the entire landscape has been submerged, these catchment basins define the segmented blobs. The watershed algorithm is frequently applied to the gradient magnitude of an image. We instead applied the algorithm to the additive inverse of the test image in order to avoid the noise penalties involved in computing the gradient. With this approach, the minimum pixel of the inverted image within a given catchment basin (that is, the maximum-intensity pixel of the test image in that blob) represented a focal point.
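Because each catchment basin of the inverted image surrounds one local minimum, the focal points correspond to local maxima of the test image. The sketch below finds those maxima directly with NumPy. It is a simplified stand-in for the watershed routine, not the segmentation code used in the study: it does not return blob boundaries, it ignores plateaus, and the function name `focal_points` and the synthetic image are hypothetical.

```python
import numpy as np

def focal_points(image, roi_mask):
    """Return (row, col) of strict 8-neighbor local maxima inside roi_mask."""
    image = np.asarray(image, dtype=float)
    # Pad with -inf so that border pixels can still qualify as maxima.
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones(image.shape, dtype=bool)
    rows, cols = padded.shape
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr : rows - 1 + dr, 1 + dc : cols - 1 + dc]
            is_peak &= image > neighbor
    return [tuple(int(v) for v in rc) for rc in np.argwhere(is_peak & roi_mask)]

# Synthetic test image with three hot blobs (hypothetical values).
n = 20
yy, xx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
def blob(r, c, amp, sigma=1.5):
    return amp * np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma**2))
img = blob(5, 5, 3.0) + blob(12, 10, 2.0) + blob(1, 15, 2.5)

roi = np.zeros((n, n), dtype=bool)
roi[3:, :] = True                  # the blob at row 1 lies outside the ROI
pts = focal_points(img, roi)
print(pts)
```

Restricting the result to the ROI mask mirrors the statement that only focal points within $\mathrm{\Omega}$ become candidates.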

Details about the training for the model observers are provided in Sec. 3.5.

## 3.2.

### Model-Observer Inefficiencies

## 3.2.1.

#### Internal noise

Given the same image, human observers can make different decisions at different times. With the model-observer formulations presented above, the decision for a given image would never change. To address this issue, we experimented with adding internal noise to the CH and VS observers. The noise was implemented by adding a random Gaussian deviate to the perception measurements ${\lambda}_{j}$. The width of the Gaussian distribution was selected relative to the standard deviation ${\sigma}_{I}$ of the set $\{{\lambda}_{j}:j\in \mathrm{\Omega}\}$ for a given set of training images. With $N(0,1)$ representing a random deviate from a Gaussian distribution with zero mean and unit standard deviation, the noisy measurement was given by

## (13)

$${\lambda}_{j}^{\prime}={\lambda}_{j}+\gamma {\sigma}_{I}N(0,1).$$

Values of $\gamma $ between 0.5 and 2.0 were tested.

For the CH observer, internal noise can also be implemented through the covariance matrices instead of the analysis statistic. Several models of this type are discussed by Zhang et al.^{8} A corresponding process for the VS observer would be to add internal noise during the candidate search.^{7}
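A short sketch of the decision-variable noise follows. It assumes the noisy measurement has the form ${\lambda}_{j}+\gamma {\sigma}_{I}N(0,1)$ with an independent deviate at each location; the function name `add_internal_noise`, the synthetic training values, and the example measurements are hypothetical.

```python
import numpy as np

def add_internal_noise(lam, sigma_i, gamma, rng):
    """Add zero-mean Gaussian noise of width gamma * sigma_i to each measurement."""
    lam = np.asarray(lam, dtype=float)
    return lam + gamma * sigma_i * rng.standard_normal(lam.shape)

rng = np.random.default_rng(0)

# sigma_I is estimated from perception measurements on training images
# (synthetic stand-in values here).
train_lam = rng.normal(loc=0.0, scale=1.0, size=500)
sigma_i = float(np.std(train_lam))

lam = np.array([0.2, 1.1, 0.9, -0.4])   # measurements at four candidate locations
noisy = add_internal_noise(lam, sigma_i, gamma=1.0, rng=rng)

# The reported location and rating are then taken from the noisy values.
print(int(np.argmax(lam)), int(np.argmax(noisy)), float(noisy.max()))
```

With larger $\gamma $ the noisy maximum is more likely to move away from the noise-free maximum, which is how the added noise lowers localization performance.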

## 3.2.2.

#### Perceptual search threshold

A second type of model inefficiency applied in our study was specific to the VS observer. Recall from Sec. 2.4 that the VS observer identifies blob focal points as suspicious candidates by means of image segmentation. The segmentation can produce many candidates that human observers would ignore. The result, particularly with high-count images, is that the BKE analysis with the CNPW discriminant will often locate the lesion. In such cases, the VS and CNPW-BKE observers may perform similarly. A solution is to limit the set of candidates by setting a threshold $\tau $ such that a focal point with greyscale intensity less than $\tau $ is disregarded. Parameter $\tau $ is thus comparable to the drowning threshold described in the watershed literature.^{14}

This threshold was calculated with reference to the focal points from a set of training images. Each training image yielded a number of focal points, some of which were lesion-absent. The set of greyscale intensities from all of the lesion-absent focal points in the training images produced a distribution with mean and standard deviation denoted by ${\mu}_{l}$ and ${\sigma}_{l}$, respectively. We tested search thresholds of the form $\tau ={\mu}_{l}+\beta {\sigma}_{l}$ for four different scaling factors $\beta $ between 0.2 and 0.8.
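The threshold rule $\tau ={\mu}_{l}+\beta {\sigma}_{l}$ can be sketched in a few lines. The sample standard deviation is used here as an assumption, since the text does not say whether the sample or population form was applied; the function names and the intensity values are hypothetical.

```python
from statistics import mean, stdev

def search_threshold(absent_intensities, beta):
    """tau = mu_l + beta * sigma_l from lesion-absent focal-point intensities."""
    mu_l = mean(absent_intensities)
    sigma_l = stdev(absent_intensities)   # sample standard deviation (assumption)
    return mu_l + beta * sigma_l

def apply_threshold(candidates, tau):
    """Keep only (location, intensity) candidates with intensity >= tau."""
    return [(loc, val) for loc, val in candidates if val >= tau]

# Greyscale intensities of lesion-absent focal points from training images.
absent = [10.0, 12.0, 14.0]
tau = search_threshold(absent, beta=0.5)

# Focal points found in a test image: ((row, col), intensity).
candidates = [((4, 7), 12.5), ((9, 3), 13.5), ((15, 11), 20.0)]
kept = apply_threshold(candidates, tau)
print(tau, kept)
```

Only the surviving candidates would be passed on to the candidate analysis.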

Figure 1 shows the greyscale distributions of the normal and abnormal focal points obtained from a training set of images. The distributions are approximately Gaussian. Threshold values of particular interest might lie in the zone between the dashed vertical lines where the disease status is not obvious.

## 3.3.

### Planar Imaging Simulation

The 3-D XCAT anthropomorphic phantom^{15} was used for our studies, with the volume of interest consisting of the pelvic and abdominal regions. This region was discretized to voxel dimensions ${128}^{3}$. Five In-111 biodistributions were simulated in the volume and then imaged with an analytic projector. Our simulation modeled a system with medium-energy parallel-hole collimators and accounted for nonhomogeneous photon attenuation and distance-dependent collimator blur. Eight collimators that varied in terms of spatial resolution and sensitivity were tested (see Sec. 3.4). Noise-free projections were calculated for the two principal In-111 energies (171 and 245 keV), scaled for the abundance and absorption coefficients of a NaI detector with 1-cm crystal thickness, and then added. Additional details about the system model can be found in our previous work.^{11}

Spherical soft-tissue lesions of diameter 1 cm were considered. A lesion-placement map contained the prostate and lymph-node regions of the XCAT volume. The projection of this map represented the search region $\mathrm{\Omega}$ for the model observers in our study. A total of 225 distinct lesion locations were randomly generated from the map, with roughly equal distribution between the prostate and the lymph nodes. Two lesion-to-prostate relative activities (or contrasts) of 16:1 and 24:1 were used. Lesion projections were created separately, scaled for contrast and then added to the noise-free background projections to form noise-free lesion-present data.

The addition of Poisson noise to the projections was guided by photon count levels from clinical data. The mean number of counts in a projection varied with collimator as described next.

## 3.4.

### Collimators and System Resolution

Our LROC study compared observer performance with different collimators. This section reviews the properties of a gamma-camera collimator and how these properties affect image formation. The two main parameters for a parallel-hole collimator are the hole length $L$ and the hole width $d$, both of which affect the collimator blur and count sensitivity. The number of counts per unit time $\eta $ for regular hexagonal holes obeys the proportionality $\eta \sim 0.65{d}^{2}$.^{16}

The system spatial resolution ${R}_{\mathrm{sys}}$ of a gamma camera has two components: intrinsic resolution ${R}_{\mathrm{int}}$ and distance-dependent geometric resolution ${R}_{\mathrm{geo}}$. Expressed in terms of the full-width at half-maximum of a Gaussian blur function, the resolution for photons emitted at a distance $c$ from the face of the collimator is

## (14)

$${R}_{\mathrm{sys}}(c)=\sqrt{{R}_{\mathrm{int}}^{2}+{R}_{\mathrm{geo}}^{2}(c)}.$$

The geometric component of the system resolution is completely determined by $L$ and $d$ according to the formula

while the intrinsic resolution depends on the gamma-ray energy and the crystal material. For the In-111 energies and a NaI detector, typical intrinsic resolutions are between 0.3 and 0.4 cm.^{17} Note that geometric resolution can also be defined in terms of distance from the face of the crystal.

For typical values of $c$ in patient imaging, the system resolution may be expressed with the linear model

## (16)

$${R}_{\mathrm{sys}}(c)\approx mc+{R}_{0},$$

with slope $m=d/L$ and the intercept ${R}_{0}\approx {R}_{\mathrm{sys}}(c)-mc$ for a suitably large value of $c$.

Our study modeled variants of a typical medium-energy collimator with $L=4.06$ cm and $d=0.294$ cm. Eight collimators were defined by varying $d$ between 0.044 and 0.394 cm in increments of 0.05 cm while keeping $L$ constant. The intrinsic resolution was fixed at 0.38 cm. The relevant collimator parameters are listed in Table 1. The collimators are labeled using the notation C1, …, C8, listed in order of increasing $d$. The resolutions given in the table were computed for a distance $c=4L$. The sensitivities $\eta $ have units of millions of counts per unit time.

## Table 1

Parameters for the eight collimator models. System resolution ${R}_{\mathrm{sys}}$ was measured at distance $c=4L$ from the collimator face. The count sensitivity $\eta $ has units of millions of counts per unit time.

Parameter | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
---|---|---|---|---|---|---|---|---|
$d$ (cm) | 0.044 | 0.094 | 0.144 | 0.194 | 0.244 | 0.294 | 0.344 | 0.394 |
${R}_{\mathrm{sys}}$ (cm) | 0.402 | 0.473 | 0.575 | 0.695 | 0.824 | 0.960 | 1.100 | 1.240 |
$\eta $ | 0.04 | 0.19 | 0.45 | 0.81 | 1.23 | 1.87 | 2.55 | 3.35 |

As $\eta $ and ${R}_{\mathrm{sys}}$ both increase monotonically with $d$, our study examined the trade-off between sensitivity and spatial resolution in terms of observer performance. Low sensitivity leads to relatively higher quantum noise in the images, whereas low resolution increases anatomical noise due to partial-volume and blurring effects. Sample images obtained with the collimator models for two lesion-present cases are shown in Fig. 2.
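The sensitivity side of this trade-off can be illustrated from the proportionality $\eta \sim {d}^{2}$ alone. The sketch below assumes a fixed hole length and equal acquisition times, so that mean counts scale with $\eta $ and the relative Poisson noise scales as $1/\sqrt{\eta}$; the function names and the choice of C6 as the reference are illustrative.

```python
import math

# Hole widths (cm) for collimators C1..C8, as listed in Table 1.
HOLE_WIDTHS_CM = [0.044, 0.094, 0.144, 0.194, 0.244, 0.294, 0.344, 0.394]

def relative_sensitivity(d, d_ref):
    """Sensitivity relative to a reference hole width, using eta ~ d**2."""
    return (d / d_ref) ** 2

def relative_noise(d, d_ref):
    """Relative quantum-noise level (std/mean of Poisson counts) vs. the reference."""
    return 1.0 / math.sqrt(relative_sensitivity(d, d_ref))

d_ref = HOLE_WIDTHS_CM[5]   # C6, the unmodified medium-energy collimator
rel = [relative_sensitivity(d, d_ref) for d in HOLE_WIDTHS_CM]
for k, d in enumerate(HOLE_WIDTHS_CM, start=1):
    print(f"C{k}: d={d:.3f} cm  sensitivity x{rel[k - 1]:.3f}  "
          f"noise x{relative_noise(d, d_ref):.2f}")
```

Under these assumptions the narrowest holes give far fewer counts and correspondingly higher relative noise, while the resolution column of Table 1 moves in the opposite direction.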

## 3.5.

### Observer Studies

The model observers read images for each of the 16 combinations ($2\times 8=16$) of lesion contrast and collimator type in our study. For each combination, three image sets were prepared. A set of 450 test images consisted of a lesion-present/lesion-absent pair for each of the 225 lesion locations. The remaining two sets were for model-observer training. One set of 225 images, representing different quantum-noise realizations for the 225 lesion-absent test images, was used to compute the channel covariance matrices for the CH observer. The channel covariance matrices were precomputed (but not preinverted) for the study. The second training set contained 25 lesion-present/lesion-absent image pairs that were generated from distinct quantum-noise realizations for a subset of the 225 lesion locations. This latter set was used to estimate the mean lesion profile $\overline{\mathbf{s}}$. Taking the difference between a given image pair yielded a noisy lesion profile centered at one location. This profile was shifted to the center of the field of view. This process was repeated for all the image pairs and the resultant center-shifted profiles were averaged to get $\overline{\mathbf{s}}$. Note that the quantum-mean backgrounds used by the model observers were approximated by noise-free reconstructions.
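The profile-averaging procedure described above can be sketched as follows. Circular shifts via `np.roll` are an assumption made for brevity, and the function name `mean_lesion_profile`, image size, lesion shape, and noise level are hypothetical.

```python
import numpy as np

def mean_lesion_profile(present, absent, locations):
    """Average center-shifted difference images to estimate the lesion profile."""
    n_rows, n_cols = present[0].shape
    center = (n_rows // 2, n_cols // 2)
    acc = np.zeros((n_rows, n_cols))
    for g1, g0, (r, c) in zip(present, absent, locations):
        # Difference of a lesion-present/lesion-absent pair: noisy profile at (r, c).
        diff = np.asarray(g1, dtype=float) - np.asarray(g0, dtype=float)
        # Shift the profile to the center of the field of view.
        acc += np.roll(diff, (center[0] - r, center[1] - c), axis=(0, 1))
    return acc / len(locations)

# Synthetic training pairs (hypothetical sizes and noise level).
rng = np.random.default_rng(0)
n = 32
ax = np.arange(n) - n // 2
xx, yy = np.meshgrid(ax, ax, indexing="ij")
lesion = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))          # centered profile
locations = [(int(r), int(c)) for r, c in rng.integers(8, 24, size=(25, 2))]
background = np.full((n, n), 5.0)
absent = [background + 0.05 * rng.standard_normal((n, n)) for _ in locations]
present = [background + np.roll(lesion, (r - n // 2, c - n // 2), axis=(0, 1))
           + 0.05 * rng.standard_normal((n, n)) for r, c in locations]

s_bar = mean_lesion_profile(present, absent, locations)
print(np.unravel_index(np.argmax(s_bar), s_bar.shape))
```

Averaging over the 25 pairs suppresses the quantum noise in each difference image while retaining the common lesion shape.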

Institutional approval was obtained to allow the participation of human observers and informed consent was obtained from each observer prior to the study. The human-observer study was restricted to the high-contrast image sets from the five collimators with the highest model-observer performances. Three nonradiologist imaging scientists participated in the study. Each observer read 25 training images and 50 test images per collimator. The observers provided confidence ratings on a four-point scale, with a rating of four implying high confidence that the image contains a lesion.

Correct localizations for all observers were scored based on a radius of correct localization (RCL) of five pixels. This threshold radius was selected by following the empirical graphing process described in Wells et al.^{18} For each observer, a Wilcoxon estimate^{19} of the area under the LROC curve (${A}_{L}$) was computed. The average human performance for a given collimator was computed as the average of ${A}_{L}$ from the three observers. Uncertainties are expressed as $\pm $ one standard error in these averages.^{20} For a model observer featuring internal noise, ${A}_{L}$ was computed as the average performance based on three study realizations. The computational expense of running the CH observer was the reason for using only three realizations.
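As an illustration of the scoring, the sketch below computes a Wilcoxon-type estimate of ${A}_{L}$: the fraction of lesion-present/lesion-absent image pairs in which the lesion was correctly localized and the lesion-present rating exceeded the lesion-absent rating. Counting ties as one half and the function name `lroc_area` are assumptions; the exact estimator of the cited reference may differ in detail.

```python
import math

def lroc_area(present, absent, rcl):
    """Wilcoxon-type estimate of the area under the LROC curve.

    present: list of (rating, reported_location, true_location)
    absent:  list of ratings for lesion-absent images
    rcl:     radius of correct localization, in pixels
    """
    total = 0.0
    for rating, reported, truth in present:
        if math.dist(reported, truth) > rcl:
            continue                      # incorrect localization scores zero
        for rating_absent in absent:
            if rating > rating_absent:
                total += 1.0
            elif rating == rating_absent:
                total += 0.5              # tie handling (assumption)
    return total / (len(present) * len(absent))

present = [
    (3, (10, 10), (11, 11)),   # correct localization
    (4, (0, 0), (20, 20)),     # high rating but wrong location
    (2, (5, 5), (5, 5)),       # correct localization
]
absent = [1, 2, 3]
a_l = lroc_area(present, absent, rcl=5)
print(a_l)
```

A confident response at the wrong location contributes nothing, which is what separates ${A}_{L}$ from the ordinary area under the ROC curve.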

Separate analyses of variance (ANOVA) were conducted to test the statistical significance of the human and model-observer results. Collimator model and observer were fixed factors for the two-way ANOVA based on the human data. A three-way ANOVA with the model-observer data included lesion contrast as a main factor and the relevant two-way interactions. Significance was evaluated at the $\alpha =0.05$ level.

## 4.

## Results

The human-observer results from our study are summarized in Table 2. The uncertainties in individual observer performance were on the order of $\pm 0.06$. The two-way ANOVA indicated a significant collimator effect ($p=8.6\mathrm{E}\text{-4}$) but the observer effect was not significant ($p=0.075$). A subsequent Tukey multiple comparisons test^{21} found significant differences between C6 and the group {C2, C3, C4}. Collimator C5 was positioned in-between, with the data unable to distinguish it from C4 or C6.

## Table 2

Individual and average human-observer performances for the five tested collimator models C2–C6. The uncertainties in ${A}_{L}$ for the individual observers are $\pm 0.06$. The uncertainties in average performance represent one standard error in the mean.

Observer | C2 | C3 | C4 | C5 | C6 |
---|---|---|---|---|---|
Human #1 | 0.91 | 0.91 | 0.84 | 0.74 | 0.72 |
Human #2 | 0.80 | 0.83 | 0.78 | 0.69 | 0.67 |
Human #3 | 0.87 | 0.92 | 0.80 | 0.63 | 0.55 |
Average | $0.86\pm 0.03$ | $0.89\pm 0.03$ | $0.81\pm 0.02$ | $0.69\pm 0.03$ | $0.65\pm 0.05$ |
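The averages and standard errors in Table 2 can be reproduced from the individual scores. This sketch assumes the standard error is the sample standard deviation divided by the square root of the number of observers; the dictionary and function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Individual A_L values from Table 2 (Human #1, #2, #3).
scores = {
    "C2": [0.91, 0.80, 0.87],
    "C3": [0.91, 0.83, 0.92],
    "C4": [0.84, 0.78, 0.80],
    "C5": [0.74, 0.69, 0.63],
    "C6": [0.72, 0.67, 0.55],
}

def standard_error(values):
    """Standard error of the mean using the sample standard deviation."""
    return stdev(values) / sqrt(len(values))

for name, vals in scores.items():
    print(f"{name}: {mean(vals):.2f} +/- {standard_error(vals):.2f}")
```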

Figure 3 shows how the model observers without internal noise or search threshold performed as a function of collimator and lesion contrast. The average human-observer results are included in Fig. 3(b). The uncertainties in the ${A}_{L}$ estimates for each of the model observers did not exceed $\pm 0.02$. With the high-contrast lesions, the model observers showed some qualitative similarities with the humans for the low-sensitivity collimators, but did not match the substantial drop in human performance that occurred with increased sensitivity.

The ANOVA (Table 3) and multiple-comparisons testing applied to the model-observer data indicated that collimator C1 performed significantly worse than the other collimators. The differences among the observers were also significant. Affected almost entirely by quantum noise, the CNPW-BKE observer consistently outperformed the other three model observers, each of which also contended to some degree with anatomical noise. In particular, the CNPW-BKE observer performed at a high level with collimators C2 to C8 regardless of lesion contrast. The differences in performance that occurred between the CNPW-BKE and VS observers are attributable solely to the initial candidate search performed by the latter. The greatest deviations from CNPW-BKE performance occurred with the CNPW-BKS observer for collimator C1. The two CNPW observers otherwise performed similarly for collimators C2 to C4, only beginning to diverge once more with increased anatomical noise. The performance trends for the CNPW-BKS and CH observers were also similar, although the covariance noise modeling for the CH observer substantially moderated the effects of the reference-image subtraction for some collimators. The CH observer consistently outperformed the VS observer.

## Table 3

Results from the three-way ANOVA conducted with the model-observer scores. The analysis tested collimator, observer and lesion contrast as factors. All three effects and two-way interactions were significant at the $\alpha =0.05$ level.

Factor | df | Sum of squares | F | Pr (>F) |
---|---|---|---|---|
Collimator | 7 | 0.30 | 193.94 | $<2.2\mathrm{E}\text{-}16$ |
Contrast | 1 | 0.040 | 181.50 | $<8.4\mathrm{E}\text{-}12$ |
Observer | 3 | 0.037 | 56.14 | $<3.4\mathrm{E}\text{-}10$ |
Collimator:Contrast | 7 | 0.081 | 53.24 | $<6.0\mathrm{E}\text{-}12$ |
Collimator:Observer | 21 | 0.016 | 3.44 | 0.003 |
Contrast:Observer | 3 | 0.0036 | 5.46 | 0.006 |
Residuals | 21 | 0.0046 | | |

The performance trends in Fig. 3 for the scanning observers were largely independent of lesion contrast. This was not entirely so for the VS observer, for which Fig. 3(b) demonstrates an upward deflection in performance with the higher collimator sensitivities that does not show in Fig. 3(a). The deflection is small, with ${A}_{L}$ increasing by 0.02 between C4 and C8, while performance with the low-contrast lesions decreased over that span. The BKE analysis is at the root of this variable behavior, amplifying the image quality effects of the higher lesion contrast and increased sensitivity. As shown below, some of the effects of the BKE assumption can be mitigated by the addition of perceptual inefficiencies to the observer.

Of course, the scanning models also relied on some prior knowledge of the image backgrounds that the human observers did not have. Internal noise is routinely added to Hotelling-type observers to compensate for this knowledge. The results from adding Gaussian noise to the perception measurements for the high-contrast lesions with the CH observer are shown in Fig. 4. The standard errors in the estimates of ${A}_{L}$ for the model observers with inefficiencies were $\sim \pm 0.03$. Within the range of $\gamma $ values tested, the added noise steadily reduced performance at all collimator sensitivities, but tended to penalize performance with the lower sensitivities most. Thus, none of the model observer curves in Fig. 4 reproduced the performance trend displayed in the average human observer data. The left and right ends of the human performance curve were quantitatively fit with different values of $\gamma $.

Figure 5 shows the individual effects of internal noise and search thresholding for the VS observer. Figure 5(a) pertains to the internal-noise results. As with the CH observer, changes in $\gamma $ generated performance changes that were fairly uniform with sensitivity. However, the VS observer was less affected by a given value of $\gamma $ compared to the CH observer. An interesting outcome for $\gamma \ge 1.5$ was the establishment of a nominal performance peak for C3 that mirrored the average human observer results.

With the search threshold [Fig. 5(b)], the VS observer was able to duplicate the considerable drop in average human performance that occurred at the higher sensitivities. However, the threshold had a negligible effect on observer performance at the lower sensitivities. The relatively high system resolutions with those collimator models ensured that actual lesions would generally be associated with focal points having greyscale maxima that exceeded the threshold.

We also investigated how the VS observer responded with internal noise and the search threshold together. There were 16 inefficiency models in all, corresponding to the four values of $\beta $ and four nonzero values of $\gamma $ that were used for the plots in Fig. 5. Figure 6 compares the average human performance with the best-fitting results from the VS observer that used $\gamma =1.5$ and $\beta =0.6$.

The computation times for the scanning and VS model observers in this study were disparate. Recall that the channel covariance matrices for the CH observer were computed (but not inverted) prior to the study. Despite this preprocessing, the CH computation times were relatively high: $\sim 90$ min in IDL (Interactive Data Language, Exelis Visual Information Solutions, Boulder, Colorado) to read 450 test images compared to less than a minute for the VS and CNPW observers. These timings are for the model observers without internal noise or search thresholding. The lesion search region for the model observers contained 380 locations. From this, the VS observer averaged between 20 and 40 focal points per image, with the higher numbers associated with lower collimator sensitivities. However, the code implementation of the candidate analysis did not exploit this search reduction.

## 5.

## Discussion

Our objective with this work was to compare the VS and scanning CH observers as surrogates for human observers in task-based assessments. A focus for the comparison was how these model observers respond to anatomical noise. The image sets in the planar imaging study presented varying proportions of quantum and anatomical noise as dictated by the collimator sensitivity and spatial resolution. Without inefficiencies, neither model observer matched the magnitudes or general trend in average human performance as anatomical noise increased in the images. The results with the observer inefficiencies show that the two models admit different sets of mechanisms for improving the agreement with the humans.

One must keep in mind the different task paradigms used with the CH and VS observers. With the CH observer, anatomical noise is equated with background variability as quantified by the image class statistics. Adding internal noise terms to the channel covariance matrices is a common means of reconciling differences with human observers. One may also inject greater anatomical noise into the model observer by modifying the reference image ${\mathbf{c}}_{j}$ [Eq. (6)], although comparison of the CNPW-BKE and CNPW-BKS results suggests this approach primarily affects observer performance with the higher-resolution collimators. Previous studies have also shown that excessive modification of the reference image can lead to artifacts that create persistent false positives for the observer.^{3}

For the VS observer tested herein, anatomical noise only affects the initial candidate search since the analysis is a BKE process. Furthermore, the separate influences of quantum and anatomical noise are not discerned in the search. Instead, image texture as a whole impedes the identification of lesions as focal points. Accounting for noise in this manner effectively expanded the dynamic range in observer performance across the full set of collimators compared to the scanning models. Translating this expansion into greater statistical power will depend on developing methods of candidate analysis that are independent of the BKE and BKS paradigms. The use of greyscale intensity as the sole search feature for the VS observers in our study was only possible because of the BKE analysis. One promising alternative is based on the extraction of multiple morphological features.^{22}

Overall, the VS framework provides a relatively flexible environment for studying observer inefficiencies. As shown in Fig. 5, different inefficiencies can influence different parts of parameter space, offering an intuition that is not always available with covariance modeling of internal noise. The VS observer attained quantitative agreement with the human observers, but this was the result of ad hoc fitting of the internal noise and search threshold parameters ($\gamma $ and $\beta $, respectively). A version of the VS observer that can be reliably applied without the BKE or BKS background paradigms may require smaller amounts of internal noise and other inefficiencies to approximate human-observer performance, and discussion of how to set the inefficiency and noise parameters should await development of such a model.

Computational efficiency is another important point of comparison for the CH and VS observers. Runtimes for the CH observer were substantially longer than for the VS observer. Preinverting the covariance matrices would have improved efficiency, but the main factor in the longer times was the shift-variant CH observer template. With regard to the amount of preprocessing required for the CH observer, one must recognize that the background variations in this study were restricted to modeling different In-111 biodistributions in a single geometry of the XCAT phantom, so that the computation of ${\mathbf{K}}_{j}^{\mathrm{anat}}$ [Eq. (10)] and $\overline{\mathit{b}}$ was relatively simple. A more extensive study involving multiple phantom geometries could complicate the BKS computations by involving intensive Monte Carlo simulations as demonstrated for nonsearch tasks by Kupinski et al.^{23} Observer studies with reconstructed images would substantially increase the computing time,^{24} as would studies involving larger regions of interest, high-resolution imaging or image volumes.
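The potential saving from preinverting the covariance matrices can be sketched by counting inversions. The example below assumes one covariance matrix per search location, shared by all 450 test images in a set. The class `InverseCache` and the two-channel matrices are hypothetical stand-ins, and the sketch does not address the cost of the shift-variant template, which was the dominant factor in the runtimes.

```python
def invert_2x2(K):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = K
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


class InverseCache:
    """Invert each location's covariance once and reuse it for every image."""

    def __init__(self, covariances):
        self._covariances = covariances
        self._inverses = {}
        self.n_inversions = 0

    def get(self, j):
        if j not in self._inverses:
            self._inverses[j] = invert_2x2(self._covariances[j])
            self.n_inversions += 1
        return self._inverses[j]


n_images, n_locations = 450, 380  # counts reported in the study
# Illustrative location-dependent covariances (values are made up).
covariances = [[[2.0 + 0.01 * j, 0.5], [0.5, 1.0]] for j in range(n_locations)]

cache = InverseCache(covariances)
for _ in range(n_images):
    for j in range(n_locations):
        K_inv = cache.get(j)  # reused after the first image

print(cache.n_inversions, "inversions with caching versus",
      n_images * n_locations, "without")
```

Under these assumptions the number of inversions drops from one per image and location to one per location.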

Finally, this study had several basic limitations for purposes of collimator optimization, including the use of an analytic projector that disregarded the considerable scatter and penetration potential of In-111. Still, the results are relevant to the practice of task-based collimator optimization. The accepted approach to hardware optimization is to apply ideal observers in projection space; the scanning CH observer has been used previously as an approximate ideal observer (although with channels other than those used herein^{25}). VS observer development has focused instead on modeling human observers as the gold standard for lesion-detection performance in much of diagnostic imaging. The planar imaging framework allows for an even comparison of the two approaches to optimization. Follow-up studies will analyze the prospects for optimization studies based on increasingly realistic detection tasks with the VS observer.

## 6.

## Conclusions

When applied without inefficiencies, the VS and CH observers failed to match either the magnitudes or the general trend in average human performance as anatomical noise increased in the images. The result for the VS observer is partly attributed to reliance on the quasi-BKE task, which artificially inflated observer performance in images with high anatomical noise. Overall, the VS model observer offers a flexible and computationally efficient framework for studying the combined effects of quantum and anatomical noise in lesion search tasks. Future studies will examine versions of the model that are independent of the standard BKE/BKS task paradigms.

## Acknowledgments

This research was supported by NIBIB Grant No. R01EB012070. The authors acknowledge the contributions of Nagarohit Katta in the development of the computer interface for the human-observer study and of Kheya Banerjee for participating in the study.

## References

## Biography

**Anando Sen** received his PhD from the Department of Mathematics at the University of Houston in 2012. He is currently a postdoctoral research scientist in the Department of Biomedical Informatics at Columbia University and was previously a postdoctoral fellow in the Department of Biomedical Engineering at the University of Houston. His research interests include image quality assessment, model observers, and tomographic reconstruction. He has presented his research at several national and international conferences.

**Howard C. Gifford** received his PhD in applied mathematics from the University of Arizona in 1997. He was a postdoctoral fellow in the Radiology Department at the University of Massachusetts Medical School (UMMS) until 2000 and a UMMS faculty member through 2011. He is an associate professor in the Department of Biomedical Engineering at the University of Houston. His research interests include task-based image quality assessment and optimization methods for nuclear medicine and x-ray imaging.