We investigated the effects of prevalence and case distribution on radiologist diagnostic performance, as measured by the area under the receiver operating characteristic curve (AUC) and by sensitivity and specificity, in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) with screen-film mammography (SFM) for women with dense breasts. Mammograms were obtained from the prospective Digital Mammographic Imaging Screening Trial. We performed five reader studies that differed in cancer prevalence and in the distribution of noncancers. Twenty radiologists participated in each reader study. Using split-plot study designs, we collected recall decisions and multilevel scores from the radiologists for calculating sensitivity, specificity, and AUC. Differences in reader-averaged AUCs slightly favored SFM over FFDM (largest AUC difference: 0.047, SE = 0.023, p = 0.047), where the standard error accounts for both reader and case variability. None of the differences was significant at the Bonferroni-adjusted level of 0.01 (0.05/5 reader studies). Differences in sensitivity and specificity between the modalities were likewise inconclusive. Prevalence had little effect on AUC (largest difference: 0.02), whereas sensitivity increased and specificity decreased as prevalence increased. These results indicate that AUC is robust to changes in prevalence, while radiologists became more aggressive in their recall decisions as prevalence increased.
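The abstract does not state exactly how AUC, sensitivity, and specificity were computed, and its standard errors come from a multireader, multicase analysis that is not reproduced here. As a minimal single-reader sketch, assuming the nonparametric (Mann-Whitney) estimate of AUC and hypothetical scores and recall decisions, the code below shows why the empirical AUC is insensitive to prevalence: it compares cancer and noncancer scores pairwise, so replicating the noncancer cases lowers prevalence without changing the AUC. The function names are illustrative.

```python
import math
from itertools import product


def empirical_auc(cancer_scores, noncancer_scores):
    """Nonparametric (Mann-Whitney) AUC: probability that a cancer case scores
    higher than a noncancer case, with ties counted as one half."""
    wins = 0.0
    for c, n in product(cancer_scores, noncancer_scores):
        if c > n:
            wins += 1.0
        elif c == n:
            wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


def sensitivity_specificity(recall_cancer, recall_noncancer):
    """Recall decisions are 1 (recall) or 0 (no recall), one per case."""
    sensitivity = sum(recall_cancer) / len(recall_cancer)
    specificity = 1 - sum(recall_noncancer) / len(recall_noncancer)
    return sensitivity, specificity


# Hypothetical 5-level scores and recall decisions from a single reader
cancer_scores = [5, 4, 4, 3, 5]
noncancer_scores = [1, 2, 3, 1, 2, 4, 1, 2]
recall_cancer = [1, 1, 1, 0, 1]
recall_noncancer = [0, 0, 1, 0, 0, 1, 0, 0]

auc = empirical_auc(cancer_scores, noncancer_scores)
# Tripling the noncancers lowers prevalence but leaves each class's score
# distribution, and therefore the empirical AUC, unchanged
auc_lower_prevalence = empirical_auc(cancer_scores, noncancer_scores * 3)
assert math.isclose(auc, auc_lower_prevalence)

sens, spec = sensitivity_specificity(recall_cancer, recall_noncancer)

# Bonferroni-adjusted significance level for five reader studies
alpha = 0.05 / 5
print(auc, sens, spec, 0.047 < alpha)  # 0.9375 0.8 0.75 False
```

Sensitivity and specificity, by contrast, depend on where each reader draws the recall threshold, which is the behavior the study observed shifting with prevalence.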
A new emphysema treatment uses endobronchial valves to perform lobar volume reduction. The degree of
fissure completeness may predict treatment efficacy. This study investigated the behavior of a semiautomated
algorithm for quantifying lung fissure integrity in CT with respect to reconstruction kernel and
dose. Raw CT data were obtained for six asymptomatic patients from a population at high risk for lung cancer.
The patients were scanned on either a Siemens Sensation 16 or 64 scanner using a low-dose protocol of 120 kVp
and 25 mAs. Images were reconstructed with kernels ranging from smooth to sharp (B10f, B30f, B50f, B70f).
Research software was used to simulate an even lower-dose acquisition at 15 mAs, and images were
reconstructed with the same kernels, resulting in eight series per patient. The left major fissure was manually
contoured axially at regular intervals, yielding 37 contours across all patients. These contours were read
into an image analysis and pattern classification system that computed a Fissure Integrity Score (FIS) for
each kernel and dose. FIS values were analyzed using a mixed-effects model with kernel and dose as fixed
effects and patient as a random effect to test for differences due to kernel and dose. The analysis revealed no
difference in FIS between the smooth kernels (B10f, B30f) or between the sharp kernels (B50f, B70f), but
there was a significant difference between the sharp and smooth groups (p = 0.020). There was no
significant difference in FIS between the two low-dose levels (25 and 15 mAs; p = 0.882). Using a cutoff of 90%,
the number of incomplete fissures increased from 5 to 10 when the imaging protocol changed from B50f to
B30f. The reconstruction kernel therefore has a significant effect on the quantification of fissure integrity in CT, which
may have implications for selecting patients for endobronchial valve therapy.
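The abstract applies a 90% cutoff to FIS but does not define the score itself. The sketch below assumes, hypothetically, that FIS is the percentage of sampled points along the contoured fissure at which the algorithm detects fissure, and that a fissure counts as incomplete when its FIS falls strictly below the cutoff. The function names and score values are illustrative only.

```python
# Hypothetical FIS definition (assumption, not taken from the study): the
# percentage of sampled points along the manually contoured fissure path at
# which the algorithm detects fissure.
def fissure_integrity_score(detected):
    """detected: sequence of booleans, one per sampled point on the contour."""
    if not detected:
        raise ValueError("contour has no sampled points")
    return 100.0 * sum(detected) / len(detected)


def count_incomplete(fis_values, cutoff=90.0):
    """Count fissures whose FIS falls below the completeness cutoff (assumed strict <)."""
    return sum(1 for fis in fis_values if fis < cutoff)


# Hypothetical per-fissure scores at one dose level under two kernels
fis_by_kernel = {
    "B50f": [97.0, 92.5, 91.0, 88.0, 95.5],
    "B30f": [95.0, 89.0, 87.5, 84.0, 93.0],
}
for kernel, scores in fis_by_kernel.items():
    print(kernel, count_incomplete(scores))  # B50f 1, then B30f 3
```

Because the classification is a hard threshold, fissures whose scores sit near 90% can switch category after small kernel-induced shifts in FIS, so a modest change in scores could produce a large change in the count of incomplete fissures.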
The ability to automatically detect and monitor implanted devices may play an important role in patient care and
the evaluation of device and treatment efficacy. The purpose of this research was to develop a system for the
automated detection of one-way endobronchial valves implanted as part of a clinical trial for less invasive lung
volume reduction. Volumetric thin-section CT data were obtained for 275 subjects; 95 subjects implanted with 246
devices were used for system development and 180 subjects implanted with 354 devices were reserved for testing.
The detection process consisted of pre-processing, pattern-recognition based detection, and a final device selection.
Following pre-processing, a cascade of classifiers was trained using AdaBoost to discriminate true devices from
false positives (such as calcium deposits). The classifiers in the cascade used simple features (the mean or maximum
attenuation) computed near control points relative to a template model of the valve. Visual confirmation of the
system output served as the gold standard. FROC analysis was performed for the evaluation; the system could be set
so the mean sensitivity was 96.5% with a mean of 0.18 false positives per subject. These generic device modeling
and detection techniques may be applicable to other devices and useful for monitoring the placement and function of
implanted devices.
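FROC analysis trades per-device sensitivity against false positives per subject as the detection threshold varies. The sketch below assumes each candidate carries a classifier score and a true/false label from visual confirmation, and that each true device yields at most one candidate after final device selection; the function names and data are hypothetical, not the authors' system.

```python
def froc_points(candidates, n_devices, n_subjects):
    """candidates: list of (score, is_true_device) pairs pooled over all subjects.

    Returns (threshold, sensitivity, mean false positives per subject) for each
    distinct score used as a threshold, from strictest to most lenient.
    Assumes each true device yields at most one candidate."""
    points = []
    for t in sorted({score for score, _ in candidates}, reverse=True):
        tp = sum(1 for score, is_true in candidates if score >= t and is_true)
        fp = sum(1 for score, is_true in candidates if score >= t and not is_true)
        points.append((t, tp / n_devices, fp / n_subjects))
    return points


def best_operating_point(points, max_fp_per_subject):
    """Highest-sensitivity point whose false-positive rate stays within the budget."""
    feasible = [p for p in points if p[2] <= max_fp_per_subject]
    return max(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical candidates from 2 subjects containing 4 implanted devices
# (one device was never proposed as a candidate, so sensitivity tops out at 0.75)
candidates = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False), (0.3, False)]
points = froc_points(candidates, n_devices=4, n_subjects=2)
print(best_operating_point(points, max_fp_per_subject=0.5))  # (0.6, 0.75, 0.5)
```

Setting the system "so the mean sensitivity was 96.5% with a mean of 0.18 false positives per subject" corresponds to choosing one such point on the FROC curve.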
The purpose of this work was to develop a 3D airway measurement technique that can be initialized at a single point
(chosen either automatically or by the user) and to evaluate its measurement accuracy with varying imaging parameters as
well as in synthetic parenchyma and soft tissue regions. This approach may have advantages over existing methods
that require segmentation of the entire airway branch. METHODS: Rays are cast spherically from the initial
measurement point and a range image is created of the distance to the edge of the airway lumen. The trajectory of
the airway is estimated from the range image and can be used to reconstruct a 2D slice perpendicular to the airway
for cross-sectional measurements. The evaluation phantom consisted of 5 tubes (3.18 to 19.05 mm in diameter and
1.59 to 3.18 mm in wall thickness) embedded in synthetic lung parenchyma and soft tissue. Images were acquired at
10 and 100 mAs at three tube orientations (0°, 45°, 90°) and were reconstructed at 0.6 and 1.5 mm slice thicknesses
with both smooth and standard reconstruction kernels. RESULTS: The overall diameter and wall thickness accuracies
were 0.43 ± 0.19 mm and 0.28 ± 0.15 mm, respectively, in parenchyma regions and 0.46 ± 0.16 mm and 0.49 ± 0.40
mm, respectively, in soft tissue regions. The overall accuracy of the trajectory estimate was 0.64 ± 0.51°. The
proposed technique may allow a larger number of airways to be measured for research and clinical
analysis than is possible with current methods.
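The abstract does not give the lumen-edge criterion or how the trajectory is derived from the range image. The sketch below assumes a fixed attenuation threshold marks the lumen edge, samples the nearest voxel along each ray, and simply takes the longest ray as the airway direction; these are illustrative simplifications, not the authors' method, and the function names and phantom are hypothetical.

```python
import numpy as np


def range_image(volume, start, threshold, n_theta=19, n_phi=36, step=0.5, max_dist=60.0):
    """Cast rays spherically from `start` (z, y, x voxel coordinates).

    For each direction, march outward until a voxel above `threshold` (assumed
    to be airway wall) is reached or the ray leaves the volume, and record the
    distance travelled. Returns the range image and the unit ray directions."""
    thetas = np.linspace(0.0, np.pi, n_theta)                    # polar angles
    phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)  # azimuths
    start = np.asarray(start, dtype=float)
    shape = np.array(volume.shape)
    ranges = np.zeros((n_theta, n_phi))
    dirs = np.zeros((n_theta, n_phi, 3))
    for i, t in enumerate(thetas):
        for j, p in enumerate(phis):
            d = np.array([np.cos(t), np.sin(t) * np.sin(p), np.sin(t) * np.cos(p)])
            dirs[i, j] = d
            dist = 0.0
            while dist < max_dist:
                idx = np.round(start + dist * d).astype(int)  # nearest-voxel sampling
                if np.any(idx < 0) or np.any(idx >= shape):
                    break  # ray left the volume
                if volume[tuple(idx)] > threshold:
                    break  # ray reached the edge of the lumen
                dist += step
            ranges[i, j] = dist
    return ranges, dirs


def estimate_trajectory(ranges, dirs):
    """Crude trajectory estimate: the direction of the longest unobstructed ray."""
    i, j = np.unravel_index(np.argmax(ranges), ranges.shape)
    return dirs[i, j]


# Synthetic phantom: an air-filled tube of radius 5 voxels running along x
zz, yy, _ = np.mgrid[0:40, 0:40, 0:80]
phantom = np.zeros((40, 40, 80))
phantom[(zz - 20) ** 2 + (yy - 20) ** 2 <= 25] = -1000.0
ranges, dirs = range_image(phantom, start=(20, 20, 40), threshold=-500.0)
axis = estimate_trajectory(ranges, dirs)
print(np.round(np.abs(axis), 2))  # close to [0, 0, 1], i.e. the x axis
```

A fuller implementation would refine the axis from many long-range directions rather than a single ray, then resample the plane perpendicular to it for the cross-sectional measurements.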
The Lung Image Database Consortium (LIDC) has provided a publicly available collection of CT images with nodule
markings from four radiologists. The LIDC protocol does not require radiologists to reach a consensus during the
reading process; as a result, potential nodules have varying levels of reader agreement, and there is no explicit
reference standard. The purpose of this work was to investigate the effects of the level of reader agreement
on the development of a reference standard and the subsequent impact on CAD performance. Ninety series were
downloaded from the LIDC database. Four different reference standards were created based on the markings of the
LIDC radiologists, reflecting four different levels of reader agreement. All series were analyzed with a research CAD
system and its performance was measured against each of the four standards. Between the standards with the lowest
(any 1 of 4 readers) and highest (all 4 readers) required level of reader agreement, the number of nodules ⩾ 3 mm
decreased by 48% (from 174 to 90) and CAD sensitivity for nodules ⩾ 3 mm increased from 0.70 ± 0.34 to 0.79 ± 0.35.
Between the same reference standards, the number of nodules < 3 mm decreased by 84% (from 483 to 75) and CAD
sensitivity for nodules < 3 mm increased from 0.30 ± 0.29 to 0.51 ± 0.45. This research illustrates the importance of
indicating the method used to form the reference standard, since the method influences both the number of nodules and
reported CAD performance.
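The four reference standards differ only in how many of the four LIDC readers must have marked a nodule. The sketch below assumes, hypothetically, that each reader's marks have already been matched into candidate nodules, so that every candidate carries the set of readers who marked it; the nodule ids, reader ids, and CAD detections are illustrative, not LIDC data.

```python
def reference_standard(marks, min_readers):
    """Nodules marked by at least `min_readers` readers.

    marks: dict mapping a candidate nodule id to the set of reader ids that
    marked it (assumes readers' marks have already been matched to candidates)."""
    return {nodule for nodule, readers in marks.items() if len(readers) >= min_readers}


def cad_sensitivity(reference, cad_detections):
    """Fraction of reference-standard nodules that CAD detected."""
    if not reference:
        return float("nan")
    return len(reference & cad_detections) / len(reference)


# Hypothetical marks from four readers (A-D) and hypothetical CAD detections
marks = {
    "n1": {"A", "B", "C", "D"},
    "n2": {"A"},
    "n3": {"A", "B"},
    "n4": {"B", "C", "D"},
    "n5": {"C"},
}
cad_detections = {"n1", "n4", "n5"}

for k in range(1, 5):
    reference = reference_standard(marks, k)
    print(k, len(reference), round(cad_sensitivity(reference, cad_detections), 2))
```

In this toy example, requiring more agreement shrinks the reference standard from 5 nodules to 1 and raises CAD sensitivity from 0.6 to 1.0, the same direction of change the study reports.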
Lung CAD systems require the ability to classify a variety of pulmonary structures as part of the diagnostic process.
The purpose of this work was to develop a methodology for fully automated voxel-by-voxel classification of
airways, fissures, nodules, and vessels from chest CT images using a single feature set and classification method.
Twenty-nine thin-section CT scans were obtained from the Lung Image Database Consortium (LIDC). Multiple
radiologists labeled voxels corresponding to the following structures: airways (trachea to 6th generation), major and
minor lobar fissures, nodules, vessels (hilum to periphery), and normal lung parenchyma. The labeled data were
used in conjunction with a supervised machine learning approach (AdaBoost) to train a set of ensemble classifiers.
Each ensemble classifier was trained to detect voxels belonging to a specific structure (airway, fissure, nodule,
vessel, or parenchyma). The feature set consisted of voxel attenuation and a small number of features based on the
eigenvalues of the Hessian matrix (used to differentiate structures by shape) computed at multiple smoothing scales
to improve the detection of both large and small structures. When each ensemble classifier was composed of 20
weak classifiers, the AUC values for the airway, fissure, nodule, vessel, and parenchyma classifiers were 0.984 ±
0.011, 0.949 ± 0.009, 0.945 ± 0.018, 0.953 ± 0.016, and 0.931 ± 0.015, respectively.
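The abstract names the ingredients of the feature set (attenuation plus Hessian eigenvalues at several smoothing scales) but not the exact features or scales. The sketch below assumes Gaussian smoothing with sigmas of 1 and 2 voxels, finite-difference second derivatives, and the three raw scale-normalized eigenvalues as features; the AdaBoost training step is not shown, and all names are hypothetical. For a bright tubular structure such as a vessel, two eigenvalues are strongly negative and one is near zero, which is the shape cue that separates it from sheet-like fissures or blob-like nodules.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth(volume, sigma):
    """Separable Gaussian smoothing with zero padding at the borders.
    Each volume dimension must be at least as long as the kernel."""
    out = volume.astype(float)
    k = gaussian_kernel(sigma)
    for axis in range(out.ndim):
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out


def hessian_eigenvalues(volume, sigma):
    """Scale-normalized Hessian eigenvalues per voxel, sorted in ascending order."""
    s = smooth(volume, sigma)
    first = np.gradient(s)                  # derivatives along (z, y, x)
    H = np.empty(volume.shape + (3, 3))
    for i in range(3):
        second = np.gradient(first[i])      # derivatives of d/di along each axis
        for j in range(3):
            H[..., i, j] = second[j]
    H = 0.5 * (H + np.swapaxes(H, -1, -2))  # symmetrize finite-difference mixed terms
    return sigma ** 2 * np.linalg.eigvalsh(H)


def voxel_features(volume, sigmas=(1.0, 2.0)):
    """Attenuation plus three Hessian eigenvalues per smoothing scale."""
    feats = [volume.astype(float)[..., np.newaxis]]
    feats += [hessian_eigenvalues(volume, s) for s in sigmas]
    return np.concatenate(feats, axis=-1)


# Synthetic bright "vessel" running along x through the center of a 24^3 volume
vol = np.zeros((24, 24, 24))
vol[12, 12, :] = 1.0
features = voxel_features(vol)
print(features.shape)             # (24, 24, 24, 7)
print(features[12, 12, 12, 1:4])  # two clearly negative eigenvalues, one near zero
```

Each voxel's feature vector would then be passed to the per-structure ensemble classifiers described above.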
The aim of our investigation was to assess the influence of both CT acquisition dose and reconstruction kernel on computer-aided detection (CAD) of pulmonary nodules. We hypothesized that the detection of small nodules is affected by the noise characteristics of the image and by the signal-to-noise ratio of the nodule and the bronchovascular anatomy. Knowledge gained from this experiment will assist in developing an advanced CAD system designed to detect smaller and more subtle nodules with minimal false positives. Eleven research subjects were selected from the Lung Image Database Consortium (LIDC) database based on two inclusion criteria: 1) at least one nodule and 2) available raw CT projection data for the series that our institution submitted to the LIDC study. Starting from the original raw projection data, research software simulated projection data for an acquisition with a dose 32-40% lower than that of the original scan. Projection data for both dose levels were reconstructed with smooth to very sharp kernels (B10f, B30f, B50f, and B70f). The resulting series were used to investigate the influence of dose and reconstruction kernel on CAD performance. A prototype CAD system was used to measure changes in sensitivity and false positives with varying imaging parameters. In a sub-study, the prototype system was compared with a commercial CAD system. There were too few subjects to establish statistical significance, but the results indicate that our research system had a higher sensitivity with the smooth or medium reconstruction kernels than with the sharper kernels. The sensitivity was similar for both dose levels. The false positive rate was higher with the smooth kernels and at the lower dose level.
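The abstract does not describe how the research software models the reduced dose. As a rough guide to what a 32-40% reduction means for image quality, the sketch below assumes quantum-limited (Poisson) noise, for which the noise standard deviation scales approximately as one over the square root of dose. It ignores electronic noise and the large effect of the reconstruction kernel on noise, and the function name is hypothetical.

```python
import math


def relative_noise(dose_fraction):
    """Approximate factor by which image noise grows when only `dose_fraction`
    of the original dose is used, assuming quantum (Poisson) noise dominates so
    that noise standard deviation scales as 1 / sqrt(dose)."""
    if not 0 < dose_fraction <= 1:
        raise ValueError("dose_fraction must be in (0, 1]")
    return 1.0 / math.sqrt(dose_fraction)


for reduction in (0.32, 0.40):
    factor = relative_noise(1 - reduction)
    print(f"{reduction:.0%} dose reduction -> noise x{factor:.2f}")
# 32% dose reduction -> noise x1.21
# 40% dose reduction -> noise x1.29
```

Under this approximation, the simulated low-dose series would carry roughly 21-29% more noise than the originals before any kernel-dependent amplification.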