Background: The performance of screen readers in detecting breast cancer is being assessed in some countries by using mammographic test sets. However, previous studies have provided little evidence that performance assessed by test sets correlates strongly with performance in clinical reading.
Methods: Five clinicians from BreastScreen New South Wales (NSW) participated in this study. Each clinician was asked to read 200 de-identified mammographic examinations gathered from their own case history within the BreastScreen NSW Digital Imaging Library. Each test set was designed with specific proportions of true positive, true negative, false positive and false negative examinations, drawn from that reader's previous clinical reads. When available, a prior mammographic examination was also provided for comparison with each case.
Results: Preliminary analyses showed a moderate level of agreement (Kappa 0.42−0.56, p < 0.001) between laboratory test set reading and actual clinical reading. In addition, sensitivity in the laboratory test sets showed a mean increase of 38% compared with the readers' actual clinical reading. Specificity was similar between the laboratory test sets and actual clinical reading.
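For readers unfamiliar with these measures, the following is a minimal sketch of how agreement and accuracy of this kind could be computed for one reader. It assumes binary recall decisions (1 = recall, 0 = no recall) and assumes that the reported Kappa is Cohen's kappa, which the abstract does not state. The function names and the ten-case data set are hypothetical and are not taken from the study.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) decision sequences on the same cases."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of cases on which the two readings agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each reading's marginal recall rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both readings are constant and identical
    return (observed - expected) / (1 - expected)


def sensitivity(truth: Sequence[int], calls: Sequence[int]) -> float:
    """Proportion of cancer cases (truth == 1) that were recalled."""
    tp = sum(t == 1 and c == 1 for t, c in zip(truth, calls))
    fn = sum(t == 1 and c == 0 for t, c in zip(truth, calls))
    return tp / (tp + fn)


def specificity(truth: Sequence[int], calls: Sequence[int]) -> float:
    """Proportion of normal cases (truth == 0) that were not recalled."""
    tn = sum(t == 0 and c == 0 for t, c in zip(truth, calls))
    fp = sum(t == 0 and c == 1 for t, c in zip(truth, calls))
    return tn / (tn + fp)


# Hypothetical data for one reader: ten cases, not from the study.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]      # confirmed outcome per case
clinical = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]   # original clinical decision
test_set = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # decision when re-read in the test set

kappa = cohen_kappa(clinical, test_set)
sens_clinical = sensitivity(truth, clinical)
sens_test = sensitivity(truth, test_set)
spec_clinical = specificity(truth, clinical)
spec_test = specificity(truth, test_set)
```

In this made-up example the test set reading detects one additional cancer, so sensitivity rises while specificity is unchanged, which mirrors the direction of the reported findings but not their magnitude.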
Conclusion: This study demonstrated a moderate level of agreement between actual clinical reading and test set reading, which suggests that test sets have a role in reflecting clinical performance.