The need to assure the image quality of digital systems for mammography screening applications is now widely recognized. One approach is embodied in Part B of the European Protocol for the Quality Control of the Physical and Technical Aspects of Mammography Screening (EPQCM), which prescribes criteria for several interconnected image quality metrics. This study focuses on the "threshold contrast visibility" (TCV) protocol (section 2.4.1 of the EPQCM), in which human observers score images of a CDMAM or similar four-alternative forced-choice (4-AFC) phantom. This section of the EPQCM currently omits many critical experimental details, which must be gleaned from ancillary documents. Taking those documents as given, the purpose of this study is to quantify the effects on the measured results of several remaining experimental variables, including phantom design and the methods used for scoring and analysis.
Preliminary studies of two CDMAM version 3.4 (CDMAM 3.4) phantoms have revealed a 17% difference in TCV when averaged over all target diameters from 0.1 to 2.0 mm. This indicates that phantom-to-phantom variability may affect results at some sites. More importantly, we have shown that the current CDMAM phantom design, together with the current methods for scoring and analysis, substantially limits the ability to measure system performance accurately and precisely. An improved phantom design has been shown to avoid these limitations.
Viewing environment and presentation context affect the performance and efficiency of visual scoring of phantom images. An automated display tool has been developed that isolates individual 4-AFC targets of CDMAM phantom images, automatically optimizes window/level, and automatically records observers' scores. Although it does not substantially change TCV, the tool has increased scoring efficiency and mitigated several of the limitations associated with unassisted visual scoring. For example, learning bias and navigational issues are completely avoided. Ultimately, software-based ideal observer scoring will likely prove to be a better approach.
Analysis based on statistical decision theory (SDT) has been shown to mitigate limitations associated with the current CDMAM phantom and the ad hoc nearest-neighbor correcting (NNC) scoring method. NNC analysis is sensitive to the degree of incomplete scoring, that is, to the stopping criteria observers apply. SDT analysis substantially mitigates this problem by using all of the available data to derive thresholds that are more interpretable. Bootstrap sampling was used to provide an estimate of the standard error for SDT analysis.
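The bootstrap estimate of standard error can be illustrated with a minimal sketch. It assumes, purely for illustration, a logistic psychometric function on log contrast with a 25% guessing floor for the 4-AFC task, a fixed slope, a grid-search maximum-likelihood fit, and a threshold defined at 62.5% correct. None of these modeling choices, function names, or data values are taken from the study; they are hypothetical stand-ins for whatever SDT model is actually used.

```python
import math
import random
import statistics

CHANCE = 0.25  # probability of a correct guess in a 4-AFC task


def p_correct(contrast, threshold, slope):
    """Assumed logistic psychometric function on log contrast.

    Returns 0.625 (midway between chance and 1) when contrast == threshold.
    """
    x = slope * (math.log(contrast) - math.log(threshold))
    return CHANCE + (1.0 - CHANCE) / (1.0 + math.exp(-x))


def log_likelihood(data, threshold, slope):
    """Binomial log-likelihood of (contrast, n_correct, n_trials) rows."""
    total = 0.0
    for contrast, n_correct, n_trials in data:
        p = p_correct(contrast, threshold, slope)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        total += n_correct * math.log(p)
        total += (n_trials - n_correct) * math.log(1.0 - p)
    return total


def fit_threshold(data, slope=2.0, grid_size=200):
    """Grid-search maximum-likelihood threshold with a fixed (assumed) slope."""
    lo = math.log(min(row[0] for row in data)) - 1.0
    hi = math.log(max(row[0] for row in data)) + 1.0
    candidates = [
        math.exp(lo + (hi - lo) * i / (grid_size - 1)) for i in range(grid_size)
    ]
    return max(candidates, key=lambda t: log_likelihood(data, t, slope))


def bootstrap_se(data, n_boot=200, seed=0, slope=2.0):
    """Standard error of the threshold from resampling trials with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = []
        for contrast, n_correct, n_trials in data:
            outcomes = [1] * n_correct + [0] * (n_trials - n_correct)
            k = sum(rng.choice(outcomes) for _ in range(n_trials))
            resampled.append((contrast, k, n_trials))
        estimates.append(fit_threshold(resampled, slope))
    return statistics.stdev(estimates)


# Hypothetical scores for one target diameter:
# (relative contrast, number correct, number of trials)
example = [(0.5, 5, 16), (1.0, 8, 16), (2.0, 13, 16), (4.0, 16, 16)]
print("threshold:", fit_threshold(example))
print("bootstrap SE:", bootstrap_se(example))
```

Because every scored target contributes to the likelihood, a fit of this kind uses all of the available data rather than only the responses near an observer's stopping point, and the spread of the refitted thresholds across resamples gives the standard error directly.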
In conclusion, the current EPQCM section 2.4.1 protocol fails to measure TCV accurately and precisely enough to qualify digital mammography systems. This paper presents a series of recommendations that supplement section 2.4.1 of the EPQCM and that provide a stable and accurate measure of TCV.