Multi-reader multi-case (MRMC) studies are often used to evaluate medical imaging devices. Because prior information is limited, the sizing of such studies (i.e., choosing the numbers of both readers and cases) is often inaccurate. It is therefore desirable to adaptively resize the study towards a target power after an interim analysis of the study data. The major statistical concern with sample size re-estimation based on an interim analysis is inflation of the type I error rate. We developed methods that, based upon the data observed at the interim analysis, simultaneously resize the study towards a target power and adaptively adjust the critical value for the final hypothesis test to control the type I error rate. Our methodologies apply to commonly used study endpoints, including the area under the ROC curve (AUC), sensitivity, and specificity. Simulation studies show that our methods can boost the statistical power to a target value by resizing the study after an interim analysis while controlling the type I error rate at the nominal level. We have developed a freely available R software package for the design and analysis of adaptive MRMC studies.
It is generally recognized that recent advancements in computer vision, especially the development of deep convolutional neural networks, have substantially improved the performance of computerized algorithms in medical imaging for classification tasks such as cancer detection and diagnosis. These advancements underscore the importance of the question of how a computer algorithm's stand-alone performance compares with the performance of physicians. The current literature often relies on descriptive statistics or a visual check of plots for this comparison and lacks quantitative, rigorous statistical inference. In this work, we developed a U-statistic-based approach to estimate the variance of the performance difference between an algorithm and a group of human observers in a binary classification task. The performance metric considered in this work is percent correct (PC), e.g., sensitivity or specificity. Our variance estimation treats both human observers and patient cases as random samples and accounts for both sources of variability, thereby allowing the conclusions to generalize to both the patient and the physician populations. Moreover, we investigated a z-statistic method based on our variance estimator for hypothesis testing. Our simulation results show that our variance estimator for the PC performance difference is unbiased. The normal approximation method using our variance estimator for hypothesis testing appears useful for large sample sizes.
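To make the idea concrete, the following is a minimal Python sketch of one generic way to build an unbiased, U-statistic-style variance estimate for the PC difference in a fully crossed design (every reader reads every case), followed by a z-statistic. It is an illustration under stated assumptions, not necessarily the estimator developed in this work: the function name `pc_difference_z`, the input layout, and the toy data are all hypothetical. The sketch relies on the fact that scores from two different cases and two different readers are independent, so the average of their cross products is unbiased for the squared mean difference.

```python
import math
import numpy as np


def pc_difference_z(alg_correct, reader_correct):
    """Sketch: PC difference (algorithm minus reader average) with an
    unbiased U-statistic-style variance estimate and a z-statistic.

    alg_correct:    length-N array of 0/1, algorithm correct on case i
    reader_correct: N x R array of 0/1, reader r correct on case i
    Assumes a fully crossed design with N >= 2 cases and R >= 2 readers.
    """
    a = np.asarray(alg_correct, dtype=float)
    s = np.asarray(reader_correct, dtype=float)
    n_cases, n_readers = s.shape
    if a.shape != (n_cases,) or n_cases < 2 or n_readers < 2:
        raise ValueError("need N >= 2 cases, R >= 2 readers, matching shapes")

    d = a[:, None] - s          # per-(case, reader) score difference
    diff = d.mean()             # estimated PC difference

    # Sum of d[i, r] * d[j, q] over i != j and r != q, by inclusion-exclusion.
    total_sq = d.sum() ** 2
    same_case = (d.sum(axis=1) ** 2).sum()
    same_reader = (d.sum(axis=0) ** 2).sum()
    same_both = (d ** 2).sum()
    cross = total_sq - same_case - same_reader + same_both
    n_pairs = n_cases * (n_cases - 1) * n_readers * (n_readers - 1)
    mean_sq_hat = cross / n_pairs   # unbiased for (true difference)^2

    # Var(diff) = E[diff^2] - (E[diff])^2, so this difference is unbiased.
    var_hat = diff ** 2 - mean_sq_hat

    # Unbiased variance estimates can be zero or negative in small samples;
    # the normal approximation is not usable in that situation.
    if var_hat > 0:
        z = diff / math.sqrt(var_hat)
        p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    else:
        z, p_two_sided = float("nan"), float("nan")
    return diff, var_hat, z, p_two_sided


# Toy usage with made-up data: 8 cases, 3 readers.
toy_alg = [1, 1, 0, 1, 1, 1, 0, 1]
toy_readers = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0],
               [0, 1, 1], [1, 1, 1], [0, 0, 0], [1, 0, 1]]
print(pc_difference_z(toy_alg, toy_readers))
```

Because the estimate is a difference of two terms, it is not guaranteed to be positive, which is one reason a normal-approximation test of this kind is better suited to large numbers of readers and cases.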