7 March 2018 Assessment of computerized algorithms by comparing with human observers in binary classification tasks: a simulation study
It is generally recognized that recent advancements in computer vision, especially the development of deep convolutional neural networks, have substantially improved the performance of computerized algorithms in medical imaging for classification tasks such as cancer detection and diagnosis. These advancements underscore the importance of the question of how a computer algorithm's stand-alone performance compares with the performance of physicians. The current literature often relies on descriptive statistics or a visual check of plots for this comparison, without quantitative, rigorous statistical inference. In this work, we developed a U-statistic-based approach to estimate the variance of the performance difference between an algorithm and a group of human observers in a binary classification task. The performance metric considered in this work is percent correct (PC), e.g., sensitivity or specificity. Our variance estimation treats both human observers and patient cases as random samples and accounts for both sources of variability, thereby allowing the conclusions to generalize to both the patient and the physician populations. Moreover, we investigated a z-statistic method based on our variance estimator for hypothesis testing. Our simulation results show that our variance estimator for the PC performance difference is unbiased. The normal approximation method using our variance estimator for hypothesis testing appears useful for large sample sizes.
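The abstract does not give the estimator's formula, so the following is only a minimal sketch of how a U-statistic-style variance estimate and z-test for the PC difference might look. It assumes a fully crossed design (every reader scores every case) and 0/1 correctness indicators, and it uses a standard moment decomposition for two-way random samples; it is not necessarily the authors' exact estimator. The function names `pc_difference_variance` and `z_test` are hypothetical.

```python
import math


def pc_difference_variance(alg_correct, reader_correct):
    """Return (difference, variance estimate) for PC_algorithm - PC_readers.

    alg_correct[i]       : 1 if the algorithm is correct on case i, else 0
    reader_correct[i][j] : 1 if reader j is correct on case i, else 0
    Assumption: fully crossed design, cases and readers both random samples.
    """
    n = len(reader_correct)       # number of cases
    m = len(reader_correct[0])    # number of readers
    if n < 2 or m < 2:
        raise ValueError("need at least 2 cases and 2 readers")

    # Per-(case, reader) difference; its grand mean is the PC difference.
    x = [[alg_correct[i] - reader_correct[i][j] for j in range(m)]
         for i in range(n)]

    total = sum(sum(row) for row in x)
    sum_sq = sum(v * v for row in x for v in row)
    row_sq = sum(sum(row) ** 2 for row in x)
    col_sq = sum(sum(x[i][j] for i in range(n)) ** 2 for j in range(m))

    # Unbiased estimates of the four second-order moments of x.
    m1 = sum_sq / (n * m)                       # same case, same reader
    m2 = (row_sq - sum_sq) / (n * m * (m - 1))  # same case, different readers
    m3 = (col_sq - sum_sq) / (n * m * (n - 1))  # different cases, same reader
    m4 = (total ** 2 - row_sq - col_sq + sum_sq) / (
        n * m * (n - 1) * (m - 1))              # different cases and readers

    diff = total / (n * m)
    # Var(diff) = E[diff^2] - (E[diff])^2, with each moment replaced by its
    # unbiased estimate. The result can be negative in small samples.
    var = (m1 + (m - 1) * m2 + (n - 1) * m3
           + (n - 1) * (m - 1) * m4) / (n * m) - m4
    return diff, var


def z_test(diff, var):
    """Two-sided normal-approximation test of H0: PC difference = 0."""
    if var <= 0:
        raise ValueError("variance estimate must be positive for a z-test")
    z = diff / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Toy data (made up for illustration): 4 cases, 3 readers.
    alg = [1, 1, 0, 1]
    readers = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 0]]
    d, v = pc_difference_variance(alg, readers)
    print(d, v, z_test(d, v))
```

Because the estimate is a difference of moment estimates, it is unbiased under these assumptions but not guaranteed to be positive, which is one reason a normal-approximation test of this kind is better suited to larger numbers of cases and readers.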
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yang Yang, Berkman Sahiner, Zhipeng Huang, Nicholas Petrick, Weijie Chen, "Assessment of computerized algorithms by comparing with human observers in binary classification tasks: a simulation study", Proc. SPIE 10577, Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, 1057713 (7 March 2018); doi: 10.1117/12.2293807; https://doi.org/10.1117/12.2293807
