Identification of malignancy and false recalls (cases in which women are recalled from screening for additional workup but the findings later prove benign) in screening mammography has significant clinical value for accurate diagnosis of breast cancer. Deep learning methods have recently shown success in medical image classification. However, many different training strategies exist, and the choice among them can significantly affect overall model performance on a specific classification task. In this study, we aimed to investigate the impact of training strategy on classification of digital mammograms by performing a robustness analysis of deep learning models trained to distinguish malignant and false-recall cases from normal (benign) findings. Specifically, we employed several pre-training strategies, including transfer learning with medical and non-medical datasets, layer freezing, and varied network structure, on both binary and three-class classification tasks of digital mammography images. We found that, overall, deep learning models appear to be robust to the modifications of network structure and pre-training strategy that we tested for mammogram-specific classification tasks. However, for specific classification tasks, some training strategies offer performance gains. The most notable performance gains in our experiments involved residual network models.
Digital mammography screening is an important exam for the early detection of breast cancer and reduction in mortality. False positives leading to high recall rates, however, result in unnecessary negative consequences for patients and health care systems. To better aid radiologists, computer-aided tools can be used to better distinguish between image classes and thus potentially reduce false recalls. Deep learning has shown promising results in biomedical imaging data analysis. This study aimed to investigate deep learning and transfer learning methods that can improve digital mammography classification performance. In particular, we evaluated the effect of pre-training deep learning models with other imaging datasets in order to boost classification performance on a digital mammography dataset. Two types of datasets were used for pre-training: (1) a digitized film mammography dataset, and (2) a very large non-medical imaging dataset. By using either of these datasets to pre-train the network initially, and then fine-tuning with the digital mammography dataset, we found an increase in overall classification performance in comparison to a model without pre-training, with pre-training on the very large non-medical dataset yielding the largest improvement in classification accuracy.
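The pre-train-then-fine-tune procedure described above, combined with layer freezing, can be illustrated with a deliberately tiny, framework-free sketch. This is not the networks, data, or training code used in the studies: the two-weight scalar "model", the synthetic source and target data, the learning rate, and all names (`make_model`, `sgd_step`, `train`, `predict`) are hypothetical and chosen only to show the mechanics.

```python
def make_model():
    """Toy two-layer linear model: a 'feature' layer followed by a 'head' layer."""
    return {
        "feature": {"w": 0.5, "trainable": True},
        "head": {"w": 0.5, "trainable": True},
    }


def predict(model, x):
    return model["head"]["w"] * model["feature"]["w"] * x


def sgd_step(model, x, y, lr=0.01):
    """One squared-error SGD step; layers marked as frozen are skipped."""
    w1, w2 = model["feature"]["w"], model["head"]["w"]
    err = w2 * w1 * x - y
    grads = {"feature": 2 * err * w2 * x, "head": 2 * err * w1 * x}
    for name, layer in model.items():
        if layer["trainable"]:
            layer["w"] -= lr * grads[name]


def train(model, data, epochs=200, lr=0.01):
    for _ in range(epochs):
        for x, y in data:
            sgd_step(model, x, y, lr)


# Synthetic stand-ins (assumptions): a "source" task for pre-training and a
# related "target" task for fine-tuning.
source_data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]
target_data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5)]

# Stage 1: pre-train every layer on the source task.
model = make_model()
train(model, source_data)

# Stage 2: freeze the feature layer, then fine-tune only the head on the target task.
model["feature"]["trainable"] = False
frozen_feature_weight = model["feature"]["w"]
train(model, target_data)

print("feature layer unchanged:", model["feature"]["w"] == frozen_feature_weight)
print("target-task error at x=1:", abs(predict(model, 1.0) - 3.0))
```

Leaving `trainable` set to `True` for both layers in stage 2 corresponds to fine-tuning the whole model, while freezing more or fewer layers is the kind of variation a layer-freezing comparison explores. In a real deep learning framework the same idea is usually expressed by disabling gradient updates for the selected layers.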