We investigated three classifiers for the task of distinguishing between benign and malignant breast lesions.
Classification performance was measured in terms of area under the ROC curve (AUC value). We compared linear
discriminant analysis (LDA), quadratic discriminant analysis (QDA) and a Bayesian neural net (BNN) with 5 hidden
units. For each lesion, 46 image features were extracted, and the leading components from a principal component analysis (PCA) of these features were
used as classifier input. For each classifier, the optimal number of principal components was determined by performing
PCA within each step of a leave-one-case-out protocol for the training dataset (1125 lesions, 14% cancer prevalence)
and determining which number of components maximized the AUC value. Subsequently, each classifier was trained on
the training dataset and applied 'cold turkey' to an independent test set from a different population (341 lesions, 30%
cancer prevalence). The optimal number of principal components for LDA was 24, accounting for 97% of the variance
in the image features. For QDA and BNN, these numbers were 5 (70%) and 15 (93%), respectively. The LDA, QDA and
BNN obtained AUC values of 0.88, 0.85, and 0.91, respectively, in the leave-one-case-out analysis. In the independent
test, with AUCs of 0.88, 0.76, and 0.82 for LDA, QDA, and BNN, respectively, only LDA matched its training-set performance (lower
bound of the 95% non-inferiority interval: -0.0067), while the others performed significantly worse (p-values << 0.05). While
the more complex BNN classifier outperformed the others in the leave-one-case-out analysis of a large dataset, LDA was the most robust
classifier and the best performer in the independent test.
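
The component-selection procedure described above can be illustrated with a minimal sketch. It is not the code used in the study. It assumes NumPy, covers only the LDA case, uses a small synthetic dataset in place of the lesion features, and does not standardize features before PCA; all function and variable names (`auc`, `lda_score`, `loo_auc`, `aucs`, `best_k`) are hypothetical. The point it shows is that PCA is refitted on the training cases of every leave-one-out step, so the held-out case never influences the projection.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

def lda_score(X_train, y_train, x_test):
    """Two-class LDA discriminant score using a pooled covariance estimate."""
    X0, X1 = X_train[y_train == 0], X_train[y_train == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X_train) - 2)
    w = np.linalg.solve(np.atleast_2d(pooled), m1 - m0)
    # Centre on the midpoint of the class means so scores are comparable across folds.
    return float((x_test - 0.5 * (m0 + m1)) @ w)

def loo_auc(X, y, n_components):
    """Leave-one-out AUC with PCA fitted inside each step on the training cases only."""
    scores = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        X_tr, y_tr = X[train], y[train]
        mean = X_tr.mean(axis=0)
        # Rows of Vt are the principal directions of the centred training data.
        _, _, Vt = np.linalg.svd(X_tr - mean, full_matrices=False)
        P = Vt[:n_components].T
        scores[i] = lda_score((X_tr - mean) @ P, y_tr, (X[i] - mean) @ P)
    return auc(scores, y)

# Synthetic stand-in data (an assumption): 60 cases, 8 features, 30% positives,
# with the class signal placed in the first feature.
rng = np.random.default_rng(0)
y = np.array([0] * 42 + [1] * 18)
X = rng.normal(size=(60, 8))
X[:, 0] += 3.0 * y

# Pick the number of components that maximizes the leave-one-out AUC.
aucs = {k: loo_auc(X, y, k) for k in range(1, 6)}
best_k = max(aucs, key=aucs.get)
print(best_k, round(aucs[best_k], 3))
```

In this sketch each left-out unit is a single row. A leave-one-case-out protocol would hold out all lesions belonging to the same case together, which requires a case identifier that the synthetic data does not have.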