We investigated three classifiers for the task of distinguishing between benign and malignant breast lesions.
Classification performance was measured in terms of area under the ROC curve (AUC value). We compared linear
discriminant analysis (LDA), quadratic discriminant analysis (QDA) and a Bayesian neural net (BNN) with 5 hidden
units. For each lesion, 46 image features were extracted and principal component analysis (PCA) of these features was
used as classifier input. For each classifier, the optimal number of principal components was selected by performing
PCA within each step of a leave-one-case-out protocol on the training dataset (1125 lesions, 14% cancer prevalence)
and identifying the number of components that maximized the AUC. Subsequently, each classifier was trained on
the training dataset and applied 'cold turkey' to an independent test set from a different population (341 lesions, 30%
cancer prevalence). The optimal number of principal components for LDA was 24, accounting for 97% of the variance
in the image features. For QDA and BNN, these numbers were 5 (70%) and 15 (93%), respectively. The LDA, QDA and
BNN obtained AUC values of 0.88, 0.85, and 0.91, respectively, in the leave-one-case-out analysis. In the independent
test, with AUCs of 0.88, 0.76, and 0.82, respectively, only LDA achieved performance identical to that for the training
set (lower bound of the 95% non-inferiority interval: −0.0067), while the others performed significantly worse
(p-values << 0.05). While the more complex BNN classifier outperformed the others in leave-one-case-out evaluation
of a large dataset, LDA was the robust best performer in an independent test.
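The component-selection protocol described above, in which PCA is refit within every leave-one-case-out fold and scored by AUC, can be sketched as follows. This is a minimal illustration assuming scikit-learn and synthetic stand-in data; the study's 46 image features and actual classifiers are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-in for 46 image features per lesion; NOT the study's data.
y = rng.integers(0, 2, size=200)
X = np.outer(y - 0.5, np.linspace(1.0, 2.0, 46)) + rng.normal(size=(200, 46))

def loo_auc(X, y, n_components):
    """Leave-one-case-out AUC, refitting PCA on each training fold so the
    held-out case never influences the component directions."""
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        pca = PCA(n_components=n_components).fit(X[train])
        clf = LinearDiscriminantAnalysis().fit(pca.transform(X[train]), y[train])
        scores[i] = clf.decision_function(pca.transform(X[[i]]))[0]
    return roc_auc_score(y, scores)

# Choose the number of components that maximizes leave-one-case-out AUC
best_k = max(range(2, 11), key=lambda k: loo_auc(X, y, k))
```

The same search would be repeated per classifier (LDA, QDA, BNN), since each can favor a different number of components.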
The development of image databases for CAD research is not a trivial task. The collection and management of images
and their related metadata from multiple sources is a time-consuming but necessary process. By standardizing and
centralizing the methods by which these data are maintained, one can quickly and efficiently generate subsets of a
larger database that match the specific criteria of a particular research project. A research-oriented
management system of this type is highly desirable in a multi-modality CAD research environment. An online,
web-based database system for the storage and management of research-specific medical image metadata was designed for
use with four modalities of breast imaging: screen-film mammography, full-field digital mammography, breast
ultrasound and breast MRI. The system was designed to consolidate data from multiple clinical sources and provide the
user with the ability to anonymize the data. Input concerning the type of data to be stored as well as desired searchable
parameters was solicited from researchers in each modality. The backbone of the database was created using MySQL.
A robust and easy-to-use interface for entering, removing, modifying and searching information in the database was
created using HTML and PHP. This standardized system can be accessed using any modern web-browsing software and
is fundamental for our various research projects on computer-aided detection, diagnosis, cancer risk assessment,
multi-modality lesion assessment, and prognosis. Our CAD database system stores large amounts of research-related metadata
and successfully generates subsets of cases that match the user's desired search criteria.
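As an illustration of how such a system generates criteria-matched subsets, the sketch below uses Python's built-in sqlite3 module; the actual system was built on MySQL with an HTML/PHP front end, and the table and column names here are hypothetical.

```python
import sqlite3

# Illustrative schema only; the deployed system used MySQL, and these
# column names are assumptions, not the system's actual schema.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE lesion_metadata (
        case_id     TEXT,   -- anonymized case identifier
        modality    TEXT,   -- 'SFM', 'FFDM', 'US', or 'MRI'
        lesion_type TEXT,   -- e.g. 'mass', 'calcification'
        pathology   TEXT    -- 'benign' or 'malignant'
    )""")
con.executemany(
    "INSERT INTO lesion_metadata VALUES (?, ?, ?, ?)",
    [("C001", "US",   "mass", "benign"),
     ("C002", "US",   "mass", "malignant"),
     ("C003", "FFDM", "calcification", "malignant")])

# Generate the subset of cases matching research-specific search criteria
subset = con.execute(
    "SELECT case_id FROM lesion_metadata "
    "WHERE modality = ? AND lesion_type = ?", ("US", "mass")).fetchall()
```

Centralizing the metadata behind one queryable schema is what lets each modality's researchers pull their own project-specific subsets without maintaining separate copies of the data.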
To be clinically viable, computer-aided diagnosis (CAD) systems must be as automated and user-friendly as possible. CAD systems for breast ultrasound are still preliminary and are not adapted for use in a standard clinical environment. For example, computer detection and classification schemes need the pixel size of each image to operate correctly, and while the DICOM standard allows the pixel size to be encoded in the image file, some equipment manufacturers neglect to use this encoding. As a result, the pixel size must instead be calculated from user input. To increase clinical efficiency and reduce the likelihood of error due to incorrect image specifications, it is highly desirable to automate this input process. We developed and applied a character recognition algorithm to the annotation region of each ultrasound image in our database. A set of numerical masks corresponding to the characters used in the annotation information enabled the filtering of each image. For each character, the mask yielding the maximum output from the comparison operation between image data and mask determined the recognized character, and together these yielded the annotation information. Each image was then automatically cropped to remove the annotation banner and leave only the image data. The cropped image matrix dimensions and the character recognition output were used to determine the corresponding pixel size. The algorithm was tested on 1110 images with various pixel sizes. In every case, the value output by the algorithm corresponded exactly to the true value. Our recognition algorithm now allows for the clinical translation of our fully automated breast ultrasound CAD system.
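The mask-matching step can be sketched as follows, assuming NumPy. The 3×3 binary templates, the comparison score, and the depth-to-pixel-size mapping below are illustrative assumptions: real masks must match the scanner's annotation font, and the real mapping depends on the scanner's banner layout.

```python
import numpy as np

# Hypothetical 3x3 binary templates; actual masks must match the
# scanner's annotation font and glyph size.
MASKS = {
    "0": np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]]),
    "1": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "4": np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1]]),
    "7": np.array([[1, 1, 1], [0, 0, 1], [0, 0, 1]]),
}

def recognize(glyph):
    """Return the character whose mask maximizes the comparison score
    (here, pixel-wise agreement between glyph and mask)."""
    return max(MASKS, key=lambda c: int(np.sum(glyph == MASKS[c])))

# Recognized depth annotation plus the cropped matrix height determine
# the pixel size; this particular mapping is an assumption for illustration.
digits = [recognize(MASKS["4"]), recognize(MASKS["0"])]
pixel_size_mm = float("".join(digits)) / 400  # 40 mm depth over 400 rows
```

In practice each character cell of the annotation banner would be segmented first and then matched against every mask, with the best-scoring mask taken as the recognized character.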