Stepwise linear discriminant analysis in computer-aided diagnosis: the effect of finite sample size
21 May 1999
In computer-aided diagnosis, a frequently used approach is to first extract several potentially useful features from a data set. Effective features are then selected from this feature space, and a classifier is designed using the selected features. In this study, we investigated the effect of finite sample size on classifier accuracy when classifier design involves feature selection. The feature selection and classifier coefficient estimation stages of classifier design were implemented using stepwise feature selection and Fisher's linear discriminant analysis, respectively. The two classes used in our simulation study were assumed to have multidimensional Gaussian distributions, with a large number of features available for feature selection. We investigated the effect of different covariance matrices and means for the two classes on feature selection performance, and compared two strategies for sample space partitioning for classifier design and testing. Our results indicated that the resubstitution estimate was always optimistically biased, except in cases where too few features were selected by the stepwise procedure. When feature selection was performed using only the design samples, the hold-out estimate was always pessimistically biased. When feature selection was performed using the entire finite sample space, and the data were subsequently partitioned into design and test groups, the hold-out estimates could be pessimistically or optimistically biased, depending on the number of features available for selection, the number of available samples, and the statistical distribution of the samples. All hold-out estimates exhibited a pessimistic bias when the parameters of the simulation were obtained from texture features extracted from mammograms in a previous study.
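To make the three accuracy estimates concrete, the following is a minimal simulation sketch, not the authors' procedure. It assumes two Gaussian classes with identity covariance, 30 features of which only 3 carry a mean shift of 0.5, and 40 samples per class split evenly into design and test halves; all of these numbers are illustrative assumptions. The stepwise procedure is simplified to greedy forward selection of a fixed number k of features ranked by the Mahalanobis distance D^2 (for two classes at fixed sample sizes this ranks subsets the same way as Wilks' lambda), with no F-to-enter or F-to-remove thresholds. Accuracy is measured as the empirical ROC area (Mann-Whitney statistic). Every function name (`forward_select`, `fisher`, `trial`, `simulate`, and so on) is hypothetical.

```python
import math
import random

def class_mean(rows, feats):
    """Mean of each selected feature over the rows of one class."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]

def pooled_cov(c0, c1, feats):
    """Pooled within-class covariance matrix on the selected features."""
    k = len(feats)
    S = [[0.0] * k for _ in range(k)]
    for cls in (c0, c1):
        m = class_mean(cls, feats)
        for r in cls:
            d = [r[f] - mi for f, mi in zip(feats, m)]
            for a in range(k):
                for b in range(k):
                    S[a][b] += d[a] * d[b]
    dof = len(c0) + len(c1) - 2
    return [[v / dof for v in row] for row in S]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fisher(c0, c1, feats):
    """Fisher LDA weights w = S^-1 (m1 - m0) and the Mahalanobis distance D^2."""
    d = [a - b for a, b in zip(class_mean(c1, feats), class_mean(c0, feats))]
    w = solve(pooled_cov(c0, c1, feats), d)
    return w, sum(di * wi for di, wi in zip(d, w))

def forward_select(c0, c1, k):
    """Simplified stepwise: greedily add the feature that most increases D^2."""
    chosen = []
    for _ in range(k):
        rest = [f for f in range(len(c0[0])) if f not in chosen]
        chosen.append(max(rest, key=lambda f: fisher(c0, c1, chosen + [f])[1]))
    return chosen

def auc(w, feats, c0, c1):
    """Empirical ROC area (Mann-Whitney statistic) of the linear score."""
    score = lambda r: sum(wi * r[f] for wi, f in zip(w, feats))
    s0, s1 = [score(r) for r in c0], [score(r) for r in c1]
    wins = sum((a > b) + 0.5 * (a == b) for a in s1 for b in s0)
    return wins / (len(s0) * len(s1))

def draw(n, p, shift, rng):
    """n samples from N(mu, I_p); mu is `shift` padded with zeros."""
    mu = list(shift) + [0.0] * (p - len(shift))
    return [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]

def trial(rng, n=40, p=30, k=5, shift=(0.5, 0.5, 0.5)):
    c0, c1 = draw(n, p, [], rng), draw(n, p, shift, rng)
    h = n // 2
    d0, d1, t0, t1 = c0[:h], c1[:h], c0[h:], c1[h:]  # design / test halves
    f = forward_select(d0, d1, k)       # selection on design samples only
    w, _ = fisher(d0, d1, f)
    resub, hold_design = auc(w, f, d0, d1), auc(w, f, t0, t1)
    f_all = forward_select(c0, c1, k)   # selection on the entire sample
    w_all, _ = fisher(d0, d1, f_all)    # coefficients still from the design half
    hold_all = auc(w_all, f_all, t0, t1)
    return resub, hold_design, hold_all

def simulate(trials=10, seed=1, shift=(0.5, 0.5, 0.5)):
    """Average the three estimates over trials; compare with the true ROC area."""
    rng = random.Random(seed)
    runs = [trial(rng, shift=shift) for _ in range(trials)]
    means = [sum(col) / trials for col in zip(*runs)]
    delta = math.sqrt(sum(s * s for s in shift))  # true Mahalanobis distance
    true_auc = 0.5 * (1 + math.erf(delta / 2))     # Phi(delta / sqrt(2))
    return dict(zip(("resub", "hold_design", "hold_all"), means), true=true_auc)

if __name__ == "__main__":
    print(simulate())
```

Running `simulate()` prints the averaged resubstitution, design-only hold-out, and entire-sample-selection hold-out estimates next to the ROC area of the optimal linear classifier. Varying `n`, `p`, and `shift` in `trial` is one way to explore how the direction of the bias in the entire-sample-selection hold-out estimate depends on those quantities; the numbers this toy setup produces should not be read as the paper's results.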
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Berkman Sahiner, Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, Lubomir M. Hadjiiski, "Stepwise linear discriminant analysis in computer-aided diagnosis: the effect of finite sample size", Proc. SPIE 3661, Medical Imaging 1999: Image Processing, (21 May 1999); doi: 10.1117/12.348606; https://doi.org/10.1117/12.348606
