Computer-aided diagnosis (CADx) involves training supervised classifiers using labeled ("truth-known") data.
Often, training data consists of high-dimensional feature vectors extracted from medical images. Unfortunately, very
large data sets may be required to train robust classifiers for high-dimensional inputs. To mitigate the risk of classifier
over-fitting, CADx schemes may employ feature selection or dimension reduction (DR), for example, principal
component analysis (PCA). Recently, a number of novel "structure-preserving" DR methods have been proposed1.
Such methods are attractive for use in CADx schemes for two main reasons: first, they provide visualization of high-dimensional
data structure; second, because DR can be unsupervised or semi-supervised, unlabeled ("truth-unknown")
data may be incorporated2. However, the practical application of state-of-the-art DR techniques such as t-SNE3 to
breast CADx has been inhibited by the inability to retain a parametric embedding function capable of mapping new input
data to the reduced representation. Deep (more than one hidden layer) neural networks can be used to learn such
parametric DR embeddings. We explored the feasibility of such methods for use in CADx by conducting a variety of
experiments using simulated feature data, including models based on breast CADx features. Specifically, we
investigated the unsupervised parametric t-SNE4 (pt-SNE), the supervised deep t-distributed MCML5 (dt-MCML), and
hybrid semi-supervised modifications combining the two.
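
As a rough illustration of what a parametric embedding offers, the following minimal Python sketch evaluates a t-SNE-style objective (the Kullback-Leibler divergence between high-dimensional Gaussian similarities P and low-dimensional Student-t similarities Q) on the output of a small feed-forward network, and then reuses the same network to map unseen inputs. It is not the implementation used in this work: the function names, layer sizes, the single fixed bandwidth `sigma` (standard t-SNE instead tunes a per-point bandwidth to a target perplexity), and the random stand-in data are all assumptions made for illustration, and the training loop that would minimize the loss is omitted.

```python
import numpy as np

def pairwise_sq_dists(A):
    # Squared Euclidean distances between all rows of A.
    sq = np.sum(A * A, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off

def gaussian_p(X, sigma=1.0):
    # Symmetrized high-dimensional similarities P (sums to 1).
    # Assumption: one fixed bandwidth instead of a perplexity search.
    n = X.shape[0]
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    cond = W / W.sum(axis=1, keepdims=True)
    return (cond + cond.T) / (2.0 * n)

def student_t_q(Y):
    # Low-dimensional similarities Q from a Student-t kernel
    # with one degree of freedom (sums to 1).
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def init_params(rng, sizes):
    # Hypothetical helper: small random weights, zero biases.
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def embed(X, params):
    # Parametric map: tanh hidden layers followed by a linear output layer.
    H = X
    for W, b in params[:-1]:
        H = np.tanh(H @ W + b)
    W, b = params[-1]
    return H @ W + b

def kl_loss(P, Q, eps=1e-12):
    # KL(P || Q), the quantity a parametric t-SNE would minimize.
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 20))           # stand-in for extracted features
X_new = rng.normal(size=(5, 20))              # unseen cases
params = init_params(rng, [20, 16, 16, 2])    # two hidden layers, 2-D output

P = gaussian_p(X_train)
Q = student_t_q(embed(X_train, params))
loss = kl_loss(P, Q)           # objective at the (untrained) initial weights
Y_new = embed(X_new, params)   # new data mapped by the same function
```

The point of the sketch is the last line: because the embedding is a function of the network weights rather than a set of free coordinates, new feature vectors can be mapped to the reduced space without re-running the optimization.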