A growing body of discoveries in molecular signatures has revealed that volatile organic compounds (VOCs), the small
molecules associated with an individual's odor and breath, can be monitored to reveal the identity and presence of a
unique individual, as well as their overall physiological status. Given the analysis requirements for differential VOC
profiling via gas chromatography/mass spectrometry, our group has developed a novel informatics platform, Metabolite
Differentiation and Discovery Lab (MeDDL). In its current version, MeDDL is a comprehensive tool for time-series
spectral registration and alignment, visualization, comparative analysis, and machine learning to facilitate the efficient
analysis of multiple, large-scale biomarker discovery studies. The MeDDL toolset can thereby identify a large
differential subset of registered peaks, whose corresponding intensities can be used as features for classification.
This initial screening of peaks yields result sets that are typically too large for incorporation into a portable,
electronic-nose-based system and that include VOCs not amenable to classification; consequently, it is also
important to identify an optimal subset of these peaks to increase classification accuracy and to decrease the cost of the
final system. MeDDL's learning tools include a classifier similar to a k-nearest-neighbor classifier, used in conjunction
with a genetic algorithm (GA) that simultaneously optimizes the classifier and the subset of features. The GA uses
receiver operating characteristic (ROC) curves to select classifiers that maximize the area under the curve.
Experimental results on over a dozen recognition problems show many examples of classifiers and feature sets that
produce perfect ROC curves.
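
The feature-selection idea described above can be illustrated with a minimal sketch. It is not MeDDL's implementation: the function names (auc, knn_scores, fitness, evolve), the leave-one-out scoring, the fixed k, the truncation selection with one-point crossover, the tie-break toward fewer features, and the toy data are all assumptions made for illustration. For simplicity the sketch evolves only the feature mask, whereas the GA described above also optimizes the classifier itself.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(X, y, mask, k=3):
    """Leave-one-out score per sample: fraction of positive labels among the
    k nearest other samples, using only the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    scores = []
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        scores.append(sum(lab for _, lab in neighbours[:k]) / k)
    return scores

def fitness(X, y, mask, k=3):
    """(AUC, -number of features): higher AUC first, then fewer features.
    The tie-break toward smaller subsets is an assumption of this sketch."""
    if not any(mask):
        return (0.0, 0)
    return (auc(knn_scores(X, y, mask, k), y), -sum(mask))

def evolve(X, y, n_gen=30, pop_size=20, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; requires at least two features."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(n_gen):
        # Truncation selection: the fitter half survives unchanged (elitism).
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per feature.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda m: fitness(X, y, m))
    return best, fitness(X, y, best)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 4.0], [1.0, 4.5], [1.1, 0.5], [1.2, 3.0]]
y = [0, 0, 0, 1, 1, 1]
best_mask, best_fit = evolve(X, y)
print(best_mask, best_fit)
```

Returning the fitness as a tuple lets ordinary tuple comparison rank candidates by AUC first and by subset size second, which mirrors the stated goals of raising classification accuracy while reducing the number of peaks the final system must measure.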