6 March 2013 Density-induced oversampling for highly imbalanced datasets
This paper investigates two-class classification on highly imbalanced datasets in which only sparse data is available for the minority class. A novel synthetic data oversampling technique is proposed that uses estimates of the probability density distribution in the feature space. First, a Gaussian mixture model (GMM) is generated from the data of the well-sampled majority class; from it, a new GMM is then approximated by Bayesian adaptation using the sparse minority class data. Random synthetic data is generated from the adapted GMM, and an additional assignment rule either assigns each sample to the minority class or discards it. The resulting synthetic data is combined with the available original data to train a support vector machine classifier. The application examined in this paper is optical on-line process monitoring of laser brazing, where defects occur only rarely and sporadically. Experiments with different amounts of minority class data samples and comparisons to other methods show that this approach performs very well for highly imbalanced datasets.
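The pipeline described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. Several parts are assumptions, because the abstract does not give the details: the adaptation shown is the common mean-only MAP update with a relevance factor, the assignment rule (keep a synthetic sample only if the adapted GMM assigns it a higher likelihood than the majority GMM) is hypothetical, and the function names `adapt_means` and `oversample_minority`, the parameter values, and the toy data are invented for illustration. It is not the authors' implementation.

```python
import copy

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def adapt_means(majority_gmm, X_minority, relevance=4.0):
    """Mean-only MAP adaptation of a fitted GMM towards the minority data.

    Assumption: the standard relevance-factor update; the paper's exact
    Bayesian adaptation may differ.
    """
    resp = majority_gmm.predict_proba(X_minority)   # (n_samples, n_components)
    n_k = resp.sum(axis=0)                          # soft counts per component
    # Responsibility-weighted mean per component (guard against empty ones).
    e_k = (resp.T @ X_minority) / np.maximum(n_k, 1e-12)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]      # adaptation weight in [0, 1)
    adapted = copy.deepcopy(majority_gmm)
    adapted.means_ = alpha * e_k + (1.0 - alpha) * majority_gmm.means_
    return adapted


def oversample_minority(X_majority, X_minority, n_synthetic=200,
                        n_components=3, seed=0):
    """Return synthetic minority samples drawn from the adapted GMM."""
    majority_gmm = GaussianMixture(
        n_components=n_components, covariance_type="diag", random_state=seed
    ).fit(X_majority)
    adapted_gmm = adapt_means(majority_gmm, X_minority)
    candidates, _ = adapted_gmm.sample(n_synthetic)
    # Hypothetical assignment rule: keep a candidate only if it is more
    # likely under the adapted GMM than under the majority GMM.
    keep = (adapted_gmm.score_samples(candidates)
            > majority_gmm.score_samples(candidates))
    return candidates[keep]


# Toy data: 500 majority samples, 5 minority samples (illustrative only).
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(500, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))

X_syn = oversample_minority(X_maj, X_min)

# Train the SVM on original plus synthetic data (label 1 = minority class).
X_train = np.vstack([X_maj, X_min, X_syn])
y_train = np.concatenate([np.zeros(len(X_maj)),
                          np.ones(len(X_min) + len(X_syn))])
clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
```

Adapting only the means keeps the covariances and weights estimated from the well-sampled majority class, which is one plausible way to cope with having only a handful of minority samples; the relevance factor controls how far each component moves towards the minority data.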
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daniel Fecker, Volker Märgner, Tim Fingscheidt, "Density-induced oversampling for highly imbalanced datasets", Proc. SPIE 8661, Image Processing: Machine Vision Applications VI, 86610P (6 March 2013); doi: 10.1117/12.2003973; https://doi.org/10.1117/12.2003973
