Exploring data sampling techniques for imbalanced classification problems
31 July 2019
Yu Sui, Xiaohui Zhang, Jiajia Huan, Haifeng Hong
Proceedings Volume 11198, Fourth International Workshop on Pattern Recognition; 1119813 (2019) https://doi.org/10.1117/12.2540457
Event: Fourth International Workshop on Pattern Recognition, 2019, Nanjing, China
Abstract
The class imbalance problem is one of the key challenges in machine learning and data mining. Imbalanced data can lead to sub-optimal performance of classification models. To address the problem, a variety of data sampling methods have been proposed in previous studies. However, there is no universal solution, and it is worth exploring which kinds of data sampling techniques are more effective at balancing the class distribution for a given type of data and classifier. In this work, we present an experimental study based on a number of real-world data sets obtained from different disciplines. The goal is to investigate how effectively different sampling techniques improve classification performance on imbalanced data sets. In particular, we study ten sampling methods of different types, including random sampling, cluster-based sampling, ensemble sampling and so on. In addition, the C4.5 decision tree algorithm is used to train the base classifiers, and performance is measured using precision, G-Measure and Cohen's Kappa statistic.
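As a rough illustration of the kind of pipeline the abstract describes, the sketch below shows random oversampling (one of the simplest sampling methods) together with the three reported metrics for a binary problem. It is not the authors' implementation. The function names `random_oversample` and `binary_metrics` are hypothetical, the C4.5 classifier is not included, and G-Measure is assumed here to be the geometric mean of precision and recall; some works instead define it over sensitivity and specificity, and the abstract does not say which is used.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, G-Measure and Cohen's kappa from a 2x2 confusion matrix.

    Assumption: G-Measure = sqrt(precision * recall).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n = tp + fp + fn + tn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    g_measure = math.sqrt(precision * recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"precision": precision, "g_measure": g_measure, "kappa": kappa}


if __name__ == "__main__":
    # Toy data with an 8:2 class ratio (made up for illustration only).
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))

    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

In an experiment of this shape, resampling would normally be applied to the training split only, with the metrics computed on an untouched test split, so that duplicated minority rows do not leak into the evaluation.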
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yu Sui, Xiaohui Zhang, Jiajia Huan, and Haifeng Hong "Exploring data sampling techniques for imbalanced classification problems", Proc. SPIE 11198, Fourth International Workshop on Pattern Recognition, 1119813 (31 July 2019); https://doi.org/10.1117/12.2540457
KEYWORDS
Machine learning
Data modeling
Data mining
Performance modeling
Software engineering
Data centers
Detection and tracking algorithms
