6 April 2000 Distance functions in dynamic integration of data mining techniques
Abstract
One of the most important directions in the improvement of data mining and knowledge discovery is the integration of multiple data mining techniques. An integration method needs to be able either to evaluate and select the most appropriate data mining technique or to combine two or more techniques efficiently. A recent method for the dynamic integration of multiple data mining techniques is based on the assumption that each technique is the best one inside a certain subarea of the whole domain area. This method uses an instance-based learning approach to collect information about the competence areas of the mining techniques and applies a distance function to determine how close a new instance is to each instance of the training set. The nearest instance or instances are then used to predict the performance of the data mining techniques. Because the quality of the integration depends heavily on the suitability of the distance function used, our goal is to analyze the characteristics of different distance functions. In this paper we investigate several distance functions, including the very commonly used Euclidean distance function, the Heterogeneous Euclidean-Overlap Metric (HEOM), and the Heterogeneous Value Difference Metric (HVDM), among others. We analyze how the choice of distance function affects the accuracy achieved by dynamic integration when the parameters describing the datasets vary. We also include the results of our experiments with different datasets that contain both nominal and continuous attributes.
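To make the role of the distance function concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard HEOM definition from the literature (overlap distance for nominal attributes, range-normalized difference for continuous ones, and a distance of 1 when a value is missing) and a simple selection rule that picks the technique with the lowest error among the k nearest training instances. The names `heom`, `select_technique`, `nominal`, `ranges`, and the layout of `errors` are illustrative assumptions that do not appear in the abstract.

```python
import math


def heom(x, y, nominal, ranges):
    """HEOM distance between two instances (assumed standard definition).

    x, y    : sequences of attribute values; None marks a missing value
    nominal : set of indices of the nominal attributes
    ranges  : dict mapping each continuous attribute index to (max - min)
              observed in the training set
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0                            # missing value: maximal distance
        elif a in nominal:
            d = 0.0 if xa == ya else 1.0       # overlap metric
        else:
            r = ranges[a]
            d = abs(xa - ya) / r if r > 0 else 0.0   # range-normalized difference
        total += d * d
    return math.sqrt(total)


def select_technique(new_x, train_X, errors, k, nominal, ranges):
    """Return the index of the technique with the lowest local error.

    errors[i][j] is assumed to be 1 if technique j misclassified training
    instance i and 0 otherwise (hypothetical layout for this sketch).
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: heom(new_x, train_X[i], nominal, ranges))
    nearest = order[:k]
    n_tech = len(errors[0])
    local_err = [sum(errors[i][j] for i in nearest) / len(nearest)
                 for j in range(n_tech)]
    return min(range(n_tech), key=local_err.__getitem__)


# Toy data: attribute 0 is nominal, attribute 1 is continuous.
train_X = [("red", 1.0), ("red", 2.0), ("blue", 9.0), ("blue", 10.0)]
errors = [[0, 1], [0, 1], [1, 0], [1, 0]]   # two hypothetical techniques
nominal = {0}
ranges = {1: 9.0}                            # 10.0 - 1.0
best = select_technique(("red", 1.5), train_X, errors, 2, nominal, ranges)
```

In this toy setting the first technique is error-free on the "red" instances and the second on the "blue" ones, so the selected technique changes with the neighborhood of the new instance. Swapping `heom` for a different distance function changes which training instances count as nearest, which is the effect the paper sets out to analyze.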
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Seppo Jumani Puuronen, Alexey Tsymbal, Vagan Terziyan, "Distance functions in dynamic integration of data mining techniques", Proc. SPIE 4057, Data Mining and Knowledge Discovery: Theory, Tools, and Technology II, (6 April 2000); doi: 10.1117/12.381747; https://doi.org/10.1117/12.381747
PROCEEDINGS
11 PAGES

