A GA-based clustering algorithm for large data sets with mixed numeric and categorical values
25 September 2003
Jie Li, Xinbo Gao, Licheng Jiao
Proceedings Volume 5286, Third International Symposium on Multispectral Image Processing and Pattern Recognition; (2003) https://doi.org/10.1117/12.538864
Event: Third International Symposium on Multispectral Image Processing and Pattern Recognition, 2003, Beijing, China
Abstract
In data mining, cluster analysis must often be performed on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data, not for mixed data sets. To address this, this paper presents a novel clustering algorithm for such mixed data sets, obtained by modifying a common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Experimental results illustrate that the new GA-based clustering algorithm is feasible for large data sets with mixed numeric and categorical values.
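The abstract does not state the modified cost function or the GA operators, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a k-prototypes-style cost: squared Euclidean distance to the cluster mean on numeric attributes (the usual trace of the within-cluster dispersion matrix) plus a weight `gamma` times the number of mismatches against the cluster mode on categorical attributes. The chromosome encoding (one cluster label per record), tournament selection, uniform crossover, and all names and parameters (`mixed_cost`, `ga_cluster`, `gamma`, `mutation_rate`) are hypothetical choices made for illustration.

```python
import random


def mixed_cost(numeric, categorical, labels, k, gamma=1.0):
    """Assumed mixed-data cost: within-cluster squared Euclidean distance on
    numeric attributes plus gamma * mismatches to the cluster mode on
    categorical attributes (a k-prototypes-style assumption)."""
    n_num = len(numeric[0])
    n_cat = len(categorical[0])
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        # Numeric prototype: per-attribute mean of the cluster.
        mean = [sum(numeric[i][j] for i in members) / len(members)
                for j in range(n_num)]
        # Categorical prototype: per-attribute most frequent value.
        mode = []
        for j in range(n_cat):
            values = [categorical[i][j] for i in members]
            mode.append(max(set(values), key=values.count))
        for i in members:
            total += sum((numeric[i][j] - mean[j]) ** 2 for j in range(n_num))
            total += gamma * sum(categorical[i][j] != mode[j]
                                 for j in range(n_cat))
    return total


def ga_cluster(numeric, categorical, k, pop_size=30, generations=60,
               mutation_rate=0.05, gamma=1.0, seed=0):
    """Minimize mixed_cost with a simple GA; a chromosome is a label list."""
    rng = random.Random(seed)
    n = len(numeric)

    def fitness(chromosome):
        return mixed_cost(numeric, categorical, chromosome, k, gamma)

    population = [[rng.randrange(k) for _ in range(n)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best chromosome forward
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: reassign a record to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop
        best = min(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy mixed data set: one numeric and one categorical attribute.
    numeric = [[0.0], [0.2], [5.0], [5.2]]
    categorical = [["a"], ["a"], ["b"], ["b"]]
    labels, cost = ga_cluster(numeric, categorical, k=2)
    print(labels, cost)
```

In this sketch `gamma` balances the numeric and categorical terms; its value, like the population size and mutation rate, would need tuning for real data. The fitness is recomputed on every comparison for brevity, so caching would be needed for large data sets.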
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jie Li, Xinbo Gao, and Licheng Jiao "A GA-based clustering algorithm for large data sets with mixed numeric and categorical values", Proc. SPIE 5286, Third International Symposium on Multispectral Image Processing and Pattern Recognition, (25 September 2003); https://doi.org/10.1117/12.538864
CITATIONS
Cited by 20 scholarly publications.
KEYWORDS
Genetic algorithms
Data mining
Prototyping
Computer programming
Fuzzy logic
Genetics
Distance measurement
