An algorithmic approach to mining unknown clusters in training data
18 April 2006
Abstract
In this paper, unsupervised learning is used to develop a method for mining unknown clusters in training data. The approach is based on the Bayesian Data Reduction Algorithm (BDRA), which has recently been developed into a patented system called the Data Extraction and Mining Software Tool (DEMIST). The BDRA assumes that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it uses a "greedy" approach to select and discretize the relevant features of each class for best performance. The primary metric for selecting and discretizing all relevant features in each class is an analytic formula for the probability of error conditioned on the training data. Building on this, the primary contribution of this work is an algorithmic approach to finding multiple unknown clusters in training data, which extends the original data clustering algorithm. Performance is illustrated with results on simulated data that contains multiple clusters. Overall, the results demonstrate an effective method for finding multiple clusters in data mining applications.
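The abstract does not reproduce the BDRA error formula, so the following is only a rough illustration of the two ingredients it names: a uniform Dirichlet prior on each class's discrete symbol probabilities, and greedy, error-driven reduction of feature quantization. It is a minimal sketch under stated assumptions, not the BDRA/DEMIST implementation, and it does not include the paper's clustering extension. It assumes equal class priors, a single test observation, and a Dirichlet(1, ..., 1) prior, under which the posterior predictive probability of symbol s is (x_s + 1) / (N + M), where x_s is the class count of s, N the number of class training samples, and M the number of joint symbols. The function names (`merge_levels`, `quantize`, `error_estimate`, `greedy_reduce`) and the "merge two adjacent levels" search move are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product


def merge_levels(mapping, j):
    """Merge reduced level j+1 into level j and shift higher levels down by one."""
    return [v if v <= j else v - 1 for v in mapping]


def quantize(sample, mappings):
    """Map a sample of original feature levels to its reduced-level joint symbol."""
    return tuple(m[v] for m, v in zip(mappings, sample))


def error_estimate(data, mappings):
    """Illustrative error estimate for ONE test observation, ASSUMING equal class
    priors and a uniform Dirichlet(1, ..., 1) prior on each class's symbol
    probabilities, so the predictive is p(s) = (x_s + 1) / (N + M)."""
    levels = [max(m) + 1 for m in mappings]
    n_symbols = math.prod(levels)  # M: number of joint discrete symbols
    prior = 1.0 / len(data)
    stats = []
    for samples in data.values():
        counts = Counter(quantize(s, mappings) for s in samples)
        stats.append((counts, len(samples)))
    correct = 0.0
    for sym in product(*(range(L) for L in levels)):
        # Bayes decision: choose the class with the largest prior * predictive.
        correct += max(prior * (c[sym] + 1) / (n + n_symbols) for c, n in stats)
    return 1.0 - correct


def greedy_reduce(data, n_levels):
    """Hypothetical greedy search: repeatedly apply the adjacent-level merge that
    gives the lowest estimated error, stopping when every merge would increase it.
    A feature reduced to a single level is effectively removed."""
    mappings = [list(range(L)) for L in n_levels]
    best = error_estimate(data, mappings)
    while True:
        candidates = []
        for f, m in enumerate(mappings):
            for j in range(max(m)):  # adjacent level pairs (j, j + 1)
                trial = [list(x) for x in mappings]
                trial[f] = merge_levels(m, j)
                candidates.append((error_estimate(data, trial), trial))
        if not candidates:  # every feature is already at one level
            break
        err, trial = min(candidates, key=lambda c: c[0])
        if err > best:
            break
        best, mappings = err, trial
    return mappings, best


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is pure noise.
    data = {
        "A": [(0, 0), (0, 1), (0, 0), (0, 1)],
        "B": [(1, 0), (1, 1), (1, 0), (1, 1)],
    }
    mappings, err = greedy_reduce(data, n_levels=[2, 2])
    print(mappings, round(err, 4))  # [[0, 1], [0, 0]] 0.1667
```

On this toy data the estimated error drops from 0.25 to 1/6 when the noise feature is collapsed to a single level, while merging the informative feature would raise it to 0.5. Each accepted merge removes one level, so the search always terminates; because ties are accepted, it prefers the smaller quantization among equally scored options.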
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Robert S. Lynch Jr. and Peter K. Willett "An algorithmic approach to mining unknown clusters in training data", Proc. SPIE 6241, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006, 624101 (18 April 2006); https://doi.org/10.1117/12.664731
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Mining

Algorithm development

Data modeling

Detection and tracking algorithms

Expectation maximization algorithms

Data mining

Quantization
