6 April 2000 Using closed itemsets in association rule mining with taxonomies
Author Affiliations +
Abstract
Taxonomies or item hierarchies are often useful in association rule mining. The most time consuming subtask in rule mining is the discovery of the frequent itemsets. A standard algorithm for frequent itemset discovery, such as Apriori, would however produce many redundant itemsets if taxonomies are used, because any two itemsets that share their most specific items always match the same set of transactions. An efficient algorithm must therefore chose a canonical representative for each equivalence class of itemsets. Srikant and Agrawal solved the problem of redundancy in their Cumulate and Stratify algorithms by not allowing an itemset to contain an item and its ancestor. In this paper I will present a new algorithm, Closed Sets, for finding frequent itemsets in the presence of taxonomies. The algorithm requires itemsets to be closed under the taxonomies. If an item is a member of a closed itemset, then all its ancestors also are. The algorithm makes fewer passes over the database than Stratify and it prunes the search space optimally. This is not the case with Cumulate, and not even with Stratify, if there are items in the taxonomy with multiple parents. Furthermore, only modest modifications to Apriori are needed.
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Petteri Sevon, Petteri Sevon, } "Using closed itemsets in association rule mining with taxonomies", Proc. SPIE 4057, Data Mining and Knowledge Discovery: Theory, Tools, and Technology II, (6 April 2000); doi: 10.1117/12.381729; https://doi.org/10.1117/12.381729
PROCEEDINGS
8 PAGES


SHARE
RELATED CONTENT

Asymmetric threat data mining and knowledge discovery
Proceedings of SPIE (March 27 2001)
Efficiently mining maximal frequent patterns: fast-miner
Proceedings of SPIE (March 27 2001)
Discovering fuzzy spatial association rules
Proceedings of SPIE (March 12 2002)
Interactive mining of schema for semistructured data
Proceedings of SPIE (March 12 2002)
Web data mining
Proceedings of SPIE (March 12 2002)

Back to Top