18 April 2006 Clustering method via independent components for semi-structured documents
Abstract
This paper presents a novel clustering method for XML documents. Much current research on document clustering aims to support the storage and retrieval of large collections of XML documents. However, traditional text clustering approaches cannot capture the structural information of semi-structured documents. Our technique first extracts relative path features to represent each document. We then map these documents into the Vector Space Model (VSM) and propose a similarity computation. Before clustering, we apply Independent Component Analysis (ICA) to reduce the dimensionality of the VSM. To the best of the authors' knowledge, ICA has not been used for XML clustering before. The standard C-means partition algorithm is also improved: when a solution can no longer be improved, the algorithm applies an appropriate perturbation to the local-minimum solution before running the next iteration. The algorithm can thus escape the local minimum while still being able to reach the whole search space. Experimental results on two real datasets and one synthetic dataset show that the proposed approach is efficient and outperforms a naive clustering method that does not apply ICA.
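As a rough illustration of the first two steps in the abstract (path features, then a vector space representation with a similarity measure), the following is a minimal sketch and not the authors' implementation. It assumes that a "path feature" is a root-to-element tag path and uses plain cosine similarity as a stand-in for the paper's proposed similarity computation. The function names `path_features`, `to_vectors`, and `cosine` are hypothetical, and the ICA and perturbed C-means steps are not shown.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Count every root-to-element tag path in one XML document.

    Assumption: this stands in for the paper's "relative path features",
    whose exact definition is not given in the abstract.
    """
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def to_vectors(feature_counts):
    """Map per-document path counts onto a shared, sorted vocabulary (the VSM)."""
    vocab = sorted(set().union(*feature_counts))
    return vocab, [[c[p] for p in vocab] for c in feature_counts]


def cosine(u, v):
    """Cosine similarity; an assumed placeholder for the paper's measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Three toy documents: two share most of their structure, one shares none.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<order><item><price/></item></order>",
]
vocab, vectors = to_vectors([path_features(d) for d in docs])
```

With these toy inputs, the two `book` documents score close to 1 under `cosine`, while the `order` document scores 0 against both because it shares no paths with them. In the pipeline the abstract describes, vectors like these would be reduced with ICA before clustering.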
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tong Wang, Da-Xin Liu, Xuanzuo Lin, Wei Sun, "Clustering method via independent components for semi-structured documents", Proc. SPIE 6241, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006, 62410V (18 April 2006); doi: 10.1117/12.665427; https://doi.org/10.1117/12.665427
PROCEEDINGS
8 PAGES


