Processing heterogeneous XML data from multi-source
18 April 2006
Abstract
Recently, the heterogeneity of XML data has become a new challenge. In this paper, a novel clustering strategy is proposed to regroup heterogeneous XML sources, because searching within a smaller space of documents that share some similarity reduces cost. The strategy consists of four steps. First, path features are extracted and mapped into a High-dimension Vector Space (HDVS). Second, during data pre-processing, two algorithms are applied to reduce redundancy in the XML sources. Third, the heterogeneous documents are clustered. Finally, Multivalued Dependency (MVD) is introduced, since MVD can be redefined according to the range of XML constraints. This paper also proposes a novel algorithm, based on rough sets, that discovers minimal MVDs while handling non-integrity (incomplete) data. It solves the problem of non-integrity XML data interfering with the discovery of XML MVDs, so that patterns can be extracted from each cluster.
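The path-feature, clustering, and MVD ideas in the abstract can be illustrated roughly as follows. This is a minimal sketch under assumptions the abstract does not state: each HDVS dimension is taken to be the raw count of one root-to-node tag path, documents are compared by cosine similarity, clustering uses a simple leader (threshold) scheme, and `mvd_holds` applies the classical relational MVD definition to complete, flattened tuples. The helper names, the 0.5 threshold, and the example data are hypothetical. The paper's redundancy-reduction algorithms, its XML-specific MVD redefinition, and its rough-set handling of incomplete data are not reproduced here.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product


def path_features(xml_text: str) -> Counter:
    """Count every root-to-node tag path, e.g. '/lib/book/title' (assumed feature)."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def to_vectors(docs):
    """Map documents into one shared vector space: one dimension per distinct path."""
    feats = [path_features(d) for d in docs]
    dims = sorted(set().union(*feats))
    return dims, [[f[p] for p in dims] for f in feats]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold=0.5):
    """Hypothetical stand-in clusterer: join the first cluster whose leader is
    similar enough, otherwise start a new cluster. Returns document indices."""
    leaders, clusters = [], []
    for i, v in enumerate(vectors):
        for k, lead in enumerate(leaders):
            if cosine(v, lead) >= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(v)
            clusters.append([i])
    return clusters


def mvd_holds(rows, x, y):
    """Classical relational test of X ->> Y on complete tuples (dicts with equal keys):
    whenever t1 and t2 agree on X, the tuple taking Y from t1 and every other
    attribute from t2 must also be present."""
    attrs = list(rows[0])
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            t3 = {**t2, **{a: t1[a] for a in y}}
            if tuple(t3[a] for a in attrs) not in present:
                return False
    return True


if __name__ == "__main__":
    docs = [
        "<lib><book><title/><author/></book></lib>",
        "<lib><book><title/><author/><author/></book></lib>",
        "<shop><item><price/></item></shop>",
    ]
    dims, vecs = to_vectors(docs)
    print(leader_cluster(vecs, threshold=0.5))  # [[0, 1], [2]]

    rows = [
        {"course": "db", "teacher": "A", "book": "X"},
        {"course": "db", "teacher": "A", "book": "Y"},
        {"course": "db", "teacher": "B", "book": "X"},
        {"course": "db", "teacher": "B", "book": "Y"},
    ]
    print(mvd_holds(rows, ["course"], ["teacher"]))      # True
    print(mvd_holds(rows[:3], ["course"], ["teacher"]))  # False
```

The two `lib` documents share most paths and land in one cluster, while the `shop` document forms its own. The MVD check only verifies a single candidate dependency; discovering *minimal* MVDs, as the paper proposes, would require searching over candidate attribute sets.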
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tong Wang, Da-Xin Liu, Wei Sun, Xuanzuo Lin, "Processing heterogeneous XML data from multi-source", Proc. SPIE 6242, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, 62420S (18 April 2006); doi: 10.1117/12.666467; https://doi.org/10.1117/12.666467
Proceedings paper, 8 pages


RELATED CONTENT

Flight plan optimization
Proceedings of SPIE (May 21 2015)
Immune algorithm for KDD
Proceedings of SPIE (September 24 2001)
Expert bidder for a pilot's monthly schedule
Proceedings of SPIE (December 31 1989)
Perceptual convexity
Proceedings of SPIE (August 10 1995)
