10 July 2009 Information extraction from semi-structured web page based on DOM tree
Proceedings Volume 7490, PIAGENG 2009: Intelligent Information, Control, and Communication Technology for Agricultural Engineering; 749015 (2009) https://doi.org/10.1117/12.837215
Event: International Conference on Photonics and Image in Agriculture Engineering (PIAGENG 2009), 2009, Zhangjiajie, China
Abstract
To extract information automatically from semi-structured web pages, this paper proposes a method named IESS that discovers the record model using the DOM tree and the Maximal Similar Sub Tree. The method identifies records automatically and correctly even when records of the same type differ in how they are expressed. Furthermore, the system can automatically extract information from the result pages of paper-search websites. Experiments on several common paper-search websites demonstrate that the system achieves high efficiency and accuracy.
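The abstract does not spell out the IESS algorithm, so the following is only a minimal sketch of the general idea it names: find sibling subtrees in the DOM that are structurally similar and treat them as records, tolerating small differences between records of the same type. All names here (`Node`, `DomBuilder`, `common_size`, `extract_records`), the greedy similarity measure, and the 0.6 threshold are assumptions made for illustration. They are not the paper's definition of the Maximal Similar Sub Tree.

```python
from html.parser import HTMLParser


class Node:
    """Simplified DOM element: tag, parent, child elements, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree; assumes reasonably well-formed HTML."""

    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def size(node):
    return 1 + sum(size(c) for c in node.children)


def common_size(a, b):
    """Node count of a greedy, order-preserving common subtree (an assumption)."""
    if a.tag != b.tag:
        return 0
    matched, start = 1, 0
    for child in a.children:
        for k in range(start, len(b.children)):
            s = common_size(child, b.children[k])
            if s:
                matched, start = matched + s, k + 1
                break
    return matched


def similarity(a, b):
    """1.0 for identical tag structure, 0.0 when the root tags differ."""
    return 2 * common_size(a, b) / (size(a) + size(b))


def fields(node):
    """Text fragments of a subtree: the node's own text, then each child's."""
    out = list(node.text)
    for child in node.children:
        out.extend(fields(child))
    return out


def extract_records(html, threshold=0.6):
    """Return the text fields of the largest group of similar sibling subtrees."""
    builder = DomBuilder()
    builder.feed(html)
    best, best_score = [], 0
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        for ref in node.children:
            group = [c for c in node.children if similarity(ref, c) >= threshold]
            score = sum(size(c) for c in group)
            if len(group) >= 2 and score > best_score:
                best, best_score = group, score
    return [fields(record) for record in best]


SAMPLE = """<html><body>
<div id="nav"><a>Home</a></div>
<ul>
 <li><a>Paper A</a><span>Li</span><span>2009</span></li>
 <li><a>Paper B</a><span>Dong</span></li>
 <li><a>Paper C</a><span>Wang</span><span>2008</span></li>
</ul>
</body></html>"""

records = extract_records(SAMPLE)
```

In the sample, the second list item lacks a year field but is still grouped with its siblings because the similarity threshold is below 1. This mirrors the abstract's point about records of the same type that are expressed slightly differently.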
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wei-Dong Li, Yi-bing Dong, "Information extraction from semi-structured web page based on DOM tree", Proc. SPIE 7490, PIAGENG 2009: Intelligent Information, Control, and Communication Technology for Agricultural Engineering, 749015 (10 July 2009); doi: 10.1117/12.837215; https://doi.org/10.1117/12.837215
PROCEEDINGS
6 PAGES

