4 February 2013
Rule-based versus training-based extraction of index terms from business documents: how to combine the results
Abstract
Current systems for the automatic extraction of index terms from business documents take either a rule-based or a training-based approach. As both approaches have advantages and disadvantages, it seems natural to combine them to get the best of both worlds. We present a combination method with three steps (selection, normalization, and combination) based on comparable scores produced during extraction. Furthermore, novel evaluation metrics are developed to support the assessment of each step in an existing extraction system. Our methods were evaluated on an example extraction system with three individual extractors and a corpus of 12,000 scanned business documents.
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daniel Schuster, Marcel Hanke, Klemens Muthmann, Daniel Esser, "Rule-based versus training-based extraction of index terms from business documents: how to combine the results", Proc. SPIE 8658, Document Recognition and Retrieval XX, 865813 (4 February 2013); doi: 10.1117/12.2002509; https://doi.org/10.1117/12.2002509
PROCEEDINGS
10 PAGES

