1 April 1998 Pattern matcher for OCR-corrupted documents and its evaluation
Abstract
Document classification is a fundamental technology that precedes document routing, document understanding, and information extraction. Pattern matchers with rule-based components are already in use at news agencies, where the input is electronic text. Classification of OCR documents, however, must deal with the ambiguities of the underlying OCR engine. Ambiguities in character segmentation and character classification lead to a directed graph of characters as the result of the OCR process, the so-called character hypothesis lattice (CHL). This paper deals with techniques for enhancing the pattern matcher so that it can cope with CHLs.
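To make the idea of a character hypothesis lattice concrete, the following is a minimal Python sketch of matching a keyword against a small lattice. It is not the matcher described in the paper: the edge representation, the confidence scores, and the names `build_lattice` and `match_word` are all assumptions made for illustration. The example lattice encodes a typical segmentation ambiguity, where the same image region may be read either as the single character "m" or as the pair "r" and "n".

```python
from collections import defaultdict


def build_lattice(edges):
    """Build an adjacency map from (src, dst, text, score) edges.

    Hypothetical representation: nodes are segmentation points, each edge
    carries one recognized character hypothesis and an assumed confidence.
    """
    lattice = defaultdict(list)
    for src, dst, text, score in edges:
        lattice[src].append((dst, text, score))
    return lattice


def match_word(lattice, word, start_nodes):
    """Return the best score of any path spelling `word`, or None.

    A path's score is the product of its edge scores (an assumption;
    the paper may combine hypotheses differently).
    """
    best = None
    # Each stack entry: (current node, characters matched so far, score so far)
    stack = [(node, 0, 1.0) for node in start_nodes]
    while stack:
        node, pos, score = stack.pop()
        if pos == len(word):
            if best is None or score > best:
                best = score
            continue
        for dst, text, edge_score in lattice.get(node, ()):
            # Follow the edge only if its hypothesis continues the word.
            if word.startswith(text, pos):
                stack.append((dst, pos + len(text), score * edge_score))
    return best


# Example lattice: "co" followed by either "m" or "r" + "n".
lattice = build_lattice([
    (0, 1, "c", 1.0),
    (1, 2, "o", 1.0),
    (2, 4, "m", 0.6),
    (2, 3, "r", 0.5),
    (3, 4, "n", 0.5),
])

print(match_word(lattice, "com", [0]))
print(match_word(lattice, "corn", [0]))
print(match_word(lattice, "cat", [0]))
```

Both "com" and "corn" are found because each is spelled by some path through the lattice, while "cat" is not. A matcher over plain text would see only one of the two readings, which is the limitation the lattice representation avoids.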
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Stefan Agne and Hans-Guenther Hein, "Pattern matcher for OCR-corrupted documents and its evaluation", Proc. SPIE 3305, Document Recognition V, (1 April 1998); doi: 10.1117/12.304629; https://doi.org/10.1117/12.304629
PROCEEDINGS
9 PAGES

