Translator Disclaimer
18 December 2001 N-gram language models for document image decoding
Author Affiliations +
Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450728
Event: Electronic Imaging, 2002, San Jose, California, United States
Abstract
This paper explores the problem of incorporating linguistic constraints into document image decoding, a communication theory approach to document recognition. Probabilistic character n-grams (n=2--5) are used in a two-pass strategy where the decoder first uses a very weak language model to generate a lattice of candidate output strings. These are then re-scored in the second pass using the full language model. Experimental results based on both synthesized and scanned data show that this approach is capable of improving the error rate by a factor of two to ten depending on the quality of the data and the details of the language model used.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Gary E. Kopec, Maya R. Said, and Kris Popat "N-gram language models for document image decoding", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450728
PROCEEDINGS
12 PAGES


SHARE
Advertisement
Advertisement
RELATED CONTENT


Back to Top