30 March 1995 Modified character-level deciphering algorithm for OCR in degraded documents
Abstract
This paper presents modifications to a previous character-level deciphering algorithm for OCR that allow it to handle touching characters and to tolerate mistakes made at the clustering stage. The objective of a character-level deciphering algorithm is to assign alphabetic identities to character patterns such that the character repetition pattern in an input text matches the letter repetition pattern provided by a language model. Degradation in document images often produces touching characters and errors in clustering the character patterns, both of which pose difficulties for character-level deciphering algorithms. The proposed modifications tightly integrate visual constraints from characters and touching patterns with constraints from a language model in order to decode touching characters and to detect and reverse clustering mistakes. The result is a deciphering algorithm with robust performance under image degradation.
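The core idea of matching repetition patterns can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes perfect clustering, no touching characters, and a tiny word list standing in for the language model. The function names (`repetition_pattern`, `decipher`), the integer cluster IDs, and the lexicon are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def repetition_pattern(symbols: Sequence[Hashable]) -> Tuple[int, ...]:
    """Canonical repetition pattern: first distinct symbol -> 0, next -> 1, ..."""
    first_seen: Dict[Hashable, int] = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in symbols)


def decipher(words: List[Tuple[int, ...]],
             lexicon: List[str]) -> Optional[Dict[int, str]]:
    """Backtracking search for a one-to-one mapping from cluster IDs to letters.

    Assumption: every input word is in the lexicon and clustering is error-free.
    """
    # Index the lexicon by repetition pattern (the toy "language model").
    by_pattern: Dict[Tuple[int, ...], List[str]] = {}
    for entry in lexicon:
        by_pattern.setdefault(repetition_pattern(entry), []).append(entry)

    def candidates(word: Tuple[int, ...]) -> List[str]:
        return by_pattern.get(repetition_pattern(word), [])

    # Try the most constrained words (fewest candidates) first.
    order = sorted(words, key=lambda w: len(candidates(w)))

    def search(i: int, mapping: Dict[int, str]) -> Optional[Dict[int, str]]:
        if i == len(order):
            return mapping
        for cand in candidates(order[i]):
            new = dict(mapping)
            used = set(new.values())
            consistent = True
            for cluster_id, letter in zip(order[i], cand):
                if cluster_id in new:
                    if new[cluster_id] != letter:
                        consistent = False
                        break
                elif letter in used:
                    # Two different clusters may not share one letter.
                    consistent = False
                    break
                else:
                    new[cluster_id] = letter
                    used.add(letter)
            if consistent:
                result = search(i + 1, new)
                if result is not None:
                    return result
        return None

    return search(0, {})


# Hypothetical input: cluster IDs for the text "that hat the".
lexicon = ["the", "that", "hat", "tea", "eat", "at", "he"]
cipher_words = [(1, 2, 3, 1), (2, 3, 1), (1, 2, 4)]
mapping = decipher(cipher_words, lexicon)
```

Here each cluster ID plays the role of a group of visually similar character images, and the search assigns a letter to each ID so that every word has the same repetition pattern as some lexicon entry. A degraded document breaks both assumptions of this sketch: touching characters make one pattern stand for several letters, and clustering mistakes make one ID cover more than one letter, which is the situation the abstract's modifications are meant to address.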
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chi Fang and Jonathan J. Hull, "Modified character-level deciphering algorithm for OCR in degraded documents," Proc. SPIE 2422, Document Recognition II (30 March 1995); doi: 10.1117/12.205843; https://doi.org/10.1117/12.205843
PROCEEDINGS
8 PAGES

