Automatic extraction of numeric strings in unconstrained handwritten document images
24 January 2011
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line and then, using a new regularity measure, prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm based on analysis of skeletal graphs, and a merging algorithm based on graph partitioning. All candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. Recognition follows a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for processing real-world documents, in which letters and digits may be connected or broken. The effectiveness of the proposed approach is shown by extensive experiments on a real-world database of 607 documents containing handwritten, machine-printed, and mixed documents with different types of layouts and levels of noise.
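The two-phase control flow described above (cheap pruning, then recognition-based verification) can be illustrated with a minimal Python sketch. The abstract does not define the regularity measure or the recognizer, so everything here is an assumption: `Segment`, `regularity`, `extract_numeric_strings`, `toy_classify`, and both thresholds are hypothetical names and values. The `regularity` score (consistency of character heights and inter-character gaps) is an illustrative stand-in, not the paper's measure. The `classify` callback is only a placeholder for the coarse-to-fine probabilistic RBF recognizer. The segment-merge step is assumed to have already produced the candidate sequences.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass
class Segment:
    """Bounding box of one segmented character, in pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def regularity(seq: Sequence[Segment]) -> float:
    """Hypothetical regularity score in [0, 1]; NOT the paper's measure.

    Penalizes variation in character height (coefficient of variation)
    and variation in horizontal gaps (normalized by mean height).
    """
    heights = [s.h for s in seq]
    mean_h = mean(heights)
    height_spread = pstdev(heights) / mean_h
    gaps = [b.x - (a.x + a.w) for a, b in zip(seq, seq[1:])]
    gap_spread = pstdev(gaps) / mean_h if gaps else 0.0
    return max(0.0, 1.0 - height_spread - gap_spread)


# A recognizer maps a segment to (label, confidence). It stands in for the
# coarse-to-fine probabilistic RBF classifier, which is not reproduced here.
Recognizer = Callable[[Segment], Tuple[str, float]]


def extract_numeric_strings(
    candidates: Sequence[Sequence[Segment]],
    classify: Recognizer,
    min_regularity: float = 0.6,  # hypothetical threshold
    min_conf: float = 0.5,        # hypothetical threshold
) -> List[str]:
    accepted = []
    for seq in candidates:
        # Pruning phase: geometric test only, no recognition yet.
        if len(seq) < 2 or regularity(seq) < min_regularity:
            continue
        # Verification phase: every character must be recognized as a digit.
        labels = [classify(s) for s in seq]
        if all(lbl.isdigit() and p >= min_conf for lbl, p in labels):
            accepted.append("".join(lbl for lbl, _ in labels))
    return accepted


# Usage with toy data and a lookup-table "recognizer" keyed by x position.
regular = [Segment(0, 0, 10, 20), Segment(12, 0, 10, 20), Segment(24, 0, 10, 20)]
irregular = [Segment(200, 0, 10, 20), Segment(212, 0, 10, 40), Segment(240, 0, 10, 10)]
mixed = [Segment(100, 0, 10, 20), Segment(112, 0, 10, 20)]
toy_labels = {0: "4", 12: "2", 24: "7", 100: "A", 112: "7"}


def toy_classify(s: Segment) -> Tuple[str, float]:
    return toy_labels.get(s.x, "?"), 0.9


print(extract_numeric_strings([regular, irregular, mixed], toy_classify))  # → ['427']
```

In this toy run the irregular sequence is discarded during pruning without ever reaching the recognizer. The mixed sequence passes pruning but fails verification because one character is recognized as a letter. Putting the cheap geometric test first means the more expensive recognizer runs only on the surviving candidates.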
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
M. Mehdi Haji, Tien D. Bui, Ching Y. Suen, "Automatic extraction of numeric strings in unconstrained handwritten document images", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740L (24 January 2011); doi: 10.1117/12.874706; https://doi.org/10.1117/12.874706
