24 January 2011
Improved document image segmentation algorithm using multiresolution morphology
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740D (2011) https://doi.org/10.1117/12.873461
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters because non-text elements are present in its input. This paper describes modifications to the text/non-text segmentation algorithm presented by Bloomberg,1 which is also available in his open-source Leptonica library.2 The modifications yield significant improvements, achieving better segmentation accuracy than the original algorithm on the UW-III, UNLV, and ICDAR 2009 page segmentation competition test images and on circuit diagram datasets.
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Syed Saqib Bukhari, Faisal Shafait, and Thomas M. Breuel "Improved document image segmentation algorithm using multiresolution morphology", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740D (24 January 2011); https://doi.org/10.1117/12.873461
PROCEEDINGS
8 PAGES

