Model Based Segmentation And Hypothesis Generation For The Recognition Of Printed Documents
11 April 1988
Proceedings Volume 0860, Real-Time Image Processing: Concepts and Technologies; (1988) https://doi.org/10.1117/12.943391
Event: 1987 Symposium on the Technologies for Optoelectronics, 1987, Cannes, France
Abstract
The task of document recognition requires the scanning of a paper document and the analysis of its content and structure. The resulting electronic representation has to capture the content as well as the logical and layout structure of the document. The first step in the recognition process is the scanning, filtering and binarization of the paper document. Based on the preprocessing results, we delineate key areas such as the address or signature of a letter, or the abstract of a report. This segmentation procedure uses a specific document layout model. The validity of this segmentation can be verified in a second step by using the results of more time-consuming procedures such as text/graphic classification, optical character recognition (OCR) and comparison with more elaborate models for specific document parts. Thus our concept of model-driven segmentation allows the analysis to focus quickly on important regions. The segmentation is able to operate directly on the raster image of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document. A test version for the analysis of simple business letters has been implemented.
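The abstract gives no implementation details, so the following is only a minimal sketch of what model-driven hypothesis generation on a binarized raster image could look like. Everything in it is an assumption made for illustration: the `Region` class, the `LETTER_MODEL` coordinates, the fixed binarization threshold, and the ink-density criterion are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical layout-model entry: a labelled box in fractional page coordinates."""
    label: str
    top: float
    left: float
    bottom: float
    right: float


# Hypothetical layout model for a simple business letter (illustrative values only).
LETTER_MODEL = [
    Region("address", 0.10, 0.05, 0.30, 0.50),
    Region("body", 0.35, 0.05, 0.80, 0.95),
    Region("signature", 0.82, 0.05, 0.95, 0.50),
]


def binarize(gray, threshold=128):
    """Map a grayscale raster (rows of 0..255 values) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def ink_density(binary, region):
    """Return the fraction of ink pixels inside the region's pixel box."""
    height, width = len(binary), len(binary[0])
    r0, r1 = int(region.top * height), int(region.bottom * height)
    c0, c1 = int(region.left * width), int(region.right * width)
    area = (r1 - r0) * (c1 - c0)
    if area <= 0:
        return 0.0
    ink = sum(sum(row[c0:c1]) for row in binary[r0:r1])
    return ink / area


def generate_hypotheses(binary, model, min_density=0.02):
    """Return the labels of model regions that contain enough ink to merit
    verification by slower steps (assumed here to be OCR or similar)."""
    return [r.label for r in model if ink_density(binary, r) >= min_density]
```

Only the pixels inside the model's regions are inspected, which mirrors the abstract's point that the segmentation can work on the raster image without preprocessing the whole page. A real system would presumably need to tolerate shifted or missing regions, which this sketch does not attempt.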
© (1988) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
A Dengel, A Luhn, B Ueberreiter, "Model Based Segmentation And Hypothesis Generation For The Recognition Of Printed Documents", Proc. SPIE 0860, Real-Time Image Processing: Concepts and Technologies, (11 April 1988); doi: 10.1117/12.943391; https://doi.org/10.1117/12.943391
PROCEEDINGS
7 PAGES


