21 March 1989
A Cut-Based Procedure For Document-Layout Modelling And Automatic Document Analysis
Abstract
With the growing degree of office automation and the decreasing cost of storage devices, it becomes more and more attractive to store optically scanned documents such as letters or reports in electronic form. The need for a good paper-computer interface therefore becomes increasingly important. This interface must convert paper documents into an electronic representation that captures not only their contents but also their layout and logical structure. We propose a procedure that describes the layout of a document page by dividing it recursively into nested rectangular areas. Each area is assigned a semantic meaning by means of a logical label. The procedure is used as a basis for modelling a hierarchical document layout onto the semantic meaning of the parts of the document. We analyse the layout of a document using a best-first search in this tessellation structure. The search is directed by a measure of similarity between the layout pattern in the model and the layout of the actual document. The validity of a hypothesis for the semantic labelling of a layout block can then be verified: the verification either supports the hypothesis or initiates the generation of a new one. The method has been implemented in Common Lisp on a SUN 3/60 workstation and has been run on a large population of office documents. The results obtained have been very encouraging and have convincingly confirmed the soundness of the approach.
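As a rough illustration of the idea in the abstract, the following Python sketch builds a tree of nested rectangular blocks by recursive cuts and then labels the leaf blocks with a best-first search driven by a similarity measure. It is not the paper's Common Lisp implementation: the `Block` class, the `similarity` function (one minus the mean absolute coordinate difference), the `label_blocks` search, and the example letter model are all assumptions made for this sketch.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) as fractions of the page


@dataclass
class Block:
    """A rectangular layout area; children are produced by cutting it (hypothetical class)."""
    rect: Rect
    label: Optional[str] = None
    children: List["Block"] = field(default_factory=list)

    def cut(self, axis: str, positions: List[float]) -> List["Block"]:
        """Split this block with horizontal ('h') or vertical ('v') cuts."""
        x0, y0, x1, y1 = self.rect
        lo, hi = (y0, y1) if axis == "h" else (x0, x1)
        bounds = [lo, *sorted(positions), hi]
        for a, b in zip(bounds, bounds[1:]):
            rect = (x0, a, x1, b) if axis == "h" else (a, y0, b, y1)
            self.children.append(Block(rect))
        return self.children

    def leaves(self) -> List["Block"]:
        """Return the undivided blocks, in depth-first order."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def similarity(a: Rect, b: Rect) -> float:
    """Assumed toy measure in [0, 1]: 1 minus the mean absolute coordinate difference."""
    return 1.0 - sum(abs(p - q) for p, q in zip(a, b)) / 4.0


def label_blocks(page: Block, model: Dict[str, Rect]) -> Dict[str, Rect]:
    """Best-first search over labelling hypotheses, lowest dissimilarity first."""
    leaves = page.leaves()
    # Each heap entry is (accumulated dissimilarity, labels chosen so far).
    frontier: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    while frontier:
        cost, chosen = heapq.heappop(frontier)
        if len(chosen) == len(leaves):  # every leaf carries a label: hypothesis complete
            for leaf, name in zip(leaves, chosen):
                leaf.label = name
            return {name: leaf.rect for leaf, name in zip(leaves, chosen)}
        leaf = leaves[len(chosen)]
        for name, expected in model.items():
            if name not in chosen:  # each logical label is used at most once
                step = 1.0 - similarity(leaf.rect, expected)
                heapq.heappush(frontier, (cost + step, chosen + (name,)))
    raise ValueError("model has fewer labels than the page has blocks")


# Hypothetical letter: the scanned page's cuts deviate slightly from the model.
page = Block((0.0, 0.0, 1.0, 1.0))
header, body, footer = page.cut("h", [0.25, 0.85])
header.cut("v", [0.6])
model = {
    "sender": (0.0, 0.0, 0.6, 0.2),
    "date": (0.6, 0.0, 1.0, 0.2),
    "body": (0.0, 0.2, 1.0, 0.9),
    "footer": (0.0, 0.9, 1.0, 1.0),
}
result = label_blocks(page, model)
```

Because each step adds a non-negative dissimilarity, the first complete hypothesis taken from the queue has the lowest total dissimilarity among all labellings. A hypothesis that fits poorly sinks in the queue and a competing one is expanded instead, which loosely mirrors the verify-or-regenerate behaviour described above.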
© (1989) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Andreas R. Dengel, "A Cut-Based Procedure For Document-Layout Modelling And Automatic Document Analysis", Proc. SPIE 1095, Applications of Artificial Intelligence VII, (21 March 1989); doi: 10.1117/12.969361; https://doi.org/10.1117/12.969361
PROCEEDINGS
8 PAGES

