18 December 2001 Stochastic language model for analyzing document physical layout
Author Affiliations +
Abstract
Image segmentation is an important component of any document image analysis system. While many segmentation algorithms exist in the literature, very few i) allow users to specify the physical style, and ii) incorporate user-specified style information into the algorithm's objective function that is to be minimized. We describe a segmentation algorithm that models a document's physical structure as a hierarchical structure where each node describes a region of the document using a stochastic regular grammar. The exact form of the hierarchy and the stochastic language is specified by the user, while the probabilities associated with the transitions are estimated from groundtruth data. We demonstrate the segmentation algorithm on images of bilingual dictionaries.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tapas Kanungo, Tapas Kanungo, Song Mao, Song Mao, } "Stochastic language model for analyzing document physical layout", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); doi: 10.1117/12.450736; https://doi.org/10.1117/12.450736
PROCEEDINGS
9 PAGES


SHARE
Back to Top