1 August 1992 Document understanding using layout styles of title page images
Abstract
An important problem in the application of compound document architectures is the input of data from raster images. One technique is to use visual, syntactic cues found in the layout of the raster document to infer its logical structure or semantics. Another is to use context derived from characters recognized within a given block of raster data. Both character- and image-based information are considered here. A well-constrained environment is defined for use in developing rules that can be applied to basic book title page understanding. This paper identifies the attributes of title page layout objects which aid in mapping them into the fields of a simple bibliographic format. Using as input the raster images of the title page and the verso of the title page along with the ASCII output of a generic character recognition engine from these same images, a system of rules is defined for generating a marked-up text wherein key bibliographic fields may be identified.
© (1992) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Louis H. Sharpe and Basil Manns, "Document understanding using layout styles of title page images", Proc. SPIE 1661, Machine Vision Applications in Character Recognition and Industrial Inspection, (1 August 1992); doi: 10.1117/12.130273; https://doi.org/10.1117/12.130273
PROCEEDINGS
11 PAGES


