Paper
13 January 2003 Graphics extraction in PDF document
Author Affiliations +
Proceedings Volume 5010, Document Recognition and Retrieval X; (2003) https://doi.org/10.1117/12.479683
Event: Electronic Imaging 2003, 2003, Santa Clara, CA, United States
Abstract
PDF is a document format for final presentation. It preserves the original document layout but often not the document logical structure. Graphic illustrations such as figures and tables in PDF often consist of ungrouped graphic primitives such as lines, curves and small text elements. In this paper, we present a bottom up approach to recognize graphic illustration in PDF document. Vicinities of page elements in both 2D space and indexes in layer are used to understand the logical connection between elements. Graphics recognition and elements grouping for illustration is an important part in understanding the document logical structure. This technique can be used in automatic figure extraction, document re-flow and document transformation.
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hui Chao "Graphics extraction in PDF document", Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); https://doi.org/10.1117/12.479683
Lens.org Logo
CITATIONS
Cited by 6 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Visualization

Image segmentation

Raster graphics

Chaos

Data storage

Electronic imaging

Operating systems

RELATED CONTENT


Back to Top