4 February 2013 Graphic composite segmentation for PDF documents with complex layouts
Converting PDF books to a re-flowable format has recently attracted interest in the area of e-book reading. Robust graphic segmentation is highly desirable for making PDF converters more practical. To cope with various layouts, a multi-layer concept is introduced to segment graphic composites, including photographic images and drawings with text insets or surrounded by text elements. This multi-layer layout analysis method exploits both image-based analysis and the inherent advantages of born-digital documents. By combining clustering of low-level page elements in the PDF document with connected component analysis on a synthetically generated PNG image of the document, graphic composites can be segmented in PDF documents with complex layouts. Experimental results on graphic composite segmentation of PDF document pages show satisfactory performance.
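As a rough illustration of the connected component step mentioned in the abstract, the sketch below labels 4-connected foreground regions in a small binary raster and then collects the page-element boxes that overlap a region. It is not the authors' implementation: the function names (`component_boxes`, `boxes_overlap`, `group_elements`), the toy grid, and the overlap-based grouping rule are all assumptions made for illustration only.

```python
from collections import deque


def component_boxes(grid):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected regions of 1s.

    `grid` is a list of rows of 0/1 values standing in for a binarized
    page image (hypothetical input, not the paper's data format).
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            x0 = x1 = c
            y0 = y1 = r
            while queue:
                y, x = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def boxes_overlap(a, b):
    """True if two inclusive boxes (x0, y0, x1, y1) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def group_elements(component, elements):
    """Assumed grouping rule: keep element boxes that intersect the component."""
    return [e for e in elements if boxes_overlap(component, e)]


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
regions = component_boxes(page)
nearby = group_elements(regions[0], [(2, 2, 3, 3), (4, 0, 5, 1)])
```

In a real converter the raster would come from rendering the PDF page, and the element boxes from the PDF's own text and path objects; how the two layers are actually combined in the paper is not specified in the abstract.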
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Canhui Xu, Zhi Tang, Xin Tao, Cao Shi, "Graphic composite segmentation for PDF documents with complex layouts", Proc. SPIE 8658, Document Recognition and Retrieval XX, 86580E (4 February 2013); doi: 10.1117/12.2003705; https://doi.org/10.1117/12.2003705

