30 March 1995 Text/graphics separation for technical papers
An important operation in the automatic analysis of technical papers is the separation of text from graphics. In practice, skew often occurs both in the original document and in its scanned image. Text and graphic blocks may also be non-rectangular. In these cases, standard text/graphics separation methods such as projection profiles or run-length smoothing are not always suitable. In this paper, we propose a text/graphics separation algorithm based on two simple and standard properties of technical paper pages, which we call the area property and the text compactness property. The area property takes into account the geometrical relationships between text and graphics. The text compactness property reflects the spatial relationships between text components within a block and between text and graphics. Applying both properties allows us to perform the separation accurately in the cases above. No skew correction is required before separation, and text and graphic blocks can have arbitrary shape.
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Oleg G. Okun, Sergey V. Ablameyko, "Text/graphics separation for technical papers", Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205819; https://doi.org/10.1117/12.205819
