A robust segmentation of scanned documents
8 February 2015
The image quality of reprinted documents that were scanned at a high resolution may not satisfy human viewers, who expect at least the same image quality as the original document. Moiré artifacts left by inadequate descreening, text blurred by the scanner's poor modulation transfer function (MTF), and color distortion caused by misclassifying color and gray content can all degrade reprint quality. To remedy these shortcomings, the pixels of a document should be classified into attributes such as image or text, edge or non-edge, continuous-tone or halftone, and color or gray. Reprint quality can then be improved by applying the enhancement appropriate to each attribute. In this paper, we introduce a robust and effective approach that classifies each pixel of a scanned document into these attributes. The proposed document segmentation algorithm uses simple features, such as the variance-to-mean ratio (VMR) and the gradient, evaluated over various combinations of sizes and positions of a processing kernel. We also exploit the direction of the gradients at multiple positions of the same kernel to detect text as small as 4 points. Experimental results show that the proposed algorithm performs well on various types of scanned documents, including documents printed with a low halftone screen frequency (lines per inch, LPI).
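A minimal sketch of the kind of per-pixel feature extraction described above may help make it concrete. It computes the VMR (the variance σ² of a window's intensities divided by their mean μ, i.e. σ²/μ) and signed mean horizontal and vertical gradients for several kernel sizes. Each kernel is placed so that the target pixel sits at its center or at one of its four corners. Several choices here are illustrative assumptions, not the authors' method:

- the function names,
- the kernel sizes (3 and 5),
- the five placements,
- the replicate-border handling,
- the simple first-difference gradient.

The abstract gives no thresholds, so the sketch stops at feature extraction rather than classification.

```python
from statistics import mean, pvariance

# Illustrative sketch only: function names, kernel sizes, placements and the
# gradient operator are assumptions, not the method published in the paper.


def window(img, top, left, size):
    """Return the size x size block whose top-left corner is (top, left).

    Coordinates outside the image are clamped to the nearest border pixel
    (replicate padding), so every placement yields a full block.
    """
    h, w = len(img), len(img[0])
    return [
        [img[min(max(top + i, 0), h - 1)][min(max(left + j, 0), w - 1)]
         for j in range(size)]
        for i in range(size)
    ]


def vmr(block):
    """Variance-to-mean ratio of the block's intensities (0.0 if the mean is 0)."""
    values = [v for row in block for v in row]
    m = mean(values)
    return pvariance(values, m) / m if m else 0.0


def gradients(block):
    """Mean horizontal (gx) and vertical (gy) first differences in the block.

    The sign of each value gives the dominant direction of intensity change:
    gx > 0 means intensity rises from left to right, gy > 0 from top to bottom.
    The block size must be at least 2.
    """
    size = len(block)
    gx = sum(block[i][j + 1] - block[i][j]
             for i in range(size) for j in range(size - 1))
    gy = sum(block[i + 1][j] - block[i][j]
             for i in range(size - 1) for j in range(size))
    n = size * (size - 1)  # number of difference terms in each direction
    return gx / n, gy / n


def pixel_features(img, r, c, sizes=(3, 5)):
    """Return (size, placement, vmr, gx, gy) tuples for pixel (r, c).

    For every kernel size, the kernel is placed so that (r, c) lies at its
    center or at one of its four corners.
    """
    feats = []
    for size in sizes:
        half = size // 2
        placements = {
            "center": (r - half, c - half),
            "top_left": (r, c),
            "top_right": (r, c - size + 1),
            "bottom_left": (r - size + 1, c),
            "bottom_right": (r - size + 1, c - size + 1),
        }
        for name, (top, left) in placements.items():
            block = window(img, top, left, size)
            gx, gy = gradients(block)
            feats.append((size, name, vmr(block), gx, gy))
    return feats


if __name__ == "__main__":
    # Synthetic 6x6 page: a vertical edge between a dark (50) and a light (200) region.
    page = [[50] * 3 + [200] * 3 for _ in range(6)]
    for feature in pixel_features(page, 2, 2):
        print(feature)
```

Placements that straddle the edge report a high VMR and a large positive gx. Placements that fall entirely inside the dark region report a flat window with zero VMR and zero gradients. Comparing these per-placement values is one plausible way to separate thin strokes from their background, but the abstract does not say how the paper combines them.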
© (2015) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hyung Jun Park, Ji Young Yi, "A robust segmentation of scanned documents", Proc. SPIE 9395, Color Imaging XX: Displaying, Processing, Hardcopy, and Applications, 939506 (8 February 2015); doi: 10.1117/12.2076907; https://doi.org/10.1117/12.2076907
