Document page segmentation based on pattern spread analysis
1 March 2000
This paper introduces an algorithm designed to segment black-and-white documents for the purpose of compression. A single document is segmented into two documents suitable for pattern-based and run-length-based compression. With some modification the same algorithm may also be used for optical character recognition. The segmentation is performed in two main steps: pattern extraction and classification. Patterns are extracted using a fast scan method that does not need to scan every pixel, and classification uses pattern characteristics such as spread and pattern context to segment the patterns. Documents may be segmented with an accuracy of at least 98%, depending on the content. Furthermore, text of any size and orientation may be successfully classified without the need for skew estimation or correction. This paper presents the segmentation algorithm and discusses the complete compression system.
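The two-step structure described above (pattern extraction, then classification) can be illustrated with a minimal sketch. The abstract does not define "spread", the classification rules, or the fast scan method, so everything here is an assumption made for illustration. Patterns are taken to be 8-connected groups of black pixels, found by an exhaustive scan rather than the paper's faster method. The hypothetical `spread` function is simply the longer side of a pattern's bounding box, and the hypothetical threshold `max_pattern_spread` decides which of the two output groups a pattern joins. Pattern context, which the paper also uses, is not modelled.

```python
from collections import deque


def extract_patterns(image):
    """Group black pixels (value 1) into 8-connected patterns.

    Assumption: this visits every pixel; the paper's fast scan
    method, which avoids doing so, is not reproduced here.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    patterns = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited black pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            patterns.append(pixels)
    return patterns


def spread(pixels):
    """Hypothetical stand-in for the paper's spread measure:
    the longer side of the pattern's bounding box, in pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)


def segment(image, max_pattern_spread=3):
    """Split patterns into two groups: small patterns for
    pattern-based compression, the rest for run-length compression.
    The threshold value is an illustrative assumption."""
    pattern_group, run_length_group = [], []
    for pattern in extract_patterns(image):
        if spread(pattern) <= max_pattern_spread:
            pattern_group.append(pattern)
        else:
            run_length_group.append(pattern)
    return pattern_group, run_length_group


# A tiny page: one small blob (character-like) and one long rule.
page = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
pattern_group, run_length_group = segment(page)
print(len(pattern_group), len(run_length_group))
```

In this toy page the small blob falls into the pattern-based group and the long horizontal rule into the run-length group. A real classifier along the lines the abstract describes would also weigh the context of neighbouring patterns.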
Phillip E. Mitchell, Hong Yan, "Document page segmentation based on pattern spread analysis," Optical Engineering 39(3), (1 March 2000). https://doi.org/10.1117/1.602419
