Document page segmentation based on pattern spread analysis
1 March 2000
This paper introduces an algorithm designed to segment black-and-white documents for the purpose of compression. A single document is segmented into two documents suitable for pattern-based and run-length-based compression. With some modification the same algorithm may also be used for optical character recognition. The segmentation is performed in two main steps: pattern extraction and classification. Patterns are extracted using a fast scan method that does not need to scan every pixel, and classification uses pattern characteristics such as spread and pattern context to segment the patterns. Documents may be segmented with an accuracy of at least 98%, depending on the content. Furthermore, text of any size and orientation may be successfully classified without the need for skew estimation or correction. This paper presents the segmentation algorithm and discusses the complete compression system.
Phillip E. Mitchell and Hong Yan, "Document page segmentation based on pattern spread analysis," Optical Engineering 39(3), (1 March 2000). https://doi.org/10.1117/1.602419


