1 March 2000 Document page segmentation based on pattern spread analysis
Optical Engineering, 39(3), (2000). doi:10.1117/1.602419
Abstract
This paper introduces an algorithm designed to segment black-and-white documents for the purpose of compression. A single document is segmented into two documents suitable for pattern-based and run-length-based compression. With some modification the same algorithm may also be used for optical character recognition. The segmentation is performed in two main steps: pattern extraction and classification. Patterns are extracted using a fast scan method that does not need to scan every pixel, and classification uses pattern characteristics such as spread and pattern context to segment the patterns. Documents may be segmented with an accuracy of at least 98%, depending on the content. Furthermore, text of any size and orientation may be successfully classified without the need for skew estimation or correction. This paper presents the segmentation algorithm and discusses the complete compression system.
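The two-step structure described in the abstract (pattern extraction, then classification into two output documents) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not define the fast scan method or the spread measure, so every name and parameter below (`extract_patterns`, `step`, `split_by_size`, `max_side`) is a hypothetical stand-in. The sketch assumes a pattern is an 8-connected group of black pixels, looks for seed pixels only on every `step`-th row so that white pixels on the skipped rows are never examined, and uses a plain bounding-box size test in place of the paper's spread and context analysis.

```python
from collections import deque


def extract_patterns(img, step=2):
    """Return the 8-connected black patterns reachable from sampled rows.

    img is a list of rows of 0/1 values (1 = black). Only every `step`-th
    row is scanned for seed pixels, so a pattern that lies entirely between
    sampled rows is missed; this trade-off is an assumption of the sketch.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    patterns = []
    for r in range(0, h, step):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            # Grow the pattern from the seed with a breadth-first flood fill.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            patterns.append(pixels)
    return patterns


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a pattern, inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def split_by_size(patterns, max_side=3):
    """Hypothetical stand-in classifier, NOT the paper's spread analysis.

    Patterns whose bounding box fits within max_side x max_side go to the
    first list (candidates for pattern-based coding); the rest go to the
    second list (candidates for run-length-based coding).
    """
    small, large = [], []
    for p in patterns:
        top, left, bottom, right = bounding_box(p)
        side = max(bottom - top + 1, right - left + 1)
        (small if side <= max_side else large).append(p)
    return small, large


page = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
]
patterns = extract_patterns(page, step=2)
small, large = split_by_size(patterns, max_side=3)
```

The flood fill still visits every black pixel of a pattern it finds; the saving in this sketch comes only from the rows that are never searched for seeds. A real classifier would replace `split_by_size` with the spread and context features the paper describes.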
Phillip E. Mitchell, Hong Yan, "Document page segmentation based on pattern spread analysis," Optical Engineering 39(3), (1 March 2000). https://doi.org/10.1117/1.602419
JOURNAL ARTICLE
11 PAGES

