1 March 2000 Document page segmentation based on pattern spread analysis
Phillip E. Mitchell, Hong Yan
This paper introduces an algorithm that segments black-and-white documents for the purpose of compression. A single document is split into two documents, one suited to pattern-based compression and the other to run-length-based compression. With some modification, the same algorithm may also be used for optical character recognition. The segmentation is performed in two main steps: pattern extraction and classification. Patterns are extracted using a fast scan method that does not need to examine every pixel, and classification uses pattern characteristics such as spread and pattern context to segment the patterns. Documents may be segmented with an accuracy of at least 98%, depending on the content. Furthermore, text of any size and orientation may be classified successfully without skew estimation or correction. This paper presents the segmentation algorithm and discusses the complete compression system.
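The abstract does not spell out the scan method or define spread, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a binary image stored as a list of rows (1 for black, 0 for white), probes only every `step`-th row and column, and grows a connected pattern from each black probe hit. The names `extract_patterns`, `flood_fill`, and `bbox_area_per_pixel` are hypothetical, and the bounding-box ratio is a stand-in feature chosen for illustration, not the paper's spread measure.

```python
from collections import deque


def flood_fill(image, seen, x0, y0):
    """Collect the 8-connected black pixels reachable from (x0, y0)."""
    height, width = len(image), len(image[0])
    queue = deque([(x0, y0)])
    seen[y0][x0] = True
    pixels = []
    while queue:
        x, y = queue.popleft()
        pixels.append((x, y))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and image[ny][nx] and not seen[ny][nx]):
                    seen[ny][nx] = True
                    queue.append((nx, ny))
    return pixels


def extract_patterns(image, step=2):
    """Probe a sparse grid and grow a pattern from each unseen black probe.

    Assumption: patterns lying entirely between probe points are missed,
    which is the price of not visiting every pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    patterns = []
    for y in range(0, height, step):
        for x in range(0, width, step):
            if image[y][x] and not seen[y][x]:
                patterns.append(flood_fill(image, seen, x, y))
    return patterns


def bbox_area_per_pixel(pixels):
    """Hypothetical stand-in feature: bounding-box area per black pixel."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return area / len(pixels)
```

A classifier in this spirit would compare such per-pattern features (and the features of neighbouring patterns) against thresholds to decide which of the two output documents each pattern belongs to; the thresholds and the context rules would have to come from the paper itself.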
Phillip E. Mitchell and Hong Yan, "Document page segmentation based on pattern spread analysis," Optical Engineering 39(3) (1 March 2000). https://doi.org/10.1117/1.602419
Published: 1 March 2000
CITATIONS
Cited by 15 scholarly publications and 2 patents.
KEYWORDS
Image segmentation
Image classification
Image processing algorithms and systems
Image compression
Visualization
Acquisition tracking and pointing
Optical engineering
