16 January 2006 Partitioning of the degradation space for OCR training
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 606705 (2006) https://doi.org/10.1117/12.641229
Event: Electronic Imaging 2006, San Jose, California, United States
Abstract
Optical character recognition (OCR) algorithms generally perform better when presented with homogeneous data. This paper studies a method designed to increase the homogeneity of training data, based on an understanding of the types of degradations that occur during printing and scanning and of how these degradations affect the homogeneity of the data. Dividing the degradation space by edge spread has been shown to improve recognition accuracy over dividing it by threshold or point spread function width alone; the challenge lies in deciding how many partitions to use and at what values of edge spread the divisions should be made. Clustering of different types of character features, fonts, sizes, resolutions, and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.
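As a rough illustration of what partitioning training data by edge spread could look like in practice, the following minimal Python sketch groups samples into bins separated by chosen edge-spread boundaries. It is not the authors' implementation: the function name `partition_by_edge_spread`, the sample values, and the boundary values are hypothetical, and each sample is assumed to already carry a scalar edge-spread estimate.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_by_edge_spread(samples, boundaries):
    """Group (edge_spread, label) samples into partitions split at the boundaries.

    Hypothetical helper: `samples` is an iterable of (edge_spread, label) pairs,
    and `boundaries` lists the edge-spread values at which divisions are made.
    """
    cuts = sorted(boundaries)
    partitions = defaultdict(list)
    for edge_spread, label in samples:
        # bisect_right counts how many cut points are <= edge_spread,
        # which serves as the partition index (0 .. len(cuts)).
        partitions[bisect_right(cuts, edge_spread)].append((edge_spread, label))
    return dict(partitions)


# Made-up edge-spread values and character labels, for illustration only.
samples = [(-1.2, "a"), (-0.3, "e"), (0.1, "a"), (0.9, "e"), (1.6, "o")]
parts = partition_by_edge_spread(samples, boundaries=[-0.5, 0.5])
```

Under this sketch, a separate classifier could then be trained on each partition. How many boundaries to use and where to place them is exactly the open question the abstract raises, so the two cut points above are placeholders rather than recommended values.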
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Elisa H. Barney Smith and Tim Andersen, "Partitioning of the degradation space for OCR training," Proc. SPIE 6067, Document Recognition and Retrieval XIII, 606705 (16 January 2006); https://doi.org/10.1117/12.641229
KEYWORDS
Point spread functions
Optical character recognition
Data modeling
Detection and tracking algorithms
Current controlled current source
Printing
Scanners