High-performance OCR preclassification trees
30 March 1995
Henry S. Baird, C. L. Mallows
Proceedings Volume 2422, Document Recognition II; (1995) https://doi.org/10.1117/12.205840
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1995, San Jose, CA, United States
Abstract
We present an automatic method for constructing high-performance preclassification decision trees for OCR. A good preclassifier prunes the set of alternative classes to a much smaller set without erroneously pruning the correct class. We build the decision tree by greedy entropy minimization on pseudo-randomly generated training samples derived from a model of imaging defects, and then 'populate' the tree with many more samples to drive down the error rate. In [BM94] we presented a statistically rigorous stopping rule for population that enforces a user-specified upper bound on error: this works in practice, but is too conservative, driving the error far below the bound. Here, we describe a refinement that achieves the user-specified accuracy more closely and thus improves the pruning rate of the resulting tree. The method exploits the structure of the tree: the essential technical device is a leaf-selection rule based on Good's Theorem [Good53]. We illustrate its effectiveness through experiments on a pan-European polyfont classifier.
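As a rough illustration of the build-then-populate idea described in the abstract, the following Python sketch grows a tree by greedily choosing the binary feature test with the lowest weighted class entropy, stores at each leaf the set of classes seen there, and then routes further samples down the tree to enlarge those sets. It is not the authors' implementation: the binary feature vectors, the dictionary tree layout, and the names `best_feature`, `build`, `leaf_for`, and `populate` are all assumptions made for this example, and the stopping rule and the leaf-selection rule based on Good's Theorem are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_feature(samples, features):
    """Return the binary feature with the lowest weighted entropy after the
    split, or None if no feature separates the samples (hypothetical helper)."""
    n = len(samples)
    best, best_score = None, float("inf")
    for f in features:
        left = [y for x, y in samples if x[f] == 0]
        right = [y for x, y in samples if x[f] == 1]
        if not left or not right:
            continue  # this test does not split the samples at all
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best, best_score = f, score
    return best


def build(samples, features, max_depth):
    """Greedily grow a tree; each leaf records the set of classes seen there."""
    labels = [y for _, y in samples]
    f = None
    if max_depth > 0 and len(set(labels)) > 1:
        f = best_feature(samples, features)
    if f is None:
        return {"classes": set(labels)}
    rest = [g for g in features if g != f]
    return {
        "feature": f,
        0: build([(x, y) for x, y in samples if x[f] == 0], rest, max_depth - 1),
        1: build([(x, y) for x, y in samples if x[f] == 1], rest, max_depth - 1),
    }


def leaf_for(tree, x):
    """Follow the feature tests from the root down to the leaf that x reaches."""
    while "feature" in tree:
        tree = tree[x[tree["feature"]]]
    return tree


def populate(tree, samples):
    """Route extra labelled samples to leaves and add their classes, so that a
    class reaching a leaf is no longer pruned there."""
    for x, y in samples:
        leaf_for(tree, x)["classes"].add(y)


# Toy data (assumed): each sample is (binary feature tuple, class label).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"), ((1, 1), "c")]
tree = build(train, features=[0, 1], max_depth=2)
populate(tree, [((1, 0), "c")])
candidates = leaf_for(tree, (1, 0))["classes"]  # classes that survive pruning
```

In this sketch a preclassification error occurs whenever a sample reaches a leaf whose class set lacks its true class; adding more samples during population can only grow the leaf sets, which lowers that error at the cost of less pruning.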
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Henry S. Baird and C. L. Mallows "High-performance OCR preclassification trees", Proc. SPIE 2422, Document Recognition II, (30 March 1995); https://doi.org/10.1117/12.205840
KEYWORDS
Error analysis
Optical character recognition
Statistical modeling
Binary data
Computing systems
Data modeling
Image classification