High-performance OCR preclassification trees (30 March 1995)
We present an automatic method for constructing high-performance preclassification decision trees for OCR. A good preclassifier prunes the set of alternative classes to a much smaller set without erroneously pruning the correct class. We build the decision tree by greedy entropy minimization on pseudo-randomly generated training samples derived from a model of imaging defects, and then 'populate' the tree with many more samples to drive down the error rate. In [BM94] we presented a statistically rigorous stopping rule for population that enforces a user-specified upper bound on error: this works in practice, but it is too conservative, driving the error far below the bound. Here, we describe a refinement that meets the user-specified accuracy more closely and thus improves the pruning rate of the resulting tree. The method exploits the structure of the tree: the essential technical device is a leaf-selection rule based on Good's Theorem [Good53]. We illustrate its effectiveness through experiments on a pan-European polyfont classifier.
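The leaf-selection rule rests on Good's Theorem [Good53], whose best-known consequence is the Good-Turing estimate of "missing mass": the probability that the next sample at a leaf belongs to a class not yet observed there is roughly n1/N, where n1 is the number of classes seen exactly once and N is the total sample count. A minimal sketch of how such a rule might drive population (the function names and the `leaves` data layout are hypothetical, not the authors' implementation):

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the chance that the next sample at a
    leaf belongs to a class not yet seen there: n1 / N, where n1 is
    the number of classes observed exactly once and N is the total
    number of samples (Good, 1953)."""
    if not samples:
        return 1.0  # nothing observed yet: all mass is unseen
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(samples)

def select_leaf(leaves):
    """Hypothetical selection rule: populate next the leaf with the
    largest estimated unseen-class mass, i.e. the leaf most at risk
    of erroneously pruning a correct class.

    leaves: dict mapping leaf id -> list of class labels seen there.
    """
    return max(leaves, key=lambda lid: good_turing_missing_mass(leaves[lid]))
```

Under this sketch, population samples are routed preferentially to the leaves whose class lists are still "fresh" (many singletons), which is what lets the stopping rule track the user-specified error bound more tightly than a uniform allocation would.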
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
Henry S. Baird and C. L. Mallows, "High-performance OCR preclassification trees", Proc. SPIE 2422, Document Recognition II (30 March 1995); https://doi.org/10.1117/12.205840