Recognition of poorly printed text by direct extraction of features from gray scale
Omnifont optical character recognition proceeds by computing features on the input image and then classifying the image. Past omnifont optical character recognition techniques that use features have always binarized the image, comparing the brightness of each input pixel with a threshold and labeling it "black" or "white" before computing the features for each character. For poorly printed text, however, such binarization results in broken or merged characters and, consequently, incorrect features. We propose a method for computing geometrical features, such as strokes, directly from the gray-scale image. To this end we use a model of the image-forming process, namely the convolution of the original binary image with the point spread function of the digitizer. We also estimate how printing distortions and noise affect the result, so that we can deduce how different parts of a printed character should appear under those conditions. Detected features are then clustered for each set of samples in the training set. The clustering guides the selection of prototypes, and the final classification is made by graph matching between prototypes and new (unknown) characters.
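The binarization failure mode the abstract describes can be sketched with a minimal 1-D example. This is an illustration only, not code from the paper: the Gaussian point spread function, its width, and the threshold value are all assumptions chosen to make the effect visible.

```python
import numpy as np

# Hypothetical ideal binary scanline (1 = ink, 0 = paper).
ideal = np.zeros(40)
ideal[10:12] = 1.0   # a thin stroke (2 pixels wide)
ideal[20:24] = 1.0   # two thicker strokes ...
ideal[26:30] = 1.0   # ... separated by a 2-pixel gap

# Model of the image-forming process: convolve the binary image with the
# digitizer's point spread function (assumed Gaussian here, sigma = 2).
x = np.arange(-8, 9)
psf = np.exp(-x**2 / (2 * 2.0**2))
psf /= psf.sum()
observed = np.convolve(ideal, psf, mode="same")

# A fixed global threshold applied to the blurred gray-scale image:
binarized = observed > 0.5

print(binarized[10:12])   # thin stroke falls below threshold: character breaks
print(binarized[24:26])   # gap between thick strokes exceeds it: strokes merge
```

The thin stroke's blurred peak stays below the threshold and disappears, while the valley between the two nearby strokes rises above it and fills in, producing exactly the broken and merged characters that motivate extracting features from the gray scale directly.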
© 1992 Society of Photo-Optical Instrumentation Engineers (SPIE).
Theo Pavlidis, Li Wang, Jiangying Zhou, William J. Sakoda, Jairo Rocha, "Recognition of poorly printed text by direct extraction of features from gray scale," Proc. SPIE 1661, Machine Vision Applications in Character Recognition and Industrial Inspection (1 August 1992); https://doi.org/10.1117/12.130280
