We present a fast new algorithm for grayscale character recognition. By operating on grayscale images directly, we seek to maximize the information that can be extracted, and we avoid the traditional recognition step of binarizing images. Previous work using gradient-based contour encoding operated in a restricted domain where objects were size-invariant and fewer patterns had to be distinguished, so a simpler feature set sufficed. In character recognition, a feature extractor must handle character images of arbitrary size. Our algorithm extracts local contour features based on the pixel gradients present in an image. A gradient map, i.e., gradient magnitude and direction at every pixel, is computed; this map is thresholded to avoid responses to noise and spurious artifacts, and then partitioned coarsely. Character contours are encoded by quantizing gradient directions into a few representative direction bins. This method does not require that images be normalized to a fixed grid size before feature extraction. Features extracted by this method have been used to train a 2-layer neural network classifier. Experiments have been conducted with machine-printed numeric (10-class) and alphanumeric-and-punctuation (77-class) character images, captured at 300 ppi from mailpiece images. Currently, 99.4% of numerals and 93.4% of 77-class images are recognized correctly from a testing set of 5420 numeric and 24,988 character images. When shape and case confusions are allowed, the recognition performance improves to 97.5%. A similar experiment with binarized 77-class images resulted in 92.1% correctly recognized test images. This method is being extended to handwritten character recognition.
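The kind of feature described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses central-difference gradients, a magnitude threshold, and quantization into direction bins, but builds a single global histogram where the paper first partitions the gradient map coarsely; the function name and all parameter values are assumptions.

```python
import math

def gradient_direction_features(img, n_bins=8, mag_thresh=1.0):
    """Sketch of quantized-gradient contour features: per-pixel gradients
    by central differences, a magnitude threshold to suppress noise, and
    a histogram over quantized gradient directions."""
    h, w = len(img), len(img[0])
    hist = [0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag < mag_thresh:  # thresholding step: ignore weak responses
                continue
            ang = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(ang / (2 * math.pi) * n_bins) % n_bins] += 1
    return hist
```

On a vertical step edge, all above-threshold gradients point horizontally, so a single direction bin collects every vote.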
Conventional structural pattern recognition methods that rely on thinning approximate the skeleton by polygons and proceed from that step to recognition. Polygonal approximations have the disadvantage of introducing significant ambiguities. In particular, vertices may correspond either to real corners or to approximations of smooth arcs. The method presented in this paper attempts to overcome this problem by performing feature decomposition without polygonal approximation. The proposed method relies on topographic features, in particular ridge lines. Ridge lines are normally centered within character strokes, forming skeleton-like ribbons. Information about ridge line directions, obtained from the underlying gray-tone surface, is used to discriminate between arcs and straight line segments. By avoiding a conventional thinning algorithm and the usual polygonal approximations, we are able to reduce some artifacts of conventional thinning and to eliminate completely the ambiguities mentioned above.
This paper reports proof of concept of a design for recognizing postal address blocks. The system must function with varying and unspecified fonts, dot matrix printing, and poor print quality. Our design achieves tolerance to differing contrast and degraded print via grayscale analysis, and omnifont capability by encoding character shapes as graphs. The current prototype, restricted to digits, successfully recognizes degraded numeric fields. There are four major modules. First, the strokes comprising each character are detected as ridges in grayscale space. Our design is tolerant of wide contrast variation even within a single character, and produces connected strokes from dot matrix print. Second, strokes are grouped to produce line segments and arcs, which are linked to produce a graph describing the character. The third stage, recognition by matching the input character graph to prototype graphs, is described in a companion paper by Rocha and Pavlidis. Finally, secondary classification is applied to break near ties by focusing on discriminating features. The secondary classifier is described in a companion paper by Zhou and Pavlidis. Experimental results on 2000 address blocks supplied by the USPS are presented. We also report experiments on subsampling the data, which indicate that the performance at 100 dpi is very close to that at the original 300 dpi.
In many document processing applications it is necessary to efficiently measure how accurately one shape matches another. Often, it is also necessary that the measurement technique be invariant to rotation and scaling. In this paper, a natural metric for the skeleton is proposed and applied to quantitative shape comparison. This new skeleton metric can be computed particularly efficiently when the skeleton is described with polygonal arcs -- as with the continuous skeleton representation. It is also straightforward and efficient to normalize the orientation and scale of the objects being compared when using this representation. However, the most significant property of the skeleton metric is that it is an upper bound on the Hausdorff distance between the two shapes. Thus the skeleton metric can be used to ensure a bound on the maximum deviation of the shape boundaries from one another. Using this metric, it becomes possible to introduce simplifying approximations in the skeleton while controlling the error of the corresponding regenerated shape. Thus, the skeleton metric provides a bridge between the qualitative, shape-characterizing aspect of the skeleton and the quantitative, comparative aspect of the Hausdorff metric.
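For concreteness, the Hausdorff distance that the skeleton metric upper-bounds can be sketched for finite point sets. This brute-force version illustrates only the definition, not the paper's efficient polygonal-arc computation:

```python
def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets:
    the largest distance from any point in one set to its nearest
    neighbor in the other set."""
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    def directed(X, Y):
        # worst-case nearest-neighbor distance from X into Y
        return max(min(d(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))
```

Because every boundary point is within the regenerated shape's Hausdorff bound, controlling this quantity controls the maximum boundary deviation, as the abstract notes.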
An image enhancement technique for mail piece images based on window statistics is presented. The approach has been developed to increase the image quality for the subsequent segmentation and block analysis in the real-time address block location (RT-ABL) system, which processes a stream of mail piece images and locates the destination address block. As a framework for this approach, window statistics consisting of the local average α, the local standard deviation σ, and the center pixel value ρ over a 9 × 9 window are used. The approach includes contrast enhancement, bleed-through removal, and binarization. Contrast enhancement and bleed-through removal are achieved through a nonlinear mapping M(α, σ, ρ) obtained empirically. A simple and efficient binarization is also obtained using the ratio of the gray-scale output pixel value ρ′ obtained from the mapping to α. Major advantages of this method are the avoidance of the black-out or white-out encountered with other binarization methods on low-contrast images, and improved character segmentation that helps the segmentation tool locate key components of the destination address block, such as the state abbreviation or ZIP Code. Examples of images transformed using the method are presented along with a discussion of performance comparisons.
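The ratio-based binarization can be sketched as follows. This is a simplified illustration only: it compares each pixel directly to its 9 × 9 local average α and skips the empirically derived nonlinear mapping M(α, σ, ρ) entirely, so the fixed ratio threshold and the function name are assumptions.

```python
def window_binarize(img, win=4, ratio=0.8):
    """Simplified ratio binarization from window statistics: a pixel is
    marked black (1) when its value divided by the local average over a
    (2*win+1) x (2*win+1) window falls below `ratio`.  win=4 gives the
    9 x 9 window used in the paper."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - win), min(h, y + win + 1))
            xs = range(max(0, x - win), min(w, x + win + 1))
            vals = [img[yy][xx] for yy in ys for xx in xs]
            alpha = sum(vals) / len(vals)  # local average
            out[y][x] = 1 if alpha > 0 and img[y][x] / alpha < ratio else 0
    return out
```

Because the threshold is relative to the local average rather than a global level, a uniformly dark or uniformly bright region does not black out or white out, which is the advantage the abstract highlights.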
A new signal processing method is developed for estimating the skew angle in text document images. Based on a recently introduced multi-line fitting algorithm, the proposed method reformulates the skew detection problem into a special parameter estimation framework such that a signal structure similar to the one in the field of sensor array processing is obtained. The well-studied techniques of that formalism (e.g., the ESPRIT algorithm) are then exploited to produce a closed-form, high-resolution estimate of the skew angle. A simple preprocessing stage transforms each line of text characters into a straight line of single-pixel width. A virtual planar wave propagation environment then reformulates the line fitting problem into the sensor array processing framework, and high-resolution signal subspace techniques are used to estimate the skew angle. The proposed algorithm is substantially faster than existing single- and multiple-line fitting algorithms such as the Hough transform method. Details of the proposed skew detection algorithm and experimental results are presented.
This paper presents a thinning algorithm which identifies local features such as line segments, tips, and junctions using rough set theory and Euler number calculation within a rectangular window operator. It handles image objects already presented in binary format. The resulting skeleton preserves the topological properties of the original shape in the form of a graph, with nodes representing the local features and arcs the adjacency relations. The algorithm offers two distinctive advantages: conceptual simplicity and low computational effort. In general, it generates the skeleton in one pass plus an auxiliary scan that confirms the identification of some features in ambiguous segment regions. The algorithm provides skeletons of good quality for character objects, which can therefore be used later for syntactic recognition of alphabets. Excellent results have also been obtained for objects of general shape.
A method for recognition of word feature graphs without previous segmentation into characters is described. In the system, each subgraph of features that matches a previously defined character prototype is recognized anywhere in the word, even if it corresponds to a broken character or to a character touching another one. The characters are detected in the order defined by the matching quality. Each subgraph that is recognized is introduced as a node in a directed net that compiles different alternatives of interpretation of the features in the feature graph. A path in the net represents a consistent succession of characters in the word. The method allows the recognition of characters that overlap, or that are underlined. A final search for the optimal path under certain criteria gives the best interpretation of the word features. The character recognizer uses a flexible matching between the features and a flexible grouping of the individual features to be matched. Broken characters are recognized by looking for gaps between features that may be interpreted as part of a character. Touching characters are recognized because the matching allows non-matched adjacent features. The recognition results of this system on over 4,000 printed numeral characters belonging to a USPS database and on some hand-printed words confirmed the method's high level of robustness.
This paper presents a segmentation-free approach to optical character recognition (OCR) based on the concept of occluded object recognition, in which objects are recognized and then segmented out from the image. In applying the concept of occluded object recognition to the problem of OCR, we treat characters as touching or occluded objects that are subject to special constraints on their poses, i.e., they are juxtaposed with little or no freedom in rotation. Based on these characteristics, we combine two very powerful techniques used in occluded object recognition -- indexing and voting (pose clustering) -- and tailor them to the problem of OCR. This results in a segmentation-free OCR approach that is both highly efficient and robust. We note that recently some techniques have been proposed for handwritten OCR that conceptually are also segmentation-free, although these techniques are quite different from ours.
Automatic understanding of line drawings is a fundamental task for feeding a Geographic Information System. Moreover, the amount of alphanumeric data in engineering drawings and maps makes it necessary to process characters automatically and to provide efficient interactive tools for recognition validation. In this paper, we present the character recognition subsystem implemented within a larger system for the automatic interpretation of Italian cadastral maps (A0-format monochrome drawings representing land properties by means of continuous lines, dashed lines, shading patterns, characters, and special symbols). The system relies upon quite general methodologies, and experiments are now in progress on different kinds of line drawings. The character recognition subsystem consists of a sequence of steps -- from the detection of characters within the raster image to the reconstruction of text strings by grouping single characters, from character recognition to the placement of recognized texts in the vector image. Fully automatic steps are followed by interactive sessions, during which an operator checks and corrects the results provided by the system. The system is currently running on two different platforms: (1) S/370 or S/390 architecture running VM/CMS, equipped with IBM 5080 graphic stations, (2) IBM RS/6000 workstation running AIX.
Check forms are used by many people in daily life for money remittance. Surprisingly, the processing of these forms at banks and post offices is only partly automated. In this paper, we deal with a particular kind of form, viz., the GIRO checks used in Switzerland. We describe a fully automatic system which is able to recognize the following items on a GIRO check: the financial institution, the name and address of the receiver, and the account number. The complete recognition of a GIRO check is divided into two phases. In the first phase, the system performs a layout analysis in order to localize regions corresponding to various items on the check. The input gray-level image is first binarized and segmented using the X-Y-tree decomposition algorithm resulting in a list of atomic entities (e.g., individual characters). Each entity is then interpreted as part of an item (e.g., receiver's name), according to the knowledge about possible layouts of a form. All atomic entities belonging to the same item are grouped together and yield the location of that item. In the second phase, the localized items are separately binarized again and submitted to an OCR engine to obtain streams of characters that correspond to the items. We have tested the system on a large number of checks and the results are promising in terms of both computation time and recognition accuracy.
We propose a new system which recognizes printed music scores more accurately and in less time than conventional systems. Its most remarkable characteristic is that it uses a non-template-matching method as much as possible and, as a result, decreases the use of template matching. In the non-template-matching method, simple parts of symbols are recognized and reconstructed, and the results are then recognized as musical notes. With this method, the number of templates needed for recognition can be much reduced.
The paper describes a low-cost handwritten character recognizer. It consists of three modules: the `acquisition' module, the `binarization' module, and the `core' module. The core module can be logically partitioned into six steps: character dilation, character circumscription, region and `profile' analysis, `cut' analysis, decision tree descent, and result validation. First, it reduces the resolution of the binarized regions and detects the minimum rectangle (MR) which encloses the character; the MR partitions the background into regions that surround the character or are enclosed by it, and allows features to be defined as `profiles' and `cuts:' a `profile' is the set of vertical or horizontal minimum distances between a side of the MR and the character itself; a `cut' is a vertical or horizontal image segment delimited by the MR. Then, the core module classifies the character by descending the decision tree on the basis of the analysis of the regions around the character, in particular of the `profiles' and `cuts,' without using context information. Finally, it recognizes the character or reactivates the core module by analyzing validation test results. The recognizer is largely insensitive to character discontinuity and is able to detect Arabic numerals and English capital letters. The recognition rate for a 32 × 32 pixel character is about 97% after the first iteration, and over 98% after the second iteration.
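A `profile' in the sense defined above can be sketched as follows. This illustrative version computes only the left-side profile of a binary bitmap already cropped to its minimum rectangle; the function name and the use of None for empty rows are assumptions.

```python
def left_profile(bitmap):
    """Left 'profile' of a binary character bitmap: for each row, the
    horizontal distance from the left side of the minimum rectangle to
    the first character pixel (None for rows with no character pixel)."""
    prof = []
    for row in bitmap:
        d = next((i for i, v in enumerate(row) if v), None)
        prof.append(d)
    return prof
```

The other three profiles (right, top, bottom) follow by symmetry, scanning from the corresponding side of the MR.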
This paper describes a communication theory approach to document image recognition, patterned after the use of hidden Markov models in speech recognition. In general, a document recognition problem is viewed as consisting of three elements -- an image generator, a noisy channel, and an image decoder. A document image generator is a Markov source (stochastic finite-state automaton) which combines a message source with an imager. The message source produces a string of symbols, or text, which contains the information to be transmitted. The imager is modeled as a finite-state transducer which converts the one-dimensional message string into an ideal two-dimensional bitmap. The channel transforms the ideal image into a noisy observed image. The decoder estimates the message, given the observed image, by finding the a posteriori most probable path through the combined source and channel models using a Viterbi-like dynamic programming algorithm. The proposed approach has been applied to the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings. A finite-state model for yellow page columns was constructed and used to decode a database of scanned column images containing about 1100 individual listings. Overall, 99.5% of the listings were correctly recognized, with character classification rates of 98% and 99.6%, respectively, for the names and numbers.
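The dynamic program at the heart of this decoder is the standard Viterbi recursion, sketched here on an ordinary discrete HMM. This is the textbook algorithm, not the authors' two-dimensional image decoder; the toy states and observation symbols in the example are invented for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding: the a posteriori most probable state path
    through an HMM given a sequence of observations, found by dynamic
    programming over (best probability, best path) per state."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            # best predecessor for state s at this step
            p, path = max(
                (V[-2][ps][0] * trans_p[ps][s] * emit_p[s][o], V[-2][ps][1])
                for ps in states
            )
            V[-1][s] = (p, path + [s])
    return max(V[-1].values())[1]
```

In the paper's setting the "states" are nodes of the combined source and channel models and the "observations" are image evidence, but the maximization structure is the same.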
We have developed an application software system for font- and size-invariant optical character recognition (OCR). A preliminary front-end process, which handles gray scale normalization, noise elimination, line finding, and character block segmentation, has been included in our system. But the main characteristic of our system is that we adopt a feature extraction approach based on Fourier descriptors (FDs) and a multicategory back-propagation neural network classifier for recognition. Instead of using one set of FDs to represent the image object, as in the conventional FDs approach, we use three sets of FDs to represent different portions of the object. We find that this approach solves the intrinsic problem of FDs caused by their rotational and reflectional invariance properties. Thus, our method can correctly classify ambiguous characters like 5, 2, p, q, etc. Our system will serve as a primitive building block for later, more complex OCR systems. In general, back propagation provides gradient descent optimization in training. Without prior knowledge of the mathematical relation between input and output, the network can efficiently map a wide variety of input patterns to an arbitrary set of output patterns after successful training. Thus, the system is not only capable of understanding printed English text and isolated cursive scripts, but can also be extended to read other symbols at the expense of additional training time. At present, a 99.8 percent recognition accuracy has been achieved for trained English samples in our experiment.
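The classical Fourier descriptors underlying this feature set can be sketched as follows. The contour points are treated as complex numbers and transformed by a plain DFT; the paper's refinement of applying three such sets to different portions of the character, and any normalization of the coefficients, are not shown, and the function name is an assumption.

```python
import cmath

def fourier_descriptors(contour, k=8):
    """Classical Fourier descriptors of a closed contour: treat each
    (x, y) point as a complex number and return the first k DFT
    coefficients (coefficient 0 is the contour centroid)."""
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * u * t / n)
                for t in range(n)) / n
            for u in range(k)]
```

For a contour centered at the origin, coefficient 0 vanishes and coefficient 1 captures the dominant circular component, which is why magnitude-only descriptors are rotation-invariant (and hence confuse mirrored pairs like p and q).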
We examine the use of character image analysis coupled with contextual information in complex data gathering forms to identify and correct optical character recognition (OCR) system rejection and substitution errors. Segmented characters from a complex data gathering form are initially classified using an OCR engine based on a combination of Karhunen-Loeve transforms and a back-propagation neural network. Systems of equations are derived from the data gathering form to determine the values of characters rejected by the OCR engine and to verify the consistency of the data captured. If the OCR results for a single form are determined to be inconsistent with respect to the form's data relationships, a set of decision algorithms which incorporates a second neural network and uses additional character features is used to tag characters according to their likelihood of substitution error. Potential substitution errors are incrementally added to the set of OCR reject errors and are processed through dynamically selected systems of equations and search techniques which correct both error classes. We provide experimental results and determine the extent to which errors can be detected and corrected for various OCR error rates.
Certain characters are distinguishable from each other only by fine detail, and, therefore, in our method we group those characters into hierarchical categories. When the first classifier assigns a character to one of those categories, the second process, which is called disambiguation, is applied. We actually use two types of disambiguators. In one we look at the skeleton graph in finer detail and in the other we look at the original gray scale data. The first disambiguation process is geared toward resolving ties between the top two choices of the first classifier (pair disambiguation). Each procedure uses the skeleton features matched to the prototypes by the first classifier and then takes a closer look at the geometrical relations between arcs and strokes. Error analysis of the results suggested the need for the re-examination of the gray scale data. For this purpose the gray scale values of skeletons are used to find an accurate threshold in order to extract contours. Interpretations whose characteristics do not align well with the features measured from the contours are eliminated progressively. Experiments conducted with this method on address blocks supplied by the USPS indicated that the overall performance was substantially improved.
We propose a structural pattern recognition based OCR system for printed text recognition. The segmentation algorithm is raster-scan based, which greatly reduces memory access time. The feature extraction algorithm is a chain-code-based graph traversal algorithm. It has the advantage of a single memory access per pixel, while most graph traversal algorithms require multiple scans of the entire image. While traversing a thinned character, structural features are automatically extracted. Classification is a tree search in the structural feature space created during the training process. The segmentation, thinning, and feature extraction algorithms are all raster-scan based and can be implemented on a parallel systolic array architecture.
Traditionally, the problem of cursive script recognition has been handled in two fundamental ways: one based on the global approach and the other on the segmentation approach. In this paper, we present an inexact segmentation approach that segments the word into letters or even strokes. Our algorithm searches for high-curvature points along the lower contour of the image profile. These points are then treated as segmentation points and marked as potential letter boundaries. Using the histogram profile, an initial estimate of the word length is made; this estimate may be adjusted later. Dynamic programming is then used as a general optimization technique to produce a list of possible candidates of the same word length. If the scores of these candidates do not meet the recognition criteria, the word length is re-estimated and another set of candidates is generated. Finally, an entropy-based measure is provided for comparison among candidates of different word lengths. In our system, a contextual postprocessor can easily be added to further improve the recognition rate. Here we have the additional advantage that a large number of candidates of variable word length are available for selection, and even words with unrecognized letters may be taken into consideration. Experiments evaluating the performance of the algorithm are also reported.
In this paper we present the hardware design and characteristics of a non-conventional text reading system. By non-conventional, we mean a text reading system that is able to scan, detect, extract, and recognize text characters directly from the `page' without sending strings of 0s and 1s to a host computer (which would use software techniques to recognize the text characters). The non-conventional system is also called a `character-driven' OCR system. The system proposed here consists of six main parts: a raster scanner unit, the character pre-processing circuit, the FCC generation unit, the line-generator recognizer unit, the graph generator unit, and the matching processor unit. In particular, the scanner array (GRAMMA) consists of 12 × 9 photo-cells and is driven by a 3-D motion mechanism to focus on and extract the typed text. The GRAMMA array produces a bit map of the character, which is fed to the pre-processing circuit; this applies curve fitting and thinning techniques to produce a normalized skeleton image of the character, which is fed to the FCC generator. The FCC generator produces a generic code string for the character, which is passed to the line-generator recognizer unit; this breaks the FCC string into straight-line and curve segments. The graph generator produces the interrelations between the different segments. The graph form of the character is then fed to the matching processor, which produces the appropriate ASCII code for the character. The non-conventional system can recognize typed as well as handwritten characters of different fonts and sizes.
NIST needed a large set of segmented characters for use as a test set for the First Census Optical Character Recognition (OCR) Systems Conference. A machine-assisted human classification system was developed to expedite the process. The testing set consists of 58,000 digits and 10,000 upper and lower case characters entered on forms by high school students and is distributed as Testdata 1. A machine system was able to recognize a majority of the characters, but all system decisions required human verification. The NIST recognition system was augmented with human verification to produce the testing database. This augmented system consists of several parts: the recognition system, a checking pass, a correcting pass, and a clean-up pass. The recognition system was developed at NIST. The checking pass verifies that an image is in the correct class. The correcting pass allows classes to be changed. The clean-up pass forces the system to stabilize by making every image either accepted with a verified classification or rejected. In developing the testing set we discovered that segmented characters can be ambiguous even without context bias. This ambiguity can be caused by over-segmentation or by the way a person writes. For instance, it is possible to create four ambiguous characters to represent all ten digits. This means that a quoted accuracy rate for a set of segmented characters is meaningless without reference to human performance on the same set of characters. This is different from the case of isolated fields, where most of the ambiguity can be overcome by using context which is available in the non-segmented image. For instance, in the First Census OCR Conference, one system achieved a forced decision error rate for digits of 1.6% while 21 other systems achieved error rates of 3.2% to 5.1%. Such figures cannot be evaluated until human performance on the same set of characters, presented one at a time without context, has been measured.
The accuracies of OCR systems have increased in recent years due to improvements in pre-processing methods and recognition algorithms. It has been suggested that even higher accuracy can be attained by integrating the results of two or more OCR systems with uncorrelated errors. There are several methods for integrating outputs, some of which have already been published. However, prior to integration, the individual characters or symbols from the various OCR outputs have to be synchronously tracked in order to compare them. Whereas it is simple to determine character correspondence among strings containing only substitution errors, the matching of strings of unequal length, which result when an OCR system generates insertion and deletion errors, is more complicated. Detecting loss of synchronization is made more difficult when consecutive errors occur. The length of the error burst must be determined or upper-bounded before the error can be classified or synchronicity restored. This paper focuses on the tracking problem, and uses a dynamic programming search in n dimensions, where n is the number of OCR systems. The algorithm models the error generation process at each of the OCR systems and looks for the most probable combination of synchronized OCR outputs from the beginning to the end of all strings. The final output of the process (which can be the input to an integrator) is a series of n-tuples, each one containing exactly one output character (including nulls or deletions) from an OCR system.
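The n = 2 special case of this synchronization problem can be sketched with an ordinary edit-distance dynamic program. This is an illustration of the output format (tuples containing one character or a null per system), not the paper's probabilistic n-dimensional search; the unit costs and the function name are assumptions.

```python
def align(a, b):
    """Pair the characters of two OCR output strings by a minimum-
    edit-cost dynamic program, emitting None (a null) on the side
    that suffered an insertion or deletion."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0:
                cost[i][j] = j
            elif j == 0:
                cost[i][j] = i
            else:
                sub = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
                cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # trace back to recover the sequence of (a_char, b_char) tuples
    pairs, i, j = [], n, m
    while i or j:
        if i and j and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((a[i - 1], None)); i -= 1
        else:
            pairs.append((None, b[j - 1])); j -= 1
    return pairs[::-1]
```

The paper generalizes this to n strings at once and replaces the unit costs with a model of each system's error generation process.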
A fast lexicon search based on a trie is presented. Traditionally, a successful trie retrieval requires each element of the input string to find an exact match on a particular path in the trie. However, this approach cannot verify a garbled input string, which is generated due to imperfect character segmentation and recognition by an OCR. The string may contain multiple candidates for a character position, or no candidate due to segmentation error. The proposed representation scheme, an extension of the trie, allows lexicon look-up even when only some of the string elements are specified. An input string is represented as a regular expression, and all the paths in the trie that satisfy the expression are considered word candidates. The candidates can then be ranked based on character classifier decisions. Confusion in character recognition can be handled by allowing an expression component to have more than one character candidate, while segmentation error can be overcome by postulating that a word region contains a certain number of unknown characters. Three extensions have been made to trie data structures for position-independent access and fast exhaustive search of all valid paths: (1) bidirectional linkage between two nodes at adjacent levels to enable trie traversal in either direction, (2) nodes with the same letter at the same word position are linked, so that all the words which have the same letter at a specific position can be located immediately, and (3) an index table which records the entry points of letters at specific word positions in the trie, in order to facilitate trie access at an arbitrary letter position. This representation scheme has been tested on postal address field recognition. The trie represents 11,157 words; the average access time is 0.02 sec on a SUN SPARC2 with an average of 3 candidates.
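The core idea of matching a pattern with per-position candidate sets against a trie can be sketched as follows. This simplified version supports only multi-candidate positions and an unknown-character wildcard; the paper's three structural extensions (bidirectional links, same-letter links, and the index table for position-independent access) are not shown, and the function names are assumptions.

```python
def build_trie(words):
    """Nested-dict trie; '$' marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = True
    return root

def search(node, pattern, prefix=""):
    """Match a pattern against the trie.  Each pattern position is
    either a string of candidate characters (recognition confusion)
    or '?' for a completely unknown character (segmentation error).
    Returns all lexicon words satisfying the pattern."""
    if not pattern:
        return [prefix] if '$' in node else []
    first, rest = pattern[0], pattern[1:]
    out = []
    for ch, child in node.items():
        if ch != '$' and (first == '?' or ch in first):
            out += search(child, rest, prefix + ch)
    return out
```

Each returned path is a word candidate, which would then be ranked by the character classifier's decisions as the abstract describes.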
Robust recognition of machine-printed characters may involve several independent methods. For instance, available commercial PC-oriented software packages for optical character recognition are able to achieve high performance, each of them on a specific data set. Assuming that each of the underlying approaches to character recognition provides at least a partial ordering of the available templates for a given character image, it seems possible to integrate these different approaches so that the integrated system outperforms the individual systems. Classification of patterns according to different criteria may lead to conflicting individual decisions. This paper discusses the use of the concept of Pareto-optimality to resolve these conflicts. It is proposed to eliminate the inferior candidates (i.e., the candidates that have better alternatives with regard to all of the criteria). The remaining candidates are noninferior, or Pareto-optimal. The Pareto selection mechanism reacts to ranking contradictions by extending the set of solutions. In that way, it decreases the likelihood of eliminating the correct solution (in statistical terms, of committing an error of the first kind). The likelihood of adopting an incorrect solution (committing an error of the second kind) increases with the cardinality of the Pareto set, and hence it increases as the consistency of the ranking criteria decreases. When the consistency of the multicriteria rankings is satisfactory, additional information is sometimes needed to choose a unique solution among the elements of the Pareto set.
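The Pareto selection step described above can be sketched directly. This is a generic noninferior-candidate filter, not the paper's full integration system; the score representation (one tuple of per-criterion scores per candidate, higher is better) is an assumption.

```python
def pareto_candidates(scores):
    """Keep the noninferior (Pareto-optimal) candidates.  A candidate is
    eliminated only if some other candidate is at least as good under
    every criterion and strictly better under at least one.
    `scores` maps candidate -> tuple of per-criterion scores."""
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [c for c, s in scores.items()
            if not any(dominates(t, s) for d, t in scores.items() if d != c)]
```

When two recognizers rank the candidates inconsistently, neither candidate dominates the other, so both survive; this is how the mechanism "extends the set of solutions" rather than risk discarding the correct one.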
We propose a new post-processing method for character recognition that uses pattern features and linguistic information. The method corrects recognition errors in handwritten Japanese sentences containing Kanji characters, and is characterized by employing two types of character recognition. Improving the character recognition rate for Japanese is made difficult by the large number of characters and the existence of characters with similar patterns, so it is not practical for a recognition system to examine every character in detail. First, the method generates a candidate character table by recognizing only the simplest features of each character. It then selects words matching the candidate character table by referring to word and grammar dictionaries, and chooses the most suitable words. If the correct character is included in the candidate character table, this process can correct an error; if it is not included, the error cannot be corrected this way. Therefore, the method uses linguistic information (the word and grammar dictionaries) to hypothesize characters that are missing from the candidate character table, and then verifies each hypothesized character by character recognition using complex features. When applied to an online character recognition system, this method improved recognition accuracy from 93.5% to 94.7% on editorials from a Japanese newspaper (Asahi Shinbun).
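As a toy illustration of the first stage (our own sketch, with Latin letters standing in for Kanji), selecting dictionary words consistent with a candidate character table might look like this:

```python
def words_from_candidates(candidate_table, dictionary):
    """candidate_table: one list of candidate characters per position,
    as produced by a coarse (simple-feature) recognizer.
    Returns the dictionary words consistent with every position."""
    n = len(candidate_table)
    return [w for w in dictionary
            if len(w) == n and all(w[i] in candidate_table[i] for i in range(n))]
```

A word surviving this filter but containing a character *not* proposed by the coarse recognizer is exactly the case where the paper's second, complex-feature recognition pass would be invoked for verification.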
At the first Census Optical Character Recognition Systems Conference, NIST generated accuracy data for the participating character recognition systems. Most systems were tested on the recognition of isolated digits and upper and lower case alphabetic characters. The recognition experiments were performed on sample sizes of 58,000 digits and 12,000 upper and lower case alphabetic characters. The algorithms used by the 26 conference participants included rule-based methods, image-based methods, statistical methods, and neural networks. The neural network methods included Multi-Layer Perceptrons, Learning Vector Quantization, Neocognitrons, and cascaded neural networks. In this paper 11 different systems are compared using correlations between the answers of different systems, by comparing the decrease in error rate as a function of confidence of recognition, and by comparing the writer dependence of recognition. This comparison shows that methods using different algorithms for feature extraction and recognition performed with very high levels of correlation. This is true for neural network systems, hybrid systems, and statistically based systems, and leads to the conclusion that neural networks have not yet demonstrated a clear superiority over more conventional statistical methods. Comparison of these results with the models of Vapnik (for estimation problems), MacKay (for Bayesian statistical models), Moody (for effective parameterization), and Boltzmann models (for information content) demonstrates that as the limits of training data variance are approached, all classifier systems have similar statistical properties. The limiting condition can only be approached for sufficiently rich feature sets, because the accuracy limit is controlled by the available information content of the training set, which must pass through the feature extraction process prior to classification.
Over twenty-five organizations participating in the First Census OCR Systems Conference submitted confidence data as well as character classification data for the digit test in that conference. A three-parameter function of the rejection rate r was fit to the error-rate-versus-rejection-rate data derived from these submissions, and found to fit very well over the range from r = 0 to r = 0.15. The probability distribution underlying the model e(r) curve is derived and shown to correspond to an inherently inefficient rejection process. With only a few exceptions that appear insignificant, all of the organizations submitting data for scoring seem to employ this same rejection process, with remarkable uniformity of efficiency relative to the maximum efficiency this process allows. Two measures of rejection efficiency are derived, and a practical definition of ideal OCR performance in the classification of segmented characters is proposed. Perfect rejection is shown to be achievable, but in most practical situations only at the cost of reduced classification accuracy. Human classification of a subset of the digit test suggests that there is considerable room for improvement in machine OCR before performance at the level of the proposed ideal is achieved.
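The abstract does not give the three-parameter model itself, but the underlying error-versus-rejection data it is fit to can be computed directly from classifier confidences, as in this sketch (function name is ours): reject the lowest-confidence fraction r of the samples and measure the error rate on the remainder.

```python
def error_rejection_curve(confidences, correct, reject_fractions):
    """confidences: per-sample classifier confidence scores.
    correct: per-sample booleans (True = classified correctly).
    Returns (r, e(r)) pairs: error rate after rejecting the
    lowest-confidence fraction r of the samples."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n = len(order)
    curve = []
    for r in reject_fractions:
        kept = order[int(round(r * n)):]          # drop lowest-confidence tail
        errs = sum(1 for i in kept if not correct[i])
        curve.append((r, errs / len(kept) if kept else 0.0))
    return curve
```

On data where errors concentrate at low confidence, e(r) falls steeply as r grows; a classifier whose confidences are uninformative yields a flat curve, which is one way to visualize the rejection inefficiency the paper quantifies.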
Thinning is usually regarded as a process of deleting boundary pixels of a character pattern until all strokes are one pixel in width, without deforming the original stroke configuration and connectivity. If an OCR system uses deformed skeletons as recognition features, a `T' may erroneously be recognized as a `Y' or `r.' In order to preserve original stroke features, we propose that global attributes be considered in the thinning procedure. In this paper, we present a new 3 × 3 window-based binary thinning method that considers both local and global attributes in each thinning iteration, and we have designed and implemented a fast thinning algorithm incorporating these two attributes. This algorithm can (1) prevent excessive removal of pixels at the junction of two strokes or at the end of a stroke, which causes Y-shaped or shortened skeletons, and (2) detect and remove noise consisting of one or two pixels standing on the surface of a stroke, which usually produces spiky skeletons in most previously proposed thinning algorithms. Experimental results show that our thinning method preserves precise skeleton features of the original character patterns.
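The paper's own algorithm is not spelled out in the abstract. For reference, the classical Zhang-Suen 3 × 3 thinning iteration that such methods improve upon can be sketched as follows; it is known to shorten stroke ends, one of the artifacts the authors address. (This is a baseline sketch, not the paper's method; it assumes a binary image with a zero border.)

```python
def zhang_suen_thin(img):
    """Classical Zhang-Suen thinning (baseline, not the paper's algorithm).
    img: list of lists of 0/1 with an all-zero border. Returns a thinned copy."""
    img = [row[:] for row in img]
    H, W = len(img), len(img[0])

    def neighbors(r, c):
        # P2..P9, clockwise from the pixel directly above
        return [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
                img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):                  # two sub-iterations per pass
            to_delete = []
            for r in range(1, H - 1):
                for c in range(1, W - 1):
                    if img[r][c] != 1:
                        continue
                    P = neighbors(r, c)
                    B = sum(P)               # number of foreground neighbors
                    A = sum(1 for k in range(8)
                            if P[k] == 0 and P[(k+1) % 8] == 1)  # 0->1 transitions
                    if not (2 <= B <= 6 and A == 1):
                        continue
                    P2, P4, P6, P8 = P[0], P[2], P[4], P[6]
                    if step == 0:
                        ok = P2*P4*P6 == 0 and P4*P6*P8 == 0
                    else:
                        ok = P2*P4*P8 == 0 and P2*P6*P8 == 0
                    if ok:
                        to_delete.append((r, c))
            for r, c in to_delete:           # delete simultaneously per sub-step
                img[r][c] = 0
                changed = True
    return img
```

Run on a 3-pixel-thick bar, this baseline produces a skeleton that is a strict subset of the stroke but noticeably shorter than it, which is exactly the kind of deformation the proposed junction- and endpoint-aware conditions are designed to prevent.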
An automatic registration method for graph images that recognizes the structure of graphs is proposed. Compared with the manual registration of graph types and labels as search indexes in electronic filing systems, this method saves cost and time. The method is implemented in an advanced document image retrieval system, and the efficiency of the index registration is ascertained by experiment. The method proceeds as follows. First, the types of graphs, the axes, and the axis labels are detected; the graphs are classified into several categories, and labels are attached to the axes. Next, the types are registered as indexes of the graphs, and the label images are recognized, converted into ASCII data, and also stored as indexes. An automaton model is used in the label-attachment process to analyze the components of the images; it refers to a knowledge base of stored layout-structure rules and chooses appropriate labels for each axis.
The quality of reference databases for optical character recognition is vital to the meaningful assessment of classification algorithms. NIST has produced two databases of segmented handprinted characters obtained from socially distinct writer populations. Two approaches to the comparison of the databases are described. The first uses the eigenvalue spectrum of the covariance matrix as an a priori measure of the variance intrinsic to the data. The second cross-validates the datasets, using classification error to quantify the difficulty of OCR. The eigenvalue spectra of the training partitions are generated during the computation of the Karhunen-Loève transforms, whose leading components are used as prototype features for a classifier. The eigenspectra are used to quantify the diversity of the character sets, and the Bhattacharyya distance is used to measure class separability. The digits, uppercase, and lowercase characters from the two populations of 500 writers are partitioned into N disjoint sets. Each such set, after KL transformation, is used in turn for testing, while the remaining N-1 sets form the training prototypes for a PNN nearest-neighbor classifier. Recognition error rates and their variances are calculated over the N partitions for each database independently; this quantifies intra-database diversity. The inter-database results, or `cross' terms, obtained by training and testing on different databases, indicate the generality of the training set. The results for digits suggest that the second NIST database (used nominally for testing) is significantly harder than the first (training) set; the testing images are 11% more variant. The NIST training data classifies partitions of itself with 1.7% error, and the test set with 6.8% error. Conversely, the test set generalizes to both itself and the training data with 3.5% error. This effect has also been reported using non-NIST classifiers.
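For intuition about the first comparison, the eigenvalue spectrum of a covariance matrix as a measure of intrinsic variance can be sketched in two dimensions, where the 2 × 2 eigenvalues have a closed form (function names are ours; the actual databases use high-dimensional pixel vectors):

```python
import math

def covariance_2d(data):
    """data: list of (x, y) points. Returns (sxx, sxy, syy) of the
    2x2 covariance matrix."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / n
    syy = sum((p[1] - my) ** 2 for p in data) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / n
    return sxx, sxy, syy

def eigenspectrum_2d(data):
    """Eigenvalues of the 2x2 covariance matrix, largest first.
    A spectrum concentrated in the leading value means the data lie
    near a line; a flatter spectrum means greater intrinsic diversity."""
    sxx, sxy, syy = covariance_2d(data)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    d = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return ((tr + d) / 2, (tr - d) / 2)
```

A dataset whose eigenspectrum decays more slowly needs more KL components to capture the same variance, which is one way to read the finding that the second NIST database is "11% more variant" than the first.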