Machine perception and recognition of handwritten text in any language is a difficult problem. Even for Latin script, most solutions are restricted to specific domains, such as courtesy amount recognition on bank checks. Arabic script presents additional challenges for handwriting recognition systems due to its highly connected nature, the numerous forms of each letter, and other factors. In this paper we address the problem of offline Arabic handwriting recognition of pre-segmented words. Rather than focusing on a single classification approach and trying to perfect it, we propose to combine heterogeneous classification methodologies. We evaluate our system on the IFN/ENIT corpus of Tunisian village and town names and demonstrate that the combined approach yields better results than any of the individual classifiers.
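As an illustration of what combining heterogeneous classifiers can look like, the following is a minimal sketch of weighted sum-rule score fusion over a shared lexicon. It is not the combination scheme evaluated in the paper, which the abstract does not specify. The function name `combine_scores`, the per-classifier score dictionaries, and the weights are all assumptions made for this example.

```python
from collections import defaultdict


def combine_scores(classifier_outputs, weights=None):
    """Fuse per-classifier word scores with a weighted sum rule.

    classifier_outputs: list of dicts mapping a lexicon word to a
        non-negative score (hypothetical classifier outputs).
    weights: optional list of per-classifier weights (assumed, not
        taken from the paper); defaults to equal weights.
    Returns a list of (word, fused_score) sorted best first.
    """
    if weights is None:
        weights = [1.0] * len(classifier_outputs)
    if len(weights) != len(classifier_outputs):
        raise ValueError("one weight per classifier is required")

    fused = defaultdict(float)
    for weight, scores in zip(weights, classifier_outputs):
        total = sum(scores.values())
        if total <= 0:
            # A classifier with no usable scores contributes nothing.
            continue
        for word, score in scores.items():
            # Normalise so classifiers with different score ranges
            # become comparable before they are summed.
            fused[word] += weight * score / total

    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With two hypothetical classifiers that disagree on the top candidate, the fused ranking can favour a word that neither ranks poorly, which is the intuition behind combining classifiers whose errors differ.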
Although modern OCR technology is capable of handling a wide variety of document images, there is no single
OCR engine that performs equally well on all documents, even within a single language and script. Each OCR
engine has its strengths and weaknesses, so different engines tend to differ in accuracy across
documents and in the errors they make on the same document image. While the idea of using multiple OCR engines
to boost output accuracy is not new, most of the existing systems do not go beyond variations on majority
voting. This approach may work well in many cases, but it has limitations, especially when the OCR technology
used to process a given script has not yet fully matured. Our goal is to develop a system called MEMOE (for
"Multi-Evidence Multi-OCR-Engine") that combines, in an optimal or near-optimal way, output streams of
one or more OCR engines together with various types of evidence extracted from these streams as well as from
original document images, to produce output of higher quality than that of the individual OCR engines, or of
majority voting applied to multiple OCR output streams. Furthermore, we aim to improve the accuracy of OCR
output on images whose accuracy would otherwise be low enough to significantly impair downstream processing.
The MEMOE system functions as an OCR engine taking document images and some configuration parameters
as input and producing a single output text stream. In this paper, we describe the design of the system, the various
evidence types, and how they are incorporated into MEMOE in the form of filters. Results of initial tests that
involve two corpora of Arabic documents show that, even in its initial configuration, the system is superior to a
voting algorithm and that even more improvement may be achieved by incorporating additional evidence types
into the system.
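For reference, the majority-voting baseline against which MEMOE is compared can be sketched as follows. This is a simplified illustration, not the voting algorithm used in the tests: it assumes the OCR output streams have already been aligned token by token and have equal length, and it breaks ties in favour of the earliest engine. The function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(streams):
    """Position-wise majority vote over pre-aligned OCR token streams.

    streams: list of token lists, one per OCR engine. Alignment is
        assumed to have been done beforehand (an assumption of this
        sketch; real OCR outputs usually need an alignment step).
    Ties are resolved in favour of the engine listed first.
    """
    if not streams:
        return []
    length = len(streams[0])
    if any(len(stream) != length for stream in streams):
        raise ValueError("streams must be aligned to equal length")

    voted = []
    for tokens in zip(*streams):
        counts = Counter(tokens)
        best = max(counts.values())
        # Pick the first token, in engine order, with the top count.
        voted.append(next(t for t in tokens if counts[t] == best))
    return voted
```

A scheme like this uses only the agreement between engines. It has no access to other evidence from the output streams or the original images, which is the limitation the filter-based design described above is meant to address.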
This paper describes new capabilities of ImageRefiner, an automatic image enhancement system based on machine learning (ML). ImageRefiner was initially designed as a pre-OCR cleanup filter for bitonal (black-and-white) document images. Using a single neural network, ImageRefiner learned which image enhancement transformations (filters) were best suited for a given document image and a given OCR engine, based on various image measurements (characteristics). The new release improves ImageRefiner in three major ways. First, to process grayscale document images, we have included three grayscale filters based on smart thresholding and noise filtering, as well as five image characteristics that are all byproducts of various thresholding techniques. Second, we have implemented additional ML algorithms, including a neural network ensemble and several "all-pairs" classifiers. Third, we have introduced a measure that evaluates the overall performance of the system in terms of cumulative improvement in OCR accuracy. Our experiments indicate that OCR accuracy on enhanced grayscale images is higher than on both the original grayscale images and the corresponding bitonal images obtained by scanning the same documents. We have also observed that the system's performance may suffer when document characteristics are correlated.
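The "all-pairs" classifiers mentioned above follow a general pattern that can be sketched briefly: one binary decider is built for each pair of candidate classes, and the class that wins the most pairwise contests is chosen. The sketch below shows only this voting step. It is not ImageRefiner's implementation; the function name `all_pairs_predict` and the representation of the pairwise deciders as plain callables are assumptions made for illustration.

```python
from itertools import combinations


def all_pairs_predict(classes, pairwise, features):
    """Choose a class by all-pairs (one-vs-one) voting.

    classes: ordered list of class labels, e.g. candidate filters.
    pairwise: dict mapping each pair (a, b), in the order produced by
        itertools.combinations(classes, 2), to a callable that takes
        the features and returns either a or b (hypothetical deciders).
    features: image characteristics in whatever form the deciders expect.
    Ties are resolved in favour of the class listed first.
    """
    votes = {label: 0 for label in classes}
    for a, b in combinations(classes, 2):
        winner = pairwise[(a, b)](features)
        if winner not in (a, b):
            raise ValueError(f"decider for {(a, b)} returned {winner!r}")
        votes[winner] += 1

    best = max(votes.values())
    return next(label for label in classes if votes[label] == best)
```

In a setting like the one described, the classes would be the available enhancement filters and the features would be the measured image characteristics, with each pairwise decider trained separately.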