In this paper, we present an Arabic handwriting recognition method based on recurrent neural networks. We use the Long Short-Term Memory (LSTM) architecture, which has proven successful in various printed and handwritten OCR tasks. Applications of LSTM to handwriting recognition employ the two-dimensional architecture to deal with variations along both the vertical and horizontal axes. However, we show that by using a simple pre-processing step that normalizes the position and baseline of letters, we can use a 1D LSTM, which is faster to train and to converge, and still achieve superior performance. In a series of experiments on the IFN/ENIT database for Arabic handwriting recognition, we demonstrate that our proposed pipeline can outperform 2D LSTM networks. Furthermore, we compare against 1D LSTM networks trained with manually crafted features to show that the features learned automatically by a globally trained 1D LSTM network with our normalization step can even outperform such systems.
This work proposes several approaches for generating correspondences between real scanned
books and their transcriptions, which may contain modifications and layout variations, while also taking OCR
errors into account. Our approaches for aligning the scanned pages with the transcription are based on
weighted finite-state transducers (WFSTs). In particular, we propose adapted WFSTs to represent the transcription
to be aligned with the OCR lattices. The character-level alignment uses edit rules that allow edit operations
(insertion, deletion, substitution). These edit operations allow the transcription model to deal with OCR segmentation
and recognition errors, as well as with aligning different text editions. We implemented
an alignment model that includes a hyphenation model, so that it can adapt to the non-hyphenated transcription. Our models
also handle Fraktur ligatures, which are typically found in historical Fraktur documents. We evaluated our
approach on Fraktur documents from the "Wanderungen durch die Mark Brandenburg" volumes (1862-1889) and
observed the performance of these models under OCR errors. We compare the performance of our model in
four different scenarios: having no information about the correspondence at the word (i), line (ii), sentence (iii)
or page (iv) level.
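The insertion, deletion, and substitution operations described above can be illustrated without a WFST toolkit. The sketch below is not the authors' transducer composition: it swaps in a plain dynamic-programming (Levenshtein) alignment between an OCR string and its transcription, assumes unit cost for every edit, and uses a hypothetical function name `align`.

```python
def align(ocr, truth):
    """Align an OCR string with a transcription using unit-cost edits.

    Returns (cost, ops), where each op is one of
    ('match', a, b), ('sub', a, b), ('ocr_only', a, None), ('truth_only', None, b).
    """
    n, m = len(ocr), len(truth)
    # cost[i][j] = minimal number of edits aligning ocr[:i] with truth[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ocr[i - 1] != truth[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back one optimal path, preferring diagonal (match/substitution) steps.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ocr[i - 1] != truth[j - 1]):
            kind = "match" if ocr[i - 1] == truth[j - 1] else "sub"
            ops.append((kind, ocr[i - 1], truth[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("ocr_only", ocr[i - 1], None))  # extra character in the OCR output
            i -= 1
        else:
            ops.append(("truth_only", None, truth[j - 1]))  # character the OCR missed
            j -= 1
    ops.reverse()
    return cost[n][m], ops
```

For example, `align("Branden-burg", "Brandenburg")` has cost 1, and its only non-match operation is the extra hyphen. A hyphenation model is meant to absorb exactly this kind of difference.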
Symbol spotting is important for the automatic interpretation of technical line drawings. Current spotting methods
are not reliable enough for such tasks because of their low precision. In this paper, we combine a geometric matching-based
spotting method with an SVM classifier to improve the precision of spotting. In symbol spotting, a
query symbol is to be located within a line drawing. Candidate matches can be found; however, each of them
may be a true or a false match. An SVM classifier is used to identify the false matches. The classifier is trained
on true and false matches of a query symbol. Each match is represented as a vector whose entries describe how well
the query features are matched; these quality measures are obtained via geometric matching. Using this
classification, the average precision of spotting improved from 76.6% to 97.2% on a
database of technical line drawings.
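A minimal sketch of the verification step follows. It assumes that each candidate match has already been summarized as a fixed-length vector of match qualities, and the function names are hypothetical. It trains a linear SVM with a deterministic, full-batch variant of the Pegasos subgradient method in pure Python. The abstract does not state which SVM solver or kernel was used.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss.

    Labels must be +1 (true match) or -1 (false match). The bias is appended as a
    constant feature, so in this sketch it is regularized like the other weights.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    n = len(data)
    w = [0.0] * len(data[0][0])
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # Pegasos step size
        # Examples inside the margin contribute to the hinge-loss subgradient.
        violators = [(x, label) for x, label in data if label * dot(w, x) < 1.0]
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
        for x, label in violators:
            for j, xj in enumerate(x):
                w[j] += eta * label * xj / n
    return w


def is_true_match(w, quality_vector):
    """Keep a candidate match only if it falls on the positive side of the hyperplane."""
    return dot(w, list(quality_vector) + [1.0]) >= 0.0
```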
Symbol retrieval is important for content-based search in digital libraries and for the automatic interpretation of
line drawings. In this work, we present a complete symbol retrieval system. The proposed system has an
off-line content-analysis stage, in which the contents of a database of line drawings are represented as a symbol
index, a compact indexable representation of the database. This representation allows efficient on-line
query retrieval. Within the retrieval system, three methods are presented. First, a feature grouping method
identifies local regions of interest (ROIs) in the drawings; the ROIs found represent symbol parts. Second,
a clustering method based on geometric matching groups similar parts from all the drawings
together, and a symbol index is constructed from the cluster representatives. Finally, the ROIs of a query
symbol are matched to the cluster representatives. The matching symbol parts are retrieved from the clusters,
and spatial verification is performed on them. By using the symbol index, we achieve
a query look-up time that is independent of the database size and depends instead on the size of the symbol index.
The retrieval system achieves higher recall and precision than state-of-the-art methods.
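A toy version of the index shows why matching cost depends on the index rather than on the database. In this sketch, ROIs are assumed to be reduced to fixed-length descriptor tuples compared with Euclidean distance, which stands in for the paper's geometric matching. The class and method names are hypothetical, and spatial verification is left out.

```python
import math
from collections import defaultdict


class SymbolIndex:
    def __init__(self, representatives):
        # cluster_id -> descriptor tuple of the cluster representative
        self.representatives = dict(representatives)
        # cluster_id -> list of (drawing_id, part_id) belonging to that cluster
        self.members = defaultdict(list)

    def add_part(self, cluster_id, drawing_id, part_id):
        """Off-line stage: register a symbol part under its cluster."""
        self.members[cluster_id].append((drawing_id, part_id))

    def lookup(self, query_rois, max_dist):
        """On-line stage: compare each query ROI with the representatives only.

        The matching loop scans the representatives, so its cost grows with the
        index size rather than with the number of drawings in the database.
        """
        hits = []
        for roi in query_rois:
            best = min(self.representatives,
                       key=lambda c: math.dist(roi, self.representatives[c]))
            if math.dist(roi, self.representatives[best]) <= max_dist:
                hits.extend(self.members[best])
        return hits  # candidate parts, still to be spatially verified
```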
Page segmentation into text and non-text elements is an essential preprocessing step before optical character
recognition (OCR). In case of poor segmentation, an OCR classification engine produces garbage
characters due to the presence of non-text elements. This paper describes modifications to the text/non-text
segmentation algorithm presented by Bloomberg [1], which is also available in his open-source Leptonica library [2].
The modifications result in significant improvements and achieve better segmentation accuracy than the original
algorithm on the UW-III, UNLV, and ICDAR 2009 page segmentation competition test images and on circuit diagram datasets.
Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed,
highly readable, and faithful to the original. These goals can theoretically be achieved through OCR and font
recognition, by re-typesetting the document text with the original fonts. However, OCR and font recognition remain
hard problems, and many historical documents use fonts that are not available in digital form. It is desirable
to be able to reconstruct fonts with vector glyphs that approximate the shapes of the letters that form a
font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts.
This permits us to incorporate font information into token-compressed images even when the original fonts are
unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing
and evaluating an algorithm to assign a font to every character within a document. This is a necessary step
to represent a scanned document image with a reconstructed font. Using our evaluation method, we
measured an accuracy of 98.4% for the assignment of letters to candidate fonts in multi-font documents.
In the current study, we examine how letter permutation affects the visual recognition of words in two orthographically
dissimilar languages, Urdu and German. We hypothesize that the recognition or reading of permuted and
non-permuted words are two distinct mental processes, and that people use different strategies for handling
permuted words than for normal words. A comparison of reading behavior between readers of these
languages is also presented. We present our study in the context of dual-route theories of reading and observe
that the dual-route theory is consistent with our hypothesis of a distinction in the underlying cognitive
behavior for reading permuted and non-permuted words. We conducted three lexical decision experiments
to analyze how reading is degraded or otherwise affected by letter permutation. We performed an analysis of variance
(ANOVA), a distribution-free rank test, and a t-test to determine the significance of differences in response
latencies between the two classes of data. Results showed that recognition accuracy for permuted words decreased
by 31% for Urdu and by 11% for German. We also found a considerable difference in reading
behavior between cursive and alphabetic languages, and observed that reading Urdu is comparatively slower
than reading German due to the characteristics of its cursive script.
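The abstract does not say which t-test variant was used. As an illustration only, the sketch below computes Welch's two-sample t statistic and its Welch-Satterthwaite degrees of freedom for two lists of response latencies using Python's `statistics` module. The choice of Welch's test and the function name are assumptions, and converting the statistic to a p-value is left out.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: response latencies of the two conditions (e.g., permuted vs. normal words).
    Does not assume equal variances in the two groups.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    se2a, se2b = va / na, vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2a + se2b)
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df
```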
The wide availability of cheap, high-quality printing techniques makes document forgery an easy task that most people
can accomplish with standard computer and printing hardware. To prevent the use of color laser printers or color copiers
for counterfeiting, e.g., money or other valuable documents, many of these machines print Counterfeit Protection System
(CPS) codes on the page. These small yellow dots encode information about the specific printer and allow the questioned
document examiner, in cooperation with the manufacturers, to track down the printer that was used to generate the document.
However, access to the methods for decoding the tracking dot patterns is restricted. Exact decoding of a tracking pattern
is often unnecessary, as tracking the pattern down to the printer class may be enough. In this paper, we present a method
that detects which CPS pattern class was used in a given document. This can be used to determine the class of printer on which the
document was printed. Evaluation showed an accuracy of up to 91%.
Detecting the correct orientation of document images is an important step in large-scale digitization processes, as most
subsequent document analysis and optical character recognition methods assume that the page is upright.
Many methods have been proposed to solve this problem, most of which are based on computing the ratio of ascenders to descenders.
Unfortunately, this cannot be used for scripts that have neither ascenders nor descenders. Therefore, we present a trainable
method that uses character similarity to compute the correct orientation. A distance measure based on connected components is
computed to compare the characters of the document image with characters whose orientation is known. The orientation
for which this distance is lowest is selected as the correct orientation. Training is easily achieved by replacing the
reference characters with characters of the script to be analyzed. Evaluation of the proposed approach showed an accuracy
above 99% for Latin and Japanese script on the public UW-III and UW-II datasets. An accuracy of 98.9% was obtained
for Fraktur on a non-public dataset. A comparison of the proposed method with two methods based on ascender/descender ratios
shows a significant improvement.
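Here is a minimal sketch of selecting the orientation with the lowest distance to known-upright reference characters. It assumes that characters are already cut out as equal-sized square binary bitmaps. A plain pixel-mismatch (Hamming) distance stands in for the paper's connected-component-based measure, and all function names are hypothetical.

```python
def rot90(glyph):
    """Rotate a square bitmap (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*glyph[::-1])]


def rotate(glyph, quarter_turns):
    for _ in range(quarter_turns % 4):
        glyph = rot90(glyph)
    return glyph


def hamming(a, b):
    """Number of differing pixels between two equal-sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def detect_orientation(components, references):
    """Return the number of clockwise quarter turns that best uprights the page.

    For each candidate rotation, every document character is compared with its
    closest reference character, and the rotation with the lowest total distance wins.
    """
    best_k, best_total = 0, None
    for k in range(4):
        total = sum(min(hamming(rotate(c, k), r) for r in references) for c in components)
        if best_total is None or total < best_total:
            best_k, best_total = k, total
    return best_k
```

Supporting another script in this sketch only means passing a different `references` list. That mirrors the exchange of reference characters described above.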
In large-scale scanning applications, orientation detection of the digitized page is necessary for subsequent
processing steps to work correctly. Several existing methods for orientation detection use the fact that in Roman
script text, ascenders are more likely to occur than descenders. In this paper, we propose a different approach
to page orientation detection that also uses this information. The main advantage of our method is that it is more
accurate than the widely used methods it was compared with, while being independent of scan resolution. Another interesting
aspect of our method is that it can be combined with our previously published skew detection method to
obtain a single-step estimate of both the skew and the orientation of the page image. We demonstrate the effectiveness of our
approach on the UW-I dataset and show that our method achieves an accuracy above 99% on this dataset. We
also show that our method is robust to different scanning resolutions and can reliably detect page orientation
for documents rendered at 150, 200, 300, and 400 dpi.
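The ascender/descender cue can be sketched in a few lines. This is not the paper's algorithm. The sketch assumes that text lines have already been found and that, for each line, the vertical band between the x-line and the baseline is known (in image coordinates, with y growing downward). It only decides between an upright and an upside-down page. The function names and the `tol` parameter are hypothetical.

```python
def count_extents(boxes, band_top, band_bottom, tol=1):
    """Count components reaching above and below a text line's x-height band.

    boxes: list of (top, bottom) pixel rows of the connected components in one line.
    """
    above = sum(1 for top, _ in boxes if top < band_top - tol)
    below = sum(1 for _, bottom in boxes if bottom > band_bottom + tol)
    return above, below


def is_upside_down(lines, tol=1):
    """lines: list of (boxes, band_top, band_bottom) tuples, one per text line."""
    above = below = 0
    for boxes, band_top, band_bottom in lines:
        a, b = count_extents(boxes, band_top, band_bottom, tol)
        above += a
        below += b
    # Roman text has more ascenders than descenders. If more strokes leave the band
    # at the bottom than at the top, the page is probably rotated by 180 degrees.
    return below > above
```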
OCRopus is a new, open source OCR system emphasizing modularity, easy extensibility, and reuse, aimed at both the research community and large scale commercial document conversions. This paper describes the current status of the system, its general architecture, as well as the major algorithms currently being used for layout analysis and text line recognition.
Adaptive binarization is an important first step in many document analysis and OCR processes. This paper
describes a fast adaptive binarization algorithm that yields the same quality of binarization as the Sauvola
method [1] but runs in time close to that of global thresholding methods (like Otsu's method [2]), independent of
the window size. The algorithm combines the statistical constraints of Sauvola's method with integral images [3].
Testing on the UW-1 dataset demonstrates a 20-fold speedup compared to the original Sauvola algorithm.
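The core idea of obtaining Sauvola's local mean and standard deviation in constant time per pixel can be sketched with two integral images, one of pixel values and one of squared values. The sketch uses the common form of Sauvola's threshold, T = m * (1 + k * (s / R - 1)). The defaults k = 0.2 and R = 128 for 8-bit images, the clipping of windows at the image border, and the function names are assumptions, not details from the paper. It is written in pure Python for clarity, so it will not reproduce the reported speed.

```python
import math


def integral_image(img, squared=False):
    """ii[y][x] = sum of img (or img**2) over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            v = img[y][x]
            row_sum += v * v if squared else v
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum over the half-open rectangle [x0, x1) x [y0, y1) with four look-ups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def sauvola_binarize(img, window=15, k=0.2, R=128.0):
    """Return 1 for background (bright) and 0 for foreground (dark) pixels."""
    h, w = len(img), len(img[0])
    s1, s2 = integral_image(img), integral_image(img, squared=True)
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)  # window clipped at the border
            n = (x1 - x0) * (y1 - y0)
            mean = box_sum(s1, x0, y0, x1, y1) / n
            var = max(box_sum(s2, x0, y0, x1, y1) / n - mean * mean, 0.0)
            threshold = mean * (1.0 + k * (math.sqrt(var) / R - 1.0))
            out[y][x] = 1 if img[y][x] > threshold else 0
    return out
```

The per-pixel work in the inner loop is the same for any `window`, which is the property that makes the running time independent of the window size.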
We propose a simple and effective method for the dequantization of color images, effectively interpolating the colors from quantized levels to a continuous range of brightness values. The method is designed to be applied to images that either have undergone a manipulation such as brightness adjustment or are going to be processed in such a way. Such operations often cause noticeable color bands in the images, which can be reduced using the proposed Constrained Diffusion technique. We demonstrate the advantages of our method using synthetic and real-life images as examples. We also present quantitative results using 8-bit data obtained from original 12-bit sensor data, showing substantial gains in PSNR with the proposed method.
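One reading of "constrained diffusion" is to smooth the signal repeatedly while never letting a value leave the quantization interval it was rounded from, so the result stays consistent with the quantized data. The 1-D sketch below follows that reading. The three-tap averaging kernel, the assumption of round-to-nearest quantization with step `step`, the iteration count, and the function name are all assumptions, not details given in the abstract.

```python
def constrained_diffusion(quantized, step=1.0, iterations=200):
    """Smooth a 1-D quantized signal while keeping each sample in its quantization cell.

    Assumes round-to-nearest quantization, so the unknown true value of sample i
    lies in [q_i - step/2, q_i + step/2].
    """
    lo = [q - step / 2 for q in quantized]
    hi = [q + step / 2 for q in quantized]
    v = [float(q) for q in quantized]
    for _ in range(iterations):
        new = v[:]
        for i in range(len(v)):
            left = v[i - 1] if i > 0 else v[i]            # replicate the border sample
            right = v[i + 1] if i < len(v) - 1 else v[i]
            avg = (left + v[i] + right) / 3.0              # one diffusion (smoothing) step
            new[i] = min(max(avg, lo[i]), hi[i])           # project back into the cell
        v = new
    return v
```

The clamp is what distinguishes this from plain blurring: a step between two quantization levels is softened, but no sample can move further than half a quantization step from the value that was actually stored.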
Many document analysis and OCR systems depend on precise identification of page rotation, as well as on the reliable identification of text lines. This paper presents a new algorithm that addresses both problems. It uses a branch-and-bound approach to globally optimal line finding and simultaneously models the baseline and the descender line under a Gaussian error/robust least-squares model. Results of applying the algorithm to documents in the University of Washington Database 2 are presented.
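The following sketch illustrates the branch-and-bound idea. It is deliberately simplified: it finds a single line rather than a baseline/descender pair, and it scores a line by the number of points within distance `eps` instead of using the paper's Gaussian/robust least-squares model. Lines are parameterized as x*cos(theta) + y*sin(theta) = r. The search keeps a priority queue of (theta, r) boxes ordered by an upper bound on the score. The bound is valid because, within a box, a point's signed distance to the line changes by at most hypot(x, y) * |d_theta| + |d_r|. The function names, `eps`, and the stopping tolerance `tol` are assumptions.

```python
import heapq
import math


def count_inliers(points, theta, r, eps):
    c, s = math.cos(theta), math.sin(theta)
    return sum(1 for x, y in points if abs(x * c + y * s - r) <= eps)


def upper_bound(points, box, eps):
    """Upper bound on count_inliers for any (theta, r) inside the box."""
    t0, t1, r0, r1 = box
    tc, rc = (t0 + t1) / 2, (r0 + r1) / 2
    c, s = math.cos(tc), math.sin(tc)
    bound = 0
    for x, y in points:
        slack = math.hypot(x, y) * (t1 - t0) / 2 + (r1 - r0) / 2
        if abs(x * c + y * s - rc) <= eps + slack:
            bound += 1
    return bound


def find_line(points, eps=0.5, tol=0.05):
    """Best-first branch and bound over theta in [0, pi) and r in [-R, R]."""
    norm = max((math.hypot(x, y) for x, y in points), default=1.0) or 1.0
    root = (0.0, math.pi, -norm - eps, norm + eps)
    heap = [(-upper_bound(points, root, eps), 0.0, root)]
    while heap:
        neg_bound, _, box = heapq.heappop(heap)
        t0, t1, r0, r1 = box
        # Small enough: the box center is within tol of the bound's guarantee.
        if norm * (t1 - t0) / 2 + (r1 - r0) / 2 < tol:
            return (t0 + t1) / 2, (r0 + r1) / 2, -neg_bound
        if norm * (t1 - t0) > (r1 - r0):        # split the dimension with more slack
            tm = (t0 + t1) / 2
            children = [(t0, tm, r0, r1), (tm, t1, r0, r1)]
        else:
            rm = (r0 + r1) / 2
            children = [(t0, t1, r0, rm), (t0, t1, rm, r1)]
        for child in children:
            size = (child[1] - child[0]) + (child[3] - child[2])  # prefer small boxes on ties
            heapq.heappush(heap, (-upper_bound(points, child, eps), size, child))
    return None
```

Because a box is only expanded when its bound is at least as high as every other box in the queue, the first sufficiently small box that is popped cannot be beaten by any unexplored region. That is what makes the result globally optimal up to `tol`, rather than a local fit.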
The paper re-examines a well-known technique in OCR, recognition by clustering followed by cryptanalysis, from a Bayesian perspective. The advantage of such techniques is that they are font-independent, but in the past they appear not to have been competitive with other pattern recognition techniques. The analysis presented in this paper suggests an approach to OCR that is based on modeling the sample distribution as a mixture of Gaussians. Results suggest that such an approach may combine the advantages of cluster-based OCR with the performance of traditional classification algorithms.
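The sketch below gives a concrete picture of "modeling the sample distribution as a mixture of Gaussians" by fitting a one-dimensional Gaussian mixture with the expectation-maximization (EM) algorithm. In a cluster-based OCR setting, each component would play the role of one character cluster. Real glyph features are multi-dimensional, and the 1-D setup, the initialization, the variance floor, and the function name are simplifying assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iterations=50, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Spread the initial means evenly over the data range; start with the overall variance.
    means = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: posterior probability (responsibility) of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var)
    return weights, means, variances
```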
QBIC™ (Query By Image Content) is a set of technologies and associated software that allows a user to search, browse, and retrieve image, graphic, and video data from large on-line collections. This paper discusses current research directions of the QBIC project such as indexing for high-dimensional multimedia data, retrieval of gray level images, and storyboard generation suitable for video. It describes aspects of QBIC software including scripting tools, application interfaces, and available GUIs, and gives examples of applications and demonstration systems using it.
Functional programming is a style of programming that avoids side effects (such as assignment) and treats functions as first-class data objects. Compared with imperative programs, functional programs can be parallelized more easily and provide better encapsulation, type checking, and abstraction. This is important for building and integrating large vision software systems. In the past, efficiency has been an obstacle to applying functional programming techniques in computationally intensive areas such as computer vision. We discuss and evaluate several 'functional' data structures for efficiently representing data structures and objects common in computer vision. In particular, we address: automatic storage allocation and reclamation; abstraction of control structures; efficient sequential update of large data structures; representing images as functions; and object-oriented programming. Our experience suggests that functional techniques are feasible for high-performance vision systems and that a functional approach greatly simplifies the implementation and integration of vision systems. Examples in C++ and SML are given.
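The idea of "representing images as functions" can be sketched in Python (the paper's own examples are in C++ and SML). An image is a function from coordinates to a value, and operations such as translation or thresholding build new functions instead of updating pixels in place. Nearest-lower-sample lookup, returning 0 outside the image, and all function names are assumptions of this sketch.

```python
import math
from typing import Callable, List

Image = Callable[[float, float], float]


def from_rows(rows: List[List[float]]) -> Image:
    """Wrap a 2-D array as an image function; samples outside the array are 0."""
    h, w = len(rows), len(rows[0])

    def image(x: float, y: float) -> float:
        xi, yi = math.floor(x), math.floor(y)
        return rows[yi][xi] if 0 <= xi < w and 0 <= yi < h else 0.0

    return image


def translate(image: Image, dx: float, dy: float) -> Image:
    """New image shifted by (dx, dy); the original function is left untouched."""
    return lambda x, y: image(x - dx, y - dy)


def threshold(image: Image, t: float) -> Image:
    return lambda x, y: 1.0 if image(x, y) > t else 0.0


def sample(image: Image, w: int, h: int) -> List[List[float]]:
    """Evaluate the function on an integer grid to get a concrete array back."""
    return [[image(x, y) for x in range(w)] for y in range(h)]
```

A composition such as `threshold(translate(base, 1, 0), 15)` allocates no intermediate arrays. Pixels are computed only when `sample` evaluates the composed function.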