Many preprocessing techniques have been proposed for isolated word recognition. Recently, however, recognition systems have moved to text blocks and the text lines they comprise. In this paper, we propose a new preprocessing approach to efficiently correct baseline skew and fluctuations. Our approach is based on a sliding window within which the vertical position of the baseline is estimated, so that segmentation of text lines into subparts is avoided. Experiments conducted on a large publicly available database (Rimes), with a BLSTM (bidirectional long short-term memory) recurrent neural network recognition system, show that our baseline correction approach substantially improves performance.
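The sliding-window idea above can be sketched as follows. This is a minimal illustration assuming a binarized text-line image; the baseline estimator (weighted mean row of the lower ink mass), the window size, and the per-column shift correction are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def estimate_local_baselines(img, win=64, step=32):
    """Estimate a baseline y-position inside each sliding window.

    img: 2D array, ink = 1, background = 0 (a binarized text line).
    The baseline is approximated here as the weighted mean row of the
    ink mass at or below the window's vertical centroid.
    """
    h, w = img.shape
    centers, baselines = [], []
    for x0 in range(0, max(1, w - win + 1), step):
        rows = img[:, x0:x0 + win].sum(axis=1).astype(float)
        if rows.sum() == 0:
            continue  # empty window: no ink, no estimate
        ys = np.arange(h)
        centroid = (rows * ys).sum() / rows.sum()
        lower = rows.copy()
        lower[ys < centroid] = 0.0  # keep core + descender zone only
        baselines.append((lower * ys).sum() / lower.sum())
        centers.append(x0 + win // 2)
    return np.array(centers), np.array(baselines)

def correct_baseline(img, win=64, step=32):
    """Shift each column vertically so the interpolated baseline is flat."""
    h, w = img.shape
    xs, ys = estimate_local_baselines(img, win, step)
    if len(xs) < 2:
        return img.copy()
    target = ys.mean()
    shifts = np.rint(target - np.interp(np.arange(w), xs, ys)).astype(int)
    out = np.zeros_like(img)
    for x in range(w):
        out[:, x] = np.roll(img[:, x], shifts[x])
    return out
```

Because the baseline position is interpolated between window centers, each column receives its own small vertical shift, which is how cutting the line into hard subparts is avoided.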
Handwriting recognition systems are typically trained using publicly available databases, where data have been collected in controlled conditions (image resolution, paper background, noise level, etc.). Since this is often not the case in real-world scenarios, classification performance can degrade when novel data are presented to the
word recognition system. To overcome this problem, we present in this paper a new approach called database
adaptation. It consists of processing one set (training or test) in order to adapt it to the other set (test or training,
respectively). Specifically, two kinds of preprocessing, namely stroke thickness normalization and pixel intensity normalization, are considered. The advantage of such an approach is that we can re-use the existing recognition
system trained on controlled data. We conduct several experiments with the Rimes 2011 word database and
with a real-world database. We adapt either the test set or the training set. Results show that training set
adaptation achieves better results than test set adaptation, at the cost of a second training stage on the adapted
data. Data set adaptation increases accuracy by 2% to 3% in absolute value over no adaptation.
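One plausible realization of the pixel intensity normalization mentioned above is an affine remap of gray levels toward the reference set's statistics. This is a hedged sketch, not the paper's exact procedure; the target mean/std would be measured on the set being adapted to:

```python
import numpy as np

def normalize_intensity(img, target_mean, target_std, eps=1e-6):
    """Affine gray-level remap: give `img` the reference set's
    mean/std intensity statistics (one plausible realization of
    intensity normalization, used here purely for illustration).
    """
    img = img.astype(float)
    mu, sigma = img.mean(), img.std()
    out = (img - mu) / (sigma + eps) * target_std + target_mean
    return np.clip(out, 0, 255).astype(np.uint8)
```

For training-set adaptation, every training image would be remapped toward the test set's statistics before the second training stage; for test-set adaptation, the remap runs the other way at recognition time.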
This paper presents a system for the recognition of unconstrained handwritten mails. The main part of this
system is an HMM recognizer which uses trigraphs to model contextual information. This recognition system
does not require any segmentation into words or characters and directly works at line level. To take into account
linguistic information and enhance performance, a language model is introduced. This language model is based
on bigrams and built from training document transcriptions only. Different experiments with various vocabulary
sizes and language models have been conducted. Word Error Rate and Perplexity values are compared to show the benefit of specific language models fitted to the handwritten mail recognition task.
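The bigram model and perplexity measure used above are standard; a minimal sketch (with add-one smoothing, an assumption made here only to keep the example self-contained) looks like this:

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Add-one-smoothed bigram LM built from training transcriptions only."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])          # history counts
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)
    def prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)
    return prob

def perplexity(prob, sentences):
    """PPL = exp(-(1/N) * sum of log P(w_i | w_{i-1}))."""
    logp, n = 0.0, 0
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        for w1, w2 in zip(toks, toks[1:]):
            logp += math.log(prob(w1, w2))
            n += 1
    return math.exp(-logp / n)
```

A lower perplexity on held-out mail transcriptions indicates a language model better fitted to the task, which is exactly the comparison the abstract describes.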
We present in this paper an HMM-based recognizer for the recognition of unconstrained Arabic handwritten words.
The recognizer is a context-dependent HMM which considers variable topology and contextual information for a better modeling of writing units.
We propose an algorithm to adapt the topology of each HMM to the character to be modeled.
For modeling the contextual units, a state-tying process based on decision tree clustering is introduced which significantly reduces the number of parameters.
Decision trees are built according to a set of expert-based questions on how characters are written.
Questions are divided into global questions yielding larger clusters and precise questions yielding smaller ones.
We apply this modeling to the recognition of Arabic handwritten words.
Experiments conducted on the OpenHaRT2010 database show that variable-length topology and contextual information significantly improve the recognition rate.
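The decision-tree state-tying process above can be illustrated with a toy top-down split. In real systems each split is chosen by maximizing a training-data likelihood gain; this sketch instead takes the first question that separates the group, and assumes questions are listed from global (large clusters) to precise (small clusters), as in the abstract:

```python
def tie_states(contexts, questions, min_size=2):
    """Top-down clustering of trigraph contexts.

    Recursively split a group with the first question that yields two
    sufficiently large subgroups; contexts left together in a leaf are
    tied, i.e. they share one set of HMM state parameters.
    """
    tied, next_id = {}, 0

    def split(group, qs):
        nonlocal next_id
        for i, q in enumerate(qs):
            yes = [c for c in group if q(c)]
            no = [c for c in group if not q(c)]
            if len(yes) >= min_size and len(no) >= min_size:
                split(yes, qs[i + 1:])
                split(no, qs[i + 1:])
                return
        for c in group:          # leaf: tie all remaining members
            tied[c] = next_id
        next_id += 1

    split(list(contexts), list(questions))
    return tied
```

The number of distinct tied classes is what bounds the parameter count, which is how the tying process significantly reduces the number of parameters to estimate.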
This paper proposes a novel method for document enhancement. The method is based on the combination of
two state-of-the-art filters through the construction of a mask. The mask is applied to a TV (Total Variation)-regularized image where background noise has been reduced. The masked image is then filtered by NL-means (Non-Local Means), which reduces the noise in the text areas located by the mask. The document images to be enhanced
are real historical documents from several periods which include several defects in their background. These defects
result from scanning, paper aging and bleed-through. We observe the improvement of this enhancement method
through OCR accuracy.
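The combination step can be sketched as below. The two filtered inputs would in practice come from standard implementations (e.g. scikit-image's `denoise_tv_chambolle` and `denoise_nl_means`); the mask here is a plain threshold on the TV-regularized image, which is an assumption for illustration since the paper constructs its mask differently:

```python
import numpy as np

def combine_with_mask(tv_img, nlm_img, thresh=128):
    """Fuse two filtered versions of a document image through a text mask.

    Dark pixels of the TV-regularized image are taken as text candidates;
    those areas receive the NL-means result (noise reduced around text),
    while the background keeps the TV result (background noise reduced).
    """
    mask = tv_img < thresh                 # text candidates
    out = np.where(mask, nlm_img, tv_img)
    return out.astype(tv_img.dtype), mask
```

The enhanced image then goes to the OCR engine, so the improvement can be observed directly as OCR accuracy, as the abstract reports.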
The effects of different image pre-processing methods for document image binarization are explored. They are
compared on five different binarization methods on images with bleed through and stains as well as on images
with uniform background speckle. The choice of binarization method significantly affects binarization accuracy, but the pre-processing also plays a significant role. The Total Variation method of pre-processing shows the best
performance over a variety of pre-processing methods.
This paper presents an HMM-based recognizer for the off-line recognition of handwritten words. Word models
are the concatenation of context-dependent character models (trigraphs). The trigraph models we consider are
similar to triphone models in speech recognition, where a character adapts its shape according to its adjacent
characters. Due to the large number of possible context-dependent models to compute, a top-down clustering is
applied on each state position of all models associated with a particular character. This clustering uses decision
trees, based on expert-designed questions about how characters are written. Decision trees have the advantage of modeling untrained trigraphs.
Our system is shown to perform better than a baseline context independent system, and reaches an accuracy
higher than 74% on the publicly available Rimes database.
We investigate in this paper the combination of DBN (Dynamic Bayesian Network) classifiers, either
independent or coupled, for the recognition of degraded characters. The independent classifiers are a
vertical HMM and a horizontal HMM whose observable outputs are the image columns and the image
rows respectively. The coupled classifiers, presented in a previous study, associate the vertical and
horizontal observation streams into single DBNs. The scores of the independent and coupled classifiers
are then combined linearly at the decision level. We compare the different classifiers (independent, coupled, or linearly combined) on two tasks: the recognition of artificially degraded handwritten digits
and the recognition of real degraded old printed characters. Our results show that coupled DBNs
perform better on degraded characters than the linear combination of independent HMM scores. Our
results also show that the best classifier is obtained by linearly combining the scores of the best coupled
DBN and the best independent HMM.
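The decision-level fusion described above reduces to a weighted sum of per-class scores. A minimal sketch (the weights are placeholders; in practice they would be tuned on validation data):

```python
import numpy as np

def combine_scores(score_lists, weights):
    """Linear decision-level combination of several classifiers.

    score_lists: one per-class score vector per classifier
    (e.g. log-likelihoods from the vertical HMM, the horizontal HMM,
    or a coupled DBN). Returns the arg-max class and the fused scores.
    """
    fused = sum(w * np.asarray(s, dtype=float)
                for w, s in zip(weights, score_lists))
    return int(np.argmax(fused)), fused
```

The abstract's best configuration corresponds to calling this with two score vectors: one from the best coupled DBN and one from the best independent HMM.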
In this paper we present a system for the off-line recognition of cursive Arabic handwritten words. This system
is an enhanced version of our reference system presented in [El-Hajj et al., 05], which is based on Hidden Markov
Models (HMMs) and uses a sliding window approach. The enhanced version proposed here uses contextual
character models. This approach is motivated by the fact that the set of Arabic characters includes a lot of ascending
and descending strokes which overlap with one or two neighboring characters. Additional character models are
constructed according to characters in their left or right neighborhood. Our experiments on images of the benchmark
IFN/ENIT database of handwritten villages/towns names show that using contextual character models improves
recognition. For a lexicon of 306 name classes, accuracy is increased by 0.6% in absolute value, which corresponds to a 7.8% relative reduction in error rate.
We investigate in this paper the application of dynamic Bayesian networks (DBNs) to the recognition
of handwritten digits. The main idea is to couple two separate HMMs into various architectures. First,
a vertical HMM and a horizontal HMM are built observing the evolving streams of image columns and
image rows respectively. Then, two coupled architectures are proposed to model interactions between
these two streams and to capture the 2D nature of character images. Experiments performed on the
MNIST handwritten digit database show that coupled architectures yield better recognition performances
than non-coupled ones. Additional experiments conducted on artificially degraded (broken) characters demonstrate that coupled architectures cope better with such degradation than non-coupled ones and than discriminative methods such as SVMs.
A method for extracting names in degraded documents is presented in this article. The documents targeted are images of photocopied scientific journals from various scientific domains. Due to the degradation, OCR accuracy is poor, and pieces of other articles appear at the sides of the image. The proposed approach relies on the combination of a low-level textual analysis and an image-based analysis. The textual analysis extracts robust typographic features, while the image analysis selects image regions of interest through anchor components. We report results on the University of Washington benchmark database.