This PDF file contains the front matter associated with SPIE
Proceedings Volume 7874, including the Title Page, Copyright
information, Table of Contents, Introduction (if any), and the
Conference Committee listing.
The field of Document Recognition is bipolar. At one end lies the excellent work of academic institutions
engaging in original research on scientifically interesting topics. At the other end lies the document recognition
industry, which serves the need for high-volume data capture in transaction and back-office applications. These
realms seldom meet, yet the need is great to address technical hurdles for practical problems using modern
approaches from the Document Recognition, Computer Vision, and Machine Learning disciplines. We reflect on
three categories of problems we have encountered which are both scientifically challenging and of high practical
value. These are Doctype Classification, Functional Role Labeling, and Document Sets. Doctype Classification
asks, "What is the type of page I am looking at?" Functional Role Labeling asks, "What is the status of
text and graphical elements in a model of document structure?" Document Sets asks, "How are pages and
their contents related to one another?" Each of these has ad hoc engineering approaches that provide 40-80%
solutions, and each of them begs for a deeply grounded formulation both to provide understanding and to attain
the remaining 20-60% of practical value. The practical need is not purely technical but also depends on the user
experience in application setup and configuration, and in collection and groundtruthing of sample documents.
The challenge therefore extends beyond the science behind document image recognition and into user interface
and user experience design.
Authors of short papers such as letters or editorials often express complementary opinions, and sometimes contradictory
ones, on related work in previously published articles. The MEDLINE® citations for such short papers are required to
list bibliographic data on these "commented on" articles in a "CON" field. The challenge is to automatically identify the
CON articles referred to by the author of the short paper (called "Comment-in" or CIN paper). Our approach is to use
support vector machines (SVM) to first classify a paper as either a CIN or a regular full-length article (which is exempt
from this requirement), and then to extract from the CIN paper the bibliographic data of the CON articles. The
first part of the problem, identifying CIN articles, is addressed here. We implement and compare the performance of
two types of SVM, one with a linear kernel function and the other with a radial basis kernel function (RBF). Input
feature vectors for the SVMs are created by combining four types of features based on statistics of words in the article
title, words that suggest the article type (letter, correspondence, editorial), size of body text, and cue phrases.
Experiments conducted on a set of online biomedical articles show that the SVM with a linear kernel function yields a
significantly lower false negative error rate than the one with an RBF. Our experiments also show that the SVM with a
linear kernel function achieves a significantly higher level of accuracy, and lower false positive and false negative error
rates by using input feature vectors created by combining all four types of features rather than any single type.
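The four feature types can be concatenated into a single input vector before training. The sketch below shows one plausible encoding; the cue phrases, type words, and title statistics are illustrative stand-ins, not the paper's actual feature definitions.

```python
# Sketch: combining the four feature types into one SVM input vector.
# CUE_PHRASES, TYPE_WORDS and the title statistics are hypothetical.
CUE_PHRASES = ["in reply to", "the authors state", "we disagree"]
TYPE_WORDS = ["letter", "correspondence", "editorial"]

def make_feature_vector(title, article_type_line, body_word_count, body_text):
    title_words = title.lower().split()
    # (1) statistics of words in the article title
    f_title = [len(title_words), sum(w.endswith("?") for w in title_words)]
    # (2) words that suggest the article type
    f_type = [1.0 if w in article_type_line.lower() else 0.0 for w in TYPE_WORDS]
    # (3) size of the body text (CIN papers are typically short)
    f_size = [body_word_count]
    # (4) cue phrases found in the body
    body = body_text.lower()
    f_cue = [1.0 if p in body else 0.0 for p in CUE_PHRASES]
    return f_title + f_type + f_size + f_cue

vec = make_feature_vector(
    "Re: Statin therapy outcomes?", "Letter to the Editor", 350,
    "In reply to the recent article, we disagree with its conclusions.")
print(len(vec))  # 2 + 3 + 1 + 3 = 9 features
```

Either a linear-kernel or an RBF-kernel SVM can then be trained on such vectors; the paper's finding is that the linear kernel fed with all four feature types performed best.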
This paper presents an improvement to a document layout analysis system, offering a possible solution to Sayre's paradox ("a letter must be recognized before it can be segmented; and it must be segmented before it can be recognized"). This improvement, based on stochastic parsing, allows the integration of statistical information, obtained from recognizers, during syntactic layout analysis. We show how this fusion of numeric and symbolic information in a feedback loop can be applied to syntactic methods to simplify document description. To limit combinatorial explosion during the exploration of solutions, we devised an operator that allows optional activation of the stochastic parsing mechanism. Our evaluation on 1250 handwritten business letters shows that this method improves global recognition scores.
We report methodologies for computing high-recall masks for document image content extraction, that is, the
location and segmentation of regions containing handwriting, machine-printed text, photographs, blank space,
etc. The resulting segmentation is pixel-accurate, which accommodates arbitrary zone shapes (not merely rectangles).
We describe experiments showing that iterated classifiers can increase recall of all content types, with
little loss of precision. We also introduce two methodological enhancements: (1) a multi-stage voting rule; and (2)
a scoring policy that treats blank pixels as a "don't care" class with respect to the other content classes. These enhancements
improve both recall and precision, achieving at least 89% recall and at least 87% precision among three content
types: machine-print, handwriting, and photo.
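One plausible reading of the "don't care" policy is that confusions involving blank pixels are simply excluded from the error counts. The sketch below, with illustrative class codes, scores recall and precision for one content class under that assumption.

```python
# Sketch of a "don't care" scoring policy: confusions that involve the
# blank class are excluded from the recall/precision counts.
# Class codes are illustrative.
BLANK = 0  # 1 = machine print, 2 = handwriting, 3 = photo

def recall_precision(gt, pred, cls):
    tp = fn = fp = 0
    for g, p in zip(gt, pred):
        if g == cls:
            if p == cls:
                tp += 1
            elif p != BLANK:           # cls mislabeled as blank: don't care
                fn += 1
        elif g != BLANK and p == cls:  # only non-blank truth can create a false positive
            fp += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

gt   = [1, 1, 1, 2, 2, 0, 3]   # per-pixel ground truth
pred = [1, 1, 0, 2, 1, 3, 3]   # per-pixel classifier output
r, p = recall_precision(gt, pred, 1)
print(r, p)  # 1.0 0.6666666666666666
```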
In this paper, we present a novel method for extracting handwritten and printed text zones from noisy document
images with mixed content. We use Triple-Adjacent-Segment (TAS) based features which encode local shape
characteristics of text in a consistent manner. We first construct two codebooks of the shape features extracted
from a set of handwritten and printed text documents respectively. We then compute the normalized histogram
of codewords for each segmented zone and use it to train a Support Vector Machine (SVM) classifier. The
codebook based approach is robust to the background noise present in the image and TAS features are invariant
to translation, scale and rotation of text. In experiments, we show that a pixel-weighted zone classification
accuracy of 98% can be achieved for noisy Arabic documents. Further, we demonstrate the effectiveness of our
method for document page classification and show that a high precision can be achieved for the detection of
machine printed documents. The proposed method is robust to the size of zones, which may contain text content
at line or paragraph level.
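The codebook step can be sketched as nearest-codeword assignment followed by histogram normalization. The toy 2-D vectors below stand in for the TAS shape features, and the codebook is hand-picked rather than learned by clustering.

```python
# Sketch of the codebook stage: map each zone feature to its nearest
# codeword, then pool into a normalized histogram for the SVM.
def nearest(codebook, feat):
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], feat)))

def codeword_histogram(codebook, zone_features):
    hist = [0.0] * len(codebook)
    for f in zone_features:
        hist[nearest(codebook, f)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # hand-picked, not clustered
zone = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.0, 0.95)]
h = codeword_histogram(codebook, zone)
print(h)  # [0.25, 0.5, 0.25]
```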
This paper describes a Markov random field (MRF) model with weighting parameters optimized by a conditional
random field (CRF) for on-line recognition of handwritten Japanese characters. The model extracts feature points
along the pen-tip trace from pen-down to pen-up and sets each feature point from an input pattern as a site and
each state from a character class as a label. It employs the coordinates of feature points as unary features and
the differences in coordinates between neighboring feature points as binary features. The weighting parameters
are estimated by CRF or by the minimum classification error (MCE) method. In experiments on the TUAT
Kuchibue database, the method achieved a character recognition rate of 92.77%, higher than that of the previous
model, and estimating the weighting parameters by CRF proved more accurate than by MCE.
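The unary and binary features described above reduce to a simple construction over the sequence of feature points; the pen trace below is illustrative.

```python
# Sketch of the feature construction: feature-point coordinates as unary
# features, coordinate differences between neighbors as binary features.
def mrf_features(points):
    unary = list(points)                       # one unary feature per site
    binary = [(x2 - x1, y2 - y1)               # one binary feature per edge
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return unary, binary

trace = [(0, 0), (3, 4), (3, 9)]               # pen-down to pen-up
unary, binary = mrf_features(trace)
print(binary)  # [(3, 4), (0, 5)]
```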
A non-trivial step in the development of a classifier is the design of its architecture. This paper presents a
new algorithm, Multi Models Evolvement (MME), using Particle Swarm Optimization (PSO). This algorithm is
a modified version of basic PSO and is used for the unsupervised design of Hidden Markov Model (HMM) based
architectures. The proposed algorithm is applied to an Arabic handwriting recognizer based on discrete-probability
HMMs. After the optimization of their architectures, the HMMs are trained with the Baum-Welch algorithm. The
system is validated on the IfN/ENIT database. The performance of the developed approach is compared to the
systems that participated in the 2005 competition on Arabic handwriting recognition at the International Conference
on Document Analysis and Recognition (ICDAR). The final system is a combination of an optimized HMM with 6
other HMMs obtained by simple variation of the number of states. It achieves an absolute improvement of 6% in
word recognition rate over the basic system (ARAB-IfN), reaching about 81%. The proposed recognizer also
outperforms most known state-of-the-art systems.
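The flavor of PSO-driven architecture search can be sketched in one dimension, with a toy objective standing in for the real Baum-Welch training plus validation score; the constants and the optimum of 14 states are invented, not values from the paper.

```python
import random

# Toy PSO sketch of the architecture search: each particle proposes a
# number of HMM states; the objective is a stand-in for the real score.
random.seed(0)

def objective(n_states):
    return (n_states - 14) ** 2                # invented optimum at 14 states

def pso(n_particles=8, iters=30, lo=2, hi=40):
    pos = [random.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best = pos[:]                              # per-particle best position
    gbest = min(pos, key=lambda x: objective(round(x)))
    for _ in range(iters):
        for i in range(n_particles):
            vel[i] = (0.7 * vel[i]
                      + 1.5 * random.random() * (best[i] - pos[i])
                      + 1.5 * random.random() * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            if objective(round(pos[i])) < objective(round(best[i])):
                best[i] = pos[i]
            if objective(round(pos[i])) < objective(round(gbest)):
                gbest = pos[i]
    return round(gbest)

n_states = pso()
print(n_states)
```

In the paper the evaluated quantity is the recognition performance of the Baum-Welch-trained HMM, which is far more expensive than this toy objective but plays the same role in the search.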
A SemiBoost-based character recognition method is introduced in order to incorporate information from unlabeled
practical samples in the training stage. One of the key problems in semi-supervised learning is the criterion for
unlabeled sample selection. In this paper, a criterion based on pair-wise sample similarity is adopted to guide the
SemiBoost learning process. At each iteration, unlabeled examples are selected and assigned labels. The selected
samples are used along with the original labeled samples to train a new classifier. The trained classifiers are
integrated to form the final classifier. An empirical study on several similar Arabic character pairs with different
degrees of similarity shows that the proposed method improves performance, as unlabeled samples reveal the
distribution of practical samples.
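The similarity-based selection step can be sketched as follows: each unlabeled sample inherits the label of its most similar labeled sample, and only the top-scoring pseudo-labels join the training set. The 1-D samples and the similarity function are toy stand-ins for real character features, and this is a simplification of full SemiBoost, which also weights samples during boosting.

```python
# Sketch of the selection criterion: pseudo-label unlabeled samples by
# pair-wise similarity and keep only the k most confident ones.
def similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))

def select_and_label(labeled, unlabeled, k=1):
    scored = []
    for u in unlabeled:
        sim, label = max((similarity(u, x), y) for x, y in labeled)
        scored.append((sim, u, label))
    scored.sort(reverse=True)                  # most confident first
    return [(u, label) for sim, u, label in scored[:k]]

labeled = [(0.0, "alef"), (10.0, "lam")]       # hypothetical class anchors
unlabeled = [9.2, 4.9, 0.4]
pseudo = select_and_label(labeled, unlabeled, k=2)
print(pseudo)  # [(0.4, 'alef'), (9.2, 'lam')]
```

Note that the ambiguous middle sample (4.9) is left out, which is the point of the confidence criterion.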
We propose in this paper a new online handwritten flowchart database and perform first experiments to establish a
baseline benchmark on this dataset. The collected database consists of 419 flowcharts labeled at the stroke and symbol
levels. In addition, a database of isolated graphical and text symbols was extracted from these collected flowcharts.
We then tackle the problem of online handwritten flowchart recognition from two different points of view. Firstly, we
assume that flowcharts are correctly segmented, and we propose different classifiers to perform two tasks, text/non-text
separation and graphical symbol recognition. Tested on the extracted isolated test database, we achieve up to 90% and
98% in text/non-text separation and up to 93.5% in graphical symbol recognition. Secondly, we propose a global
approach to perform flowchart segmentation and recognition. For the latter, we adopt a global learning schema and a
recognition architecture that performs segmentation and recognition simultaneously. The global architecture is trained
and tested directly on flowcharts. Results show the interest of such a global approach, but given the complexity of the
flowchart segmentation problem, there is still much room to improve the global learning and recognition methods.
Recognizing text in images taken by mobile phones with low resolution has wide applications. It has been
shown that good image binarization can substantially improve the performance of OCR engines. In this paper,
we present a framework to segment text from outdoor images taken by mobile phones using color features. The
framework consists of three steps: (i) initial processing, including image enhancement, binarization and noise
filtering, where we binarize the input images in each RGB channel and apply component-level noise filtering;
(ii) grouping components into blocks using color features, where we compute component similarities by
dynamically adjusting the weights of the RGB channels and merge groups hierarchically; and (iii) block selection,
where we use run-length features and choose a Support Vector Machine (SVM) as the classifier.
We tested the algorithm on 13 outdoor images taken by an old-style LG-64693 mobile phone with 640x480
resolution. We compared the segmentation results with Tsar's algorithm, a state-of-the-art camera text detection
algorithm, and show that our algorithm is more robust, particularly in terms of false alarm rates. In addition,
we evaluated the impact of our algorithm on ABBYY FineReader, one of the most popular commercial
OCR engines on the market.
This paper presents a new method for segmenting handwritten text into text lines and words. We propose
a method based on cooperation among points of view, which localizes the text lines in a low-resolution
image and then associates the pixels at a higher level of resolution. Thanks to this combination of levels
of vision, we can detect overlapping characters and re-segment the connected components during the analysis.
We then propose a segmentation of lines into words based on cooperation between digital data and symbolic
knowledge. The digital data are obtained from distances inside a Delaunay graph, which gives a precise distance
between connected components at the pixel level. We introduce structural rules in order to take into account
generic knowledge about the organization of a text page. This cooperation among information sources gives
greater expressive power and ensures the global coherence of the recognition. We validate this work using the
metrics and the database proposed for the segmentation contest of ICDAR 2009, and show that our method
obtains very good results compared to the other methods in the literature. More precisely, we are able to deal
with slope and curvature, overlapping text lines and varied kinds of writing, which are the main difficulties
encountered by the other methods.
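The pixel-level distance attached to the Delaunay graph edges can be sketched as the minimum pairwise distance between component pixels; the two toy components below stand in for real connected components.

```python
# Sketch of the pixel-level inter-component distance used as an edge
# weight: the minimum distance between any pixel pair of two components.
def component_distance(comp_a, comp_b):
    return min(((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
               for xa, ya in comp_a for xb, yb in comp_b)

word_a = [(0, 0), (1, 0), (2, 0)]   # pixels of one connected component
word_b = [(6, 0), (7, 0)]           # pixels of a neighboring component
d = component_distance(word_a, word_b)
print(d)  # 4.0
```

Inter-word gaps produce larger values than intra-word gaps, which is what the structural rules then exploit.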
Page segmentation into text and non-text elements is an essential preprocessing step before the optical character
recognition (OCR) operation. In the case of poor segmentation, an OCR classification engine produces garbage
characters due to the presence of non-text elements. This paper describes modifications to the text/non-text
segmentation algorithm presented by Bloomberg [1], which is also available in his open-source Leptonica library [2].
The modifications result in significant improvements, achieving better segmentation accuracy than the original
algorithm on the UW-III, UNLV, ICDAR 2009 page segmentation competition test images and circuit diagram
datasets.
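A core idea behind such morphology-based text/non-text separation is that a large erosion erases thin text strokes but leaves seeds inside big solid regions. The sketch below illustrates only that intuition on a toy bitmap; it is not Bloomberg's actual multi-resolution algorithm.

```python
# Toy illustration: erosion with a 3x3 structuring element keeps seeds
# only inside solid (non-text) regions, since text strokes are thin.
def erode(img, size):
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in range(-r, r + 1)
                                for dx in range(-r, r + 1)))
    return out

page = [[0] * 8,
        [0, 1, 0, 1, 0, 0, 0, 0],   # thin "text" strokes
        [0, 0, 0, 0, 0, 1, 1, 1],   # solid "image" block
        [0, 0, 0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 1, 1, 1],
        [0] * 8]
seeds = erode(page, 3)
n_seeds = sum(map(sum, seeds))
print(n_seeds)  # 1: only the solid block survives the erosion
```

In the real algorithm, such seeds are then dilated back (within a mask) to recover the full non-text regions.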
This work presents an analytical study of the relevance of features in an existing framework for writer identification
from offline handwritten document images. The identification system comprises a set of 15 features combining the
orientation and curvature information in a writing sample with the well-known codebook-based approach. This study
aims to find the optimal feature subset for identifying the author of a questioned document while maintaining
acceptable identification rates. Employing a genetic algorithm with a wrapper method, we carry out feature selection
and identify the most relevant features characterizing the writer of a handwritten document.
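A wrapper-style genetic search over 15-bit feature masks can be sketched as below; the fitness function is a toy stand-in for the real identification rate obtained by re-running the identification system, and the "relevant" feature indices are invented for illustration.

```python
import random

# Toy GA wrapper: evolve 15-bit feature masks; fitness is a stand-in
# for the identification rate, with a small penalty for larger subsets.
random.seed(1)
RELEVANT = {0, 3, 7, 11}                       # hypothetical informative features

def fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.01 * sum(mask)             # prefer smaller subsets

def ga(n_features=15, pop_size=20, gens=40):
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_features)
            child = a[:cut] + b[cut:]          # one-point crossover
            child[random.randrange(n_features)] ^= 1   # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = ga()
print([i for i, bit in enumerate(best) if bit])
```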
Since real data is time-consuming and expensive to collect and label, researchers have proposed approaches using
synthetic variations for tasks such as signature verification, speaker authentication, handwriting recognition, and
keyword spotting. The limitation of real data is particularly critical in the field of writer identification:
in forensics, adversaries cannot be expected to provide sufficient data to train a classifier. It is therefore
unrealistic to always assume sufficient real data to train classifiers extensively for writer identification. In
addition, this field differs from many others in that we strive to preserve as many inter-writer variations as
possible, and model-perturbed handwriting might break such discriminability among writers. Building on work
described in another paper, where human subjects were involved in calibrating realistic-looking transformations,
we measured the effects of incorporating perturbed handwriting into the training dataset. Experimental results
support our hypothesis that, with limited real data, model-perturbed handwriting improves the performance of
writer identification. In particular, when only a single sample per writer was available, incorporating perturbed
data achieved a 36x performance gain.
We provide a statistical basis for reporting the results of handwriting examination by questioned document (QD)
examiners. As a facet of QD examination, the analysis and reporting of handwriting examination suffers from a
lack of statistical data concerning the frequency of occurrence of combinations of particular handwriting
characteristics. QD examiners tend to assign probative values to specific handwriting characteristics and their
combinations based entirely on the examiner's experience and power of recall. This research uses databases of
handwriting samples that are representative of the US population. Feature lists of characteristics provided by
QD examiners are used to determine which frequencies need to be evaluated. Algorithms are used to automatically
extract those characteristics; e.g., a software tool for extracting most of the characteristics from the most
common letter pair, th, is functional. For each letter combination, the marginal and conditional frequencies of
its characteristics are evaluated. Based on statistical dependencies among the characteristics, the probability
of any given letter formation is computed. The resulting algorithms are incorporated into a system for writer
verification known as CEDAR-FOX.
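The frequency evaluation can be illustrated on toy data: characteristics observed for the letter pair th are counted to estimate marginal and conditional frequencies, whose product gives the probability of a letter formation under the modeled dependency. The characteristic names and counts below are invented.

```python
# Toy sketch of the frequency evaluation for the letter pair "th":
# samples record (h-loop characteristic, t-crossing characteristic).
samples = [("looped", "crossed"), ("looped", "crossed"),
           ("looped", "open"), ("straight", "crossed")]

def marginal(samples, idx, value):
    return sum(1 for s in samples if s[idx] == value) / len(samples)

def conditional(samples, value_b, given_a):
    subset = [s for s in samples if s[0] == given_a]
    return sum(1 for s in subset if s[1] == value_b) / len(subset)

# P(formation) = P(h-loop = looped) * P(t-cross = crossed | h-loop = looped)
p = marginal(samples, 0, "looped") * conditional(samples, "crossed", "looped")
print(round(p, 6))  # 0.75 * 2/3 = 0.5
```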
Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are
presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search,
where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the
level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique
is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then
components in the query expression and each expression in the collection are matched using contour and density features,
aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000
expressions, precision-at-k (k = 20) for the keyword-based approach was higher (keyword: μ = 84.0, σ = 19.0;
image-based: μ = 32.0, σ = 30.7), but for a few of the queries better results were obtained using a combination of the two
techniques.
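Computing TF-IDF at the expression level rather than the document level amounts to treating each keyword list as its own "document". The sketch below uses invented LaTeX-derived keyword lists.

```python
import math

# Sketch of expression-level TF-IDF: each indexed expression, not each
# document, is the counting unit for term and inverse frequencies.
expressions = [
    ["\\frac", "x", "y"],
    ["\\sum", "x", "i"],
    ["\\frac", "a", "b"],
]

def tfidf(expr, corpus):
    n = len(corpus)
    weights = {}
    for term in set(expr):
        tf = expr.count(term) / len(expr)
        df = sum(1 for e in corpus if term in e)   # expression frequency
        weights[term] = tf * math.log(n / df)
    return weights

w = tfidf(expressions[0], expressions)
print(w["x"] < w["y"])  # True: "x" appears in more expressions, so it weighs less
```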
A large number of handwritten historical documents are held in libraries around the world. The desire to
access, search, and explore these documents paves the way for a new age of knowledge sharing and promotes
collaboration and understanding between human societies. Currently, the indexes for these documents are
generated manually, which is very tedious and time-consuming. Results produced by state-of-the-art techniques
for converting complete images of handwritten documents into textual representations are not yet sufficient.
Therefore, word-spotting methods have been developed to archive and index images of handwritten documents
in order to enable efficient searching within documents. In this paper, we present a new matching algorithm to be
used in word-spotting tasks for historical Arabic documents. The algorithm, based on the Chamfer Distance,
computes the similarity between shapes of word-parts. Matching results are used to cluster images of
Arabic word-parts into different classes using the Nearest Neighbor rule. To compute the distance between two
word-part images, the algorithm subdivides each image into equal-sized slices (windows). A modified version
of the Chamfer Distance, incorporating geometric gradient features and distance transform data, is used as a
similarity distance between the different slices. Finally, the Dynamic Time Warping (DTW) algorithm is used
to measure the distance between two images of word-parts. By using the DTW we enabled our system to cluster
similar word-parts, even though they are transformed non-linearly due to the nature of handwriting. We tested
our implementation of the presented methods using various documents in different writing styles, taken from
Juma'a Al Majid Center - Dubai, and obtained encouraging results.
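The slice-matching step can be sketched with a standard DTW recurrence; here a plain absolute difference between per-slice summaries stands in for the modified Chamfer Distance, and the slice values are illustrative.

```python
# Compact DTW sketch: align two sequences of per-slice features, allowing
# one-to-many matches so non-linear handwriting distortion is absorbed.
def dtw(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(seq_a[i - 1], seq_b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

slices_a = [1, 3, 4, 2]         # per-slice feature summaries (illustrative)
slices_b = [1, 3, 3, 4, 2]      # same word-part, written slightly wider
d = dtw(slices_a, slices_b)
print(d)  # 0.0: the extra slice aligns at no cost
```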
Biomedical images are often referenced for clinical decision support (CDS), educational purposes, and research. They
appear in specialized databases or in biomedical publications and are not meaningfully retrievable using primarily
text-based retrieval systems. The task of automatically finding the images in an article that are most useful for
determining relevance to a clinical situation is quite challenging. One approach is to automatically annotate images
extracted from scientific publications with respect to their usefulness for CDS. As an important step toward achieving
the goal, we proposed figure image analysis for localizing pointers (arrows, symbols) to extract regions of interest (ROI)
that can then be used to obtain meaningful local image content. Content-based image retrieval (CBIR) techniques can
then associate local image ROIs with identified biomedical concepts in figure captions for improved hybrid (text and
image) retrieval of biomedical articles.
In this work we present methods that make our previous Markov random field (MRF)-based approach for pointer
recognition and ROI extraction more robust. These include the use of Active Shape Models (ASM) to overcome problems in recognizing
distorted pointer shapes and a region segmentation method for ROI extraction.
We measure the performance of our methods on two criteria: (i) effectiveness in recognizing pointers in images, and (ii)
improved document retrieval through use of extracted ROIs. Evaluation on three test sets shows 87% accuracy in the
first criterion. Further, the quality of document retrieval using local visual features and text is shown to be better than
using visual features alone.
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present
a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The
algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new
segment-merge procedure on each text line, and then using a new regularity measure, it prunes all sequences of
characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new
explicit character segmentation algorithm which is based on analysis of skeletal graphs and a merging algorithm which is
based on graph partitioning. All the candidate sequences that pass the pruning phase are sent to a recognition-based
verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF
networks. We developed our algorithm for the processing of real-world documents where letters and digits may be
connected or broken in a document. The effectiveness of the proposed approach is shown by extensive experiments done
on a real-world database of 607 documents which contains handwritten, machine-printed and mixed documents with
different types of layouts and levels of noise.
In this paper, we propose a method for automatically inferring the different page templates used to layout the document
content. The first step of the method consists in performing a logical analysis of the document. Depending on the
coverage of this step, a given number of document elements will be labeled. Geometric relations are then computed
between these labeled elements, and page template candidates are generated from frequently related elements. A fuzzy
matching operation allows for selecting the most frequent and relevant page templates for a given document. Such page
templates can be used to correct errors produced during the different previous steps of the document analysis: zoning,
OCR, and logical analysis. Evaluation has been performed using the INEX book track collection.
Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed,
highly readable, and faithful to the original. These goals can theoretically be achieved through OCR and font
recognition, re-typesetting the document text with original fonts. However, OCR and font recognition remain
hard problems, and many historical documents use fonts that are not available in digital forms. It is desirable
to be able to reconstruct fonts with vector glyphs that approximate the shapes of the letters that form a
font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts.
This permits us to incorporate font information into token-compressed images even when the original fonts are
unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing
and evaluating an algorithm to assign a font to every character within a document. This is a necessary step
to represent a scanned document image with a reconstructed font. Through our evaluation method, we have
measured a 98.4% accuracy for the assignment of letters to candidate fonts in multi-font documents.
Making datasets available for peer review of published document analysis methods, or distributing large commonly used document corpora for benchmarking, are extremely useful and sound practices and initiatives. This paper shows that they cover only a tiny fraction of the uses that shared, commonly available research data may have. We develop a completely new paradigm for sharing and accessing common data sets, benchmarks, and other tools, based on a very open and free community-driven contribution model. The model is operational and has been implemented so that it can be tested on a broad scale. The new interactions that arise from its use may spark innovative ways of conducting document analysis research on the one hand, but may also create very challenging interactions with other research domains.
In order to accurately recognize the textual images of a book, we often employ various models derived from
prior knowledge, including an iconic model (for character classification), a dictionary (for word recognition),
a character segmentation model, etc. Imperfections in these models inevitably affect recognition performance.
In this paper, we propose an unsupervised learning technique that adapts multiple models on the fly
on a homogeneous input data set to achieve better overall recognition accuracy fully automatically. The
major challenge for this unsupervised learning process is how to make the models improve, rather than damage,
one another. In our framework, models measure disagreements between their input and output data.
We propose a disagreement-based policy to safely adapt multiple models simultaneously (or alternately).
We construct a book recognition system based on this framework and demonstrate its feasibility.
This article presents a way to evaluate the bleed-through defect on very old document images. We design
measures to quantify and evaluate the verso ink bleeding through the paper onto the recto side. Measuring the
bleed-through defect allows us to perform statistical analyses that can predict the feasibility of different
post-scan tasks. In this article we illustrate our measures by building two OCR error rate prediction
models based on bleed-through evaluation: one for ABBYY FineReader, a very powerful commercial
OCR engine, and one for OCRopus, which is sponsored by Google. Both prediction models appear to
be very accurate when evaluated with various statistical indicators.
Document binarization is one of the initial and critical steps for many document analysis systems. Nowadays,
with the success and popularity of hand-held devices, there is strong motivation to convert documents into
digital format using hand-held cameras. In this paper, we propose a Bayesian maximum a posteriori
(MAP) estimation algorithm to binarize camera-captured document images. A novel adaptive segmentation
surface estimation and normalization method is proposed as the preprocessing step, followed by
a Markov Random Field based refinement procedure to remove noise and smooth the binarized result. Experimental
results show that our method outperforms other algorithms on document images with poor or uneven
illumination.
In previous work, we proposed applying the Expectation-Maximization (EM) algorithm to the binarization
of historical documents by defining a multi-resolution framework. In this work, we extend the multi-resolution
framework to the Otsu algorithm for effective binarization of historical documents. We compare the
effectiveness of the EM-based binarization technique to the Otsu thresholding algorithm on historical documents.
We demonstrate how EM can be extended to perform an effective segmentation of historical documents by
taking into account multiple features beyond the intensity of the document image. Experimental results, analysis,
and comparisons to known techniques are presented using the document image collection from the DIBCO 2009
contest.
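Otsu's algorithm, used above as the baseline, picks the gray level that maximizes the between-class variance of the image histogram. A minimal illustrative sketch of the classical single-resolution method (not the paper's multi-resolution extension) on a toy bimodal "document":

```python
import numpy as np

def otsu_threshold(image):
    """Return the gray level maximizing between-class variance (Otsu)."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()                    # gray-level probabilities
    omega = np.cumsum(p)                     # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))       # cumulative mean up to t
    mu_t = mu[-1]                            # global mean
    # between-class variance for every candidate threshold t
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

# toy bimodal image: dark ink (gray level 30) on light paper (220)
img = np.concatenate([np.full(900, 220), np.full(100, 30)])
t = otsu_threshold(img)
binary = img <= t                            # True = ink pixels
```

On a real historical page the histogram modes overlap heavily, which is exactly the situation the multi-resolution and EM-based variants target.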
Identifying figure captions has wide applications in producing high-quality e-books such as Kindle or iPad
books. In this paper, we present a rule-based system to detect horizontal figure captions in old-style documents.
Our algorithm consists of three steps: (i) segment images into regions of different types, such as text and figures;
(ii) search for the best caption region candidate based on heuristic rules such as region alignments and distances;
and (iii) expand the caption regions identified in step (ii) with their neighboring text regions in order to correct
over-segmentation errors.
We test our algorithm on 81 images collected from old-style books, each containing at least
one figure area. We show that the approach correctly detects figure captions from images with different
layouts, and we also measure its performance in terms of both precision and recall.
When reading electronic books on handheld devices, content sometimes needs to be reflowed and recomposed to fit
small-screen mobile devices. Given people's reading habits, it is reasonable to reflow the text content by
paragraph. This paper addresses that requirement and proposes a set of novel methods for paragraph recognition
in PDF electronic books. The proposed methods consist of three steps: physical structure analysis, paragraph
segmentation, and reading order detection. We exploit the locally ordered property of PDF documents and the layout style
of books to improve traditional page recognition results. In addition, we employ optimal bipartite graph
matching to detect the paragraphs' reading order. Experiments show that our methods achieve high accuracy. It is
noteworthy that this research has been applied in a commercial software package for Chinese e-book production.
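The reading-order step rests on minimum-cost bipartite matching between two sets of page elements. As an illustration of that operation only (the paper's cost function and matching algorithm are not given here), a brute-force minimum-cost matching on a hypothetical cost matrix:

```python
from itertools import permutations

def optimal_matching(cost):
    """Minimum-cost perfect matching between two equal-size sets,
    by brute force over permutations -- fine for the handful of
    paragraph fragments on a page; production code would use an
    optimal bipartite matching algorithm such as the Hungarian
    method. cost[i][j] is the cost of pairing left item i with
    right item j."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

# e.g. pair paragraph fragments across a column break by position:
# low cost = the fragments plausibly continue one another
cost = [[1, 9, 8],
        [9, 1, 9],
        [8, 9, 1]]
assert optimal_matching(cost) == [0, 1, 2]
```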
In this paper we present a procedure for removing ruling lines from a handwritten document image that requires
no preprocessing or postprocessing tasks and does not break existing characters. We take advantage of common
ruling line properties such as uniform width, predictable spacing, position relative to the text, etc. The deletion
procedure for a detected ruling line is based on the fact that the coordinates of three collinear points have a
determinant equal to zero. The system is evaluated on synthetic page images in five different languages and is
compared to a previous methodology.
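The collinearity test mentioned above can be sketched directly: three points lie on one line exactly when the determinant of the matrix of their homogeneous coordinates vanishes. A minimal illustration (the points and tolerance are hypothetical):

```python
def collinear(p1, p2, p3, tol=1e-9):
    """Three points are collinear iff
    | x1 y1 1 |
    | x2 y2 1 |  == 0,
    | x3 y3 1 |
    i.e. the signed twice-area of the triangle they span is zero."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    return abs(det) < tol

# pixels on a detected ruling line y = 2x + 1 pass the test...
assert collinear((0, 1), (1, 3), (2, 5))
# ...while a stroke pixel off the line fails it, so the stroke is kept
assert not collinear((0, 1), (1, 3), (2, 6))
```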
Logos are considered valuable intellectual property and a key component of the goodwill of a business. In
this paper, we propose a natural scene logo recognition method that is segmentation-free and capable of
processing images extremely rapidly while achieving high recognition rates. The classifiers for the logos are trained
jointly, rather than independently, so common features can be shared across multiple classes for better
generalization. To deal with the large range of aspect ratios of different logos, a set of salient regions of interest
(ROIs) is extracted to describe each class. We ensure that the selected ROIs are both individually informative and
pairwise weakly dependent using a Class Conditional Entropy Maximization criterion. Experimental results on a
large logo database demonstrate the effectiveness and efficiency of the proposed method.
In this paper, we propose a new system to enhance navigation inside digital corpora. The system is based on
automatic indexing in image mode and provides the user with intuitive navigation in interactive time. Keywords
and containers are extracted directly from the document images to create an Image Mode Index, which shows
the keywords as cut-out images of their actual appearances. Our approach recreates a summary of the structured
documents, following indications given by the creators of the documents themselves. The system is detailed in the
general case, and sample applications on a 19th-century handwritten corpus and an 18th-century machine-printed
text corpus are provided. This approach, developed for documents that are otherwise unreachable, can be applied
to any corpus in which keywords and containers can be identified.
Motivated by the widely accepted principle that the more training data, the better a recognition system performs,
we conducted experiments asking human subjects to evaluate a mixture of real English handwritten text lines
and text lines altered from existing handwriting with various degrees of distortion. The idea of generating synthetic
handwriting is based on a perturbation method by T. Varga and H. Bunke that distorts an entire text line. Our
experiments have two purposes. First, we want to calibrate distortion parameter settings for Varga and
Bunke's perturbation model. Second, we intend to compare the effects of parameter settings on different writing
styles: block, cursive, and mixed. From the preliminary experimental results, we determined appropriate ranges
for the amplitude parameter and found that parameter settings should be altered for different handwriting styles.
With the proper parameter settings, it should be possible to generate large amounts of training and testing data
for building better off-line handwriting recognition systems.
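The perturbation model composes slowly varying nonlinear functions applied to a whole text line. As an illustration of one such component only (a hypothetical sinusoidal bending with the kind of amplitude parameter being calibrated; the full model combines several shearing, bending, and scaling functions):

```python
import math
import random

def bend_line(points, amplitude, wavelength, phase=None):
    """Vertically displace text-line points with a slow sine wave,
    in the spirit of one distortion component of Varga & Bunke's
    perturbation model. `amplitude` controls distortion strength;
    too large a value makes lines look unnatural, which is what the
    human-subject calibration above is for."""
    if phase is None:
        phase = random.uniform(0, 2 * math.pi)
    return [(x, y + amplitude * math.sin(2 * math.pi * x / wavelength + phase))
            for x, y in points]

# a flat synthetic baseline, bent with amplitude 3 px
baseline = [(x, 0.0) for x in range(0, 200, 10)]
distorted = bend_line(baseline, amplitude=3.0, wavelength=150.0, phase=0.0)
```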
Biomedical images are often referenced for clinical decision support (CDS), educational purposes, and research. The
task of automatically finding the images in a scientific article that are most useful for determining
relevance to a clinical situation is traditionally done using text and is quite challenging. We propose to improve this
by associating image features, from the entire image and from relevant regions of interest, with biomedical concepts
described in the figure caption or discussed in the article. However, figures in scientific articles are
often composed of multiple panels, where each sub-figure (panel) is referenced in the caption using alphanumeric
labels, e.g. Figure 1(a), 2(c), etc. Separating individual panels from a multi-panel figure is a necessary first step
toward automatic annotation of images.
In this work we present methods that make our previously reported efforts more robust. Specifically, we address
the limitation in segmenting figures that do not exhibit explicit inter-panel boundaries, e.g. illustrations, graphs, and
charts. We present a novel hybrid clustering algorithm based on particle swarm optimization (PSO) with a fuzzy logic
controller (FLC) to locate related figure components in such images.
Results from our evaluation are very promising: 93.64% panel detection accuracy for regular (non-illustration)
figure images and 92.1% accuracy for illustration images. A computational complexity analysis also shows that PSO
is an efficient approach with relatively low computation time. The accuracy of separating these two types of images is
98.11%, achieved using a decision tree.
In this paper we propose a method for perspective distortion correction of rectangular documents. The scheme
exploits the orthogonality of the document edges, allowing recovery of the aspect ratio of the original document.
The results obtained after correcting the perspective of several document images captured with a mobile phone
are compared with those achieved by digitizing the same documents with several scanner models.
Document management systems have become important because of the growing popularity of electronic filing of
documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading
on a PC or an electronic book reader. Text information acquired by optical character recognition (OCR) is usually added
to electronic documents for document retrieval. Since texts generated by OCR generally include character recognition
errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval
method that is robust against both character segmentation and recognition errors. In the proposed method, allowing the
insertion of noise characters and the dropping of characters during keyword retrieval provides robustness against
character segmentation errors, and substituting keyword characters with the OCR recognition candidates for each
character (or any other character) provides robustness against character recognition errors. The recall rate of the
proposed method was 15% higher than that of the conventional method; however, the precision rate was 64% lower.
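The matching just described can be sketched as an approximate keyword search that tolerates inserted noise characters, dropped characters, and candidate-based substitutions. This is a simplified, hypothetical formulation; the paper's candidate handling and scoring are more elaborate:

```python
def fuzzy_find(keyword, text, candidates, max_cost=1):
    """Approximate keyword search over noisy OCR output.
    Matching tolerates (i) noise characters inserted by bad
    segmentation and characters dropped by the OCR, each at a cost,
    and (ii) free substitution of a text character by any of its
    recognition candidates. `candidates` maps an OCR output
    character to the set of characters it may actually represent."""
    def match(k, t, cost):
        if cost > max_cost:
            return False
        if not k:
            return True                      # whole keyword consumed
        if not t:
            return False
        ch_ok = t[0] == k[0] or k[0] in candidates.get(t[0], set())
        return ((ch_ok and match(k[1:], t[1:], cost)) or
                match(k, t[1:], cost + 1) or  # skip an inserted noise char
                match(k[1:], t, cost + 1))    # keyword char dropped by OCR
    return any(match(keyword, text[i:], 0) for i in range(len(text) + 1))

cands = {"0": {"o"}}                          # '0' is often misread for 'o'
assert fuzzy_find("word", "...w0rd...", cands)    # recognition error
assert fuzzy_find("word", "...wo-rd...", cands)   # spurious inserted char
assert not fuzzy_find("word", "...xyz...", cands)
```

Allowing these edits everywhere is what drives recall up while letting in false matches, the recall/precision trade-off reported above.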
In this paper we describe a scheme to enhance the usability of a Tablet PC's handwriting recognition system by
including medical symbols that are not a part of the Tablet PC's symbol library. The goal of this work is to make
handwriting recognition more useful for medical professionals accustomed to using medical symbols in medical records.
To demonstrate that this new symbol recognition module is robust and expandable, we report results on both a medical
symbol set and an expanded symbol test set which includes selected mathematical symbols.
Photocopies of the ballots challenged in the 2008 Minnesota elections, which constitute a public
record, were scanned on a high-speed scanner and made available on a public radio website. The
PDF files were downloaded, converted to TIF images, and posted on the PERFECT website. Based
on a review of relevant image-processing aspects of paper-based election machinery and on
additional statistics and observations on the posted sample data, robust tools were developed for
determining the underlying grid of the targets on these ballots regardless of skew, clipping, and
other degradations caused by high-speed copying and digitization. The accuracy and robustness of
a method based on both index-marks and oval targets are demonstrated on 13,435 challenged
ballot page images.
This paper proposes a novel method for document enhancement. The method is based on the combination of
two state-of-the-art filters through the construction of a mask. The mask is applied to a TV (Total Variation)
regularized image in which background noise has been reduced. The masked image is then filtered by NL-means
(Non-Local Means), which reduces the noise in the text areas located by the mask. The document images to be
enhanced are real historical documents from several periods whose backgrounds exhibit several defects, resulting
from scanning, paper aging, and bleed-through. We measure the improvement brought by this enhancement method
through OCR accuracy.
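The combination step can be sketched as a masked blend of the two filtered images. The TV and NL-means filters themselves are assumed here to come from a library such as scikit-image (`denoise_tv_chambolle`, `denoise_nl_means`); this is an illustration of the masking idea, not the authors' implementation:

```python
import numpy as np

def enhance(tv_image, nlm_image, mask):
    """Combination step of the enhancement pipeline: inside the text
    mask keep the NL-means-filtered pixels, elsewhere keep the
    TV-regularized background where noise has already been reduced."""
    return np.where(mask, nlm_image, tv_image)

tv = np.full((2, 3), 0.9)            # smoothed light background
nlm = np.full((2, 3), 0.1)           # denoised dark text strokes
mask = np.array([[True, False, False],
                 [False, True, False]])  # True = text area
out = enhance(tv, nlm, mask)
```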
Digital libraries need more than just keyword-based retrieval, which can be inefficient for some applications.
A document retrieval based on the content of the digitized image version of the document can thus be a
more appropriate approach. This paper discusses the retrieval of document images by identifying a
variety of elements present in the document image body. We propose a new strategy to identify and combine
features extracted from a document image. We also consider the task of constructing an optimized feature set to
improve retrieval performance, and validate our experiments on an assorted database. Experimental results
show that the proposed segmentation, together with a judicious feature combination, increases the overall retrieval
performance. Moreover, the retrieved images demonstrate the generality and effectiveness of our approach for
efficient segmentation and classification of document images.
Layout analysis is a crucial process for document image understanding and information retrieval. Document
layout analysis depends on page segmentation and block classification. This paper describes an algorithm for
extracting blocks from document images and a boosting-based method to classify those blocks as machine-printed
text or not. The feature vector fed into the boosting classifier consists of a four-direction run-length
histogram and connected-component features in both background and foreground. Using this combination of
features through a boosting classifier, we obtain an accuracy of 99.5% on our test collection.
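A run-length histogram of the kind used in that feature vector can be sketched for a single direction; the four-direction version repeats the same computation along the other scan directions. A minimal sketch (bin count and normalization are illustrative assumptions):

```python
import numpy as np

def runlength_histogram(img, nbins=8):
    """Histogram of foreground run lengths along one direction
    (here: horizontal). img is a 2-D bool array, True = ink.
    Machine-printed text tends to produce many short, regular runs,
    which is what makes this a useful block-classification feature."""
    runs = []
    for row in img:
        length = 0
        for pixel in row:
            if pixel:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:                       # run touching the right edge
            runs.append(length)
    hist, _ = np.histogram(runs, bins=nbins, range=(1, nbins + 1))
    return hist / max(len(runs), 1)      # normalize across block sizes

block = np.array([[0, 1, 1, 0, 1, 1, 0],
                  [1, 1, 0, 0, 0, 1, 1]], dtype=bool)
h = runlength_histogram(block)           # all four runs have length 2
```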
While Optical Music Recognition (OMR) of modern printed and handwritten documents is considered a solved problem,
with many commercial systems available today, the OMR of ancient musical manuscripts remains an open problem.
In this paper we present a system for the OMR of degraded Western plainchant manuscripts in square notation from the
14th to 16th centuries. The system has two main blocks: the first deals with symbol extraction and recognition, while
the second acts as an error detection stage for the first block's outputs. For symbol extraction we use widely known
image-processing techniques, such as Sobel filtering and the Hough Transform, and an SVM for classification. The error
detection stage is implemented with a hidden Markov model (HMM), which takes advantage of a priori knowledge
about this specific kind of music.