This PDF file contains the front matter associated with SPIE
Proceedings Volume 7534, including the Title Page, Copyright
information, Table of Contents, and the Conference Committee listing.
Maps can be a great source of information for a given geographic region, but they can be difficult to find and
even harder to process. A significant problem is that many interesting and useful maps are available only in raster format; worse, many have been poorly scanned and are often compressed with lossy compression algorithms. Furthermore, many of these maps have no metadata providing the geographic coordinates, scale, or projection. Previous research on map processing has developed techniques that typically
work on maps from a single map source. In contrast, we have developed a general approach to finding and
processing street maps. This includes techniques for discovering maps online, extracting geographic and textual
features from maps, using the extracted features to determine the geographic coordinates of the maps, and
aligning the maps with imagery. The resulting system can find, register, and extract a variety of features from
raster maps, which can then be used for various applications, such as annotating satellite imagery, creating and
updating maps, or constructing detailed gazetteers.
In this work, we propose to combine two quite different approaches for retrieving handwritten documents. Our
hypothesis is that different retrieval algorithms should retrieve different sets of documents for the same query.
Therefore, significant improvements in retrieval performance can be expected. The first approach is based on information retrieval techniques carried out on the noisy texts obtained through handwriting recognition, while the second approach is recognition-free, using a word spotting algorithm. Results show that for texts with a word error rate (WER) lower than 23%, the performance of the combined system is close to that obtained on clean digital texts. In addition, for poorly recognized texts (WER > 52%), an
improvement of nearly 17% can be observed with respect to the best available baseline method.
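As a rough illustration of how two such retrieval runs might be combined, the Python sketch below fuses min-max-normalized scores from a recognition-based run and a word-spotting run with a simple weighted sum (CombSUM-style). The fusion rule, weights, scores, and document identifiers are placeholders for illustration, not the combination scheme actually used in the paper.

# Hypothetical score-level fusion of two retrieval runs.
def normalize(scores):
    """Min-max normalize a {doc_id: score} dict to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def fuse(run_a, run_b, weight_a=0.5):
    """Combine two {doc_id: score} runs into a single ranking."""
    a, b = normalize(run_a), normalize(run_b)
    docs = set(a) | set(b)
    fused = {d: weight_a * a.get(d, 0.0) + (1 - weight_a) * b.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Toy example: recognition-based (noisy OCR text) scores vs. word-spotting scores.
ocr_run = {"doc1": 2.3, "doc2": 1.1, "doc4": 0.7}
spotting_run = {"doc2": 0.9, "doc3": 0.8, "doc1": 0.2}
print(fuse(ocr_run, spotting_run))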
"Investigator Names" is a newly required field in MEDLINE citations. It consists of personal names listed as members
of corporate organizations in an article. Extracting investigator names automatically is necessary because of the
increasing volume of articles reporting collaborative biomedical research in which a large number of investigators
participate. In this paper, we present an SVM-based stacked sequential learning method in a novel application -
recognizing named entities such as the first and last names of investigators from online medical journal articles. Stacked
sequential learning is a meta-learning algorithm which can boost any base learner. It exploits contextual information by
adding the predicted labels of the surrounding tokens as features. We apply this method to tag words in text paragraphs
containing investigator names, and demonstrate that stacked sequential learning improves the performance of a nonsequential
base learner such as an SVM classifier.
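The general stacking idea can be sketched on synthetic data with scikit-learn's LinearSVC, as below: a base SVM is trained on per-token features, and its predicted labels for the surrounding tokens are appended as extra features for a second-stage classifier. The features, window size, and data are invented for illustration; in practice the base predictions would come from cross-validation to avoid leakage, and the real system tags words in text paragraphs rather than random vectors.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_tokens, n_feats, window = 200, 10, 1

X = rng.normal(size=(n_tokens, n_feats))                    # per-token features (toy)
y = (X[:, 0] + 0.5 * np.roll(X[:, 0], 1) > 0).astype(int)   # toy sequential labels

base = LinearSVC(dual=False).fit(X, y)        # stage one: plain SVM on token features
pred = base.predict(X)                        # base predictions (cross-validated in practice)

# Stage two: append the predicted labels of the surrounding tokens (window = 1).
context = np.column_stack([np.roll(pred, k) for k in range(-window, window + 1)])
X_stacked = np.hstack([X, context])
meta = LinearSVC(dual=False).fit(X_stacked, y)

print("base accuracy:", base.score(X, y))
print("stacked accuracy:", meta.score(X_stacked, y))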
We present in this work a method to detect numbered sequences in a document. The method relies on the following
steps: first, all potential "numbered patterns" are automatically extracted from the document. Second, possible coherent sequences are built using pattern incrementality (the incremental relation). Finally, possible wrong links between items are corrected using the notion of optimization context. An evaluation of the method is presented, and weaknesses and
possible improvements are discussed.
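A minimal Python sketch of the first two steps might look as follows; the regular expression, the notion of pattern style, and the toy input lines are assumptions for illustration, and the optimization-context correction step is not shown.

import re

lines = [
    "1. Introduction",
    "Some body text",
    "2. Related work",
    "3) A stray item from another list",
    "3. Method",
    "4. Results",
]

pattern = re.compile(r"^(\d+)([.)])\s+(.*)")    # a toy "numbered pattern"
items = []
for i, line in enumerate(lines):
    m = pattern.match(line)
    if m:
        items.append((i, int(m.group(1)), m.group(2), m.group(3)))

# Build sequences: same separator style and counter incremented by one.
sequences = []
for item in items:
    for seq in sequences:
        last = seq[-1]
        if item[2] == last[2] and item[1] == last[1] + 1:
            seq.append(item)
            break
    else:
        sequences.append([item])

for seq in sequences:
    print([f"{n}{sep} {txt}" for _, n, sep, txt in seq])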
This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information
from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task appears to be negatively impacted by OCR errors.
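A pattern-based extractor of this kind can be sketched with a few regular expressions, as below. The patterns are hypothetical rather than the ones used in the paper, and the last example shows how a single OCR substitution can defeat such rules.

import re

PATTERNS = [
    re.compile(r"\bborn\s+(?:on\s+)?(\w+\s+\d{1,2},\s*\d{4})", re.I),
    re.compile(r"\bdate\s+of\s+birth[:\s]+(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.I),
    re.compile(r"\bb\.\s*(\d{1,2}\s+\w+\s+\d{4})", re.I),
]

def extract_dob(text):
    for pat in PATTERNS:
        m = pat.search(text)
        if m:
            return m.group(1)
    return None

print(extract_dob("John Smith, born January 4, 1921, in Ohio."))
print(extract_dob("Date of birth: 04/01/1921"))
print(extract_dob("J0hn Smith, b0rn January 4, 1921"))  # an OCR error ('b0rn') defeats the rule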
In the project Aware we aim to develop an automatic assistant for the detection of law infringements on web
pages. The motivation for this project is that many authors of web pages are at some point infringing copyright or other laws, mostly without being aware of that fact, and are increasingly confronted with costly legal
warnings.
As the legal environment is constantly changing, an important requirement of Aware is that the domain
knowledge can be maintained (and initially defined) by numerous legal experts working remotely without further assistance from computer scientists. Consequently, the software platform was chosen to be a web-based generic
toolbox that can be configured to suit individual analysis experts, definitions of analysis flow, information
gathering and report generation. The report generated by the system summarizes all critical elements of a given
web page and provides case specific hints to the page author and thus forms a new type of service.
Regarding the analysis subsystems, Aware mainly builds on existing state-of-the-art technologies. Their
usability has been evaluated for each intended task. In order to control the heterogeneous analysis components
and to gather the information, a lightweight scripting shell has been developed. This paper describes the analysis
technologies, ranging from text-based information extraction, through optical character recognition and phonetic fuzzy string matching, to a set of image analysis and retrieval tools, as well as the scripting language to define
the analysis flow.
Handwriting has been proposed as a possible biometric for a number of years. However, recent work has
shown that handwritten passphrases are vulnerable to both human-based and machine-based forgeries. Pseudo-signatures, as an alternative, are designed to thwart such attacks while still being easy for users to create, remember,
and reproduce. In this paper, we briefly review the concept of pseudo-signatures, then describe an
evaluation framework that considers aspects of both usability and security. We present results from preliminary
experiments that examine user choice in creating pseudo-signatures and discuss the implications when sketching
is used for generating cryptographic keys.
Scaling up document-image classifiers to handle an unlimited variety of document and image types poses serious
challenges to conventional trainable classifier technologies. Highly versatile classifiers demand representative
training sets which can be dauntingly large: in investigating document content extraction systems, we have
demonstrated the advantages of employing as many as a billion training samples in approximate k-nearest
neighbor (kNN) classifiers sped up using hashed K-d trees. We report here on an algorithm, which we call online
bin-decimation, for coping with training sets that are too big to fit in main memory, and we show empirically that
it is superior to offline pre-decimation, which simply discards a large fraction of the training samples at random
before constructing the classifier. The key idea of bin-decimation is to enforce an approximate upper bound on the number of training samples stored in each K-d hash bin; an adaptive statistical technique allows this to be
accomplished online and in linear time, while reading the training data exactly once. An experiment on 86.7M
training samples reveals a 23-times speedup with less than 0.1% loss of accuracy (compared to pre-decimation);
or, for another value of the upper bound, a 60-times speedup with less than 5% loss of accuracy. We also compare
it to four other related algorithms.
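One simple way to enforce such a per-bin cap online, while reading the data exactly once, is per-bin reservoir sampling, sketched below. This is only an illustrative stand-in for the adaptive statistical technique described in the paper, and the hash function and data are toy placeholders.

import random
from collections import defaultdict

def bin_decimate(stream, hash_fn, cap, seed=0):
    """Keep at most `cap` samples per hash bin while reading the stream once."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    seen = defaultdict(int)
    for sample in stream:
        b = hash_fn(sample)
        seen[b] += 1
        if len(bins[b]) < cap:
            bins[b].append(sample)
        else:
            j = rng.randrange(seen[b])        # reservoir step: keep with probability cap/seen
            if j < cap:
                bins[b][j] = sample
    return bins

# Toy usage: 1-D samples hashed into coarse buckets.
gen = random.Random(1)
data = [gen.gauss(0, 1) for _ in range(10000)]
bins = bin_decimate(data, hash_fn=lambda x: int(x * 4), cap=50)
print(sorted((b, len(v)) for b, v in bins.items())[:5])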
This paper presents an algorithm called CIPDEC (Content Integrity of Printed Documents using Error Correction),
which identifies any modifications made to a printed document. CIPDEC uses an error correcting code
for accurate detection of addition/deletion of even a few pixels. A unique advantage of CIPDEC is that it works
blind - it does not require the original document for such detection. Instead, it uses fiducial marks and error
correcting code parities. CIPDEC is also robust to paper-world artifacts like photocopying, annotations, stains,
folds, tears and staples. Furthermore, by working at a pixel level, CIPDEC is independent of language, font,
software, and graphics that are used to create paper documents. As a result, any changes made to a printed
document can be detected long after the software, font, and graphics have fallen out of use. The utility of
CIPDEC is illustrated in the context of tamper-proofing of printed documents and ink extraction for form-filling
applications.
This paper presents a novel approach for multi-oriented text line extraction from historical handwritten Arabic documents. Because lines are multi-oriented and dispersed across the page, we use an image paving algorithm that determines the lines progressively and locally. The paving is initialized with a small window whose size is then extended until enough lines and connected components are found, and a snake-based active contour is used for line extraction. Once the paving is established, the orientation is determined by applying the Wigner-Ville distribution to the projection histogram profile. This local orientation estimate is then extended to constrain the orientation in the neighborhood. Afterwards, the text lines are extracted locally in each zone based on following the baselines and on the proximity of connected components. Finally, connected components that overlap or touch across adjacent lines are separated, taking into account the morphology of the terminal letters of Arabic words. The proposed approach has been evaluated on 100 documents, reaching a separation accuracy of about 98.6%.
Document layout analysis is a key step in document image understanding with wide applications in document
digitization and reformatting. Identifying correct layout from noisy scanned images is especially challenging.
In this paper, we introduce a semi-supervised learning framework to detect text-lines from noisy document
images. Our framework consists of three steps. The first step is the initial segmentation that extracts text-lines
and images using simple morphological operations. The second step is a grouping-based layout analysis that
identifies text-lines, image zones, column separators, and vertical border noise, and it can efficiently remove vertical border noise from multi-column pages. The third step is an online classifier that is trained with the high-confidence line detection results from the second step and filters out noise from low-confidence lines. The classifier effectively removes speckle noise embedded inside the content zones.
We compare the performance of our algorithm to the state-of-the-art work in the field on the UW-III database.
We choose the results reported by the Image Understanding Pattern Recognition Research (IUPR) and Scansoft
Omnipage SDK 15.5. We evaluate performance at both the page frame level and the text-line level. The results show that our system has a much lower false-alarm rate while maintaining a similar content detection rate. In addition, we show that our online training model generalizes better than algorithms that depend on offline training.
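The kind of simple morphological initial segmentation used in the first step can be illustrated with SciPy as follows; the synthetic page, structuring-element width, and size threshold are placeholders rather than the paper's actual settings, and the grouping and online-classifier steps are not shown.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Synthetic binarized page: two "text lines" of word-like blobs plus speckle noise.
page = np.zeros((60, 120), dtype=bool)
for top in (10, 35):
    for left in range(5, 110, 18):
        page[top:top + 8, left:left + 12] = True
page |= rng.random(page.shape) < 0.002            # a little salt noise

# Horizontal closing: merge blobs on the same line with a wide flat structuring element.
struct = np.ones((1, 25), dtype=bool)
smeared = ndimage.binary_closing(page, structure=struct)

labels, n = ndimage.label(smeared)
sizes = ndimage.sum(smeared, labels, range(1, n + 1))
lines = [s for s, size in zip(ndimage.find_objects(labels), sizes) if size > 200]
print("candidate text lines:", len(lines))        # speckle components are filtered by size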
OCR for Chinese historical documents is still an open problem. As these documents are hand-written or hand-carved in various styles, overlapped and touching characters cause great difficulty for the character segmentation module. This paper presents an over-segmentation-based method to handle overlapped and touching Chinese characters in historical documents. The segmentation process consists of two parts: over-segmentation and segmentation-path optimization. In the former, touching strokes are found and segmented by analyzing the geometric information of the white and black connected components. The segmentation cost of the touching strokes is estimated from the connected components' shape and location, as well as the touching stroke width. The latter uses dynamic programming with local optimization to find the best segmentation path: an HMM expresses the multiple candidate segmentation paths, and the Viterbi algorithm searches for a locally optimal solution. Experimental results on practical Chinese documents show that the proposed method is effective.
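The path-optimization part can be pictured as a dynamic program over candidate cut points, as in the sketch below; the cost function and the limit on merged pieces are invented for illustration and stand in for the paper's HMM/Viterbi formulation and geometric cost model.

def best_path(n_cuts, segment_cost):
    """segment_cost(i, j) = cost of merging cuts i..j into one character candidate."""
    INF = float("inf")
    best = [INF] * (n_cuts + 1)
    back = [-1] * (n_cuts + 1)
    best[0] = 0.0
    for j in range(1, n_cuts + 1):
        for i in range(max(0, j - 4), j):          # allow up to 4 merged pieces
            c = best[i] + segment_cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i
    # Recover the optimal sequence of (start_cut, end_cut) spans.
    path, j = [], n_cuts
    while j > 0:
        path.append((back[j], j))
        j = back[j]
    return best[n_cuts], list(reversed(path))

# Toy cost: spans of width 2 are "good" characters, other widths are penalized.
cost, path = best_path(6, lambda i, j: abs((j - i) - 2) + 0.1)
print(cost, path)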
We have developed several technologies for protecting automated teller machines. These technologies are based mainly
on pattern recognition and are used to implement various self-defence functions. They include (i) banknote recognition
and information retrieval for preventing machines from accepting counterfeit and damaged banknotes and for retrieving
information about detected counterfeits from a relational database, (ii) form processing and character recognition for
preventing machines from accepting remittance forms without due dates and/or insufficient payment, (iii) person
identification to prevent machines from transacting with non-customers, and (iv) object recognition to guard machines
against foreign objects such as spy cams that might be surreptitiously attached to them and to protect users against
someone attempting to peek at their user information such as their personal identification number. The person
identification technology has been implemented in most ATMs in Japan, and field tests have demonstrated that the
banknote recognition technology can recognise more than 200 types of banknote from 30 different countries. We are
developing an "advanced intelligent ATM" that incorporates all of these technologies.
In previous work we showed that shape descriptor features can be used in Look Up Table (LUT) classifiers to
learn patterns of degradation and correction in historical document images. The algorithm encodes the pixel
neighborhood information effectively using a variant of shape descriptor. However, the generation of the shape
descriptor features was approached in a heuristic manner. In this work, we propose a system of learning the
shape features from the training data set by using neural networks: Multilayer Perceptrons (MLP) for feature
extraction. Given that the MLP may be restricted by a limited dataset, we apply a feature selection algorithm to
generalize, and thus improve, the feature set obtained from the MLP. We validate the effectiveness and efficiency
of the proposed approach via experimental results.
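An illustrative pipeline in this spirit, using scikit-learn, is sketched below: an MLP is trained, its hidden-layer (ReLU) activations are taken as learned features, and a generic feature-selection step is applied to them. The data, layer size, and selector are placeholders rather than the paper's actual components.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))                    # e.g., 3x3 pixel neighborhoods (toy)
y = (X[:, 4] + 0.3 * X[:, 1] > 0).astype(int)    # toy "degraded center pixel" label

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0).fit(X, y)

# Recompute the hidden-layer ReLU activations and treat them as learned shape features.
hidden = np.maximum(X @ mlp.coefs_[0] + mlp.intercepts_[0], 0)

selector = SelectKBest(f_classif, k=8).fit(hidden, y)
features = selector.transform(hidden)
print(features.shape)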
The quality of camera-based whiteboard images depends strongly on the lighting environment and on how the content was written. Specular reflection and low contrast frequently reduce the readability of captured whiteboard images. This paper proposes a novel method to enhance camera-based whiteboard images: the images are enhanced by removing specular highlights to improve visibility and by emphasizing the content to improve readability. The method can be practically embedded in mobile devices with image-capturing cameras.
The effects of different image pre-processing methods on document image binarization are explored. The methods are compared across five different binarization algorithms, on images with bleed-through and stains as well as on images with uniform background speckle. The choice of binarization method has a significant effect on binarization accuracy, but the pre-processing also plays a significant role. Total Variation pre-processing shows the best performance among the pre-processing methods compared.
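One such pairing can be sketched with scikit-image as below: Total Variation denoising as pre-processing followed by Otsu binarization on a synthetic noisy page. The image, noise model, and TV weight are placeholders, and the paper compares several other combinations.

import numpy as np
from skimage.restoration import denoise_tv_chambolle
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)

# Synthetic "document": a dark stroke on a bright page plus uniform background noise.
page = np.full((64, 64), 0.9)
page[20:24, 8:56] = 0.2                       # a text-like stroke
page += rng.normal(0, 0.15, page.shape)
page = np.clip(page, 0, 1)

def binarize(img):
    return img < threshold_otsu(img)          # True = ink

raw_bin = binarize(page)
tv_bin = binarize(denoise_tv_chambolle(page, weight=0.1))

print("ink pixels without pre-processing:", int(raw_bin.sum()))
print("ink pixels with TV pre-processing:", int(tv_bin.sum()))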
This paper presents an HMM-based recognizer for the off-line recognition of handwritten words. Word models
are the concatenation of context-dependent character models (trigraphs). The trigraph models we consider are
similar to triphone models in speech recognition, where a character adapts its shape according to its adjacent
characters. Due to the large number of possible context-dependent models to compute, a top-down clustering is
applied on each state position of all models associated with a particular character. This clustering uses decision
trees, based on rhetorical questions we designed. Decision trees have the advantage of being able to model untrained trigraphs.
Our system is shown to perform better than a baseline context independent system, and reaches an accuracy
higher than 74% on the publicly available Rimes database.
We create a polyfont OCR recognizer using hidden Markov model (HMM) character models trained on a dataset of various fonts. We compare this system to monofont recognizers, showing its drop in performance when it is used to recognize unseen fonts. To close this performance gap, we adapt the parameters of the models of the polyfont
recognizer to a new dataset of unseen fonts using four different adaptation algorithms. The results of our experiments
show that the adapted system is far more accurate than the initial system although it does not reach the accuracy of a
monofont recognizer.
Reducing the time complexity of character matching is critical to the development of efficient Japanese Optical
Character Recognition (OCR) systems. To shorten processing time, recognition is usually split into separate pre-classification
and recognition stages. For high overall recognition performance, the pre-classification stage must both
have very high classification accuracy and return only a small number of putative character categories for further
processing. Furthermore, for any practical system, the speed of the pre-classification stage is also critical. The
associative matching (AM) method has often been used for fast pre-classification, because its use of a hash table and
reliance solely on logical bit operations to select categories makes it highly efficient. However, a certain level of redundancy exists in the hash table because it is constructed using only the minimum and maximum values of the data on each axis and therefore does not take account of the distribution of the data. We propose a modified associative
matching method that satisfies the performance criteria described above but in a fraction of the time by modifying the
hash table to reflect the underlying distribution of training characters. Furthermore, we show that our approach
outperforms pre-classification by clustering, ANN and conventional AM in terms of classification accuracy,
discriminative power and speed. Compared to conventional associative matching, the proposed approach results in a
47% reduction in total processing time across an evaluation test set comprising 116,528 Japanese character images.
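The conventional AM scheme referred to above can be sketched as follows: for each feature axis, a lookup table maps a quantized value to a bitmask of the categories whose training range covers it, and the candidate set is the bitwise AND across axes. The data, bin count, and table design are placeholders, and the proposed distribution-aware modification is not reproduced.

import numpy as np

def build_tables(train_x, train_y, bins=16):
    n_axes = train_x.shape[1]
    cats = sorted(set(train_y))
    edges = [np.linspace(train_x[:, a].min(), train_x[:, a].max(), bins + 1)
             for a in range(n_axes)]
    tables = [np.zeros(bins, dtype=np.int64) for _ in range(n_axes)]
    for a in range(n_axes):
        for ci, c in enumerate(cats):
            vals = train_x[train_y == c, a]
            lo = np.clip(np.searchsorted(edges[a], vals.min()) - 1, 0, bins - 1)
            hi = np.clip(np.searchsorted(edges[a], vals.max()) - 1, 0, bins - 1)
            tables[a][lo:hi + 1] |= (1 << ci)        # mark bins covered by this category
    return tables, edges, cats

def candidates(x, tables, edges, cats):
    mask = -1                                        # all category bits set
    for a, v in enumerate(x):
        b = np.clip(np.searchsorted(edges[a], v) - 1, 0, len(tables[a]) - 1)
        mask &= int(tables[a][b])                    # intersect per-axis category sets
    return [c for i, c in enumerate(cats) if mask >> i & 1]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in (0, 2, 4)])
y = np.repeat([0, 1, 2], 50)
tables, edges, cats = build_tables(X, y)
print(candidates([2.1, 1.9], tables, edges, cats))   # should include category 1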
Recently, we have investigated the use of Arabic linguistic knowledge to improve the recognition of a wide Arabic word lexicon. A neural-linguistic approach was proposed to deal mainly with the canonical vocabulary of decomposable words derived from tri-consonant healthy roots. The basic idea is to factorize words by their roots and schemes. In this direction, we designed two neural networks, TNN_R and TNN_S, to recognize roots and schemes, respectively, from the structural primitives of words. The proposed approach achieved promising results. In this paper, we focus on how to reach better results in terms of accuracy and recognition rate. The current improvements concern mainly the training stage: 1) benefiting from the order of word letters, 2) considering "sister letters" (letters sharing the same features), 3) supervising network behavior, 4) splitting up neurons to save letter occurrences, and 5) resolving observed ambiguities. With these improvements, experiments carried out on a 1500-word vocabulary show a significant enhancement: the TNN_R (resp. TNN_S) top-4 rate has risen from 77% to 85.8% (resp. from 65% to 97.9%). Enlarging the vocabulary from 1000 to 1700 words, adding 100 words at a time, further confirmed the results without altering the networks' stability.
We describe a technique of linguistic post-processing of whole-book recognition results. Whole-book recognition is a
technique that improves recognition of book images using fully automatic cross-entropy-based model adaptation. In previously published work, word recognition was performed on individual words separately, without using passage-level information such as word-occurrence frequencies. Therefore, some words that are rare in real texts may appear much more often in recognition results, and vice versa. Differences between word frequencies in recognition results and in prior knowledge may indicate recognition
errors on a long passage. In this paper, we propose a post-processing technique to enhance whole-book recognition
results by minimizing differences between word frequencies in recognition results and prior word frequencies. This technique
works better when operating on longer passages, and it drives the character error rate down by 20%, from 1.24% to 0.98%, in a
90-page experiment.
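The underlying signal can be illustrated with a small frequency-mismatch check, as below: words whose frequency in the recognition output far exceeds their prior frequency are likely recognition errors. The toy text, prior table, and ranking are for illustration only; the paper's method minimizes this mismatch during recognition rather than merely reporting it.

from collections import Counter

def relative_freq(words):
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def frequency_mismatch(recognized, prior_freq, floor=1e-6):
    """Rank words by how over-represented they are relative to the prior."""
    rec = relative_freq(recognized)
    ratios = {w: f / max(prior_freq.get(w, 0.0), floor) for w, f in rec.items()}
    return sorted(ratios.items(), key=lambda kv: -kv[1])

recognized = "tlie cat sat on tlie mat and tlie dog slept".split()
prior = {"the": 0.05, "cat": 0.001, "sat": 0.0005, "on": 0.02,
         "mat": 0.0002, "and": 0.03, "dog": 0.001, "slept": 0.0003}
print(frequency_mismatch(recognized, prior)[0])   # 'tlie' (an OCR error for 'the') tops the list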
This paper investigates language model design and implementation. In contrast to previous research, we emphasize the importance of word-based n-gram models. We build a word-based language model using the SRILM toolkit and apply it to contextual language processing of Chinese documents. A modified absolute-discount smoothing algorithm is proposed to reduce the perplexity of the language model. The word-based language model improves the post-processing performance of an online handwritten character recognition system compared with a character-based language model, but it also greatly increases computation and storage costs. Besides quantizing the model data non-uniformly, we design a new tree storage structure to compress the
model size, which leads to an increase in searching efficiency as well. We illustrate the set of approaches on a test corpus
of recognition results of online handwritten Chinese characters, and propose a modified confidence measure for
recognition candidate characters to get their accurate posterior probabilities while reducing the complexity. The weighted
combination of linguistic knowledge and candidate confidence information proves successful in this paper and can be
further developed to achieve improvements in recognition accuracy.
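For reference, the plain (unmodified) absolute-discount smoothing of a word bigram model can be sketched as follows; the discount value, the back-off to a unigram distribution, and the toy corpus are illustrative assumptions, and the paper's modified variant and SRILM-built model are not reproduced.

from collections import Counter

def train_bigram_absolute_discount(tokens, d=0.75):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    followers = Counter(h for (h, _w) in bigrams)      # distinct continuations per history
    total = sum(unigrams.values())

    def prob(w, h):
        p_uni = unigrams[w] / total                    # back-off (unigram) distribution
        if unigrams[h] == 0:
            return p_uni
        discounted = max(bigrams[(h, w)] - d, 0.0) / unigrams[h]
        lambda_h = d * followers[h] / unigrams[h]      # probability mass reserved for back-off
        return discounted + lambda_h * p_uni

    return prob

tokens = ("we build a word based language model for chinese documents "
          "and we build a character based model").split()
p = train_bigram_absolute_discount(tokens)
print(p("build", "we"), p("model", "we"))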
In this paper we present an OCR validation module, implemented for the System for Preservation of Electronic Resources
(SPER) developed at the U.S. National Library of Medicine.1 The module detects and corrects suspicious words in the OCR
output of scanned textual documents through a procedure of deriving partial formats for each suspicious word, retrieving
candidate words by partial-match search from lexicons, and comparing the joint probabilities of N-gram and OCR edit
transformation corresponding to the candidates. The partial format derivation, based on OCR error analysis, efficiently
and accurately generates candidate words from lexicons represented by ternary search trees. In our test case comprising a
historic medico-legal document collection, this OCR validation module yielded the correct words with 87% accuracy and
reduced the overall OCR word errors by around 60%.
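A much simplified sketch of the candidate-and-rank idea is shown below: for a suspicious token, lexicon words are retrieved and ranked by a combination of a frequency prior and a string-similarity term. The lexicon, priors, and similarity measure are placeholders; the actual module derives partial formats from OCR-error analysis, searches ternary search trees, and compares joint N-gram and OCR edit-transformation probabilities.

from difflib import SequenceMatcher

LEXICON = {"medical": 0.002, "medico": 0.0004, "medieval": 0.0007,
           "legal": 0.003, "ledger": 0.0005}

def score(candidate, token, prior):
    similarity = SequenceMatcher(None, candidate, token).ratio()   # crude proxy for edit cost
    return prior * similarity

def correct(token, lexicon, top_k=3):
    ranked = sorted(lexicon, key=lambda w: -score(w, token, lexicon[w]))
    return ranked[:top_k]

print(correct("rnedical", LEXICON))   # 'rn' is a classic OCR confusion for 'm'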
Over the last century forensic document science has developed progressively more sophisticated pattern recognition
methodologies for ascertaining the authorship of disputed documents. These include advances not only in computer-assisted stylometrics but also in forensic handwriting analysis. We present a writer verification method
and an evaluation of an actual historical document written by an unknown writer. The questioned document
is compared against two known handwriting samples of Herman Melville, a 19th century American author who
has been hypothesized to be the writer of this document. The comparison led to a high confidence result that
the questioned document was written by the same writer as the known documents. Such methodology can be
applied to many such questioned documents in historical writing, both in literary and legal fields.
A method for text block detection is introduced for old handwritten documents. The proposed method takes
advantage of sequential book structure, taking into account layout information from pages previously transcribed.
This glance at the past is used to predict the position of text blocks in the current page with the help of
conventional layout analysis methods. The method is integrated into the GIDOC prototype: a first attempt to
provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten
text transcription. Results are given in a transcription task on a 764-page Spanish manuscript from 1891.
Collections of documents are sets of heterogeneous documents, such as a specific ancient book series, linked by shared structural and semantic properties. A particular collection contains document images with specific physical layouts, such as text pages or full-page illustrations, appearing in a specific order. Its contents, such as journal articles, may be spread over several pages, not necessarily consecutive, producing strong dependencies between page interpretations. In order to build an analysis system which can bring contextual information from the collection
to the appropriate recognition modules for each page, we propose to express the structural and the semantic
properties of a collection with a definite clause grammar. This is made possible by representing collections as
streams of document images, and by using extensions to the formalism we present here. We are then able to
automatically generate a parser dedicated to a collection. Besides allowing structural variations and complex information flows, we also show that this approach enables the design of analysis stages on a document or
a set of documents. The interest of context usage is illustrated with several examples and their appropriate
formalization in this framework.
Figures inserted in documents mediate a kind of information for which the visual modality is more appropriate than the
text. A complete understanding of a figure often requires reading its caption or establishing a relationship with the main text using a numbered figure identifier which is replicated in the caption and in the main text. A figure and its caption are closely related; together they constitute a single multimodal component (an FC-pair) that Document Image Analysis
cannot extract with text and graphics segmentation. We propose a method to go further than the graphics and text
segmentation in order to extract FC-pairs without performing a full labelling of the page components. Horizontal and
vertical text lines are detected in the pages. The graphics are associated with selected text lines to initiate the detector of
FC-pairs. Notions of spatial and visual disorder are introduced to define a layout model in terms of properties, which makes it possible to cope
with most of the numerous spatial arrangements of graphics and text lines. The detector of FC-pairs performs operations
in order to eliminate the layout disorder and assigns a quality value to each FC-pair. The processed documents were
collected in medic@, the digital historical collection of the BIUM (Bibliothèque InterUniversitaire Médicale). A first set
of 98 pages constitutes the design set; 298 further pages were collected to evaluate the system. The reported performance is the result of the full process, from the binarisation of the digital images to the detection of FC-pairs.
Large degradations in document images impede their readability and substantially deteriorate the performance of automated document processing systems. Image quality metrics have been defined to correlate with OCR accuracy. However, these metrics do not always correlate with human perception of image quality. When enhancing
document images with the goal of improving readability, it is important to understand human perception
of quality. The goal of this work is to evaluate human perception of degradation and correlate it to known
degradation parameters and existing image quality metrics. The information captured enables the learning and
estimation of human perception of document image quality.
This paper describes two classifiers, Naïve Bayes and Support Vector Machine (SVM), to classify sentences containing
Databank Accession Numbers, a key piece of bibliographic information, from online biomedical articles. The correct
identification of these sentences is necessary for the subsequent extraction of these numbers. The classifiers use words
that occur most frequently in sentences as features for the classification. Twelve sets of word features are collected to train
and test the classifiers. Each set has a different number of word features ranging from 100 to 1,200. The performance of
each classifier is evaluated using four measures: Precision, Recall, F-Measure, and Accuracy. The Naïve Bayes classifier
shows performance above 93.91% at 200 word features for all four measures. The SVM shows 98.80% Precision at 200
word features, 94.90% Recall at 500 and 700, 96.46% F-Measure at 200, and 99.14% Accuracy at 200 and 400. To
improve classification performance, we propose two merging operators, Max and Harmonic Mean, to combine results of
the two classifiers. The final results show a measurable improvement in Recall, F-Measure, and Accuracy rates.
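The two merging operators named above can be written down directly, as in the sketch below, applied to per-sentence positive-class probabilities from the two classifiers. The probabilities and threshold are invented for illustration, and the feature extraction and classifiers themselves are omitted.

def merge_max(p_nb, p_svm):
    return max(p_nb, p_svm)

def merge_harmonic_mean(p_nb, p_svm, eps=1e-9):
    return 2 * p_nb * p_svm / (p_nb + p_svm + eps)

def classify(p_nb, p_svm, merge, threshold=0.5):
    return merge(p_nb, p_svm) >= threshold

# Example: the classifiers disagree on a borderline sentence.
p_nb, p_svm = 0.30, 0.90
print(classify(p_nb, p_svm, merge_max))            # True  (Max favors recall)
print(classify(p_nb, p_svm, merge_harmonic_mean))  # False (the harmonic mean is more conservative)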
Biomedical images are invaluable in establishing diagnosis, acquiring technical skills, and implementing best practices in
many areas of medicine. At present, images needed for instructional purposes or in support of clinical decisions appear in
specialized databases and in biomedical articles, and are often not easily accessible to retrieval tools. Our goal is to
automatically annotate images extracted from scientific publications with respect to their usefulness for clinical decision
support and instructional purposes, and project the annotations onto images stored in databases by linking images
through content-based image similarity.
Authors often use text labels and pointers overlaid on figures and illustrations in the articles to highlight regions of
interest (ROI). These annotations are then referenced in the caption text or figure citations in the article text. In previous
research we developed two methods (a heuristic method and a dynamic time warping-based method) for localizing and recognizing such pointers on biomedical images. In this work, we add robustness to our previous efforts by using a
machine learning based approach to localizing and recognizing the pointers. Identifying these can assist in extracting
relevant image content at regions within the image that are likely to be highly relevant to the discussion in the article
text. Image regions can then be annotated using biomedical concepts from extracted snippets of text pertaining to images
in scientific biomedical articles that are identified using National Library of Medicine's Unified Medical Language
System® (UMLS) Metathesaurus. The resulting regional annotation and extracted image content are then used as indices
for biomedical article retrieval using the multimodal features and region-based content-based image retrieval (CBIR)
techniques. The hypothesis that such an approach would improve biomedical document retrieval is validated through
experiments on an expert-marked biomedical article dataset.
Detecting the correct orientation of document images is an important step in large scale digitization processes, as most
subsequent document analysis and optical character recognition methods assume upright position of the document page.
Many methods have been proposed to solve the problem, most of which are based on ascender-to-descender ratio computation. Unfortunately, this cannot be used for scripts that have neither ascenders nor descenders. Therefore, we present a trainable method using character similarity to compute the correct orientation. A connected-component-based distance measure is computed to compare the characters of the document image to characters whose orientation is known. The orientation for which the distance is lowest is taken as the correct orientation. Training is easily achieved by replacing the reference characters with characters of the script to be analyzed. Evaluation of the proposed approach showed an accuracy above 99% for Latin and Japanese script from the public UW-III and UW-II datasets. An accuracy of 98.9% was obtained
for Fraktur on a non-public dataset. A comparison of the proposed method to two methods using ascender/descender ratio based orientation detection shows a significant improvement.
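A toy version of the character-similarity idea is sketched below: the page is rendered at the four candidate orientations, each connected component is compared to a set of reference glyphs, and the orientation with the lowest total distance wins. The synthetic glyphs, the resized-bitmap distance, and the reference set are placeholders for the paper's actual descriptors and reference characters.

import numpy as np
from scipy import ndimage
from skimage.transform import resize

def component_bitmaps(binary, size=(16, 16)):
    labels, n = ndimage.label(binary)
    for obj in ndimage.find_objects(labels):
        yield resize((labels[obj] > 0).astype(float), size, anti_aliasing=False)

def orientation_score(binary, references):
    score = 0.0
    for comp in component_bitmaps(binary):
        score += min(np.abs(comp - ref).mean() for ref in references)
    return score

def detect_orientation(binary, references):
    candidates = {k * 90: np.rot90(binary, k) for k in range(4)}
    return min(candidates, key=lambda a: orientation_score(candidates[a], references))

# Synthetic upright "glyph": an asymmetric L-shape repeated on a page.
glyph = np.zeros((12, 8))
glyph[:, 0] = 1
glyph[-1, :] = 1
page = np.zeros((40, 60))
for r, c in [(4, 5), (4, 25), (4, 45), (22, 5), (22, 30)]:
    page[r:r + 12, c:c + 8] = glyph

references = list(component_bitmaps(page))          # characters of known orientation
upside_down = np.rot90(page, 2)
print(detect_orientation(upside_down, references))  # 180: rotating by 180 degrees restores upright text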
This paper proposes a technique for the logical labelling of document images. It makes use of a decision-tree based
approach to learn and then recognise the logical elements of a page. A state-of-the-art OCR engine provides the physical features needed by the system. Each block of text is extracted during the layout analysis, and raw physical
features are collected and stored in the ALTO format. The data-mining method employed here is the "Improved
CHi-squared Automatic Interaction Detection" (I-CHAID). The contribution of this work is the insertion of
logical rules extracted from the logical layout knowledge to support the decision tree. Two setups have been
tested; the first uses one tree per logical element, the second one uses a single tree for all the logical elements
we want to recognise. The main system, implemented in Java, coordinates the third-party tools (Omnipage
for the OCR part, and SIPINA for the I-CHAID algorithm) using XML and XSL transforms. It was tested
on around 1000 documents belonging to the ICPR'04 and ICPR'08 conference proceedings, representing about
16,000 blocks. The final error rate for determining the logical labels (among 9 different ones) is less than 6%.
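A generic stand-in for this kind of decision-tree labelling (not the I-CHAID/SIPINA setup used in the paper) can be sketched with scikit-learn as below; the physical features and the toy blocks are invented for illustration.

from sklearn.tree import DecisionTreeClassifier

# Each block: [font_size, y_position_on_page (0 = top), is_bold, n_words]
X = [
    [18, 0.05, 1, 9],    # title
    [11, 0.12, 0, 45],   # abstract
    [10, 0.40, 0, 220],  # body
    [17, 0.06, 1, 11],   # title
    [10, 0.55, 0, 260],  # body
    [11, 0.14, 0, 60],   # abstract
    [8,  0.92, 0, 20],   # footnote
    [8,  0.95, 0, 15],   # footnote
]
y = ["title", "abstract", "body", "title", "body", "abstract", "footnote", "footnote"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict([[16, 0.04, 1, 8], [9, 0.93, 0, 18]]))  # expect: title, footnote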
This paper analyzes the size characteristics of the character recognition domain with the aim of developing a feature selection algorithm suited to it. Based on the results, we further analyze the timing requirements of three popular feature selection algorithms: the greedy algorithm, the genetic algorithm, and ant colony optimization (ACO). For a rigorous timing analysis, we adopt the concept of an atomic operation. We propose a novel scheme called selective evaluation to improve the convergence of ACO. The scheme cuts down the computational load by excluding the evaluation of unnecessary or less promising candidate solutions. The scheme is realizable in ACO thanks to the valuable information in the pheromone trail, which helps identify such solutions. Experimental results showed that ACO with selective evaluation was promising in both timing requirements and recognition performance.
In the analysis of handwriting in documents a central task is that of determining line structure of the text,
e.g., number of text lines, location of their starting and end-points, line-width, etc. While simple methods can
handle ideal images, real-world documents have complexities such as overlapping line structure, variable line spacing, line skew, document skew, and noisy or degraded images. This paper explores the application of the
Hough transform method to handwritten documents with the goal of automatically determining global document
line structure in a top-down manner which can then be used in conjunction with a bottom-up method such as
connected component analysis. The performance is significantly better than other top-down methods, such as
the projection profile method. In addition, we evaluate the performance of skew analysis by the Hough transform
on handwritten documents.
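The global, top-down use of the Hough transform can be illustrated with scikit-image as below: near-horizontal peaks in the accumulator give the number of text lines and their skew. The synthetic dot-pattern page, angle range, and peak-detection parameters are placeholders, not the settings used in the paper.

import numpy as np
from skimage.transform import hough_line, hough_line_peaks

img = np.zeros((200, 400), dtype=bool)
for row in (50, 100, 150):
    for col in range(20, 380, 6):
        img[row + col // 40, col] = True      # three dotted "text lines" with ~1.4 deg of skew

tested_angles = np.deg2rad(np.linspace(85, 95, 101))   # restrict to near-horizontal lines
h, theta, d = hough_line(img, theta=tested_angles)
accum, angles, dists = hough_line_peaks(h, theta, d, min_distance=20)

print("detected lines:", len(dists))                      # expect one peak per text line
print("deviation from horizontal (deg):", np.round(np.rad2deg(angles) - 90, 1))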
In this paper we propose a hybrid symbol classifier within a global framework for online handwritten mathematical
expression recognition. The proposed architecture aims at handling mathematical expression recognition as a
simultaneous optimization of symbol segmentation, symbol recognition, and 2D structure recognition under the
restriction of a mathematical expression grammar. To deal with the junk problem encountered when a segmentation
graph approach is used, we consider a two level classifier. A symbol classifier cooperates with a second classifier
specialized to accept or reject a segmentation hypothesis. The proposed system is trained with a set of synthetic online
handwritten mathematical expressions. When tested on a set of real complex expressions, the system achieves promising
results at both symbol and expression interpretation levels.
We have developed an online Pinyin recognition system that combines an HMM-based method with a statistical method. Pinyin recognition is useful for those who may forget how to write a certain Chinese character but know how to pronounce it. The combined HMM and statistical models are used to segment and recognize a word. We achieved a writer-independent accuracy of 91.37% on 17,745 unconstrained-style Pinyin syllables.