Forensic identification is the task of determining whether or not observed evidence arose from a known source. It involves
determining a likelihood ratio (LR): the ratio of the joint probability of the evidence and source under the identification
hypothesis (that the evidence came from the source) to that under the exclusion hypothesis (that the evidence did not arise from
the source). In LR-based decision methods, particularly handwriting comparison, a variable number of pieces of evidence is
used. A decision based on many pieces of evidence can result in nearly the same LR as one based on few pieces of evidence.
We consider methods for distinguishing between such situations. One of these is to provide confidence intervals together
with the decisions and another is to combine the inputs using weights. We propose a new method that generalizes the
Bayesian approach and uses an explicitly defined discount function. Empirical evaluation with several data sets including
synthetically generated ones and handwriting comparison shows greater flexibility of the proposed method.
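To make the contrast concrete, the following minimal Python sketch combines per-evidence likelihood ratios and applies a discount that weakens decisions based on few inputs. The score densities p_same and p_diff, and the particular discount function, are illustrative assumptions, not the paper's definitions.

    import math

    def combined_lr(scores, p_same, p_diff):
        # Per-piece LR: density of each evidence score under the
        # identification hypothesis over that under exclusion.
        lrs = [p_same(s) / p_diff(s) for s in scores]
        # The plain Bayesian combination multiplies individual LRs, so
        # many weak inputs can yield the same LR as a few strong ones.
        log_lr = sum(math.log(lr) for lr in lrs)
        # Hypothetical discount function: shrink the log-LR toward zero
        # when evidence is scarce, distinguishing the two situations.
        n = len(scores)
        discount = n / (n + 5.0)  # illustrative shrinkage constant
        return math.exp(discount * log_lr)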
We provide a statistical basis for reporting the results of handwriting examination by questioned document (QD)
examiners. As a facet of QD examination, the analysis and reporting of handwriting
examination suffers from the lack of statistical data concerning the frequency of occurrence of combinations of
particular handwriting characteristics. QD examiners tend to assign probative values to specific handwriting
characteristics and their combinations based entirely on the examiner's experience and power of recall. The
research uses databases of handwriting samples that are representative of the US population. Feature lists of
characteristics provided by QD examiners are used to determine which frequencies need to be evaluated.
Algorithms are used to automatically extract those characteristics; e.g., a functional software tool extracts most
of the characteristics of the most common letter pair, th. For each letter combination, the
marginal and conditional frequencies of their characteristics are evaluated. Based on statistical dependencies
of the characteristics the probability of any given letter formation is computed. The resulting algorithms are
incorporated into a system for writer verification known as CEDAR-FOX.
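As an illustration of the frequency computation, the sketch below estimates marginal and conditional frequencies of two characteristics of a letter pair such as th and combines them into a formation probability. The two-characteristic chain-rule factorization is an assumption made for brevity; the actual dependency structure used in CEDAR-FOX may differ.

    from collections import Counter

    def formation_probability(samples):
        # samples: observed (char_a, char_b) characteristic values for a
        # letter pair, drawn from a population-representative database.
        n = len(samples)
        joint = Counter(samples)
        marginal_a = Counter(a for a, _ in samples)
        # Marginal frequency of each value of the first characteristic.
        p_a = {a: c / n for a, c in marginal_a.items()}
        # Conditional frequency of the second characteristic given the first.
        p_b_given_a = {(a, b): c / marginal_a[a] for (a, b), c in joint.items()}
        # Chain rule: P(a, b) = P(a) * P(b | a), reflecting the statistical
        # dependency between the two characteristics.
        def prob(a, b):
            return p_a.get(a, 0.0) * p_b_given_a.get((a, b), 0.0)
        return prob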
In the analysis of handwriting in documents, a central task is determining the line structure of the text,
e.g., the number of text lines, the locations of their start and end points, line width, etc. While simple methods can
handle ideal images, real-world documents have complexities such as overlapping line structure, variable line
spacing, line skew, document skew, and noisy or degraded images. This paper explores the application of the
Hough transform method to handwritten documents with the goal of automatically determining global document
line structure in a top-down manner which can then be used in conjunction with a bottom-up method such as
connected component analysis. The performance is significantly better than other top-down methods, such as
the projection profile method. In addition, we evaluate the performance of skew analysis by the Hough transform
on handwritten documents.
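A minimal version of the Hough-based skew step might look like the following, using OpenCV; the accumulator threshold and the near-horizontal angular window are illustrative parameter choices rather than the paper's settings.

    import cv2
    import numpy as np

    def estimate_skew(image_path):
        # Binarize so that handwriting strokes vote in the accumulator.
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Vote over (rho, theta); the threshold is an illustrative value
        # that would need tuning for noisy or degraded documents.
        lines = cv2.HoughLines(binary, 1, np.pi / 180, 200)
        if lines is None:
            return 0.0
        # Keep near-horizontal lines (theta close to 90 degrees); their
        # median angle estimates the global document skew.
        thetas = []
        for line in lines:
            rho, theta = line[0]
            if abs(theta - np.pi / 2) < np.radians(20):
                thetas.append(theta)
        return float(np.degrees(np.median(thetas)) - 90.0) if thetas else 0.0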
Over the last century forensic document science has developed progressively more sophisticated pattern recognition
methodologies for ascertaining the authorship of disputed documents. These include advances not only
in computer-assisted stylometrics but also in forensic handwriting analysis. We present a writer verification method
and an evaluation of an actual historical document written by an unknown writer. The questioned document
is compared against two known handwriting samples of Herman Melville, a 19th century American author who
has been hypothesized to be the writer of this document. The comparison led to a high-confidence result that
the questioned document was written by the same writer as the known documents. Such methodology can be
applied to many such questioned documents in historical writing, both in literary and legal fields.
A novel statistical model for determining whether a pair of documents, a known and a questioned, were written
by the same individual is proposed. The goal of this formulation is to learn the specific uniqueness of style in a
particular author's writing, given the known document. Since there are often insufficient samples to extrapolate
a generalized model of a writer's handwriting based solely on the document, we instead generalize over the
differences between the author and a large population of known different writers. This is in contrast to an earlier
model in which the probability distributions were specified a priori, without learning. We show the performance of
the model along with a comparison to the older, non-learning model, over which the proposed model shows significant
improvement.
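One way to realize such a difference-based formulation is sketched below with assumed components: a feature extractor features(doc) and labeled same-writer and different-writer document pairs from the population. The logistic-regression choice is illustrative, not the paper's model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_difference_model(same_pairs, diff_pairs, features):
        # Represent each pair by the absolute difference of its feature
        # vectors, so the model learns over differences between writers
        # rather than modeling one writer's style in isolation.
        def delta(a, b):
            return np.abs(features(a) - features(b))
        X = np.array([delta(a, b) for a, b in same_pairs] +
                     [delta(a, b) for a, b in diff_pairs])
        y = np.array([1] * len(same_pairs) + [0] * len(diff_pairs))
        return LogisticRegression().fit(X, y)

    # Usage: score a (known, questioned) pair by predicting on
    # np.abs(features(known) - features(questioned)).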
Many governments have some form of "direct democracy" legislative procedure whereby individual citizens can
propose various measures creating or altering laws. Generally, such a process is started with the gathering of a
large number of signatures. There is interest in whether or not there are fraudulent signatures present in such a
petition, and if so what percentage of the signatures are indeed fraudulent. However, due to the large number
of signatures (tens of thousands), it is not feasible to have a document examiner verify the signatures directly.
Instead, there is interest in creating a subset of signatures with a high probability of fraud that can then be
verified. We present a method by which pairwise comparisons of signatures can be performed and subsequent
sorting can generate such subsets.
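The screening idea can be sketched as follows; same_writer_score is an assumed pairwise comparator (higher means more likely one hand wrote both signatures), and the exhaustive O(n^2) comparison shown here would need blocking or indexing at petition scale.

    from itertools import combinations

    def suspect_pairs(signatures, same_writer_score, top_k):
        # Petition signatures should all come from different people, so a
        # pair that scores highly as "same writer" is a fraud candidate.
        scored = [(same_writer_score(a, b), i, j)
                  for (i, a), (j, b) in combinations(enumerate(signatures), 2)]
        # Sort descending by score; the head of the list is a subset small
        # enough for a document examiner to verify directly.
        scored.sort(reverse=True)
        return scored[:top_k]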
Writer adaptation or specialization is the adjustment of handwriting recognition algorithms to a specific writer's
style of handwriting. Such adjustment yields significantly improved recognition rates over counterpart general
recognition algorithms. We present the first unconstrained off-line handwriting adaptation algorithm for Arabic
in the literature. We discuss an iterative bootstrapping model that adapts a writer-independent
model to a writer-dependent one using a small number of words, achieving a large increase in recognition rate in
the process. Furthermore, we describe a confidence weighting method which generates better results by weighting
words based on their length. We also discuss script features unique to Arabic, and how we incorporate them into
our adaptation process. Even though Arabic has many more character classes than languages such as English,
significant improvement was observed.
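The adaptation loop can be summarized with the following sketch; recognizer.recognize and recognizer.retrain are hypothetical interfaces standing in for the writer-independent model and its update step, and the acceptance threshold is illustrative.

    def adapt_to_writer(recognizer, word_images, rounds=3, full_len=5):
        # Iterative bootstrapping: label words with the current model,
        # keep the confident ones, and retrain toward a writer-dependent
        # model.
        for _ in range(rounds):
            examples = []
            for image in word_images:
                text, conf = recognizer.recognize(image)
                # Confidence weighting by word length: confidently
                # recognized longer words contain far fewer false
                # results, so they dominate the retraining set.
                weight = conf * min(len(text) / full_len, 1.0)
                if weight > 0.8:  # illustrative acceptance threshold
                    examples.append((image, text, weight))
            recognizer.retrain(examples)
        return recognizer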
The testing set, consisting of about 100 pages of handwritten text, had an initial average overall recognition
rate of 67%. After the basic adaptation was finished, the overall recognition rate was 73.3%. As the improvement
was most marked for the longer words, and the set of confidently recognized longer words contained many fewer
false results, a second method was presented using them alone, resulting in a recognition rate of about 75%.
Initially, these words had a 69.5% recognition rate, improving to about a 92% recognition rate after adaptation.
A novel hybrid method is presented with a rate of about 77.2%.