In this paper, we present a novel method for extracting handwritten and printed text zones from noisy document
images with mixed content. We use Triple-Adjacent-Segment (TAS) based features, which encode local shape
characteristics of text in a consistent manner. We first construct two codebooks of the shape features extracted
from a set of handwritten and printed text documents respectively. We then compute the normalized histogram
of codewords for each segmented zone and use it to train a Support Vector Machine (SVM) classifier. The
codebook-based approach is robust to background noise in the image, and TAS features are invariant
to translation, scale and rotation of text. In experiments, we show that a pixel-weighted zone classification
accuracy of 98% can be achieved for noisy Arabic documents. Further, we demonstrate the effectiveness of our
method for document page classification and show that high precision can be achieved for the detection of
machine printed documents. The proposed method is robust to the size of zones, which may contain text content
at line or paragraph level.
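
The zone description step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: TAS features are stood in for by plain numeric tuples, codeword assignment is assumed to be nearest-neighbor under Euclidean distance, the two codebook histograms are assumed to be concatenated into one descriptor, and pixel-weighted accuracy is assumed to mean the fraction of zone pixels that lie in correctly classified zones. All function names are hypothetical. The resulting descriptor is what would be passed to an SVM classifier.

```python
import math


def nearest_codeword(feature, codebook):
    """Index of the codeword closest to `feature` (Euclidean distance, assumed)."""
    return min(range(len(codebook)), key=lambda i: math.dist(feature, codebook[i]))


def codeword_histogram(features, codebook):
    """Normalized histogram of codeword assignments for one segmented zone."""
    counts = [0] * len(codebook)
    for f in features:
        counts[nearest_codeword(f, codebook)] += 1
    total = sum(counts)
    # A zone with no features yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def zone_descriptor(features, handwritten_codebook, printed_codebook):
    """Concatenate the histograms over both codebooks (an assumed design choice)."""
    return (codeword_histogram(features, handwritten_codebook)
            + codeword_histogram(features, printed_codebook))


def pixel_weighted_accuracy(zones):
    """zones: iterable of (pixel_count, true_label, predicted_label) triples.

    Assumed definition: pixels in correctly classified zones / all zone pixels.
    """
    zones = list(zones)
    total = sum(p for p, _, _ in zones)
    correct = sum(p for p, t, y in zones if t == y)
    return correct / total if total else 0.0
```

Normalizing each histogram by the number of features in the zone keeps the descriptor comparable between a single text line and a full paragraph, which is one plausible reading of why the method tolerates varying zone sizes.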

In this paper, a new formulation for the parametric active contour
model is presented. The new formulation is based on statistical
pattern recognition theory. A hybrid of kernel density estimation
and fuzzy logic is used to show that active contours can be
treated as a pattern recognition problem. The proposed approach
is used in two different application domains, with different
performance requirements, to demonstrate its effectiveness. First,
the proposed approach is used for a magnetic resonance image
segmentation problem to demonstrate the segmentation accuracy.
Second, the contour is used in a target tracking experiment to
show its tracking capabilities.