In this paper, we report a breakthrough result on the difficult task of segmentation and recognition of coloured
text from the word image dataset of the ICDAR Robust Reading Competition Challenge 2: Reading Text in Scene Images.
We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of
these planes independently by a power-law transform. The discrimination factor of each plane is computed as
the maximum between-class variance used in Otsu thresholding. The plane that has maximum discrimination
factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for
recognition. Our recognition results on ICDAR 2011 and ICDAR 2003 word datasets are compared with those
reported in the literature. As a baseline, the images binarized by simple global and local thresholding techniques
were also recognized. The word recognition rate obtained by our non-linear enhancement and plane selection
method is 72.8% and 66.2% for the ICDAR 2011 and 2003 word datasets, respectively. We have created ground-truth
for each image at the pixel level to benchmark these datasets using a toolkit developed by us. The recognition
rate of benchmarked images is 86.7% and 83.9% for ICDAR 2011 and 2003 datasets, respectively.
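
The plane-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the gamma value, the function names, the use of the unnormalized between-class variance as the discrimination factor, and the foreground polarity (pixels above the threshold are marked 1) are all assumptions made for illustration. Extraction of the colour, gray and lightness planes is omitted; each plane is passed in as a 2D list of 8-bit values.

```python
def power_law(plane, gamma):
    """Power-law transform s = 255 * (r / 255) ** gamma on 8-bit pixels."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in plane]


def max_between_class_variance(plane):
    """Return (maximum between-class variance, threshold) as in Otsu's method."""
    hist = [0] * 256
    for row in plane:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_var, best_t = 0.0, 0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_var, best_t


def select_plane(planes, gamma=2.0):
    """Enhance each plane, score it, and binarize the best-scoring one.

    planes: non-empty dict mapping a plane name to a 2D list of 8-bit values.
    gamma:  assumed exponent; the source does not state the value used.
    """
    best = None
    for name, plane in planes.items():
        enhanced = power_law(plane, gamma)
        score, threshold = max_between_class_variance(enhanced)
        if best is None or score > best[0]:
            best = (score, name, threshold, enhanced)
    _, name, threshold, enhanced = best
    # Assumed polarity: values above the Otsu threshold become 1.
    binary = [[1 if p > threshold else 0 for p in row] for row in enhanced]
    return name, threshold, binary
```

In this sketch, a plane whose two pixel populations are far apart scores higher than a low-contrast plane, so it is the one binarized and passed on to recognition.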
Proc. SPIE. 8658, Document Recognition and Retrieval XX
KEYWORDS: Image processing algorithms and systems, Signal to noise ratio, Detection and tracking algorithms, Image segmentation, Error analysis, Image analysis, Image quality, Neon, Optical character recognition, Binary data
A necessary step in the recognition of scanned documents is binarization, which is essentially the segmentation
of the document. Several algorithms for binarizing a scanned document are available in the literature.
Which binarization result is best for a given document image? To answer this question, a user needs to check
different binarization algorithms for suitability, since different algorithms may work better for different types of
documents. Manually choosing the best from a set of binarized documents is time-consuming. To automate
the selection of the best segmented document, we need either the ground-truth of the document or
an evaluation metric. If ground-truth is available, then precision and recall can be used to choose the best
binarized document. But what if ground-truth is not available? Can we come up with a metric that
evaluates these binarized documents? To this end, we propose a metric to evaluate binarized document images using
eigenvalue decomposition. We have evaluated this measure on the DIBCO and H-DIBCO datasets. The proposed
method chooses the best binarized document that is close to the ground-truth of the document.
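
For the case where ground-truth is available, the selection by precision and recall mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, images are assumed to be 2D lists in which 1 marks a text pixel, and combining precision and recall into a single F-measure to rank candidates is an assumption, since the abstract does not say how the two are combined. The proposed eigenvalue-based metric is not reproduced here because the abstract does not define it.

```python
def precision_recall(binarized, ground_truth):
    """Pixel-level precision and recall of text pixels (value 1)."""
    tp = fp = fn = 0
    for b_row, g_row in zip(binarized, ground_truth):
        for b, g in zip(b_row, g_row):
            if b and g:
                tp += 1          # text pixel correctly detected
            elif b and not g:
                fp += 1          # background marked as text
            elif g and not b:
                fn += 1          # text pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed ranking criterion)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_binarization(candidates, ground_truth):
    """Return the name of the candidate with the highest F-measure.

    candidates: dict mapping an algorithm name to its binarized image.
    """
    return max(
        candidates,
        key=lambda name: f_measure(*precision_recall(candidates[name], ground_truth)),
    )
```

Given the outputs of several binarization algorithms for one document, `best_binarization` returns the name of the output that agrees most closely with the ground-truth under this assumed criterion.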