24 March 2014
A Markov chain based line segmentation framework for handwritten character recognition
In this paper, we present a novel text line segmentation framework that follows the divide-and-conquer paradigm: we iteratively identify and re-process regions of ambiguous line segmentation in an input document image until no ambiguity remains. To detect ambiguous line segmentation, we introduce two complementary line descriptors, referred to as the underline and highlight line descriptors, and flag an ambiguity wherever their patterns mismatch. As a result, we can easily identify line segmentations that are already good, and greatly simplify the original line segmentation problem by reprocessing only the ambiguous regions. We evaluate the proposed framework on the ICDAR 2009 handwritten document dataset, where its performance is close to that of the top-performing systems submitted to the competition. Moreover, the proposed method is robust against skew, noise, variable line heights and touching characters. The proposed idea can also be applied to other text analysis tasks such as word segmentation and page layout analysis.
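The abstract does not define the underline and highlight descriptors or the Markov chain model, so the following is only a minimal sketch of the accept-or-reprocess control flow it describes, under stated assumptions. It swaps in two hypothetical stand-in descriptors computed from a horizontal projection profile (ink count per row): a "loose" one (rows with any ink) and a "strict" one (rows with ink above a threshold). A loose band containing exactly one strict band is treated as unambiguous and accepted; a loose band containing several strict bands (for example, two lines joined by touching characters) is treated as ambiguous, split at its faintest row, and pushed back onto the worklist. The names `bands_above`, `segment_lines` and `strict_thresh` are illustrative, not the authors' method.

```python
from typing import List, Tuple

Band = Tuple[int, int]  # half-open row interval [start, end)


def bands_above(profile: List[int], start: int, end: int, thresh: int) -> List[Band]:
    """Return maximal runs of rows in [start, end) whose ink count exceeds thresh."""
    bands: List[Band] = []
    run_start = None
    for r in range(start, end):
        if profile[r] > thresh:
            if run_start is None:
                run_start = r
        elif run_start is not None:
            bands.append((run_start, r))
            run_start = None
    if run_start is not None:
        bands.append((run_start, end))
    return bands


def segment_lines(profile: List[int], strict_thresh: int) -> List[Band]:
    """Hypothetical divide-and-conquer line segmentation on a projection profile.

    Stand-in descriptors (assumptions, not the paper's underline/highlight ones):
      loose  = runs of rows with any ink (profile > 0)
      strict = runs of rows with ink above strict_thresh (assumed >= 0)
    """
    accepted: List[Band] = []
    worklist: List[Band] = [(0, len(profile))]
    while worklist:
        start, end = worklist.pop()
        strict = bands_above(profile, start, end, strict_thresh)
        for lo, hi in bands_above(profile, start, end, 0):
            # Strict bands nested inside this loose band.
            inner = [b for b in strict if lo <= b[0] and b[1] <= hi]
            if len(inner) == 1:
                # Descriptors agree: accept this band as one text line.
                accepted.append((lo, hi))
            elif len(inner) > 1:
                # Descriptors disagree: split at the faintest row between
                # the first two strict bands and re-process both halves.
                cut = min(range(inner[0][1], inner[1][0]), key=lambda r: profile[r])
                worklist.append((lo, cut))
                worklist.append((cut, hi))
            # len(inner) == 0: only faint ink, treated here as noise and dropped.
    return sorted(accepted)


# Two lines joined by a faint stroke at row 5, plus a separate third line.
print(segment_lines([0, 0, 5, 6, 5, 1, 4, 7, 4, 0, 0, 3, 8, 3, 0], 2))
# → [(2, 5), (5, 9), (11, 14)]
```

Each split separates at least two strict bands, so every re-processed region contains fewer of them and the loop terminates; only the ambiguous band is revisited, which mirrors the abstract's point that already-good segmentations need no further work.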
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yue Wu, Shengxin Zha, Huaigu Cao, Daben Liu, Premkumar Natarajan, "A Markov chain based line segmentation framework for handwritten character recognition", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210C (24 March 2014); doi: 10.1117/12.2042600; https://doi.org/10.1117/12.2042600
