Word segmentation of off-line handwritten documents
28 January 2008
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150E (2008)
Event: Electronic Imaging, 2008, San Jose, California, United States
Word segmentation is the most critical pre-processing step for any handwritten document recognition or retrieval system. When the writing style is unconstrained (i.e., written in a natural manner), recognition of individual components may be unreliable, so the components must be grouped into word hypotheses before recognition algorithms can be applied. This paper describes a machine learning approach based on gap metrics for separating a line of unconstrained handwritten text into words. The approach uses a set of both local and global features, motivated by the ways in which human beings perform this task. In addition, to overcome the disadvantages of individual distance computation methods, we propose a combined distance measure computed from three different methods. Classification is performed by a three-layer neural network. The algorithm is evaluated on an unconstrained handwriting database of 50 handwritten document pages (1,026 line images and 7,562 word images). The overall accuracy is 90.8%, which is better than that of a previous method.
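The abstract does not say which three distance methods are used or how they are combined, so the following is only a minimal sketch under stated assumptions. It assumes three gap measures commonly discussed for handwritten word segmentation (bounding-box gap, minimum run-length gap, and nearest-pixel Euclidean gap) and simply averages them. The threshold in `group_into_words` is a hypothetical stand-in for the paper's three-layer neural network classifier, and all function names here are illustrative, not the authors' code.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]      # (row, col) of a foreground pixel
Component = List[Pixel]      # one non-empty connected component


def bounding_box_gap(left: Component, right: Component) -> float:
    """Empty columns between the two bounding boxes (0 if they overlap)."""
    left_max = max(c for _, c in left)
    right_min = min(c for _, c in right)
    return max(0, right_min - left_max - 1)


def run_length_gap(left: Component, right: Component) -> float:
    """Smallest horizontal white run over rows both components occupy.

    Falls back to the bounding-box gap when no row is shared (assumption).
    """
    left_edge: Dict[int, int] = {}   # row -> rightmost column of `left`
    for r, c in left:
        left_edge[r] = max(c, left_edge.get(r, c))
    right_edge: Dict[int, int] = {}  # row -> leftmost column of `right`
    for r, c in right:
        right_edge[r] = min(c, right_edge.get(r, c))
    shared = left_edge.keys() & right_edge.keys()
    if not shared:
        return bounding_box_gap(left, right)
    return max(0, min(right_edge[r] - left_edge[r] - 1 for r in shared))


def euclidean_gap(left: Component, right: Component) -> float:
    """Nearest pixel-centre distance minus one, so touching pixels give 0."""
    nearest = min(math.dist(p, q) for p in left for q in right)
    return max(0.0, nearest - 1)


def combined_gap(left: Component, right: Component) -> float:
    """Hypothetical combination: the plain mean of the three measures."""
    return (bounding_box_gap(left, right)
            + run_length_gap(left, right)
            + euclidean_gap(left, right)) / 3.0


def group_into_words(components: List[Component],
                     threshold: float) -> List[List[int]]:
    """Group left-to-right sorted components into words by gap size.

    A fixed threshold stands in for the trained neural network here.
    """
    if not components:
        return []
    words = [[0]]
    for i in range(1, len(components)):
        if combined_gap(components[i - 1], components[i]) > threshold:
            words.append([i])        # large gap: start a new word
        else:
            words[-1].append(i)      # small gap: same word
    return words


def block(r0: int, r1: int, c0: int, c1: int) -> Component:
    """Solid rectangular component, used only for the example below."""
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


if __name__ == "__main__":
    line = [block(0, 2, 0, 2), block(0, 2, 4, 5), block(0, 2, 12, 14)]
    print(group_into_words(line, threshold=3.0))  # [[0, 1], [2]]
```

In the example, the first gap measures 1.0 under all three methods and the second measures 6.0, so a threshold of 3.0 groups the first two components into one word. The three measures disagree only when components are offset vertically or have irregular outlines, which is where a combined measure is expected to help; a learned classifier over these and other features would replace the fixed threshold.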
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chen Huang and Sargur N. Srihari, "Word segmentation of off-line handwritten documents," Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150E (28 January 2008).

