16 January 2006 Spotting words in handwritten Arabic documents
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 606702 (2006)
Event: Electronic Imaging 2006, San Jose, California, United States
The design and performance of a system for spotting handwritten Arabic words in scanned document images are presented. The three main components of the system are a word segmenter, a shape-based matcher for words, and a search interface. The user types a query in English within a search window; the system finds the equivalent Arabic word, e.g., by dictionary look-up, and locates matching word images in an indexed (segmented) set of documents. A two-step approach is employed in performing the search: (1) prototype selection: the query is used to obtain a set of handwritten samples of that word from a known set of writers (these are the prototypes), and (2) word matching: the prototypes are used to spot each occurrence of that word in the indexed document database. A ranking is performed on the entire set of test word images, where the ranking criterion is a similarity score between each prototype word and the candidate words based on global word shape features. A database of 20,000 word images contained in 100 scanned handwritten Arabic documents written by 10 different writers was used to study retrieval performance. With five writers providing prototypes and the other five used for testing, and with manually segmented documents, 55% precision is obtained at 50% recall. Performance increases as more writers are used for training.
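The word-matching and ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature representation or the similarity measure, so the sketch assumes that each word image has already been reduced to a numeric feature vector, uses cosine similarity as a stand-in score, and scores each candidate by its best match over all prototypes. The function names (`cosine_similarity`, `rank_candidates`, `precision_at_recall`) and the toy vectors are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Stand-in similarity score (assumption: the paper's measure is not given here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_candidates(prototypes, candidates):
    """Rank candidate word images by their best similarity to any prototype.

    prototypes: list of feature vectors for the query word.
    candidates: dict mapping a word-image id to its feature vector.
    Returns a list of (id, score) pairs, highest score first.
    """
    scored = []
    for word_id, features in candidates.items():
        # Assumption: a candidate's score is its maximum over all prototypes.
        score = max(cosine_similarity(p, features) for p in prototypes)
        scored.append((word_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def precision_at_recall(ranked_ids, relevant_ids, target_recall):
    """Precision at the first rank where recall reaches target_recall."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    for rank, word_id in enumerate(ranked_ids, start=1):
        if word_id in relevant:
            hits += 1
            if hits / len(relevant) >= target_recall:
                return hits / rank
    return 0.0  # target recall never reached


# Toy data (hypothetical): two prototypes and four candidate word images.
prototypes = [[1, 0, 1, 1], [1, 1, 1, 0]]
candidates = {
    "w1": [1, 0, 1, 1],
    "w2": [0, 1, 0, 0],
    "w3": [1, 1, 1, 0],
    "w4": [0, 0, 0, 0],
}
ranking = rank_candidates(prototypes, candidates)
ranked_ids = [word_id for word_id, _ in ranking]
```

In this toy setup, `precision_at_recall(ranked_ids, {"w1", "w3"}, 0.5)` reports the precision at 50% recall, the same operating point quoted in the abstract; adding prototypes from more writers simply lengthens the `prototypes` list.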
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Sargur Srihari, Harish Srinivasan, Pavithra Babu, and Chetan Bhole "Spotting words in handwritten Arabic documents", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 606702 (16 January 2006).

