28 January 2008 Word mining in a sparsely labeled handwritten collection
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150N (2008) https://doi.org/10.1117/12.766329
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
Word-spotting techniques are usually based on detailed modeling of target words, followed by a search for the locations of such a target word in images of handwriting. In this study, the focus is on deciding whether target words are present in lines of text, regardless of their horizontal position. Line strips are modeled with a Bag-of-Glyphs approach based on a self-organized map. This approach uses the presence of fragmented connected-component shapes (glyphs) in a line strip to characterize this text passage, similar to the Bag-of-Words approach for 'ASCII'-encoded documents in regular Information Retrieval. Subsequently, a support-vector machine is trained to detect the presence of a word or word category, in an iterative setup which involves an active group of users. Results are promising for a large proportion of words and depend both on the number of labeled lines and on shape uniqueness. Particularly useful is the ability to train on abstract content classes such as proper names, municipalities or word-bigram presence in the line-strip images.
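The Bag-of-Glyphs idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the 2-D glyph features, the three-node `codebook` (standing in for the nodes of an already trained self-organized map), the function names, and the normalisation of the histogram are all assumptions made for illustration. Each glyph is assigned to its nearest codebook node, and the line strip is described by the histogram of node hits, so the horizontal order of glyphs plays no role.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_node(glyph, codebook):
    """Return the index of the codebook vector closest to the glyph features."""
    return min(range(len(codebook)), key=lambda i: dist(glyph, codebook[i]))


def bag_of_glyphs(line_glyphs, codebook):
    """Return a normalised histogram of codebook-node hits for one line strip."""
    counts = [0] * len(codebook)
    for glyph in line_glyphs:
        counts[nearest_node(glyph, codebook)] += 1
    total = sum(counts)
    # An empty line strip yields an all-zero histogram (an assumption of this sketch).
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical codebook: stands in for the nodes of a trained self-organized map.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical glyph features for two line strips with the same glyphs in a different order.
line_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.9)]
line_b = [(0.9, 0.1), (0.1, 0.0), (0.0, 0.9), (1.1, -0.1)]

hist_a = bag_of_glyphs(line_a, codebook)
hist_b = bag_of_glyphs(line_b, codebook)
print(hist_a)
print(hist_a == hist_b)
```

Both line strips produce the same histogram, which reflects the position-independent characterization the abstract describes. In a full pipeline, such histograms would be the input vectors on which a support-vector machine is trained with line-level labels.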
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
L. R. B. Schomaker "Word mining in a sparsely labeled handwritten collection", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150N (28 January 2008); https://doi.org/10.1117/12.766329
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Mining, Image segmentation, Human-machine interfaces, Chemical elements, Inspection, Interfaces, Artificial intelligence