19 January 2009 Retrieval of historical documents by word spotting
Word spotting is difficult to implement, and the difficulty increases for historical documents, since it requires character recognition and indexing of the document images. A general technique for word spotting is presented that is independent of OCR: the user's text queries are automatically rendered as word images and compared with the word images extracted from the document images. The proposed system requires no training; the only preprocessing task is determining the alphabet. Words are described by global shape features, which are kept very general in order to capture the overall form of the word and are normalized to cope with the usual variations in resolution, word width, and font. A novel technique that makes use of interpolation is presented. In our experiments, we analyze the system's dependence on its parameters and show that its performance is comparable to that of trainable systems.
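As a rough illustration of how interpolation can make a global shape feature insensitive to word width and resolution, the following is a minimal sketch and not the authors' method. The abstract does not specify the features, so the column ink-density profile used here is only a stand-in, and the function names (`column_profile`, `resample`, `word_distance`) and the `n_samples` parameter are hypothetical.

```python
import numpy as np

def column_profile(word_img):
    """Fraction of ink pixels in each column of a binary word image (1 = ink).

    Dividing by the image height removes the dependence on vertical resolution.
    This feature is an assumption chosen for illustration only.
    """
    img = np.asarray(word_img, dtype=float)
    return img.sum(axis=0) / img.shape[0]

def resample(profile, n_samples=32):
    """Linearly interpolate a 1-D profile onto n_samples evenly spaced points,
    so that words of different pixel widths yield vectors of equal length."""
    profile = np.asarray(profile, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(profile))
    dst = np.linspace(0.0, 1.0, num=n_samples)
    return np.interp(dst, src, profile)

def word_distance(img_a, img_b, n_samples=32):
    """Root-mean-square difference between the resampled profiles of two words."""
    fa = resample(column_profile(img_a), n_samples)
    fb = resample(column_profile(img_b), n_samples)
    return float(np.linalg.norm(fa - fb) / np.sqrt(n_samples))
```

Under these assumptions, a query image synthesized from the user's text would be passed through the same functions as each word image segmented from the document, and candidates would be ranked by increasing `word_distance`.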
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Nikoleta Doulgeri, Ergina Kavallieratou, "Retrieval of historical documents by word spotting", Proc. SPIE 7247, Document Recognition and Retrieval XVI, 724706 (19 January 2009); doi: 10.1117/12.805602; https://doi.org/10.1117/12.805602
