24 January 2011 Robust keyword retrieval method for OCRed text
Document management systems have become important with the growing popularity of electronic document filing and of scanning books, magazines, manuals, and similar material with a scanner or digital camera for storage or for reading on a PC or an electronic book reader. Text obtained by optical character recognition (OCR) is usually attached to these electronic documents to support document retrieval. Because OCR-generated text generally contains character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation errors and character recognition errors. In the proposed method, allowing for inserted noise characters and dropped characters during keyword retrieval provides robustness against character segmentation errors, and allowing each keyword character to be substituted by an OCR recognition candidate for that character, or by any other character, provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
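The matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses standard approximate substring matching by dynamic programming, and the function name `find_keyword`, the per-position candidate lists, the unit costs, and the `max_cost` threshold are all assumptions made for illustration. A keyword character that appears among the OCR candidates for a text position costs nothing; any other substitution, an inserted noise character, or a dropped character costs 1.

```python
from typing import List, Sequence


def find_keyword(keyword: str,
                 ocr_candidates: Sequence[Sequence[str]],
                 max_cost: int = 1) -> List[int]:
    """Return exclusive end positions in the OCR text where `keyword`
    matches with total edit cost <= max_cost.

    ocr_candidates[j] lists the recognition candidates for text position j
    (hypothetical layout; the first entry would be the top OCR choice).
    """
    m = len(keyword)
    # prev[i]: cheapest way to match keyword[:i] against text ending here.
    # Before any text is read, keyword[:i] needs i dropped characters.
    prev = list(range(m + 1))
    hits = []
    for j, cands in enumerate(ocr_candidates, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            # Free if the keyword character is one of the OCR candidates,
            # otherwise treated as substitution by "any other character".
            sub = 0 if keyword[i - 1] in cands else 1
            cur[i] = min(prev[i - 1] + sub,  # match or substitution
                         prev[i] + 1,        # noise character in the text
                         cur[i - 1] + 1)     # keyword character dropped
        if cur[m] <= max_cost:
            hits.append(j)
        prev = cur
    return hits


# Usage: the OCR top choice is "c4t", but "a" is a second candidate.
ocr = [["c"], ["4", "a"], ["t"]]
print(find_keyword("cat", ocr, max_cost=0))
```

Raising `max_cost` in this sketch accepts more distorted occurrences of the keyword and also more false matches, which mirrors the recall and precision trade-off reported above.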
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yusaku Fujii, Hiroaki Takebe, Hiroshi Tanaka, Yoshinobu Hotta, "Robust keyword retrieval method for OCRed text", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 787411 (24 January 2011); doi: 10.1117/12.876470; https://doi.org/10.1117/12.876470

