29 January 1999 Performance evaluation of two Arabic OCR products
Proceedings Volume 3584, 27th AIPR Workshop: Advances in Computer-Assisted Recognition; (1999); doi: 10.1117/12.339809
Event: The 27th AIPR Workshop: Advances in Computer-Assisted Recognition, 1998, Washington, DC, United States
Abstract
Numerous Optical Character Recognition (OCR) companies claim that their products have near-perfect recognition accuracy (close to 99.9%). In practice, however, these accuracy rates are rarely achieved. Most systems break down when the input document images are highly degraded, such as scanned images of carbon-copy documents, documents printed on low-quality paper, and documents that are n-th generation photocopies. Moreover, end users cannot compare the relative performance of the products because the reported accuracy results are not measured on the same dataset. In this article we report our evaluation results for two popular Arabic OCR products: (1) Sakhr OCR and (2) OmniPage for Arabic. In our evaluation we establish that the Sakhr OCR product has a 15.47% lower page error rate relative to the OmniPage page error rate. The absolute page accuracy rates for Sakhr and OmniPage are 90.33% and 86.89%, respectively. Our evaluation was performed using the SAIC Arabic image dataset, and we used only those pages for which both OCR systems produced output. A scatter-plot of the page accuracy-rate pairs reveals that Sakhr in general performs better on low-accuracy (degraded) pages. The scatter-plot visualization technique allows an algorithm developer to easily detect and analyze outliers in the results.
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tapas Kanungo, Gregory A. Marton, Osama Bulbul, "Performance evaluation of two Arabic OCR products", Proc. SPIE 3584, 27th AIPR Workshop: Advances in Computer-Assisted Recognition, (29 January 1999); doi: 10.1117/12.339809; https://doi.org/10.1117/12.339809
PROCEEDINGS
8 PAGES


KEYWORDS
Optical character recognition
Algorithm development
Lithium
Visualization
Detector development
Visual analytics
Analytical research
