17 March 2017 Trigram-based algorithms for OCR result correction
Proceedings Volume 10341, Ninth International Conference on Machine Vision (ICMV 2016); 103410O (2017) https://doi.org/10.1117/12.2268559
Event: Ninth International Conference on Machine Vision, 2016, Nice, France
Abstract
In this paper we consider the task of improving optical character recognition (OCR) results for document fields in low-quality and average-quality images using N-gram models. Cyrillic fields of the Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first is based on the hypothesis that a symbol depends on its two adjacent symbols, and the second is based on the calculation of marginal distributions and Bayesian network computation. A comparison of the algorithms and experimental results within a real document OCR system are presented; it is shown that document field OCR accuracy can be improved by more than 6% for low-quality images.
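To make the first idea concrete, the sketch below re-weights each character's OCR alternatives using trigram counts taken over its left and right neighbours. It is only an illustration under stated assumptions, not the authors' algorithm: the function names (trigram_counts, rescore), the boundary markers "^" and "$", the additive smoothing constant, and the toy lexicon are all hypothetical, and the OCR output is assumed to be available as one character-to-probability table per position.

```python
from collections import defaultdict


def trigram_counts(words):
    """Count character trigrams over a lexicon, padding each word with
    hypothetical boundary markers '^' (start) and '$' (end)."""
    counts = defaultdict(int)
    for word in words:
        padded = "^" + word + "$"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def rescore(ocr, counts, smoothing=0.01):
    """Re-weight OCR alternatives at each position.

    ocr: list of dicts mapping candidate character -> OCR probability.
    For candidate c at position i the new (unnormalised) score is
        p_i(c) * sum_{a,b} p_{i-1}(a) * p_{i+1}(b) * (count(a c b) + smoothing),
    i.e. the symbol is assumed to depend on its two adjacent symbols.
    Neighbours are taken from the original OCR tables, not the rescored ones.
    """
    padded = [{"^": 1.0}] + list(ocr) + [{"$": 1.0}]
    result = []
    for i in range(1, len(padded) - 1):
        scores = {}
        for c, p_c in padded[i].items():
            context = sum(
                p_a * p_b * (counts.get(a + c + b, 0) + smoothing)
                for a, p_a in padded[i - 1].items()
                for b, p_b in padded[i + 1].items()
            )
            scores[c] = p_c * context
        total = sum(scores.values())
        result.append({c: s / total for c, s in scores.items()})
    return result


def decode(distributions):
    """Pick the most probable character at every position."""
    return "".join(max(d, key=d.get) for d in distributions)


# Toy lexicon and a noisy reading of a name field (made-up numbers).
lexicon = ["ИВАН", "ИВАНОВ", "АННА"]
ocr_output = [
    {"И": 0.9, "Н": 0.1},
    {"Б": 0.55, "В": 0.45},  # OCR slightly prefers the wrong letter
    {"А": 1.0},
    {"Н": 1.0},
]

corrected = rescore(ocr_output, trigram_counts(lexicon))
print(decode(ocr_output))  # raw OCR choice: ИБАН
print(decode(corrected))   # after trigram re-weighting: ИВАН
```

In this toy case the trigram "ИВА" occurs in the lexicon while "ИБА" does not, so the second position flips from "Б" to "В" even though the raw OCR score favoured "Б". A real system would estimate the counts from a large name dictionary and would need a more careful treatment of smoothing and field boundaries.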
© (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Konstantin Bulatov, Temudzhin Manzhikov, Oleg Slavin, Igor Faradjev, Igor Janiszewski, "Trigram-based algorithms for OCR result correction", Proc. SPIE 10341, Ninth International Conference on Machine Vision (ICMV 2016), 103410O (17 March 2017); doi: 10.1117/12.2268559; https://doi.org/10.1117/12.2268559
PROCEEDINGS
5 PAGES

