17 March 2017 Trigram-based algorithms for OCR result correction
Proceedings Volume 10341, Ninth International Conference on Machine Vision (ICMV 2016); 103410O (2017) https://doi.org/10.1117/12.2268559
Event: Ninth International Conference on Machine Vision, 2016, Nice, France
In this paper we consider the task of improving optical character recognition (OCR) results for document fields on low-quality and average-quality images using N-gram models. Cyrillic fields of the Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first is based on the hypothesis that a symbol depends on its two adjacent symbols, and the second is based on the calculation of marginal distributions and Bayesian network computation. A comparison of the algorithms and experimental results within a real document OCR system are presented. It is shown that document field OCR accuracy can be improved by more than 6% for low-quality images.
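The first approach can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the OCR engine returns a probability distribution over alternative characters for each position, and that trigram counts are collected from a dictionary of valid field values. The names `train_trigrams`, `rescore`, `PAD`, the additive smoothing, and the toy dictionary are all hypothetical choices made for illustration.

```python
from collections import Counter

PAD = "^"  # hypothetical boundary symbol, not taken from the paper


def train_trigrams(words):
    """Count (left, center, right) character triples over known field values."""
    counts = Counter()
    for word in words:
        padded = PAD + word + PAD
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
    return counts


def rescore(ocr, counts, alphabet, smoothing=1.0):
    """Re-weight each position's OCR alternatives by its two neighbours.

    `ocr` is a list of dicts mapping candidate characters to OCR confidences.
    Each candidate's confidence is multiplied by the expected (smoothed)
    probability of that character given the left and right neighbours, then
    the scores of each position are renormalised.
    """
    boundary = {PAD: 1.0}
    result = []
    for i, cell in enumerate(ocr):
        left = ocr[i - 1] if i > 0 else boundary
        right = ocr[i + 1] if i + 1 < len(ocr) else boundary
        scores = {}
        for c, p_c in cell.items():
            context = 0.0
            for l, p_l in left.items():
                for r, p_r in right.items():
                    total = sum(counts[(l, x, r)] for x in alphabet)
                    cond = (counts[(l, c, r)] + smoothing) / (
                        total + smoothing * len(alphabet)
                    )
                    context += p_l * p_r * cond
            scores[c] = p_c * context
        norm = sum(scores.values())
        result.append({c: s / norm for c, s in scores.items()})
    return result


# Toy example: the second letter of "ИВАН" is misread as "Б" with higher confidence.
counts = train_trigrams(["ИВАН", "ИВАНОВ", "ИВАНОВА"])
ocr = [{"И": 1.0}, {"В": 0.45, "Б": 0.55}, {"А": 1.0}, {"Н": 1.0}]
corrected = rescore(ocr, counts, alphabet="АБВИНО")
best = "".join(max(cell, key=cell.get) for cell in corrected)
```

In this toy case the neighbouring characters shift the weight of the ambiguous position toward "В", because the triple "ИВА" occurs in the dictionary while "ИБА" does not. The second approach in the paper (marginal distributions and Bayesian networks) is not covered by this sketch.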
© (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Konstantin Bulatov, Temudzhin Manzhikov, Oleg Slavin, Igor Faradjev, Igor Janiszewski, "Trigram-based algorithms for OCR result correction", Proc. SPIE 10341, Ninth International Conference on Machine Vision (ICMV 2016), 103410O (17 March 2017); doi: 10.1117/12.2268559; https://doi.org/10.1117/12.2268559


