Paper
Trigram-based algorithms for OCR result correction
17 March 2017
Proceedings Volume 10341, Ninth International Conference on Machine Vision (ICMV 2016); 103410O (2017) https://doi.org/10.1117/12.2268559
Event: Ninth International Conference on Machine Vision, 2016, Nice, France
Abstract
In this paper we consider the task of improving optical character recognition (OCR) results for document fields in low-quality and average-quality images using N-gram models. The Cyrillic fields of the Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first is based on the hypothesis that a symbol depends on its two adjacent symbols, and the second is based on the calculation of marginal distributions and Bayesian network computation. The algorithms are compared, and experimental results within a real document OCR system are presented, showing that document field OCR accuracy can be improved by more than 6% for low-quality images.
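To make the first approach concrete, here is a minimal sketch of trigram-based rescoring. It assumes the recognizer returns, for every character position, a set of candidate symbols with confidences, and that trigram statistics are counted from a list of known field values (for example, surnames). The names `train_trigrams` and `correct`, the `^`/`$` padding symbols and the add-one smoothing are illustrative assumptions, not the authors' implementation. Each symbol is conditioned only on the recognizer's top choices for its left and right neighbours.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical OCR output format: candidate character -> recognizer confidence at one position.
Candidates = Dict[str, float]


def train_trigrams(words: List[str]) -> Counter:
    """Count (left, middle, right) symbol triples; '^' and '$' mark the field boundaries."""
    counts: Counter = Counter()
    for w in words:
        padded = "^" + w + "$"
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
    return counts


def correct(positions: List[Candidates], counts: Counter, alpha: float = 1.0) -> str:
    """Rescore each position by OCR confidence times the smoothed trigram count.

    The conditional P(c | left, right) is proportional to count(left, c, right) + alpha;
    its normaliser is the same for every candidate c, so it does not change the argmax.
    """
    # Initial decisions: the recognizer's most confident candidate at each position.
    best = [max(p, key=p.get) for p in positions]
    result = []
    for i, cands in enumerate(positions):
        left = best[i - 1] if i > 0 else "^"
        right = best[i + 1] if i + 1 < len(positions) else "$"
        scores = {c: conf * (counts[(left, c, right)] + alpha) for c, conf in cands.items()}
        result.append(max(scores, key=scores.get))
    return "".join(result)
```

As a usage example, suppose the recognizer slightly prefers "Б" over "В" in the second position of "ИВАНОВ". With trigrams counted from `["ИВАНОВ", "ПЕТРОВ", "СИДОРОВ"]`, the triple ("И", "В", "А") has been seen once, so "В" scores 0.45 × 2 = 0.9 against 0.55 × 1 = 0.55 for "Б", and the field is corrected to "ИВАНОВ". Because neighbours are fixed to their initial top choices, a single pass like this cannot repair two adjacent errors jointly; that limitation is one motivation for computing marginal distributions over whole fields instead.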
© (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Konstantin Bulatov, Temudzhin Manzhikov, Oleg Slavin, Igor Faradjev, and Igor Janiszewski "Trigram-based algorithms for OCR result correction", Proc. SPIE 10341, Ninth International Conference on Machine Vision (ICMV 2016), 103410O (17 March 2017); https://doi.org/10.1117/12.2268559
CITATIONS
Cited by 1 scholarly publication.
KEYWORDS
Optical character recognition

Detection and tracking algorithms

Associative arrays

Analytical research

Visual process modeling

Error analysis

Image processing
