In this paper the problem statement is given to compare the digitized pages of the official papers. Such problem appears during the comparison of two customer copies signed at different times between two parties with a view to find the possible modifications introduced on the one hand. This problem is a practically significant in the banking sector during the conclusion of contracts in a paper format. The method of comparison based on the recognition, which consists in the comparison of two bag-of-words, which are the recognition result of the master and test pages, is suggested. The described experiments were conducted using the OCR Tesseract and the siamese neural network. The advantages of the suggested method are the steady operation of the comparison algorithm and the high exacting precision, and one of the disadvantages is the dependence on the chosen OCR.
This paper discusses a task of document recognition on a sequence of video frames. In order to optimize the processing speed an estimation is performed of stability of recognition results obtained from several video frames. Considering identity document (Russian internal passport) recognition on a mobile device it is shown that significant decrease is possible of the number of observations necessary for obtaining precise recognition result.
In this paper we consider a task of improving optical character recognition (OCR) results of document fields on low-quality and average-quality images using N-gram models. Cyrillic fields of Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first one is based on hypothesis of dependence of a symbol from two adjacent symbols and the second is based on calculation of marginal distributions and Bayesian networks computation. A comparison of the algorithms and experimental results within a real document OCR system are presented, it's showed that the document field OCR accuracy can be improved by more than 6% for low-quality images.