For printed Latin text, the main problems of Optical Character Recognition (OCR) systems are solved.
However, for degraded handwritten document images, basic preprocessing steps such as binarization yield poor
results even with state-of-the-art methods. In this paper, ancient Slavonic manuscripts from the 11th century are
investigated. To minimize the consequences of incorrect character segmentation, a binarization-free approach
based on local descriptors is proposed. Additionally, local information allows the recognition of partially visible
or washed-out characters. The proposed algorithm consists of two steps: character classification and character
localization. First, Scale Invariant Feature Transform (SIFT) features are extracted and then
classified using Support Vector Machines (SVMs). Afterwards, the interest points are clustered according to their
spatial information. Characters are thereby localized and finally recognized based on a weighted voting scheme
of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded
manuscript images with background clutter (e.g., stains, tears) and faded-out characters.
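
The localization and voting step described above can be illustrated with a minimal sketch. It assumes that each interest point has already been assigned a class label and a confidence weight by a classifier. The abstract does not specify the clustering method or the weighting, so the distance-threshold (single-linkage) clustering, the `radius` parameter, and the function names `cluster_points` and `recognize_characters` are hypothetical choices made for illustration only, not the method of the paper.

```python
from collections import defaultdict
from math import hypot


def cluster_points(points, radius):
    """Group (x, y) points by single linkage: two points share a cluster
    if they are connected by a chain of neighbours at most `radius` apart.
    (Assumed clustering rule; the paper's actual method may differ.)"""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if hypot(dx, dy) <= radius:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    return list(clusters.values())


def recognize_characters(keypoints, radius):
    """keypoints: list of (x, y, label, weight) for pre-classified descriptors.
    Returns one (centroid_x, centroid_y, label) per spatial cluster, where
    the label is the one with the largest summed weight in that cluster."""
    results = []
    coords = [(k[0], k[1]) for k in keypoints]
    for members in cluster_points(coords, radius):
        votes = defaultdict(float)
        for i in members:
            _, _, label, weight = keypoints[i]
            votes[label] += weight  # weighted vote of one local descriptor
        best = max(votes, key=votes.get)
        cx = sum(keypoints[i][0] for i in members) / len(members)
        cy = sum(keypoints[i][1] for i in members) / len(members)
        results.append((cx, cy, best))
    return results
```

In this sketch a single confident descriptor can outvote several weak ones within the same cluster, which is the property that makes weighted voting tolerant of partially visible characters: only the surviving descriptors need to agree.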
"Recognizing characters of ancient manuscripts", Proc. SPIE 7531, Computer Vision and Image Analysis of Art, 753106 (16 February 2010); doi: 10.1117/12.843532; https://doi.org/10.1117/12.843532