Segmenting degraded characters is a challenging problem, and many optical character recognition (OCR) systems still handle it poorly. Broken and touching characters are two types of degradation frequently encountered in old documents and in densely printed text such as newspapers. In this article we propose efficient techniques for segmenting and recognizing broken and touching characters. These techniques draw on several mathematical tools, including fuzzy logic and statistics. An algorithm combining all of these techniques is presented at the end of the article. Based on cooperation between classification and segmentation, the algorithm successfully handles character strings made up of any number of broken or touching characters, without a priori knowledge of the characters' widths or of their number.
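To make the idea of cooperation between classification and segmentation concrete, the following is a minimal illustrative sketch and not the algorithm of the article. It assumes that the text line has already been over-segmented into small fragments, so that a broken character spans several fragments and touching characters have been split at candidate cut points. A dynamic program then groups consecutive fragments into characters by maximizing the product of classifier confidences, so neither the number of characters nor their widths is fixed in advance. The names `segment`, `toy_classify`, `TOY_MODEL` and the parameter `max_span` are hypothetical, and the lookup table stands in for a real classifier.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# A classifier maps a group of fragments to (label, confidence in (0, 1]),
# or returns None when it rejects the group. This interface is an assumption.
Classifier = Callable[[Tuple[str, ...]], Optional[Tuple[str, float]]]


def segment(
    fragments: Sequence[str],
    classify: Classifier,
    max_span: int = 4,
) -> Optional[str]:
    """Group consecutive fragments into characters.

    Picks the grouping with the highest total log-confidence.
    max_span is a hypothetical cap on fragments per character,
    used only to bound the search.
    Returns the recognized string, or None if no full grouping exists.
    """
    n = len(fragments)
    best = [-math.inf] * (n + 1)  # best[i]: best score covering fragments[:i]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_span), end):
            if best[start] == -math.inf:
                continue  # no valid grouping reaches this start position
            result = classify(tuple(fragments[start:end]))
            if result is None:
                continue  # classifier rejects this candidate character
            label, conf = result
            if conf <= 0.0:
                continue
            score = best[start] + math.log(conf)
            if score > best[end]:
                best[end] = score
                back[end] = (start, label)

    if n > 0 and back[n] is None:
        return None

    # Walk the back-pointers from the end to recover the labels.
    labels: List[str] = []
    pos = n
    while pos > 0:
        start, label = back[pos]
        labels.append(label)
        pos = start
    return "".join(reversed(labels))


# Toy stand-in for a trained classifier (made-up confidences).
TOY_MODEL = {
    ("|", "-", "|"): ("H", 0.9),  # broken "H" seen as three fragments
    ("|",): ("l", 0.6),
    ("-",): ("-", 0.5),
    ("o",): ("o", 0.95),
    ("r", "n"): ("m", 0.7),       # two pieces that together look like "m"
    ("r",): ("r", 0.8),
    ("n",): ("n", 0.8),
}


def toy_classify(group: Tuple[str, ...]) -> Optional[Tuple[str, float]]:
    return TOY_MODEL.get(group)
```

With this toy table, `segment(["|", "-", "|", "o"], toy_classify)` returns `"Ho"`: merging the three strokes into one "H" scores higher than reading them as three separate characters. In a real system the table would be replaced by a trained classifier, possibly with fuzzy or statistical confidence measures as the abstract suggests.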
"Segmentation of degraded characters", Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); doi: 10.1117/12.373491; https://doi.org/10.1117/12.373491