1 April 1998 Compound character recognition by run-number-based metric distance
Abstract
This paper concerns automatic OCR of Bangla, a major Indian script and the fourth most popular script in the world. A Bangla OCR system has to recognize about 300 graphemic shapes, of which 250 are compound characters with quite complex stroke patterns. For recognition of such compound characters, feature-based approaches are less reliable, and template-based approaches are less tolerant of variation in font size and style. We combine the positive aspects of both approaches and propose a run-number-based normalized template matching technique for compound character recognition. Run-number vectors are computed for both horizontal and vertical scanning. As the number of scans may vary from pattern to pattern, we normalize and abbreviate the vector. We prove that this normalized and abbreviated vector induces a metric distance. Moreover, the vector is invariant to scaling, insensitive to character style variation, and more effective for complex-shaped characters than for simple-shaped ones. We use this vector representation for matching within a group of compound characters. We observe that the matching is more efficient if the vector is reorganized with respect to the centroid of the pattern. We have tested our approach on a large set of segmented compound characters at different point sizes and in different styles. Italic characters are subject to preprocessing. The overall correct recognition rate is 99.69 percent.
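To make the idea of a run-number vector concrete, the following is a minimal Python sketch, not the authors' implementation. It counts the runs of ink pixels in every horizontal and vertical scan line of a binary bitmap. The abstract does not specify how the vector is normalized and abbreviated or which distance is used, so the fixed-length bin averaging in `abbreviate` and the L1 distance in `distance` are assumptions chosen only for illustration. All function names and the toy bitmap `H` are hypothetical.

```python
from typing import List, Sequence

Bitmap = Sequence[Sequence[int]]  # 1 = ink pixel, 0 = background


def count_runs(line: Sequence[int]) -> int:
    """Count maximal runs of consecutive ink pixels in one scan line."""
    runs, prev = 0, 0
    for px in line:
        if px and not prev:  # a 0 -> 1 transition starts a new run
            runs += 1
        prev = px
    return runs


def run_number_vector(bitmap: Bitmap, vertical: bool = False) -> List[int]:
    """Run counts per row (horizontal scan) or per column (vertical scan)."""
    lines = zip(*bitmap) if vertical else bitmap
    return [count_runs(line) for line in lines]


def abbreviate(vec: Sequence[float], length: int) -> List[float]:
    """ASSUMPTION: bring a vector to a fixed length by averaging bins.

    The paper's own normalization/abbreviation step is not described in
    the abstract; this is only one plausible stand-in.
    """
    n = len(vec)
    out = []
    for k in range(length):
        lo = k * n // length
        hi = max(lo + 1, (k + 1) * n // length)  # keep every bin non-empty
        chunk = vec[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def feature(bitmap: Bitmap, length: int) -> List[float]:
    """Concatenate the abbreviated horizontal and vertical run vectors."""
    return (abbreviate(run_number_vector(bitmap), length)
            + abbreviate(run_number_vector(bitmap, vertical=True), length))


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMPTION: L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def upscale(bitmap: Bitmap, factor: int) -> List[List[int]]:
    """Enlarge a bitmap by an integer factor (pixel replication)."""
    return [[px for px in row for _ in range(factor)]
            for row in bitmap for _ in range(factor)]


# Toy 3x3 pattern shaped like the letter H (hypothetical test data).
H = [[1, 0, 1],
     [1, 1, 1],
     [1, 0, 1]]
```

In this sketch, the pattern `H` and its enlargement `upscale(H, 2)` yield the same feature vector for `length=3`, which illustrates why run counts are a natural basis for a size-tolerant template. Replication by an integer factor is the easiest case; real glyphs rendered at different point sizes would match only approximately.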
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Uptal Garain and B. B. Chaudhuri "Compound character recognition by run-number-based metric distance", Proc. SPIE 3305, Document Recognition V, (1 April 1998); https://doi.org/10.1117/12.304622
PROCEEDINGS
8 PAGES

