17 January 2005 Automatic style clustering of printed characters in form images
Style is an important feature of printed and handwritten characters, but it has not been studied as thoroughly as character recognition. In this paper, we investigate how many typical styles exist in a set of real-world form images. A hierarchical clustering method has been developed and tested. A cross-recognition error rate constraint is proposed to reduce false merges during hierarchical clustering, and a cluster selection method is used to delete redundant or unsuitable clusters. The algorithm requires only a similarity measure between pairs of patterns. It is tested with a template-matching-based similarity measure, which can easily be extended to other features and distance measures. The effect of each step is compared in detail in the paper. In total, 16 typical styles are found, and by giving each character in each style a prototype for recognition, an error rate of 0.78% is achieved on the test set.
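The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea: agglomerative clustering driven purely by a pairwise similarity matrix, where a merge is rejected if the two clusters recognize each other's samples too poorly. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the same-label average linkage, the nearest-neighbour definition of the cross-recognition error, the thresholds `min_sim` and `max_cross_error`, and the toy data.

```python
from itertools import combinations


def linkage_similarity(a, b, sim, labels):
    """Mean similarity over same-label sample pairs across two clusters (assumed linkage)."""
    pairs = [sim[i][j] for i in a for j in b if labels[i] == labels[j]]
    return sum(pairs) / len(pairs) if pairs else 0.0


def cross_error_rate(a, b, sim, labels):
    """Fraction of samples in b mislabelled by their most similar sample in a.

    Hypothetical stand-in for the paper's cross-recognition error rate.
    """
    errors = 0
    for j in b:
        nearest = max(a, key=lambda i: sim[i][j])
        if labels[nearest] != labels[j]:
            errors += 1
    return errors / len(b)


def constrained_clustering(groups, sim, labels, min_sim=0.5, max_cross_error=0.1):
    """Greedily merge the most similar pair of clusters that passes the constraint."""
    clusters = [list(g) for g in groups]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            a, b = clusters[x], clusters[y]
            s = linkage_similarity(a, b, sim, labels)
            if s < min_sim:
                continue
            # Check recognition in both directions and keep the worse result.
            err = max(cross_error_rate(a, b, sim, labels),
                      cross_error_rate(b, a, sim, labels))
            if err > max_cross_error:
                continue
            if best is None or s > best[0]:
                best = (s, x, y)
        if best is None:
            break  # no admissible merge is left
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def toy_data():
    """Four groups of two characters ('I', 'l'); groups 0-1 and 2-3 differ in style."""
    labels = ["I", "l"] * 4
    styles = [0, 0, 0, 0, 1, 1, 1, 1]
    n = len(labels)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_label = labels[i] == labels[j]
            same_style = styles[i] == styles[j]
            if i == j:
                sim[i][j] = 1.0
            elif same_style:
                sim[i][j] = 0.9 if same_label else 0.3
            else:
                # Across styles, 'I' resembles the other style's 'l' more than its 'I'.
                sim[i][j] = 0.6 if same_label else 0.7
    groups = [[0, 1], [2, 3], [4, 5], [6, 7]]
    return groups, sim, labels


groups, sim, labels = toy_data()
constrained = constrained_clustering(groups, sim, labels)
unconstrained = constrained_clustering(groups, sim, labels, max_cross_error=1.0)
print(len(constrained), len(unconstrained))
```

On this toy data the constrained run keeps the two styles apart, while setting `max_cross_error=1.0` (which disables the constraint) lets similarity alone merge everything into a single cluster. That is the kind of false combination the constraint is meant to prevent.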
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Changsong Liu, Xiaoqing Ding, "Automatic style clustering of printed characters in form images", Proc. SPIE 5676, Document Recognition and Retrieval XII, (17 January 2005); doi: 10.1117/12.588138; https://doi.org/10.1117/12.588138

