Robust feature extraction for character recognition based on binary images
16 January 2006
Optical Character Recognition (OCR) is a classical research field and has become one of the most successful applications of pattern recognition. Feature extraction is a key step in OCR. This paper presents three feature-extraction algorithms for binary images: Lattice with Distance Transform (DTL), Stroke Density (SD), and Co-occurrence Matrix (CM). DTL improves the robustness of the lattice feature by applying a distance transform that increases the separation between foreground and background values, which reduces the influence of stroke boundaries. SD and CM extract robust stroke features, based on the observation that humans recognize characters by their strokes, including stroke length and orientation. SD captures quantized stroke information (length and orientation), while CM captures the length and orientation of the character contour; together they describe strokes sufficiently. Because the three feature vectors complement each other in representing characters, we integrate them and adopt a hierarchical algorithm to achieve optimal performance. Our methods are tested on the USPS (United States Postal Service) database and the Vehicle License Plate Number Pictures Database (VLNPD). Experimental results show that the methods achieve high recognition rates with a reasonable average running time. Under similar conditions, we also compared our results with the box method proposed by Hannmandlu [18]; our methods were more efficient.
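The following is a minimal sketch, not the authors' implementation, of how two of the ideas above might be computed from a binary image. Several choices are assumptions not stated in the abstract: the city-block (L1) metric, a *signed* distance map (positive inside strokes, negative outside) as the reading of "increase the distance of the foreground and background", cell averaging over a rows x cols grid as the lattice feature, and a per-orientation run-length histogram with hypothetical bin edges `(1, 3, 6)` as a stand-in for Stroke Density. All function names are hypothetical. The CM features and the hierarchical classifier are not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Binary image as a list of rows: 1 = ink (foreground), 0 = paper (background).
Image = List[List[int]]


def cityblock_distance(img: Image, target: int) -> List[List[int]]:
    """Exact L1 (city-block) distance from every pixel to the nearest pixel equal to
    `target`, computed with the classic two-pass raster scan."""
    h, w = len(img), len(img[0])
    far = h + w  # exceeds any L1 distance inside the image
    d = [[0 if img[y][x] == target else far for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: propagate from top and left neighbours
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):  # backward pass: propagate from bottom and right
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


def signed_distance_map(img: Image) -> List[List[int]]:
    """ASSUMPTION: +distance-to-background inside strokes, -distance-to-ink outside,
    so pixels right at a stroke boundary get the smallest magnitudes (+1 / -1)."""
    to_bg = cityblock_distance(img, 0)
    to_fg = cityblock_distance(img, 1)
    return [[to_bg[y][x] if img[y][x] else -to_fg[y][x]
             for x in range(len(img[0]))] for y in range(len(img))]


def lattice_features(values: List[List[int]], rows: int, cols: int) -> List[float]:
    """Mean value in each cell of a rows x cols grid, flattened row by row."""
    h, w = len(values), len(values[0])
    feats = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [values[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


# (dy, dx) step per orientation, with y growing downwards as in image coordinates.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "0": (0, 1), "45": (-1, 1), "90": (1, 0), "135": (1, 1)}


def stroke_run_histogram(img: Image, bins: Sequence[int] = (1, 3, 6)) -> Dict[str, List[int]]:
    """HYPOTHETICAL stand-in for stroke density: per orientation, count maximal ink runs
    whose length falls in each bin (bins are lower bounds; the first must be 1)."""
    h, w = len(img), len(img[0])

    def ink(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1

    hist = {}
    for name, (dy, dx) in DIRECTIONS.items():
        counts = [0] * len(bins)
        for y in range(h):
            for x in range(w):
                if ink(y, x) and not ink(y - dy, x - dx):  # pixel starts a run
                    length = 0
                    while ink(y + length * dy, x + length * dx):
                        length += 1
                    k = max(i for i, lo in enumerate(bins) if length >= lo)
                    counts[k] += 1
        hist[name] = counts
    return hist


if __name__ == "__main__":
    bar = [[1 if x == 2 else 0 for x in range(5)] for _ in range(5)]  # 5x5 vertical stroke
    print(signed_distance_map(bar)[0])                       # [-2, -1, 1, -1, -2]
    print(lattice_features(signed_distance_map(bar), 1, 5))  # [-2.0, -1.0, 1.0, -1.0, -2.0]
    print(stroke_run_histogram(bar))
    # {'0': [5, 0, 0], '45': [5, 0, 0], '90': [0, 1, 0], '135': [5, 0, 0]}
```

For the vertical bar, only the 90-degree orientation records a long run, which is the kind of length-and-orientation signal the abstract attributes to SD. The city-block metric was picked here only because its two-pass transform is short and exact in pure Python; a Euclidean distance transform would be a natural alternative.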
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Lijun Wang, Li Zhang, Yuxiang Xing, Zhiming Wang, Hewei Gao, "Robust feature extraction for character recognition based on binary images", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 606708 (16 January 2006); doi: 10.1117/12.650386; https://doi.org/10.1117/12.650386

