15 December 2003 Design and development of an ancient Chinese document recognition system
Author Affiliations +
The digitization of ancient Chinese documents presents new challenges to OCR (Optical Character Recognition) research field due to the large character set of ancient Chinese characters, variant font types, and versatile document layout styles, as these documents are historical reflections to the thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognition of ancient Chinese documents with regular font-types and layout-styles. Based on the previous work on multilingual OCR in TH-OCR system, we focus on the design and development of two key technologies which include character recognition and page segmentation. Experimental results show that the developed character recognition kernel of 19,635 Chinese characters outperforms our original traditional Chinese recognition kernel; Benchmarked test on printed ancient Chinese books proves that the proposed system is effective for regular ancient Chinese documents.
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Liangrui Peng, Liangrui Peng, Pingping Xiu, Pingping Xiu, Xiaoqing Ding, Xiaoqing Ding, } "Design and development of an ancient Chinese document recognition system", Proc. SPIE 5296, Document Recognition and Retrieval XI, (15 December 2003); doi: 10.1117/12.529107; https://doi.org/10.1117/12.529107

Back to Top