1 June 1994 Automated analysis of mixed documents consisting of printed Korean/alphanumeric texts and graphic images
Abstract
An efficient algorithm is proposed that recognizes a mixed document consisting of printed Korean/alphanumeric text and graphic images. In the preprocessing step, an input document is skew-normalized, if necessary, by rotating it by an angle detected with the Hough transform. Then we separate the graphic image parts from the text parts by considering chain codes of connected components. We further separate each character using vertical and horizontal projections. In the recognition step, a mixed text consisting of two different sets of characters, e.g., Korean and alphanumeric characters, is recognized. Korean and alphanumeric characters are classified, and each is recognized hierarchically using several effective features. The output is obtained by combining the recognized characters and the separated graphic parts. An efficient automated analysis algorithm for mixed documents consisting of graphic images and two different sets of characters is proposed, and its performance is demonstrated via computer simulation.
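The character-separation step mentioned in the abstract (vertical and horizontal projections) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `runs` and `segment_characters`, the list-of-lists binary image format, and the rule that any empty row or column marks a boundary are all assumptions made for illustration. Rows are split first on the horizontal projection, then each text line is split on its vertical projection.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: binary image, 1 = ink, 0 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at the first empty index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split a binary text image into boxes (top, bottom, left, right)."""
    boxes = []
    # Horizontal projection: ink count per row separates text lines.
    row_profile = [sum(row) for row in image]
    for top, bottom in runs(row_profile):
        line = image[top:bottom]
        # Vertical projection inside one line: ink count per column
        # separates candidate characters.
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1],
    ]
    for box in segment_characters(page):
        print(box)
```

A plain gap-based split like this will likely over-segment Korean syllable blocks whose components are separated by white space, so a practical system would need additional merging rules; the abstract does not say how the authors handle that case.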
Young Kug Ham, Hong Kyu Chung, In Kwon Kim, Rae-Hong Park, "Automated analysis of mixed documents consisting of printed Korean/alphanumeric texts and graphic images," Optical Engineering 33(6), (1 June 1994). https://doi.org/10.1117/12.171323
JOURNAL ARTICLE
9 PAGES

