7 March 1996 Character recognition of Japanese newspaper headlines with graphical designs
Abstract
Graphical designs are often used in Japanese newspaper headlines to draw attention to important articles. However, conventional OCR software seldom recognizes the characters in such headlines because the designs are difficult to remove. This paper proposes a method that recognizes these characters without removing the graphical designs. First, the number of text-line regions and the average character heights are roughly estimated from the local distribution of black and white runs observed in a rectangular window as the window is shifted pixel-by-pixel along the direction of the text line. Next, each text-line region is normalized so that its height matches the height of the binary reference patterns in a dictionary. Displacement matching is then applied to the normalized text-line region for character recognition: a square window is shifted pixel-by-pixel along the direction of the text line and matched against the binary reference patterns at each position. The complementary similarity measure, which is robust against graphical designs, is used as the discriminant function. When the maximum similarity value at a position exceeds a threshold, which is determined automatically from the degree of degradation in the square window, the character category that yields this value is output as the recognized category. Experimental results for fifty Japanese newspaper headlines show that the method achieves recognition rates of over 90%, much higher than the 17% of a conventional method.
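The displacement matching step can be illustrated with a minimal sketch. The abstract does not state the formula for the complementary similarity measure, so the definition below is an assumption based on the form commonly attributed to the authors' related work: (a·e − b·c) / sqrt(T·(n − T)), where a, b, c and e count the four combinations of input and reference pixel values, n is the total pixel count and T is the number of black pixels in the reference pattern. The function names, the 3x3 toy patterns and the fixed threshold are all hypothetical; the paper determines its threshold automatically from the degradation in the window, which is not reproduced here.

```python
from math import sqrt


def complementary_similarity(window, template):
    """Score a binary window against a binary reference pattern.

    Both arguments are equal-length flat lists of 0/1 pixels (1 = black).
    The formula is an assumed form of the measure, not taken from the abstract.
    """
    a = b = c = e = 0
    for f, t in zip(window, template):
        if f and t:
            a += 1  # black in both input and reference
        elif t:
            b += 1  # black in reference only
        elif f:
            c += 1  # black in input only (e.g. a graphical design)
        else:
            e += 1  # white in both
    n = a + b + c + e
    t_black = a + b  # number of black pixels in the reference pattern
    denom = sqrt(t_black * (n - t_black))
    if denom == 0:
        return 0.0  # all-white or all-black reference: score is undefined
    return (a * e - b * c) / denom


def displacement_match(line, templates, threshold):
    """Slide a square window along a height-normalized text line.

    line: list of rows of 0/1 pixels whose height equals the template side.
    templates: dict mapping a category label to a flat row-major pattern.
    threshold: fixed cut-off (a stand-in for the paper's adaptive threshold).
    Returns (x, label, score) for every position whose best score exceeds it.
    """
    size = len(line)
    width = len(line[0])
    hits = []
    for x in range(width - size + 1):
        # Flatten the size-by-size window at offset x in row-major order.
        window = [row[x + dx] for row in line for dx in range(size)]
        best_label, best_score = None, float("-inf")
        for label, pattern in templates.items():
            score = complementary_similarity(window, pattern)
            if score > best_score:
                best_label, best_score = label, score
        if best_score > threshold:
            hits.append((x, best_label, best_score))
    return hits


# Toy 3x3 reference patterns (hypothetical, for illustration only).
templates = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
}

# A 3-pixel-high text line containing "I" at x=0 and "L" at x=4.
line = [
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 1],
]

hits = displacement_match(line, templates, threshold=4.0)
positions = [(x, label) for x, label, _ in hits]
```

In this toy example the fixed threshold has to be chosen by hand so that partially overlapping windows between the two glyphs are rejected, which hints at why the paper derives its threshold from the window contents instead.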
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Minako Sawaki, Norihiro Hagita, "Character recognition of Japanese newspaper headlines with graphical designs", Proc. SPIE 2660, Document Recognition III, (7 March 1996); doi: 10.1117/12.234718; https://doi.org/10.1117/12.234718
PROCEEDINGS
9 PAGES

