30 March 1995 Character segmentation using visual interword constraints in a text page
Character segmentation is a critical preprocessing step for text recognition. This paper presents a method that uses visual inter-word constraints available in a text image to split word images into smaller image pieces. The method is applicable to machine-printed text in which the same spacing is always used between identical pairs of characters. The visual inter-word constraints considered here include information about whether one word image is a sub-image of another. For example, consider two word images A and B that show 'mathematical' and 'the'. If the short word image B is found to be a sub-image of the long word image A, then A is split into three pieces, A1, A2, and A3, where A2 matches B, A1 corresponds to 'ma', and A3 corresponds to 'matical'. The image piece A1 can then be used to split A3 into two parts, 'ma' and 'tical'. The method is based purely on image processing using the visual context in a text page. No recognition is involved.
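The splitting step in the example above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how sub-image matching is performed, so the sketch assumes exact column-by-column equality, whereas real scanned images would need a tolerance-based match. The helper names (`render`, `trim`, `find_subimage`, `split_by`) and the toy one-column-per-character "glyphs" are hypothetical and exist only to make the example runnable. Note that the splitting code itself only compares pixel columns and never identifies characters.

```python
def render(word):
    """Toy stand-in for a scanned word image (an assumption for this demo).

    Each character becomes one 8-pixel column built from the bits of its
    code point, and a blank column is placed between neighbouring
    characters, so identical character pairs always get identical spacing.
    """
    cols = []
    for ch in word:
        cols.append(tuple((ord(ch) >> k) & 1 for k in range(8)))
        cols.append((0,) * 8)  # inter-character gap
    return cols[:-1]  # no gap after the last character


def trim(img):
    """Drop blank columns from both ends of an image piece."""
    start, end = 0, len(img)
    while start < end and not any(img[start]):
        start += 1
    while end > start and not any(img[end - 1]):
        end -= 1
    return img[start:end]


def find_subimage(long_img, short_img):
    """Return the first column offset where short_img occurs in long_img.

    Exact equality is assumed here; returns None if there is no match.
    """
    n, m = len(long_img), len(short_img)
    if m == 0 or m >= n:
        return None
    for off in range(n - m + 1):
        if long_img[off:off + m] == short_img:
            return off
    return None


def split_by(long_img, short_img):
    """Split long_img into (before, match, after) around short_img."""
    off = find_subimage(long_img, short_img)
    if off is None:
        return None
    m = len(short_img)
    return (trim(long_img[:off]),
            long_img[off:off + m],
            trim(long_img[off + m:]))


# Word images A and B from the example in the abstract.
A = render("mathematical")
B = render("the")
A1, A2, A3 = split_by(A, B)            # pieces for 'ma', 'the', 'matical'
before, match, rest = split_by(A3, A1)  # A1 splits A3 into 'ma' and 'tical'
```

In this sketch the second call returns an empty leading piece, because A1 matches at the very start of A3; the remaining piece covers 'tical'.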
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tao Hong, Jonathan J. Hull, "Character segmentation using visual interword constraints in a text page", Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205820; https://doi.org/10.1117/12.205820

