Word extraction using irregular pyramid
21 December 2000
Abstract
This paper proposes a new algorithm for extracting text from document images, focusing on the extraction of word groups. The algorithm is built on an irregular pyramid structure. What sets it apart is that it includes strategic background information in the analysis, information that most techniques discard. Both the foreground (i.e. text areas) and portions of the background (i.e. white areas) are examined. The algorithm rests on the concept of "closeness": text within a group lies closer together, in terms of spatial distance, than it does to other text areas. The results are encouraging, with the algorithm correctly grouping words of different sizes, fonts, arrangements and orientations.
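The abstract does not spell out the pyramid construction, so the following is only a rough illustration of the "closeness" idea under stated assumptions, not the authors' method. It replaces the irregular pyramid with plain single-linkage grouping (via union-find) over character bounding boxes: two boxes join the same group when the background gap between them is small relative to their height. The names `Box`, `gap`, `group_by_closeness` and `max_gap_ratio`, as well as the size-relative threshold, are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a text component (hypothetical representation)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def gap(a: Box, b: Box) -> int:
    """Width of the background gap between two boxes (Chebyshev style; 0 if they touch or overlap)."""
    dx = max(0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return max(dx, dy)


def group_by_closeness(boxes: list[Box], max_gap_ratio: float = 0.5) -> list[list[int]]:
    """Group box indices by spatial closeness using single linkage.

    Assumption (not from the paper): two boxes are "close" when their gap is at
    most max_gap_ratio times the smaller box height, so larger fonts tolerate
    proportionally larger gaps.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            limit = max_gap_ratio * min(boxes[i].height, boxes[j].height)
            if gap(boxes[i], boxes[j]) <= limit:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    boxes = [
        # A small four-letter word: height 10, letter gaps of 2.
        Box(0, 0, 8, 10), Box(10, 0, 18, 10), Box(20, 0, 28, 10), Box(30, 0, 38, 10),
        # A second small word, far to the right.
        Box(60, 0, 68, 10), Box(70, 0, 78, 10),
        # A large-font word: height 40, letter gap of 10.
        Box(100, 0, 140, 40), Box(150, 0, 190, 40),
    ]
    print(group_by_closeness(boxes))
```

Run as a script, this prints `[[0, 1, 2, 3], [4, 5], [6, 7]]`: the large-font letters stay together despite their wider spacing because the threshold scales with height. This is only meant to show why relative gaps help with mixed font sizes. The paper's use of background regions inside an irregular pyramid is likely considerably more elaborate.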
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
PohKok Loo, Chew Lim Tan, "Word extraction using irregular pyramid", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); doi: 10.1117/12.410857; https://doi.org/10.1117/12.410857
Proceedings paper, 9 pages

