23 October 1996 Character extraction from documents using wavelet maxima
Abstract
Extracting character images is an important front-end step for optical character recognition (OCR) and other applications. The step matters because OCR applications usually extract salient features and operate on them, and noise not only destroys features of characters but also introduces unwanted ones. We propose a new algorithm that removes unwanted background noise from a textual image. The algorithm is based on the observation that, at various scales of the wavelet transform, the magnitude of the intensity variation at character boundaries differs from that of noise. Therefore, most of the edges corresponding to character boundaries can be extracted at each scale by thresholding. The internal region of each character is determined by a voting procedure that uses the arguments of the remaining edges. The interiors of the recovered characters are solid and contain no holes. The characters tend to be fattened, however, because of the smoothing applied in computing the wavelet transform. To obtain a high-quality restoration of the character image, the precise locations of the characters in the original image are then estimated using a Bayesian criterion. The paper gives a detailed algorithm together with a careful analysis of its free parameters. The method is simple, and we present experimental results that suggest it is effective.
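The multiscale thresholding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Gaussian-smoothed gradient as a stand-in for the wavelet transform, skips the maxima detection, the voting step, and the Bayesian localization, and uses hypothetical names (`threshold_edges`, `scales`, `threshold`) and illustrative parameter values that do not come from the paper.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(image, sigma):
    """Separable Gaussian smoothing with edge padding; output has the input shape."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(image, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def threshold_edges(image, scales=(1.0, 2.0), threshold=0.1):
    """Return one boolean mask per scale: True where the gradient modulus
    of the smoothed image exceeds the threshold (assumed values, not the paper's)."""
    image = np.asarray(image, dtype=float)
    masks = []
    for sigma in scales:
        gy, gx = np.gradient(smooth(image, sigma))
        masks.append(np.hypot(gx, gy) > threshold)
    return np.stack(masks)


if __name__ == "__main__":
    # Synthetic "character": a solid block, plus one isolated noise pixel.
    img = np.zeros((32, 32))
    img[8:24, 8:24] = 1.0
    img[2, 2] = 2.0
    per_scale = threshold_edges(img)
    kept = per_scale.all(axis=0)  # edges that survive at every scale
    print("pixels kept at each scale:", per_scale.sum(axis=(1, 2)))
    print("pixels kept at all scales:", kept.sum())
```

In this toy setup the response of the isolated noise pixel decays faster with scale than the response of the block boundary, so requiring an edge to pass the threshold at every scale removes the former and keeps the latter. A single shared threshold is a simplification; per-scale thresholds would be a natural variation.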
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wen-Liang Hwang, Fu Chang, "Character extraction from documents using wavelet maxima", Proc. SPIE 2825, Wavelet Applications in Signal and Image Processing IV, (23 October 1996); doi: 10.1117/12.255222; https://doi.org/10.1117/12.255222
PROCEEDINGS
13 PAGES

