When document images are captured with digital cameras, several imaging problems must be solved before characters can be extracted reliably. Color values are highly sensitive to variations in illumination intensity. A simple color document image can be converted to a monochrome image by a traditional method and then binarized, but this approach is not stable under varying illumination because of that sensitivity. First, the conversion works poorly when the colors are narrowly distributed. Second, when the image contains more than two colors, it is not easy to determine which color belongs to the characters and which colors belong to the background.
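The weakness of the traditional pipeline can be illustrated with a minimal sketch. It assumes the ITU-R BT.601 luma weights for the monochrome conversion and a fixed global threshold of 128; the source does not specify either, and the pixel values are hypothetical.

```python
# Illustrative sketch only: BT.601 luma weights and a fixed threshold are
# assumptions, not the conversion or binarization used by the authors.

def to_gray(rgb):
    """Convert an (R, G, B) tuple to a single luma value."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(pixels, threshold=128):
    """Label each pixel 1 (bright) or 0 (dark) using one global threshold."""
    return [1 if to_gray(p) >= threshold else 0 for p in pixels]


# Hypothetical red text on a green background: the two colors are clearly
# different, yet their luma values are nearly equal (about 59.8 and 59.9),
# so they receive the same label after conversion.
text_color = (200, 0, 0)
background_color = (0, 102, 0)
same_label = binarize([text_color, background_color])

# Hypothetical gray paper pixel under two lighting levels: dimming the
# light by 20 percent moves it across the threshold and flips its label.
bright_paper = binarize([(140, 140, 140)])
dim_paper = binarize([(112, 112, 112)])
```

Both failure modes named above appear here: distinct colors collapse to one gray level, and the same surface changes label when the illumination changes.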
This paper presents a method for extracting characters from a color document image using a color processing algorithm based on the characteristics of color features. Variations in intensity and the color distribution are used to classify character areas and background areas. The document image is segmented into several color groups, and similar color groups are merged. In the final step, only two color groups remain, one for the characters and one for the background. The character areas extracted from the document image are then passed to an optical character recognition (OCR) system. This method solves a color problem that arises in traditional scanner-based OCR systems. The paper also describes the OCR system used for character conversion of color document images. The method works on real-world color document images captured by cellular phones and digital cameras.
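The segment-and-merge idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the initial groups come from coarse RGB binning, similarity is Euclidean distance between group mean colors, and the smaller of the two final groups is assumed to be the characters. All function names and the `step` parameter are hypothetical.

```python
from collections import defaultdict


def initial_groups(pixels, step=64):
    """Segment pixels into color groups by coarse RGB binning (assumption)."""
    bins = defaultdict(list)
    for i, (r, g, b) in enumerate(pixels):
        bins[(r // step, g // step, b // step)].append(i)
    return [list(members) for members in bins.values()]


def mean_color(pixels, members):
    """Mean RGB color of the pixels whose indices are in `members`."""
    n = len(members)
    return tuple(sum(pixels[i][c] for i in members) / n for c in range(3))


def dist2(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_to_two(pixels, step=64):
    """Repeatedly merge the two most similar groups until two remain."""
    groups = initial_groups(pixels, step)
    while len(groups) > 2:
        means = [mean_color(pixels, g) for g in groups]
        _, i, j = min(
            (dist2(means[i], means[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


def character_mask(pixels, step=64):
    """Return 1 for pixels assigned to the character group, else 0."""
    groups = merge_to_two(pixels, step)
    mask = [0] * len(pixels)
    if len(groups) < 2:
        return mask  # a single color group: nothing to separate
    # Assumption: characters cover fewer pixels than the background.
    for i in min(groups, key=len):
        mask[i] = 1
    return mask


# Hypothetical pixels: white and yellowish paper, plus two shades of dark ink.
sample = [
    (250, 250, 250), (245, 250, 240), (250, 250, 180),
    (20, 20, 30), (70, 20, 30), (248, 249, 251), (240, 240, 185),
]
mask = character_mask(sample)  # → [0, 0, 0, 1, 1, 0, 0]
```

The sample starts with four color groups. The two ink shades merge first, then the two paper shades, leaving one character group and one background group.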
The use of camera phones and digital cameras is increasing rapidly, but camera imaging applications have not spread as widely because practical camera imaging technology is lacking. In particular, the acquisition environment of camera images differs greatly from that of scanner images: lighting conditions, viewing distance, and viewing angle vary constantly with cameras. Variations in lighting and viewing distance make it difficult to extract character areas from images through binarization, and variations in the camera viewing angle distort the images geometrically. Therefore, extracting character areas from camera document images is far more complex and difficult than extracting them from scanner images.
In this paper, these problems are discussed comprehensively, and methods for resolving them are proposed to improve document image recognition. The solutions covered are adaptive binarization, color conversion, lens distortion correction, and geometric distortion correction. In the experiments, we use various types of document images captured by mobile phone cameras and digital cameras. The results of distortion correction show that our image processing methods are effective in increasing the accuracy of character recognition for camera-based document images.
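Of the solutions listed, adaptive binarization is the easiest to illustrate in a few lines. The sketch below uses a local-mean threshold, one common form of adaptive binarization; the source does not say which variant the authors use, and the `radius` and `offset` parameters and the sample image are hypothetical.

```python
def global_binarize(gray, threshold=128):
    """Mark pixels darker than one fixed threshold as character pixels (1)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def adaptive_binarize(gray, radius=1, offset=10):
    """Mark a pixel as character (1) if it is darker than the mean of its
    local window by more than `offset`. `gray` is a 2D list of intensities."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [gray[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


# Hypothetical image: the background fades from 220 to 100 left to right,
# with two dark vertical strokes at columns 1 and 5.
row = [220, 120, 180, 160, 140, 40, 100]
image = [row[:] for _ in range(3)]

fixed = global_binarize(image)       # each row → [0, 1, 0, 0, 0, 1, 1]
adaptive = adaptive_binarize(image)  # each row → [0, 1, 0, 0, 0, 1, 0]
```

The fixed threshold mislabels the dim background pixel in the last column as a character pixel, while the local threshold recovers only the two strokes. An abrupt lighting edge, such as a hard shadow, can still produce false detections with a window this small.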