The JBIG2 standard is widely used for binary document image compression primarily because it achieves much
higher compression ratios than conventional facsimile encoding standards, such as T.4, T.6, and T.82 (JBIG1).
A typical JBIG2 encoder works by first separating the document into connected components, or symbols. Next
it creates a dictionary by encoding a subset of symbols from the image, and finally it encodes all the remaining
symbols using the dictionary entries as a reference.
In this paper, we propose a novel method for measuring the distance between symbols based on a conditional-entropy
estimation (CEE) distance measure. The CEE distance measure is used to both index entries of the
dictionary and construct the dictionary. The advantage of the CEE distance measure, as compared to conventional
measures of symbol similarity, is that the CEE provides a much more accurate estimate of the number of
bits required to encode a symbol. In experiments on a variety of documents, we demonstrate that the incorporation
of the CEE distance measure results in approximately a 14% reduction in the overall bitrate of the JBIG2
encoded bitstream as compared to the best conventional dissimilarity measures.
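The idea of estimating the bit cost of coding one symbol from another can be illustrated with a small sketch. This is not the paper's encoder: it is a hypothetical simplification that models refinement coding of a symbol against a reference as i.i.d. pixel mismatches, so the cost is the binary entropy of the mismatch rate times the pixel count.

```python
import math

def cee_distance(sym_a, sym_b):
    """Hypothetical sketch: estimate the bits needed to encode sym_b
    when sym_a is used as the dictionary reference. Symbols are
    same-sized binary bitmaps (lists of lists of 0/1)."""
    n = mismatches = 0
    for row_a, row_b in zip(sym_a, sym_b):
        for a, b in zip(row_a, row_b):
            n += 1
            mismatches += (a != b)
    p = mismatches / n
    if p in (0.0, 1.0):
        # identical (or perfectly inverted) symbols cost ~0 bits
        return 0.0
    # binary entropy per pixel times pixel count ~ refinement-coding cost
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return n * h
```

A dictionary builder would prefer the reference minimizing this estimated cost, which is exactly where an entropy-based measure differs from simple pixel-difference counts.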
We propose a page layout analysis algorithm to classify a scanned document into different regions such as text, photo, or strong lines. The proposed scheme consists of five modules. The first module performs several image preprocessing techniques such as image scaling, filtering, color space conversion, and gamma correction to enhance the scanned image quality and reduce the computation time in later stages. Text detection is applied in the second module, wherein wavelet transform and run-length encoding are employed to generate and validate text regions, respectively. The third module uses a Markov random field based block-wise segmentation that employs a basis vector projection technique with maximum a posteriori probability optimization to detect photo regions. In the fourth module, edge detection, edge linking, line-segment fitting, and the Hough transform are utilized to detect strong edges and lines. In the last module, the resultant text, photo, and edge maps are combined to generate a page layout map using K-Means clustering. The proposed algorithm has been tested on several hundred documents that contain simple and complex page layout structures and contents, such as articles, magazines, business cards, dictionaries, and newsletters, and compared against state-of-the-art page-segmentation techniques with benchmarked performance. The results indicate that our methodology achieves an average of approximately 89% classification accuracy in text, photo, and background regions.
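The run-length validation step of the second module can be sketched in a few lines: text blocks tend to produce many short ink runs, while photos and solid fills produce long ones. The function names and the threshold below are illustrative assumptions, not the paper's actual parameters.

```python
def run_lengths(row):
    """Lengths of consecutive runs of 1s in a binary row."""
    runs, count = [], 0
    for v in row:
        if v:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs

def looks_like_text(block, max_mean_run=5.0):
    """Hypothetical validation step: accept a candidate block as text
    when its average ink-run length is short."""
    runs = [r for row in block for r in run_lengths(row)]
    return bool(runs) and sum(runs) / len(runs) <= max_mean_run
```

In the paper's pipeline a check of this kind would be applied to the candidate regions produced by the wavelet-domain energy maps, discarding candidates whose run statistics look photographic.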
Color quantization algorithms are used to select a small number of colors that can accurately represent the
content of a particular image. In this research, we introduce a novel color quantization algorithm which is based
on the minimization of a modified L<sub>p</sub> norm rather than the more traditional L<sub>2</sub> norm associated with mean square
error (MSE). We demonstrate that the L<sub>p</sub> optimization approach has two advantages. First, it distributes the
colors more uniformly over the regions of the image; and second, the norm's value can be used as an effective
criterion for selecting the minimum number of colors necessary to achieve accurate representation of the image.
One potential disadvantage of the modified L<sub>p</sub> norm criterion is that it can increase the computational cost of the
associated clustering methods. However, we address this problem by introducing a two-stage clustering procedure in
which the first stage (pre-clustering) agglomerates the full set of pixels into a relatively large number of discrete
colors; and the second stage (post-clustering) performs modified L<sub>p</sub> norm minimization using the reduced number
of discrete colors resulting from the pre-clustering step. The number of groups used in the post-clustering is then
chosen to be the smallest number that achieves a selected threshold value of the normalized L<sub>p</sub> norm. This two-stage
clustering process dramatically reduces computation by merging together colors before the computationally
expensive modified L<sub>p</sub> norm minimization is applied.
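The two-stage structure can be sketched as follows. This is a simplified stand-in under stated assumptions: pre-clustering here is uniform 3-bits-per-channel binning, the post-clustering stage uses plain weighted k-means in place of the paper's modified-L<sub>p</sub> minimization, and the L<sub>p</sub> objective is only evaluated, not directly optimized.

```python
import random

def pre_cluster(pixels, bits=3):
    """Stage 1 (pre-clustering): agglomerate pixels into coarse RGB
    bins; each bin is represented by its mean color and pixel count."""
    shift = 8 - bits
    bins = {}
    for p in pixels:
        key = tuple(c >> shift for c in p)
        s, w = bins.get(key, ([0, 0, 0], 0))
        bins[key] = ([a + b for a, b in zip(s, p)], w + 1)
    return [([a / w for a in s], w) for s, w in bins.values()]

def lp_cost(colors, centers, p=4):
    """Weighted L_p objective over the reduced color set (p > 2
    penalizes large per-pixel errors more heavily than MSE)."""
    total = 0.0
    for c, w in colors:
        total += w * min(sum(abs(a - b) ** p for a, b in zip(c, k))
                         for k in centers)
    return total

def weighted_kmeans(colors, k, iters=10, seed=0):
    """Stage 2 (post-clustering): a plain weighted k-means stand-in for
    the modified-L_p minimization, run on the reduced colors only."""
    centers = [list(c) for c, _ in random.Random(seed).sample(colors, k)]
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        wts = [0.0] * k
        for c, w in colors:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(c, centers[j])))
            for d in range(3):
                sums[i][d] += w * c[d]
            wts[i] += w
        centers = [[s / wts[i] for s in sums[i]] if wts[i] else centers[i]
                   for i in range(k)]
    return centers
```

Selecting the palette size then amounts to increasing k until the normalized cost drops below the chosen threshold; because stage 2 sees only the few hundred bin colors rather than every pixel, each trial is cheap.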
A framework for region/zone classification in color and gray-scale scanned documents is proposed in this paper.
The algorithm includes modules for extracting text, photo, and strong edge/line regions. Firstly, a text detection
module which is based on wavelet analysis and Run Length Encoding (RLE) technique is employed. Local and
global energy maps in high frequency bands of the wavelet domain are generated and used as initial text maps.
Further analysis using RLE yields a final text map. The second module is developed to detect image/photo and
pictorial regions in the input document. A block-based classifier using basis vector projections is employed to
identify photo candidate regions. Then, a final photo map is obtained by applying a probabilistic model based
on Markov random field (MRF) maximum a posteriori (MAP) optimization with iterated conditional
modes (ICM). The final module detects lines and strong edges using the Hough transform and edge-linkage analysis,
respectively. The text, photo, and strong edge/line maps are combined to generate a page layout classification of
the scanned target document. Experimental results and objective evaluation show that the proposed technique
performs very effectively on a variety of simple and complex scanned document types obtained from
the MediaTeam Oulu document database. The proposed page layout classifier can be used in systems for efficient
document storage, content-based document retrieval, optical character recognition, and mobile phone imagery.
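The MAP optimization by ICM used to clean up the block-wise photo map can be sketched as below. The energy model (a unary data cost plus a Potts smoothness term with weight `beta`) is a common textbook simplification assumed here, not the exact model of the paper.

```python
def icm_labels(data_cost, beta=1.0, iters=5):
    """Hypothetical sketch of MAP optimization by iterated conditional
    modes (ICM): data_cost[y][x][l] is the cost of label l at block
    (x, y); a Potts term with weight beta penalizes each 4-neighbor
    carrying a different label."""
    h, w = len(data_cost), len(data_cost[0])
    # initialize each block with its cheapest unary label
    labels = [[min(range(len(c)), key=c.__getitem__) for c in row]
              for row in data_cost]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                nbrs = [labels[yy][xx]
                        for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= yy < h and 0 <= xx < w]
                def energy(l):
                    return data_cost[y][x][l] + beta * sum(l != n for n in nbrs)
                # greedily pick the label minimizing the local energy
                labels[y][x] = min(range(len(data_cost[y][x])), key=energy)
    return labels
```

The effect is that an isolated block whose classifier score weakly disagrees with all of its neighbors gets relabeled, which removes speckle from the photo map.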
A major challenge facing content-based image retrieval is bridging the
gap between low-level image primitives and high-level semantics.
We have proposed a new approach for semantic image classification that
utilizes the adaptive perceptual color-texture segmentation algorithm
by Chen et al., which segments natural scenes into perceptually
uniform regions. The color composition and spatial texture features
of the regions are used as medium level descriptors, based on which
the segments are classified into semantic categories. The segment
features consist of spatial texture orientation information and color
composition in terms of a limited number of spatially adapted
dominant colors. The feature selection and the performance of the
classification algorithms are based on the segment statistics.
We investigate the dependence of the segment statistics on
the segmentation algorithm. For this, we compare the statistics of
the segment features obtained using the Chen et al. algorithm to those
that correspond to human segmentations, and show that they are
remarkably similar. We also show that when human segmentations are
used instead of the automatically detected segments, the performance
of the semantic classification approach remains approximately the same.
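The color-composition half of the segment descriptor (a handful of dominant colors with their coverage fractions) can be sketched as below. The fixed 3-bits-per-channel quantization is an illustrative assumption, not the spatially adaptive dominant-color extraction of Chen et al.

```python
from collections import Counter

def dominant_colors(region_pixels, max_colors=4):
    """Hypothetical sketch of the color-composition descriptor: the
    region's few most frequent quantized colors, each paired with the
    fraction of the region it covers."""
    quantized = Counter(tuple(c >> 5 for c in p) for p in region_pixels)
    total = sum(quantized.values())
    return [(color, count / total)
            for color, count in quantized.most_common(max_colors)]
```

Concatenating these coverage-weighted colors with a texture-orientation descriptor gives the kind of fixed-length feature vector a segment classifier can consume.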
The accumulation of large collections of digital images has created the need for efficient and intelligent schemes for content-based image retrieval. Our goal is to organize the contents semantically, according to meaningful categories. We present a new approach for semantic classification that utilizes a recently proposed color-texture segmentation algorithm (by Chen et al.), which combines knowledge of human perception and signal characteristics to segment natural scenes into perceptually uniform regions. The color and texture features of these regions are used as medium level descriptors, based on which we extract semantic labels, first at the segment and then at the scene level. The segment features consist of spatial texture orientation information and color composition in terms of a limited number of locally adapted dominant colors. The
focus of this paper is on region classification. We use a hierarchical vocabulary of segment labels that is consistent with
those used in the NIST TRECVID 2003 development set. We test the approach on a database of 9000 segments obtained from 2500 photographs of natural scenes. For training and classification we use the Linear Discriminant Analysis (LDA) technique. We examine the performance of the algorithm (precision and recall rates) when different sets of features (e.g., one or two most dominant colors versus four quantized dominant colors) are used. Our results indicate that the proposed approach offers significant performance improvements over existing approaches.
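As an illustration of the LDA training and classification step, here is a minimal two-class Fisher discriminant on 2-D feature vectors. It is a toy stand-in (the paper's segment features are higher-dimensional and multi-class), and it assumes a nondegenerate pooled scatter matrix.

```python
def lda_train(xs0, xs1):
    """Minimal two-class Fisher LDA on 2-D features: w is proportional
    to S^-1 (m1 - m0), with the decision boundary through the midpoint
    of the class means."""
    def mean(xs):
        return [sum(x[i] for x in xs) / len(xs) for i in range(2)]
    m0, m1 = mean(xs0), mean(xs1)
    # pooled within-class scatter matrix
    s = [[0.0, 0.0], [0.0, 0.0]]
    for xs, m in ((xs0, m0), (xs1, m1)):
        for x in xs:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]  # assumed nonzero
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
         inv[1][0] * dm[0] + inv[1][1] * dm[1]]
    b = -(w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2
    return w, b

def lda_classify(w, b, x):
    """Assign class 1 when the linear discriminant is positive."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

A multi-class version, as used for the segment-label vocabulary, applies the same machinery pairwise or with per-class discriminant functions.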