We propose a page layout analysis algorithm to classify a scanned document into different regions such as text, photo, or strong lines. The proposed scheme consists of five modules. The first module applies several image preprocessing techniques, such as image scaling, filtering, color space conversion, and gamma correction, to enhance scanned-image quality and reduce computation time in later stages. The second module performs text detection, in which wavelet transform and run-length encoding are employed to generate and validate text regions, respectively. The third module uses a Markov random field (MRF)-based block-wise segmentation that employs a basis vector projection technique with maximum a posteriori (MAP) probability optimization to detect photo regions. In the fourth module, methods for edge detection, edge linking, line-segment fitting, and Hough transform are utilized to detect strong edges and lines. In the last module, the resulting text, photo, and edge maps are combined to generate a page layout map using K-Means clustering. The proposed algorithm has been tested on several hundred documents with simple and complex page layout structures and contents, such as articles, magazines, business cards, dictionaries, and newsletters, and compared against state-of-the-art benchmark page-segmentation techniques. The results indicate that our methodology achieves an average classification accuracy of ~89% in text, photo, and background regions.
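The final fusion step, in which the text, photo, and edge maps are combined with K-Means clustering, can be sketched as plain Lloyd's K-Means over per-pixel feature vectors. This is a minimal sketch under stated assumptions. Each pixel is described by a hypothetical (text, photo, edge) score triple taken from the three maps, and the initial centroids are supplied by hand. The abstract does not specify the feature vector, the initialization, or the number of clusters.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], init: Sequence[Vector],
           max_iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Lloyd's K-Means with caller-supplied initial centroids (squared Euclidean distance)."""
    centroids = [tuple(c) for c in init]
    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k, pt=pt: sum((a - b) ** 2 for a, b in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical (text, photo, edge) scores for six pixels.
pixels = [(0.9, 0.1, 0.2), (0.8, 0.0, 0.1),   # text-like
          (0.1, 0.9, 0.3), (0.2, 0.8, 0.2),   # photo-like
          (0.0, 0.1, 0.0), (0.1, 0.0, 0.1)]   # background-like
seeds = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
labels, centroids = kmeans(pixels, seeds)    # labels == [0, 0, 1, 1, 2, 2]
```

The seeds are chosen so that cluster 0 corresponds to text, cluster 1 to photo, and cluster 2 to background. This makes the cluster indices directly interpretable as page layout classes.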
In the field of identifying regions-of-interest (ROI) in digital images, several image-sets are referenced in the literature;
the open-source ones typically present a single main object (usually located at or near the image center as a pop-out). In
this paper, we present a comprehensive image-set (with its ground-truth) that will be made publicly available. The
database consists of images that demonstrate multiple-regions-of-interest (MROI) or multiple-levels-of-interest (MLOI).
The former term signifies that the scene has a group of subjects/objects (not necessarily spatially connected
regions) that share the same level of perceptual priority for the human observer, while the latter indicates that the scene is
complex enough to have primary, secondary, and background objects. The methodology for developing the proposed
image-set is described. A psychophysical experiment to identify MROI and MLOI was conducted, the results of which
are also presented. The image-set has been developed to be used in training and evaluation of ROI detection algorithms.
Applications include image compression, thumbnailing, summarization, and mobile phone imagery.
A framework for region/zone classification in color and gray-scale scanned documents is proposed in this paper.
The algorithm includes modules for extracting text, photo, and strong edge/line regions. First, a text detection
module based on wavelet analysis and the Run Length Encoding (RLE) technique is employed. Local and
global energy maps in high frequency bands of the wavelet domain are generated and used as initial text maps.
Further analysis using RLE yields a final text map. The second module is developed to detect image/photo and
pictorial regions in the input document. A block-based classifier using basis vector projections is employed to
identify photo candidate regions. Then, a final photo map is obtained by applying a probabilistic model based
on Markov random field (MRF) maximum a posteriori (MAP) optimization with iterated conditional
modes (ICM). The final module detects lines and strong edges using the Hough transform and edge-linkage analysis,
respectively. The text, photo, and strong edge/line maps are combined to generate a page layout classification of
the scanned target document. Experimental results and objective evaluation show that the proposed technique
performs effectively on a variety of simple and complex scanned document types obtained from the
MediaTeam Oulu document database. The proposed page layout classifier can be used in systems for efficient
document storage, content-based document retrieval, optical character recognition, mobile phone imagery, and
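The MRF-based MAP refinement with iterated conditional modes (ICM) can be illustrated on a small grid of blocks. This is a minimal sketch under several assumptions. It uses a binary photo/non-photo labelling and a 4-neighbourhood Potts smoothness prior with an illustrative weight `beta`. A hypothetical per-block photo probability, `p_photo`, stands in for the output of the basis-vector-projection classifier. The paper's actual energy terms and parameters may differ.

```python
import math
from typing import List, Sequence


def icm_photo_map(p_photo: Sequence[Sequence[float]], beta: float = 1.0,
                  max_sweeps: int = 10) -> List[List[int]]:
    """Binary MRF-MAP labelling of blocks (1 = photo, 0 = other) via Iterated Conditional Modes.

    Energy of a block label = -log(classifier probability of that label)
                              + beta * (number of 4-neighbours with a different label).
    """
    rows, cols = len(p_photo), len(p_photo[0])
    eps = 1e-9  # guards against log(0)
    # Start from the independent per-block decision.
    labels = [[1 if p >= 0.5 else 0 for p in row] for row in p_photo]
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]
                best_label, best_energy = labels[r][c], math.inf
                for lab in (0, 1):
                    prob = p_photo[r][c] if lab == 1 else 1.0 - p_photo[r][c]
                    energy = -math.log(max(prob, eps)) + beta * sum(n != lab for n in neighbours)
                    if energy < best_energy:
                        best_label, best_energy = lab, energy
                if best_label != labels[r][c]:
                    labels[r][c] = best_label
                    changed = True
        if not changed:  # ICM has reached a local minimum of the energy
            break
    return labels


# A 3x3 photo region whose centre block was weakly scored (0.4) by the classifier.
scores = [[0.9, 0.9, 0.9], [0.9, 0.4, 0.9], [0.9, 0.9, 0.9]]
photo_map = icm_photo_map(scores)  # the centre block is relabelled as photo
```

ICM greedily updates one block at a time, so it converges quickly but only to a local minimum. This trade-off is commonly accepted for block-level maps where the initial labelling is already reasonable.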
A two-fold image understanding algorithm based on Bayesian networks is introduced. The methodology has modules for
image segmentation evaluation and region of interest (ROI) identification. The former uses a set of segmentation maps
(SMs) of a target image to identify the optimal one. These SMs could be generated from the same segmentation
algorithm at different thresholds or from different segmentation techniques. Global and regional low-level image features
are extracted from the optimal SM and used along with the original image to identify the ROI. The proposed algorithm
was tested on a set of 4000 publicly available color images and compared favorably to state-of-the-art
techniques. Applications of the proposed framework include image compression, image summarization, mobile phone
imagery, digital photo cropping, and image thumbnailing.
In this paper, a framework for detecting lines on a polished or textured substrate is proposed. Modules for image capture,
rectification, enhancement, and line detection are included. If the surface being examined is specular (mirror-like),
image capture is restricted; that is, the camera has to be fixed off-axis in the zenith direction. A module for image
rectification and projection is included to overcome this limitation and yield an orthographic image. In addition, a
module for image enhancement that includes high-boost filtering is employed to improve edge sharpness and decrease the
spatial noise in the image. Finally, a line-integral technique is applied to find the confidence vectors that represent the
spatial positions of the lines of interest. The Full-Width at Half-Max (FWHM) approximation is applied to determine the
corresponding lines in a target image. Experimental results show that our technique performs effectively on
synthetic and real images. Print quality assessment is the main application of the proposed algorithm; however, it can be
used to detect lines/streaks in prints, on substrates, or on any type of media where lines are visible.
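The line-integral and FWHM steps can be sketched for the simple case of vertical lines in an already rectified image. This is a minimal sketch with three assumptions. Pixel values already encode line strength (for example, an inverted intensity image). The column sums serve as the confidence vector. Only the strongest peak is measured. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def column_profile(image: Sequence[Sequence[float]]) -> List[float]:
    """Line integral along each column; assumes lines of interest run vertically."""
    return [float(sum(col)) for col in zip(*image)]


def _half_max_crossing(profile: Sequence[float], peak: int, step: int, half: float) -> float:
    """Walk from the peak in direction `step` (+1 or -1) to where the profile falls below `half`."""
    i = peak
    while 0 <= i + step < len(profile) and profile[i + step] >= half:
        i += step
    if not 0 <= i + step < len(profile):
        return float(i)  # the profile never drops below half maximum on this side
    # Linear interpolation between sample i (>= half) and sample i + step (< half).
    frac = (profile[i] - half) / (profile[i] - profile[i + step])
    return i + step * frac


def fwhm(profile: Sequence[float]) -> Tuple[int, float]:
    """Return (peak index, full width at half maximum) of the strongest peak."""
    peak = max(range(len(profile)), key=lambda i: profile[i])
    base = min(profile)
    half = base + (profile[peak] - base) / 2.0
    left = _half_max_crossing(profile, peak, -1, half)
    right = _half_max_crossing(profile, peak, +1, half)
    return peak, right - left


# Three identical rows containing one soft vertical line centred on column 3.
image = [[0, 0, 1, 2, 1, 0, 0]] * 3
profile = column_profile(image)   # [0.0, 0.0, 3.0, 6.0, 3.0, 0.0, 0.0]
peak, width = fwhm(profile)       # peak == 3, width == 2.0
```

Half maximum is measured relative to the profile's minimum, so a uniform background offset does not inflate the width. Detecting several lines would require repeating the measurement around each local peak.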
An image-understanding algorithm for identifying Regions-of-Interest (ROI) in digital images is proposed. Global and
regional features that characterize relations between image segments are fused in a probabilistic framework to generate
ROI for an arbitrary image. Features are introduced as maps for spatial position, weighted similarity, and weighted
homogeneity for image regions. The proposed methodology includes modules for image segmentation, feature
extraction, and probabilistic reasoning. It differs from prior art by using machine learning techniques to discover the
optimum Bayesian Network structure and probabilistic inference. It also eliminates the necessity for semantic
understanding at intermediate stages. Experimental results show performance competitive with state-of-the-art
techniques, with an accuracy rate of ~80% on a set of ~20,000 publicly available color images. Applications of
the proposed algorithm include content-based image retrieval, image indexing, automatic image annotation, mobile
phone imagery, and digital photo cropping.
We propose an image-understanding algorithm for identifying and ranking regions of perceptually relevant content in digital images. Global features that characterize relations between image regions are fused in a probabilistic framework to generate a region ranking map (RRM) of an arbitrary image. Features are introduced as maps for spatial position, weighted similarity, and weighted homogeneity for image regions. Further analysis of the RRM, based on the receiver operating characteristic (ROC) curve, is used to generate a binary map that signifies the region of interest in the test image. The algorithm includes modules for image segmentation, feature extraction, and probabilistic reasoning. It differs from prior art by using machine learning techniques to discover the optimum Bayesian Network structure and probabilistic inference. It also eliminates the necessity for semantic understanding at intermediate stages. Experimental results indicate an accuracy rate of ~90% on a set of ~4000 publicly available color images and compare favorably to state-of-the-art techniques. Applications of the proposed algorithm include smart image and document rendering, content-based image retrieval, adaptive image compression and coding, and automatic image annotation.
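The ROC-based step that turns the region ranking map into a binary ROI map can be sketched as a threshold search over labelled training pixels. The abstract does not state which ROC operating point is chosen. This sketch assumes Youden's J statistic (TPR minus FPR) as the criterion and uses hypothetical function names and scores.

```python
from typing import List, Sequence, Tuple


def youden_threshold(scores: Sequence[float], truth: Sequence[int]) -> Tuple[float, float]:
    """Choose the ROC operating point that maximises Youden's J = TPR - FPR."""
    pos = sum(truth)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both ROI (1) and non-ROI (0) samples")
    best_t, best_j = 0.0, -1.0
    # Each distinct score is a candidate threshold, i.e. one point on the ROC curve.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def binarize(rrm: Sequence[Sequence[float]], t: float) -> List[List[int]]:
    """Turn a region ranking map into a binary ROI map (1 = region of interest)."""
    return [[1 if v >= t else 0 for v in row] for row in rrm]


# Hypothetical RRM scores for six labelled training pixels and their ground truth.
t, j = youden_threshold([0.1, 0.2, 0.35, 0.6, 0.8, 0.9], [0, 0, 0, 1, 1, 1])
roi = binarize([[0.5, 0.7], [0.95, 0.2]], t)  # t == 0.6 here
```

Other criteria, such as a fixed false-positive rate or the point closest to the ROC curve's top-left corner, would fit the same structure. Only the scoring line inside the loop would change.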
In this paper, we present an algorithm for image stitching that avoids performance bottlenecks and memory issues in
diverse image processing applications/environments. High-resolution images could be cut into smaller pieces by various
applications for ease of processing, especially if they are sent over a computer network. Image pieces (from several
high-resolution images) could be stored as a single image-set with no information about the original images. We propose a
robust stitching methodology to reconstruct the original high-resolution image(s) from a target image-set that contains
components of various sizes and resolutions. The proposed algorithm consists of three major modules. The first module
sorts image pieces into different planes according to their spatial position, size, and resolution; it avoids placing
overlapping pieces of the same resolution in the same plane. The second module sorts the pieces from different planes
according to their content by minimizing a cost function based on Mean Absolute Difference (MAD). The third module
relates neighboring pieces and determines the output images. The proposed algorithm could be used at a pre-processing
stage in applications such as rendering, enhancement, and retrieval, as these cannot be carried out without access to
the original images as individual whole components.
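The MAD-based content matching in the second module can be sketched for the simplest case: deciding which piece should sit immediately to the right of a given piece. This is a minimal sketch under four assumptions. Pieces are grayscale tiles of equal height and resolution. The cost compares only the abutting boundary columns. The choice is made greedily. The names `mad_right` and `best_right_neighbour` are hypothetical, and the paper's full cost function and plane handling are not reproduced here.

```python
from typing import Dict, Sequence

Piece = Sequence[Sequence[float]]  # a grayscale tile stored row by row


def mad_right(left: Piece, right: Piece) -> float:
    """Mean Absolute Difference between the last column of `left` and the first column of `right`."""
    if len(left) != len(right):
        raise ValueError("pieces must have the same height to abut horizontally")
    return sum(abs(a[-1] - b[0]) for a, b in zip(left, right)) / len(left)


def best_right_neighbour(pieces: Dict[str, Piece], name: str) -> str:
    """Greedily pick the same-height piece whose left edge best continues `name`'s right edge."""
    candidates = [k for k, p in pieces.items() if k != name and len(p) == len(pieces[name])]
    if not candidates:
        raise ValueError("no compatible piece found")
    return min(candidates, key=lambda k: mad_right(pieces[name], pieces[k]))


tiles = {
    "A": [[10, 20], [30, 40]],
    "B": [[21, 50], [41, 60]],  # continues A smoothly (MAD = 1.0)
    "C": [[90, 90], [0, 0]],    # poor match for A (MAD = 55.0)
}
match = best_right_neighbour(tiles, "A")  # "B"
```

Comparing only the boundary columns keeps the cost linear in the tile height. This helps when many candidate pieces must be evaluated without loading whole images into memory.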
In this paper, we present an image understanding algorithm for automatically identifying and ranking different image
regions into several levels of importance. Given a color image, specialized maps for classifying image content, namely
weighted similarity, weighted homogeneity, image contrast, and memory colors, are generated and combined to provide a
metric for perceptual importance classification. Further analysis yields a region ranking map which sorts the image
content into different levels of significance. The algorithm was tested on a large database of color images that consists of
the Berkeley segmentation dataset as well as many other internal images. Experimental results show that our technique
matches human manual ranking with 90% efficiency. Applications of the proposed algorithm include image rendering,
classification, indexing, and retrieval.