Identifying figure captions has wide applications in producing high-quality e-books such as Kindle or iPad
books. In this paper, we present a rule-based system to detect horizontal figure captions in old-style documents.
Our algorithm consists of three steps: (i) segment images into regions of different types, such as text and figures;
(ii) search for the best caption-region candidate based on heuristic rules such as region alignment and distance;
and (iii) expand the caption regions identified in step (ii) with their neighboring text regions in order to correct over-segmentation.
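The heuristic search in step (ii) might be sketched as below. The scoring function, the gap threshold, and the alignment weight are illustrative assumptions, not the paper's actual rules.

```python
# Hypothetical sketch of the caption-candidate search (step ii): among text
# regions below a figure, prefer candidates that are vertically close and
# horizontally centered. Weights and thresholds are illustrative assumptions.

def score_candidate(figure, region, max_gap=50):
    """Heuristic score for a text region as a caption; lower is better.
    Regions are (x, y, width, height) boxes; returns None if implausible."""
    fx, fy, fw, fh = figure
    rx, ry, rw, rh = region
    gap = ry - (fy + fh)                    # vertical gap below the figure
    if gap < 0 or gap > max_gap:
        return None                         # above the figure or too far away
    # misalignment of horizontal centers, normalized by figure width
    misalign = abs((rx + rw / 2) - (fx + fw / 2)) / fw
    return gap + 100 * misalign             # assumed trade-off weight

def best_caption(figure, text_regions):
    """Pick the text region with the lowest heuristic score, if any."""
    scored = [(score_candidate(figure, r), r) for r in text_regions]
    scored = [(s, r) for s, r in scored if s is not None]
    return min(scored)[1] if scored else None

figure = (100, 100, 200, 150)
regions = [(110, 260, 180, 20),   # close below and well aligned
           (300, 270, 100, 20)]   # plausible gap but poorly aligned
print(best_caption(figure, regions))  # → (110, 260, 180, 20)
```

A real system would add more rules (e.g. font-size or keyword cues such as "Fig."), but the distance-plus-alignment trade-off captures the basic idea.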
We test our algorithm on 81 images collected from old-style books, each containing at least one figure
area. We show that the approach correctly detects figure captions from images with different layouts, and we
measure its performance in terms of both precision and recall.
Recognizing text from low-resolution images taken by mobile phones has wide applications. It has been
shown that good image binarization can substantially improve the performance of OCR engines. In this paper,
we present a framework to segment text from outdoor images taken by mobile phones using color features. The
framework consists of three steps: (i) initial processing, including image enhancement, binarization, and noise
filtering, where we binarize the input images in each RGB channel and apply component-level noise filtering;
(ii) grouping components into blocks using color features, where we compute component similarities by
dynamically adjusting the weights of the RGB channels and merge groups hierarchically; and (iii) block selection,
where we use run-length features and choose a Support Vector Machine (SVM) as the classifier.
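A minimal sketch of the color-based grouping in step (ii), assuming variance-based channel weighting and a fixed merge threshold (both are our illustrative choices, not the paper's actual scheme):

```python
# Illustrative sketch of color-feature grouping: weight each RGB channel by
# how stable it is across components, then greedily merge components whose
# weighted color distance falls below a threshold. All parameters are assumed.

def channel_weights(colors):
    """Down-weight channels with high variance across components (assumed heuristic)."""
    n = len(colors)
    means = [sum(c[i] for c in colors) / n for i in range(3)]
    var = [sum((c[i] - means[i]) ** 2 for c in colors) / n for i in range(3)]
    inv = [1.0 / (1.0 + v) for v in var]
    s = sum(inv)
    return [w / s for w in inv]             # normalized to sum to 1

def color_distance(a, b, w):
    """Weighted Euclidean distance between two RGB colors."""
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)) ** 0.5

def group_components(colors, threshold=30.0):
    """Single-linkage agglomerative merging of component indices by color."""
    w = channel_weights(colors)
    groups = [[i] for i in range(len(colors))]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(color_distance(colors[a], colors[b], w) < threshold
                       for a in groups[i] for b in groups[j]):
                    groups[i] += groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups

colors = [(200, 50, 50), (205, 55, 48), (20, 20, 220)]  # two reds, one blue
print(group_components(colors))  # → [[0, 1], [2]]
```

Dynamically re-weighting the channels makes the grouping tolerant of lighting variation concentrated in one channel, which is the intuition the abstract describes.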
We tested the algorithm on 13 outdoor images taken by an old-style LG-64693 mobile phone with 640x480
resolution. We compared the segmentation results with Tsar's algorithm, a state-of-the-art camera text detection
algorithm, and show that our algorithm is more robust, particularly in terms of false-alarm rate. In addition,
we evaluated the impact of our algorithm on ABBYY FineReader, one of the most popular commercial
OCR engines on the market.
Document layout analysis is a key step in document image understanding with wide applications in document
digitization and reformatting. Identifying correct layout from noisy scanned images is especially challenging.
In this paper, we introduce a semi-supervised learning framework to detect text-lines from noisy document
images. Our framework consists of three steps. The first step is the initial segmentation that extracts text-lines
and images using simple morphological operations. The second step is a grouping-based layout analysis that
identifies text-lines, image zones, column separators, and vertical border noise. It efficiently removes
vertical border noise from multi-column pages. The third step is an online classifier that is trained with the high-confidence
line detection results from Step Two and filters out noise from low-confidence lines. The classifier
effectively removes speckle noise embedded inside the content zones.
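The online-classifier idea in the third step can be sketched as below, with a nearest-centroid model standing in for the actual classifier and with assumed line features (normalized height, ink density); none of these specifics come from the paper.

```python
# Hypothetical sketch of the self-training step: fit a simple classifier on
# high-confidence line detections from the previous step, then use it to keep
# or reject low-confidence candidates. Features and labels are illustrative.

def fit_centroids(samples):
    """samples: {label: [feature vectors]} -> {label: mean feature vector}."""
    return {lab: [sum(v[i] for v in vecs) / len(vecs)
                  for i in range(len(vecs[0]))]
            for lab, vecs in samples.items()}

def classify(centroids, x):
    """Assign x to the label of the nearest centroid (squared Euclidean)."""
    def dist(lab):
        return sum((a - b) ** 2 for a, b in zip(centroids[lab], x))
    return min(centroids, key=dist)

# High-confidence detections act as training data; assumed features are
# (normalized line height, ink density).
train = {"text": [(0.8, 0.4), (0.9, 0.5)],
         "noise": [(0.1, 0.9), (0.2, 0.8)]}
cent = fit_centroids(train)

low_conf = [(0.85, 0.45), (0.15, 0.85)]
kept = [x for x in low_conf if classify(cent, x) == "text"]
print(kept)  # → [(0.85, 0.45)]  (the speckle-like candidate is rejected)
```

Because the training set is built per page from the high-confidence detections, the model adapts to each document rather than relying on a fixed offline training corpus.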
We compare the performance of our algorithm to the state-of-the-art work in the field on the UW-III database.
We choose the results reported by the Image Understanding Pattern Recognition Research (IUPR) and Scansoft
Omnipage SDK 15.5. We evaluate performance at both the page-frame level and the text-line level. The
results show that our system has a much lower false-alarm rate while maintaining a similar content detection rate. In
addition, we also show that our online training model generalizes better than algorithms depending on offline training.