One of the critical problems for an off-line handwritten character reader is deciding which patterns to read and which to ignore, since a form or document contains not only characters but also spots and deletions. Unless such patterns meet the rejection conditions, they cause recognition errors. Deleted single characters are particularly hard to distinguish from valid characters, because their sizes are almost the same as those of characters and their shapes vary widely. In this article, we propose a method for detecting such deletions in handwritten digits using topological and geometrical image features suited to the task: Euler number, pixel density, number of endpoints, maximum crossing count, and number of histogram peaks. For precise detection, thresholds on these image features are selected adaptively according to the recognition results.
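As a rough illustration of two of these features, the sketch below computes an Euler number and a pixel density for a binary digit pattern. It is a minimal sketch using scipy, assuming 1 = ink; the connectivity choices and the closing comment about typical feature values are assumptions, not the paper's exact definitions or thresholds.

import numpy as np
from scipy import ndimage

def euler_number(patch):
    """Number of connected ink components minus number of holes."""
    ink = np.asarray(patch, dtype=int)
    _, n_obj = ndimage.label(ink)        # ink components
    _, n_bg = ndimage.label(1 - ink)     # background components
    n_holes = max(n_bg - 1, 0)           # all but the outer background count as holes
    return n_obj - n_holes

def pixel_density(patch):
    """Fraction of ink pixels inside the pattern's bounding box."""
    ys, xs = np.nonzero(patch)
    if len(ys) == 0:
        return 0.0
    box = patch[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return float(box.sum()) / box.size

# A scratched-out digit tends to produce a low (often negative) Euler number,
# because the extra crossing strokes create holes, together with a high pixel
# density; the paper selects thresholds on such features adaptively.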
It is not always easy to automatically extract, from map images, all of the road information corresponding to the actual road configuration. This is partly because individual map components are not explicitly separated from one another, and partly because these components are not necessarily well defined. It is therefore difficult to overcome these drawbacks with data-driven image processing techniques. Even when model-driven, knowledge-based techniques are applied, they do not succeed efficiently because the model itself cannot be well specified. In this paper, we address a method for inferring unextracted road information based on a case-based retrieval mechanism. Our idea is to use case bases to infer road information that model-driven, knowledge-based approaches could not identify sufficiently.
This paper presents an efficient approach to identifying tabular structures within either electronic or paper documents. The resulting T-Recs system takes word bounding box information as input and outputs the corresponding logical text block units. Starting with an arbitrary word as a block seed, the algorithm recursively expands this block to all words that interleave with their vertical neighbors. Since even the smallest gaps between table columns prevent their words from mutually interleaving, this initial segmentation is able to identify and isolate such columns. To deal with some inherent segmentation errors caused by isolated lines, overhanging words, or cells spanning more than one column, a series of postprocessing steps is added. These steps benefit from a very simple distinction between type 1 and type 2 blocks: type 1 blocks contain at most one word per line; all others are of type 2. This distinction allows the selective application of heuristics to each group of blocks. The conjoint decomposition of column blocks into subsets of table cells leads to a final block segmentation at a homogeneous abstraction level. These segments serve as input to the final layout analysis, which identifies table environments and cells that stretch over several rows and/or columns.
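The seed-expansion step can be sketched as follows, assuming each word is given as a tuple (line_index, x0, x1); this is an illustrative data layout, not T-Recs' actual implementation.

def x_overlap(a, b):
    # word intervals (_, x0, x1) overlap horizontally
    return a[1] < b[2] and b[1] < a[2]

def expand_block(seed, words):
    """Grow a block from a seed word to every word that interleaves
    with a vertical neighbor already in the block."""
    block, stack = {seed}, [seed]
    while stack:
        w = stack.pop()
        for v in words:
            # vertical neighbor = word on the line directly above or below
            if v not in block and abs(v[0] - w[0]) == 1 and x_overlap(w, v):
                block.add(v)
                stack.append(v)
    return block

# Because even a narrow column gap breaks the interleaving condition,
# each table column ends up in a separate block after this pass.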
Map images are complex documents generated from several layers of information overlapped and printed on paper, and usually the only information available is the digitized image of the map. Recovering the original layers of the map, so that its components can be analyzed independently, would be useful but would require several steps. A first step could separate the image into conceptual layers using basic spectral and spatial properties, yielding layers that correspond to basic features in the map image; these would serve as input for more sophisticated algorithms that produce more detailed information, and so on, until a complete high-level description of the map information is obtained. Extracting the conceptual map layers is often a complex task, since the pixels that correspond to the categories in a map image are spectrally and spatially mixed with the pixels of other classes. This paper presents the selective attention filter (SAF), which can filter out pixels that are not relevant to the information being extracted or enhance pixels of the categories of interest. The SAF is robust in the presence of noise, and classification results on images filtered with it are quantitatively better than results obtained with other commonly used filters.
This paper describes current results of Ofr, a system for extracting and understanding mathematical expressions in documents. Such a tool would be very useful for reusing the knowledge in scientific books that are not available in electronic form. We are also studying the use of this system for direct input of formulas into computer algebra software with a graphics tablet. Existing solutions for mathematical recognition have difficulty analyzing 2D expressions such as vectors and matrices. This is because they often analyze formulas with extended classical grammars, relative to the baseline, but many mathematical notations do not respect the rules such parsing assumes, which is why these extensions of text-parsing techniques fail. We investigate graph grammars and graph rewriting as a solution for recognizing 2D mathematical notations. Graph grammars provide a powerful formalism for describing structural manipulations of multi-dimensional data. The two main problems to solve are ambiguities between grammar rules and the construction of the graph.
Many papers have been concerned with the recognition of Latin, Chinese, and Japanese characters. However, although almost a third of a billion people worldwide use Arabic characters to write several different languages, little research progress, either on-line or off-line, has been achieved toward the automatic recognition of Arabic characters. This is a result of the lack of adequate support in terms of funding and other resources such as Arabic text databases and dictionaries, and of course of the cursive nature of Arabic writing. The main theme of this paper is the automatic recognition of printed Arabic text using the C4.5 machine learning algorithm. Symbolic machine learning algorithms accept example descriptions in the form of feature vectors that include a label identifying the class to which each example belongs. The output of the algorithm is a set of rules that classifies unseen examples by generalizing from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and the symbolic representation is easy to understand. The technique can be divided into three major steps. The first step is preprocessing, in which the original image is transformed into a binary image using a 300 dpi scanner and the connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of complementary characters. Finally, C4.5 is used to generate a decision tree for character classification.
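A minimal sketch of two of these global word features, assuming a binary word image (1 = ink) and a simple area heuristic to separate main subword bodies from complementary marks such as dots; the area cutoff is an illustrative assumption, not the paper's rule.

import numpy as np
from scipy import ndimage

def global_word_features(word_img, dot_area=30):
    """Count subwords and complementary marks via connected components."""
    labels, n = ndimage.label(word_img)
    sizes = ndimage.sum(word_img, labels, index=range(1, n + 1))
    n_subwords = int(np.sum(sizes > dot_area))        # large components: subword bodies
    n_complementary = int(np.sum(sizes <= dot_area))  # small components: dots/diacritics
    return {"subwords": n_subwords, "complementary": n_complementary}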
Past work has shown the directional element feature to be useful for off-line recognition of handwritten Chinese characters. This paper presents refinements to this approach that significantly improve recognition performance. These refinements do the following: 1) assign fuzzy attributes to stroke edge pixels, 2) divide the character image into fuzzy membership cells, 3) smooth saw-toothed edges, 4) enhance information at the boundary of the character image, and 5) enhance horizontal and vertical stroke edges. The first two compensate for variations in stroke position, length, inclination, and width. Smoothing can correct some aliasing and dropout problems in the character images. The last two emphasize the more important aspects of a character image. All refinements improve recognition, and when used together they increase performance by about 10 percent: from 84.15 percent to 94.09 percent on a set of 3,755 Chinese characters. Experiments over all subsets of refinements are included. While not all refinements lead to equal improvement, all are necessary to achieve the highest level of recognition.
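For context, a minimal sketch of the basic (unrefined) directional element feature: the character image is divided into a grid of cells and, within each cell, edge pixels are counted by quantized edge direction. The grid size and the gradient-based direction estimate are illustrative assumptions.

import numpy as np

def directional_element_feature(binary, grid=8):
    """Per-cell counts of edge pixels in 4 quantized directions (0/45/90/135 deg)."""
    gy, gx = np.gradient(binary.astype(float))
    angle = (np.degrees(np.arctan2(gy, gx)) + 180.0) % 180.0
    direction = ((angle + 22.5) // 45).astype(int) % 4
    edge = np.hypot(gx, gy) > 0
    h, w = binary.shape
    feat = np.zeros((grid, grid, 4))
    for y, x in zip(*np.nonzero(edge)):
        feat[y * grid // h, x * grid // w, direction[y, x]] += 1
    return feat.ravel()        # e.g. 8 x 8 x 4 = 256-dimensional vector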
This paper proposes new features for recognizing handwritten Japanese Kanji characters. Many feature extraction methods have been studied for Kanji; in particular, stroke directional features are effective when the Kanji are well formed. Directional features are local shape descriptions of individual strokes, so they are not robust against shape distortions such as slanting, rotation, and the fluctuation in stroke direction seen in freely handwritten characters. Against such distortion, the 2D relative arrangement of the constituent strokes is more effective as a structural, global shape description. We focus on this fact and derive new features for measuring the 2D relationship between strokes. We derive new measures that express this 2D relationship from the directional features of adjacent strokes and use them as new features. The new features express the relative angle and relative position of adjacent strokes as a structural, global shape description. Experiments show that the proposed measures achieve very high recognition rates: about 95 percent on a data set written in the square style and about 80 percent on a data set written in the free style. These represent a reduction of about 20 percent in the error rates obtained with the original directional features on both data sets.
This paper concerns automatic OCR of Bangla, a major Indian language script and the fourth most popular script in the world. A Bangla OCR system has to recognize about 300 graphemic shapes, among which 250 are compound characters with quite complex stroke patterns. For recognizing such compound characters, feature-based approaches are less reliable and template-based approaches are less flexible with respect to size and style variation of the character font. We combine the positive aspects of feature-based and template-based approaches. Here we propose a run-number-based normalized template matching technique for compound character recognition. Run number vectors are computed for both horizontal and vertical scanning. As the number of scans may vary from pattern to pattern, we normalize and abbreviate the vector. We prove that this normalized and abbreviated vector induces a metric distance. Moreover, this vector is invariant to scaling, insensitive to character style variation, and more effective for complex-shaped characters than for simple-shaped ones. We use this vector representation for matching within a group of compound characters. We observe that the matching is more efficient if the vector is reorganized with respect to the centroid of the pattern. We have tested our approach on a large set of segmented compound characters at different point sizes and in different styles. Italic characters are subject to preprocessing. The overall correct recognition rate is 99.69 percent.
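A minimal sketch of the run-number representation, assuming a binary character image (1 = ink); the fixed-length resampling shown here is one plausible normalization, not necessarily the one proved to induce a metric in the paper.

import numpy as np

def run_count_1d(line):
    """Number of maximal runs of 1s along one scan line."""
    padded = np.concatenate(([0], line))
    return int(np.sum((padded[1:] == 1) & (padded[:-1] == 0)))

def run_number_vectors(img, length=16):
    """Row-wise and column-wise run counts, resampled to a fixed length."""
    rows = np.array([run_count_1d(r) for r in img], dtype=float)
    cols = np.array([run_count_1d(c) for c in img.T], dtype=float)
    resample = lambda v: np.interp(np.linspace(0, len(v) - 1, length),
                                   np.arange(len(v)), v)
    return resample(rows), resample(cols)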
With the increasing interest in document analysis research, the number of available OCR, segmentation, noise removal, and other document analysis algorithms has grown considerably. However, algorithms are still purpose-specific, and different algorithms are usually needed in different situations to obtain optimal results. The problem is how to reliably evaluate the performance of an algorithm in a given situation. A framework for a benchmarking system for document analysis algorithms is presented. The system consists of a set of test cases for measuring the performance of different document analysis algorithms. The system is expandable: new algorithm types can be added for testing by creating new test cases and benchmarking methods. The whole benchmarking process can be automated to allow mass performance testing with numerous algorithms. A set of weights is used to adjust the relative significance of the different aspects of a test case. The results of the benchmarking are expressed as a single value, which represents the performance of the algorithm in a given test case. This result can easily be compared with the results of other algorithms, which enables ranking of the tested algorithms. Experiments with the benchmarking system show promising results. The performance ranking also agrees well with subjective human evaluation.
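A minimal sketch of collapsing weighted per-aspect scores into the single value mentioned above; the aspect names and weights are purely illustrative.

def benchmark_score(scores, weights):
    """Weighted average of per-aspect scores, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total

result = benchmark_score(
    scores={"accuracy": 0.92, "speed": 0.75, "robustness": 0.80},
    weights={"accuracy": 3.0, "speed": 1.0, "robustness": 2.0},
)
# Algorithms can then be ranked by this single value for each test case.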
This paper discusses methodologies for automatically selecting document pages and zones from the UW databases that have the desired page/zone attributes. The selected pages can then be randomly partitioned into subsets for training and testing purposes. This paper also discusses three degradation methodologies that allow the developers of OCR and document recognition systems to create unlimited 'real-life' degraded images, with geometric distortions, coffee stains, and watermarks. Since the degraded images are created from the images in the UW databases, the nearly perfect original groundtruth files in the UW databases can be reused. The process of creating the additional document images and the associated groundtruth and attribute files requires only a fraction of the original cost and time.
In this paper, we consider the problem of locating and extracting text from WWW images. A previous algorithm based on color clustering and connected components analysis works well as long as the color of each character is relatively uniform and the typography is fairly simple. It breaks down quickly, however, when these assumptions are violated. In this paper, we describe more robust techniques for dealing with this challenging problem. We present an improved color clustering algorithm that measures similarity based on both RGB values and spatial proximity. Layout analysis is also incorporated to handle more complex typography. These changes significantly enhance the performance of our text detection procedure.
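A minimal sketch of clustering on combined color and spatial features, assuming a k-means clusterer; the cluster count and the spatial weighting are illustrative assumptions, not the paper's parameters.

import numpy as np
from sklearn.cluster import KMeans

def cluster_rgb_xy(img, n_clusters=8, spatial_weight=0.5):
    """Label each pixel by a cluster computed on (R, G, B, x, y) features."""
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        img.reshape(-1, 3).astype(float),            # RGB
        spatial_weight * xs.ravel() * 255.0 / w,     # x scaled into the RGB range
        spatial_weight * ys.ravel() * 255.0 / h,     # y scaled into the RGB range
    ])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)

# Connected-components analysis can then be run within each cluster to find
# character-like regions, as in the text-detection pipeline described above.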
Intelligent document understanding (IDU) systems convert scanned document pages into an electronic format that preserves layout and logical document structure in addition to document content. Most experimental IDU systems, however, lack the capability to fully exploit the recognition results. In this paper we present an integrated IDU system that processes documents all the way from recognition to full utilization using the standard generalized markup language (SGML). The standardization and widespread use of SGML-based tools provide the means for filling the gap between document recognition and seamless document reuse. The conversion process involves OCR of a multipage document, document structure analysis, processing of tabular data and mathematical expressions, and generation of the final SGML description. Document structure analysis is reduced here to parsing OCR results and recreating the document structure by performing fuzzy searches for standard phrases and by format analysis. Tabular data processing uses OCR results with positional data, horizontal lines, and heuristic rules to determine cell boundaries and contents. Recognition of mathematical expressions involves OCR on an extended symbol set and equation structure recognition via transformations on a tree representation. The transformations are ordered and involve connecting separated symbols, context-sensitive OCR correction, extraction of horizontally aligned subexpressions, subscript and superscript processing, and general processing of symbols detected above or below the target symbol.
The number of WWW documents available to users of the Internet is growing at an incredible rate. It is therefore becoming increasingly important to develop systems that aid users in searching, filtering, and retrieving information from the Internet. Currently, only a few prototype systems catalog and index images in Web documents. To greatly improve the cataloging and indexing of images on the Web, we have developed a prototype rule-based system that detects content images in Web documents. Content images are images associated with the main content of Web documents, as opposed to the multitude of other images that exist in Web documents for other purposes, such as decoration, advertisements, and logos. We present a system that uses decision tree learning for automated rule induction in the content image detection system. The system uses visual features, text-related features, and the document context of images in concert for fast and effective content image detection in Web documents. We have evaluated the system on more than 1200 images collected from 4 different Web sites and achieved an overall classification accuracy of 84 percent.
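A minimal sketch of the rule-induction idea, using a decision tree trained on a few toy image descriptors; the feature set and the tiny hand-made training data are illustrative assumptions standing in for the visual, text-related, and context features described above.

from sklearn.tree import DecisionTreeClassifier, export_text

# feature vector: [width, height, aspect_ratio, file_size_kb, n_colors, has_alt_text]
X = [
    [400, 300, 1.33, 55.0, 4096, 1],   # typical content image
    [468,  60, 7.80, 12.0,  256, 0],   # typical banner advertisement
    [ 88,  31, 2.84,  2.0,   16, 0],   # typical logo/button
]
y = ["content", "advertisement", "logo"]

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
print(export_text(tree))   # the induced rules, readable as nested if/then tests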
Document classification is one of the fundamental technologies that precede document routing, document understanding, and information extraction. Pattern matchers with rule-based components are in use at news agencies that take electronic text as input. Classification of OCR documents, however, must deal with the ambiguities of the underlying OCR engine. The ambiguities of character segmentation and classification lead to a directed graph of characters as the result of the OCR process, the so-called character hypothesis lattice (CHL). This paper deals with techniques for enhancing the pattern matcher to cope with CHLs.
Duplicate documents are frequently found in large databases of digital documents, such as those in digital libraries or in the government declassification effort. Efficient duplicate document detection is important not only for querying for similar documents, but also for filtering out redundant information in large document databases. We have designed three different algorithms to identify duplicate documents. The first algorithm is based on features extracted from the textual content of a document, the second is based on wavelet features extracted from the document image itself, and the third is a combination of the first two. These algorithms are integrated within the DocBrowse system for information retrieval from document images, which is currently under development at MathSoft. DocBrowse supports duplicate document detection by allowing (1) automatic filtering to hide duplicate documents and (2) ad hoc querying for similar or duplicate documents. We have tested the duplicate document detection algorithms on 171 documents and found that the text-based method has an average 11-point precision of 97.7 percent while the image-based method has an average 11-point precision of 98.9 percent. In general, however, the text-based method performs better when the document contains enough high-quality machine-printed text, while the image-based method performs better when the document contains little or no good-quality machine-readable text.
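One plausible text-based duplicate check (not necessarily the paper's exact feature set) is the Jaccard similarity of word n-gram shingles extracted from the OCR text of two documents, sketched below.

def shingles(text, n=4):
    """Set of overlapping word n-grams from a document's OCR text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Documents whose similarity exceeds a chosen cutoff can be hidden as
# duplicates or returned in response to an ad hoc similarity query.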
MANICURE is a document processing system that provides integrated facilities for creating electronic forms of printed materials. In this paper the functionalities supported by MANICURE and their implementations are described. In particular, we provide information on specific modules dealing with automatic detection and correction of OCR errors and automatic markup of logical components of the text. We further show that the various text formats produced by MANICURE can be used by web browsers and/or be manipulated by search routines to highlight the requested information on document images.
Methods for embedding arbitrary digital data within an iconic representation of a document page image are summarized. The result of the encoding is a small iconic image containing the encoded data as small rectangular blocks of pixels, along with a mixture of reduced document image components such as graphics, text, and images. As a first step in ensuring data recovery, the encoder verifies that the iconic image can contain the entire message and that the message can be decoded correctly from the noiseless pre-printing image. To retrieve the message, the data must be separated from the other components in the iconic image and decoded. The decoder is assumed to have no prior information about the location of data within the icon, the encoding channels in which it is encoded, or other metadata about the message, such as its size or the amount of error-correction encoding. There are three major steps in the decoding process: segmentation, to identify and serialize the data blocks in the icon; measurement of encoding parameters, including determination of the encoding channels; and extraction of the message. Errors can be introduced into the decoding process at a number of places, and it is necessary to provide mechanisms for detecting and correcting them. For the parameters used here, data blocks from icons generated at reductions of up to 7x are robustly decoded, and error-free message decoding is typically achieved for icons derived from arbitrary pages of scanned documents.
A method for segmenting elongated shapes is presented, comprising two stages: (1) thinning of elongated shapes into chain-coded lines and (2) extraction of the main features. Thinning process: a square perimeter is developed around each current pixel, initially at level 255, belonging to the line being extracted. The size of the square is progressively increased until one or more sticks, framed by background pixels, appear on the perimeter. From the initial and final indices of each stick we deduce the Freeman code leading to the following pixel on the line. Generally, two sticks are present on the square perimeter, one of which corresponds to the backward direction. To discard the invalid stick, each newly detected pixel is marked by lowering its value with a one-bit right shift. In the presence of a fork or crossing point, there is more than one valid stick: the direction closest to the previous one is chosen, and the current pixel is marked and stored in a list of branching points for later processing. Filtering and segmentation: median filtering of extended codes, obtained from the corrected sums of four consecutive Freeman codes, eliminates much of the quantization noise without altering significant direction changes and allows the line to be segmented into straight segments, arcs, and corners.
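A minimal sketch of the filtering step, assuming the chain code has already been extracted: the eight-direction Freeman codes are unwrapped, summed over windows of four, and median filtered. The unwrapping used here stands in for the 'correction' mentioned above, which the abstract does not fully specify.

import numpy as np
from scipy.signal import medfilt

def filtered_extended_codes(freeman, window=4, kernel=5):
    """Median-filtered sums of consecutive (unwrapped) Freeman codes."""
    codes = np.asarray(freeman, dtype=float)
    # unwrap the 8-direction cycle so a 7 -> 0 step does not jump by 8
    unwrapped = np.unwrap(codes * (2 * np.pi / 8)) * (8 / (2 * np.pi))
    sums = np.convolve(unwrapped, np.ones(window), mode="valid")
    return medfilt(sums, kernel_size=kernel)

# Near-constant stretches of the filtered signal indicate straight segments,
# gradual drifts indicate arcs, and sharp persistent changes indicate corners.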
In this paper, we present a new technique for restoring low-resolution grayscale text from JPEG-compressed images. An initial evaluation of the JPEG image is performed, using the histogram and co-occurrence matrix, to estimate the distribution of the uncompressed pixels. The results of this estimation are used to create a 2D Gibbs-Markov random field (GMRF) to model the text. Cliques and energy potentials are formed to properly represent text-like images. The sum of clique energy potentials is calculated to measure how well each given JPEG 8 x 8 block of data matches the prior Gibbs-Markov model. The given quantized JPEG discrete cosine transform (DCT) coefficients, combined with the known JPEG quantization matrix, provide a constrained range for the DCT coefficients of the restored image. Using nonlinear optimization techniques, the image that best combines the prior GMRF model with the given DCT coefficient constraints is found.
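The coefficient constraint can be made concrete with a short sketch: JPEG stores quantized values c = round(C / q), so each true coefficient C must lie in [q(c - 0.5), q(c + 0.5)], and candidate restorations can be projected back into that box during optimization. The function names are illustrative.

import numpy as np

def dct_constraint_bounds(quantized_coeffs, quant_matrix):
    """Interval of DCT values consistent with the stored quantized coefficients."""
    lower = quant_matrix * (quantized_coeffs - 0.5)
    upper = quant_matrix * (quantized_coeffs + 0.5)
    return lower, upper

def project_to_constraints(coeffs, quantized_coeffs, quant_matrix):
    """Clip a candidate 8 x 8 DCT block back into the feasible range."""
    lower, upper = dct_constraint_bounds(quantized_coeffs, quant_matrix)
    return np.clip(coeffs, lower, upper)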
As digital cameras become cheaper and more powerful, driven by the consumer digital photography market, we anticipate significant value in extending their utility as a general office peripheral by adding a paper scanning capability. The main technical challenges in realizing this new scanning interface are insufficient resolution, blur and lighting variations. We have developed an efficient technique for the recovery of text from digital camera images, which simultaneously treats these three problems, unlike other local thresholding algorithms which do not cope with blur and resolution enhancement. The technique first performs deblurring by deconvolution, and then resolution enhancement by linear interpolation. We compare the performance of a threshold derived from the local mean and variance of all pixel values within a neighborhood with a threshold derived from the local mean of just those pixels with high gradient. We assess performance using OCR error scores.
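The two local thresholds being compared can be sketched as follows; the window size, the constant k, and the gradient percentile are illustrative assumptions, and the first rule is a Niblack-style mean-plus-k-times-standard-deviation threshold rather than the authors' exact formula.

import numpy as np
from scipy import ndimage

def mean_variance_threshold(gray, window=25, k=-0.2):
    """Threshold from the local mean and standard deviation of all pixels."""
    g = gray.astype(float)
    mean = ndimage.uniform_filter(g, window)
    sq_mean = ndimage.uniform_filter(g ** 2, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return gray < mean + k * std                 # True = dark text pixels

def high_gradient_mean_threshold(gray, window=25, grad_pct=90):
    """Threshold from the local mean of high-gradient pixels only."""
    g = gray.astype(float)
    grad = ndimage.gaussian_gradient_magnitude(g, sigma=1.0)
    edge = grad > np.percentile(grad, grad_pct)  # keep only strong edges
    num = ndimage.uniform_filter(np.where(edge, g, 0.0), window)
    den = ndimage.uniform_filter(edge.astype(float), window)
    local_edge_mean = np.where(den > 0, num / np.maximum(den, 1e-9), g.mean())
    return gray < local_edge_mean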
Image thinning methods can be divided into two categories based on the type of image they are designed to thin: binary image thinning and grayscale image thinning. Typically, grayscale images are thresholded so that binary image thinning methods can be applied. However, thresholding grayscale images may introduce uneven object contours that are a difficulty for binary methods. The scale-space approach to image thinning includes scale as an additional dimension, where images at scale t are derived from the original image at scale zero by applying a Gaussian filter. As scale increases, finer image structure is suppressed. By treating the image as a 3D surface with intensity as the third dimension, the most prominent ridge-line (MPRL) is the union of topographical features (peaks, ridges, and saddle points) such that each has the greatest contrast with its surroundings. The MPRL is computed by minimizing its second spatial derivative over scale. The result forms a trajectory in scale-space. The thinned image is the projection of the MPRL onto the base level. The MPRL has been implemented using the image pyramid data structure and has been applied to binary and grayscale images of printed characters. Experimental results show that the method is less sensitive to contour unevenness. It also offers the option of choosing different levels of fine structure to include.
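A minimal sketch of the scale dimension and a simple ridge-strength measure; the sigma values and the Laplacian-based measure are illustrative stand-ins for the second-derivative criterion described above.

import numpy as np
from scipy import ndimage

def scale_space(gray, sigmas=(1, 2, 4, 8)):
    """Stack of Gaussian-smoothed copies of the image, one per scale."""
    g = gray.astype(float)
    return np.stack([ndimage.gaussian_filter(g, s) for s in sigmas])

def dark_ridge_strength(level):
    """The Laplacian is positive along intensity valleys, i.e. dark ink ridges."""
    return ndimage.laplace(level)

# Tracing the strongest ridge response across the scale stack and projecting
# it onto the base level yields a thinned version of the original image.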
A method is proposed for restoring degraded faxed document images using the patterns of pixels that make up small areas of a document. The method effectively restores faxed images that contain halftone textures and/or high-density salt-and-pepper noise, both of which degrade OCR system performance. In the halftone image restoration process, white-centered 3 x 3 pixel regions in which black and white pixels alternate are first identified as halftone textures using the distribution of pixel values, and the white center pixels are then inverted to black. To remove high-density salt-and-pepper noise, it is assumed that the degradation is caused by ill-balanced bias and inappropriate thresholding of the sensor output, which results in the addition of random noise. The restored image can then be estimated using an approximation of the inverse of this assumed degradation process. To process degraded faxed images, the algorithms mentioned above are combined. An experiment was conducted using 24 especially poor-quality examples selected from data sets that exemplify what practical fax-based OCR systems cannot handle. The maximum recovery rate in terms of mean square error was 98.8 percent.
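A minimal sketch of the halftone-filling rule: find white pixels whose 3 x 3 neighborhood matches a checkerboard pattern (black at the four edge neighbors, white elsewhere) and flip them to black. Requiring an exact pattern match is an assumption; the paper identifies halftone textures from the distribution of pixel values.

import numpy as np
from scipy import ndimage

# 3 x 3 checkerboard with a white centre: black (1) at the 4-neighbours only
PATTERN = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])

def fill_halftone_centers(binary):
    """binary: 1 = black ink, 0 = white background (integer array)."""
    b = np.asarray(binary, dtype=int)
    black_ok = ndimage.correlate(b, PATTERN, mode="constant") == PATTERN.sum()
    white_ok = ndimage.correlate(1 - b, 1 - PATTERN,
                                 mode="constant") == (1 - PATTERN).sum()
    out = b.copy()
    out[black_ok & white_ok] = 1       # invert qualifying white centres to black
    return out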
Image binarization is a difficult task for documents with text over textured or shaded backgrounds, poor contrast, and/or considerable noise. Current optical character recognition (OCR) and document analysis technology does not handle such documents well. We have developed a simple yet effective algorithm for document image clean-up and binarization. The algorithm consists of two basic steps. In the first step, the input image is smoothed using a low-pass filter. The smoothing operation enhances the text relative to any background texture, because background texture normally has higher spatial frequency than text; it also removes speckle noise. In the second step, the intensity histogram of the smoothed image is computed and a threshold is selected automatically as follows. For black text, the first peak of the histogram corresponds to text, and thresholding the image at the valley between the first and second peaks of the histogram binarizes it well. To identify the valley reliably, the histogram is smoothed by a low-pass filter before the threshold is computed. The algorithm has been applied to some 50 images from a wide variety of sources: digitized video frames, photos, newspapers, advertisements in magazines or sales flyers, personal checks, etc. There are 21820 characters and 4406 words in these images. 91 percent of the characters and 86 percent of the words were successfully cleaned up and binarized. A commercial OCR engine was applied to the binarized text when it consisted of OCR-recognizable fonts. The recognition rate was 84 percent for the characters and 77 percent for the words.
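A minimal sketch of the threshold-selection step, assuming Gaussian low-pass filters for both the image and the histogram; the filter widths are illustrative, and the sketch assumes the smoothed histogram has at least two peaks.

import numpy as np
from scipy import ndimage

def valley_threshold(gray, image_sigma=1.5, hist_sigma=3.0):
    """Binarize by thresholding at the valley between the first two histogram peaks."""
    smoothed = ndimage.gaussian_filter(gray.astype(float), image_sigma)
    hist, _ = np.histogram(smoothed, bins=256, range=(0, 255))
    hist = ndimage.gaussian_filter1d(hist.astype(float), hist_sigma)
    peaks = [i for i in range(1, 255)
             if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]]
    p1, p2 = peaks[0], peaks[1]                     # first (text) and second peaks
    valley = p1 + int(np.argmin(hist[p1:p2 + 1]))   # deepest point between them
    return smoothed <= valley                       # True = dark text pixels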
The problem of word spotting in handwritten documents is addressed as a form of visual keyword search. The particular approach taken avoids explicit references to the lexical structure of the handwritten data, which makes it useful for applications where the information to be retrieved is not limited to handwriting and may consist of other visual objects, such as hand drawings. The line-oriented structure of handwritten documents is exploited to facilitate the search. In particular, sequential processing methods based on dynamic programming are used during recognition to take advantage of the implicit time information and improve spotting performance. Feature profiles based on the ink-background transition representation of binary images are used to represent this sequential data, which in our experiments is obtained from the Archives of the Indies. Encouraging results of 90 percent target word detection at the cost of one false alarm are reported.
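A minimal sketch of one dynamic-programming matcher of the kind mentioned above: dynamic time warping between the 1D feature profile of a query word and that of a candidate region. This stands in for the paper's specific method, which is not spelled out in the abstract.

import numpy as np

def dtw_distance(a, b):
    """Length-normalized dynamic time warping cost between two 1D profiles."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

# Candidate regions whose warped distance to the query profile falls below a
# threshold are reported as spotted occurrences of the keyword.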
Text/graphics segmentation is a fundamental step in the analysis of engineering drawing images. It is hard to handle, however, when text touches graphical entities. In this paper, a method called 'the shielding method' is proposed. The main idea is to label the basic graphical entities that may be touched by text; in other words, to shield graphical objects in a line drawing image prior to text/graphics segmentation. A standard long-line scanning method can be used to detect horizontal and vertical object lines, after which touching characters can be segmented efficiently and quickly. For more general situations, a generic object labeling process based on a run-based graph is introduced. Tests on a drawing with many characters touching graphics show better segmentation results than those obtained without this method.
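A minimal sketch of the long-line scanning idea: mark horizontal runs of black pixels longer than a threshold so they can be shielded before segmentation; the run-length threshold is an illustrative assumption.

import numpy as np

def mark_long_horizontal_runs(binary, min_len=50):
    """Boolean mask of horizontal black runs at least min_len pixels long."""
    mask = np.zeros_like(binary, dtype=bool)
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_len:
                    mask[y, start:x] = True    # shield this run as a line object
            else:
                x += 1
    return mask

# Vertical lines are found the same way on the transposed image; the union of
# both masks is shielded before segmenting the touching characters.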