Document zone classification using sizes of connected components
7 March 1996
In this paper, we describe a feature-based supervised zone classifier that uses only the widths and heights of the connected components within a given zone. The distribution of these widths and heights is encoded into an n × m dimensional feature vector, which is used in decision making. Thus, the computational complexity is on the order of the number of connected components within the given zone. A binary decision tree assigns a zone class on the basis of this feature vector. The training and testing data sets are drawn from the scientific document pages in the UW-I database. In real time, the classifier assigns each scientific and technical document zone one of eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling. It discriminates text from non-text with an accuracy greater than 97%.
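The pipeline outlined in the abstract has three steps: find the connected components, histogram their sizes, and walk a binary decision tree. A minimal sketch of these steps follows. It is not the authors' implementation.

The abstract does not state the following details, so the sketch treats each one as an assumption:

- components are 8-connected;
- "width" and "height" are bounding-box dimensions;
- the bin edges `W_EDGES` and `H_EDGES` are hypothetical;
- the one-split tree `TREE`, its threshold and its "text"/"non-text" labels are hand-built placeholders rather than a trained tree.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def component_sizes(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Bounding-box (width, height) of every 8-connected foreground component."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill; each foreground pixel is visited once.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            sizes.append((right - left + 1, bottom - top + 1))
    return sizes


def bin_index(value: int, edges: Sequence[int]) -> int:
    """Bin of `value`; `edges` are ascending inclusive upper bounds."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)  # overflow bin for values above the last edge


def size_features(sizes: Sequence[Tuple[int, int]], width_edges: Sequence[int],
                  height_edges: Sequence[int]) -> List[float]:
    """Flattened, normalized n x m histogram of component (width, height) pairs."""
    n, m = len(width_edges) + 1, len(height_edges) + 1
    hist = [0.0] * (n * m)
    for w, h in sizes:  # one increment per component
        hist[bin_index(w, width_edges) * m + bin_index(h, height_edges)] += 1
    total = len(sizes)
    return [v / total for v in hist] if total else hist


@dataclass
class Node:
    """Leaf if `label` is set, otherwise an internal threshold test."""
    label: Optional[str] = None
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken when x[feature] > threshold


def classify(node: Node, x: Sequence[float]) -> str:
    """Walk the binary tree from the root to a leaf and return its label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# --- Usage with synthetic zones (all values below are hypothetical) ---
def blank_image(rows: int, cols: int) -> List[List[int]]:
    return [[0] * cols for _ in range(rows)]


def fill_rect(image, top, left, height, width):
    for y in range(top, top + height):
        for x in range(left, left + width):
            image[y][x] = 1


W_EDGES = H_EDGES = [4, 16]  # 3 x 3 = 9 bins
# Bin 4 holds components with width 5-16 and height 5-16.
TREE = Node(feature=4, threshold=0.5,
            left=Node(label="non-text"), right=Node(label="text"))

text_zone = blank_image(12, 30)
for col in (1, 8, 15, 22):
    fill_rect(text_zone, 2, col, 7, 5)  # four separate 5x7 "characters"
picture_zone = blank_image(24, 40)
fill_rect(picture_zone, 2, 3, 20, 34)  # one large 34x20 blob

for zone in (text_zone, picture_zone):
    print(classify(TREE, size_features(component_sizes(zone), W_EDGES, H_EDGES)))
# prints "text", then "non-text"
```

The labeling pass visits every foreground pixel once. After that, `size_features` does one histogram increment per component, so the encoding step scales with the number of connected components, matching the complexity noted above. A real classifier would learn the splits and the eight zone labels from training data such as the UW-I pages rather than hand-set them.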
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jisheng Liang, Ihsin T. Phillips, Jaekyu Ha, Robert M. Haralick, "Document zone classification using sizes of connected components", Proc. SPIE 2660, Document Recognition III, (7 March 1996); doi: 10.1117/12.234719; https://doi.org/10.1117/12.234719

