7 March 1996 Document zone classification using sizes of connected components
Abstract
In this paper, we describe a feature-based supervised zone classifier that uses only the widths and heights of the connected components within a given zone. The distribution of these widths and heights is encoded into an n × m dimensional feature vector used in the decision making. Thus, the computational complexity is on the order of the number of connected components within the given zone. A binary decision tree assigns a zone class on the basis of its feature vector. The training and testing data sets for the algorithm are drawn from the scientific document pages in the UW-I database. The classifier assigns each scientific and technical document zone, in real time, one of eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling. The classifier discriminates text from non-text with an accuracy greater than 97%.
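The abstract does not say how the width and height distribution is binned. As a rough illustration only, one way to obtain a vector with n × m entries is a normalized two-dimensional histogram with n width bins and m height bins. The sketch below assumes that reading; the function name `size_histogram`, the bin edges, and the sample bounding boxes are all hypothetical and are not taken from the paper. It makes a single pass over the components, which is consistent with the stated cost being on the order of the number of connected components.

```python
from bisect import bisect_right


def size_histogram(boxes, width_edges, height_edges):
    """Return a flattened, normalized 2-D histogram of (width, height) pairs.

    boxes        -- list of (width, height) tuples, one per connected component
    width_edges  -- sorted interior bin edges for widths (gives n = len + 1 bins)
    height_edges -- sorted interior bin edges for heights (gives m = len + 1 bins)
    """
    n = len(width_edges) + 1
    m = len(height_edges) + 1
    counts = [0] * (n * m)
    # One pass over the components: cost is linear in their number.
    for w, h in boxes:
        i = bisect_right(width_edges, w)   # width bin index, 0 .. n-1
        j = bisect_right(height_edges, h)  # height bin index, 0 .. m-1
        counts[i * m + j] += 1
    total = len(boxes)
    if total == 0:
        return [0.0] * (n * m)
    return [c / total for c in counts]


# Hypothetical bin edges in pixels (assumption, not from the paper).
WIDTH_EDGES = [4, 16, 64]
HEIGHT_EDGES = [4, 16, 64]

# Hypothetical zone: three character-sized components and one long thin rule.
zone = [(10, 12), (11, 12), (9, 14), (200, 2)]
features = size_histogram(zone, WIDTH_EDGES, HEIGHT_EDGES)
```

A vector of this kind could then be passed to any decision tree learner trained on labeled zones; the choice of bin edges would likely matter a great deal and would need to be tuned on training data.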
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jisheng Liang, Ihsin T. Phillips, Jaekyu Ha, Robert M. Haralick, "Document zone classification using sizes of connected components", Proc. SPIE 2660, Document Recognition III, (7 March 1996); doi: 10.1117/12.234719; https://doi.org/10.1117/12.234719
PROCEEDINGS
8 PAGES

