24 March 2014 Form classification and retrieval using bag of words with shape features of line structures
Abstract
This paper proposes a method for classifying and retrieving document forms using a Bag of Words model and newly introduced local shape features of form lines. In a preprocessing step, the document is binarized and the form lines (solid and dotted) are detected. The shape features are derived from this line information and describe local line structures, e.g. line endings, crossings, and boxes. The dominant line structures form a vocabulary for each form class. Using this vocabulary, an occurrence histogram of the structures in a form document is computed and used for classification and retrieval. The proposed method was tested on a set of 489 documents covering 9 different form classes.
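The Bag of Words step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that line detection has already reduced each document to a list of local structure labels (the labels "box", "crossing", "line_end" and "corner" are hypothetical), that "dominant" means the most frequent labels per class, and that histograms are compared with Euclidean distance; the abstract specifies none of these choices.

```python
from collections import Counter
import math


def build_vocabulary(docs_by_class, top_k=2):
    """Union of the top_k most frequent structure labels of each class.

    Assumption: 'dominant' is read here as 'most frequent per class'.
    """
    vocab = []
    for cls in sorted(docs_by_class):
        counts = Counter()
        for doc in docs_by_class[cls]:
            counts.update(doc)
        for label, _ in counts.most_common(top_k):
            if label not in vocab:
                vocab.append(label)
    return vocab


def histogram(doc, vocab):
    """Normalized occurrence histogram of vocabulary structures in one document."""
    counts = Counter(doc)
    total = sum(counts[w] for w in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]


def class_histograms(docs_by_class, vocab):
    """Mean histogram per class (a hypothetical choice of class model)."""
    result = {}
    for cls, docs in docs_by_class.items():
        hists = [histogram(d, vocab) for d in docs]
        result[cls] = [sum(col) / len(hists) for col in zip(*hists)]
    return result


def classify(doc, vocab, class_hists):
    """Assign the class whose mean histogram is nearest (Euclidean distance)."""
    h = histogram(doc, vocab)
    return min(class_hists, key=lambda c: math.dist(h, class_hists[c]))


def retrieve(query, corpus, vocab):
    """Return corpus indices ordered from most to least similar to the query."""
    q = histogram(query, vocab)
    return sorted(range(len(corpus)),
                  key=lambda i: math.dist(q, histogram(corpus[i], vocab)))


# Toy data with hypothetical structure labels, standing in for detected line structures.
training = {
    "invoice": [["box", "box", "crossing", "line_end"], ["box", "crossing", "box"]],
    "letter": [["line_end", "line_end", "corner"],
               ["line_end", "corner", "line_end", "line_end"]],
}
vocab = build_vocabulary(training)
class_hists = class_histograms(training, vocab)
predicted = classify(["box", "box", "crossing"], vocab, class_hists)
ranking = retrieve(["line_end", "corner"], training["invoice"] + training["letter"], vocab)
```

Here classification and retrieval share the same histogram representation: classification compares a document with one model per class, while retrieval ranks individual documents against a query form.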
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Florian Kleber, Markus Diem, Robert Sablatnig, "Form classification and retrieval using bag of words with shape features of line structures", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 902107 (24 March 2014); doi: 10.1117/12.2037210; https://doi.org/10.1117/12.2037210
PROCEEDINGS
9 PAGES

