7 January 1999 Learning to identify hundreds of flex-form documents
Abstract
This paper presents an inductive document classifier (IDC) and its application to document identification. The system's most important features are its learning capability, its handling of large volumes of highly variant documents, and its high performance. IDC learns new document types (variants) from examples. To this end, it automatically extracts discriminatory features from images of various document types, generates generalized descriptions, and stores them in the knowledge base. An unknown document is classified by matching its description against all general rules in the knowledge base and selecting the best-matching document types as the final classifications. Both the learning and identification processes are fast and accurate. The speed is due to optimal image processing and feature construction procedures. Identification accuracy is very high even though the discriminatory features are generated solely from page layout information. IDC operates in two separate components of an electronic document management system (EDMS): the Knowledge Base Maintainer (KBM) and the Production Identifier (PI). KBM builds the knowledge base and maintains its integrity. PI uses the learned knowledge during identification.
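The learn-then-match loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how features are represented or how descriptions are generalized, so this example assumes numeric layout features, generalizes each document type to per-feature min/max intervals, and scores a match as the fraction of intervals an unknown document satisfies. All names (`Rule`, `learn_rule`, `match_score`, `identify`) and the feature names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a document is described by numeric page-layout features.
Features = dict[str, float]


@dataclass
class Rule:
    """Generalized description of one document type (hypothetical form)."""
    doc_type: str
    bounds: dict[str, tuple[float, float]]  # feature -> (min, max) seen in examples


def learn_rule(doc_type: str, examples: list[Features]) -> Rule:
    """Generalize examples into one interval per feature shared by all of them."""
    if not examples:
        raise ValueError("at least one example is required")
    shared = set(examples[0])
    for ex in examples[1:]:
        shared &= set(ex)
    bounds = {
        name: (min(ex[name] for ex in examples), max(ex[name] for ex in examples))
        for name in sorted(shared)
    }
    return Rule(doc_type, bounds)


def match_score(rule: Rule, doc: Features) -> float:
    """Fraction of the rule's intervals that the document falls inside."""
    if not rule.bounds:
        return 0.0
    hits = sum(
        1
        for name, (lo, hi) in rule.bounds.items()
        if name in doc and lo <= doc[name] <= hi
    )
    return hits / len(rule.bounds)


def identify(knowledge_base: list[Rule], doc: Features, top_k: int = 1):
    """Match the document against every rule and return the best-scoring types."""
    scored = [(rule.doc_type, match_score(rule, doc)) for rule in knowledge_base]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# "Knowledge Base Maintainer" step: build the knowledge base from examples.
kb = [
    learn_rule("invoice", [
        {"n_lines": 12, "n_text_blocks": 30, "logo_x": 0.10},
        {"n_lines": 14, "n_text_blocks": 34, "logo_x": 0.12},
    ]),
    learn_rule("claim_form", [
        {"n_lines": 40, "n_text_blocks": 80, "logo_x": 0.80},
        {"n_lines": 44, "n_text_blocks": 90, "logo_x": 0.82},
    ]),
]

# "Production Identifier" step: classify an unseen document.
unknown = {"n_lines": 13, "n_text_blocks": 31, "logo_x": 0.11}
best = identify(kb, unknown)
print(best)
```

Splitting the code into a learning function and an identification function mirrors the KBM/PI division in the abstract; a real system would need tolerance for features that fall just outside the observed intervals.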
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Janusz Wnek, "Learning to identify hundreds of flex-form documents", Proc. SPIE 3651, Document Recognition and Retrieval VI, (7 January 1999); doi: 10.1117/12.335815; https://doi.org/10.1117/12.335815
PROCEEDINGS
10 PAGES

