18 December 2001 Document image representation using XML technologies
Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450720
Event: Electronic Imaging, 2002, San Jose, California, United States
Abstract
Electronic documents have gained wide acceptance because they make information easy to edit and share. However, paper documents are still widely used in many environments, and moving toward a paperless, distributed office has become a major goal of document image research. A new approach to form document representation is presented. This approach allows electronic documents to be shared over the World Wide Web (WWW) using Extensible Markup Language (XML) technologies. Each document is mapped into three different views: an XML view that represents the preprinted and filled-in data, an XSL (Extensible Stylesheet Language) view that represents the structure of the document, and a DTD (Document Type Definition) view that represents the document grammar and field constraints. The XML and XSL views are generated from a document template, either automatically using image processing techniques or semi-automatically with minimal user interaction. The DTD representation may be fixed for general documents or may be generated semi-automatically by mining a number of filled-in document examples. A document template needs to be entered only once to create the proposed representation. Afterwards, documents may be displayed, updated, or shared over the web. The merits of this approach are demonstrated using a number of examples of widely used forms.
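The three-view split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the form schema, the field names, and the helpers `read_fields`, `invalid_fields`, and `render_text` are all hypothetical. A dictionary of checks stands in for the DTD view's field constraints, and a plain-text renderer stands in for the XSL view, since the abstract does not give the actual grammar or stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view: preprinted labels plus filled-in values of one form.
FORM_XML = """<form name="registration">
  <field id="name" label="Full name">Ada Lovelace</field>
  <field id="year" label="Year of birth">1815</field>
</form>"""

# Stand-in for the DTD view's field constraints (assumed rules, not from the paper).
CONSTRAINTS = {
    "name": lambda v: len(v.strip()) > 0,
    "year": lambda v: v.isdigit() and len(v) == 4,
}


def read_fields(xml_text):
    """Parse the XML view into {field id: (label, value)}."""
    root = ET.fromstring(xml_text)
    return {f.get("id"): (f.get("label"), f.text or "") for f in root.iter("field")}


def invalid_fields(fields):
    """Return the ids of fields whose values break an assumed constraint."""
    return [
        fid
        for fid, (_, value) in fields.items()
        if fid in CONSTRAINTS and not CONSTRAINTS[fid](value)
    ]


def render_text(fields):
    """Stand-in for the XSL view: lay the fields out as 'label: value' lines."""
    return "\n".join(f"{label}: {value}" for label, value in fields.values())


fields = read_fields(FORM_XML)
print(render_text(fields))
print("invalid fields:", invalid_fields(fields))
```

Keeping the data, the layout, and the constraints in separate pieces mirrors the idea that a template is entered once and the filled-in values can then be updated or displayed independently.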
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Essam A. El-Kwae and Kusuma Harnath Atmakuri "Document image representation using XML technologies", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450720
PROCEEDINGS
12 PAGES


