18 December 2001 Document image representation using XML technologies
Author Affiliations +
Electronic documents have gained wide acceptance due to the ease of editing and sharing of information. However, paper documents are still widely used in many environments. Moving into a paperless and distributed office has become a major goal for document image research. A new approach for form document representation is presented. This approach allows for electronic document sharing over the World Wide Web (WWW) using Extensible Markup Language (XML) technologies. Each document is mapped into three different views, an XML view to represent the preprinted and filled-in data, an XSL (Extensible style Sheets) view to represent the structure of the document, and a DTD (Document Type Definition) view to represent the document grammar and field constraints. The XML and XSL views are generated from a document template, either automatically using image processing techniques, or semi-automatically with minimal user interaction. The DTD representation may be fixed for general documents or may be generated semi-automatically by mining a number of filled-in document examples. Document templates need to be entered once to create the proposed representation. Afterwards, documents may be displayed, updated, or shared over the web. The merits of this approach are demonstrated using a number of examples of widely used forms.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Essam A. El-Kwae, Essam A. El-Kwae, Kusuma Harnath Atmakuri, Kusuma Harnath Atmakuri, } "Document image representation using XML technologies", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); doi: 10.1117/12.450720; https://doi.org/10.1117/12.450720


Back to Top