Use of document structure analysis to retrieve information from documents in digital libraries
3 April 1997
Abstract
This paper describes an approach to retrieving information from document images stored in a digital library by means of knowledge-based layout analysis and logical structure derivation techniques. Queries on document image content are categorized by the type of information sought, and each query is parsed to determine three things: the type of document from which the information should come, the syntactic level of that information, and the level of analysis required to extract it. Using these clauses of the query, a set of salient documents is retrieved; layout analysis and logical structure derivation are then performed on the retrieved documents, which are analyzed in detail to extract the relevant logical components. A 'document browser' application being developed on this approach lets a user specify queries interactively on the documents in the digital library through a graphical user interface, provides feedback about the candidate documents at each stage of the retrieval process, and allows the query to be refined based on the intermediate results of the search. The results of a query are displayed either as an image or as formatted text.
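As a rough illustration of the staged retrieval described in the abstract, the Python sketch below models a parsed query and a pipeline that filters by document type before paying for layout and logical analysis. Everything in it is hypothetical: the names `ParsedQuery`, `DocumentImage`, `AnalysisLevel`, `layout_analysis`, `derive_logical_structure`, and `answer`, the two analysis levels, and the toy documents are assumptions for illustration, not the authors' system. The layout and logical-structure steps are placeholders that read pre-labelled text instead of processing page images.

```python
from dataclasses import dataclass, field
from enum import Enum


class AnalysisLevel(Enum):
    """Hypothetical levels of analysis a parsed query may require."""
    LAYOUT = 1   # physical layout only (which blocks exist on the page)
    LOGICAL = 2  # logical structure (title, abstract, ...) is needed


@dataclass
class ParsedQuery:
    doc_type: str                  # hypothetical label, e.g. "journal_article"
    syntactic_level: str           # e.g. "block"; carried but unused in this toy
    analysis_level: AnalysisLevel
    target: str                    # logical component wanted, e.g. "title"


@dataclass
class DocumentImage:
    doc_id: str
    doc_type: str
    # Stand-in for a page image: block label -> text, assumed pre-labelled.
    blocks: dict = field(default_factory=dict)


def layout_analysis(doc):
    """Placeholder: a real system would segment the page image into blocks."""
    return list(doc.blocks.items())


def derive_logical_structure(blocks):
    """Placeholder: map each segmented block to a logical label (identity here)."""
    return {label: text for label, text in blocks}


def answer(query, library):
    """Return doc_id -> result for documents that satisfy the query."""
    # Stage 1: cheap document-type filter selects the salient candidates.
    candidates = [d for d in library if d.doc_type == query.doc_type]
    results = {}
    for doc in candidates:
        # Stage 2: only candidates undergo (placeholder) layout analysis.
        blocks = layout_analysis(doc)
        if query.analysis_level == AnalysisLevel.LAYOUT:
            # Layout-level query: report which physical blocks were found.
            results[doc.doc_id] = [label for label, _ in blocks]
            continue
        # Stage 3: derive logical structure, then extract the target component.
        logical = derive_logical_structure(blocks)
        if query.target in logical:
            results[doc.doc_id] = logical[query.target]
    return results


# Hypothetical toy library and query, for illustration only.
library = [
    DocumentImage("d1", "journal_article", {"title": "A Study", "abstract": "..."}),
    DocumentImage("d2", "letter", {"salutation": "Dear Sir"}),
]
q = ParsedQuery("journal_article", "block", AnalysisLevel.LOGICAL, "title")
print(answer(q, library))  # {'d1': 'A Study'}
```

The design point in this sketch is ordering: the inexpensive document-type filter shrinks the candidate set first, so the costly per-document analysis runs only on documents that could answer the query, loosely mirroring the stage-by-stage narrowing that the browser reports back to the user.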
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Debashish Niyogi, Sargur N. Srihari, "Use of document structure analysis to retrieve information from documents in digital libraries", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270074; https://doi.org/10.1117/12.270074
Proceedings paper, 12 pages.