24 March 2014 Semi-automated document image clustering and retrieval
Abstract
In this paper, a semi-automated document image clustering and retrieval method is presented that creates links between different documents based on their content. Ideally, the initial bundling of shuffled document images can be reproduced in order to explore large document databases. Structural and textural features, which describe visual similarity, are extracted and used by experts (e.g. registrars) to interactively cluster the documents with a manually defined feature subset (e.g. checked paper, handwritten). The methods presented allow for the analysis of heterogeneous documents that contain printed and handwritten text, and for hierarchical clustering with different feature subsets in different layers.
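The idea of hierarchical clustering with a different feature subset in each layer can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names and scores, the single-linkage grouping rule, the distance thresholds, and the function names `cluster_by_subset` and `layered_clustering` are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def cluster_by_subset(docs, features, threshold):
    """Group documents whose distance on the chosen features is <= threshold.

    Single-linkage grouping via union-find; an assumed stand-in for
    whatever clustering the expert would drive interactively.
    """
    ids = list(docs)
    parent = {d: d for d in ids}

    def find(d):
        # Follow parent links to the cluster representative.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in combinations(ids, 2):
        va = [docs[a][f] for f in features]
        vb = [docs[b][f] for f in features]
        if dist(va, vb) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in ids:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())


def layered_clustering(docs, layers):
    """Refine clusters layer by layer, each layer using its own feature subset."""
    clusters = [sorted(docs)]
    for features, threshold in layers:
        refined = []
        for cluster in clusters:
            subset = {d: docs[d] for d in cluster}
            refined.extend(cluster_by_subset(subset, features, threshold))
        clusters = refined
    return sorted(clusters)


# Hypothetical feature scores in [0, 1] for four document images.
docs = {
    "a": {"checked_paper": 1.0, "handwritten": 0.9},
    "b": {"checked_paper": 0.9, "handwritten": 0.1},
    "c": {"checked_paper": 0.1, "handwritten": 0.8},
    "d": {"checked_paper": 0.0, "handwritten": 0.9},
}

# Layer 1 splits by paper type, layer 2 refines each group by writing type.
layers = [(["checked_paper"], 0.3), (["handwritten"], 0.3)]
result = layered_clustering(docs, layers)
print(result)
```

In this sketch the expert's manual choice is modelled by the `layers` list: changing the feature subset or threshold of a layer changes how the documents are bundled without recomputing the features themselves.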
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Markus Diem, Florian Kleber, Stefan Fiel, Robert Sablatnig, "Semi-automated document image clustering and retrieval", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210M (24 March 2014); doi: 10.1117/12.2043010; https://doi.org/10.1117/12.2043010
PROCEEDINGS
10 PAGES


