21 February 2012 Similarity pyramid: browsing a document database with respect to visual similarity
Managing large document databases has become an important task. Sorting documents by visual similarity and layout features, and visualizing the whole document database, are desirable applications. A user may wish to search a database for documents that are similar to a query in terms of their stylistic features, or may want to browse the whole database. In these tasks, clustering similar documents and organizing the document database with respect to the clusters is preferable to presenting documents in a random order. In this paper, we propose organizing single-page documents in a 3-D hierarchical structure called a similarity pyramid. The pyramid is constructed from a stack of document database embeddings on a 2-D surface with the help of a nonlinear dimensionality reduction algorithm called Isomap. The mapping algorithm preserves similarity distances between documents by mapping documents that are close to each other in a feature space to points that are close to each other on a low-dimensional surface. Higher levels of the pyramid consist of document image icons that each represent a large group of roughly similar documents, whereas lower levels contain document image icons representing small groups of very similar documents. A user can browse the database by moving along a certain level of the pyramid or by moving between different levels.
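The embedding step described in the abstract can be illustrated with a minimal Isomap sketch in plain Python. This is not the authors' implementation: the function names (`geodesic_distances`, `classical_mds`, `isomap_embed`), the k-nearest-neighbor graph, and the use of power iteration for the eigenvectors are all assumptions made for illustration, and the document feature vectors are taken as given. It shows only how points that are close in feature space end up close in a low-dimensional embedding; building the pyramid levels is not covered.

```python
import math
import random


def geodesic_distances(points, n_neighbors):
    """Shortest-path distances over a symmetric k-nearest-neighbor graph."""
    n = len(points)
    inf = float("inf")
    dist = [[math.dist(p, q) for q in points] for p in points]
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in order[:n_neighbors]:
            graph[i][j] = graph[j][i] = dist[i][j]
    # Floyd-Warshall: relax every pair through every intermediate node k.
    for k in range(n):
        row_k = graph[k]
        for i in range(n):
            d_ik = graph[i][k]
            if d_ik == inf:
                continue
            row_i = graph[i]
            for j in range(n):
                alt = d_ik + row_k[j]
                if alt < row_i[j]:
                    row_i[j] = alt
    if any(d == inf for row in graph for d in row):
        raise ValueError("neighbor graph is disconnected; increase n_neighbors")
    return graph


def classical_mds(geo, n_components, iterations=500, seed=0):
    """Classical MDS on a distance matrix, using power iteration with deflation."""
    n = len(geo)
    sq = [[d * d for d in row] for row in geo]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D is symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(eigval) if eigval > 0.0 else 0.0
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next pass finds the next dominant eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


def isomap_embed(points, n_neighbors=5, n_components=2):
    """Embed feature vectors so that graph (geodesic) distances are preserved."""
    return classical_mds(geodesic_distances(points, n_neighbors), n_components)
```

Calling `isomap_embed(features, n_neighbors=5, n_components=2)` on a list of equal-length feature vectors returns one 2-D coordinate per document. The cubic-time shortest-path step makes this sketch suitable only for small collections; a library implementation would be the practical choice for a real database.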
© (2012) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ildus Ahmadullin, Jan Allebach, "Similarity pyramid: browsing a document database with respect to visual similarity", Proc. SPIE 8302, Imaging and Printing in a Web 2.0 World III, 83020M (21 February 2012); doi: 10.1117/12.915679; https://doi.org/10.1117/12.915679
