16 January 2006 Document clustering: applications in a collaborative digital library
Abstract
This paper introduces a document clustering method for FileShare(R), a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a standard web browser (e.g. Microsoft(R) Internet Explorer(R), Netscape(R) or Opera(R)). As the number of documents in a digital library grows, presenting them to users in a browsable form becomes a major challenge. The proposed method uses a modified version of the traditional K-Means algorithm, together with lexical chaining, to group documents in the FileShare(R) repository by theme. The algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
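The abstract does not describe how K-Means was modified or how lexical chains are built, so neither is reproduced here. As a rough illustration of the baseline only, the following sketches plain K-Means over bag-of-words term counts with cosine similarity. The function names (`tf_vector`, `cosine`, `mean_vector`, `kmeans`), the choice of seed documents, and the sample texts are all assumptions made for this sketch, not the authors' method.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (an assumed stand-in for lexical-chain features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(docs, seed_indices, max_iter=20):
    """Plain K-Means: assign each document to its most similar centroid,
    recompute centroids, and stop when the assignments no longer change."""
    vectors = [tf_vector(d) for d in docs]
    # Hypothetical initialisation: the caller picks which documents seed the clusters.
    centroids = [vectors[i] for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_vector(members)
    return labels


# Made-up sample documents with two obvious themes.
docs = [
    "budget forecast revenue quarter",
    "revenue budget quarter report",
    "server network latency outage",
    "network outage server restart",
]
labels = kmeans(docs, seed_indices=[0, 2])
```

With these sample texts the two finance documents share one label and the two infrastructure documents share the other. A real repository would need stop-word removal, term weighting, and a principled way to choose the number of clusters, none of which the abstract specifies.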
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, Hassan Alam, "Document clustering: applications in a collaborative digital library", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); doi: 10.1117/12.650161; https://doi.org/10.1117/12.650161
PROCEEDINGS
8 PAGES

