16 January 2006 Document clustering: applications in a collaborative digital library
Abstract
This paper introduces a document clustering method for FileShare(R), a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a standard web browser (e.g. Microsoft(R) Internet Explorer(R), Netscape(R) or Opera(R)). As the number of documents in a digital library grows, displaying them in this environment becomes a major challenge. The proposed method uses a modified version of the traditional K-Means algorithm to categorize the documents in the FileShare(R) repository by theme, using lexical chaining. The algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
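The abstract names K-Means but does not describe the authors' modification or the lexical chaining step. For orientation only, the sketch below shows the standard, unmodified K-Means loop applied to simple bag-of-words vectors with cosine similarity. It is not the paper's algorithm: the function names, the whitespace tokenizer, and the choice of the first k documents as initial centroids are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts scaled to unit length (illustrative tokenizer)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def similarity(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def mean_vector(vectors):
    """Centroid direction of a non-empty group of vectors, scaled to unit length."""
    total = Counter()
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {term: w / norm for term, w in total.items()}


def kmeans(docs, k, max_iter=20):
    """Standard K-Means: returns one cluster label per document."""
    vectors = [vectorize(d) for d in docs]
    # Assumption: seed with the first k documents so the result is deterministic.
    centroids = vectors[:k]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: similarity(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_vector(members)
    return labels


docs = [
    "budget forecast revenue budget",
    "server network outage server",
    "revenue budget quarterly forecast",
    "network server latency outage",
]
print(kmeans(docs, k=2))  # [0, 1, 0, 1]
```

In this toy input the two finance documents and the two infrastructure documents share no terms, so they separate after one pass. A theme-based method such as the one the abstract describes would presumably replace the raw term counts with features derived from lexical chains, which this sketch does not attempt.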
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, Hassan Alam, "Document clustering: applications in a collaborative digital library", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); doi: 10.1117/12.650161; https://doi.org/10.1117/12.650161
PROCEEDINGS
8 PAGES

