28 January 2008 Understanding the practical limits of the Gnutella P2P system: an analysis of query terms and object name distributions
Abstract
A number of prior efforts analyzed the behavior of popular peer-to-peer (P2P) systems and proposed ways of maintaining the overlays as well as methods for searching for content using these overlays. However, little was known about how successful users could be in locating the shared objects in these systems. There might be a mismatch between the way content creators named objects and the way consumers queried for them. Our aim was to examine the terms used in the queries and shared object names in the Gnutella file-sharing system. We analyzed the names of over 20 million objects collected from 40,000 peers as well as terms from over 230,000 queries. We observed that almost half (44.4%) of the queries had no matching objects in the system, regardless of the overlay or search mechanism used to locate the objects. We also evaluated the query success rates against random peer groups of various sizes (200, 1K, 2K, 3K, 4K, 5K, 10K and 20K peers sampled from the full 40,000 peers). We showed that the success rates increased rapidly from 200 to 5,000 peers, but exhibited only modest improvements when the number of peers increased beyond 5,000. Finally, we observed Zipf-like distributions for both the query terms and the object names. However, the relative popularity of a term in the object names did not correlate with that term's popularity in the query workload. This observation affected the ability of hybrid P2P systems to guide searches by creating a synopsis of the peer object names: a synopsis created from the distribution of terms in the object names need not represent the terms that are relevant to the queries. Our results can be used to guide the design of future P2P systems that are optimized for the observed object names and user query behavior.
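The success-rate measurement against random peer groups can be illustrated with a minimal Python sketch. It is not the authors' tooling: the tokenizer, the rule that a query matches an object only when every query term appears in the object name, and the names `tokenize`, `query_matches`, and `success_rate` are all assumptions made for illustration, and the peers and queries are toy data rather than the collected trace.

```python
import random
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def query_matches(query, object_name):
    """Assumed matching rule: every query term must occur in the object name."""
    terms = tokenize(query)
    return bool(terms) and terms <= tokenize(object_name)


def success_rate(queries, peers, group_size, rng):
    """Fraction of queries with at least one match in a random peer group.

    `peers` maps a peer id to the list of object names that peer shares.
    """
    group = rng.sample(sorted(peers), group_size)
    name_terms = [tokenize(name) for peer in group for name in peers[peer]]
    hits = 0
    for query in queries:
        terms = tokenize(query)
        if terms and any(terms <= candidate for candidate in name_terms):
            hits += 1
    return hits / len(queries) if queries else 0.0


# Toy data (hypothetical, for illustration only).
peers = {
    "p1": ["Madonna - Holiday.mp3", "linux kernel 2.6.tar.gz"],
    "p2": ["madonna_like_a_prayer.mp3"],
    "p3": ["holiday photos 2007.zip"],
}
queries = ["madonna holiday", "linux", "beatles yesterday", "photos"]

rng = random.Random(0)
for size in (1, 2, 3):
    print(size, success_rate(queries, peers, size, rng))
```

With all three toy peers in the group, three of the four queries find a match, so the rate is 0.75; smaller groups depend on which peers are sampled. Repeating the sampling for each group size and averaging would smooth out that variation.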
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
William Acosta and Surendar Chandra, "Understanding the practical limits of the Gnutella P2P system: an analysis of query terms and object name distributions", Proc. SPIE 6818, Multimedia Computing and Networking 2008, 681807 (28 January 2008); doi: 10.1117/12.775128; https://doi.org/10.1117/12.775128
PROCEEDINGS
12 PAGES

