Although the advantages of collecting web objects at the site where they reside seem obvious, most popular search engines use their own centralized or coarsely distributed collectors (robots). Sending each object to the collector is almost always the worst option with respect to resource usage. An alternative is to distribute the collection process by sending the collector to the source site. This spreads the significant computational load involved in cataloguing across sites and creates opportunities for summarization and compression. In this paper, we propose a system for distributed object cataloguing over the World Wide Web via lightweight collector agents. This approach differs from previous approaches such as Harvest in using small Java-based collectors that can be deployed very easily on the site being indexed, thus allowing much finer-grained distribution of the collection task.
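As a rough illustration of the idea of cataloguing at the source, the following minimal Java sketch summarizes each object locally and ships only a compressed catalogue instead of the objects themselves. It is not the system proposed in the paper: the class name `LocalCollector`, the tab-separated catalogue format, the "top terms" summary, and the in-memory map standing in for a site's objects are all assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPOutputStream;

/** Hypothetical site-side collector: summarizes objects locally, then compresses the catalogue. */
final class LocalCollector {

    /** Returns the k most frequent words longer than 3 characters; ties are broken alphabetically. */
    static List<String> topTerms(String text, int k) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.length() > 3) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Stable sort by descending count; the TreeMap order keeps ties alphabetical.
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            terms.add(entries.get(i).getKey());
        }
        return terms;
    }

    /** Builds one catalogue line per object: path, size in bytes, top terms (assumed format). */
    static String catalogue(Map<String, String> objects, int k) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : objects.entrySet()) {
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            sb.append(e.getKey()).append('\t')
              .append(size).append('\t')
              .append(String.join(",", topTerms(e.getValue(), k)))
              .append('\n');
        }
        return sb.toString();
    }

    /** Gzip-compresses the catalogue so that only a compact summary leaves the site. */
    static byte[] gzip(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // In-memory stand-in for the objects hosted on the site being indexed.
        Map<String, String> site = new LinkedHashMap<>();
        site.put("/index.html", "Welcome to the archive. The archive stores multimedia objects.");
        site.put("/about.html", "Collectors summarize objects locally before sending a catalogue.");

        String cat = catalogue(site, 3);
        int rawBytes = 0;
        for (String body : site.values()) {
            rawBytes += body.getBytes(StandardCharsets.UTF_8).length;
        }
        System.out.print(cat);
        System.out.println("raw object bytes: " + rawBytes);
        System.out.println("compressed catalogue bytes: " + gzip(cat).length);
    }
}
```

For inputs this small, the gzip header can outweigh any saving; the benefit suggested by the abstract would come from large objects whose summaries are far smaller than their content.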
Jesse S. Jin, John A. Shepherd, "Web object collection: here or there?", Proc. SPIE 3527, Multimedia Storage and Archiving Systems III, (5 October 1998); doi: 10.1117/12.325835; https://doi.org/10.1117/12.325835