23 December 1999 Influence of data set splitting method on similarity indexing performance
Similarity indexing is the supporting technology for fast content-based retrieval from large media databases, and many similarity index structures have been proposed. Compared with the number of structures proposed, less attention has been paid to performance evaluation of index structures and to theoretical analysis of the factors influencing index performance. In this paper, we address part of this problem and focus on analyzing the influence of data set splitting methods. To give a formal definition of index structure performance evaluation, we introduce the concept of query distribution probability and propose using average search cost to evaluate the performance of a similarity indexing structure. We take the simplest case of similarity indexing, nearest-neighbor search, and derive an expression for the average search cost function. Based on an analysis of this expression, we propose criteria that may be useful in index design and implementation. We then extend these conclusions to the general similarity indexing case and use the criteria as general rules for index design and implementation. The underlying ideas and analysis are detailed, along with experimental results.
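The idea of an average search cost taken over a query distribution can be illustrated with a minimal Python sketch. This is not the expression derived in the paper: the one-dimensional setting, the cost model (number of data points examined by an exact nearest-neighbor search), and the names `split_median`, `split_midpoint`, `search_cost`, and `average_search_cost` are all assumptions made for illustration. The average is estimated empirically over a sample of queries drawn from an assumed query distribution.

```python
import random


def split_median(points, leaf_size):
    """Hypothetical splitting method: cut at the median so both halves hold equal counts."""
    pts = sorted(points)
    if len(pts) <= leaf_size:
        return [pts]
    mid = len(pts) // 2
    return split_median(pts[:mid], leaf_size) + split_median(pts[mid:], leaf_size)


def split_midpoint(points, leaf_size):
    """Hypothetical splitting method: cut at the spatial midpoint of the covered range."""
    pts = sorted(points)
    if len(pts) <= leaf_size or pts[0] == pts[-1]:
        return [pts]
    cut = (pts[0] + pts[-1]) / 2
    left = [p for p in pts if p <= cut]
    right = [p for p in pts if p > cut]
    if not right:  # guard against a degenerate cut caused by float rounding
        return [pts]
    return split_midpoint(left, leaf_size) + split_midpoint(right, leaf_size)


def search_cost(buckets, q):
    """Assumed cost model: points examined by an exact nearest-neighbor search.

    A bucket must be read if its bounding interval intersects [q - r, q + r],
    where r is the true nearest-neighbor distance of the query q.
    """
    r = min(abs(p - q) for b in buckets for p in b)
    return sum(len(b) for b in buckets if b[0] - r <= q <= b[-1] + r)


def average_search_cost(buckets, queries):
    """Empirical average of the search cost over a sample of queries."""
    return sum(search_cost(buckets, q) for q in queries) / len(queries)


if __name__ == "__main__":
    random.seed(0)
    # Assumed data: two Gaussian clusters; queries follow the same distribution.
    def draw(n):
        return [random.gauss(random.choice([0.0, 10.0]), 1.0) for _ in range(n)]

    data, queries = draw(512), draw(200)
    for name, split in [("median", split_median), ("midpoint", split_midpoint)]:
        buckets = split(data, leaf_size=16)
        print(name, len(buckets), average_search_cost(buckets, queries))
```

Changing how the queries are drawn (for example, uniformly over the data range instead of from the data distribution) changes the average, which is why the evaluation has to be defined relative to a query distribution.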
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xuesheng Bai, Guang-you Xu, Yuanchun Shi, Shi-Qiang Yang, "Influence of data set splitting method on similarity indexing performance", Proc. SPIE 3972, Storage and Retrieval for Media Databases 2000, (23 December 1999); doi: 10.1117/12.373594; https://doi.org/10.1117/12.373594
