Constructing a benchmark for content-based image retrieval (CBIR) applications is an important task because researchers in this area depend heavily on experiments to compare different systems. Image collection, concept annotation, and performance evaluation are the three main issues that must be considered carefully. Based on our previous work and experiments on both the Corel image collection and the TRECVID dataset, we present some basic principles for constructing a CBIR benchmark. Drawing on our experience in the collaborative annotation of TRECVID 2005 data, we propose a hierarchical concept annotation strategy to produce ground truth for the benchmark image collection. To resolve conflicts among annotations from multiple annotators, we present a fuzzy annotation method in which a membership function indicates the probability that an image contains a given concept. We also present evaluation criteria that correspond to the fuzzy annotation method, so that the performance of different CBIR applications can be evaluated more reasonably.
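To make the fuzzy annotation idea concrete, the following is a minimal sketch and not the method defined in this paper. It assumes that the membership of an image in a concept is simply the fraction of annotators who marked the concept as present, and that the matching evaluation criteria replace binary relevance with that membership value when computing precision and recall. The function names (`membership`, `fuzzy_precision`, `fuzzy_recall`) and both formulas are illustrative assumptions.

```python
from typing import Dict, Sequence


def membership(votes: Sequence[bool]) -> float:
    """Assumed membership function: share of annotators who marked the concept present."""
    if not votes:
        raise ValueError("at least one annotation is required")
    return sum(votes) / len(votes)


def fuzzy_precision(ranked: Sequence[str], mu: Dict[str, float], k: int) -> float:
    """Assumed fuzzy precision: mean membership over the top-k retrieved images."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(mu.get(image, 0.0) for image in top) / len(top)


def fuzzy_recall(ranked: Sequence[str], mu: Dict[str, float], k: int) -> float:
    """Assumed fuzzy recall: membership retrieved in the top k over total membership."""
    total = sum(mu.values())
    if total == 0:
        return 0.0
    return sum(mu.get(image, 0.0) for image in ranked[:k]) / total


# Hypothetical votes from four annotators for a single concept.
votes = {
    "img1": [True, True, True, True],
    "img2": [True, True, False, False],
    "img3": [False, False, False, False],
    "img4": [True, False, False, False],
}
mu = {image: membership(v) for image, v in votes.items()}

# Hypothetical ranked output of a CBIR system for the same concept.
ranked = ["img1", "img3", "img2", "img4"]
print(fuzzy_precision(ranked, mu, k=2), fuzzy_recall(ranked, mu, k=2))
```

Under these assumptions, an image on which annotators disagree contributes partial credit instead of being forced to fully relevant or fully irrelevant, which is the kind of conflict the fuzzy annotation method is meant to handle.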