We assume that the goal of content based image retrieval is to find images which are both semantically and visually relevant to users based on image descriptors. These descriptors are often provided by an example image--the query by example paradigm. In this work we develop a very simple method for evaluating such systems based on large collections of images with associated text. Examples of such collections include the Corel image collection, annotated museum collections, news photos with captions, and web images with associated text based on heuristic reasoning on the structure of typical web pages (such as used by Google(tm)). The advantage of using such data is that it is plentiful, and the method we propose can be automatically applied to hundreds of thousands of queries. However, it is critical that such a method be verified against human usage, and to do this we evaluate over 6000 query/result pairs. Our results strongly suggest that at least in the case of the Corel image collection, the automated measure is a good proxy for human evaluation. Importantly, our human evaluation data can be reused for the evaluation of any content based image retrieval system and/or the verification of additional proxy measures.