While methods such as Principal Component Analysis, Linear/Fisher Discriminant Analysis, and Hidden Markov Models provide useful similarity measures between face images, they are not based on the factors that humans use to perceive facial similarity. This can make it difficult for humans to work collaboratively with face retrieval systems. For example, if a witness to a crime uses a query-by-example paradigm to retrieve the face of the perpetrator from a database of mug shots, and the similarity measures used for retrieval are not based on facial features that are salient or important to humans, the retrieved faces will likely be of limited value. Based on the observation that humans tend to name things that are particularly salient or important to them, this research uses words (such as bearded, bespectacled, big-eared, blond, buck-toothed, bug-eyed, curly-haired, dimpled, freckled, gap-toothed, long-faced, snub-nosed, thin-lipped, or wrinkled) to manually index face images. Pairwise similarity values are then derived from the resulting feature vectors and compared to ground-truth similarity values, which were established by having humans hierarchically sort the same set of face images. This comparison indicates which words are most important for indexing the face images, allows a weighting factor to be computed for each word to improve the overall quality of indexing, and suggests which facial features might provide a more intuitive basis for evaluating similarity.
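
The pipeline described above (word-based indexing, pairwise similarity, comparison with ground truth, per-word weights) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the research itself: the face names, the binary index, and the ground-truth values are invented toy data; similarity is computed as a weighted fraction of words on which two faces agree; and each word's weight is taken to be its Pearson correlation with the ground truth, clipped at zero. The actual similarity measure and weighting scheme may differ.

```python
from math import sqrt

# Hypothetical vocabulary and manual index (1 = word applies to the face).
WORDS = ["bearded", "bespectacled", "freckled"]
INDEX = {
    "face_a": [1, 1, 0],
    "face_b": [1, 0, 0],
    "face_c": [0, 0, 1],
    "face_d": [0, 1, 1],
}

# Hypothetical ground-truth similarities in [0, 1], standing in for values
# derived from human sorting of the same faces.
GROUND_TRUTH = {
    ("face_a", "face_b"): 0.9,
    ("face_a", "face_c"): 0.1,
    ("face_a", "face_d"): 0.2,
    ("face_b", "face_c"): 0.2,
    ("face_b", "face_d"): 0.1,
    ("face_c", "face_d"): 0.8,
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def agreement(u, v):
    """Per-word agreement: 1.0 where both faces share the same value."""
    return [1.0 if a == b else 0.0 for a, b in zip(u, v)]


def weighted_similarity(u, v, weights):
    """Weighted fraction of words on which the two faces agree."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * m for w, m in zip(weights, agreement(u, v))) / total


def word_weights(index, ground_truth):
    """Assumed weighting: correlation of each word's agreement with the
    ground truth across all face pairs, with negative values clipped to 0."""
    pairs = sorted(ground_truth)
    truth = [ground_truth[p] for p in pairs]
    weights = []
    for k in range(len(WORDS)):
        agree_k = [agreement(index[a], index[b])[k] for a, b in pairs]
        weights.append(max(0.0, pearson(agree_k, truth)))
    return weights


if __name__ == "__main__":
    weights = word_weights(INDEX, GROUND_TRUTH)
    for word, weight in zip(WORDS, weights):
        print(f"{word}: {weight:.3f}")
```

In this toy data, agreement on "bespectacled" does not track the ground truth, so its weight is clipped to zero, while "bearded" and "freckled" receive equal positive weights. Re-scoring pairs with `weighted_similarity` and the learned weights then moves the computed similarities closer to the hypothetical human judgments.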