Image evaluation tasks are often conducted using paired comparisons or ranking. To elicit interval scales, both methods rely on Thurstone's Law of Comparative Judgment in which objects closer in psychological space are more often confused in preference comparisons by a putative discriminal random process. It is often debated whether paired comparisons and ranking yield the same interval scales. An experiment was conducted to assess scale production using paired comparisons and ranking. For this experiment a Pioneer Plasma Display and Apple Cinema Display were used for stimulus presentation. Observers performed rank order and paired comparisons tasks on both displays. For each of five scenes, six images were created by manipulating attributes such as lightness, chroma, and hue using six different settings. The intention was to simulate the variability from a set of digital cameras or scanners. Nineteen subjects, (5 females, 14 males) ranging from 19-51 years of age participated in this experiment. Using a paired comparison model and a ranking model, scales were estimated for each display and image combination yielding ten scale pairs, ostensibly measuring the same psychological scale. The Bradley-Terry model was used for the paired comparisons data and the Bradley-Terry-Mallows model was used for the ranking data. Each model was fit using maximum likelihood estimation and assessed using likelihood ratio tests. Approximate 95% confidence intervals were also constructed using likelihood ratios. Model fits for paired comparisons were satisfactory for all scales except those from two image/display pairs; the ranking model fit uniformly well on all data sets. Arguing from overlapping confidence intervals, we conclude that paired comparisons and ranking produce no conflicting decisions regarding ultimate ordering of treatment preferences, but paired comparisons yield greater precision at the expense of lack-of-fit.