Evaluation of image quality is important for many image processing systems, such as those used for acquisition, compression, restoration, enhancement, or reproduction. Quality is often measured through user studies, in which a group of observers ranks or rates the results of several algorithms. Such user studies, known as subjective image quality assessment experiments, can be very time consuming and do not guarantee conclusive results. This paper aims to help design efficient and rigorous quality assessment experiments. We propose a method for limiting the number of scenes that need to be tested, which can significantly reduce the experimental effort while still capturing relevant scene-dependent effects. To this end, we employ a clustering technique and evaluate it using compactness and separation criteria. The correlation between the results obtained from the set of images in an initial database and the results of the reduced experiment is analyzed. Finally, we propose a procedure for reducing the number of initial scenes. Four different assessment techniques were tested: single stimulus, double stimulus, forced choice, and similarity judgments. We conclude that in most cases, 9 to 12 judgments per evaluated algorithm for a large scene collection are sufficient to reduce the initial set of images.
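The compactness and separation criteria mentioned above can be illustrated with a minimal sketch. The data, cluster assignment, and distance measures below are hypothetical placeholders, not the paper's actual protocol: each scene is represented by a vector of per-algorithm quality scores, compactness is taken as the mean distance of scenes to their cluster centroid, and separation as the minimum distance between centroids.

```python
import numpy as np

def compactness(points, labels):
    """Mean distance from each point to its cluster centroid (lower = tighter clusters)."""
    total, n = 0.0, 0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
        n += len(members)
    return total / n

def separation(points, labels):
    """Minimum pairwise distance between cluster centroids (higher = better separated)."""
    centroids = [points[labels == k].mean(axis=0) for k in np.unique(labels)]
    return min(np.linalg.norm(a - b)
               for i, a in enumerate(centroids)
               for b in centroids[i + 1:])

# Toy data: each row is a scene described by hypothetical quality scores
# for four algorithms; two synthetic groups of scenes are generated.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(2.0, 0.2, (10, 4)),   # one group of scenes
                    rng.normal(4.0, 0.2, (10, 4))])  # a second, distinct group
labels = np.repeat([0, 1], 10)

print(f"compactness: {compactness(scores, labels):.2f}")
print(f"separation:  {separation(scores, labels):.2f}")
```

A clustering that scores low on compactness and high on separation groups scenes with similar algorithm behavior, so a reduced experiment can keep one representative scene per cluster.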