Content-based image classification is a wide research field addressing the problem of categorizing images according to their content. A common way to approach content-based classification is through learning from examples, where a given class of images is described by means of a suitable training set. The main drawback of this approach is that collecting data to build homogeneous training and validation sets is a tedious and time-consuming task, even though the Web can help by providing a potentially inexhaustible source of images. In this paper we present a system to automatically download images from the Web, together with a selection of techniques for pruning the downloaded images according to a set of criteria. These techniques work as filters of varying complexity: some are simple measurements, others are image classifiers themselves. We focus on two critical ones (monochrome vs. color images and photos vs. graphics), showing their effectiveness on a manually labeled validation set. We conclude the paper by analyzing the overall performance of the system with an a posteriori analysis of the results obtained in a few runs.