In this paper, we propose a framework for image classification. An image is represented by multiple feature channels, each computed with the bag-of-words model and organized in a spatial pyramid. The feature channels differ mainly in the type of base descriptor extracted for the bag-of-words model, so that together they cover different trade-offs between discriminative power and invariance. Support vector machines with kernels based on the histogram intersection distance and the χ² distance are used to obtain a posteriori probabilities for the image in each feature channel. Four data fusion strategies are then proposed to combine these intermediate results across feature channels. Experimental results show that almost all of the proposed strategies significantly improve classification accuracy over single-cue methods, and that the prod-max strategy performs best in all experiments. Because of its multiple-feature-channel representation, the framework appears to be general and capable of handling diverse classification problems. On challenging benchmark datasets, the proposed method also achieves higher or comparable classification accuracy at lower computational cost than other multiple-cue methods.
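
To make the pipeline above concrete, the following is a minimal sketch of its two generic ingredients: histogram-based kernels for per-channel SVMs, and late fusion of per-channel posteriors. It rests on several assumptions. The χ² kernel is written in one common exponential form, which may differ from the variant used in the paper. The `sum`, `product`, and `max` rules are standard textbook combination rules chosen for illustration; the abstract does not define the four proposed strategies (including prod-max), so they are not reproduced here. All function names and the `gamma` and `eps` parameters are hypothetical.

```python
import numpy as np


def histogram_intersection_kernel(X, Y):
    """K[i, j] = sum_k min(X[i, k], Y[j, k]) for non-negative histograms."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)


def chi2_kernel(X, Y, gamma=1.0, eps=1e-12):
    """K[i, j] = exp(-gamma * sum_k (x_k - y_k)^2 / (x_k + y_k)).

    One common convention (an assumption here); eps guards against 0/0
    when a bin is empty in both histograms.
    """
    diff = X[:, None, :] - Y[None, :, :]
    total = X[:, None, :] + Y[None, :, :]
    dist = (diff ** 2 / (total + eps)).sum(axis=2)
    return np.exp(-gamma * dist)


def fuse(probs, rule):
    """Combine per-channel posteriors and return the predicted class index.

    probs has shape (n_channels, n_classes). The rules below are generic
    illustrations, not the four strategies proposed in the paper.
    """
    probs = np.asarray(probs, dtype=float)
    if rule == "sum":
        scores = probs.sum(axis=0)
    elif rule == "product":
        scores = probs.prod(axis=0)
    elif rule == "max":
        scores = probs.max(axis=0)
    else:
        raise ValueError(f"unknown fusion rule: {rule!r}")
    return int(np.argmax(scores))


# Toy data: two L1-normalised histograms and posteriors from three channels.
histograms = np.array([[0.5, 0.5, 0.0],
                       [0.2, 0.3, 0.5]])
channel_posteriors = np.array([[0.7, 0.2, 0.1],
                               [0.1, 0.6, 0.3],
                               [0.3, 0.5, 0.2]])

K_hi = histogram_intersection_kernel(histograms, histograms)
K_chi2 = chi2_kernel(histograms, histograms)
predictions = {rule: fuse(channel_posteriors, rule)
               for rule in ("sum", "product", "max")}
```

Either kernel matrix could be passed to an SVM implementation that accepts precomputed kernels, with one classifier trained per feature channel. On the toy posteriors, the `max` rule selects a different class than `sum` and `product`, which illustrates why the choice of fusion strategy can matter.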