Emotion categorization of natural scene images is a useful task for automatic image analysis systems. Psychological experiments have shown that visual information at the emotion level is aggregated according to a set of rules. Hence, we attempt to discover emotion descriptors based on a composition-of-visual-words representation. First, this representation models each image as a matrix whose elements record the correlations between pairs of visual words, so that an image collection is modeled as a third-order tensor. We then discover the emotion descriptors by applying a novel affective probabilistic latent semantic analysis (affective-pLSA) model, an extension of the pLSA model, to this tensor representation. Because a natural scene image may evoke multiple emotional feelings, emotion categorization is carried out using a multilabel k-nearest-neighbor approach based on the emotion descriptors. The proposed approach has been tested on the International Affective Picture System and a collection of social images from the Flickr website. The experimental results demonstrate the effectiveness of the proposed method for eliciting image emotions.
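The tensor representation described above can be illustrated with a minimal sketch. The abstract does not say how the pairwise correlations are computed, so this example assumes simple co-occurrence counts of quantized visual words within an image. The function names `cooccurrence_matrix` and `build_tensor`, the toy vocabulary size, and the toy word assignments are all hypothetical, and the affective-pLSA model itself is not reproduced here.

```python
import numpy as np

def cooccurrence_matrix(word_ids, vocab_size):
    """Count ordered pairs of distinct patches whose visual words are (i, j).

    Assumption: "correlation" is approximated by within-image co-occurrence
    counts; the source does not specify the actual measure.
    """
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    co = np.outer(hist, hist)
    # Subtract the pairings of each patch with itself from the diagonal.
    co[np.diag_indices(vocab_size)] -= hist
    return co

def build_tensor(images, vocab_size):
    """Stack one matrix per image into an (n_images, V, V) third-order tensor."""
    return np.stack(
        [cooccurrence_matrix(np.asarray(w), vocab_size) for w in images]
    )

# Toy data: each image is a list of visual-word indices, one per local patch.
images = [[0, 1, 1, 3], [2, 2, 3]]
tensor = build_tensor(images, vocab_size=4)
print(tensor.shape)
```

Under these assumptions, each slice `tensor[n]` is a symmetric matrix for image `n`, and a latent-variable model such as pLSA would then be fitted to the stacked tensor instead of to a flat bag-of-words histogram.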