We propose to fuse an image's local and global information for scene classification. First, the local information is represented by contextual information exploited through spatial pyramid matching. Images are segmented into patches by a regular grid, and scale-invariant feature transform (SIFT) features are extracted. All patch features are clustered and quantized to obtain visual words. A visual word pair (triplet) consists of two (three) neighboring, distinct visual words. By an analogy between the image pixel space and the patch space, we also obtain visual word groups, which are contiguous occurrences of the same visual word. The spatial envelope is employed to extract the image's global information; it is a holistic description of the scene in which local information is not taken into account. Finally, a stacked support vector machine (SVM) fusion method is used to obtain the scene classification results. Experiments on three benchmark data sets demonstrate that our method achieves better results than most popular scene classification methods presented in recent years.
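The local pipeline described above (grid patches, descriptor extraction, quantization into visual words, then neighboring word pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: synthetic random vectors stand in for real SIFT descriptors, and the grid size and vocabulary size are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for 128-D SIFT descriptors of 200 grid patches (20 x 10 grid).
patches = rng.normal(size=(200, 128))

# Cluster and quantize patch features into a vocabulary of 8 visual words.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(patches)
words = kmeans.labels_.reshape(20, 10)  # visual word at each grid cell

# Bag-of-words histogram: the basic local representation of the image.
hist = np.bincount(words.ravel(), minlength=8)

# Visual word pairs: horizontally neighboring cells holding *different* words.
pairs = [(a, b)
         for a, b in zip(words[:, :-1].ravel(), words[:, 1:].ravel())
         if a != b]
```

Triplets and vertically adjacent pairs follow the same pattern with different index offsets; runs of equal labels along a row or column would give the visual word groups.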
Due to the vast growth of image databases, scene image classification methods have become increasingly important in
computer vision. We propose a new scene image classification framework based on a combined feature and a latent
semantic model built on Latent Dirichlet Allocation (LDA) from the statistical text literature. Here the model
is applied to the visual-word representation of images. We use Gibbs sampling for parameter estimation and run the
model with several different numbers of topics simultaneously to obtain the latent topic representation of images.
We densely extract multi-scale patches from images and compute the combined feature on these patches. Our method is
unsupervised and represents the semantic characteristics of images well. We demonstrate the effectiveness of our
approach by comparing it to those used in previous work in this area. Experiments were conducted on three widely
used image databases, and our method outperformed the others.
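The latent topic representation over several topic numbers can be sketched as below. Note this is an approximation of the idea, not the paper's method: scikit-learn's LDA uses variational inference rather than Gibbs sampling, and the synthetic visual-word count matrix and topic counts are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Synthetic visual-word count matrix: 30 images over a 50-word vocabulary.
counts = rng.integers(0, 5, size=(30, 50))

# Fit LDA at several topic numbers and concatenate the per-image
# topic distributions into one latent representation.
reps = []
for k in (5, 10, 15):
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    reps.append(lda.fit_transform(counts))
latent = np.hstack(reps)  # one row per image, 5 + 10 + 15 topic dims
```

The concatenated `latent` matrix would then feed a classifier, so no class labels are needed to build the representation itself, which is what makes the feature-learning step unsupervised.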
This paper presents a novel multiresolution approach to texture classification based on moment features of a histogram, extracted in the x and y directions at each image resolution. We then propose a weighted multiresolution moment feature for supervised texture classification. Moreover, because of the moment methods' capability of expressing image information, these features achieve better rotation invariance and noise robustness than most popular texture features presented in recent years.
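One plausible reading of the feature construction is sketched below: directional statistics (here, simple x- and y-difference samples) are summarized by their first few moments at each level of an image pyramid. This is a hypothetical sketch under assumed choices (2x2 average downsampling, three moments, three levels); the paper's exact histogram and weighting scheme may differ.

```python
import numpy as np

def moments(values, k=3):
    """Mean followed by central moments up to order k of a 1-D sample."""
    m = [values.mean()]
    for i in range(2, k + 1):
        m.append(((values - m[0]) ** i).mean())
    return m

def multiresolution_moments(img, levels=3):
    """Moment features of x- and y-direction differences at each
    resolution of a simple average pyramid (illustrative assumption)."""
    feats = []
    for _ in range(levels):
        dx = np.diff(img, axis=1).ravel()  # x-direction variation
        dy = np.diff(img, axis=0).ravel()  # y-direction variation
        feats.extend(moments(dx) + moments(dy))
        # Downsample by 2x2 averaging to form the next, coarser resolution.
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.array(feats)

rng = np.random.default_rng(0)
feat = multiresolution_moments(rng.random((32, 32)))  # 3 levels x 2 dirs x 3 moments
```

A weighted variant would multiply each level's six moments by a per-resolution weight before concatenation, which is where the supervised tuning described in the abstract would enter.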