High discriminative feature representation is key in remote sensing scene classification. Existing mid-level feature methods for solving the classification show poor performance. The reason includes two aspects. First, the discrimination power of the feature generated by the feature coding method is limited. Second, semantic information hidden in the scene images are not utilized. These essentially prevent them from achieving better performance. To solve these issues, we propose a hierarchical feature coding model with two stacked feature encoding layers. Specifically, in the first coding layer, semantic information from convolutional layers of deep models and complementary structure and spectral features are extracted and encoded into bag of visual word (BOVW) histogram features. Then in the second layer, Dirichlet-based Gaussians mixture model Fisher kernel is adopted to transform the BOVW histogram features to the more discriminative and effective feature vectors. Thus, through feeding the output of the first layer into the second layer, the complex feature representation is refined. Finally, the concatenated feature vectors are put into support vector machine classifier for classification. Experiments on two public high-resolution remote sensing scene datasets demonstrate that the performance of our hierarchical coding method is comparable to the previous state-of-the-art methods, including most multifeature fusion methods and convolutional neural network-based methods.
Aiming at low precision of remote sensing image scene classification, a classification method DCNN_FF based on deep convolutional neural network (DCNN) feature fusion (FF) is proposed. This method utilizes the existing pre-trained network models CaffeNet and GoogLeNet, and extracts the features of the classified remote sensing images by fine tuning on the target dataset. After dimension reduction by principle component analysis (PCA), the features extracted from the two network models are combined. Finally, the support vector machine (SVM) is used for classification of the combined features. The experimental results on the commonly used and latest datasets show that, this method can utilize the existing network models and combine with the structural advantages of different models, and its average classification accuracy is higher than that of single network model by more than 1.68%. Thus it improves the accuracy of remote sensing image scene classification.