Effective spectral and spatial pixel description plays a significant role for the classification of high resolution remote sensing images. Current approaches of pixel-based feature extraction are of two main kinds: one includes the widelyused principal component analysis (PCA) and gray level co-occurrence matrix (GLCM) as the representative of the shallow spectral and shape features, and the other refers to the deep learning-based methods which employ deep neural networks and have made great promotion on classification accuracy. However, the former traditional features are insufficient to depict complex distribution of high resolution images, while the deep features demand plenty of samples to train the network otherwise over fitting easily occurs if only limited samples are involved in the training. In view of the above, we propose a GLCM-based convolution neural network (CNN) approach to extract features and implement classification for high resolution remote sensing images. The employment of GLCM is able to represent the original images and eliminate redundant information and undesired noises. Meanwhile, taking shallow features as the input of deep network will contribute to a better guidance and interpretability. In consideration of the amount of samples, some strategies such as L2 regularization and dropout methods are used to prevent over-fitting. The fine-tuning strategy is also used in our study to reduce training time and further enhance the generalization performance of the network. Experiments with popular data sets such as PaviaU data validate that our proposed method leads to a performance improvement compared to individual involved approaches.