High-resolution remote sensing data classification has been a challenging and promising research topic in the community of remote sensing. In recent years, with the rapid advances of deep learning, remarkable progress has been made in this field, which facilitates a transition from hand-crafted features designing to an automatic end-to-end learning. A deep fully convolutional networks (FCNs) based ensemble learning method is proposed to label the high-resolution aerial images. To fully tap the potentials of FCNs, both the Visual Geometry Group network and a deeper residual network, ResNet, are employed. Furthermore, to enlarge training samples with diversity and gain better generalization, in addition to the commonly used data augmentation methods (e.g., rotation, multiscale, and aspect ratio) in the literature, aerial images from other datasets are also collected for cross-scene learning. Finally, we combine these learned models to form an effective FCN ensemble and refine the results using a fully connected conditional random field graph model. Experiments on the ISPRS 2-D Semantic Labeling Contest dataset show that our proposed end-to-end classification method achieves an overall accuracy of 90.7%, a state-of-the-art in the field.