Detailed building segmentation in high-resolution remote sensing images is difficult because building sizes vary widely, and small buildings are especially hard to delineate. To address this problem, a dense feature pyramid fusion deep network is proposed in this study. First, we build an encoder-decoder structure and combine an attention mechanism with atrous convolution to improve feature extraction in the encoder. Second, a pyramid pooling module is used to extract multi-scale features at different levels. Finally, a dense feature pyramid is adopted in the decoder to fuse multi-level and multi-scale features and produce the final segmentation results. Experiments on the Inria Aerial Image Labeling Dataset show that our method achieves competitive performance compared with classical semantic segmentation networks.
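The three stages described above can be illustrated with the following minimal PyTorch sketch. The backbone depth, channel widths, the squeeze-and-excitation-style form of the attention, and the pooling bins are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of the described pipeline, assuming a small custom backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AtrousAttentionBlock(nn.Module):
    """Encoder block: atrous (dilated) convolution followed by channel attention."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Squeeze-and-excitation-style channel attention (an assumed form).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.conv(x)
        return x * self.attn(x)


class PyramidPooling(nn.Module):
    """Pyramid pooling module: pool at several grid sizes, then concatenate."""
    def __init__(self, in_ch, out_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, in_ch // len(bins), 1))
            for b in bins
        )
        self.project = nn.Conv2d(in_ch * 2, out_ch, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [
            F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return self.project(torch.cat(feats, dim=1))


class DenseFPNSegNet(nn.Module):
    """Encoder-decoder with dense multi-level, multi-scale fusion in the decoder."""
    def __init__(self, num_classes=2):
        super().__init__()
        chs = (64, 128, 256, 512)
        self.stem = nn.Conv2d(3, chs[0], 3, stride=2, padding=1)
        self.enc = nn.ModuleList(
            AtrousAttentionBlock(chs[i], chs[i + 1]) for i in range(3)
        )
        self.ppm = PyramidPooling(chs[3], chs[3])
        # 1x1 convs bring every level to a common width before dense fusion.
        self.lateral = nn.ModuleList(nn.Conv2d(c, 64, 1) for c in chs)
        self.head = nn.Conv2d(64 * 4, num_classes, 1)

    def forward(self, x):
        feats, f = [], self.stem(x)
        feats.append(f)
        for block in self.enc:
            f = block(F.max_pool2d(f, 2))   # downsample, then encode
            feats.append(f)
        feats[-1] = self.ppm(feats[-1])     # multi-scale context at the deepest level
        # Dense fusion: upsample every level to the finest resolution and concatenate.
        h, w = feats[0].shape[2:]
        fused = torch.cat(
            [F.interpolate(lat(f), size=(h, w), mode="bilinear", align_corners=False)
             for lat, f in zip(self.lateral, feats)], dim=1)
        logits = self.head(fused)
        return F.interpolate(logits, scale_factor=2, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = DenseFPNSegNet(num_classes=2)
    out = model(torch.randn(1, 3, 256, 256))
    print(out.shape)  # expected: torch.Size([1, 2, 256, 256])
```

Fusing all upsampled levels by concatenation, rather than by the top-down addition of a standard FPN, is one simple way to realize the "dense" fusion the abstract refers to; the paper's exact fusion rule may differ.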