High-resolution remote sensing images are characterized by rich surface detail and diverse ground features, yet single-modality high-resolution images have limited expressive ability in earth-object segmentation scenarios. We propose a multi-modal remote sensing image segmentation method based on an attention-driven dual-branch encoding framework. The method encodes the multi-modal remote sensing data in parallel to thoroughly extract features from each modality, and multi-stage multi-modal features are then fused by attention-driven feature fusion modules to generate high-quality multi-modal feature representations. Extensive experiments are carried out on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam 2D semantic labeling datasets, which include both RGB/IRRG images and digital surface model (DSM) images. Experimental results demonstrate that: (1) the elevation information in DSM images brings clear benefits for earth objects with significant heights, and properly introducing DSM images improves segmentation accuracy compared with using RGB/IRRG images alone; and (2) the attention-driven feature fusion module outperforms traditional feature fusion methods in capturing cross-modal complementary features, yielding strong segmentation accuracy for each earth-object class.
Keywords: Image segmentation, Remote sensing, Feature fusion, RGB color model, Semantics, Image fusion, Feature extraction
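To make the fusion idea concrete, below is a minimal PyTorch sketch of an attention-driven fusion module for two modality branches (e.g., RGB/IRRG and DSM features at one encoder stage). The abstract does not specify the module's internals, so this is an illustrative channel-attention (squeeze-and-excite style) design, not the paper's exact architecture; the class name, reduction ratio, and tensor shapes are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical attention-driven fusion of two same-shape feature maps."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention over the concatenated modalities.
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Project the re-weighted concatenation back to the branch width.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, dsm_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, dsm_feat], dim=1)  # (B, 2C, H, W)
        weights = self.mlp(self.gap(x))             # (B, 2C, 1, 1) attention weights
        return self.proj(x * weights)               # (B, C, H, W) fused features

# Example: fuse one encoder stage's features from both branches.
if __name__ == "__main__":
    fuse = AttentionFusion(channels=256)
    rgb = torch.randn(2, 256, 32, 32)
    dsm = torch.randn(2, 256, 32, 32)
    print(fuse(rgb, dsm).shape)  # torch.Size([2, 256, 32, 32])
```

In a dual-branch encoder, one such module would typically be applied at each stage, so that cross-modal complementary features are fused at multiple scales before decoding.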