Semantic segmentation of remote sensing urban scene imagery is a dense prediction task that has been applied to land-cover and land-use classification. However, remote sensing images are very large, which leads to a high computation cost. A common way to reduce this cost is to design a lightweight decoder that achieves a good trade-off between accuracy and computation cost. For this purpose, we design GDformer, a lightweight transformer-based decoder. GDformer consists of our proposed Global Value Transformer and Dynamic feature fusion module. The Global Value Transformer extracts global semantic features, and the Dynamic feature fusion module dynamically fuses local features with the global semantic features to capture the local-global context with a good trade-off between accuracy and computation cost; this local-global context has been shown to be necessary for the semantic segmentation of remote sensing imagery. Extensive experiments show that our proposed method achieves a good trade-off between accuracy and computation cost, together with state-of-the-art performance.