In the field of remote sensing, multi-label scene classification faces numerous challenges: strong correlations among labels, small-scale noisy objects or regions, cluttered backgrounds, diverse image categories within datasets, high intra-class variation alongside inter-class similarity, and imbalanced class distributions. Moreover, the limited representation ability of convolutional neural networks (CNNs) makes multi-label scene classification a complex endeavor. We address this limitation and aim to enhance CNN performance through the proposed framework. Recent research has demonstrated that channel–spatial attention modules can boost the representation power of CNNs, yet these modules are not widely used in the multi-label scene domain. We therefore incorporate state-of-the-art channel–spatial attention modules into MobileNet_v1 to improve its representation ability for multi-label remote sensing scene classification. The proposed method employs two-level feature extraction with double channel–spatial attention residual blocks. We tested it on the UC, AID, and DFC15 multi-label datasets and evaluated its performance using various metrics. The results show that our method improves the representation power of MobileNet_v1 by 3% to 7% in terms of …
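To illustrate the kind of block the abstract describes, the following is a minimal PyTorch sketch of a channel–spatial attention residual block in the CBAM style (channel attention followed by spatial attention, wrapped in a residual connection). The paper's exact block design, its placement within MobileNet_v1, and all hyperparameters (reduction ratio, kernel size, class names) are not given in the abstract, so everything below is an assumption, not the authors' implementation.

```python
# Hypothetical CBAM-style channel-spatial attention residual block;
# the abstract does not specify the authors' exact architecture.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):  # reduction=16 is an assumed default
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # Aggregate spatial information with average and max pooling,
        # then produce per-channel attention weights.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        scale = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * scale


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool across the channel axis, then learn a 2-D attention map.
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.conv(pooled))


class AttentionResidualBlock(nn.Module):
    """Channel attention followed by spatial attention, with a
    residual (skip) connection around both."""

    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return x + self.sa(self.ca(x))
```

Under this reading, the "double" blocks and "two-level feature extraction" could correspond to attaching one such block after an intermediate MobileNet_v1 stage and another after the final stage, e.g. `AttentionResidualBlock(512)` and `AttentionResidualBlock(1024)` on those stages' feature maps; the abstract does not confirm this placement.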
Keywords: Remote sensing, Scene classification, Convolution, Data modeling, Education and training, Performance modeling, Visualization