Because of the particular characteristics of remote sensing images, object detection performance on them is much lower than on natural images. In this article, we design a new model that significantly improves object detection performance in remote sensing images. First, we redesign the feature extraction network: we deepen the network to obtain feature maps at more scales and increase the number of detection heads, making the predicted anchors more precise and better suited to detection tasks with large spans in target scale. Second, to avoid the excessive information loss caused by a deep network, we design a three-level feature fusion network that supplements the output feature map with as much of the original information as possible. Third, we introduce a transformer module in the last layer of the backbone, which compensates for the convolutional network's weak ability to extract global information without adding much computational complexity. In addition, we replace the original filter with soft non-maximum suppression (soft-NMS) to address the missed detections caused by the clustering of small targets in remote sensing images. Experimental results on the DIOR (optical remote sensing image detection) dataset show that our model performs well when object sizes differ significantly and small targets cluster together. Compared with the original network, the mean average precision improves by 4.8%. We also expand the DIOR dataset to enhance the model's generalization ability and explore the network's potential; the model trained on the expanded dataset is more robust, works effectively under various interferences, and reaches a mean average precision of 76.2%. Our model achieves good results with a small amount of computing resources.
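The abstract does not specify the exact design of the transformer module placed at the end of the backbone. The sketch below shows one plausible arrangement under that description: a single self-attention encoder block applied to the last backbone feature map, with the channel width, head count, and MLP ratio chosen purely for illustration (PyTorch).

```python
# A minimal sketch of a transformer encoder block appended after the last
# backbone stage. The layer sizes, head count, and pre-norm layout below are
# assumptions for illustration, not the authors' exact module.
import torch
import torch.nn as nn

class BackboneTransformer(nn.Module):
    def __init__(self, channels=1024, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels * mlp_ratio),
            nn.GELU(),
            nn.Linear(channels * mlp_ratio, channels),
        )

    def forward(self, x):                  # x: (B, C, H, W) backbone feature map
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)   # -> (B, H*W, C) token sequence
        q = self.norm1(t)
        t = t + self.attn(q, q, q)[0]      # global context via self-attention
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, h, w)  # back to a feature map
```

Because self-attention is applied only to the smallest (last-stage) feature map, the quadratic cost in the number of tokens stays modest, which is consistent with the abstract's claim of little added computational complexity.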
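Soft-NMS addresses the small-target-clustering problem by decaying the scores of overlapping boxes rather than discarding them outright, so a true detection adjacent to a higher-scoring one is not suppressed. Below is a minimal NumPy sketch of the Gaussian variant of Bodla et al.; the penalty form and the `sigma` and `score_thresh` values are assumptions, since the abstract does not state which variant or parameters were used.

```python
# Gaussian soft-NMS sketch. Boxes are [x1, y1, x2, y2]; sigma and the
# score threshold are illustrative defaults, not values from the paper.
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Decay the scores of overlapping boxes instead of removing them."""
    scores = scores.copy()
    keep = []
    idxs = np.arange(len(scores))
    while len(idxs) > 0:
        top = idxs[np.argmax(scores[idxs])]   # highest-scoring remaining box
        keep.append(top)
        idxs = idxs[idxs != top]
        if len(idxs) == 0:
            break
        ov = iou(boxes[top], boxes[idxs])
        scores[idxs] *= np.exp(-(ov ** 2) / sigma)  # Gaussian score penalty
        idxs = idxs[scores[idxs] > score_thresh]    # drop near-zero scores
    return keep
```

In a dense cluster, hard NMS would zero out every box whose IoU with the winner exceeds a fixed threshold; the Gaussian penalty instead reduces their scores smoothly, so nearby small objects can still be kept as separate detections.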
Keywords: object detection, target detection, remote sensing, head, transformers, feature fusion, feature extraction