Over the last decades, the amount of data acquired by electro-optical sensor systems in remote sensing (RS) has increased steadily. Because manual analysis of remote sensing images is time-consuming, machine learning methods for detection and classification have become an appealing field of RS. In particular, the family of region-based convolutional neural networks (R-CNN) has shown considerable success in different RS tasks. Advanced R-CNN methods are multistage approaches in which objects are first detected and then classified, with an optional segmentation step. However, the detection performance of advanced R-CNN algorithms suffers in areas with noticeably varying object densities and scales. Advanced R-CNN architectures usually consist of a detector stage and multiple heads. In the detector stage, regions of interest (ROIs) are proposed and filtered by a non-maximum suppression (NMS) layer. In an area with a high density of objects, a strictly adjusted NMS may suppress valid proposals and lead to missed detections. In contrast, a loosely adjusted NMS can cause multiple overlapping detections for large objects. To address this challenge, we present an approach that improves the results of object detectors in scenes with varying object densities. To this end, we add an encoder-decoder-based density estimation network to our detector network to obtain the locations of high-density areas. For these locations, an additional fine detection of objects is performed. To demonstrate the effectiveness of our approach, we evaluate our method on common crowd counting and object detection datasets.
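The NMS trade-off described above can be illustrated with a minimal sketch of greedy NMS in plain Python. This is not the implementation used in this work: the function names `iou` and `nms`, the box coordinates, the scores, and both threshold values are assumptions chosen purely for illustration. The threshold here is the maximum intersection over union (IoU) that a proposal may have with an already kept box.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already kept box does not exceed iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical proposals (illustrative values only):
# boxes 0 and 1 are two distinct, closely packed small objects,
# boxes 2 and 3 are two proposals for the same large object.
boxes = [
    (0, 0, 10, 10),
    (4, 0, 14, 10),
    (20, 20, 60, 60),
    (22, 22, 62, 62),
]
scores = [0.90, 0.80, 0.95, 0.70]

strict = nms(boxes, scores, iou_threshold=0.3)  # aggressive suppression
loose = nms(boxes, scores, iou_threshold=0.9)   # permissive suppression

print("strict NMS keeps:", strict)
print("loose NMS keeps:", loose)
```

In this toy setting, the strict setting drops box 1, a valid object in the dense region, while the loose setting keeps both proposals for the large object. A density estimate that marks the crowded region is one way to decide where a different treatment is needed.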