Aircraft detection in remote sensing images is a challenging task that has attracted increasing attention in recent years. Existing methods based on fully supervised convolutional neural networks (CNNs) require expensive annotations such as bounding boxes, which are time-consuming and difficult to obtain. Recently, weakly supervised methods that use only image-level labels have drawn increasing attention in natural imagery. The class activation map (CAM), a weakly supervised approach, performs well for object detection in natural scene images, but it suffers from inaccurate localization when applied to remote sensing images. In this paper, we propose a method called Active Region Corrected (ARC) to locate aircraft accurately. We find that a localization map generated from the features before the last pooling layer of the classification network contains more precise position information but also considerable noise, while the CAM yields a localization map with only rough position information. By combining these two localization maps, we obtain the exact position of the aircraft. Experiments conducted on a data set verify that our proposal achieves superior performance on aircraft detection and localization in remote sensing images.
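The combination step described above can be sketched as follows. The function name, the min-max normalization, and the threshold value are illustrative assumptions rather than the paper's exact formulation; the idea is that the coarse CAM, thresholded into a mask, suppresses the noise in the finer pre-pooling map:

```python
import numpy as np

def combine_localization_maps(fine_map, cam_map, threshold=0.4):
    """Refine a coarse CAM with a finer but noisier localization map.

    fine_map: (H, W) map from features before the last pooling layer.
    cam_map:  (H, W) class activation map (coarse localization).
    """
    # normalize both maps to [0, 1]
    fine = (fine_map - fine_map.min()) / (np.ptp(fine_map) + 1e-8)
    cam = (cam_map - cam_map.min()) / (np.ptp(cam_map) + 1e-8)
    # the thresholded CAM acts as a region mask that suppresses
    # noisy activations of the fine map outside the coarse region
    coarse_mask = (cam >= threshold).astype(fine.dtype)
    return fine * coarse_mask
```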
With the growing volume of high-resolution remote sensing images, large-scale remote sensing image retrieval (RSIR) has become increasingly significant and has attracted great attention. Traditional image retrieval methods generally rely on hand-crafted features, which are not only time-consuming to compute but also often yield poor performance. Deep learning has recently achieved remarkable results owing to its powerful ability to learn high-level semantic features, so researchers have attempted to exploit features derived from convolutional neural networks (CNNs) in RSIR. However, remote sensing images differ from natural scene images: their backgrounds are more complicated and noisy, and existing deep learning methods do not handle this well, yielding unsatisfactory speed and accuracy. In this paper, we propose a rotation-invariant hashing network that represents an image as a binary hash code, enabling faster retrieval while accounting for the rotation invariance of the same target. Experimental results on several available remote sensing data sets show that our method is effective and outperforms other features commonly used in RSIR.
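Once images are encoded as binary hash codes, retrieval reduces to Hamming-distance ranking, which is what makes hashing fast at scale. A minimal sketch of that lookup stage (function names are our own; the network that produces the codes is not shown):

```python
import numpy as np

def hamming_distance(codes, query):
    """Hamming distance between a query code and a database of codes.

    codes: (N, B) array of 0/1 bits; query: (B,) array of 0/1 bits.
    """
    return np.count_nonzero(codes != query, axis=1)

def retrieve(codes, query, k=5):
    """Return indices of the k database codes closest to the query."""
    d = hamming_distance(codes, query)
    return np.argsort(d, kind="stable")[:k]
```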
Object detection is one of the most important issues in the field of remote sensing analysis. The lack of semantic information about objects makes it difficult for traditional methods to explore effective features for object discrimination. Owing to their capability for feature extraction, a series of region-based convolutional neural networks (R-CNNs) have recently been widely and successfully applied to object detection in natural images. However, most of them perform poorly on small-sized targets, so few can be introduced directly for small-sized object detection in remote sensing images. This paper proposes a modified method based on Faster R-CNN, which is composed of a feature extraction network, a region proposal network, and an object detection network. Compared with Faster R-CNN, the proposed method removes the fourth pooling layer in the feature extraction network and employs dilated convolutions in all subsequent convolutional layers to enhance the resolution of the final feature maps, which provide more detailed and semantic feature information to help detect objects, especially small-sized ones. In the object detection network, contextual features around the region proposals are added as complementary information to help distinguish objects accurately. Experiments conducted on two data sets verify that our proposal achieves superior performance on small-sized object detection in remote sensing images.
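The resolution argument can be checked with the standard convolution output-size formula: a 3x3 convolution with dilation d and padding d keeps the spatial size unchanged while enlarging the receptive field, whereas a stride-2 layer halves it. A small sketch (the helper name is ours):

```python
def conv_out_size(n, k=3, stride=1, padding=1, dilation=1):
    """Standard output-size formula for a convolutional layer.

    n: input spatial size; k: kernel size.
    """
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1
```

With dilation=2 and padding=2, the map stays the same size as with an ordinary padded 3x3 convolution, which is why replacing the removed pooling layer with dilated convolutions preserves feature-map resolution for small targets.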
Multiple-instance learning (MIL) has been successfully applied to image retrieval. However, existing approaches cannot correctly select positive instances from positive bags, which may result in low accuracy. In this paper, we propose a new image retrieval approach called multiple-instance learning based on instance consistency (MILIC) to mitigate this issue. First, we effectively select potential positive instances in each positive bag by ranking the instance-consistency (IC) values of its instances. Then, based on the potential positive instances, we design a feature representation scheme that captures the relationship among bags and instances to convert each bag into a single instance. Finally, we apply a standard single-instance learning strategy, such as the support vector machine, to perform object-based image retrieval. Experimental results on two challenging data sets show the effectiveness of our proposal in terms of accuracy and run time.
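The exact IC definition is specific to the paper; as a hedged illustration only, one plausible proxy scores each instance by its similarity to the nearest instance in every other positive bag (true positives tend to recur across positive bags, noise does not), then keeps the top-ranked instance per bag:

```python
import numpy as np

def instance_consistency(bags):
    """Proxy IC score: negated mean distance to the closest instance
    in each other positive bag. bags: list of (n_i, d) arrays."""
    ics = []
    for i, bag in enumerate(bags):
        scores = np.zeros(len(bag))
        for j, other in enumerate(bags):
            if i == j:
                continue
            # pairwise distances to the other bag's instances
            d = np.linalg.norm(bag[:, None, :] - other[None, :, :], axis=2)
            scores -= d.min(axis=1)  # closer nearest neighbour -> higher score
        ics.append(scores / (len(bags) - 1))
    return ics

def select_positive_instances(bags):
    """Keep the highest-scoring (most consistent) instance of each bag."""
    return [bag[np.argmax(s)] for bag, s in zip(bags, instance_consistency(bags))]
```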
Existing methods for visual-saliency-based image retrieval typically target single-instance images. However, without any prior knowledge, the content of a single-instance image is ambiguous, so these methods cannot effectively capture the object of interest. In this paper, we propose a novel image retrieval framework based on a multi-instance saliency model. First, feature saliency is computed from global contrast, local contrast, and sparsity, and a synthesized saliency map is obtained by using a multiple-instance learning (MIL) algorithm to dynamically weight the feature saliency. Then we apply a fuzzy region-growing algorithm to the synthesized saliency map to extract the salient object. Finally, we extract color and texture features as retrieval features and measure feature similarity by Euclidean distance. In the experiments, the proposed method achieves higher multi-instance image retrieval accuracy than other single-instance retrieval methods based on saliency models.
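The weighting step can be illustrated with a simple weighted fusion of feature saliency maps; in the framework above the weights come from the MIL algorithm, whereas here they are passed in directly (the function name and the final rescaling are assumptions):

```python
import numpy as np

def fuse_saliency(maps, weights):
    """Weighted fusion of per-feature saliency maps.

    maps: list of (H, W) saliency maps (e.g. global contrast,
    local contrast, sparsity); weights need not be normalized.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize weights to sum to 1
    fused = sum(w * m for w, m in zip(weights, maps))
    # rescale the fused map to [0, 1]
    return (fused - fused.min()) / (np.ptp(fused) + 1e-8)
```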
Image representation is the key part of image classification, and the Fisher kernel is considered one of the most effective image feature coding methods. A critical issue with the Fisher encoding method is that a single GMM models features only within a coarse granularity space. In this paper, we propose a method named Multi-scale and Multi-GMM Pooling (MMP), which can effectively represent an image at various granularities. We first perform pooling with multiple GMMs instead of a single GMM. Then, we introduce multi-scale images to enrich the model's inputs, which further improves performance. Finally, we validate our proposal on the PASCAL VOC 2007 data set, and the experimental results show a clear superiority over the basic Fisher model.
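A hedged sketch of the multi-GMM idea: compute a Fisher encoding under several GMMs with different component counts (different granularities) and concatenate the results. As a simplification, only the gradient with respect to the means is encoded, with diagonal covariances; all names are ours:

```python
import numpy as np

def fisher_vector_means(X, weights, means, covs):
    """Fisher encoding restricted to the gradient w.r.t. GMM means.

    X: (N, d) local features; weights: (K,) mixture weights;
    means: (K, d); covs: (K, d) diagonal covariances.
    """
    diff = X[:, None, :] - means[None, :, :]                     # (N, K, d)
    log_p = (-0.5 * np.sum(diff ** 2 / covs + np.log(2 * np.pi * covs), axis=2)
             + np.log(weights))                                  # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                    # stabilize exp
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                    # posteriors
    g = (gamma[:, :, None] * diff / np.sqrt(covs)).sum(axis=0)   # (K, d)
    g /= X.shape[0] * np.sqrt(weights)[:, None]
    return g.ravel()

def multi_gmm_encoding(X, gmms):
    """Concatenate Fisher encodings from GMMs of different granularities.

    gmms: list of (weights, means, covs) tuples.
    """
    return np.concatenate([fisher_vector_means(X, w, m, c) for w, m, c in gmms])
```

The concatenated vector has dimension sum(K_i * d), so a coarse 2-component and a finer 4-component GMM over 3-dimensional features yield an 18-dimensional representation.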
Existing visual saliency detection methods are usually based on a single image; however, without prior knowledge, the contents of a single image are ambiguous, so saliency detection based on a single image cannot reliably extract the region of interest. To address this, we propose a novel saliency detection method based on multi-instance images. Our method considers human visual psychological factors and measures visual saliency using global contrast, local contrast, and sparsity. It first uses multiple-instance learning to obtain the clustering centers and then computes the relative dispersion of each feature. By fusing the weighted feature saliency maps, the final synthesized saliency map is generated. Compared with other saliency detection methods, our method achieves a higher hit rate.