Object tracking is a core subject in computer vision and is of significant importance in both theory and practice. We propose a tracking method in which a robust discriminative classifier is built from both object and context information. In this method, we consider multiple frames of local invariant features on and around the object and construct an object template and a context template. To overcome the limitations of invariant representations, we also design a nonparametric learning algorithm based on transitive matching of perspective transformations. This learning algorithm can continuously incorporate new object appearances and avoids improper updating when occlusions appear. We also analyze the asymptotic stability of our method and prove its drift-free capability in long-term tracking. Extensive experiments on challenging publicly available video sequences covering most critical tracking conditions demonstrate the strength and robustness of our method.
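A minimal sketch of the transitive-matching idea behind this abstract: if the perspective transformation (homography) between frames 1 and 2 and between frames 2 and 3 are known, the frame-1-to-frame-3 mapping can be obtained by composition. The matrices and point coordinates below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def compose_homographies(H12, H23):
    """Transitively chain two perspective transformations: if H12 maps
    frame-1 points to frame 2 and H23 maps frame 2 to frame 3, then
    H13 = H23 @ H12 maps frame 1 directly to frame 3."""
    H13 = H23 @ H12
    return H13 / H13[2, 2]  # normalize so the bottom-right entry is 1

def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]

# Illustrative frame-to-frame homographies (in practice these would be
# estimated from matched local invariant features).
H12 = np.array([[1.0, 0.01, 5.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
H23 = np.array([[0.99, 0.0, 2.0], [0.02, 1.0, 1.0], [0.0, 0.0, 1.0]])

template_corners = np.array([[0, 0], [100, 0], [100, 50], [0, 50]], float)
print(warp_points(compose_homographies(H12, H23), template_corners))
```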
Existing methods for detecting vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images of the same scene taken at different times, rely on simple and fast models to label track pixels. These models, however, are unable to capture natural track features such as continuity and parallelism. More powerful but computationally expensive models can be used in offline settings. We present an approach that uses dilated convolutional networks consisting of a series of 3×3 convolutions to segment vehicle tracks. The design of our networks reflects the fact that remote sensing applications tend to operate in low-power settings with limited training data. As a result, we aim for small and efficient networks that can be trained end-to-end to learn natural track features entirely from limited training data. We demonstrate that our six-layer network, trained on just 90 images, is computationally efficient and improves the F-score on a standard dataset to 0.992, up from the 0.959 obtained by the current state-of-the-art method.
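A sketch of what a six-layer dilated 3×3 network of the kind described could look like. The channel widths and dilation rates below are illustrative assumptions, not the authors' exact configuration; the point is that growing dilations enlarge the receptive field without pooling, so per-pixel output resolution is preserved.

```python
import torch
import torch.nn as nn

class DilatedTrackNet(nn.Module):
    """A small six-layer dilated CNN for binary track segmentation.
    All convolutions are 3x3; increasing dilation rates grow the
    receptive field while keeping the input resolution."""
    def __init__(self, channels=32):
        super().__init__()
        dilations = [1, 1, 2, 4, 8, 1]  # assumed schedule
        layers, in_ch = [], 1  # single-channel CCD input
        for i, d in enumerate(dilations):
            out_ch = 1 if i == len(dilations) - 1 else channels
            layers.append(nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d))
            if i < len(dilations) - 1:
                layers.append(nn.ReLU(inplace=True))
            in_ch = out_ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # per-pixel track probability

model = DilatedTrackNet()
print(model(torch.randn(1, 1, 128, 128)).shape)  # -> (1, 1, 128, 128)
```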
Classification of hyperspectral remote sensing imagery is one of the most popular research topics because of the imagery's intrinsic potential to capture the spectral signatures of materials, which provides distinct advantages for object detection and recognition. In the last decade, a large number of methods were proposed to classify hyperspectral remote sensing data using spectral features, though some do not use all of the available information and consequently yield poor classification accuracy. More recently, the exploration of deep features has received considerable attention and has become a research hot spot in the geoscience and remote sensing community as a means of enhancing classification accuracy. A deep learning architecture is proposed to classify hyperspectral remote sensing imagery by jointly exploiting spectral–spatial information. A stacked sparse autoencoder performs unsupervised feature learning to extract high-level representations of the joint spectral–spatial information; a softmax classifier is then trained on these high-level features and used to fine-tune the deep architecture. Comparative experiments are performed on two widely used hyperspectral remote sensing datasets (Salinas and PaviaU) and on a coarse-resolution hyperspectral dataset in the long-wave infrared range. The results indicate the superiority of the proposed spectral–spatial deep learning architecture over conventional classification methods.
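A minimal sketch of one layer of a stacked sparse autoencoder as described, with an L1 activation penalty standing in for the sparsity constraint (the paper's exact penalty and layer sizes may differ; the 224-dimensional joint spectral–spatial input is an assumption).

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    """One layer of a stacked sparse autoencoder: encode, decode,
    and penalize dense hidden activations during training."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = self.enc(x)
        return self.dec(h), h

# Joint spectral-spatial input: a pixel's spectrum concatenated with its
# flattened spatial neighborhood (dimensions are illustrative).
x = torch.randn(64, 224)

ae = SparseAE(224, 100)
recon, h = ae(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * h.abs().mean()
loss.backward()
# After greedy layer-wise pretraining, the encoders are stacked, a softmax
# layer is added on top, and the whole network is fine-tuned with a
# supervised cross-entropy loss.
```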
Hyperspectral anomaly detection (AD) is an important unsupervised target detection technique and is of practical significance. Owing to the high dimensionality of hyperspectral data, AD is affected by noise, nonlinear correlations among bands, and other factors that degrade detection accuracy. To overcome this problem, a method of hyperspectral AD based on stacked denoising autoencoders (HADSDA) is proposed. Two different feature detection models, spectral feature (SF) and fused feature by clustering (FFC), are constructed to verify the effectiveness of the proposed algorithm. The SF detection model uses the spectral feature of each pixel. The FFC detection model builds a set of similar pixels by clustering and then fuses that set using the stacked denoising autoencoder (SDA) algorithm, which automatically learns nonlinear deep features of the image. Compared with other linear and nonlinear feature extraction methods, the detection results of the proposed algorithm are greatly improved. Experimental results show that the proposed algorithm is an excellent feature learning method and achieves higher detection performance.
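A sketch of the denoising-autoencoder building block the abstract relies on: the input spectrum is corrupted and the network is trained to reconstruct the clean version, forcing it to learn robust nonlinear features. The 189-band input, layer width, and Gaussian corruption are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """One denoising autoencoder layer: corrupt the input, then
    reconstruct the clean spectrum from the corrupted one."""
    def __init__(self, in_dim=189, hid_dim=64, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        x_noisy = x + self.noise_std * torch.randn_like(x)
        return self.dec(self.enc(x_noisy))

spectra = torch.randn(256, 189)  # 256 pixels, 189 spectral bands
dae = DenoisingAE()
loss = nn.functional.mse_loss(dae(spectra), spectra)  # target: clean input
loss.backward()
# Anomaly scores can then be computed in the learned feature space, e.g.,
# by applying a detector such as RX to the hidden activations.
```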
Deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition from natural images. In contrast, object recognition from remote sensing images is more challenging, due to complex backgrounds and inadequate data for training a deep network with a huge number of parameters. We propose a unified deep CNN, called DeepPlane, to simultaneously detect the position and classify the category of aircraft in remote sensing images. This model consists of two correlative deep networks: the first generates object proposals as well as feature maps, and the second is cascaded upon the first to perform classification and box regression in one shot. The "inception module" is introduced to tackle the problem of insufficient training data, which is one of the most challenging obstacles to detection in remote sensing images. Extensive experiments demonstrate the efficiency of the proposed DeepPlane model. Specifically, DeepPlane models detection and classification jointly and achieves 91.9% mAP across six categories of aircraft, advancing the state-of-the-art, sometimes considerably, for both tasks.
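For reference, a GoogLeNet-style inception block of the kind the abstract says DeepPlane introduces: parallel 1×1, 3×3, 5×5, and pooled branches whose outputs are concatenated along the channel axis. The branch widths below are illustrative, not DeepPlane's actual configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel multi-scale branches concatenated channel-wise,
    with 1x1 convolutions used to keep the parameter count low."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 32, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 24, 1), nn.ReLU(True),
                                nn.Conv2d(24, 32, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1), nn.ReLU(True),
                                nn.Conv2d(8, 16, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)

block = InceptionModule(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # -> (1, 96, 56, 56)
```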
Recent advances in remote sensing technology have made multisensor data available for the same area, and it is well known that remote sensing data processing and analysis often benefit from multisource data fusion. Specifically, the low spatial resolution of hyperspectral imagery (HSI) degrades the subsequent classification task, whereas visible (VIS) images with high spatial resolution enable high-fidelity spatial analysis. A collaborative classification framework is proposed to fuse HSI and VIS images for finer classification. First, a convolutional neural network model is employed to extract deep spectral features for HSI classification. Second, effective binarized statistical image features are learned as contextual basis vectors for the high-resolution VIS image, followed by a classifier. The proposed approach combines the diversified data through decision fusion, integrating rich spectral information, spatial information, and statistical representation information. In particular, the approach avoids the potential problems of the curse of dimensionality and excessive computation time. Experiments on two standard datasets demonstrate the better classification performance offered by this framework.
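A minimal sketch of the decision-fusion step: per-pixel class probabilities from the HSI branch and the VIS branch are combined before the final label is taken. A weighted sum is a common choice; the paper's exact fusion rule may differ, and all values below are illustrative.

```python
import numpy as np

def decision_fusion(p_hsi, p_vis, w=0.5):
    """Decision-level fusion of two per-pixel class-probability arrays:
    weighted combination, then argmax over classes."""
    fused = w * p_hsi + (1.0 - w) * p_vis
    return fused.argmax(axis=-1)

# Illustrative classifier outputs for 4 pixels and 3 classes.
rng = np.random.default_rng(0)
p_hsi = rng.dirichlet(np.ones(3), size=4)  # spectral (HSI CNN) branch
p_vis = rng.dirichlet(np.ones(3), size=4)  # spatial (VIS) branch
print(decision_fusion(p_hsi, p_vis))       # fused per-pixel labels
```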
Automated classification of images across image archives requires reducing the semantic gap between high-level features perceived by humans and low-level features encoded in images. Due to rapidly growing image archives in the Earth science domain, it is critical to automatically classify images for efficient sorting and discovery. In particular, classifying images based on the presence of Earth science phenomena allows users to perform climatology studies and investigate case studies. We present applications of deep learning-based classification of Earth science images.
Ocean fronts have been a subject of study for many years, and a variety of methods and algorithms have been proposed to detect them. However, all existing ocean front recognition methods are built upon human expertise in defining the front based on subjective thresholds of relevant physical variables. This paper proposes a deep learning approach that recognizes ocean fronts automatically. We first investigate four existing deep architectures, i.e., AlexNet, CaffeNet, GoogLeNet, and VGGNet, for the ocean front recognition task using remote sensing (RS) data. We then propose a deep network with fewer layers than the existing architectures, with a total of five learnable layers. In addition, we extend the proposed network to recognize fronts and classify them as strong or weak. We evaluate and analyze the proposed network under two strategies for exploiting the deep model: full training and fine-tuning. Experiments are conducted on three RS image datasets with different properties. Experimental results show that our model produces accurate recognition results.
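A sketch of a compact CNN with five learnable layers (three convolutional plus two fully connected) in the spirit of the abstract; the filter counts, 64×64 input size, and layer split are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class FrontNet(nn.Module):
    """Five learnable layers: conv-conv-conv, then two linear layers.
    The output head can distinguish front/no-front or strong/weak fronts."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(True), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 8 * 8, 128), nn.ReLU(True),
            nn.Linear(128, n_classes))

    def forward(self, x):
        return self.classifier(self.features(x))

net = FrontNet()
print(net(torch.randn(2, 3, 64, 64)).shape)  # -> (2, 2)
```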
Automatic ship detection in optical remote sensing images has attracted wide attention owing to its broad applications. Major challenges for this task include interference from clouds, waves, and wakes, as well as high computational expense. We propose a fast and robust ship detection algorithm to address these issues. The detection framework is built on deep convolutional neural networks (CNNs), which provide accurate locations of ship targets in an efficient way. First, a deep CNN is designed to extract features. Then, a region proposal network (RPN) is applied to discriminate ship targets and regress detection bounding boxes, with anchors designed according to the intrinsic shape of ship targets. Experimental results on numerous panchromatic images demonstrate that, compared with other state-of-the-art ship detection methods, our method is more efficient and achieves higher detection accuracy and more precise bounding boxes across different complex backgrounds.
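A sketch of how RPN anchors can be shaped to match the elongated geometry of ships, as the abstract describes. The scales and aspect ratios below are illustrative assumptions, not the authors' exact values.

```python
import numpy as np

def make_ship_anchors(scales=(32, 64, 128), ratios=(3.0, 5.0, 8.0)):
    """Generate anchor boxes (w, h) whose aspect ratios reflect the
    elongated shape of ships. Each scale fixes the anchor area; each
    ratio is the long-side to short-side proportion."""
    anchors = []
    for s in scales:                  # anchor area = s * s
        for r in ratios:
            w = s * np.sqrt(r)        # long side
            h = s / np.sqrt(r)        # short side
            anchors.append((w, h))
            anchors.append((h, w))    # ships appear at any orientation
    return np.array(anchors)

print(make_ship_anchors().round(1))
```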
In this paper, we investigate the effectiveness of a deep neural network for cross-domain classification of remote sensing images. In the network, class centroid alignment is used as a domain adaptation strategy, enabling the network to transfer knowledge from the source domain to the target domain on a per-class basis. Because predicted labels of target data must be used to estimate each class centroid, we use overall centroid alignment as a coarse domain adaptation step to improve the estimation accuracy. In addition, the rectified linear unit is used as the activation function to produce sparse features, which may improve class separability. The proposed network provides both aligned features and an adaptive classifier, and achieves label-free classification of target-domain data. Experimental results on Hyperion, NCALM, and WorldView-2 remote sensing images demonstrate the effectiveness of the proposed approach.
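A minimal sketch of a per-class centroid alignment penalty of the kind described: for each class, the distance between the source-domain centroid (from true labels) and the target-domain centroid (from predicted labels) is penalized. Feature dimensions, batch sizes, and the squared-distance form are assumptions for illustration.

```python
import torch

def centroid_alignment_loss(f_src, y_src, f_tgt, y_tgt_pred, n_classes):
    """Sum of squared distances between per-class source and target
    feature centroids; target centroids use the network's predictions."""
    loss = f_src.new_zeros(())
    for c in range(n_classes):
        src_c, tgt_c = f_src[y_src == c], f_tgt[y_tgt_pred == c]
        if len(src_c) and len(tgt_c):  # skip classes absent from a batch
            loss = loss + (src_c.mean(0) - tgt_c.mean(0)).pow(2).sum()
    return loss / n_classes

# Illustrative 16-dim features for a 3-class problem.
f_src, f_tgt = torch.randn(32, 16), torch.randn(32, 16)
y_src = torch.randint(0, 3, (32,))
y_tgt_pred = torch.randint(0, 3, (32,))  # current target-domain predictions
print(centroid_alignment_loss(f_src, y_src, f_tgt, y_tgt_pred, 3))
```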
Deep convolutional neural networks (CNNs) have been widely used to obtain high-level representations in various computer vision tasks. However, for remote scene classification, there are not enough images to train a very deep CNN from scratch. Starting from two viewpoints on generalization power, we propose two promising kinds of deep CNNs for remote scenes and investigate whether CNNs really need to be deep for remote scene classification. First, we transfer successful pretrained deep CNNs to remote scenes, following the theory that the depth of CNNs brings generalization power by learning suitable hypotheses for finite data samples. Second, following the opposite viewpoint that the generalization power of deep CNNs comes from massive memorization and that shallow CNNs with enough neural nodes have perfect finite-sample expressivity, we design a lightweight deep CNN (LDCNN) for remote scene classification. With five well-known pretrained deep CNNs, experimental results on two independent remote sensing datasets demonstrate that transferred deep CNNs can achieve state-of-the-art results in an unsupervised setting. However, because of its shallow architecture, LDCNN cannot obtain satisfactory performance, regardless of whether the setting is unsupervised, semisupervised, or supervised. CNNs therefore really do need depth to obtain general features for remote scenes. This paper also provides a baseline for applying deep CNNs to other remote sensing tasks.
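A minimal sketch of the transfer strategy the abstract evaluates: use a pretrained deep CNN as a fixed feature extractor for remote-scene images. ResNet-18 here is just a convenient stand-in, not necessarily one of the five networks the paper uses.

```python
import torch
import torchvision.models as models

# Load an ImageNet-pretrained backbone and drop its classification head,
# so the network outputs deep features rather than ImageNet logits.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    scenes = torch.randn(4, 3, 224, 224)  # a batch of remote-scene images
    feats = backbone(scenes)              # one 512-D deep feature per image
print(feats.shape)  # -> (4, 512)
# These features can then be used without labels (e.g., clustering) or fed
# to a simple classifier such as an SVM in the supervised setting.
```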