To incorporate object locations into a multi-target detection model, we assume that the model cannot efficiently learn close duplicates. We therefore use a region-based approach that localizes the targets using more candidate object locations than there are ground-truth locations. The proposed model learns a similarity metric with respect to the ground-truth locations that is robust enough (low false positives) to cope with varying image conditions, small aerial target sizes, and few training samples. We report preliminary results on how transfer learning of meta-data affects localization accuracy for small aerial targets. We rank quality by Intersection-over-Union (IoU) for region segmentation models on the aerial ground-truth data, using pre-trained models from ImageNet, AlexNet, and CIFAR-10 and initialization with three aerial datasets, such as the XView2 satellite imagery.
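Ranking region quality by IoU comes down to a simple overlap ratio. The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max). The helper names box_iou and rank_by_iou, and the rule of ranking each proposal by its best match, are illustrative assumptions rather than the procedure used in this work. For segmentation masks, the same ratio would be computed over pixel sets instead of box areas.

```python
# Hypothetical helpers: boxes are (x_min, y_min, x_max, y_max) tuples.

def box_iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rank_by_iou(proposals, ground_truth):
    """Sort candidate regions by their best IoU against any ground-truth box."""
    scored = [(max(box_iou(p, g) for g in ground_truth), p) for p in proposals]
    return sorted(scored, key=lambda item: item[0], reverse=True)


ranked = rank_by_iou(
    proposals=[(10, 10, 12, 12), (0, 0, 1, 1), (0, 0, 2, 2)],
    ground_truth=[(0, 0, 2, 2)],
)
for score, box in ranked:
    print(f"{box}: IoU = {score:.2f}")
```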
Satellite and aerial imagery has become one of the most attractive sources of information for government agencies and commercial companies in recent times. Image quality is very important for extracting valuable information from image details, especially for high-value targets. In addition to useful information, satellite images may contain unwanted signals, called noise, for several reasons such as heat-generated electrons, faulty sensors, vibration, and clouds. Several image denoising algorithms exist to reduce the effects of noise so that details can be seen and meaningful information gathered. Many traditional denoising methods can filter noise, but at the same time they blur image details. This paper presents a convolutional neural network (CNN) based image denoising method that removes unwanted noise while retaining image detail. The proposed method employs a residual learning strategy, meaning that the CNN learns to estimate the residual image. A residual image is the difference between a pristine image and a distorted copy of it, and contains information about the image distortion. Extensive experiments demonstrate that the proposed CNN denoising model is not only highly effective in image denoising tasks but can also be implemented efficiently by taking advantage of GPU computing. The proposed method is applied to both aerial and satellite imagery, and its effectiveness is measured using Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index Metric (SSIM), and the Naturalness Image Quality Evaluator (NIQE), also called a perceptual quality index.
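The residual-learning formulation can be made concrete with a few array operations. This NumPy sketch uses synthetic data and takes the residual as the noisy input minus the clean image, which is a common convention in residual denoising CNNs. The sign convention, the function names, the synthetic scene, and the 3x3 box filter standing in for a trained network are all assumptions for illustration. A network would be trained to minimize residual_loss; at inference the estimated residual is subtracted from the input.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB (inf when the images are identical)."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def residual_target(clean, noisy):
    """Residual-learning target: the distortion, assumed here to be noisy - clean."""
    return noisy - clean


def residual_loss(predicted_residual, clean, noisy):
    """Mean squared error between the CNN's residual estimate and the true residual."""
    return np.mean((predicted_residual - residual_target(clean, noisy)) ** 2)


def denoise(noisy, predicted_residual):
    """Recover the image by subtracting the estimated residual from the input."""
    return noisy - predicted_residual


rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64] / 21.0
clean = 0.5 + 0.3 * np.sin(xx) * np.cos(yy)            # smooth synthetic "scene"
noisy = clean + rng.normal(0.0, 0.05, size=clean.shape)

# A trained CNN would supply predicted_residual; a crude stand-in
# (input minus a 3x3 box-filtered copy) shows how the pieces fit together.
padded = np.pad(noisy, 1, mode="edge")
smoothed = sum(padded[i:i + 64, j:j + 64] for i in range(3) for j in range(3)) / 9.0
predicted_residual = noisy - smoothed

print("noisy PSNR:   ", psnr(clean, noisy))
print("denoised PSNR:", psnr(clean, denoise(noisy, predicted_residual)))
print("residual loss:", residual_loss(predicted_residual, clean, noisy))
```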
The emergence of Generative Adversarial Network (GAN)-based single-image super-resolution (SISR) has allowed for finer textures in super-resolved images, making them appear realistic to humans. However, GAN-based models may depend on extensive high-quality data and are known to be very costly and unstable to train. Variational Autoencoders (VAEs), on the other hand, have inherent mathematical properties and are relatively cheap and stable to train, but they produce blurry images, which has prevented their use for super-resolution. In this paper, we propose a first-of-its-kind SISR method that takes advantage of a self-evaluating Variational Autoencoder (IntroVAE). Our network, called SRVAE, judges the quality of generated high-resolution (HR) images against the target images in an adversarial manner, which allows for image generation with high perceptual quality. First, the encoder and decoder of our IntroVAE-based method learn the manifold of HR images. In parallel, another encoder and decoder simultaneously learn to reconstruct the low-resolution (LR) images. Next, the reconstructed LR images are fed to the encoder of the HR network to learn a mapping from LR images to their corresponding HR versions. Using the encoder as a discriminator allows SRVAE to be a fast single-stream framework that performs super-resolution by generating photo-realistic images. Moreover, SRVAE has the same training stability and "nice" latent manifold structure as VAEs, while playing a max-min adversarial game between the generator and the encoder like GANs. Our experiments show that our super-resolved images are comparable to those produced by state-of-the-art GAN-based super-resolution.
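The adversarial game, in which the encoder doubles as the discriminator, can be sketched through its loss terms. The snippet follows the general form of the IntroVAE objective: a KL regularizer toward the prior, a hinge with margin m on latents of generated images, and a reconstruction term. The original IntroVAE objective sums the hinge over reconstructions and prior samples; here a single generated batch is used for brevity. The margin and weight values, the function names, and the random latent statistics that replace the actual networks are assumptions, not SRVAE's settings.

```python
import numpy as np


def kl_to_prior(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) per sample, summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps, the usual VAE sampling trick."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def encoder_loss(kl_real, kl_fake, rec_loss, margin=10.0, alpha=0.25, beta=1.0):
    """IntroVAE-style encoder (discriminator) objective.

    Pulls latents of real HR images toward the prior and pushes latents of
    generated images away from it until their KL exceeds `margin` (hinge).
    """
    hinge = np.maximum(0.0, margin - kl_fake)
    return np.mean(kl_real) + alpha * np.mean(hinge) + beta * rec_loss


def generator_loss(kl_fake, rec_loss, alpha=0.25, beta=1.0):
    """Decoder (generator) objective: make generated images encode like real ones."""
    return alpha * np.mean(kl_fake) + beta * rec_loss


rng = np.random.default_rng(0)
mu, logvar = rng.normal(size=(4, 8)), rng.normal(scale=0.1, size=(4, 8))
z = reparameterize(mu, logvar, rng)
kl_real = kl_to_prior(mu, logvar)
kl_fake = kl_to_prior(mu + 2.0, logvar)  # stand-in for latents of generated images
print(encoder_loss(kl_real, kl_fake, rec_loss=0.5), generator_loss(kl_fake, rec_loss=0.5))
```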
Super-resolution is the process of creating high-resolution (HR) images from low-resolution (LR) images. Single-Image Super-Resolution (SISR) is challenging because high-frequency image content typically cannot be recovered from the LR image, and this missing high-frequency information limits the quality of the HR image. Furthermore, SISR is an ill-posed problem because a single LR image can correspond to several possible HR images. Numerous techniques have been proposed to address this issue, and deep learning-based methods have recently become popular. Convolutional Neural Network (CNN) approaches to deep learning have shown great success in numerous computer vision tasks, so it is worthwhile to explore CNN-based approaches to this challenging problem. This paper presents a deep learning-based super-resolution (DLSR) approach that estimates an HR image from its LR counterpart by learning the mapping between them. This mapping is possible because LR and HR images have similar image content and differ primarily in high-frequency details. In addition, DLSR uses a residual learning strategy in which the network learns to estimate a residual image. DLSR is applied to both aerial and satellite imagery, and the resulting estimates are compared against traditional methods using metrics such as Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index Metric (SSIM), and the Naturalness Image Quality Evaluator (NIQE), also called a perceptual quality index. The results show that DLSR outperforms the traditional approaches.
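The LR-to-HR mapping with a residual target can be illustrated by how training pairs are typically built. This sketch assumes block-average downsampling and nearest-neighbour upsampling so that it needs only NumPy. Practical SR pipelines usually interpolate with bicubic resampling. DLSR's actual degradation model and network are not specified here, and the function names are hypothetical.

```python
import numpy as np


def downsample(hr, scale):
    """Block-average an HR image by an integer factor (a simple stand-in for blur + decimation)."""
    h, w = hr.shape
    return hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))


def upsample_nearest(lr, scale):
    """Nearest-neighbour upsampling back to the HR grid."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)


def make_training_pair(hr, scale=2):
    """Return (network input, residual target) for residual super-resolution."""
    h, w = hr.shape
    hr = hr[: h - h % scale, : w - w % scale]  # crop so the size divides evenly
    interpolated = upsample_nearest(downsample(hr, scale), scale)
    residual = hr - interpolated  # the high-frequency detail the network must predict
    return interpolated, residual


hr = np.random.default_rng(1).uniform(size=(65, 64))
net_input, target = make_training_pair(hr, scale=2)
# At inference, the SR estimate would be net_input + network(net_input).
print(net_input.shape, target.shape, np.abs(target).mean())
```

For a perfectly flat image the residual is zero, which reflects the point above: the network only has to supply the high-frequency content that interpolation cannot recover.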
This paper studies data fusion applications for traditional, spatial, and aerial video streams, in which data from multiple sources are processed using co-occurrence information and a common semantic metric. Using co-occurrence information to infer semantic relations between measurements avoids the need for external information such as labels. Many current Vector Space Models (VSMs) do not preserve co-occurrence information, which leads to a less useful similarity metric. As part of the learned metric embedding, we propose a proximity matrix whose entries represent the relations between the co-occurrence frequencies observed in the input sets. First, we show an implicit spatial sensor proximity matrix calculation using Jaccard similarity for an array of sensor measurements and compare it with state-of-the-art kernel PCA learned from a feature-space proximity representation, which relates to a k-radius ball of nearest neighbors. Finally, we extend the class co-occurrence boosting of our unsupervised model by reusing pre-trained multi-modal models.
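A Jaccard-based proximity matrix for an array of sensors can be computed directly from co-occurrence counts. The sketch assumes that each sensor's measurements have already been reduced to binary event indicators per time window. That encoding, the function name, and the convention of assigning zero similarity to a pair of silent sensors are assumptions rather than details from this work.

```python
import numpy as np


def jaccard_proximity(activations):
    """Pairwise Jaccard similarity (intersection over union) between sensors.

    activations: (n_sensors, n_windows) array; entry [i, t] is truthy when
    sensor i registered an event in time window t (a hypothetical encoding).
    """
    a = np.asarray(activations, dtype=bool).astype(float)
    co_occurrence = a @ a.T                  # intersection size for every sensor pair
    counts = np.diag(co_occurrence)          # number of active windows per sensor
    union = counts[:, None] + counts[None, :] - co_occurrence
    # Pairs with an empty union (both sensors silent) are assigned similarity 0.
    return np.divide(co_occurrence, union, out=np.zeros_like(union), where=union > 0)


events = np.array([
    [1, 1, 0, 0, 1],   # sensor 0
    [1, 0, 1, 0, 1],   # sensor 1
    [0, 0, 0, 1, 0],   # sensor 2
])
print(np.round(jaccard_proximity(events), 3))
```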
Generative Adversarial Networks (GANs) are one of the most popular machine learning algorithms developed in recent times; they are a class of neural networks used in unsupervised machine learning. The advantage of unsupervised approaches such as GANs is that they do not need a large amount of labeled data, which is costly and time-consuming to obtain. GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution, and classification. In this work, GANs are used to solve the single-image super-resolution problem. In the literature, this approach is referred to as super-resolution GAN (SRGAN); it employs a perceptual loss function consisting of an adversarial loss and a content loss. The adversarial loss pushes the solution toward the natural image manifold using a discriminator network trained to differentiate between super-resolved images and original photo-realistic images, while the content loss is motivated by perceptual similarity rather than similarity in pixel space. This paper presents an implementation of SRGAN using a deep convolutional network, applied to both aerial and satellite imagery of aircraft. The resulting SRGAN estimates are compared against traditional super-resolution methods using peak signal-to-noise ratio (PSNR) and the structural similarity index metric (SSIM). The PSNR and SSIM of the SRGAN estimates are similar to those of traditional methods such as bicubic interpolation, but the traditional methods often lack high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution.
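The perceptual loss described above combines the two terms with a small weight on the adversarial part. This sketch uses the 10^-3 weighting reported in the original SRGAN paper. The feature maps and discriminator outputs are random placeholders for a pre-trained feature extractor (e.g. a VGG layer) and a trained discriminator, so this is an illustration of the loss structure, not the implementation used in this work.

```python
import numpy as np


def content_loss(features_sr, features_hr):
    """MSE between feature maps of the SR and HR images (e.g. from a pre-trained VGG layer)."""
    return np.mean((features_sr - features_hr) ** 2)


def adversarial_loss(d_of_sr, eps=1e-12):
    """Generator's adversarial term: sum over the batch of -log D(G(I_LR))."""
    return np.sum(-np.log(np.clip(d_of_sr, eps, 1.0)))


def perceptual_loss(features_sr, features_hr, d_of_sr, adv_weight=1e-3):
    """SRGAN perceptual loss: content loss plus a weighted adversarial loss."""
    return content_loss(features_sr, features_hr) + adv_weight * adversarial_loss(d_of_sr)


rng = np.random.default_rng(0)
feat_hr = rng.normal(size=(2, 64, 16, 16))              # placeholder feature maps
feat_sr = feat_hr + rng.normal(scale=0.1, size=feat_hr.shape)
d_scores = np.array([0.3, 0.6])                         # placeholder discriminator outputs
print(perceptual_loss(feat_sr, feat_hr, d_scores))
```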
The ability to accurately detect a target of interest in hyperspectral imagery (HSI) depends largely on
the spatial and spectral resolution. While hyperspectral imaging provides high spectral resolution, the spatial
resolution depends mostly on the optics and the distance from the target. Often the target of interest
does not occupy a full pixel and is thus concealed within a pixel, i.e., the target signature is mixed with
other constituent material signatures within the field of view of that pixel. Extracting the spectral signatures
of constituent materials from a mixed pixel can assist in the detection of the target of interest. Hyperspectral
unmixing is the process of identifying the constituent materials and estimating their corresponding abundances from the
mixture. This paper presents a framework based on non-negative matrix factorization (NMF) that is
used to extract the spectral signature and fractional abundance of human skin in a scene. The NMF technique
is employed in a supervised manner such that the spectral bases of each constituent are computed first, and then
these bases are applied to the mixed pixel. Experiments using synthetic and real data demonstrate that the
proposed algorithm provides an effective supervised technique for hyperspectral unmixing of skin signatures.
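The two-stage supervised procedure can be sketched with standard multiplicative-update NMF. First, each material's bases are learned from its own pure spectra and stacked into a fixed basis matrix. Then only the abundances are updated for the mixed pixel. The Lee-Seung update rule for the Frobenius loss, the function names, the synthetic spectra, and the unit-sum normalization of bases and abundances are assumptions. The exact NMF variant and constraints used in this work are not given here.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF (V ~ W @ H) with Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(V.shape[0], rank))
    H = rng.uniform(0.1, 1.0, size=(rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def learn_bases(training_spectra, rank_per_material=1):
    """Step 1: factor each material's pure spectra (bands x samples) on its own."""
    W = np.hstack([nmf(V, rank_per_material)[0] for V in training_spectra])
    return W / W.sum(axis=0)  # unit-sum bases so abundances are comparable (assumption)


def unmix(mixed, W, n_iter=5000, eps=1e-9):
    """Step 2: non-negative abundances H for mixed pixels (bands x pixels), W held fixed."""
    H = np.full((W.shape[1], mixed.shape[1]), 1.0 / W.shape[1])
    WtV, WtW = W.T @ mixed, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


rng = np.random.default_rng(42)
endmembers = rng.uniform(0.05, 1.0, size=(50, 3))          # three synthetic material spectra
endmembers /= endmembers.sum(axis=0)
training = [endmembers[:, [k]] * rng.uniform(0.8, 1.2, size=(1, 20)) for k in range(3)]
bases = learn_bases(training)

true_abundance = np.array([[0.2], [0.5], [0.3]])
pixel = endmembers @ true_abundance                         # one mixed pixel, 50 bands
abundance = unmix(pixel, bases)
fractions = abundance / abundance.sum(axis=0)               # sum-to-one view (assumption)
print(np.round(fractions.ravel(), 3))
```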
Cadence analysis has been the main focus for discriminating between the seismic signatures of people and animals.
However, cadence analysis fails when multiple targets are generating the signatures. We analyze the mechanism
of human walking and the signature generated by a human walker, and compare it with the signature generated
by a quadruped. We develop a Fourier-based analysis to differentiate human signatures from animal
signatures. We extract a set of basis vectors to represent the human and animal signatures using non-negative
matrix factorization, and use them to separate and classify both types of target. Grazing animals such as deer and cows
often produce sporadic signals as they move from patch to patch of grass, and these signals must be characterized
to differentiate them from the signatures generated by a horse walking steadily along a path.
These differences in the signatures are used to develop a robust algorithm that distinguishes the signatures of
animals from humans. The algorithm is tested on real data collected in a remote area.
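A Fourier analysis of footfall rhythm can be sketched by taking the spectrum of the signal envelope and locating its dominant low-frequency peak. The envelope method, the smoothing window, the 0.5 to 5 Hz search band, and the synthetic test signal are assumptions for illustration. The paper's actual features and its NMF-based separation step are not reproduced. A single dominant peak of this kind is also exactly what becomes unreliable when several walkers overlap.

```python
import numpy as np


def cadence_spectrum(signal, fs, smooth_s=0.05):
    """Spectrum of the smoothed, rectified signal envelope (footfall rhythm)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    win = max(1, int(smooth_s * fs))
    envelope = np.convolve(np.abs(x), np.ones(win) / win, mode="same")
    envelope = envelope - envelope.mean()
    spectrum = np.abs(np.fft.rfft(envelope * np.hanning(envelope.size)))
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    return freqs, spectrum


def dominant_cadence(signal, fs, band=(0.5, 5.0)):
    """Frequency (Hz) of the strongest envelope component inside `band`."""
    freqs, spectrum = cadence_spectrum(signal, fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(spectrum[mask])]


fs = 500.0
t = np.arange(0.0, 20.0, 1.0 / fs)
# Synthetic stand-in: a 40 Hz ground response whose amplitude pulses 1.8 times per second.
footsteps = (1.0 + np.cos(2 * np.pi * 1.8 * t)) * np.sin(2 * np.pi * 40.0 * t)
print(dominant_cadence(footsteps, fs))
```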
Systems based on seismic footstep detection are important for perimeter protection and other homeland security
applications. This paper reports seismic footstep signal separation for a walking horse and a
walking human. The well-known Independent Component Analysis (ICA) approach is employed to accomplish
this task. ICA techniques have become widely used in audio analysis and source separation. The concept of
ICA may be seen as an extension of principal component analysis (PCA), which can only impose
independence up to second order and consequently defines directions that are orthogonal. ICA techniques can also be
used in conjunction with a classification method to achieve a high percentage of correct classification and reduce
false alarms. In this paper, an ICA-based algorithm is developed and implemented on seismic data of human
and horse footsteps. The performance of this method is very promising, as demonstrated by the experimental results.
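The separation step can be illustrated with a compact FastICA, which is one common ICA algorithm. The mixtures are first whitened, which is as far as second-order, PCA-style decorrelation goes. They are then rotated to maximize non-Gaussianity. The algorithm choice, the tanh contrast, and the synthetic two-sensor signals are assumptions; they are neither the seismic data nor the exact ICA variant used in this work. For practical use, scikit-learn's sklearn.decomposition.FastICA provides a maintained implementation.

```python
import numpy as np


def whiten(X):
    """Center the mixtures (rows = sensors) and transform them to unit covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ Xc


def sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, keeping the unmixing rows orthonormal."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W


def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh (log-cosh) contrast; returns source estimates."""
    Z = whiten(X)
    n, m = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update w <- E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once.
        W_new = sym_decorrelate(G @ Z.T / m - np.diag((1.0 - G ** 2).mean(axis=1)) @ W)
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z


rng = np.random.default_rng(7)
t = np.linspace(0.0, 20.0, 4000)
sources = np.vstack([
    np.sin(2 * np.pi * 0.5 * t),        # stand-in for a periodic footstep pattern
    rng.laplace(size=t.size),           # stand-in for an impulsive footstep pattern
])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])   # two sensors, mixing unknown to ICA
estimated = fast_ica(mixing @ sources)
```

ICA recovers sources only up to order and sign, so comparisons with the true signals should use absolute correlation.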
This paper describes a new wavelet-based anomaly detection technique for a Forward-Looking Infrared (FLIR)
sensor consisting of a long-wave (LW) and a mid-wave (MW) sensor. The proposed approach, called the wavelet-RX
algorithm, combines a two-dimensional (2-D) wavelet transform with the well-known multivariate
anomaly detector called the RX algorithm. In our wavelet-RX algorithm, a 2-D wavelet transform is first applied
to decompose the input image into uniform subbands. A number of significant (high-energy) subbands
are concatenated to form a subband-image cube. The RX algorithm is then applied to each subband-image
cube obtained from the wavelet decomposition of the LW and MW sensor data separately. Experimental results
are presented for the proposed wavelet-RX algorithm and the classical CFAR algorithm for detecting anomalies (targets)
in a single broadband FLIR (LW or MW) sensor. The results show that the proposed wavelet-RX algorithm
outperforms the classical CFAR detector for both LW and MW FLIR sensor data.
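The wavelet-RX pipeline can be sketched end to end in three steps: a uniform subband decomposition, selection of the high-energy subbands into a cube, and an RX score for each pixel vector. Several choices here are assumptions for illustration: the Haar wavelet, two packet-style decomposition levels, the number of kept subbands (n_keep), the global rather than local-window RX statistic, and the synthetic frame. Following the description above, the detector would be run separately on LW and MW data. The resulting scores lie on the subband grid, which is downsampled by 2^levels.

```python
import numpy as np


def haar_dwt2(img):
    """One-level orthonormal 2-D Haar transform: four half-size subbands."""
    x = np.asarray(img, dtype=float)
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return [(a + b + c + d) / 2, (a + b - c - d) / 2, (a - b + c - d) / 2, (a - b - c + d) / 2]


def uniform_subbands(img, levels=2):
    """Split every subband again at each level (a uniform, packet-style decomposition)."""
    bands = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_dwt2(band)]
    return bands


def subband_cube(img, levels=2, n_keep=6):
    """Stack the n_keep highest-energy subbands into an (h, w, n_keep) cube."""
    bands = uniform_subbands(img, levels)
    order = np.argsort([np.sum(band ** 2) for band in bands])[::-1]
    return np.stack([bands[i] for i in order[:n_keep]], axis=-1)


def rx_scores(cube):
    """Global RX detector: Mahalanobis distance of each pixel vector from the scene mean."""
    h, w, k = cube.shape
    X = cube.reshape(-1, k)
    D = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", D, inv_cov, D).reshape(h, w)


rng = np.random.default_rng(3)
lw_frame = rng.normal(0.0, 1.0, size=(64, 64))   # synthetic LW frame (background clutter)
lw_frame[32:36, 32:36] += 15.0                    # small hot target
scores = rx_scores(subband_cube(lw_frame))
print(np.unravel_index(np.argmax(scores), scores.shape))
```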