Vehicle detection in aerial imagery remains an open research challenge, despite several breakthroughs in the computer vision research community. Most existing state-of-the-art vehicle detection algorithms fail to consider some major factors that can strongly influence the detection task. One of these major factors is the low resolution of aerial images. Super-resolution, which learns a mapping between low-resolution (LR) images and their corresponding high-resolution (HR) counterparts, can address this problem; however, a further problem remains when detection must take place at night or in a dark environment. In such conditions, relying on RGB-based detection is itself a serious limitation, and infrared (IR) imaging becomes necessary, yet IR images may not be available for training an IR detector. To address these challenges, we propose a joint cross-modal and super-resolution framework based on the Generative Adversarial Network (GAN) for vehicle detection in aerial images. Our proposed joint network consists of two deep sub-networks. The first sub-network uses a GAN architecture to generate super-resolved (SR) images across two different domains (cross-domain translation). The second sub-network performs detection on these cross-domain translated and super-resolved images using a state-of-the-art object detector, You Only Look Once version 3 (YOLOv3). To evaluate the efficacy of our proposed model, we conduct several experiments on the publicly available Vehicle Detection in Aerial Imagery (VEDAI) dataset. We further compare our proposed network with state-of-the-art image generation methods to demonstrate the adequacy of our model.
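The abstract does not specify the training objective of the first sub-network. As a minimal sketch, assuming an SRGAN-style generator loss (a pixel-wise MSE content loss against the HR target plus an adversarial term weighted by 1e-3), the generator objective for one batch could be computed as below. For the cross-modal case, the HR target is assumed to be the corresponding image in the target (IR) domain. Images are flattened lists of intensities, `d_scores` are discriminator probabilities for the generated images, and all names are hypothetical.

```python
import math
from typing import Sequence

# Weight on the adversarial term. 1e-3 follows SRGAN and is an assumption
# here, not a value stated for this framework.
ADV_WEIGHT = 1e-3


def content_loss(sr: Sequence[float], hr: Sequence[float]) -> float:
    """Pixel-wise mean squared error between a generated image and its HR target."""
    if len(sr) != len(hr):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(sr, hr)) / len(sr)


def adversarial_loss(d_scores: Sequence[float], eps: float = 1e-12) -> float:
    """Mean of -log D(G(x)); small when the discriminator rates the generated images as real."""
    return sum(-math.log(max(s, eps)) for s in d_scores) / len(d_scores)


def generator_loss(
    sr_batch: Sequence[Sequence[float]],
    hr_batch: Sequence[Sequence[float]],
    d_scores: Sequence[float],
) -> float:
    """Batch-averaged content loss plus the weighted adversarial term."""
    content = sum(content_loss(s, h) for s, h in zip(sr_batch, hr_batch)) / len(sr_batch)
    return content + ADV_WEIGHT * adversarial_loss(d_scores)


if __name__ == "__main__":
    # Two tiny flattened 2x2 images in the target domain (e.g. HR infrared)
    # and two hypothetical generator outputs.
    hr = [[0.0, 0.5, 0.5, 1.0], [0.2, 0.2, 0.8, 0.8]]
    sr = [[0.0, 0.5, 0.5, 1.0], [0.2, 0.4, 0.8, 0.6]]
    print(generator_loss(sr, hr, d_scores=[0.5, 0.5]))
```

A pixel-wise MSE term alone tends to produce over-smoothed outputs; the adversarial term is what pushes the generator toward photo-realistic textures, which is the motivation for a GAN-based super-resolution stage.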
Images can be captured using devices operating at different light spectra. As a result, cross-domain image translation becomes a nontrivial task that requires adapting deep convolutional neural networks (DCNNs) to resolve these imaging challenges. Automatic target recognition (ATR) from infrared imagery in a real-time environment is one such difficult task. Generative Adversarial Networks (GANs) have already shown promising performance in translating image characteristics from one domain to another. In this paper, we explore the potential of the GAN architecture for cross-domain image translation. Our proposed GAN model maps images from the source domain to the target domain within a conditional GAN framework. We verify the quality of the generated images with the help of a CNN-based target classifier. The classification results on the synthetic images are comparable to those on the ground-truth images, indicating that the designed network generates realistic images.
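In a conditional GAN, the discriminator judges source/target pairs rather than target images alone. The following minimal sketch assumes a pix2pix-style objective (binary cross-entropy adversarial terms plus an L1 reconstruction term with weight λ = 100); the exact losses and networks used in this work are not stated, and `toy_discriminator` is a hypothetical logistic stand-in for a CNN discriminator, included only to show how conditioning enters through concatenation of the source and candidate images.

```python
import math
from typing import Sequence

# Weight on the L1 term; 100 follows pix2pix and is an assumption for this sketch.
L1_WEIGHT = 100.0


def toy_discriminator(
    source: Sequence[float],
    candidate: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Hypothetical stand-in for a CNN discriminator.

    Conditioning: the source image is concatenated with the candidate target
    image, so the score reflects whether the *pair* looks plausible.
    """
    pair = list(source) + list(candidate)
    if len(pair) != len(weights):
        raise ValueError("weights must match the concatenated input length")
    z = sum(w * p for w, p in zip(weights, pair)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    """-[log D(x, y) + log(1 - D(x, G(x)))] for one sample."""
    return -(math.log(max(d_real, eps)) + math.log(max(1.0 - d_fake, eps)))


def cgan_generator_loss(
    d_fake: float,
    generated: Sequence[float],
    target: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Adversarial term -log D(x, G(x)) plus lambda times the mean L1 distance to the target."""
    l1 = sum(abs(g - t) for g, t in zip(generated, target)) / len(target)
    return -math.log(max(d_fake, eps)) + L1_WEIGHT * l1


if __name__ == "__main__":
    # Flattened 1x2 "images": visible-band source, real IR target, generated IR.
    src, tgt, gen = [0.1, 0.9], [0.3, 0.7], [0.3, 0.5]
    w = [0.5, -0.5, 1.0, 1.0]  # hypothetical discriminator weights
    d_real = toy_discriminator(src, tgt, w)
    d_fake = toy_discriminator(src, gen, w)
    print(discriminator_loss(d_real, d_fake), cgan_generator_loss(d_fake, gen, tgt))
```

Because the discriminator sees the source image, the generator is penalized not only for unrealistic outputs but also for outputs that do not correspond to their particular input, which is what distinguishes a conditional GAN from an unconditional one.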
Vehicle detection in aerial imagery has become a tremendously challenging task due to the low resolution of aerial images. Super-resolution, a technique that recovers a high-resolution image from a single low-resolution image, can be an effective approach to resolving this shortcoming. Hence, our prime focus is to design a framework for detecting vehicles in super-resolved aerial images. Our proposed system can be represented as a combination of two deep sub-networks. The first sub-network uses a Generative Adversarial Network (GAN) to obtain super-resolved images. A GAN consists of two networks, a generator network and a discriminator network, and their adversarial training encourages the recovery of photo-realistic images from down-sampled images. The second sub-network consists of a deep neural network (DNN)-based object detector for detecting vehicles in super-resolved images. In our architecture, the Single Shot MultiBox Detector (SSD) is used for vehicle detection. The SSD produces a fixed-size collection of bounding boxes, together with scores for the presence of each object class instance in those boxes. It then applies a non-maximum suppression step to produce the final detections. In our algorithm, the deep SSD detector is trained on the predicted super-resolved images, and its performance is then compared with an SSD detector trained only on the low-resolution images. Finally, we compare the performance of our proposed SSD detector trained on super-resolved images with an SSD trained only on the original high-resolution images.
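The non-maximum suppression step can be sketched as a greedy procedure: sort the candidate boxes by score, keep the highest-scoring box, and discard any remaining box whose intersection-over-union (IoU) with an already-kept box exceeds a threshold. This is a minimal single-class sketch (one "vehicle" class); the `(x1, y1, x2, y2)` box format, the default threshold of 0.45, and the function names are assumptions for illustration, not part of any particular SSD implementation.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.45) -> List[int]:
    """Greedy non-maximum suppression; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # A box survives only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two overlapping detections of the same vehicle and one separate vehicle.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

Here the first two boxes overlap with an IoU of 81/119 ≈ 0.68, so the lower-scoring one is suppressed and boxes 0 and 2 are kept. SSD applies this step per class; with several object classes, the same function would be called once per class on that class's boxes and scores.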