Most state-of-the-art Convolutional Neural Networks (CNNs) are bulky and cannot be deployed on resource-constrained edge devices. To leverage the exceptional generalizability of CNNs on edge devices, they must be made efficient in terms of memory usage, model size, and power consumption, while maintaining acceptable performance. Neural architecture search (NAS) is a recent approach for developing efficient, edge-deployable CNNs. However, CNNs used for classification, even when developed via NAS, often contain large fully-connected (FC) layers with thousands of parameters, contributing to the bulkiness of CNNs. Recent works have shown that FC layers can be compressed using tensor processing methods, with minimal loss in performance, if any. In this work, for the first time in the literature, we leverage tensor methods in the NAS framework to discover efficient CNNs. Specifically, we employ tensor contraction layers (TCLs) to compress fully connected layers in the NAS framework and control the trade-off between compressibility and classification performance by handcrafting the ranks of the TCLs. Additionally, we modify the NAS procedure to incorporate automatic TCL rank search in an end-to-end fashion, without human intervention. Our numerical studies on a wide variety of datasets, including CIFAR-10, CIFAR-100, and Imagenette (a subset of ImageNet), demonstrate the superior performance of the proposed method in the automatic discovery of CNNs whose model sizes are manyfold smaller than those of other cutting-edge mobile CNNs, while maintaining similar classification performance.
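To make the compression idea concrete, the following is a minimal numpy sketch of a tensor contraction layer: instead of flattening an activation tensor and applying a large FC layer, each non-batch mode is contracted with a small factor matrix whose rank controls the trade-off between size and expressiveness. The function name `tensor_contraction_layer` and the shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tensor_contraction_layer(x, factors):
    """Contract each non-batch mode of x with a low-rank factor matrix.

    x: activation tensor of shape (batch, d1, ..., dK).
    factors: list of K matrices, the k-th of shape (r_k, d_k).
    Returns a compressed core of shape (batch, r1, ..., rK).
    """
    out = x
    for k, w in enumerate(factors, start=1):
        # Contract mode k (size d_k) with the factor, then restore axis order.
        out = np.moveaxis(np.tensordot(out, w, axes=([k], [1])), -1, k)
    return out
```

A TCL with factors of shapes (r_k, d_k) holds only sum_k r_k * d_k weights, whereas an FC layer mapping the flattened d1*...*dK input to an r1*...*rK output would hold the product of both, which is typically orders of magnitude larger.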
Target detection is an important problem in remote sensing, with crucial applications in law enforcement, military and security surveillance, search-and-rescue operations, and air traffic control, among others. Owing to the recently increased availability of computational resources, deep-learning-based methods have demonstrated state-of-the-art performance in target detection from unimodal aerial imagery. In addition, owing to the availability of remote-sensing data from various imaging modalities, such as RGB, infrared, hyper-spectral, multi-spectral, synthetic aperture radar, and lidar, researchers have focused on leveraging the complementary information offered by these modalities. Over the past few years, deep-learning methods have demonstrated enhanced performance using multi-modal data. In this work, we propose a method for vehicle detection from multi-modal aerial imagery, by means of a modified YOLOv3 deep neural network that conducts mid-level fusion. To the best of our knowledge, the proposed mid-level fusion architecture is the first of its kind to be used for vehicle detection from multi-modal aerial imagery using a hierarchical object-detection network. Our experimental studies corroborate the advantages of the proposed method.
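As a rough illustration of mid-level fusion, the sketch below stacks feature maps from two modality-specific branches along the channel axis and mixes them with a 1x1 convolution (implemented as a per-pixel matrix product). The function `fuse_mid_level` and the RGB/IR branch names are illustrative assumptions, not the modified YOLOv3 architecture itself.

```python
import numpy as np

def fuse_mid_level(feat_rgb, feat_ir, w_fuse):
    """Channel-wise concatenation followed by a 1x1 convolution.

    feat_rgb: (C_rgb, H, W) feature map from the RGB branch.
    feat_ir:  (C_ir, H, W) feature map from the infrared branch.
    w_fuse:   (C_out, C_rgb + C_ir) 1x1-conv weights mixing stacked channels.
    Returns a fused (C_out, H, W) feature map for the shared detection head.
    """
    stacked = np.concatenate([feat_rgb, feat_ir], axis=0)  # (C_rgb+C_ir, H, W)
    c, h, w = stacked.shape
    fused = w_fuse @ stacked.reshape(c, h * w)             # mix channels per pixel
    return fused.reshape(-1, h, w)
```

Fusing at this intermediate depth lets each backbone first learn modality-specific features, while the shared layers after the fusion point exploit cross-modal correlations.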
Most commonly used classification algorithms process data in the form of vectors. At the same time, modern datasets often comprise multimodal measurements that are naturally modeled as multi-way arrays, also known as tensors. Processing multi-way data in their tensor form can enable enhanced inference and classification accuracy. Tucker decomposition is a standard method for tensor data processing; however, it has demonstrated severe sensitivity to corrupted measurements due to its L2-norm formulation. In this work, we present a selection of classification methods that employ an L1-norm-based, corruption-resistant reformulation of Tucker decomposition (L1-Tucker). Our experimental studies on multiple real datasets corroborate the corruption resistance and classification accuracy afforded by L1-Tucker.
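The core ingredient of an L1-norm Tucker reformulation is replacing each L2-norm factor computation with an L1-norm principal component, which is far less sensitive to outlying entries. The sketch below computes the rank-1 L1 principal component of a small matrix (e.g., a mode-k unfolding of a tensor) by exhaustive binary sign search; this exact search is exponential in the number of columns, so it is only a toy illustration under that small-size assumption, and the name `l1_pc` is not from the original work.

```python
import numpy as np
from itertools import product

def l1_pc(X):
    """Rank-1 L1-norm principal component of X (D x N).

    Maximizes the L1 projection norm sum(|X.T @ u|) over unit vectors u.
    The maximizer has the form X @ b / ||X @ b|| for some sign vector
    b in {-1, +1}^N, so an exhaustive search over all 2^N sign vectors
    is exact (feasible only for small N).
    """
    best, u_best = -np.inf, None
    for signs in product((-1.0, 1.0), repeat=X.shape[1]):
        v = X @ np.asarray(signs)
        norm = np.linalg.norm(v)
        if norm > best:
            best, u_best = norm, v / norm
    return u_best
```

Because the objective sums absolute projections rather than squared ones, a single grossly corrupted column cannot dominate the solution the way it does in standard (L2) PCA or Tucker.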