Recently, intelligent machine agents, such as deep neural networks (DNNs), have shown unparalleled capabilities in recognizing visual patterns, objects, and semantic activities/events embedded in real-world images and videos. Hence, there has been an increasing need to deploy DNNs to the battlefield to provide the Soldier with real-time situational understanding by capturing a holistic view of the battlespace. Soldiers engaged in tactical operations can greatly benefit from leveraging advanced at-the-point-of-need data analytics running on multimodal and heterogeneous platforms in distributed and constrained network environments. The proposed work aims to decompose DNNs and distribute them over edge nodes in such a way that the trade-off between the resources available in the constrained network and recognition performance is optimized. In this work, we decompose DNNs into two stages: an initial stage running on an edge device and the remaining portion running on an edge cloud. To divide DNNs into two separate stages effectively and efficiently, we will rigorously analyze multiple widely used DNN architectures with respect to their memory size and FLOPs (floating-point operations) per layer. Based on these analyses, we will develop advanced splitting strategies for DNNs to handle various network constraints.
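The split-point selection described above can be sketched as a simple search over per-layer profiles. The code below is an illustrative assumption, not the paper's method: layer names, FLOPs, activation sizes, and throughput figures are all hypothetical placeholders, and the latency model (edge compute + activation transfer + cloud compute) is one plausible way to trade constrained-network resources against where the DNN is cut.

```python
# Hypothetical sketch: choosing a DNN split point from per-layer profiles.
# All layer profiles and throughput numbers below are illustrative, not measured.

def best_split(layers, edge_flops_per_s, cloud_flops_per_s, link_bytes_per_s):
    """Return the split index minimizing estimated end-to-end latency.

    layers: list of (name, flops, activation_bytes) tuples. Splitting after
    index i runs layers[0..i] on the edge device and the rest on the edge
    cloud, transmitting layer i's activation over the constrained link.
    """
    best_i, best_latency = None, float("inf")
    for i in range(len(layers)):
        edge = sum(l[1] for l in layers[: i + 1]) / edge_flops_per_s
        cloud = sum(l[1] for l in layers[i + 1 :]) / cloud_flops_per_s
        transfer = layers[i][2] / link_bytes_per_s
        latency = edge + transfer + cloud
        if latency < best_latency:
            best_i, best_latency = i, latency
    return best_i, best_latency

# Illustrative per-layer profile: (name, FLOPs, activation bytes).
layers = [
    ("conv1", 1e8, 8e5),
    ("conv2", 2e8, 4e5),
    ("conv3", 2e8, 1e5),
    ("fc",    5e7, 4e3),
]
idx, lat = best_split(layers, edge_flops_per_s=5e8,
                      cloud_flops_per_s=1e11, link_bytes_per_s=1e6)
```

With these made-up numbers the search favors an interior split: early layers have large activations that are expensive to transmit, while pushing everything to the edge wastes its slower compute.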
Terror attacks are often targeted towards civilians gathered in one location (e.g., the Boston Marathon bombing). Distinguishing such 'malicious' scenes from 'normal' ones, which are semantically different, is a difficult task, as both scenes contain large groups of people with high visual similarity. To overcome this difficulty, previous methods exploited various kinds of contextual information, such as language-driven keywords or relevant objects. Although useful, these require additional human effort or datasets. In this paper, we show that using more sophisticated and deeper Convolutional Neural Networks (CNNs) can achieve better classification accuracy even without using any additional information outside the image domain. We have conducted a comparative study in which we train and compare seven different CNN architectures (AlexNet, VGG-M, VGG16, GoogLeNet, ResNet-50, ResNet-101, and ResNet-152). Based on the experimental analyses, we found that deeper networks typically show better accuracy, and that GoogLeNet is the most favorable among the seven architectures for the task of malicious event classification.
Efficient and accurate real-time perception systems are critical for Unmanned Aerial Vehicle (UAV) applications that aim to provide enhanced situational awareness to users. Specifically, object recognition is a crucial element for surveillance and reconnaissance missions, since it provides fundamental semantic information about the aerial scene. In this study, we describe the development and implementation of a perception framework on an embedded computer vision platform, mounted on a hexacopter, for real-time object detection. The framework includes a camera driver and a deep neural network based object detection module, and has distributed computing capabilities between the aerial platform and the corresponding ground station. Preliminary real-time object detections using YOLO are performed onboard the UAV, and a sequence of images is streamed to the base station, where an advanced computer vision algorithm, referred to as Multi-Expert Region-based CNN (ME-RCNN), is leveraged to provide enhanced and fine-grained analytics on the aerial video feeds. Since annotated aerial imagery in the UAV domain is hard to obtain and not routinely available, we use a combination of aerial data and air-to-ground synthetic images, such as vehicles, generated by video gaming engines for training the neural network. Through this study, we quantify the level of improvement from the use of the synthetic dataset and the efficacy of using advanced object detection algorithms.
Proc. SPIE. 9836, Micro- and Nanotechnology Sensors, Systems, and Applications VIII
KEYWORDS: Convolutional neural networks, Electroencephalography, Data modeling, Visualization, Signal to noise ratio, Visual process modeling, Machine vision, Performance modeling, Convolution, Data fusion
Traditionally, Brain-Computer Interfaces (BCIs) have been explored as a means to return function to paralyzed or otherwise debilitated individuals. An emerging use for BCIs is human-autonomy sensor fusion, where physiological data from healthy subjects is combined with machine-generated information to enhance the capabilities of artificial systems. While human-autonomy fusion of physiological data and computer vision has been shown to improve classification during visual search tasks, to date these approaches have relied on separately trained classification models for each modality. We aim to improve human-autonomy classification performance by developing a single framework that builds codependent models of human electroencephalography (EEG) and image data to generate fused target estimates. As a first step, we developed a novel convolutional neural network (CNN) architecture and applied it to EEG recordings of subjects classifying target and non-target image presentations during a rapid serial visual presentation (RSVP) image triage task. The low signal-to-noise ratio (SNR) of EEG inherently limits the accuracy of single-trial classification, and when combined with the high dimensionality of EEG recordings, extremely large training sets are needed to prevent overfitting and achieve accurate classification from raw EEG data. This paper explores a new deep CNN architecture for generalized multi-class, single-trial EEG classification across subjects. We compare classification performance of the generalized CNN architecture trained across all subjects to the individualized XDAWN, HDCA, and CSP classifiers, which are trained and tested on single subjects. Preliminary results show that our CNN matches or slightly exceeds the performance of the other classifiers despite being trained across subjects.
A novel approach for the fusion of heterogeneous object classification methods is proposed. In order to effectively integrate the outputs of multiple classifiers, the level of ambiguity in each individual classification score is estimated using the precision/recall relationship of the corresponding classifier. The main contribution of the proposed work is a novel fusion method, referred to as Dynamic Belief Fusion (DBF), which dynamically assigns probabilities to hypotheses (target, non-target, and an intermediate state (target or non-target)) based on confidence levels in the classification results, conditioned on the prior performance of the individual classifiers. In DBF, a joint basic probability assignment, obtained by optimally fusing information from all classifiers, is determined by Dempster's combination rule and is easily reduced to a single fused classification score. Experiments on the RSVP dataset demonstrate that the recognition accuracy of DBF is considerably greater than that of conventional naive Bayesian fusion as well as of the individual classifiers used for the fusion.
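The combination step described above can be sketched for a two-classifier case. The sketch below assumes the three-hypothesis frame named in the abstract (target, non-target, and the intermediate target-or-non-target state) and applies the standard Dempster's combination rule; the specific basic probability assignments (BPAs) are made-up examples, since the paper derives them from each classifier's precision/recall relationship, which is not reproduced here.

```python
# Hedged sketch of Dempster's combination over the frame {target, non-target}.
# Hypotheses: "T" = target, "N" = non-target, "TN" = intermediate state
# (target or non-target). The example BPAs are illustrative placeholders,
# not the paper's precision/recall-derived assignments.

def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule."""
    # Conflict mass K: total mass on pairs with empty intersection.
    k = m1["T"] * m2["N"] + m1["N"] * m2["T"]
    norm = 1.0 - k
    return {
        # T results from T∩T, T∩TN, and TN∩T.
        "T": (m1["T"] * m2["T"] + m1["T"] * m2["TN"] + m1["TN"] * m2["T"]) / norm,
        # N results from N∩N, N∩TN, and TN∩N.
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["TN"] + m1["TN"] * m2["N"]) / norm,
        # Only TN∩TN leaves the intermediate state.
        "TN": (m1["TN"] * m2["TN"]) / norm,
    }

# Example: one confident classifier, one uncertain classifier.
m_a = {"T": 0.7, "N": 0.1, "TN": 0.2}
m_b = {"T": 0.4, "N": 0.2, "TN": 0.4}
fused = dempster_combine(m_a, m_b)
score = fused["T"]  # reduced to a single fused classification score
```

Note how mass placed on the intermediate state by an ambiguous classifier flows toward whichever specific hypothesis the other classifier supports, which is the mechanism that lets DBF down-weight unreliable detectors.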