Recently, deep learning-based methods for small object detection have been improved by leveraging temporal information. The capability to detect objects down to five pixels provides new opportunities for automated surveillance with high-resolution, wide-field-of-view cameras. However, integration on unmanned vehicles generally comes with strict demands on size, weight, and power. This poses a challenge for processing high-framerate, high-resolution data, especially when multiple camera streams need to be analyzed in parallel for 360-degree situational awareness. This paper presents results of the Penta Mantis-Vision project, in which we investigated the parallel processing of four 4K camera video streams with commercially available edge computing hardware, specifically the Nvidia Jetson AGX Orin. As the computational power of the GPU on an embedded platform is a critical bottleneck, we explore widely available techniques to accelerate inference or reduce power consumption. Specifically, we analyze the effect of INT8 quantization and replacement of the activation function on small object detection. Furthermore, we propose a prioritized tiling strategy that processes camera frames in such a way that new objects can be detected anywhere in the camera view while previously detected objects can still be tracked robustly. We implemented a video processing pipeline for different temporal YOLOv8 models and evaluated these with respect to object detection accuracy and throughput. Our results demonstrate that recently developed deep learning models can be deployed on embedded devices for real-time multi-camera detection and tracking of small objects without compromising object detection accuracy.
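The abstract does not specify how the prioritized tiling is scheduled, so the following is only a minimal sketch of one plausible interpretation: tiles overlapping currently tracked objects are processed every frame, while the remaining tiles are visited round-robin within a per-frame budget, so new objects anywhere in the 4K view are still picked up within a few frames. All names and parameters (tile size, overlap, budget) are illustrative assumptions, not the project's implementation.

```python
# Illustrative sketch of a prioritized tiling scheduler; not the paper's code.
from itertools import cycle

def make_tiles(frame_w, frame_h, tile=1280, overlap=128):
    """Return (x0, y0, x1, y1) tile boxes covering the full frame."""
    step = tile - overlap
    def starts(size):
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, step))
        return s + [size - tile]          # last tile flush with the border
    return [(x, y, min(x + tile, frame_w), min(y + tile, frame_h))
            for y in starts(frame_h) for x in starts(frame_w)]

def intersects(box, tile_box):
    x0, y0, x1, y1 = box
    tx0, ty0, tx1, ty1 = tile_box
    return x0 < tx1 and tx0 < x1 and y0 < ty1 and ty0 < y1

class PrioritizedTiler:
    def __init__(self, frame_w, frame_h, budget=4):
        self.tiles = make_tiles(frame_w, frame_h)
        self.scan_order = cycle(range(len(self.tiles)))
        self.budget = min(budget, len(self.tiles))  # tiles processed per frame

    def select(self, track_boxes):
        """Tiles overlapping tracks first, then round-robin scan tiles."""
        chosen = [i for i, t in enumerate(self.tiles)
                  if any(intersects(b, t) for b in track_boxes)]
        while len(chosen) < self.budget:
            i = next(self.scan_order)
            if i not in chosen:
                chosen.append(i)
        return [self.tiles[i] for i in chosen[:self.budget]]

# Example: a 4K frame with two active tracks; each returned box would be
# cropped from the frame and passed to the detector.
tiler = PrioritizedTiler(3840, 2160, budget=4)
tracks = [(500, 400, 540, 440), (3000, 1800, 3040, 1840)]
print(tiler.select(tracks))
```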
KEYWORDS: Data modeling, Object detection, Transformers, Education and training, Performance modeling, 3D modeling, Sensors, Visual process modeling, Linear filtering, Computer vision technology
Collecting and annotating real-world data for the development of object detection models is a time-consuming and expensive process. In the military domain in particular, data collection can also be dangerous or infeasible. Training models on synthetic data may provide a solution for cases where access to real-world training data is restricted. However, bridging the reality gap between synthetic and real data remains a challenge. Existing methods usually build on top of baseline Convolutional Neural Network (CNN) models that have been shown to perform well when trained on real data, but that have limited ability to perform well when trained on synthetic data. For example, some architectures are designed to be fine-tuned on large quantities of training data and are therefore prone to overfitting on synthetic data. Related work usually ignores various best practices from object detection on real data, e.g., by training on synthetic data from a single environment with relatively little variation. In this paper, we propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data. Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images. Based on the state of the art, we incorporate data augmentation methods and a Transformer backbone. Besides reaching relatively strong performance without any specialized synthetic data transfer methods, we show that our methods improve the state of the art on synthetic-data-trained object detection for the RarePlanes and DGTA-VisDrone datasets, and reach near-perfect performance on an in-house vehicle detection dataset.
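The abstract names data augmentation and preservation of pre-trained features as key ingredients but does not list the specific augmentations or layers involved. The sketch below, using standard torchvision transforms and a generic parameter-freezing helper, illustrates the general idea under those assumptions; the substrings "head" and "neck" are hypothetical parameter-name patterns, not names from the paper.

```python
import torchvision.transforms as T

# Photometric augmentations only: they perturb appearance without moving
# pixels, so bounding-box annotations stay valid. Geometric augmentations
# (flips, crops, mosaics) would also require transforming the boxes.
photometric_augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.1),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

def freeze_except(model, trainable_substrings=("head", "neck")):
    """Freeze all parameters except those whose names contain the given
    substrings, preserving features learned from pre-training on real images."""
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)
```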
Automated object detection is becoming more relevant in a wide variety of applications in the military domain. This includes the detection of drones, ships, and vehicles in video and IR video. In recent years, deep learning-based object detection methods, such as YOLO, have been shown to be promising in many applications for object detection. However, current methods have limited success when objects of interest are small in terms of pixels, e.g. objects that are far away or small objects nearby. This is important, since accurate small object detection translates to early detection, and the earlier an object is detected, the more time is available for action. In this study, we investigate novel image analysis techniques that are designed to address some of the challenges of (very) small object detection by taking into account temporal information. We implement six methods, of which three are based on deep learning and use the temporal context of a set of frames within a video. These methods consider neighboring frames when detecting objects, either by stacking them as additional channels or by considering difference maps. We compare these spatio-temporal deep learning methods with YOLO-v8, which only considers single frames, and with two traditional moving object detection methods. Evaluation is done on a set of videos that encompasses a wide variety of challenges, including various objects, scenes, and acquisition conditions, to show real-world performance.
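A minimal sketch of the two temporal input encodings mentioned above (channel stacking and frame differencing); the exact inputs and preprocessing used by the paper's models may differ, and the function names and the min-based noise suppression are assumptions.

```python
import numpy as np

def stack_frames(prev_frame, curr_frame, next_frame):
    """Stack three consecutive grayscale frames as channels: shape (H, W, 3)."""
    return np.stack([prev_frame, curr_frame, next_frame], axis=-1)

def difference_map(prev_frame, curr_frame, next_frame):
    """Two-sided absolute difference highlighting moving pixels."""
    d1 = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    d2 = np.abs(next_frame.astype(np.int16) - curr_frame.astype(np.int16))
    return np.minimum(d1, d2).astype(np.uint8)  # suppress single-frame noise

# Example with random 64x64 frames standing in for video frames.
f0, f1, f2 = (np.random.randint(0, 255, (64, 64), dtype=np.uint8) for _ in range(3))
print(stack_frames(f0, f1, f2).shape)    # (64, 64, 3)
print(difference_map(f0, f1, f2).shape)  # (64, 64)
```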
Person re-identification (Re-ID) can be used to find the owner of lost luggage, to find suspects after a terrorist attack, or to fuse multiple sensors. Common state-of-the-art deep-learning technology performs well on large public datasets, but it does not generalize well to other environments, which makes it less suitable for practical applications. In this paper, we present and evaluate a new strategy for rapid Re-ID retraining to increase flexibility for deployment in new environments. In addition, we pay special attention to making our method work with anonymized data, due to the sensitive nature of the collected data. A training set with anonymized snippets is automatically collected using additional cameras and person tracking. The evaluation results show that this rapid training approach obtains high performance scores.
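For context, the sketch below shows only the generic retrieval step common to Re-ID pipelines: appearance embeddings are ranked by cosine similarity against a gallery. The snippet collection, anonymization, and retraining described in the abstract are outside this sketch, and the embedding dimensionality and function name are assumptions.

```python
import numpy as np

def cosine_rank(query_emb, gallery_embs):
    """Return gallery indices sorted from most to least similar."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

gallery = np.random.randn(100, 512)     # 100 gallery snippets, 512-D embeddings
query = np.random.randn(512)            # embedding of the query person
print(cosine_rank(query, gallery)[:5])  # top-5 candidate matches
```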