In recent years, deep neural networks have shown great results in image classification. Most previous work focuses on classifying fairly large objects in visual imagery. This paper presents a method for detecting and classifying small objects in thermal imagery using a deep learning method based on a RetinaNet network. The results show that a deep neural network can be trained with a relatively small set of labelled images to classify objects in thermal imagery. Objects from the classes with the most training examples (cars, trucks and persons) can be classified with relatively high confidence given an object size of 32×32 pixels or smaller.
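The abstract names RetinaNet as the base network but does not restate its details; RetinaNet's distinguishing component is the focal loss, which down-weights easy examples so training concentrates on hard, rare objects such as small targets. As an illustrative sketch (not code from the paper), the per-example loss can be written as:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class, y is the
    ground-truth label (1 or 0). With gamma = 0 this reduces to
    alpha-weighted cross-entropy; larger gamma suppresses the loss
    contribution of well-classified (easy) examples.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A confidently correct prediction (p_t close to 1) thus contributes almost nothing, while a misclassified small object dominates the gradient.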
This paper presents components of a sensor management architecture for autonomous UAV systems equipped with IR and video sensors, focusing on two main areas. Firstly, a framework inspired by optimal control and information theory is presented for concurrent path and sensor planning. Secondly, a method for visual landmark selection and recognition is presented. The latter is intended for use within a SLAM (Simultaneous Localization and Mapping) architecture for visual navigation. Results are presented on both simulated and real sensor data, the latter from the MASP system (Modular Airborne Sensor Platform), an in-house developed UAV surrogate system containing a gimballed IR camera, a video sensor, and an integrated high-performance navigation system.
We present a two-stage process for target identification and pose estimation. A database of possible target states, i.e. identity and pose, is precomputed by a two-step clustering procedure, reflecting the two stages of the identification process. The current database is based on images generated from 3D CAD models of military ground vehicles to which realistic infrared textures have been applied. At the coarse level, the database is divided into a set of clusters, each represented by a small set of eigenimages obtained through principal component analysis (PCA). Classification at this level is achieved by measuring the orthogonal distance between the region of interest (ROI) and the eigenspace of each cluster. Each cluster itself contains a few subclusters. A support vector machine is employed for pairwise discrimination of subclusters. The likelihood that the target belongs to a particular cluster/subcluster is based on histograms obtained during training of the system. In addition to classifying individual images, the system can also handle image sequences in which the pose of the target varies between frames. In this situation, the pose is assumed to change according to a first-order Markov process, and the overall probability of each target state is accumulated through recursive Bayesian estimation. The performance of the above procedure has been evaluated through the identification of targets in synthetic image sequences, where the targets are placed in realistic backgrounds. Currently, we are able to correctly identify the targets in more than 80 percent of the image sequences. The pose can be estimated to within 10 degrees in about 60 percent of the cases, and to within 20 degrees in about 80 percent. The accuracy of the pose estimation is limited by the size of the subclusters.
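The coarse-level classification above rests on a standard PCA construction: project the ROI onto each cluster's eigenspace and measure the residual (orthogonal) distance. A minimal sketch, assuming flattened image vectors and orthonormal eigenimages (the data structures and names are illustrative, not from the paper):

```python
import numpy as np

def orthogonal_distance(roi, mean, eigenimages):
    """Distance from a flattened ROI to the affine eigenspace of one
    cluster. `eigenimages` holds orthonormal basis vectors as rows."""
    centered = roi - mean
    coeffs = eigenimages @ centered          # project onto the eigenspace
    reconstruction = eigenimages.T @ coeffs  # back-project into image space
    return np.linalg.norm(centered - reconstruction)

def classify_coarse(roi, clusters):
    """Assign the ROI to the cluster whose eigenspace explains it best,
    i.e. the one with the smallest orthogonal (residual) distance."""
    distances = {name: orthogonal_distance(roi, c["mean"], c["eig"])
                 for name, c in clusters.items()}
    return min(distances, key=distances.get)
```

An ROI lying exactly in a cluster's eigenspace yields distance zero; in practice the distances feed the histogram-based likelihoods described above.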
One way to increase the robustness and efficiency of unmanned surveillance platforms is to introduce an autonomous data acquisition capability. In order to mimic a sensor operator's search pattern, combining wide-area search with detailed study of detected regions of interest, the system must be able to produce target indications in real time. Rapid detection algorithms are also useful for cueing image analysts who process large amounts of aerial reconnaissance imagery. Recently, the use of a sequence of increasingly complex classifiers has been suggested by several authors as a means to achieve high processing rates at low false alarm and miss rates. The basic principle is that much of the background can be rejected by a simple classifier before more complex classifiers are applied to analyse the more difficult remaining image regions. Even higher performance can be achieved if each detector stage is implemented as a set of expert classifiers, each specialised to a subset of the target training set. In order to cope with the increasingly difficult classification problem faced at successive stages, the partitioning of the target training set must be made increasingly fine-grained, resulting in a coarse-to-fine hierarchy of detectors. Most of the literature on this type of detector is concerned with face detection. The present paper describes a system designed for detection of military ground vehicles in thermal imagery from airborne platforms. The classifier components used are trained using a variant of the LogitBoost algorithm. The results obtained are encouraging, and suggest that it is possible to achieve very low false alarm and miss rates for this very demanding application.
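The cascade principle described above is simple to state in code: each candidate window must pass every stage, and cheap early stages discard most of the background before expensive stages run. A minimal sketch, where each `score_fn` stands in for a trained stage (e.g. a boosted classifier) and is a placeholder, not the paper's actual detector:

```python
def cascade_detect(windows, stages):
    """Run candidate windows through a sequence of (score_fn, threshold)
    stages of increasing complexity. A window is a detection only if it
    passes every stage; most background is rejected early and cheaply."""
    survivors = list(windows)
    for score_fn, threshold in stages:
        survivors = [w for w in survivors if score_fn(w) >= threshold]
        if not survivors:          # everything rejected: stop early
            break
    return survivors
```

In the coarse-to-fine variant, each later stage would itself be a set of expert classifiers, and a window passes if any expert accepts it.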
This paper describes a framework for image processing and sensor management for an autonomous unmanned airborne surveillance system equipped with infrared and video sensors. Our working hypothesis is that integrating the detection-tracking-classification chain with spatial awareness makes intelligent autonomous data acquisition possible by means of active sensor control. A central part of the framework is a surveillance scene representation suitable for target tracking, geolocation, and sensor data fusion involving multiple platforms. The representation, based on Simultaneous Localization and Mapping (SLAM), takes into account uncertainties associated with sensor data, platform navigation, and prior knowledge. A client/server approach for on-line adaptable surveillance missions is introduced. The presented system is designed to simultaneously and autonomously perform the following tasks: provide wide-area coverage from multiple viewpoints by means of a step-stare procedure, detect and track multiple stationary and moving ground targets, perform a detailed analysis of detected regions of interest, and generate precise target coordinates by means of multi-view geolocation techniques.
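The abstract does not spell out its multi-view geolocation equations; a standard technique for the task is least-squares intersection of bearing rays, where each observation contributes a platform position and a unit line-of-sight direction. The following sketch illustrates that technique under those assumptions (it is not the paper's implementation):

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares intersection of bearing rays.

    Each ray i is a camera position p_i and a unit direction d_i.
    Minimizing the summed squared perpendicular distances leads to the
    normal equations  sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) p_i,
    solved here for the target position x."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)
```

With two or more non-parallel rays the system is well conditioned; in a full system the per-ray uncertainties (navigation and pointing errors) would weight each term.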
In several recent articles it has been suggested that the shape of the correlation peak be used to distinguish between target and clutter. The peak shape is characterized in terms of features, such as geometrical moments, which are then fed into a classifier that decides whether the peak was generated by a target or by clutter. The classification can be facilitated by an appropriate filter design. The maximum average correlation height (MACH) filter was designed to produce similar correlation planes for the target variations present in the training set. In this article we present generalizations of the MACH filter with the intention of decreasing the peak shape variation for targets in severe clutter. We show that by taking into account the non-overlapping character of the background noise and focusing the MACH correlation plane similarity requirement on the peak neighborhood, it is possible to simultaneously achieve a small variation in correlation peak shape and high peak-to-sidelobe ratios for cluttered images.
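The peak-to-sidelobe ratio (PSR) mentioned above is a standard correlation-filter sharpness measure: the peak value compared against the mean and spread of the surrounding correlation plane. A minimal sketch (the window size is an illustrative choice, not a value from the article):

```python
import numpy as np

def peak_to_sidelobe_ratio(corr, exclude=5):
    """PSR = (peak - sidelobe_mean) / sidelobe_std.

    The sidelobe region is the correlation plane with a small square
    window around the peak masked out, so the peak itself does not
    contaminate the background statistics."""
    r, c = np.unravel_index(np.argmax(corr), corr.shape)
    peak = corr[r, c]
    mask = np.ones(corr.shape, dtype=bool)
    mask[max(0, r - exclude):r + exclude + 1,
         max(0, c - exclude):c + exclude + 1] = False
    sidelobe = corr[mask]
    return (peak - sidelobe.mean()) / sidelobe.std()
```

A sharp, isolated peak over a flat background gives a high PSR, while clutter-induced sidelobes drive it down, which is why filter designs that suppress peak-shape variation also tend to raise the PSR.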