The rapid adoption of autonomous Unmanned Aerial Vehicles (UAVs) for real-world applications in both industry and the military is driving the need for efficient UAV surveillance and countering systems, as these vehicles create new threats to the safety of people and assets. These systems typically contain a variety of sensors and effectors, including video sensors that are used both for human confirmation of a potentially menacing UAV and for visual servoing of the effectors used to counter an aerial threat. In this case, the performance of the system depends on the accuracy of the algorithm chain (classification, localization and threat identification) used for video tracking. In this paper, we study an original approach for temporally stable video tracking of targets. Specifically, we use state-of-the-art algorithms for semantic object detection and then consolidate their outputs with a pose estimation method to enhance perception performance. We compare different approaches on real data.