30 April 2018 Benchmarking deep learning trackers on aerial videos
In this paper, we benchmark five state-of-the-art trackers on aerial platform videos: the Multi-Domain Convolutional Neural Network (MDNet) tracker, winner of the VOT2015 tracking challenge; the Fully Convolutional Network Tracker (FCNT); the Spatially Regularized Discriminative Correlation Filter (SRDCF) tracker; the Continuous Convolution Operator Tracker (CCOT), winner of the VOT2016 challenge; and the Tree-structured Convolutional Neural Network (TCNN) tracker. We assess performance in terms of both tracking accuracy and processing speed on two sets of videos: a subset of the OTB dataset in which the cameras are located at a high vantage point, and a new dataset of aerial videos captured from a moving platform. Our results indicate that these trackers performed as expected on the OTB subset; however, their performance degraded significantly on aerial videos due to target size, camera motion, and target occlusions. The CCOT tracker yielded the best overall accuracy, while the SRDCF tracker was the fastest.
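Tracking accuracy on OTB-style benchmarks is commonly reported as a success rate: the fraction of frames whose predicted bounding box overlaps the ground-truth box above an intersection-over-union (IoU) threshold. The sketch below is an illustrative implementation of this standard metric, not the paper's own evaluation code; the `(x, y, w, h)` box format and the 0.5 threshold are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose IoU with ground truth exceeds the threshold."""
    scores = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(s > threshold for s in scores) / len(scores)
```

Sweeping the threshold from 0 to 1 and plotting the success rate at each value produces the success plot used to rank trackers in the OTB protocol.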
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Breton Minnehan, Anthony Salmin, Karl Salva, and Andreas Savakis "Benchmarking deep learning trackers on aerial videos", Proc. SPIE 10649, Pattern Recognition and Tracking XXIX, 1064915 (30 April 2018);