Paper
1 August 2022
A ViT-based lightweight method for the UAV platform object detection tasks
Zhi Fang, ZhiZhong Xi, MengEn Xu, XiHui Fan
Proceedings Volume 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022); 1225717 (2022) https://doi.org/10.1117/12.2639525
Event: 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 2022, Guangzhou, China
Abstract
Lightweight design is key to deploying deep learning algorithms on Unmanned Aerial Vehicle (UAV) platforms. To address the low detection accuracy of current real-time object detection algorithms on UAV platforms, this paper designs a lightweight model based on the Vision Transformer (ViT). First, a small Convolutional Neural Network (CNN) extracts primary features, which reduces the parameter count and computation of the ViT network, and window modeling replaces part of the global modeling. Then, a feature-level masked self-supervised training method is used to pre-train the ViT structure, which helps accelerate convergence and avoids a large amount of labeling work. Finally, a comparison with other lightweight UAV object detection algorithms on the VisDrone2018 dataset shows that the model achieves higher average accuracy while maintaining real-time speed, which supports the effectiveness and reference value of the proposed lightweight design method.
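As a rough illustration of why a CNN stem and window modeling reduce the cost of a ViT, the sketch below counts the query-key pairs that self-attention must score. It is not the authors' implementation: the input size, stem stride, and window size are hypothetical values chosen only for the arithmetic, and the function names are invented for this example. It assumes non-overlapping square windows whose side divides the token grid evenly.

```python
def token_count(height, width, stride):
    """Tokens left after a CNN stem that downsamples the image by `stride`."""
    return (height // stride) * (width // stride)


def global_attention_pairs(num_tokens):
    """Query-key pairs scored when every token attends to every token."""
    return num_tokens * num_tokens


def window_attention_pairs(num_tokens, window_size):
    """Query-key pairs scored when each token attends only to the tokens
    inside its own window of window_size x window_size tokens."""
    return num_tokens * window_size * window_size


# Hypothetical settings (assumptions, not values taken from the paper).
HEIGHT, WIDTH = 640, 640   # input image size in pixels
STEM_STRIDE = 16           # total downsampling of the CNN stem
WINDOW = 8                 # window side length, in tokens

tokens = token_count(HEIGHT, WIDTH, STEM_STRIDE)
global_pairs = global_attention_pairs(tokens)
window_pairs = window_attention_pairs(tokens, WINDOW)

print(f"tokens after stem:      {tokens}")
print(f"global attention pairs: {global_pairs}")
print(f"window attention pairs: {window_pairs}")
print(f"reduction factor:       {global_pairs // window_pairs}")
```

Under these assumed settings the stem leaves 1600 tokens, and window attention scores 25 times fewer pairs than global attention. Because window attention alone cannot relate distant tokens, a design that replaces only part of the global modeling, as the abstract describes, keeps some layers global.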
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhi Fang, ZhiZhong Xi, MengEn Xu, and XiHui Fan "A ViT-based lightweight method for the UAV platform object detection tasks", Proc. SPIE 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 1225717 (1 August 2022); https://doi.org/10.1117/12.2639525
KEYWORDS
Computer programming
Unmanned aerial vehicles
Feature extraction
Data modeling
Transformers
Visual process modeling
Detection and tracking algorithms
