Paper
Learning human-object interaction detection via deformable transformer
11 November 2021
Proceedings Volume 12076, 2021 International Conference on Image, Video Processing, and Artificial Intelligence; 1207602 (2021) https://doi.org/10.1117/12.2606873
Event: Fourth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2021), 2021, Shanghai, China
Abstract
The goal of human-object interaction (HOI) detection is to localize both the human and the object in an image and to recognize the interactions between them. HOIs tend to be scattered across the image. Traditional CNN-based methods are unable to aggregate this scattered information. Many recent methods use contextual features cropped from the CNN outputs, which are sometimes not effective enough. To overcome this challenge, we use a deformable transformer to aggregate the full feature maps output from the CNN. The attention mechanism and query-based predictions are the keys. In view of the success of methods based on graph neural networks, the attention mechanism has been shown to be effective at aggregating contextual information image-wide. The queries can extract the features of each human-object pair without mixing in the features of other instances. The deformable transformer can extract effective embeddings, so the prediction heads can be fairly simple. Experimental results show that the proposed method is effective in HOI detection.
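The query-based aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain dense cross-attention rather than deformable attention (which would sample only a small set of locations per query), and every name and size here (`cross_attention`, `NUM_QUERIES`, the 4x4 feature map) is a hypothetical chosen for illustration. Each query stands in for one candidate human-object pair and pools features from every location of a flattened CNN feature map.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Let each query attend over all flattened feature locations.

    queries:  list of NUM_QUERIES vectors of length D (one per candidate pair)
    features: list of H*W vectors of length D (flattened CNN feature map)
    Returns one pooled embedding per query, plus the attention weights.
    """
    d = len(features[0])
    embeddings, all_weights = [], []
    for q in queries:
        # Scaled dot-product score between this query and every location.
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(d)
                  for f in features]
        weights = softmax(scores)
        # Weighted sum of features: image-wide context for this query only.
        pooled = [sum(w * f[j] for w, f in zip(weights, features))
                  for j in range(d)]
        embeddings.append(pooled)
        all_weights.append(weights)
    return embeddings, all_weights


# Hypothetical sizes: a 4x4 feature map with 8 channels and 3 queries.
rng = random.Random(0)
H, W, D, NUM_QUERIES = 4, 4, 8, 3
features = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H * W)]
queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(NUM_QUERIES)]

embeddings, weights = cross_attention(queries, features)
print(len(embeddings), len(embeddings[0]))
```

Because each query produces its own embedding, a simple per-query prediction head (for example, a small linear layer for boxes and interaction classes) could be applied to each embedding independently.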
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Shuang Cai, Shiwei Ma, and Dongzhou Gu "Learning human-object interaction detection via deformable transformer", Proc. SPIE 12076, 2021 International Conference on Image, Video Processing, and Artificial Intelligence, 1207602 (11 November 2021); https://doi.org/10.1117/12.2606873
KEYWORDS
Transformers
Computer programming
Target detection
Visualization
Neural networks
Feature extraction
Network architectures