Spatio-temporal co-attention fusion network for video splicing localization
Man Lin, Gang Cao, Zijie Lou, Chi Zhang
Abstract

Digital video splicing has become easy and ubiquitous. Malicious users copy regions from one video and paste them into another to create realistic forgeries, so blind localization of such forged regions is important. A spatio-temporal co-attention fusion network (SCFNet) is proposed for video splicing localization. Specifically, a three-stream network is used as an encoder to capture manipulation traces across multiple frames. Deep interaction and fusion of spatio-temporal forensic features are achieved by the novel parallel and cross co-attention fusion modules. A lightweight multilayer perceptron (MLP) decoder is adopted to yield a pixel-level tampering localization map. A new large-scale video splicing dataset is created for training SCFNet. Extensive tests on benchmark datasets show that SCFNet outperforms state-of-the-art methods in both localization accuracy and generalization. Code and datasets are available at https://github.com/multimediaFor/SCFNet.
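As a rough illustration only, the PyTorch sketch below shows how a three-stream frame encoder, a cross co-attention fusion stage, and a lightweight MLP decoder could be wired together to produce a pixel-level localization map. All module names, channel sizes, backbone choices, and the use of standard multi-head attention are assumptions for the sketch, not the authors' implementation; refer to the linked repository for the actual SCFNet code.

```python
# Minimal sketch: three-stream encoder -> co-attention fusion -> MLP decoder.
# Channel sizes, encoder design, and attention layout are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttentionFusion(nn.Module):
    """Fuses two feature streams with cross multi-head attention (assumed design)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (batch, tokens, dim) flattened spatial features
        a2b, _ = self.attn_ab(query=a, key=b, value=b)  # a attends to b
        b2a, _ = self.attn_ba(query=b, key=a, value=a)  # b attends to a
        return self.proj(torch.cat([a2b, b2a], dim=-1))


class SCFNetSketch(nn.Module):
    """Toy three-stream encoder with co-attention fusion and an MLP decoder."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # One small convolutional encoder per input frame (previous, current, next).
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, dim, 3, stride=4, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            )
            for _ in range(3)
        ])
        self.fuse_prev = CoAttentionFusion(dim)   # current <-> previous frame
        self.fuse_next = CoAttentionFusion(dim)   # current <-> next frame
        self.decoder = nn.Sequential(             # lightweight MLP decoder
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3 frames, 3 channels, H, W)
        b, _, _, h, w = frames.shape
        feats = [enc(frames[:, i]) for i, enc in enumerate(self.encoders)]
        hh, ww = feats[0].shape[-2:]
        # Flatten spatial maps into token sequences for attention.
        tokens = [f.flatten(2).transpose(1, 2) for f in feats]  # (b, hh*ww, dim)
        fused = torch.cat(
            [self.fuse_prev(tokens[1], tokens[0]),
             self.fuse_next(tokens[1], tokens[2])], dim=-1)
        logits = self.decoder(fused).transpose(1, 2).reshape(b, 1, hh, ww)
        # Upsample to a pixel-level splicing localization map.
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = SCFNetSketch()
    clip = torch.randn(2, 3, 3, 224, 224)  # batch of 2 three-frame clips
    print(model(clip).shape)               # torch.Size([2, 1, 224, 224])
```

The sketch fuses the current frame with its neighbors in both directions before decoding, mirroring the abstract's idea of interacting spatio-temporal features across multiple frames; the actual SCFNet uses dedicated parallel and cross co-attention fusion modules rather than plain multi-head attention.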

© 2024 SPIE and IS&T
Man Lin, Gang Cao, Zijie Lou, and Chi Zhang "Spatio-temporal co-attention fusion network for video splicing localization," Journal of Electronic Imaging 33(3), 033027 (7 June 2024). https://doi.org/10.1117/1.JEI.33.3.033027
Received: 30 January 2024; Accepted: 20 May 2024; Published: 7 June 2024
KEYWORDS: Video, Video compression, Fusion splicing, Video acceleration, Education and training, Feature fusion, Video coding