Paper
4 January 2021 Multimodal features for shots boundary detection
Proceedings Volume 11605, Thirteenth International Conference on Machine Vision; 116052A (2021) https://doi.org/10.1117/12.2587152
Event: Thirteenth International Conference on Machine Vision, 2020, Rome, Italy
Abstract
Shot boundary detection (SBD), also known as temporal video segmentation, is a preprocessing task for many video applications, such as indexing and retrieval. The SBD output provides coherent temporal units that are easy to manipulate. Most previous works build their frameworks on visual features to measure similarity for the transition detection task. However, video also carries rich data in other modalities that could be beneficial. In this paper, following recent multimodal work, we propose introducing the audio component to improve SBD. First, we work on candidate segments obtained by measuring the similarity between low-level features (SURF, HSF) extracted from the original video. Then we use deep features obtained from a trained model (ResNet-50) for visual similarity, and we introduce audio segmentation based on power spectral density (PSD) to contribute to transition detection. The proposed method is evaluated on the ClipShots dataset. Experiments on this dataset show that the proposed multimodal approach achieves better performance than state-of-the-art methods that rely on a visual approach.
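As a rough illustration of how a visual similarity and a PSD-based audio cue might be combined at a candidate boundary, the sketch below uses NumPy only. It is not the authors' implementation: the periodogram estimator, the cosine comparison of PSDs, the weighted fusion rule, and the names `audio_change_score`, `boundary_score`, and `visual_weight` are all assumptions made for illustration. The feature vectors stand in for deep features such as those a ResNet-50 would produce.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def power_spectral_density(samples, sample_rate):
    """Periodogram PSD estimate of a 1-D audio window, using a Hann taper."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(len(samples))
    spectrum = np.fft.rfft(samples * window)
    return (np.abs(spectrum) ** 2) / (sample_rate * np.sum(window ** 2))


def audio_change_score(before, after, sample_rate):
    """Assumed audio cue: 1 - cosine similarity of the PSDs on either side.

    Both windows must have the same length so the PSDs are comparable.
    """
    psd_before = power_spectral_density(before, sample_rate)
    psd_after = power_spectral_density(after, sample_rate)
    return 1.0 - cosine_similarity(psd_before, psd_after)


def boundary_score(feat_prev, feat_next, audio_prev, audio_next,
                   sample_rate, visual_weight=0.5):
    """Hypothetical fusion: weighted sum of visual and audio change."""
    visual_change = 1.0 - cosine_similarity(feat_prev, feat_next)
    audio_change = audio_change_score(audio_prev, audio_next, sample_rate)
    return visual_weight * visual_change + (1.0 - visual_weight) * audio_change
```

A candidate position would then be flagged as a transition when `boundary_score` exceeds a threshold tuned on validation data; both the threshold and `visual_weight` are free parameters of this sketch, not values taken from the paper.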
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Mohamed Bouyahi and Yassine Ben Ayed "Multimodal features for shots boundary detection", Proc. SPIE 11605, Thirteenth International Conference on Machine Vision, 116052A (4 January 2021); https://doi.org/10.1117/12.2587152