Shot Boundary Detection (SBD), also known as temporal video segmentation, is a preprocessing task for many video applications, such as indexing and retrieval. The output of SBD is a set of coherent temporal units that are easy to manipulate. Most previous works build their frameworks on visual features, which are used to measure similarity for the transition detection task. However, video carries rich additional data that could be beneficial. In this paper, following recent multimodal works, we propose to introduce the audio component to improve SBD. First, we work on candidate segments obtained by measuring the similarity between low-level features (SURF, HSF) extracted from the original video. Then we use deep features obtained from a trained model (ResNet-50) for visual similarity, and we introduce audio segmentation based on Power Spectral Density (PSD) to contribute to transition detection. The proposed method is evaluated on the ClipShots dataset. Experiments on this dataset show that the proposed multimodal approach achieves better performance than state-of-the-art methods that rely on a visual approach.
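To make the audio step more concrete, the following is a minimal sketch of how a PSD comparison around a candidate boundary might look. It is not the method of the paper, whose details are not given here: the function names (`power_spectrum`, `psd_distance`, `audio_supports_cut`), the window length, the threshold, and the half-L1 distance are all assumptions chosen for illustration, and the PSD is estimated with a plain periodogram computed by a naive DFT.

```python
import cmath
import math


def power_spectrum(window):
    """Periodogram estimate of the PSD of a real-valued window (naive DFT)."""
    n = len(window)
    half = n // 2 + 1  # keep non-negative frequencies only
    psd = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        psd.append(abs(coeff) ** 2 / n)
    return psd


def psd_distance(window_a, window_b, eps=1e-12):
    """Distance in [0, 1] between the normalized PSDs of two equal-length windows."""
    pa, pb = power_spectrum(window_a), power_spectrum(window_b)
    sa, sb = sum(pa) + eps, sum(pb) + eps
    # Half the L1 distance between the two normalized spectra.
    return 0.5 * sum(abs(a / sa - b / sb) for a, b in zip(pa, pb))


def audio_supports_cut(samples, cut_index, window=64, threshold=0.5):
    """Hypothetical rule: the audio supports a cut if the PSDs on both sides differ enough.

    `window` and `threshold` are illustrative values, not taken from the paper.
    """
    if cut_index < window or cut_index + window > len(samples):
        return False  # not enough audio on one side of the candidate
    before = samples[cut_index - window:cut_index]
    after = samples[cut_index:cut_index + window]
    return psd_distance(before, after) > threshold
```

In a multimodal pipeline of the kind described above, such an audio score would be combined with the visual similarity of each candidate segment; how the two are fused is left open in this sketch.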