19 December 2001 Extracting movie scenes based on multimodal information
Proceedings Volume 4676, Storage and Retrieval for Media Databases 2002; (2001); doi: 10.1117/12.451109
Event: Electronic Imaging, 2002, San Jose, California, United States
This research addresses the problem of automatically extracting semantic video scenes from ordinary movies using multimodal information. A three-stage scene detection scheme is proposed. In the first stage, purely visual information is used to extract a coarse-level scene structure based on generated shot sinks. In the second stage, the audio cue is integrated to refine the scene detection results by considering various kinds of audio scenarios. In the third stage, users interact directly with the system to fine-tune the detection results to their own satisfaction. The generated scene structure provides a compact yet meaningful abstraction of the video data, which should facilitate content access. Preliminary experiments on integrating multiple media cues for movie scene extraction have yielded encouraging results.
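The three-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define how shot sinks are built or which audio scenarios are considered, so the sketch substitutes a simple histogram-intersection grouping for stage one and a "same non-silent audio class continues across the boundary" rule for stage two. All names here (`Shot`, `visual_scenes`, `refine_with_audio`, `apply_user_merges`, `detect_scenes`), the features, and the 0.7 threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    index: int
    color_hist: list   # normalized color histogram (assumed visual feature)
    audio_class: str   # assumed label such as "speech", "music", "silence"


def hist_intersection(a, b):
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def visual_scenes(shots, threshold=0.7):
    """Stage 1 stand-in: a shot joins the current scene if it resembles
    any shot already in it; otherwise it starts a new scene."""
    scenes = []
    for shot in shots:
        if scenes and any(
            hist_intersection(shot.color_hist, s.color_hist) >= threshold
            for s in scenes[-1]
        ):
            scenes[-1].append(shot)
        else:
            scenes.append([shot])
    return scenes


def refine_with_audio(scenes):
    """Stage 2 stand-in: merge adjacent scenes when the same non-silent
    audio class continues across their boundary."""
    refined = []
    for scene in scenes:
        if refined:
            before = refined[-1][-1].audio_class
            after = scene[0].audio_class
            if before == after and after != "silence":
                refined[-1] = refined[-1] + scene
                continue
        refined.append(list(scene))
    return refined


def apply_user_merges(scenes, merges):
    """Stage 3 stand-in: for each user-supplied index i, merge scene i
    with scene i + 1."""
    result = []
    for i, scene in enumerate(scenes):
        if result and (i - 1) in merges:
            result[-1] = result[-1] + scene
        else:
            result.append(list(scene))
    return result


def detect_scenes(shots, merges=frozenset()):
    """Run the three stages and return scenes as lists of shot indices."""
    scenes = apply_user_merges(refine_with_audio(visual_scenes(shots)), merges)
    return [[s.index for s in scene] for scene in scenes]
```

In this sketch the user stage only supports merging neighboring scenes; an interactive system of the kind the abstract describes would presumably also allow splitting or relabeling, which is omitted here.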
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ying Li, C.-C. Jay Kuo, "Extracting movie scenes based on multimodal information", Proc. SPIE 4676, Storage and Retrieval for Media Databases 2002, (19 December 2001); doi: 10.1117/12.451109; https://doi.org/10.1117/12.451109



Information visualization

Semantic video

Feature extraction

Signal detection



