Extracting movie scenes based on multimodal information
19 December 2001
This research addresses the problem of automatically extracting semantic video scenes from daily movies using multimodal information. A three-stage scene detection scheme is proposed. In the first stage, purely visual information is used to extract a coarse-level scene structure based on generated shot sinks. In the second stage, audio cues are integrated to refine the scene detection results by considering various audio scenarios. In the third stage, users interact directly with the system to fine-tune the detection results to their own satisfaction. The generated scene structure provides a compact yet meaningful abstraction of the video data, which facilitates content access. Preliminary experiments on integrating multiple media cues for movie scene extraction have yielded encouraging results.
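The three-stage flow described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define how shot sinks are built or which audio scenarios are considered, so the `Shot` fields, the histogram-intersection similarity, the `threshold` value, the audio-label merge rule, and the `merge_after` user input are all hypothetical stand-ins chosen only to show how a coarse visual grouping, an audio refinement, and a user correction might be chained.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Shot:
    index: int
    histogram: List[float]  # hypothetical normalized color histogram
    audio_label: str        # hypothetical audio class, e.g. "speech" or "music"


def histogram_intersection(a: List[float], b: List[float]) -> float:
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def visual_stage(shots: List[Shot], threshold: float = 0.7) -> List[List[Shot]]:
    """Stage 1 (assumed rule): add a shot to the current scene if it is
    visually similar to any shot already in it; otherwise open a new scene."""
    scenes: List[List[Shot]] = []
    for shot in shots:
        if scenes and any(
            histogram_intersection(shot.histogram, s.histogram) >= threshold
            for s in scenes[-1]
        ):
            scenes[-1].append(shot)
        else:
            scenes.append([shot])
    return scenes


def audio_stage(scenes: List[List[Shot]]) -> List[List[Shot]]:
    """Stage 2 (assumed rule): merge adjacent scenes when the audio label
    is unchanged across the boundary between them."""
    refined: List[List[Shot]] = []
    for scene in scenes:
        if refined and refined[-1][-1].audio_label == scene[0].audio_label:
            refined[-1] = refined[-1] + scene
        else:
            refined.append(list(scene))
    return refined


def user_stage(scenes: List[List[Shot]], merge_after: Set[int]) -> List[List[Shot]]:
    """Stage 3 (assumed interface): the user lists scene positions i whose
    scene should be merged with the scene at position i + 1."""
    result: List[List[Shot]] = []
    for i, scene in enumerate(scenes):
        if result and (i - 1) in merge_after:
            result[-1] = result[-1] + scene
        else:
            result.append(list(scene))
    return result


def scene_indices(scenes: List[List[Shot]]) -> List[List[int]]:
    """Reduce each scene to the indices of its shots for easy inspection."""
    return [[shot.index for shot in scene] for scene in scenes]


shots = [
    Shot(0, [1.0, 0.0], "speech"),
    Shot(1, [0.9, 0.1], "speech"),
    Shot(2, [0.0, 1.0], "music"),
    Shot(3, [0.1, 0.9], "music"),
    Shot(4, [0.5, 0.5], "music"),
]
coarse = visual_stage(shots)
refined = audio_stage(coarse)
final = user_stage(refined, merge_after={0})
```

In this toy input, the visual stage alone separates shot 4 from shots 2 and 3, the audio stage rejoins them because the music continues across the boundary, and the user stage applies one manual merge. A real system would replace each rule with the detectors the paper actually describes.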
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ying Li, C.-C. Jay Kuo, "Extracting movie scenes based on multimodal information", Proc. SPIE 4676, Storage and Retrieval for Media Databases 2002, (19 December 2001); doi: 10.1117/12.451109; https://doi.org/10.1117/12.451109

