Translator Disclaimer
20 July 2001 Automatic movie index generation based on multimodal information
Author Affiliations +
Proceedings Volume 4519, Internet Multimedia Management Systems II; (2001)
Event: ITCom 2001: International Symposium on the Convergence of IT and Communications, 2001, Denver, CO, United States
A fundamental task in video analysis is to organize and index multimedia data in a meaningful manner so as to facilitate user access for tasks such as browsing and retrieval. This paper addresses the problem of automatic index generation of movie databases based on audiovisual information. In particular, given a movie we first extract key movie events including two-speaker dialog scenes, multiple-speaker dialog scenes and hybrid scenes by using the proposed window-based sweep algorithm and the K-means clustering algorithms. Following event detection, the identity of each individual speaker in a dialog scene is recognized based on a statistical maximum likelihood approach. The identification relies on the likelihood ratio calculation between the incoming speech data and Gaussian mixture models of the speakers and the background. It is evident that the event and the speaker identity information will serve as a crucial part of the movie index table. Preliminary experimental results show that, by integrating multiple media information, we can obtain robust and meaningful event detection and speaker identification results.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ying Li, Shrikanth S Narayanan, Wei H. Ming, and C.-C. Jay Kuo "Automatic movie index generation based on multimodal information", Proc. SPIE 4519, Internet Multimedia Management Systems II, (20 July 2001);

Back to Top