Video content parsing based on combined audio and visual information
24 August 1999
Proceedings Volume 3846, Multimedia Storage and Archiving Systems IV; (1999)
Event: Photonics East '99, 1999, Boston, MA, United States
Previous research on audiovisual data segmentation and indexing has focused primarily on the pictorial part, and significant clues contained in the accompanying audio stream are often ignored. A fully functional system for video content parsing can be achieved more successfully through a proper combination of audio and visual information. By investigating the data structure of different video types, we present tools for both audio and visual content analysis and a scheme for video segmentation and annotation. In the proposed system, video data are segmented into audio scenes and visual shots by detecting abrupt changes in audio and visual features, respectively. Each audio scene is then categorized and indexed as one of the basic audio types, while each visual shot is represented by keyframes and associated image features. An index table is then generated automatically for each video clip by integrating the outputs of the audio and visual analysis. It is shown that the proposed system provides satisfactory video indexing results.
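The pipeline described in the abstract (boundary detection on feature sequences, followed by integration into an index table) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the features, the distance measure, the threshold, or the keyframe selection rule, so everything below is an assumption made for illustration: an L1 distance between consecutive feature vectors, a fixed threshold, the first frame of each shot used as its keyframe, audio and visual features sampled at the same rate, and audio scene labels supplied by a separate classifier that is not shown. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def detect_boundaries(features: Sequence[Sequence[float]],
                      threshold: float) -> List[int]:
    """Return every index i at which a new segment starts.

    Assumption: an "abrupt change" is an L1 distance between
    consecutive feature vectors that exceeds a fixed threshold.
    """
    boundaries = []
    for i in range(1, len(features)):
        dist = sum(abs(a - b) for a, b in zip(features[i - 1], features[i]))
        if dist > threshold:
            boundaries.append(i)
    return boundaries


def to_segments(n_frames: int, boundaries: List[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) segments."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_frames]
    return list(zip(starts, ends))


def build_index(n_frames: int,
                shot_bounds: List[int],
                scene_bounds: List[int],
                scene_labels: List[str]) -> List[Dict]:
    """One row per visual shot, tagged with the audio scene at its start.

    Assumptions: the keyframe is the first frame of the shot, and
    scene_labels holds one label per audio scene from an external
    classifier (hypothetical, not implemented here).
    """
    scenes = to_segments(n_frames, scene_bounds)
    rows = []
    for start, end in to_segments(n_frames, shot_bounds):
        label = next(lbl for (s, e), lbl in zip(scenes, scene_labels)
                     if s <= start < e)
        rows.append({"shot": (start, end), "keyframe": start, "audio": label})
    return rows


# Toy data: 6 frames, one visual cut at frame 3, one audio change at frame 4.
visual = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
audio = [[0.1]] * 4 + [[0.9]] * 2
shots = detect_boundaries(visual, threshold=0.5)
scenes = detect_boundaries(audio, threshold=0.5)
index = build_index(len(visual), shots, scenes, ["speech", "music"])
```

In a real system the two streams would typically run at different rates and would need to be aligned on a common time axis before the index rows are built; the sketch sidesteps this by assuming one audio feature vector per video frame.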
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tong Zhang and C.-C. Jay Kuo "Video content parsing based on combined audio and visual information", Proc. SPIE 3846, Multimedia Storage and Archiving Systems IV, (24 August 1999);

