Audio-guided audiovisual data segmentation, indexing, and retrieval
17 December 1998
Abstract
While current approaches to video segmentation and indexing focus mostly on visual information, audio signals may play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data based on audio content analysis. The audio signal accompanying the audiovisual data is first segmented and classified into four basic types: speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based on morphological and statistical analysis of several short-term features of the audio signal. Environmental sounds are then classified into finer classes, such as applause, explosions, and bird sounds. This fine-level classification and indexing step is based on time-frequency analysis of the audio signal, with a hidden Markov model (HMM) as the classifier. On top of this archiving scheme, we propose an audiovisual data retrieval system. Experimental results show that the proposed approach achieves an accuracy rate higher than 90 percent for coarse-level classification and higher than 85 percent for fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
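The abstract does not specify the features, thresholds, or HMM topology, so the following is only a minimal sketch of the two-stage idea, not the authors' method. It assumes two common short-term features (frame energy and zero-crossing rate), a hypothetical rule set with made-up thresholds in place of the paper's morphological and statistical analysis, and discrete-observation HMMs scored with the standard scaled forward algorithm. All function names, parameters, and model values are illustrative. Because only two features are used, music and environmental sound are not separated at the coarse level here.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple

# --- Coarse level (hypothetical rules on short-term features) ---------------

def frames(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into overlapping fixed-length frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def energy(frame: Sequence[float]) -> float:
    """Average short-term energy of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zcr(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    changes = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return changes / (len(frame) - 1)

def coarse_label(x: Sequence[float], frame_len: int = 256, hop: int = 128,
                 silence_energy: float = 1e-4, speech_zcr_var: float = 0.01) -> str:
    """Assumed rules: low mean energy -> silence; ZCR that varies strongly
    from frame to frame (voiced/unvoiced alternation) -> speech."""
    fs = frames(x, frame_len, hop)
    if not fs:
        raise ValueError("signal shorter than one frame")
    if mean(energy(f) for f in fs) < silence_energy:
        return "silence"
    if pvariance([zcr(f) for f in fs]) > speech_zcr_var:
        return "speech"
    return "music/environmental"

# --- Fine level (one discrete HMM per sound class) --------------------------

Model = Tuple[List[float], List[List[float]], List[List[float]]]  # (pi, A, B)

def log_forward(obs: Sequence[int], pi: List[float],
                A: List[List[float]], B: List[List[float]]) -> float:
    """log P(obs | model) via the forward algorithm, rescaled at each step."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)  # product of scales equals P(obs)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Pick the class whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: log_forward(obs, *models[name]))

if __name__ == "__main__":
    # Toy one-state models over two quantized feature symbols (hypothetical).
    toy = {"applause": ([1.0], [[1.0]], [[0.9, 0.1]]),
           "bird": ([1.0], [[1.0]], [[0.1, 0.9]])}
    print(classify([1, 1, 0, 1], toy))
```

In a real system the observation symbols would come from vector-quantizing time-frequency features of each environmental-sound segment, and each class HMM would be trained (for example with Baum-Welch) on labeled examples; the toy models above only illustrate the maximum-likelihood decision.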
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tong Zhang, C.-C. Jay Kuo, "Audio-guided audiovisual data segmentation, indexing, and retrieval", Proc. SPIE 3656, Storage and Retrieval for Image and Video Databases VII, (17 December 1998); doi: 10.1117/12.333851; https://doi.org/10.1117/12.333851
Proceedings paper, 12 pages.