10 January 2003 Procedure for audio-assisted browsing of news video using generalized sound recognition
Casey describes a generalized sound recognition framework based on reduced-rank spectra and Minimum-Entropy Priors. This approach enables successful recognition of a wide variety of sounds, such as male speech, female speech, music, and animal sounds. In this work, we apply this recognition framework to news video to enable quick video browsing. We identify speaker change positions in the broadcast news using the sound recognition framework. We then combine the speaker change positions with color and motion cues from the video to locate the beginning of each topic covered by the news video. We can thus skim the video by playing only a small portion starting from each location where one of the principal cast begins to speak. In combination with our motion-based video browsing approach, our technique provides simple automatic news video browsing. While similar work has been done before, our approach is simpler and faster than competing techniques, and provides a rich framework for further analysis and description of content.
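The skimming step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a sound recognizer has already produced one label per fixed-length audio window, and the label names, the function names `speaker_change_times` and `skim_segments`, and the parameters `hop_s` and `clip_s` are all hypothetical. The combination with color and motion cues is omitted.

```python
from typing import List, Set, Tuple


def speaker_change_times(labels: List[str], speech_labels: Set[str],
                         hop_s: float) -> List[float]:
    """Return start times (seconds) of windows where a new speech label begins.

    `labels` holds one label per analysis window (assumed output of some
    sound recognizer); `hop_s` is the assumed spacing between windows.
    """
    changes = []
    prev = None
    for i, label in enumerate(labels):
        # A change is a speech window whose label differs from the previous window.
        if label in speech_labels and label != prev:
            changes.append(i * hop_s)
        prev = label
    return changes


def skim_segments(change_times: List[float], clip_s: float,
                  total_s: float) -> List[Tuple[float, float]]:
    """Build (start, end) playback segments of up to `clip_s` seconds each.

    Segments that touch or overlap are merged so nothing is played twice.
    """
    segments: List[Tuple[float, float]] = []
    for start in sorted(change_times):
        end = min(start + clip_s, total_s)
        if segments and start <= segments[-1][1]:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments
```

In this sketch a skim is just the list of segments to play back; a fuller system would first filter the change positions using the video cues so that only topic beginnings remain.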
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ajay Divakaran, Regunathan Radhakrishnan, Ziyou Xiong, Michael Casey, "Procedure for audio-assisted browsing of news video using generalized sound recognition", Proc. SPIE 5021, Storage and Retrieval for Media Databases 2003, (10 January 2003); doi: 10.1117/12.476294; https://doi.org/10.1117/12.476294
