26 November 2003 Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video
In our past work, we attempted to use a mid-level feature, namely the state population histogram obtained from the Hidden Markov Model (HMM) of a general sound class, for speaker change detection in order to extract semantic boundaries in broadcast news. In this paper, we compare the performance of that approach with another approach based on video shot detection and speaker change detection using the Bayesian Information Criterion (BIC). Our experiments show that the latter approach performs significantly better than the former. This motivated us to examine the mid-level feature closely. We found that the component population histogram enabled discovery of broad phonetic categories such as vowels, nasals, and fricatives, regardless of the number of distinct speakers in the test utterance. For the histogram to be useful for speaker change detection, the individual components would have to model the phonetic sounds of each speaker separately. From our experiments, we conclude that state/component population histograms can be useful for further clustering or semantic class discovery only if the features are chosen carefully so that the individual states represent the semantic categories of interest.
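As a rough illustration of the BIC-based speaker change detection mentioned in the abstract, the sketch below scores candidate change points in a window of audio feature vectors by comparing one full-covariance Gaussian against two. It follows the commonly used ΔBIC formulation and is not taken from the paper; the function names `delta_bic` and `best_change_point`, the `penalty_weight` and `margin` parameters, and the choice of full-covariance Gaussians are all assumptions made for this example.

```python
import numpy as np


def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (frames x dims) at index `split`.

    Positive values favour modelling the window as two Gaussians,
    which is read as evidence of a change at `split`.
    Hypothetical helper: one common formulation, not the paper's code.
    """
    n, d = features.shape

    def logdet_cov(x):
        # Maximum-likelihood (biased) covariance of the segment.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    left, right = features[:split], features[split:]
    # Log-likelihood gain of two Gaussians over a single Gaussian.
    gain = 0.5 * (
        n * logdet_cov(features)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    # Extra parameters of the second Gaussian: mean plus full covariance.
    n_params = d + d * (d + 1) / 2
    return gain - penalty_weight * 0.5 * n_params * np.log(n)


def best_change_point(features, margin=20, penalty_weight=1.0):
    """Return (index, score) of the highest-scoring candidate split.

    `margin` (an assumed parameter) keeps candidates away from the window
    edges so that each side has enough frames for a covariance estimate.
    """
    n = len(features)
    splits = range(margin, n - margin)
    scores = [delta_bic(features, s, penalty_weight) for s in splits]
    best = int(np.argmax(scores))
    return splits[best], scores[best]
```

A change would typically be declared only when the returned score is positive. In practice the window is slid or grown along the audio stream, and `penalty_weight` is tuned on held-out data.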
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Regunathan Radhakrishan, Ziyou Xiong, Ajay Divakaran, Bhiksha Raj, "Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video", Proc. SPIE 5242, Internet Multimedia Management Systems IV, (26 November 2003); doi: 10.1117/12.514397; https://doi.org/10.1117/12.514397


