19 December 2001 Automatic categorization design for broadcast news
Abstract
This paper discusses our work on automatic categorization of broadcast news based on closed-caption texts. The multimedia news data under study are first segmented into story units based on video and audio signals, using our previously developed algorithms. Based on the time stamp information, the closed-caption texts are then segmented into text units corresponding to each story unit. A Bayes network is trained to automatically classify the story units into fourteen categories. The major contribution of this paper is the notion of a category, which represents a higher level of semantic generalization than traditional topics. We discuss in detail the administrated bottom-up clustering algorithm used to generate a semantically meaningful category framework, as well as the training procedures used to build the belief network that covers the large broadcast news data set. Using the CSR LM 1996 data set from the LDC (Linguistic Data Consortium), we designed a number of experiments to examine the relationship between categorization design and classification performance.
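The time-stamp alignment step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_captions_by_story`, the use of story start times in seconds, and the (timestamp, text) caption format are all assumptions made for illustration. Each caption line is assigned to the story unit whose start time most recently precedes it.

```python
from bisect import bisect_right


def group_captions_by_story(story_starts, captions):
    """Group caption lines into one text unit per story.

    story_starts: sorted list of story start times in seconds (assumed format).
    captions: list of (timestamp_seconds, text) pairs (assumed format).
    Returns a list of strings, one per story, in story order.
    """
    units = [[] for _ in story_starts]
    for timestamp, text in captions:
        # Index of the last story that starts at or before this caption.
        idx = bisect_right(story_starts, timestamp) - 1
        if idx >= 0:  # captions before the first story start are dropped
            units[idx].append(text)
    return [" ".join(unit) for unit in units]


# Hypothetical example data, not taken from the paper.
story_starts = [0.0, 60.0, 150.0]
captions = [
    (5.0, "markets fell"),
    (61.5, "the senate voted"),
    (70.0, "on the bill"),
    (200.0, "rain expected"),
]
text_units = group_captions_by_story(story_starts, captions)
```

The resulting text units would then serve as the input documents for the classifier; the classification step itself is not sketched here because the abstract does not specify the network structure or features.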
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Huitao Luo, Qian Huang, "Automatic categorization design for broadcast news", Proc. SPIE 4676, Storage and Retrieval for Media Databases 2002, (19 December 2001); doi: 10.1117/12.451099; https://doi.org/10.1117/12.451099
PROCEEDINGS
11 PAGES

