Multi-modal analysis for person type classification in news video
17 January 2005
Classifying the people who appear in broadcast news video as anchors, reporters, or news subjects is an important problem in high-level video analysis, yet it remains a missing piece in existing research. Because different types of people resemble one another visually, this work explores multi-modal features derived from a variety of evidence, including speech identity, transcript clues, temporal video structure, named entities, and face information. A Support Vector Machine (SVM) model, trained on manually classified people, combines these features to predict the type of each person giving a monologue-style speech in news video. Experiments on ABC World News Tonight video show that this approach classifies person types with over 93% accuracy. A comparison of the contributions of different feature categories shows that relatively understudied features, such as speech identity and temporal video structure, are very effective for this task.
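The abstract does not give the feature definitions, kernel, or training procedure. The following is therefore only a minimal sketch, under stated assumptions, of how cues from the five evidence types could be fused into one vector per speaking segment and classified. The feature names (`speaker_recurs`, `signoff_phrase`, `opens_story`, `name_caption`, `face_frontal`), the toy training segments, and the choice of a one-vs-rest linear SVM trained by subgradient descent on the hinge loss are all hypothetical illustrations, not the authors' method.

```python
# Hypothetical sketch (not the paper's method): fuse multi-modal cues for each
# speaking segment into one feature vector and classify it as anchor /
# reporter / subject with one-vs-rest linear SVMs. Feature names and toy data
# are invented assumptions for illustration only.
from typing import Dict, List, Tuple

CLASSES = ["anchor", "reporter", "subject"]
FEATURES = ["speaker_recurs", "signoff_phrase", "opens_story", "name_caption", "face_frontal"]


def feature_vector(segment: Dict[str, float]) -> List[float]:
    # speaker_recurs: speech identity (same voice heard across many stories)
    # signoff_phrase: transcript clue (e.g. a reporter's sign-off)
    # opens_story:    temporal structure (segment begins a news story)
    # name_caption:   named entity (a person name shown or spoken nearby)
    # face_frontal:   face information (a frontal face is visible)
    return [float(segment[name]) for name in FEATURES]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=2000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer (bias is not regularized)
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # margin violated: this hinge term is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_one_vs_rest(X, labels):
    # One binary SVM per class: that class is +1, every other class is -1.
    return {c: train_binary_svm(X, [1.0 if lab == c else -1.0 for lab in labels]) for c in CLASSES}


def predict(models, x) -> str:
    # Choose the class whose binary SVM gives the largest decision value.
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


def make_segment(recurs, signoff, opens, caption, frontal) -> Dict[str, float]:
    return dict(zip(FEATURES, (recurs, signoff, opens, caption, frontal)))


# Toy, hand-made training segments (not data from the paper).
TRAIN = [
    (make_segment(1, 0, 1, 0, 1), "anchor"),
    (make_segment(1, 0, 0, 0, 1), "anchor"),
    (make_segment(0, 1, 0, 1, 1), "reporter"),
    (make_segment(0, 1, 0, 0, 1), "reporter"),
    (make_segment(0, 0, 0, 1, 0), "subject"),
    (make_segment(0, 0, 0, 1, 1), "subject"),
]

if __name__ == "__main__":
    X = [feature_vector(s) for s, _ in TRAIN]
    labels = [lab for _, lab in TRAIN]
    models = train_one_vs_rest(X, labels)
    correct = sum(predict(models, x) == lab for x, lab in zip(X, labels))
    print(f"training accuracy: {correct}/{len(labels)}")
```

Concatenating all cues into a single vector lets one discriminative model weigh the evidence types against each other, which is the kind of combination the abstract attributes to its SVM. A real system would more likely use an established SVM library, possibly with a non-linear kernel, and would measure accuracy on held-out broadcasts rather than on the training segments.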
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jun Yang and Alexander G. Hauptmann "Multi-modal analysis for person type classification in news video", Proc. SPIE 5682, Storage and Retrieval Methods and Applications for Multimedia 2005, (17 January 2005);

