6 October 1997 Keyword spotting for multimedia document indexing
Proceedings Volume 3229, Multimedia Storage and Archiving Systems II; (1997) https://doi.org/10.1117/12.290357
Event: Voice, Video, and Data Communications, 1997, Dallas, TX, United States
Abstract
We tackle the problem of multimedia indexing using keyword spotting on the spoken part of the data. Word spotting systems for indexing have to meet very hard specifications: short response times to queries, speaker-independent mode, and an open vocabulary in order to be able to track any keyword. To meet these constraints, keyword models should be built according to their phonetic spelling, and the process should be divided into two parts: preprocessing of the speech signal and query over a lattice of hypotheses. Different classification criteria have been studied for hypothesis generation: frame labeling, maximum likelihood and maximum a posteriori (MAP). The hypothesis probability is computed either through a standard Gaussian model or through a hybrid Hidden Markov Model-Neural Network. The training of the phonemic models is based either on Viterbi alignment or on recursive estimation and maximization of a posteriori probabilities. In the latter, discriminant properties between phonemes are enforced. Tests have been conducted on the TIMIT database as well as on TV news soundtracks. Interesting results have been obtained in terms of time saved for the documentalist. The ultimate goal is to couple the soundtrack indexing with tools for video indexing in order to enhance the robustness of the system.
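The two-part design described in the abstract (offline preprocessing into a lattice of phoneme hypotheses, then a fast query against that lattice using the keyword's phonetic spelling) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hyp` record, the `spot` function, the frame-based timing, and the rule that consecutive phonemes must be exactly adjacent in time are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hyp(NamedTuple):
    """One phoneme hypothesis in the lattice (hypothetical layout)."""
    phone: str
    start: int    # first frame of the hypothesis
    end: int      # frame just after the last one
    logp: float   # log score assigned by the acoustic model


def spot(lattice, phones):
    """Find chains of hypotheses that spell `phones`.

    Returns (start, end, summed log score) tuples, best score first.
    Assumption: each phoneme must start exactly where the previous one ends.
    """
    if not phones:
        return []
    # Index hypotheses by (phoneme, start frame) so each extension is a lookup.
    by_start = defaultdict(list)
    for h in lattice:
        by_start[(h.phone, h.start)].append(h)
    # Partial matches: (keyword start, current end, accumulated log score).
    partial = [(h.start, h.end, h.logp) for h in lattice if h.phone == phones[0]]
    for p in phones[1:]:
        partial = [(s, h.end, score + h.logp)
                   for (s, e, score) in partial
                   for h in by_start.get((p, e), [])]
    return sorted(partial, key=lambda hit: hit[2], reverse=True)


# Toy lattice with competing hypotheses over the same frames.
lattice = [
    Hyp("n", 0, 3, -1.0), Hyp("m", 0, 3, -1.5),
    Hyp("uw", 3, 7, -0.5), Hyp("uw", 4, 7, -0.25),
    Hyp("z", 7, 10, -0.75), Hyp("s", 7, 10, -1.0),
]
hits = spot(lattice, ["n", "uw", "z"])
```

Because the lattice is built once per recording, only the lightweight `spot` step runs at query time, which is one way to reconcile an open vocabulary with short response times.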
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Philippe Gelin, Christian J. Wellekens, "Keyword spotting for multimedia document indexing", Proc. SPIE 3229, Multimedia Storage and Archiving Systems II, (6 October 1997); doi: 10.1117/12.290357; https://doi.org/10.1117/12.290357
PROCEEDINGS
12 PAGES

