Paper
29 December 1997 Cross-modal retrieval of scripted speech audio
Charles B. Owen, Fillia Makedon
Proceedings Volume 3310, Multimedia Computing and Networking 1998; (1997) https://doi.org/10.1117/12.298423
Event: Photonics West '98 Electronic Imaging, 1998, San Jose, CA, United States
Abstract
This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open vocabulary speech recognition is advancing rapidly, but cannot yield high accuracy in either search or transcription modalities. However, text can be searched quickly and efficiently with high accuracy. Script-light digital audio is audio that has an available transcription. This is a surprisingly large class of content including legal testimony, broadcasting, dramatic productions and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
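To make the alignment idea concrete, the following is a minimal sketch of beam-pruned alignment between a transcript and per-frame acoustic scores. It is not the authors' algorithm: it assumes a purely linear chain of units rather than a full transcription graph, treats each unit as an opaque label rather than a biphone, and takes the per-frame log-probabilities as given. The function name `align`, the `beam` parameter, and the input layout are all hypothetical choices made for illustration.

```python
import math


def align(units, frame_logprobs, beam=10.0):
    """Align a linear unit sequence to audio frames with beam pruning.

    units: list of unit labels taken from the transcript (assumed linear chain).
    frame_logprobs: one dict per frame mapping unit label -> log-probability
        (assumed to come from some acoustic front end, not modeled here).
    beam: hypotheses scoring more than `beam` below the best are dropped.
    Returns the first frame index assigned to each unit, or None on failure.
    """
    n_units = len(units)
    if n_units == 0 or not frame_logprobs:
        return None
    # active maps a position in `units` to the best log score reaching it.
    active = {0: frame_logprobs[0].get(units[0], -math.inf)}
    backptrs = []  # backptrs[t - 1][pos] = position occupied at frame t - 1
    for frame in frame_logprobs[1:]:
        nxt, bp = {}, {}
        for pos, score in active.items():
            for new_pos in (pos, pos + 1):  # stay on this unit or advance
                if new_pos >= n_units:
                    continue
                s = score + frame.get(units[new_pos], -math.inf)
                if s > nxt.get(new_pos, -math.inf):
                    nxt[new_pos], bp[new_pos] = s, pos
        if not nxt:
            return None  # every hypothesis had zero probability
        best = max(nxt.values())
        # Beam pruning: keep only hypotheses close to the current best.
        active = {p: s for p, s in nxt.items() if s >= best - beam}
        backptrs.append(bp)
    if n_units - 1 not in active:
        return None  # the last unit was never reached or was pruned
    # Trace back from the final unit to recover the frame-level path.
    pos = n_units - 1
    path = [pos]
    for bp in reversed(backptrs):
        pos = bp[pos]
        path.append(pos)
    path.reverse()
    starts = [None] * n_units
    for t, p in enumerate(path):
        if starts[p] is None:
            starts[p] = t
    return starts
```

Given the start frame of each unit, a text hit in the transcript can be mapped to a time offset in the audio, which is the retrieval step the abstract describes. A narrower `beam` trades alignment robustness for speed, since the correct path may be pruned when the acoustic scores are locally poor.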
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Charles B. Owen and Fillia Makedon "Cross-modal retrieval of scripted speech audio", Proc. SPIE 3310, Multimedia Computing and Networking 1998, (29 December 1997); https://doi.org/10.1117/12.298423
CITATIONS
Cited by 3 scholarly publications.
KEYWORDS
Speech recognition
Legal
Detection and tracking algorithms
Multimedia
Correlation function
Databases
Associative arrays