18 April 2006 CommonSense: a preprocessing system to identify errors in large transcribed corpora
Abstract
A system was designed to locate and correct errors in large transcribed corpora. The program, called CommonSense, relies on a set of rules that identify mistakes related to homonyms (words with distinct definitions but identical pronunciations). The system was run on the 1996 and 1997 Broadcast News Speech Corpora and correctly identified more than 400 errors in these data. Future work may extend CommonSense to automatically correct errors in the hypothesis files that speech recognition systems produce as output.
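The abstract does not list the rules themselves, so the following is only a minimal sketch of what a rule-based check for identically pronounced words could look like. Everything in it is an assumption made for illustration: the `RULES` table, the choice to inspect only the next word, and the function name `find_homophone_errors` are hypothetical and are not taken from CommonSense.

```python
import re

# Hypothetical rules, NOT from the paper. Each entry maps a suspect word to
# (pattern the following word must match, suggested replacement).
RULES = {
    "their": (re.compile(r"^(is|are|was|were)$"), "there"),
    "there": (re.compile(r"^(own|car|house)$"), "their"),
}


def find_homophone_errors(line):
    """Return (token index, word, suggestion) for each token a rule flags."""
    tokens = line.lower().split()
    flagged = []
    # The last token has no following word, so it is never checked.
    for i, word in enumerate(tokens[:-1]):
        rule = RULES.get(word)
        if rule and rule[0].match(tokens[i + 1]):
            flagged.append((i, word, rule[1]))
    return flagged


if __name__ == "__main__":
    for hit in find_homophone_errors("Their is a problem with there car"):
        print(hit)
```

A dictionary keyed on the suspect word keeps each lookup to a single hash access per token, which matters when scanning a corpus of many transcript lines. A real rule set would presumably need richer context than one neighboring word.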
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ryan Propper, Keyvan Mohajer, and Vaughan Pratt "CommonSense: a preprocessing system to identify errors in large transcribed corpora", Proc. SPIE 6242, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, 62420B (18 April 2006); https://doi.org/10.1117/12.663836
KEYWORDS
Speech recognition
Associative arrays
System identification
Error analysis
Systems modeling
Computing systems
Acoustics