18 April 2006 CommonSense: a preprocessing system to identify errors in large transcribed corpora
Abstract
A system was designed to locate and correct errors in large transcribed corpora. The program, called CommonSense, relies on a set of rules that identify mistakes related to homonyms (words with distinct definitions but identical pronunciations). The system was run on the 1996 and 1997 Broadcast News Speech Corpora and correctly identified more than 400 errors in these data. Future work may extend CommonSense to automatically correct errors in hypothesis files created as the output of speech recognition systems.
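The abstract does not reproduce the rule set, so the following is only a minimal sketch of how a rule-based check for sound-alike word errors might look, not the CommonSense implementation. The rules, the `RULES` table, and the function names `find_homophone_errors` and `scan` are all hypothetical, chosen purely for illustration.

```python
import re

# Hypothetical rules (not taken from the paper): each pairs a regex for a
# suspicious context with the word that was probably intended instead.
RULES = [
    (re.compile(r"\b(their)\s+(?:is|are|was|were)\b", re.IGNORECASE), "there"),
    (re.compile(r"\b(there)\s+own\b", re.IGNORECASE), "their"),
    (re.compile(r"\b(your)\s+welcome\b", re.IGNORECASE), "you're"),
]


def find_homophone_errors(line):
    """Return sorted (offset, found_word, suggested_word) tuples for one line."""
    hits = []
    for pattern, suggestion in RULES:
        for match in pattern.finditer(line):
            hits.append((match.start(1), match.group(1), suggestion))
    return sorted(hits)


def scan(lines):
    """Yield (line_number, offset, found_word, suggested_word) over a transcript."""
    for lineno, line in enumerate(lines, start=1):
        for offset, found, suggestion in find_homophone_errors(line):
            yield lineno, offset, found, suggestion
```

Rules of this kind only flag candidates. A real system would presumably need a much larger, carefully validated rule set to keep false positives low on broadcast transcripts.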
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ryan Propper, Keyvan Mohajer, Vaughan Pratt, "CommonSense: a preprocessing system to identify errors in large transcribed corpora", Proc. SPIE 6242, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, 62420B (18 April 2006); doi: 10.1117/12.663836; https://doi.org/10.1117/12.663836
PROCEEDINGS
6 PAGES

