10 September 2007 Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation
Proceedings Volume 6777, Multimedia Systems and Applications X; 67770H (2007) https://doi.org/10.1117/12.740116
Event: Optics East, 2007, Boston, MA, United States
This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphones (MDM) condition in the conference room scenario. The proposed system consists of six modules: (1) a normalized least-mean-square (NLMS) adaptive filter for estimating the speaker direction via Time Difference of Arrival (TDOA); (2) an initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; (4) a speaker clustering algorithm based on a GMM modeling approach; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the Spring 2006 (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
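As an illustration of the GCC-PHAT time delay estimation named in module (3), the following is a minimal sketch and not the authors' implementation. The function names `dft` and `gcc_phat_delay`, the `max_lag` parameter, and the test signal are all assumptions made for this example. A naive DFT is used only to keep the sketch dependency-free; a practical system would presumably use an FFT library and operate on short frames of real microphone audio.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative only)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(x, y, max_lag):
    """Estimate, in samples, how far x lags behind y using GCC-PHAT.

    Hypothetical helper: the cross-power spectrum is normalized to unit
    magnitude (the PHAT weighting) so that only phase information
    contributes to the correlation peak.
    """
    n = len(x) + len(y)  # zero-pad so circular correlation equals linear
    X = dft(list(x) + [0.0] * (n - len(x)))
    Y = dft(list(y) + [0.0] * (n - len(y)))
    cross = [a * b.conjugate() for a, b in zip(X, Y)]
    whitened = [c / (abs(c) + 1e-12) for c in cross]  # PHAT weighting
    corr = [v.real for v in dft(whitened, inverse=True)]
    # Negative lags wrap around to the end of the circular correlation.
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: corr[lag % n])


# Synthetic two-channel example: channel_b is channel_a delayed by 3 samples.
channel_a = [math.sin(1.7 * t * t) for t in range(29)] + [0.0] * 3
channel_b = [0.0] * 3 + channel_a[:29]
delay = gcc_phat_delay(channel_b, channel_a, max_lag=8)
```

In this sketch a positive result means the first argument arrives later than the second. Delays estimated this way between channel pairs could then be used to time-align the microphone signals, which is the role the abstract assigns to this module.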
© (2007) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hanwu Sun, Tin Lay Nwe, Eugene Chin Wei Koh, Ma Bin, Haizhou Li, "Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation", Proc. SPIE 6777, Multimedia Systems and Applications X, 67770H (10 September 2007); https://doi.org/10.1117/12.740116
