Face and lip tracking in unconstrained imagery for improved automatic speech recognition
19 January 2009
Abstract
When combined with acoustic speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be utilized in mainstream ASR systems due to the difficulty of accurately tracking lips in real-world conditions. This paper presents our current progress in tracking faces and lips in visually challenging environments. Our findings suggest that the mean shift algorithm performs poorly on small regions, in this case the lips, but achieves nearly 80% accuracy for face tracking.
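For illustration, below is a minimal sketch of mean shift tracking via hue-histogram back-projection, the general technique the paper evaluates, written with OpenCV's cv2.meanShift. This is not the authors' implementation; the video filename and the initial face box are hypothetical placeholders that a face detector would normally supply.

import cv2

# Hypothetical input video and initial face box (x, y, w, h);
# in practice a face detector would provide the initial window.
cap = cv2.VideoCapture("speaker.mp4")
ok, frame = cap.read()
x, y, w, h = 220, 140, 120, 120
track_window = (x, y, w, h)

# Build a hue histogram of the initial face region as the appearance model.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves less than 1 pixel.
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Mean shift climbs the back-projection density to relocate the window.
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("face tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()

Because the fixed-size window and color histogram average over the whole region, this style of tracker tends to lock onto a large, color-consistent target like the face, which is consistent with the paper's observation that it degrades on small, low-contrast regions such as the lips.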
© 2009 Society of Photo-Optical Instrumentation Engineers (SPIE).
Brandon Crow and Jane Xiaozheng Zhang, "Face and lip tracking in unconstrained imagery for improved automatic speech recognition," Proc. SPIE 7257, Visual Communications and Image Processing 2009, 72571Y (19 January 2009); https://doi.org/10.1117/12.817092