Face and lip tracking in unconstrained imagery for improved automatic speech recognition
When combined with acoustic speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be utilized in mainstream ASR systems, due to the difficulty of accurately tracking lips in real-world conditions. This paper presents our current progress in tracking faces and lips in visually challenging environments. Findings suggest that the mean shift algorithm performs poorly on small regions such as the lips, but achieves nearly 80% accuracy for face tracking.
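The paper does not include implementation details in the abstract, but the mean shift tracker it evaluates is a standard technique: a search window is repeatedly moved to the centroid of a likelihood map (typically a skin-colour back-projection) until it stops shifting. The sketch below is an illustrative reconstruction under that assumption, not the authors' code; the synthetic "face" blob and window sizes are made up for the example.

```python
import numpy as np

def mean_shift(weights, window, max_iter=20):
    """Single-window mean shift: repeatedly move the window to the
    centroid of the weight mass it currently covers, until the shift
    is zero or the iteration budget runs out."""
    x, y, w, h = window
    for _ in range(max_iter):
        patch = weights[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break  # no evidence under the window; give up
        ys, xs = np.mgrid[0:h, 0:w]
        cx = (xs * patch).sum() / total  # centroid within the patch
        cy = (ys * patch).sum() / total
        dx = int(round(cx - (w - 1) / 2))  # shift needed to centre on it
        dy = int(round(cy - (h - 1) / 2))
        x = int(np.clip(x + dx, 0, weights.shape[1] - w))
        y = int(np.clip(y + dy, 0, weights.shape[0] - h))
        if dx == 0 and dy == 0:
            break  # converged
    return x, y, w, h

# Weight map standing in for a skin-colour back-projection:
# a uniform 80x80 "face" blob with its top-left corner at (120, 80).
weights = np.zeros((240, 320), dtype=float)
weights[80:160, 120:200] = 1.0

# Start the window offset from the target; it converges onto the blob.
print(mean_shift(weights, (90, 50, 80, 80)))
```

A small lip region contributes far less mass to such a weight map than the whole face, which is consistent with the abstract's finding that mean shift degrades on small regions.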
© 2009 Society of Photo-Optical Instrumentation Engineers (SPIE).
Brandon Crow and Jane Xiaozheng Zhang, "Face and lip tracking in unconstrained imagery for improved automatic speech recognition," Proc. SPIE 7257, Visual Communications and Image Processing 2009, 72571Y (19 January 2009); https://doi.org/10.1117/12.817092


