19 January 2009 Face and lip tracking in unconstrained imagery for improved automatic speech recognition
When combined with acoustic speech information, visual speech information (lip movement) significantly improves automatic speech recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be used in mainstream ASR systems because of the difficulty of accurately tracking lips in real-world conditions. This paper presents our current progress in tracking the face and lips in visually challenging environments. Findings suggest that the mean shift algorithm performs poorly for small regions, in this case the lips, but achieves nearly 80% accuracy for facial tracking.
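For readers unfamiliar with mean shift tracking, the following is a minimal generic sketch of the idea, not the authors' implementation, which the abstract does not describe. It assumes a per-pixel weight map is already available (for example, a colour-histogram back-projection, which is an assumption here), and the names `mean_shift`, `weights`, and `window` are hypothetical. Each iteration moves a fixed-size window so that its centre sits on the weighted centroid of the pixels it currently covers.

```python
def mean_shift(weights, window, max_iter=20):
    """Shift a fixed-size window toward the local centroid of `weights`.

    weights: 2D list of non-negative floats (rows of pixel weights).
    window:  (x, y, w, h) with an integer top-left corner (x, y).
    Returns the final (x, y, w, h).
    """
    rows, cols = len(weights), len(weights[0])
    x, y, w, h = window
    for _ in range(max_iter):
        total = sx = sy = 0.0
        # Accumulate weight and weighted coordinates inside the window,
        # clipped to the image bounds.
        for r in range(max(y, 0), min(y + h, rows)):
            for c in range(max(x, 0), min(x + w, cols)):
                wt = weights[r][c]
                total += wt
                sx += wt * c
                sy += wt * r
        if total == 0:
            break  # nothing under the window: it cannot move
        # Re-centre the window on the weighted centroid.
        new_x = int(round(sx / total - (w - 1) / 2))
        new_y = int(round(sy / total - (h - 1) / 2))
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return x, y, w, h


# Toy weight map: a 4x4 blob of ones at rows 10-13, columns 12-15.
weights = [[0.0] * 20 for _ in range(20)]
for r in range(10, 14):
    for c in range(12, 16):
        weights[r][c] = 1.0

tracked = mean_shift(weights, (8, 7, 6, 6))  # window partly overlaps the blob
lost = mean_shift(weights, (0, 0, 3, 3))     # window has no overlap at all
```

In this toy setup the first window drifts until it is centred on the blob, while the second, which starts with no weight underneath it, stays where it was placed.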
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Brandon Crow and Jane Xiaozheng Zhang, "Face and lip tracking in unconstrained imagery for improved automatic speech recognition", Proc. SPIE 7257, Visual Communications and Image Processing 2009, 72571Y (19 January 2009); doi: 10.1117/12.817092; https://doi.org/10.1117/12.817092


