Robust speech separation using visually constructed speech signals
6 March 2002
A technique to reconstruct speech signals entirely from the visual lip motions of a speaker is proposed. Using six geometric lip parameters obtained from the Tulips1 database, a virtual speech signal is reconstructed with a 3.6 s audiovisual training segment serving as the basis for the reconstruction. It is shown that the virtual speech signal has an envelope that is directly related to the envelope of the original acoustic signal. This visually reconstructed envelope is then used as the basis for robust speech separation in which the visual parameters of all speakers are available. It is shown that, unlike previous signal separation techniques, which required an ideal mixture of independent signals, the proposed technique can estimate the mixture coefficients very accurately even in non-ideal situations.
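The mixture-coefficient estimation described above can be illustrated with a minimal sketch. The sketch below assumes a simple linear model in which the observed mixture envelope is a weighted sum of the speakers' visually derived envelopes plus noise, and solves for the weights by least squares; the stand-in random envelopes and the least-squares formulation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in visually derived speech envelopes for two speakers.
# (The paper reconstructs these from six geometric lip parameters;
# random non-negative signals are used here purely for illustration.)
n = 1000
e1 = np.abs(rng.standard_normal(n))
e2 = np.abs(rng.standard_normal(n))

# True mixing coefficients (unknown to the estimator).
a_true = np.array([0.7, 0.3])

# Observed mixture envelope, with mild noise modeling a non-ideal mixture.
m = a_true[0] * e1 + a_true[1] * e2 + 0.01 * rng.standard_normal(n)

# Least-squares estimate of the mixing coefficients from the
# visually reconstructed envelopes.
E = np.column_stack([e1, e2])
a_est, *_ = np.linalg.lstsq(E, m, rcond=None)

print(a_est)  # should be close to [0.7, 0.3]
```

Because the visual envelopes are available per speaker, the estimation remains well-posed even when the acoustic mixture is noisy, which is the sense in which the approach tolerates non-ideal mixtures.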
© (2002) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Parham Aarabi and Negar Habibi Khameneh, "Robust speech separation using visually constructed speech signals," Proc. SPIE 4731, Sensor Fusion: Architectures, Algorithms, and Applications VI (6 March 2002); https://doi.org/10.1117/12.458389