24 August 1999 Audio-video feature correlation: faces and speech
Proceedings Volume 3846, Multimedia Storage and Archiving Systems IV; (1999)
Event: Photonics East '99, 1999, Boston, MA, United States
This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was then applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and the presence of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label the faces appearing in the keyframes with the name of the corresponding character. As expected, we found that the extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
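The abstract does not say which correlation measure was used. As a minimal sketch of the kind of analysis described, the following Python example pairs each keyframe with the audio segment that contains its timestamp, builds a 2x2 contingency table of face presence against speech presence, and computes the phi coefficient. The segment list, the keyframe list, the helper names (`label_at`, `contingency`, `phi`), and the choice of the phi coefficient are all assumptions made for illustration; none of them comes from the paper.

```python
from bisect import bisect_right
from math import sqrt

# Hypothetical audio partition: (start_s, end_s, label), sorted and non-overlapping.
segments = [
    (0.0, 4.0, "silence"),
    (4.0, 10.0, "speech"),
    (10.0, 15.0, "music"),
    (15.0, 22.0, "speech"),
    (22.0, 25.0, "noise"),
]

# Hypothetical keyframes: (timestamp_s, face_detected).
keyframes = [
    (1.0, False), (5.0, True), (8.0, True), (12.0, False),
    (16.0, True), (20.0, False), (23.0, True), (24.0, False),
]


def label_at(t, segments):
    """Return the audio label of the segment containing time t, or None."""
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, t) - 1
    if i >= 0 and segments[i][0] <= t < segments[i][1]:
        return segments[i][2]
    return None


def contingency(keyframes, segments):
    """Count keyframes as (face & speech, face only, speech only, neither)."""
    n11 = n10 = n01 = n00 = 0
    for t, face in keyframes:
        speech = label_at(t, segments) == "speech"
        if face and speech:
            n11 += 1
        elif face:
            n10 += 1
        elif speech:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def phi(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table; 0.0 when a row or column is empty."""
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


table = contingency(keyframes, segments)
print(table, phi(*table))
```

A phi value near 1 would mean that faces and speech tend to occur together, a value near 0 that the two detectors behave independently, and a negative value that one tends to appear when the other is absent.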
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Gwenael Durand, Claude Montacie, Marie-Jose Caraty, and Pascal Faudemay, "Audio-video feature correlation: faces and speech", Proc. SPIE 3846, Multimedia Storage and Archiving Systems IV (24 August 1999).
