Paper
12 April 2010
Learning one-to-many mapping functions for audio-visual integrated perception
Jung-Hui Lim, Do-Kwan Oh, Soo-Young Lee
Abstract
In noisy environments, human speech perception utilizes visual lip-reading as well as audio phonetic classification. This audio-visual integration may be achieved by combining the two sensory features at an early stage, and the two modalities may also be integrated by top-down attention. For the sensory feature fusion we introduce mapping functions between the audio and visual manifolds. In particular, we present an algorithm that learns a one-to-many mapping function for the video-to-audio mapping. We also present a top-down attention mechanism that integrates both the sensory features and the classification results of the two modalities, and is able to explain the McGurk effect. Each classifier is implemented separately with a hidden Markov model (HMM), but the two classifiers are combined at the top level and interact through the top-down attention.
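The abstract gives no implementation detail, so the following is only a hypothetical sketch of the two ideas it names: a one-to-many video-to-audio mapping and a top-down, attention-weighted combination of per-modality classifier scores. The cluster-conditioned linear maps, the K-means-style clustering, the fuse_scores weighting, and all names below are assumptions made for illustration; they are not the authors' algorithm or the HMM-based system described in the paper.

```python
import numpy as np

# Hypothetical sketch only: a one-to-many video-to-audio mapping realized as
# K cluster-conditioned linear maps, plus a weighted (attention-like) fusion
# of per-modality classifier scores. This is NOT the paper's algorithm.

class OneToManyMapper:
    """Map one visual feature vector to K candidate audio feature vectors."""

    def __init__(self, n_branches=3, seed=0):
        self.K = n_branches
        self.rng = np.random.default_rng(seed)

    def fit(self, V, A, n_iter=20):
        # Cluster the paired data in the audio space, so that visually similar
        # frames with different sounds fall into different branches; then fit
        # one least-squares linear map (video -> audio) per branch.
        idx = self.rng.integers(0, self.K, size=len(A))
        for _ in range(n_iter):
            centers = np.stack([A[idx == k].mean(axis=0) if np.any(idx == k)
                                else A[self.rng.integers(len(A))]
                                for k in range(self.K)])
            idx = np.argmin(((A[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        Vb = np.hstack([V, np.ones((len(V), 1))])   # append bias term
        self.W = []
        for k in range(self.K):
            mask = idx == k
            if not np.any(mask):                    # empty branch: use all data
                mask = np.ones(len(A), dtype=bool)
            W, *_ = np.linalg.lstsq(Vb[mask], A[mask], rcond=None)
            self.W.append(W)
        return self

    def predict(self, v):
        # One visual input -> K candidate audio feature vectors.
        vb = np.append(v, 1.0)
        return np.stack([vb @ W for W in self.W])


def fuse_scores(audio_loglik, video_loglik, attention=0.5):
    """Late fusion of per-class log-likelihoods from two classifiers.
    `attention` in [0, 1] shifts weight toward the audio modality."""
    return attention * audio_loglik + (1.0 - attention) * video_loglik


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.normal(size=(200, 4))                   # toy visual (lip) features
    A = np.where(rng.random((200, 1)) < 0.5,        # two audio "branches"
                 V @ rng.normal(size=(4, 6)) + 1.0,
                 V @ rng.normal(size=(4, 6)) - 1.0)
    mapper = OneToManyMapper(n_branches=2).fit(V, A)
    print(mapper.predict(V[0]).shape)               # (2, 6): two candidates

    # In a noisy environment, attention can be lowered to favor lip-reading.
    print(fuse_scores(np.array([-10., -12.]), np.array([-11., -9.]),
                      attention=0.3))
```

In this toy setting a single lip-feature vector returns K candidate audio-feature vectors, mirroring the one-to-many nature of the video-to-audio mapping, while the attention parameter shifts the final decision toward the more reliable modality, in the spirit of the top-down integration described in the abstract.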
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jung-Hui Lim, Do-Kwan Oh, and Soo-Young Lee "Learning one-to-many mapping functions for audio-visual integrated perception", Proc. SPIE 7703, Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering VIII, 77030E (12 April 2010); https://doi.org/10.1117/12.855241
KEYWORDS
Associative arrays, Visualization, Video, Sensors, Acoustics, Detection and tracking algorithms, Electronic filtering