Implicit prosody mining based on the human eye image capture technology
21 August 2013
Proceedings Volume 8908, International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Sensors and Applications; 89081X (2013) https://doi.org/10.1117/12.2034656
Event: ISPDI 2013 - Fifth International Symposium on Photoelectronic Detection and Imaging, 2013, Beijing, China
Abstract
Eye tracking has become the main method for analyzing recognition issues in human-computer interaction, and capturing images of the human eye is the key problem in eye tracking. Building on further research, a new human-computer interaction method is introduced to enrich the forms of speech synthesis. We propose an implicit prosody mining method based on human eye image capture technology: it extracts parameters from images of the reader's eyes during reading, uses them to control and drive prosody generation in speech synthesis, and establishes a prosodic model with high simulation accuracy. The duration model is a key issue in prosody generation. For the duration model, this paper puts forward a new idea: obtain the gaze duration of the eyes during reading using eye image capture technology, and synchronize it with the pronunciation duration in speech synthesis. Eye movement during reading is a complex process in which multiple factors interact, including gaze (fixation), saccades, and regressions. Therefore, how to extract the appropriate information from eye images needs to be considered, and the regularities of eye gaze need to be obtained as a reference for modeling. Based on an analysis of three current eye movement control models and the characteristics of implicit prosody in reading, the relative independence of the speech processing system for text and the eye movement control system is discussed. It was demonstrated that, at the same level of text familiarity, gaze duration during reading and the pronunciation duration of the inner voice are synchronous. An eye gaze duration model based on the level-based prosodic structure of Chinese is presented as an alternative to previous methods based on machine learning and probabilistic prediction; it captures the reader's actual internal reading rhythm and synthesizes speech with a personalized rhythm.
This research will enrich the forms of human-computer interaction, and it has practical significance and application prospects for assistive speech interaction for people with disabilities. Experiments show that implicit prosody mining based on human eye image capture technology gives the synthesized speech more flexible expression.
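The abstract does not give the duration model itself, so the following is only a minimal sketch of the general idea of driving pronunciation duration from gaze duration. It assumes fixations have already been extracted from the eye images and mapped to prosodic units; the names `gaze_time_per_unit`, `synthesis_durations`, `scale`, and `default_ms` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def gaze_time_per_unit(fixations):
    """Sum fixation durations (ms) for each prosodic-unit index.

    `fixations` is a list of (unit_index, duration_ms) pairs in reading order.
    A regression back onto a unit simply adds to that unit's total
    (an assumption of this sketch, not a rule stated in the abstract).
    """
    totals = defaultdict(float)
    for unit_index, duration_ms in fixations:
        totals[unit_index] += duration_ms
    return dict(totals)


def synthesis_durations(units, fixations, scale=1.0, default_ms=200.0):
    """Pair each prosodic unit with a target pronunciation duration (ms).

    Units that were fixated get `scale` times their total gaze time.
    Units that were skipped get `default_ms` (a hypothetical fallback).
    """
    totals = gaze_time_per_unit(fixations)
    result = []
    for index, text in enumerate(units):
        if index in totals:
            result.append((text, scale * totals[index]))
        else:
            result.append((text, default_ms))
    return result


if __name__ == "__main__":
    # Three prosodic units; the reader regresses once onto the first unit.
    units = ["今天", "天气", "很好"]
    fixations = [(0, 220), (1, 180), (0, 60), (2, 250)]
    for text, duration in synthesis_durations(units, fixations):
        print(text, duration)
```

Using a single `scale` factor reflects the claimed synchrony between gaze duration and inner-voice duration at a fixed level of text familiarity; a real system would presumably need a per-reader calibration step, which is outside this sketch.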
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Pei-pei Gao and Feng Liu "Implicit prosody mining based on the human eye image capture technology", Proc. SPIE 8908, International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Sensors and Applications, 89081X (21 August 2013); https://doi.org/10.1117/12.2034656
KEYWORDS
Eye
Eye models
Control systems
Visual process modeling
Cognitive modeling
Mining
Signal processing