Paper
26 February 2010 Voice conversion using dynamic features for high quality transformation
Wei Wang, Zhen Yang
Proceedings Volume 7546, Second International Conference on Digital Image Processing; 75463Q (2010) https://doi.org/10.1117/12.855168
Event: Second International Conference on Digital Image Processing, 2010, Singapore, Singapore
Abstract
A novel voice morphing method is proposed to make the speech of a source speaker sound as if it were uttered by a target speaker. The method is based on the Gaussian Mixture Model (GMM). The traditional GMM approach, however, suffers from over-smoothing and can produce discontinuities in the converted speech because the extracted feature information is inaccurate. To overcome these problems, we take the dynamic spectral features between frames into account and modify the conversion function to handle the discontinuities. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrogram (STRAIGHT) algorithm is adopted for analysis and synthesis. Objective and perceptual experiments show that the quality of speech converted by the proposed method is significantly improved compared with the traditional GMM method.
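As an illustration of what "dynamic spectral features between frames" can mean in practice, the sketch below appends first-order delta features to a sequence of static spectral frames. The abstract does not state which definition the authors use, so the central-difference formula, the edge-frame replication, and the function name `append_delta_features` are all assumptions made for this example, not the paper's method.

```python
from typing import List

Frame = List[float]


def append_delta_features(static: List[Frame]) -> List[Frame]:
    """Return joint static + delta features for a sequence of frames.

    Assumed definition (not taken from the paper):
        delta[t] = 0.5 * (x[t + 1] - x[t - 1])
    The first and last frames are replicated so every frame has two neighbours.
    """
    if not static:
        return []
    last = len(static) - 1
    joint: List[Frame] = []
    for t, frame in enumerate(static):
        # Clamp neighbour indices at the sequence boundaries.
        prev_frame = static[max(t - 1, 0)]
        next_frame = static[min(t + 1, last)]
        delta = [0.5 * (n - p) for n, p in zip(next_frame, prev_frame)]
        # Joint vector: static coefficients followed by their deltas.
        joint.append(list(frame) + delta)
    return joint
```

In a GMM-based conversion setup, joint vectors of this kind would typically replace the static-only vectors when the model is trained, so that the frame-to-frame trajectory is modelled alongside the spectral shape of each frame.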
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wei Wang and Zhen Yang "Voice conversion using dynamic features for high quality transformation", Proc. SPIE 7546, Second International Conference on Digital Image Processing, 75463Q (26 February 2010); https://doi.org/10.1117/12.855168
KEYWORDS
Expectation maximization algorithms
Distortion
Dynamical systems
Molybdenum
Feature extraction
Systems modeling
Filtering (signal processing)
