19 October 1998 Warping functions in speech
We describe experiments that address how the same enunciation is related across different speakers. Our previous work indicated that frequencies are approximately scaled uniformly. In this paper we report results addressing possible corrections to uniform scaling. Our results show that the scaling is nonuniform; that is, the formant frequencies of different speakers scale differently at different frequencies. We discuss how this leads to the mathematical issue of separating the spectrum into speaker-dependent and speaker-independent parts. We introduce the concept of a universal scaling function that is aimed at achieving this separation. The fundamental idea is to find a frequency axis transformation (warping function) that transforms the energy density spectrum (the squared absolute value of the Fourier transform of the enunciation) in such a way that a further Fourier transform of the resulting function achieves this separation. We discuss this procedure and relate it to the scale transform. Using real speech data, we obtain such a transformation function. The resulting function is very similar to the Mel scale, which had previously been obtained only from psychoacoustic (hearing-based) experiments. That similar scales are obtained from both hearing and speech production (as reported here) is fundamental to the understanding of speech and hearing.
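The warping idea can be illustrated in the idealized case of exactly uniform scaling, where the appropriate warping is known to be logarithmic. The sketch below is not the authors' procedure and does not reproduce the universal warping function reported here; it only shows why a frequency axis transformation followed by a further Fourier transform can remove a speaker-dependent factor. The synthetic spectrum, the formant values, the scale factor `alpha`, and all function names are assumptions made for illustration.

```python
import numpy as np

def spectrum_a(f):
    """Synthetic energy density spectrum for a hypothetical speaker A."""
    formants = [500.0, 1500.0, 2500.0]  # Hz, illustrative values only
    width = 80.0                        # Hz, illustrative bandwidth
    return sum(np.exp(-0.5 * ((f - fc) / width) ** 2) for fc in formants)

def spectrum_b(f, alpha=1.15):
    """Hypothetical speaker B: speaker A with all frequencies scaled by alpha."""
    return spectrum_a(f / alpha)

def fourier_magnitude(samples):
    """Magnitude of the further Fourier transform taken along the frequency axis."""
    return np.abs(np.fft.rfft(samples))

def relative_difference(x, y):
    """Largest absolute difference, normalized by the largest value in x."""
    return np.max(np.abs(x - y)) / np.max(np.abs(x))

n = 2048
# Unwarped axis: samples equally spaced in frequency.
f_linear = np.linspace(100.0, 8000.0, n)
# Log-warped axis: samples equally spaced in log-frequency, so that the
# scaling f -> alpha * f becomes a shift by log(alpha) along the axis.
f_warped = np.exp(np.linspace(np.log(100.0), np.log(8000.0), n))

err_linear = relative_difference(
    fourier_magnitude(spectrum_a(f_linear)),
    fourier_magnitude(spectrum_b(f_linear)),
)
err_warped = relative_difference(
    fourier_magnitude(spectrum_a(f_warped)),
    fourier_magnitude(spectrum_b(f_warped)),
)

print(f"speaker mismatch without warping: {err_linear:.3g}")
print(f"speaker mismatch with log warping: {err_warped:.3g}")
```

On the log-warped axis the two synthetic speakers differ only by a shift, which changes the phase of the further Fourier transform but not its magnitude, so the mismatch nearly vanishes. When the scaling is nonuniform, as the results above indicate, a logarithmic warp no longer produces a pure shift, which is the motivation for seeking a different warping function.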
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
S. Umesh, Leon Cohen, and Douglas J. Nelson "Warping functions in speech", Proc. SPIE 3458, Wavelet Applications in Signal and Imaging Processing VI, (19 October 1998).
