23 October 1996 Psychoacoustic frequency scales versus frequency warping in scale cepstrum
Abstract
In this paper, we derive a frequency-warping function by analyzing speech data obtained from the TIMIT database. Until now, numerous frequency scales have been proposed based purely on psychoacoustic studies, and many speech recognition algorithms use such scales for spectral analysis at the signal-processing front end. The motivation for using psychoacoustic frequency scales is that, because they are based on properties of human auditory perception, they may provide an accurate representation of the relevant spectral information in speech. Since the preference for one scale over another is ad hoc, and since the goal is to achieve better recognition, experiments are conducted to determine whether any one such scale indeed yields better recognition rates. In this paper, we analyze actual speech data and present evidence of the kind of frequency warping that may be necessary to achieve speaker-independent recognition of vowels. This provides us with the motivation to use such frequency-warping functions in speech recognition. Surprisingly, the frequency warping obtained is similar to the Mel scale obtained from psychoacoustic studies. This suggests that the ear may be using such a frequency warping to remove extraneous speaker-specific information while identifying and recognizing phonemes.
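For readers unfamiliar with the Mel scale mentioned in the abstract, the following is a minimal sketch of one commonly used closed-form approximation, m = 2595 log10(1 + f/700). This formula and the function names `hz_to_mel` and `mel_to_hz` are assumptions chosen for illustration; the abstract does not give the warping function the authors derive from TIMIT, and this sketch is not that function.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to mels using a common approximation.

    Assumption: m = 2595 * log10(1 + f / 700). This is a widely used
    convention, not the warping function derived in the paper.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


if __name__ == "__main__":
    # Equal steps in Hz shrink in mels as frequency rises,
    # i.e. the warping compresses the high-frequency region.
    for f in (250, 500, 1000, 2000, 4000):
        print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

Under this approximation the mapping is close to linear at low frequencies and close to logarithmic at high frequencies, which is the general shape of warping the abstract refers to.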
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Srinivasan Umesh, Leon Cohen, Nenad M. Marinovic, Douglas J. Nelson, "Psychoacoustic frequency scales versus frequency warping in scale cepstrum", Proc. SPIE 2825, Wavelet Applications in Signal and Image Processing IV, (23 October 1996); doi: 10.1117/12.255264; https://doi.org/10.1117/12.255264
PROCEEDINGS
10 PAGES

