Isolated digit recognition without time alignment
6 April 1995
A method for isolated digit recognition without time alignment is examined in this paper. Rather than providing a classifier with feature vectors computed from frames of data (typically at rates near 100 frames per second) over the word's duration, this method uses only one feature vector per word. Using dynamic time warping (DTW) for intraword time alignment and 12 LPC cepstral coefficients as features, a baseline speaker-independent recognition accuracy of 98.1% is established on the male-speaker digit subset of a Texas Instruments database. Without intraword time alignment, using 12 time-averaged LPC cepstral coefficients as the feature vector and a multilayer perceptron (MLP) classifier, the recognition accuracy is 78.4%. Augmenting the feature vector with 9 time-averaged critical band energies and 10 time-averaged LPC coefficients raises the accuracy to 97.1%. The difference between this result and the 98.1% DTW baseline is not statistically significant at the 95% confidence level. Thus, time alignment is shown not to be a critical factor for the digit recognition task. The advantages of the proposed method are that (1) intraword time alignment is not required, and (2) only a single feature vector is computed per utterance. These advantages come at the cost of requiring more information in each feature vector than a DTW-based classifier needs.
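To make the "one feature vector per word" idea concrete, the sketch below frames an utterance, computes per-frame LPC cepstra, band log-energies and LPC coefficients, and averages each over time into a single 31-element vector (12 + 9 + 10). It is a minimal sketch under assumptions the abstract does not state: 20 ms Hamming-windowed frames with a 10 ms hop, a 10th-order autocorrelation LPC analysis (Levinson-Durbin) whose cepstrum is extended to 12 terms by the standard LPC-to-cepstrum recursion, and nine log-spaced bands between 200 and 3400 Hz as a hypothetical stand-in for the paper's critical bands. The function names are illustrative, not taken from the paper.

```python
import numpy as np


def levinson_durbin(r, order):
    """LPC coefficients a[0..order] (a[0] = 1) from autocorrelation r[0..order].

    Convention: A(z) = 1 + sum_k a[k] z^-k, all-pole model 1/A(z).
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # degenerate (e.g. all-zero) frame: stop early
            break
        acc = np.dot(a[:i], r[i:0:-1])
        k = -acc / err  # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of 1/A(z); n_ceps may exceed the LPC order."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] (gain term) is not used here
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def utterance_vector(x, fs, n_ceps=12, lpc_order=10, n_bands=9):
    """Collapse a whole utterance into ONE time-averaged feature vector."""
    frame_len = int(0.020 * fs)  # assumed 20 ms frames
    hop = int(0.010 * fs)        # 10 ms hop, i.e. about 100 frames per second
    if len(x) < frame_len:
        raise ValueError("utterance shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # Hypothetical stand-in for critical bands: log-spaced edges, 200-3400 Hz
    edges = np.geomspace(200.0, 3400.0, n_bands + 1)
    rows = []
    for t in range(n_frames):
        frame = x[t * hop:t * hop + frame_len] * window
        full = np.correlate(frame, frame, mode="full")
        r = full[frame_len - 1:frame_len + lpc_order]  # lags 0..lpc_order
        a = levinson_durbin(r, lpc_order)
        ceps = lpc_to_cepstrum(a, n_ceps)
        power = np.abs(np.fft.rfft(frame)) ** 2
        bands = [power[(freqs >= lo) & (freqs < hi)].sum()
                 for lo, hi in zip(edges[:-1], edges[1:])]
        log_bands = np.log(np.asarray(bands) + 1e-10)
        rows.append(np.concatenate([ceps, log_bands, a[1:]]))
    # Time-average every feature over all frames: no alignment is needed
    return np.mean(rows, axis=0)
```

Every utterance, whatever its duration, maps to the same 31 dimensions, so the result can be fed directly to a fixed-input classifier such as an MLP instead of being warped against stored templates. The price is that averaging discards the frame ordering that DTW exploits.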
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jeffrey M. Gay, Martin P. DeSimio, "Isolated digit recognition without time alignment", Proc. SPIE 2492, Applications and Science of Artificial Neural Networks, (6 April 1995); doi: 10.1117/12.205184; https://doi.org/10.1117/12.205184