The article presents the results of adapting the hybrid HMM-DNN speech synthesis model for use in an automated speaker recognition system for critical use (ASRSCU). In particular, it improves two things: the training of the HMM-DNN model, which estimates the difference between the posterior probability distributions of all HMM states and the actual posterior probability distribution computed by the DNN, and the use of semantic information in the speaker recognition process. This information is described by the features observed in the sequence of frames into which the input phonogram is divided. The results improve the efficiency of text-dependent speaker recognition when the ASRSCU operates in a noisy acoustic environment. The article formulates measures for the structural integration of the HMM-DNN component into the ASRSCU and describes the practical aspects of this process. In particular, the choice of the type and normalization method of the frame-level vectors of basic informative features is substantiated, the number of HMM states and the GMM parameters are determined depending on the parameters of the chosen formation model, and the procedure for interpreting the recognition results is described. The paper also formulates measures to optimize the training of an ASRSCU with the HMM-DNN component that will be operated in noisy environments.
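The abstract does not name the measure used to compare the HMM-state posteriors with the posterior computed by the DNN. As a minimal sketch, assuming the Kullback-Leibler divergence averaged over frames (a common choice for hybrid HMM-DNN training, not confirmed by the source), the comparison could look as follows. The function names, the three-state layout, and the numeric values are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw DNN outputs for one frame into a posterior over HMM states."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two distributions over the same HMM states.

    Assumption: KL divergence stands in for the unspecified 'difference'
    between the two posterior distributions.
    """
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0.0  # terms with zero target probability contribute nothing
    )


def mean_frame_divergence(targets, frame_logits):
    """Average the per-frame divergence over all frames of the phonogram."""
    per_frame = [kl_divergence(t, softmax(z)) for t, z in zip(targets, frame_logits)]
    return sum(per_frame) / len(per_frame)


# Hypothetical data: 3 HMM states, 2 frames.
targets = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
frame_logits = [[4.0, 0.0, 0.0], [0.0, 2.0, 2.0]]
loss = mean_frame_divergence(targets, frame_logits)
```

A training loop would minimize `loss` with respect to the DNN parameters; the sketch only shows how the per-frame discrepancy is measured.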
The article presents a model of a neural network in the form of a rank configuration. In the proposed model, the neurons are the vertices of a simplex that represents the rank configuration, and the weights of the neural network are the edges of this simplex. The edges of the simplex are labeled with the ranks of the weights. This approach makes it possible to evaluate the adequacy of rank configurations for decision making on a system that has already proven effective in this application. The model also makes it possible to represent neurons as binary codes that preserve the ranks of distances (DRP-codes) and to build a digital model of the memory core of a memcomputer. The model is studied on the recognition of decimal digits by a Hopfield network.
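To make the idea of labeling edges with weight ranks concrete, the following is a minimal sketch of a Hopfield network with Hebbian weights in which each edge is assigned the rank of its weight. It is an illustration under stated assumptions, not the authors' construction: the dense ascending ranking in `edge_ranks`, the synchronous update rule, and the two toy 8-element patterns (used in place of digit images) are all hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weights: sum of outer products, with no self-connections."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, max_steps=10):
    """Apply synchronous sign updates until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(w):
            field = sum(wij * sj for wij, sj in zip(row, state))
            # keep the previous value when the local field is exactly zero
            new_state.append(state[i] if field == 0 else (1 if field > 0 else -1))
        if new_state == state:
            break
        state = new_state
    return state


def edge_ranks(w):
    """Label each edge (i, j), i < j, with the dense rank of its weight.

    Assumption: rank 0 is the smallest weight and equal weights share a rank.
    """
    n = len(w)
    values = sorted({w[i][j] for i in range(n) for j in range(i + 1, n)})
    rank_of = {v: r for r, v in enumerate(values)}
    return {(i, j): rank_of[w[i][j]] for i in range(n) for j in range(i + 1, n)}


# Two orthogonal toy patterns stand in for digit images.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([p1, p2])
noisy = [-1] + p1[1:]              # p1 with its first element flipped
restored = recall(weights, noisy)  # converges back to p1
ranks = edge_ranks(weights)        # the simplex edges labeled by rank
```

Here the neurons play the role of simplex vertices and `ranks` holds one label per edge, so the rank configuration can be inspected separately from the numeric weights.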
The article determines the limitations of automated speaker recognition systems for critical use. Unlike existing speaker recognition systems, the proposed system makes it possible to predict the authenticity of the recognition results. This is achieved by recognizing matrices of estimates of interclass relations and interclass distances in the space of speaker classes that fall under the formulated universal criteria for evaluating the authenticity of speaker recognition. The theoretical results are embodied in a modification of the input layer of the three-layer perceptron that completes the speaker recognition process performed by a convolutional deep neural network.
The article studies how the quality of speaker recognition by a convolutional neural network depends on the type of informative features chosen, with a view to its use in automated systems for critical use, especially when they operate under environmental influences. The environmental influences are high-level noise whose spectrum correlates with the spectrum of the speech signal, or the signal of a speaker simulator. The operating principles of a convolutional network for speaker signal recognition are considered, together with experiments on training the network and recognizing speakers on test samples. The research concludes that bark-cepstral coefficients allow more reliable recognition than the spectral parameters of the signal.
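For readers unfamiliar with bark-cepstral coefficients, the sketch below shows one simplified way to derive them from the power spectrum of a single frame: pool the spectrum into bands of equal width on the Bark scale, take logarithms, and apply a DCT-II. The abstract does not specify the filterbank or its parameters, so the rectangular non-overlapping bands, the Traunmüller approximation of the Bark scale, and the default values of `n_bands` and `n_coeffs` are assumptions made for illustration.

```python
import math


def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_band_energies(power_spectrum, sample_rate, n_bands):
    """Pool a one-sided power spectrum into bands of equal width in Bark.

    Assumption: rectangular, non-overlapping bands; bin k of the spectrum
    corresponds to the frequency k * (sample_rate / 2) / (n_bins - 1).
    """
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0
    z_lo, z_hi = hz_to_bark(0.0), hz_to_bark(nyquist)
    energies = [0.0] * n_bands
    for k, p in enumerate(power_spectrum):
        f = k * nyquist / (n_bins - 1)
        band = int((hz_to_bark(f) - z_lo) / (z_hi - z_lo) * n_bands)
        energies[min(band, n_bands - 1)] += p  # the Nyquist bin joins the last band
    return energies


def bark_cepstrum(power_spectrum, sample_rate, n_bands=16, n_coeffs=8, floor=1e-10):
    """Log band energies followed by an unnormalized DCT-II."""
    energies = bark_band_energies(power_spectrum, sample_rate, n_bands)
    logs = [math.log(max(e, floor)) for e in energies]  # floor avoids log(0)
    return [
        sum(x * math.cos(math.pi * c * (m + 0.5) / n_bands) for m, x in enumerate(logs))
        for c in range(n_coeffs)
    ]


# Hypothetical frame: a flat 257-bin power spectrum sampled at 16 kHz.
coeffs = bark_cepstrum([1.0] * 257, sample_rate=16000)
```

The resulting vector would be computed per frame and stacked over time to form the input of the convolutional network; the raw spectral parameters it is compared against would simply be `power_spectrum` itself.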