Preliminary research shows that the linear prediction (LP) residual contains more discriminative information about replay spoofing attacks. This paper therefore proposes three replay spoofing countermeasure features based on the LP residual and inverted Mel (IMel) filter banks, whose filters are densely distributed in the high-frequency region: residual IMel frequency cepstral coefficients (RIMFC), LP residual Hilbert envelope IMel frequency cepstral coefficients (LHIMFC), and residual phase cepstral coefficients (RPC). The effectiveness of these features is demonstrated on the ASVspoof 2017 Challenge Version 2.0 dataset. Experimental results indicate that the proposed features outperform the baseline system using constant Q cepstral coefficients (CQCC), reducing the equal error rate (EER) under the same conditions. Moreover, feature fusion achieves higher performance than the traditional IMel frequency cepstral coefficients (IMFCC) and CQCC, which indicates that the complementary information carried by different features is beneficial for detecting replay attacks.
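All three proposed features start from the LP residual, and two of them use an IMel filter bank. As a minimal sketch, assuming the autocorrelation method of LP analysis on a single unwindowed frame, the residual can be obtained by solving for the predictor coefficients with the Levinson-Durbin recursion and inverse-filtering the frame. An IMel bank can be approximated by mirroring Mel-spaced filter centers about the middle of the band, so that filters crowd towards the Nyquist frequency. The function names, the 2595/700 Mel constants and the mirroring construction are illustrative assumptions; the later RIMFC steps (IMel filtering of the residual spectrum, logarithm, DCT) and the Hilbert-envelope and phase variants are not shown.

```python
import math


def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[k] = sum_n s[n] s[n-k], k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p for s_hat[n] = sum_k a_k s[n-k]."""
    a = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: remaining coefficients stay 0
        # Reflection coefficient of stage i.
        k = (r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i - 1] = k
        for j in range(1, i):
            updated[j - 1] = a[j - 1] - k * a[i - j - 1]
        a = updated
        err *= 1.0 - k * k  # prediction error power shrinks at every stage
    return a, err


def lp_residual(frame, order):
    """Inverse filter: e[n] = s[n] - sum_k a_k s[n-k], samples before n=0 taken as 0."""
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a[k - 1] * frame[n - k] for k in range(1, order + 1) if n - k >= 0)
        residual.append(sample - prediction)
    return a, residual


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def imel_centers(num_filters, sample_rate):
    """Hypothetical IMel centers: Mel-spaced centers mirrored via f -> fs/2 - f."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # Interior points of num_filters + 2 equally spaced Mel points are the Mel centers.
    mel_centers = [mel_to_hz(top * i / (num_filters + 1)) for i in range(1, num_filters + 1)]
    # Flipping the frequency axis makes the filters densest near the Nyquist frequency.
    return sorted(nyquist - f for f in mel_centers)


signal = [0.9 ** n for n in range(200)]  # toy AR(1)-like decay
coeffs, residual = lp_residual(signal, order=1)
centers = imel_centers(20, 16000)
print(coeffs[0])  # approximately 0.9; residual is ~0 after the first sample
```

For a real detector the frame would normally be windowed and the prediction order set to roughly the sampling rate in kHz plus a few taps; those choices are left open here.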
Exemplar-based voice conversion (VC) methods have several disadvantages: they require too many exemplars, suffer from phoneme mismatches, and have low conversion efficiency. To solve these problems, this paper proposes a voice conversion method based on nonnegative matrix factorization (NMF) with dictionary optimization and exemplar clustering, which constructs the dictionaries from low-resolution features instead of high-resolution features. Dictionary optimization selects better-fitting exemplars from the original dictionary by minimizing cepstral distortion. Exemplar clustering divides the dictionary, according to the feature parameters, into multiple sub-dictionaries with better representation ability. Experiments are conducted on the ARCTIC database. Results show that the proposed method significantly improves the quality of the converted speech while reducing the number of exemplars and improving conversion efficiency.
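In exemplar-based NMF conversion, the source dictionary is typically held fixed, nonnegative activations are estimated for each source frame, and the same activations are applied to a paired target dictionary. The sketch below illustrates that generic pipeline using the KL-divergence multiplicative update with an optional L1 sparsity weight. The function names, the choice of KL cost and the nearest-centroid sub-dictionary selection are assumptions made for illustration; they are not the paper's dictionary optimization or clustering procedure.

```python
def nmf_activations(dictionary, x, iterations=200, sparsity=0.0, eps=1e-12):
    """Estimate h >= 0 with dictionary @ h ~= x under KL divergence (dictionary fixed).

    dictionary: D x K list of rows; each column is one source exemplar.
    x: nonnegative source feature vector of length D.
    """
    d, k = len(dictionary), len(dictionary[0])
    h = [1.0] * k
    # Column sums (plus the L1 weight) form the multiplicative-update denominator.
    col_sums = [sum(dictionary[i][j] for i in range(d)) for j in range(k)]
    for _ in range(iterations):
        approx = [sum(dictionary[i][j] * h[j] for j in range(k)) for i in range(d)]
        ratio = [x[i] / (approx[i] + eps) for i in range(d)]
        h = [
            h[j] * sum(dictionary[i][j] * ratio[i] for i in range(d)) / (col_sums[j] + sparsity + eps)
            for j in range(k)
        ]
    return h


def convert_frame(source_dict, target_dict, x, **kwargs):
    """Apply the source-side activations to the paired target dictionary: y = B h."""
    h = nmf_activations(source_dict, x, **kwargs)
    return [sum(row[j] * h[j] for j in range(len(h))) for row in target_dict]


def nearest_subdictionary(centroids, x):
    """Hypothetical selection step: index of the cluster centroid closest to x."""
    def squared_distance(c):
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(range(len(centroids)), key=lambda idx: squared_distance(centroids[idx]))


source_dict = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # D=4, K=2 exemplars
target_dict = [[5.0, 0.0], [0.0, 3.0]]  # paired target exemplars
converted = convert_frame(source_dict, target_dict, [0.0, 0.0, 2.0, 2.0], iterations=50)
cluster = nearest_subdictionary([[0.0, 0.0], [10.0, 10.0]], [9.0, 8.0])
```

Restricting activation estimation to one smaller sub-dictionary per frame is one plausible way in which clustering could reduce the cost of the update loop, since each iteration scales with the number of exemplars K.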
Compared with the original speech, replayed speech passes through a complex channel mainly composed of a recording device and a playback device, and the frequency response of this channel noticeably alters the high- and low-frequency bands of the original speech spectrum. This paper proposes a Channel Difference Enhancement Cepstral Coefficient (CDECC) feature, which detects replay attack speech by enhancing the spectral differences caused by the channel frequency response. Experiments on the ASVspoof 2017 Challenge dataset show that the proposed method significantly improves detection performance over the baseline system using Constant Q Cepstral Coefficients (CQCC), reducing the equal error rate (EER) by 18.20% under the same conditions. This indicates that the CDECC feature is more effective than CQCC and MFCC features for detecting replay attack speech.
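The argument above rests on the standard linear time-invariant view of the replay path: the replayed signal is the original speech convolved with the combined impulse response of the recording and playback devices, so its spectrum is the product S(f)H(f). After taking the logarithm, the channel term becomes additive in the log spectrum and therefore in the cepstrum. The following sketch checks this additivity numerically using circular convolution on a toy frame. The signal values, the two-tap channel and the function names are illustrative assumptions, and the code does not reproduce the CDECC enhancement itself.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def circular_convolve(s, h):
    """y[i] = sum_m s[m] h[(i - m) mod N], so that DFT(y) = DFT(s) * DFT(h)."""
    n = len(s)
    h = list(h) + [0.0] * (n - len(h))  # zero-pad the channel impulse response
    return [sum(s[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]


def real_cepstrum(x, n=None):
    """Real cepstrum Re(IDFT(log|DFT(x)|)); x is zero-padded to n samples.

    Assumes no DFT bin has zero magnitude (otherwise the log is undefined).
    """
    n = n or len(x)
    x = list(x) + [0.0] * (n - len(x))
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]


speech = [1.0, 0.3, -0.2, 0.1, 0.0, 0.05, -0.05, 0.02]  # toy frame, no zero DFT bins
channel = [1.0, 0.5]  # hypothetical recording + playback impulse response
replayed = circular_convolve(speech, channel)

c_speech = real_cepstrum(speech)
c_channel = real_cepstrum(channel, n=len(speech))
c_replayed = real_cepstrum(replayed)
# Up to rounding, c_replayed equals c_speech + c_channel coefficient by coefficient.
```

For the same utterance, the difference between the replayed and original cepstra is therefore the channel cepstrum alone, which is the kind of channel-induced difference that a feature such as CDECC sets out to emphasize.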