Additive attacks on speaker recognition
19 February 2014
Abstract
Speaker recognition is used to identify a speaker's voice from among a group of known speakers. A common method of speaker recognition is classification based on cepstral coefficients of the speaker's voice, using a Gaussian mixture model (GMM) to model each speaker. In this paper we attempt to fool a speaker recognition system using additive noise such that an intruder is recognized as a target user. Our attack uses a mixture selected from a target user's GMM model, inverting the cepstral transformation to produce noise samples. In our five-speaker database, we achieve an attack success rate of 50% with a noise signal at 10 dB SNR, and 95% by increasing noise power to 0 dB SNR. The importance of this attack is its simplicity and flexibility: it can be employed in real time with no processing of an attacker's voice, and little computation is needed at the moment of detection, allowing the attack to be performed by a small portable device. For any target user, knowing that user's model or a voice sample is sufficient to compute the attack signal, and it is enough for the intruder to play it while speaking to be classified as the victim.
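The inversion step the abstract describes, turning a cepstral vector from the target's GMM back into time-domain noise, could be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes plain cepstral coefficients (a DCT of the log magnitude spectrum) and ignores the mel filterbank that a full MFCC front end would also require inverting; all function names and parameters are hypothetical.

```python
import numpy as np

def idct2(coeffs, n_out):
    """Evaluate the inverse of an orthonormal DCT-II at n_out points."""
    k = np.arange(len(coeffs))
    n = np.arange(n_out)
    basis = np.cos(np.pi * (n[:, None] + 0.5) * k[None, :] / n_out)
    scale = np.full(len(coeffs), np.sqrt(2.0 / n_out))
    scale[0] = np.sqrt(1.0 / n_out)
    return basis @ (scale * coeffs)

def cepstrum_to_noise_frame(cepstral_mean, n_fft=512, rng=None):
    """Map one cepstral vector (e.g. a GMM mixture mean) to a noise frame."""
    rng = np.random.default_rng() if rng is None else rng
    # Inverse DCT recovers an approximate log magnitude spectrum.
    log_mag = idct2(np.asarray(cepstral_mean, dtype=float), n_fft // 2 + 1)
    mag = np.exp(log_mag)
    # The cepstrum discards phase, so draw a random phase for each bin.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=mag.shape)
    spectrum = mag * np.exp(1j * phase)
    # Inverse real FFT yields one frame of additive attack noise.
    return np.fft.irfft(spectrum, n=n_fft)

# Example with 13 made-up cepstral coefficients:
frame = cepstrum_to_noise_frame(np.linspace(1.0, 0.0, 13))
print(frame.shape)  # (512,)
```

Because phase is unconstrained by the cepstrum, any random phase yields noise with the same cepstral signature, which is why no processing of the attacker's own voice is needed at playback time.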
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Alireza Farrokh Baroughi and Scott Craver, "Additive attacks on speaker recognition", Proc. SPIE 9028, Media Watermarking, Security, and Forensics 2014, 90280Q (19 February 2014); https://doi.org/10.1117/12.2040872
Proceedings paper, 13 pages.

