We propose a new user authentication system based on spoken signatures, where online signature and speech signals are acquired simultaneously. The main benefit of this multimodal approach is improved accuracy at no extra cost to the user in terms of access time or inconvenience. A further benefit is increased robustness against intentional forgeries, since a forger must reproduce both signals simultaneously. We set up an experimental framework to measure these benefits on MyIDea, a publicly available, realistic multimodal biometric database. More specifically, we evaluate the performance of state-of-the-art modeling systems based on Gaussian mixture models (GMMs) and hidden Markov models (HMMs) applied independently to the pen and voice signals, combined with a simple rule-based score fusion procedure. We conclude that the best performance is achieved by the HMMs, provided that their topology is optimized on a per-user basis. Furthermore, we show that more precise models can be obtained through maximum a posteriori (MAP) training instead of the classically used expectation-maximization (EM) training. We also measure the impact of multisession versus monosession scenarios, and of skilled versus unskilled signature forgery attacks.
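The abstract mentions a simple rule-based score fusion of the per-modality matcher scores without specifying the rule. A common choice in this setting is a weighted sum of normalized scores; the sketch below illustrates that idea, where the normalization bounds, weights, and threshold are illustrative assumptions rather than values from the paper.

```python
# Illustrative sketch of rule-based score fusion for two modalities
# (pen and voice). All numeric values here are hypothetical.

def min_max_normalize(score, lo, hi):
    """Map a raw matcher score into [0, 1] using bounds
    estimated on development data."""
    return (score - lo) / (hi - lo)

def fuse_scores(pen_score, voice_score,
                pen_bounds=(-50.0, 0.0), voice_bounds=(-80.0, 0.0),
                w_pen=0.5):
    """Weighted-sum fusion: normalize each modality's score,
    then combine with weight w_pen on the pen modality."""
    s_pen = min_max_normalize(pen_score, *pen_bounds)
    s_voice = min_max_normalize(voice_score, *voice_bounds)
    return w_pen * s_pen + (1.0 - w_pen) * s_voice

def accept(fused_score, threshold=0.5):
    """Accept the claimed identity if the fused score reaches
    a decision threshold tuned on development data."""
    return fused_score >= threshold
```

With equal weights, a fused score simply averages the two normalized matcher outputs, so a strong voice score can compensate for a weak pen score and vice versa; more elaborate fusion (e.g., trained classifiers over score vectors) is possible but the abstract indicates a simple rule suffices here.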