In this paper we present a statistical analysis of the audio fingerprinting method proposed by Haitsma et al. [1]. Because of the method's excellent robustness and synchronisation properties, we examine its performance for varying values of the parameters involved in the computation in order to ascertain its capabilities. To this end, we pursue a statistical model of the fingerprint (also known as a hash, message digest or label). We first follow the earlier attempt by Doets and Lagendijk [2-4] to obtain such a model. By reformulating the representation of the fingerprint as a quadratic form, we present a model from which the parameters derived by Doets and Lagendijk may be obtained more easily. Furthermore, our model offers insight into aspects of the behaviour of the fingerprinting algorithm that have not previously been examined. Using our model, we then analyse the probability of error (Pe) of the hash. We identify two error scenarios and obtain an expression for the probability of error in each. We present three methods of varying accuracy for approximating Pe after Gaussian noise is added to the signal of interest. Finally, we analyse the probability of error following desynchronisation of the signal at the input of the hashing system and provide an approximation to Pe for different parameters of the algorithm under varying degrees of desynchronisation.
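To make the quantity Pe concrete, the following is a minimal sketch of how a bit error rate could be estimated empirically for a fingerprint of this kind. It assumes the commonly described form of the Haitsma et al. bit derivation, in which each bit is the sign of an energy difference taken across adjacent frequency bands and adjacent frames. The function names, the toy energy matrix and the choice to perturb band energies directly (rather than adding noise to the time-domain signal, as the analysis above does) are illustrative assumptions, not the paper's model.

```python
import random

def fingerprint_bits(energy):
    """Derive bits from a band-energy matrix energy[n][m] (frame n, band m).

    Assumed rule: bit = 1 if the band-to-band energy difference in frame n
    exceeds the same difference in frame n-1, else 0.
    """
    bits = []
    for n in range(1, len(energy)):
        row = []
        for m in range(len(energy[n]) - 1):
            d = (energy[n][m] - energy[n][m + 1]) \
                - (energy[n - 1][m] - energy[n - 1][m + 1])
            row.append(1 if d > 0 else 0)
        bits.append(row)
    return bits

def bit_error_rate(a, b):
    """Fraction of differing bits between two equally sized fingerprints."""
    total = sum(len(row) for row in a)
    errors = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return errors / total

def estimate_pe(energy, sigma, trials=200, seed=0):
    """Monte Carlo estimate of the bit error rate.

    Simplification (hypothetical): Gaussian noise of standard deviation
    sigma is added to each band energy, not to the underlying signal.
    """
    rng = random.Random(seed)
    clean = fingerprint_bits(energy)
    acc = 0.0
    for _ in range(trials):
        noisy = [[e + rng.gauss(0.0, sigma) for e in row] for row in energy]
        acc += bit_error_rate(clean, fingerprint_bits(noisy))
    return acc / trials

if __name__ == "__main__":
    toy = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.5, 1.5]]
    print(fingerprint_bits(toy))
    print(estimate_pe(toy, sigma=0.5))
```

An empirical estimate of this kind is useful mainly as a sanity check against closed-form approximations; it says nothing about which of the three approximation methods mentioned above is most accurate.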
Compact representation of the perceptually relevant parts of multimedia data, referred to as robust hashing or fingerprinting, is often used for efficient retrieval from databases and for authentication. In previous work, we introduced a framework for robust hashing that improves the performance of any given feature extraction method. The hash was generated from a feature vector in three distinct stages: quantization, bit assignment and application of the decoding stage of an error correcting code. Results were obtained for unidimensional quantization and bit assignment, using a single code. In this work, we generalise those techniques to higher dimensions. Our framework is analysed under different conditions at each stage. For the quantization, we consider both uniformly and nonuniformly distributed codevectors. For multidimensional quantizers, assigning bits to the resulting indices is a non-trivial task, and a number of techniques are evaluated. We show that judicious assignment of binary indices to the codevectors of the quantizer improves the performance of the hashing method. Finally, the robustness provided by a number of different channel codes is evaluated.
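The three stages named above (quantization, bit assignment, decoding with an error correcting code) can be sketched for the unidimensional case as follows. This is an illustration under stated assumptions, not the authors' implementation: the uniform scalar quantizer, the Gray-code bit assignment, the Hamming(7,4) code, the zero padding and all function names are choices made here for the example.

```python
def quantize(x, lo, hi, levels):
    """Uniform scalar quantizer: map x in [lo, hi] to an index 0..levels-1."""
    step = (hi - lo) / levels
    idx = int((x - lo) / step)
    return max(0, min(levels - 1, idx))  # clamp out-of-range inputs

def gray(i):
    """Gray code: neighbouring quantizer cells differ in exactly one bit."""
    return i ^ (i >> 1)

def to_bits(value, width):
    """Most-significant-bit-first binary expansion of value."""
    return [(value >> k) & 1 for k in reversed(range(width))]

def hamming74_decode(r):
    """Decode 7 bits laid out as p1 p2 d1 p3 d2 d3 d4.

    Corrects at most one bit error and returns the 4 data bits.
    """
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    r = list(r)
    if pos:
        r[pos - 1] ^= 1
    return [r[2], r[4], r[5], r[6]]

def robust_hash(features, lo=0.0, hi=1.0, levels=4, width=2):
    """Quantize each feature, assign Gray-coded bits, then apply the decoder."""
    bits = []
    for x in features:
        bits += to_bits(gray(quantize(x, lo, hi, levels)), width)
    bits += [0] * (-len(bits) % 7)  # zero-pad to whole 7-bit blocks (assumption)
    out = []
    for i in range(0, len(bits), 7):
        out += hamming74_decode(bits[i:i + 7])
    return out

if __name__ == "__main__":
    print(robust_hash([0.1, 0.4, 0.6, 0.9]))
    print(robust_hash([0.1, 0.4, 0.6, 0.7]))  # last feature moved one cell
```

In this toy setting, moving one feature into a neighbouring quantizer cell flips a single bit under the Gray assignment, and the decoder maps both bit strings to the same hash. This is the kind of effect the bit assignment stage is meant to exploit, though the multidimensional case discussed above is harder because neighbouring cells can no longer all be made to differ in one bit.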