We present a method for improving small scale text independent automatic speaker identification
systems. A small scale identification system is a system with a relatively small number of enrolled speakers
(20 or fewer). The proposed improvement is obtained from adaptive frequency warping. Most modern speaker
identification systems employ a short-time speech feature extraction method that relies on frequency warped
cepstral representations. One of the most popular frequency warping types is based on the mel-scale. While
the mel-scale provides a substantial boost in recognition performance for large scale systems, it is suboptimal
for small scale systems. Our experiments show that the methodology has the potential to reduce the
error rate of small scale systems by 24% relative to the mel-scale approach.
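
As a point of reference for the mel-scale warping mentioned above, the following minimal sketch uses a widely used closed-form approximation of the mel scale, 2595 log10(1 + f/700), and places filter-bank center frequencies uniformly on the warped axis. The function names, the number of bands, and the frequency range are illustrative assumptions; the adaptive warping proposed in the abstract is not reproduced here.

```python
import math

def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the mel scale (common closed-form approximation)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def warped_center_frequencies(num_bands, f_min_hz, f_max_hz):
    """Center frequencies in Hz that are equally spaced on the mel axis."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_bands)]

# Hypothetical configuration: 20 bands between 0 Hz and 4 kHz.
centers = warped_center_frequencies(num_bands=20, f_min_hz=0.0, f_max_hz=4000.0)
```

Replacing `hz_to_mel` and `mel_to_hz` with a different monotonic pair is one way a warping could be varied; how the warping is adapted in the proposed method is not specified in this abstract.
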
We are presenting a new method for super resolution tracking of frequency modulated sinusoids in white noise.
The method is specifically designed to handle the rapid transient problem, i.e. the problem of tracking a
continuous, rapidly changing instantaneous frequency contour. The proposed method employs two components: 1)
an adaptive generalized scale transform [1,2], which applies a localized change of time-frequency coordinates within
the given signal, and 2) an estimation of signal parameters by rotational invariance techniques (ESPRIT) [3]. With
experiments we have shown that the proposed method provides a significantly higher estimation accuracy than
conventional methods [3]. With an optimal choice of transform parameters the estimation error can be reduced
dramatically. Error reductions of over 40% have been observed.
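
To illustrate the second component, here is a minimal sketch of standard ESPRIT frequency estimation for stationary complex sinusoids, assuming NumPy is available. The function name, the default window length, and the test frequencies are illustrative assumptions. The adaptive generalized scale transform that the abstract applies beforehand is not shown.

```python
import numpy as np

def esprit_frequencies(x, num_sinusoids, window=None):
    """Estimate frequencies (cycles/sample) of complex sinusoids with ESPRIT."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    m = window or n // 2
    # Data matrix whose columns are overlapping length-m snapshots of x.
    data = np.array([x[i:i + m] for i in range(n - m + 1)]).T
    # The leading left singular vectors span the signal subspace.
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    signal_subspace = u[:, :num_sinusoids]
    # Rotational invariance: rows 0..m-2 and rows 1..m-1 of the subspace
    # differ by a matrix whose eigenvalues are exp(j*2*pi*f).
    rotation, *_ = np.linalg.lstsq(signal_subspace[:-1], signal_subspace[1:], rcond=None)
    eigenvalues = np.linalg.eigvals(rotation)
    return np.sort(np.angle(eigenvalues) / (2.0 * np.pi))

# Hypothetical test signal: two noise-free tones at 0.10 and 0.23 cycles/sample.
n = np.arange(64)
x = np.exp(2j * np.pi * 0.10 * n) + 0.5 * np.exp(2j * np.pi * 0.23 * n)
estimates = esprit_frequencies(x, num_sinusoids=2)
```

For a rapidly varying instantaneous frequency, the constant-frequency model behind this sketch no longer holds, which is the situation the coordinate change in the first component is meant to address.
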
We are proposing a new method for the denoising of speech in dedicated single channel speech communication
systems. Dedicated speech communication systems are optimized for use by a dedicated speaker. Our procedure
employs a speech production model that combines a fixed (but speaker dependent) glottal excitation process
with an adaptive autoregressive filter. Denoising is performed in two steps: 1) model parameter estimation and
2) signal resynthesis from the proposed model. Our parameter estimation procedure is inspired by a maximum
likelihood approach that utilizes learned parameter statistics from a training process. The procedure produces
improvements that are perceptually comparable to improvements obtained by adaptive Wiener filtering and
spectral subtraction. Performance was validated with speech signals from the VOICES database and
noise signals from the NOISEX database.
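
The two building blocks of such a production model can be sketched as follows: fitting autoregressive (all-pole) coefficients from autocorrelation values with the Levinson-Durbin recursion, and resynthesizing a signal by driving the all-pole filter with a fixed excitation. This is a generic illustration only. The function names and the impulse excitation are assumptions, and the maximum-likelihood-inspired estimation with learned parameter statistics described above is not reproduced.

```python
def levinson_durbin(autocorr, order):
    """Solve for AR coefficients a[0..order] (a[0] = 1) from autocorrelation values."""
    a = [1.0] + [0.0] * order
    error = autocorr[0]
    for i in range(1, order + 1):
        acc = autocorr[i] + sum(a[j] * autocorr[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k                  # remaining prediction error power
    return a, error

def all_pole_synthesis(excitation, a):
    """Filter the excitation with 1 / A(z): y[n] = e[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        feedback = sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(e - feedback)
    return y

# Autocorrelation of a first-order process with pole at 0.5 (illustrative values).
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
# Hypothetical excitation: a single pulse standing in for a glottal excitation.
response = all_pole_synthesis([1.0, 0.0, 0.0, 0.0], coeffs)
```
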
We are presenting an extension to the classic multiple signal classification (MUSIC) method developed by Schmidt and Bienvenu in 1979. While the classic MUSIC algorithm is limited to the detection of constant frequency sinusoids in white noise, the proposed method is capable of detecting signals with a continuously varying instantaneous frequency. The method is based on the development of a discrete-time version of the generalized scale transform (GST), which was introduced by Nickel and Williams in 1999. As a byproduct, we obtain techniques for discrete-time warp-shift invariant filtering, which can be used alongside the signal detection to separate signals with different instantaneous frequency contours.
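
For context, the classic constant-frequency case can be sketched as below, assuming NumPy is available. The sketch computes a MUSIC pseudospectrum for a single complex tone in white noise; the function name, window length, noise level, and frequency grid are illustrative assumptions. The GST-based extension to varying instantaneous frequency is not shown.

```python
import numpy as np

def music_pseudospectrum(x, num_sinusoids, window, freqs):
    """Classic MUSIC pseudospectrum on a grid of frequencies (cycles/sample)."""
    x = np.asarray(x, dtype=complex)
    # Rows are overlapping length-`window` snapshots of the signal.
    snapshots = np.array([x[i:i + window] for i in range(len(x) - window + 1)])
    # Sample covariance matrix R = E[s s^H].
    covariance = snapshots.T @ snapshots.conj() / len(snapshots)
    # eigh returns eigenvalues in ascending order, so the noise subspace
    # is spanned by the first (window - num_sinusoids) eigenvectors.
    _, eigvecs = np.linalg.eigh(covariance)
    noise_subspace = eigvecs[:, : window - num_sinusoids]
    steering = np.exp(2j * np.pi * np.outer(np.arange(window), freqs))
    projection = noise_subspace.conj().T @ steering
    return 1.0 / np.sum(np.abs(projection) ** 2, axis=0)

# Hypothetical test signal: one tone at 0.2 cycles/sample in white noise.
rng = np.random.default_rng(0)
n = np.arange(200)
noise = 0.1 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
signal = np.exp(2j * np.pi * 0.2 * n) + noise
freqs = np.linspace(0.0, 0.5, 501)
spectrum = music_pseudospectrum(signal, num_sinusoids=1, window=16, freqs=freqs)
peak_frequency = freqs[np.argmax(spectrum)]
```

The steering vectors here are constant-frequency complex exponentials, which is exactly the restriction the abstract sets out to remove.
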
Proc. SPIE. 4116, Advanced Signal Processing Algorithms, Architectures, and Implementations X
KEYWORDS: Fourier transforms, Transform theory, Numerical analysis, Signal processing, Solids, System identification, Signal generators, Information theory, Optimization (mathematics), Probability theory
We have recently introduced the class of generalized scale transforms and its subclass of warped Fourier transforms. Members of each class are defined by continuous time warping functions. While the two transforms admit a mathematically elegant analysis of warp-shift invariant systems, it is still unclear how to design warping functions that deliver optimal representations for a given class of signals or systems. In many cases we can obtain an optimal choice for the warping function via a closed form analysis of the system that generates the signal of interest. In cases in which a closed form analysis is not possible we have to rely on a warp function estimation method. The approach we are taking in this paper is grounded in information theory. We consider the observed signal as a random process. A power estimate of the warped Fourier transform, parameterized by an underlying warping function, is obtained from a finite number of realizations. We treat the power estimate as a probability density in warp-frequency and minimize its differential entropy over the space of admissible warping functions. We use an iterative numerical method for the minimization process. A proper formulation of a discrete time warped Fourier transform is employed as a foundation for the numerical analysis. Applications of the proposed algorithm can be found in detection, system identification, and data compression.
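
The entropy criterion can be illustrated with a small sketch, assuming NumPy is available. Everything specific in it is an assumption made for illustration: the one-parameter warp family w_a(t) = t + a t^2, the simple Riemann-sum discretization of a warped Fourier transform, the single noise-free realization, and the grid search that stands in for the iterative numerical method mentioned above.

```python
import numpy as np

def warp(t, a):
    """Hypothetical one-parameter warping function w_a(t) = t + a * t**2."""
    return t + a * t ** 2

def warp_derivative(t, a):
    return 1.0 + 2.0 * a * t

def warped_power(x, t, a, freqs):
    """Power of a Riemann-sum warped Fourier transform (assumed discretization)."""
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.pi * np.outer(freqs, warp(t, a)))
    spectrum = kernel @ (x * warp_derivative(t, a)) * dt
    return np.abs(spectrum) ** 2

def differential_entropy(power, df):
    """Treat the normalized power as a density in warp-frequency."""
    density = power / (power.sum() * df)
    density = np.clip(density, 1e-300, None)   # avoid log(0)
    return float(-(density * np.log(density)).sum() * df)

# Test signal: a tone at warp-frequency 20 under the warp with a = 0.5.
t = np.arange(512) / 512.0
x = np.exp(2j * np.pi * 20.0 * warp(t, 0.5))
df = 0.25
freqs = np.arange(0.0, 60.0, df)
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
entropies = [differential_entropy(warped_power(x, t, a, freqs), df) for a in candidates]
best_a = candidates[int(np.argmin(entropies))]
```

With the matching warp the power concentrates around a single warp-frequency, so the entropy is smallest there; mismatched warps spread the power over a band.
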
We are presenting a new class of transforms which facilitates the processing of signals that are nonlinearly stretched or compressed in time. We refer to nonlinear stretching and compression as warping. While the magnitude of the Fourier transform is invariant under time shift operations, and the magnitude of the scale transform is invariant under (linear) scaling operations, the new class of transforms is magnitude invariant under warping operations. The new class contains the Fourier transform and the scale transform as special cases. Important theorems, like the convolution theorem for Fourier transforms, are generalized into theorems that apply to arbitrary members of the transform class. Cohen's class of time-frequency distributions is generalized to joint representations in time and arbitrary warping variables. Special attention is paid to a modification of the new class of transforms that maps an arbitrary time-frequency contour into an impulse in the transform domain. A chirp transform is derived as an example.
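
The scale-invariance property cited above can be checked numerically with a short sketch. It assumes the commonly used definition of the scale transform, D(c) = (1/sqrt(2 pi)) ∫ f(t) t^(-1/2) exp(-j c ln t) dt over t > 0, evaluated after the substitution u = ln t, and an energy-preserving scaling g(t) = sqrt(a) f(a t). The test function, integration limits, and step count are illustrative assumptions. The generalized transform class itself is not implemented here.

```python
import cmath
import math

def scale_transform(f, c, u_min=-8.0, u_max=8.0, steps=4000):
    """Scale transform at scale variable c via a Riemann sum in u = ln t."""
    du = (u_max - u_min) / steps
    total = 0j
    for k in range(steps + 1):
        u = u_min + k * du
        total += f(math.exp(u)) * math.exp(0.5 * u) * cmath.exp(-1j * c * u)
    return total * du / math.sqrt(2.0 * math.pi)

def log_gaussian(t):
    """Hypothetical test signal that decays quickly in ln t."""
    return math.exp(-4.0 * math.log(t) ** 2)

def scaled(f, a):
    """Energy-preserving scaling g(t) = sqrt(a) * f(a * t)."""
    return lambda t: math.sqrt(a) * f(a * t)

stretched = scaled(log_gaussian, 3.0)
scale_values = [0.0, 1.0, 2.5, 5.0]
original_mag = [abs(scale_transform(log_gaussian, c)) for c in scale_values]
stretched_mag = [abs(scale_transform(stretched, c)) for c in scale_values]
```

In the variable u = ln t, scaling by a becomes a shift by ln a, so the magnitudes agree for the same reason that Fourier magnitudes are shift invariant.
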
Proc. SPIE. 3461, Advanced Signal Processing Algorithms, Architectures, and Implementations VIII
KEYWORDS: Nickel, Interference (communication), Linear filtering, Frequency modulation, Signal processing, Smoothing, Spectral resolution, Electrical engineering, Time-frequency analysis, Correlation function
One of the key problems in high resolution, time-varying spectral analysis is the suppression of interference terms which can obscure the true location of auto components in the resulting time-frequency distribution (TFD). Commonly used reduced interference distributions tackle the problem with a properly chosen 2D low pass filter (kernel). A recently published novel approach uses alternative means to achieve the desired goal. The idea of the new method is to obtain an estimate of the cross terms from a given prior distribution based on the magnitude and location of its negative components. The estimate is constructed via an iterative projection method that guarantees that the resulting distribution is positive and satisfies the marginals. Even though the marginals are usually a desirable property of TFDs in general, they can impose an undesirably strong constraint on positive TFDs in particular. For these cases it is thus beneficial to relax the marginal constraints. In this paper we present a new method that does not require the incorporation of this constraint and thus leads to positive TFDs with reduced interference terms but without the restrictions due to the marginals.
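
To make the marginal constraints concrete, the sketch below (assuming NumPy is available) builds the simplest positive distribution that satisfies both marginals, namely the normalized product of the time and frequency energy densities, and exposes the two sums that define the constraints. It is neither the iterative projection method nor the relaxed method described above; the function name and the test signal are illustrative assumptions.

```python
import numpy as np

def product_distribution(x):
    """Positive time-frequency distribution formed from the product of the marginals."""
    x = np.asarray(x, dtype=complex)
    time_marginal = np.abs(x) ** 2
    # Dividing by len(x) makes both marginals sum to the same energy (Parseval).
    freq_marginal = np.abs(np.fft.fft(x)) ** 2 / len(x)
    energy = time_marginal.sum()
    tfd = np.outer(time_marginal, freq_marginal) / energy
    return tfd, time_marginal, freq_marginal

# Hypothetical test signal: a chirp with a Gaussian envelope.
n = np.arange(128)
x = np.exp(-((n - 64) / 20.0) ** 2) * np.exp(2j * np.pi * (0.05 * n + 0.001 * n ** 2))
tfd, time_marginal, freq_marginal = product_distribution(x)
time_sum = tfd.sum(axis=1)   # summing over frequency should give |x[n]|^2
freq_sum = tfd.sum(axis=0)   # summing over time should give the energy spectrum
```

This product form carries no joint time-frequency structure at all, which illustrates why positivity plus the marginals alone does not yield a useful distribution without further modeling.
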