In this paper an audio separation algorithm based on Independent Component Analysis (ICA) is presented. Audio separation could be the basis for many applications, for example in telecommunications, quality enhancement of audio recordings, or audio classification tasks. Well-known ICA algorithms are currently not usable for real-world recordings, because they are designed for signal mixtures based on linear mixing matrices that are constant over time. To adapt a standard ICA algorithm to real-world two-channel auditory scenes with two audio sources, the input audio streams are segmented in the time domain and a constant mixing matrix is assumed within each segment. The next steps are a time-delay estimation for each audio source in the mixture and a determination of the number of existing sources. In the following processing steps, the input signals are time-shifted for each source and a standard ICA for linear mixtures is performed. The remaining tasks are an evaluation of the ICA results and the construction of the resulting audio streams containing the separated sources.
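The abstract does not state how the time delay is estimated. As a minimal sketch, assuming a plain cross-correlation peak search (the names `estimate_delay` and `shift` and the synthetic test signal are hypothetical), the delay between the two channels of one segment could be found and undone before a linear ICA step like this:

```python
import random


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which y best matches x.

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def shift(signal, lag):
    """Advance `signal` by `lag` samples (zero padded), undoing a delay of `lag`."""
    n = len(signal)
    return [signal[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


# Synthetic segment: channel y is channel x delayed by three samples.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(64)]
y = [0.0] * 3 + x[:-3]

lag = estimate_delay(x, y, max_lag=10)
aligned = shift(y, lag)  # y re-aligned with x, ready for a linear ICA step
```

The search is exhaustive over a small lag range, which is only practical for short segments and small `max_lag`; the ICA step itself is not shown.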
In this paper we present a technique for the separation of harmonic sounds within real sound mixtures for automatic music transcription using Independent Subspace Analysis (ISA). The algorithm is based on the assumption that tones played by an instrument within polyphonic music consist of components that are statistically independent from the components of other tones. The first step of the algorithm is a temporal segmentation into note events. Features in both the time domain and the frequency domain are used to detect segment boundaries, which are represented by starting or decaying tones. Each segment is then examined using ISA, and a set of statistically independent components is calculated. One tone played by an instrument consists of the fundamental frequency and its harmonics. Usually, ISA yields more independent components than played notes, because not all harmonics are assigned to the component containing their fundamental frequency; some harmonics are separated into components of their own. Components that belong together are grouped using the Kullback-Leibler divergence. A note classification, which is currently trained for piano music, is the last step of the algorithm. Results show that statistical independence is a promising measure for separating sounds into single notes using ISA as a step towards automatic music transcription.
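The grouping step can be illustrated with a small sketch. The abstract does not say which distributions are compared or how groups are formed, so the symmetrised divergence, the greedy grouping rule, the threshold, and the toy distributions below are all assumptions made for illustration:

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrised divergence, so the order of the two components is irrelevant."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def group_components(distributions, threshold):
    """Greedy grouping: a component joins the first group whose first
    member lies within `threshold`; otherwise it opens a new group."""
    groups = []
    for idx, dist in enumerate(distributions):
        for group in groups:
            if symmetric_kl(distributions[group[0]], dist) < threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Toy distributions standing in for three independent components.
a = normalize([8, 1, 1, 0])
b = normalize([7, 2, 1, 0])
c = normalize([0, 1, 1, 8])
groups = group_components([a, b, c], threshold=0.5)
```

Here components 0 and 1 have similar distributions and end up in one group, while component 2 forms its own group.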
With the multimedia content description interface MPEG-7, we have powerful tools for video indexing, on the basis of which content-based search and retrieval with respect to separate shots and scenes in video can be performed. We focus especially on the parametric motion descriptor. The motion parameters, which are ultimately coded in the descriptor values, require robust content extraction methods. In this paper, we introduce our approach to the extraction of global motion from video. For this purpose, we apply a constrained feature point selection and matching approach in order to find correspondences between images. Subsequently, an M-estimator is used for robust estimation of the motion model parameters. We evaluate the performance of our approach using affine and biquadratic motion models, and compare it with a standard least-median-of-squares approach to global motion estimation.
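To illustrate the idea of M-estimation on matched feature points, the following sketch deliberately simplifies the motion model to a pure translation; the affine and biquadratic models of the paper would replace the weighted mean with a weighted least-squares solve. The Huber weight function, the tuning constant `k`, the iteration count, and the sample displacements are assumptions, not details taken from the paper:

```python
import math
import statistics


def huber_weight(residual, k):
    """Huber weight: full weight for small residuals, down-weighted beyond k."""
    return 1.0 if residual <= k else k / residual


def robust_translation(displacements, k=1.0, iterations=20):
    """Estimate a global translation (tx, ty) from point displacements by
    iteratively reweighted least squares with Huber weights."""
    tx = statistics.median(d[0] for d in displacements)
    ty = statistics.median(d[1] for d in displacements)
    for _ in range(iterations):
        wsum = wx = wy = 0.0
        for dx, dy in displacements:
            r = math.hypot(dx - tx, dy - ty)  # residual of this correspondence
            w = huber_weight(r, k)
            wsum += w
            wx += w * dx
            wy += w * dy
        tx, ty = wx / wsum, wy / wsum
    return tx, ty


# Eight correspondences follow the camera motion, two are mismatches.
displacements = [(2.0, 1.0)] * 8 + [(50.0, -40.0)] * 2
tx, ty = robust_translation(displacements)
mean_x = sum(d[0] for d in displacements) / len(displacements)
```

In this example the plain mean of the x displacements is pulled far away from the dominant motion by the two mismatches, whereas the reweighted estimate stays close to it.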
Proc. SPIE. 5307, Storage and Retrieval Methods and Applications for Multimedia 2004
KEYWORDS: Detection and tracking algorithms, Databases, Feature extraction, Data acquisition, Data processing, Analytical research, Multimedia, Data communications, Signal detection, Communication engineering
Segmenting audio data into the smallest musical components is the basis for many further metadata extraction algorithms. For example, an automatic music transcription system needs to know the exact boundaries of each tone. In this paper a note-accurate audio segmentation algorithm based on MPEG-7 low-level descriptors is introduced. For a reliable detection of different notes, features in both the time and the frequency domain are used. Because of this, polyphonic instrument mixes and even melodies carried by human voices can be examined with this algorithm. For testing and verification of the note-accurate segmentation, a simple music transcription system was implemented. The dominant frequency within each segment is used to build a MIDI file representing the processed audio data.
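The last step, from a dominant frequency to a MIDI note, can be sketched as follows. The abstract does not describe how the dominant frequency is obtained, so the naive DFT peak search here is an assumption; the note mapping uses the usual MIDI convention in which A4 (440 Hz) is note number 69. Writing the actual MIDI file is not shown.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest DFT bin of a segment, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def frequency_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


# One synthetic segment containing a 440 Hz sine tone.
sample_rate = 8000
segment = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(512)]

freq = dominant_frequency(segment, sample_rate)
note = frequency_to_midi(freq)
```

The frequency resolution is limited to `sample_rate / len(samples)`, so short segments may need interpolation between bins to resolve neighbouring low notes.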
The huge amount of multimedia data produced worldwide requires
annotation in order to enable universal content access and to
provide content-based search-and-retrieval functionalities. Since
manual video annotation can be time consuming, automatic
annotation systems are required. We review recent approaches to
content-based indexing and annotation of videos for different kinds
of sports and describe our approach to automatic annotation of
equestrian sports videos. We especially concentrate on MPEG-7
based feature extraction and content description, where we apply
different visual descriptors for cut detection. Further, we
extract the temporal positions of single obstacles on the course
by analyzing MPEG-7 edge information. Having determined single
shot positions as well as the visual highlights, the information
is jointly stored with meta-textual information in an MPEG-7
description scheme. Based on this information, we generate content
summaries which can be utilized in a user-interface in order to
provide content-based access to the video stream and also for
media browsing on a streaming server.
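As a rough illustration of descriptor-based cut detection, the sketch below flags a cut wherever two consecutive frame descriptors differ strongly. Which MPEG-7 visual descriptors are used and how they are compared is not stated in the abstract; the L1 distance, the threshold, and the three-bin toy descriptors are assumptions:

```python
def descriptor_distance(a, b):
    """L1 distance between two frame descriptors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(descriptors, threshold):
    """Return every index i where a cut is assumed between frame i-1 and frame i."""
    return [i for i in range(1, len(descriptors))
            if descriptor_distance(descriptors[i - 1], descriptors[i]) > threshold]


# Toy per-frame descriptors (for example, normalised colour histograms).
frames = [[0.7, 0.2, 0.1]] * 3 + [[0.1, 0.2, 0.7]] * 2
cuts = detect_cuts(frames, threshold=0.5)
```

A fixed threshold only catches hard cuts; gradual transitions would need a comparison over a longer window.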
In this paper we present an audio thumbnailing technique based on audio segmentation by similarity search. The segmentation is performed on MPEG-7 low-level audio feature descriptors, a growing source of multimedia metadata. This technique could be especially helpful for database applications or audio-on-demand services, because there is no need to access the original audio material, which may be copyright-protected. The result of the similarity search is a matrix that contains off-diagonal stripes representing similar regions. These regions are usually the refrains of a song and are therefore very suitable segments for an audio thumbnail. Using the a priori knowledge that we are searching for off-diagonal stripes that must represent several seconds of audio data and that the alignment of the stripes must be characteristic, we implemented a filter to enhance the structure of the similarity matrix and to extract a relevant segment as an audio thumbnail.
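The filter itself is not specified in the abstract. One simple possibility, shown here purely as an assumption, is to average the similarity matrix along diagonals of a fixed minimum length and to take the strongest off-diagonal run as the thumbnail candidate (the name `best_stripe` and the synthetic matrix are hypothetical):

```python
def best_stripe(sim, length, min_lag):
    """Find the off-diagonal run of `length` blocks with the highest mean similarity.

    Returns (start, lag, score): blocks start..start+length-1 resemble the
    blocks that lie `lag` positions later.
    """
    n = len(sim)
    best = (0, 0, float("-inf"))
    for lag in range(min_lag, n - length + 1):
        for start in range(0, n - lag - length + 1):
            score = sum(sim[start + k][start + k + lag]
                        for k in range(length)) / length
            if score > best[2]:
                best = (start, lag, score)
    return best


# Synthetic similarity matrix: weak background, main diagonal, one stripe.
n = 12
sim = [[0.1] * n for _ in range(n)]
for i in range(n):
    sim[i][i] = 1.0
for k in range(4):
    sim[2 + k][8 + k] = 0.9  # blocks 2..5 repeat at blocks 8..11

start, lag, score = best_stripe(sim, length=4, min_lag=2)
```

In this sketch `length` plays the role of the "several seconds" constraint and `min_lag` keeps the search away from the trivially similar main diagonal; the blocks from `start` to `start + length - 1` would form the thumbnail.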
Driven by the increasing amount of music available electronically, the need for and the feasibility of automatic music classification systems are becoming more and more important. Currently, most search engines for music are based on textual descriptions such as artist and/or title.
This paper presents a system for automatic music description, classification, and visualization for a set of songs. The system is designed to extract significant features of a piece of music in order to find songs of a similar genre or with similar sound characteristics. The description is done using MPEG-7 only. The classification and visualization are done with the self-organizing map algorithm.
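A self-organizing map can be sketched in a few lines. The version below is a one-dimensional chain of four nodes rather than the two-dimensional grid normally used for visualization, and the linear initialisation, the learning-rate and radius schedules, and the two-dimensional toy feature vectors are assumptions chosen to keep the example short and deterministic:

```python
import math


def best_matching_unit(nodes, vec):
    """Index of the node whose weight vector is closest to `vec`."""
    return min(range(len(nodes)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(nodes[j], vec)))


def train_som(data, n_nodes=4, epochs=50):
    """Train a 1-D chain of nodes; needs n_nodes >= 2."""
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Spread the initial nodes evenly between the data extremes.
    nodes = [[lo[d] + (hi[d] - lo[d]) * j / (n_nodes - 1) for d in range(dim)]
             for j in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac                        # decaying learning rate
        radius = max(0.1, 0.5 * n_nodes * frac)  # shrinking neighbourhood
        for vec in data:
            bmu = best_matching_unit(nodes, vec)
            for j, node in enumerate(nodes):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    node[d] += rate * h * (vec[d] - node[d])
    return nodes


# Hypothetical two-dimensional feature vectors for six songs in two groups.
songs = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0],
         [1.0, 0.9], [0.05, 0.05], [0.95, 0.95]]
nodes = train_som(songs)
group_a = best_matching_unit(nodes, [0.05, 0.05])
group_b = best_matching_unit(nodes, [0.95, 0.95])
```

Songs with similar feature vectors map to the same or neighbouring nodes, which is the property a visualization would exploit.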
In this paper we present an audio segmentation technique based on searching for similar sections of a song. The search is performed on MPEG-7 low-level audio feature descriptors, a growing source of multimedia metadata. These descriptors are available for every 10 ms of audio data. For each block, the similarity to every other block is determined. The result of this operation is a matrix that contains off-diagonal stripes representing similar regions. At this point some postprocessing is necessary because the structure of the similarity matrix is very noisy. Using the a priori knowledge that we are searching for off-diagonal stripes that must represent several seconds of audio data, we implemented a filter to enhance the structure of the similarity matrix. The last step is to extract the off-diagonal stripes and map them back to the time domain of the audio data.
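The first and last steps can be sketched as follows. The 10 ms block spacing is taken from the abstract; the cosine similarity measure, the function names, and the toy descriptor vectors are assumptions, since the abstract does not name the similarity measure used:

```python
import math

HOP_SECONDS = 0.010  # one descriptor block every 10 ms, as stated in the abstract


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(blocks):
    """Similarity of every block to every other block."""
    return [[cosine_similarity(a, b) for b in blocks] for a in blocks]


def stripe_to_seconds(start, lag, length, hop=HOP_SECONDS):
    """Map a stripe given in block indices to the two time intervals it relates."""
    first = (start * hop, (start + length) * hop)
    second = ((start + lag) * hop, (start + lag + length) * hop)
    return first, second


# Toy descriptors: blocks 0 and 2 are alike, as are blocks 1 and 3.
blocks = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
sim = similarity_matrix(blocks)

# A stripe starting at block 100, 3000 blocks off the diagonal, 500 blocks long.
first, second = stripe_to_seconds(start=100, lag=3000, length=500)
```

With a 10 ms hop, the example stripe relates a five-second section starting at about 1 s to a section that begins 30 s later.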