Automatic music segmentation and structure analysis from audio waveforms based on a three-level hierarchy is examined in this research, where the three-level hierarchy includes notes, measures and parts. The pitch class profile (PCP) feature is first extracted at the note level. Then, a similarity matrix is constructed at the measure level, where a dynamic time warping (DTW) technique is used to enhance the similarity computation by taking the temporal distortion of similar audio segments into account. By processing the similarity matrix, we can obtain a coarse-grain music segmentation result. Finally, dynamic programming is applied to the coarse-grain segments so that a song can be decomposed into several major parts such as intro, verse, chorus, bridge and outro. The performance of the proposed music structure analysis system is demonstrated for pop and rock music.
Music genre provides an efficient way to index songs in a music database, and can be used as an effective means to retrieval music of a similar type, i.e. content-based music retrieval. A new two-stage scheme for music genre classification is proposed in this work. At the first stage, we examine a couple of different features, construct their corresponding parametric models (e.g. GMM and HMM) and compute their likelihood functions to yield soft classification results. In particular, the timbre, rhythm and temporal variation features are considered. Then, at the second stage, these soft classification results are integrated to result in a hard decision for final music genre classification.
Experimental results are given to demonstrate the performance of the proposed scheme.
Music genre provides an efficient way to index songs in the music database, and can be used as an effective means to retrieval music of a similar type, i.e. content-based music retrieval. In addition to other features, the temporal domain features of a music signal are exploited so as to increase the classification rate in this research. Three temporal techniques are examined in depth. First, the hidden Markov model (HMM) is used to emulate the time-varying properties of music signals. Second, to further increase the classification rate, we propose another feature set that focuses on the residual part of music signals. Third, the overall classification rate is enhanced by classifying smaller segments from a test material individually and making decision via majority voting. Experimental results are given to demonstrate the performance of the proposed techniques.
In this work, we present an audio content identification system that identifies some unknown audio material by comparing its fingerprint with those extracted off-line and saved in the music database. We will describe in detail the procedure to extract audio fingerprints and demonstrate that they are robust to noise and content-preserving manipulations. The main feature in the proposed system is the zero-crossing rate extracted with the octave-band filter bank. The zero-crossing rate can be used to describe the dominant frequency in each subband with a very low computational cost. The size of audio fingerprint is small and can be efficiently stored along with the compressed files in the database. It is also robust to many modifications such as tempo change and time-alignment distortion. Besides, the octave-band filter bank is used to enhance the robustness to distortion, especially those localized on some frequency regions.