Proc. SPIE. 6820, Multimedia Content Access: Algorithms and Systems II
KEYWORDS: Video, Video surveillance, Data modeling, Analytical research, Information visualization, Image segmentation, Systems modeling, Visualization, Semantic video
In this paper, we present a content-adaptive audio texture based method to segment video into audio scenes. The audio
scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture
analysis." First, we train GMM models for basic audio classes such as speech and music. Then we define the
semantic audio texture based on those classes. We study and present two types of scene changes, those corresponding to
an overall audio texture change and those corresponding to a special "transition marker" used by the content creator,
such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work that relies on genre-specific
heuristics, such as some methods proposed for detecting commercials, we adaptively determine whether such special transition
markers are being used and, if so, which of the base classes serve as markers, without any prior knowledge of
the content. Our experimental results show that our proposed audio scene segmentation works well across a wide variety
of broadcast content genres.
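The texture-change detection described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's exact formulation: it assumes each short audio frame has already been labeled with a base-class index by the per-class GMMs, and the function names, window size, and threshold are illustrative choices.

```python
import numpy as np

def texture_histogram(labels, n_classes):
    """Normalized class histogram over a window of frame labels.

    The histogram serves as a simple stand-in for the "audio texture"
    of the window (illustrative, not the paper's exact definition).
    """
    h = np.bincount(labels, minlength=n_classes).astype(float)
    return h / h.sum()

def detect_scene_changes(labels, n_classes=3, win=50, threshold=0.5):
    """Flag frame indices where the texture of the preceding window
    differs strongly from that of the following window."""
    labels = np.asarray(labels)
    changes = []
    for t in range(win, len(labels) - win):
        left = texture_histogram(labels[t - win:t], n_classes)
        right = texture_histogram(labels[t:t + win], n_classes)
        # L1 distance between the two texture histograms
        if np.abs(left - right).sum() > threshold:
            changes.append(t)
    return changes
```

For example, a label sequence that switches from all-speech (class 0) to all-music (class 1) at frame 200 yields detected change points clustered around frame 200. Detecting the special "transition markers" would additionally require checking whether a particular base class recurs at candidate boundaries, which is omitted here.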
In this correspondence, a new block sum pyramid algorithm (NBSPA) for motion estimation is presented. Compared with the BSPA, the NBSPA estimates the motion vector with the minimum mean absolute difference (MADmin) more efficiently: instead of updating the estimate level by level, we update the estimate of the MAD row by row, from top to bottom. Experimental results show that, while the search result of the NBSPA is identical to that of the exhaustive search and the BSPA, the NBSPA greatly reduces the computational complexity.
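The row-by-row MAD update can be illustrated with a simplified sketch. Note this omits the block sum pyramid lower bounds that distinguish the BSPA/NBSPA from plain exhaustive search; it only shows the partial-distortion idea of accumulating the absolute differences row by row and abandoning a candidate as soon as the partial sum exceeds the current best. All names and the SAD (rather than normalized MAD) cost are illustrative assumptions.

```python
import numpy as np

def best_motion_vector(cur_block, ref, top, left, search=4):
    """Exhaustive block matching with row-by-row partial SAD and early
    termination: once the partial sum exceeds the current best, the
    candidate is rejected without finishing the block."""
    bh, bw = cur_block.shape
    H, W = ref.shape
    best_sad = np.inf
    best_mv = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > H or x + bw > W:
                continue  # candidate falls outside the reference frame
            cand = ref[y:y + bh, x:x + bw]
            sad = 0.0
            for r in range(bh):  # accumulate row by row, top to bottom
                sad += np.abs(cur_block[r].astype(float)
                              - cand[r].astype(float)).sum()
                if sad >= best_sad:  # partial distortion already too large
                    break
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

Because candidates are rejected after only a few rows once a good match has been found, the average per-candidate cost drops well below the full block comparison, while the returned vector is identical to that of the plain exhaustive search.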