Automatic video summarization has become an active research topic in content-based video processing. However, little emphasis has been placed on developing rigorous summary evaluation methods or on building summarization systems around a clear understanding of user needs, obtained through user-centered design. In this paper we address these two topics and propose an automatic video summary evaluation algorithm adapted from the text summarization domain.
Text data forms the largest bulk of digital data that people encounter and exchange daily. For this reason, the potential use of text as a covert channel for secret communication is an imminent concern. Even though information hiding in natural language text has started to attract great interest, there has been no study of attacks against these applications. In this paper we examine the robustness of lexical steganography systems using a universal steganalysis method, based on language models and support vector machines, to differentiate sentences modified by a lexical steganography algorithm from unmodified sentences. The experimental accuracy of our method on classification of steganographically modified sentences was 84.9%. On classification of isolated sentences we obtained a high recall rate, whereas the precision was low.
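To illustrate the language-model side of such a detector (the SVM stage and the actual features of the paper are omitted), a smoothed bigram model can assign a per-sentence fluency score; steganographic lexical substitutions tend to lower it. This is a minimal stdlib sketch: the tokenization, add-one smoothing, and toy corpus are all assumptions, not the paper's setup.

```python
import math
from collections import Counter

def train_bigram_lm(corpus_sentences):
    """Illustrative bigram language model with add-one smoothing.
    Returns a scorer giving the average per-bigram log-probability,
    usable as one feature for a downstream classifier."""
    unigrams, bigrams = Counter(), Counter()
    for s in corpus_sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(toks[:-1])          # history counts
        bigrams.update(zip(toks, toks[1:])) # bigram counts
    vocab = len(unigrams) + 1               # +1 for unseen tokens
    def logprob(sentence):
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        lp = 0.0
        for a, b in zip(toks, toks[1:]):
            # add-one smoothed conditional probability P(b | a)
            lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        return lp / (len(toks) - 1)
    return logprob
```

A sentence whose word order or word choice departs from the training corpus receives a lower average log-probability, which is the kind of signal a support vector machine could then threshold.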
With the proliferation of cameras in handheld devices that allow users to capture still images and videos, providing users with software tools to efficiently manage multimedia content has become essential. In many cases users want to organize their personal media content using high-level semantic labels. In this paper we will describe low-complexity algorithms that can be used to derive semantic labels, such as "indoor/outdoor," "face/not face," and "motion/not motion," for mobile video sequences. We will also describe a method for summarizing mobile video sequences. We demonstrate the classification performance of the methods and their computational complexity on a typical processor used in many mobile terminals.
Compact representations of video data, or video summaries, greatly enhance efficient video browsing. However, rigorous evaluation of video summaries generated by automatic summarization systems is a complicated process. In this paper we examine the summary evaluation problem. Text summarization is the oldest and most successful summarization domain. We show some parallels between these two domains and introduce methods and terminology. Finally, we present results for a comprehensive summary evaluation that we have performed.
In this paper we discuss natural language watermarking, which uses the structure of the sentence constituents in natural language text
in order to insert a watermark. This approach is different from techniques, collectively referred to as "text watermarking," which embed information by modifying the appearance of text elements,
such as lines, words, or characters. We provide a survey of the current state of the art in natural language watermarking and introduce terminology, techniques, and tools for text processing. We also examine the parallels and differences of the two watermarking domains and outline how techniques from the image watermarking domain may be applicable to the natural language watermarking domain.
The process of identifying speakers in a news program is difficult using text information alone. We propose a system that first performs text and video processing separately to identify the start of speech of a speaker. These start-of-speech locations are aligned and used to identify a change of speaker in the program. An analysis is performed to identify the contribution of the text and video information. It will be shown that the change-of-speaker locations identified by our alignment algorithm are more accurate than those from either modality individually.
The production of closed captions is an important but expensive process in video broadcasting. We propose a method to generate highly accurate off-line captions efficiently. Our system uses text alignment to synchronize program transcripts obtained for a video program with text produced by an automatic speech recognition (ASR) system. We will also describe the accuracy of both the closed-caption text and the ASR output for a number of news programs and provide a detailed analysis of the errors that occur.
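The core idea of transcript-to-ASR synchronization can be sketched with a standard sequence alignment: match clean transcript words against timestamped ASR words so transcript words inherit ASR timings. This is a minimal sketch using Python's `difflib`, not the paper's actual alignment algorithm; the word lists and times below are hypothetical.

```python
import difflib

def align_transcript_to_asr(transcript_words, asr_words, asr_times):
    """Map transcript word indices to ASR start times (seconds) by
    finding matching word runs between the two word sequences."""
    sm = difflib.SequenceMatcher(a=transcript_words, b=asr_words,
                                 autojunk=False)
    timed = {}
    for m in sm.get_matching_blocks():
        for k in range(m.size):
            # transcript word m.a+k matches ASR word m.b+k
            timed[m.a + k] = asr_times[m.b + k]
    return timed  # transcript word index -> estimated start time
```

Words with no ASR match (for example, words the recognizer misheard) receive no timestamp and would be interpolated from their matched neighbors in a fuller system.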
In this paper we investigate the distribution of shot lengths for video sequences containing diverse content. Accurate models of shot length are important for modeling video both in content-based retrieval applications and in queuing analysis for the design of video buffers in multimedia networks. Using a large dataset collected from CSPAN programs, we analyze the Pareto, Weibull, and gamma distributions as possible models for the shot length distribution. We compare the goodness of fit of these candidate distribution models using the Kolmogorov-Smirnov statistic.
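The Kolmogorov-Smirnov comparison can be sketched directly: compute the maximum distance between the empirical CDF of observed shot lengths and each candidate model's CDF, and prefer the model with the smallest distance. The distribution parameters below are illustrative placeholders, not values fitted to the paper's data.

```python
import math

def ks_statistic(samples, cdf):
    """Kolmogorov-Smirnov D: max distance between the empirical CDF
    of the samples and a candidate model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # check the gap just before and just after the step at x
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

def weibull_cdf(x, k=1.2, lam=5.0):
    """Weibull CDF with illustrative shape k and scale lam."""
    return 1.0 - math.exp(-((x / lam) ** k))

def pareto_cdf(x, alpha=1.5, xm=1.0):
    """Pareto CDF with illustrative tail index alpha and minimum xm."""
    return 0.0 if x < xm else 1.0 - (xm / x) ** alpha
```

In practice one would first fit each distribution's parameters to the observed shot lengths (e.g. by maximum likelihood) and then compare the resulting D values across models.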
Compact representations of video data can enable efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries for long videos. Our video summarization approach involves two main tasks: first, segmenting the video into small, coherent segments and second, ranking the resulting segments. Our proposed algorithm scores segments based on word frequency analysis of speech transcripts. A summary is then generated by selecting the segments with the highest score-to-duration ratios and concatenating them. We have designed and performed a user study to evaluate the quality of the generated summaries. Comparisons are made between our proposed algorithm and a random segment selection scheme, based on statistical analysis of the user study results. Finally, we discuss various issues that arise in the evaluation of automatically generated video summaries.
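The selection step described above can be sketched as follows: score each segment by the corpus-wide frequency of the words in its transcript, rank by score-to-duration ratio, and greedily fill a time budget. The tokenization, scoring formula, and greedy selection here are simplifying assumptions, not the paper's exact algorithm.

```python
import re
from collections import Counter

def summarize(segments, budget):
    """Pick segment indices for a summary of at most `budget` seconds.
    segments: list of (transcript_text, duration_seconds) pairs."""
    tokenize = lambda t: re.findall(r"[a-z']+", t.lower())
    # word frequencies over all segment transcripts
    freq = Counter(w for text, _ in segments for w in tokenize(text))
    # rank segments by score-to-duration ratio
    ranked = sorted(
        ((sum(freq[w] for w in tokenize(text)) / dur, i)
         for i, (text, dur) in enumerate(segments)),
        reverse=True)
    chosen, used = [], 0.0
    for _, i in ranked:
        if used + segments[i][1] <= budget:
            chosen.append(i)
            used += segments[i][1]
    # concatenate selected segments in original temporal order
    return sorted(chosen)
```

Returning the indices in temporal order reflects the concatenation step: the summary plays the selected segments in the order they occur in the source video.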
In this paper, we describe a framework for analyzing programs belonging to different TV program genres using Hidden Markov Models and pseudo-semantic features derived from video shots. Clustering using Gaussian mixture models is used to determine the order of the models. Results of initial genre classification experiments using two simple features derived from video shots are given.
In this paper we extend the shot transition detection component of the ViBE video database system to include gradual scene changes. ViBE, a browsable/searchable paradigm for organizing video data containing a large number of sequences, is being developed at Purdue as a testbed to explore ideas and concepts in video databases. We also present results on the performance of our cut detection algorithm using a large test set. The performance of two other techniques is compared against our method.
In this paper, we describe a new paradigm for video database management known as ViBE. ViBE is a browsable/searchable paradigm for organizing video data containing a large number of sequences. We describe how ViBE performs on a database of MPEG sequences.