According to the literature, automatic video summarization techniques can be classified into two categories according to the nature of their output: "video skims", which are generated from portions of the original video, and "key-frame sets", which correspond to images selected from the original video for their significant semantic content. The distinction between these two categories narrows when automatic procedures are considered. Most published approaches operate on the image signal and rely on pixel characterization, histogram techniques, or block-based image decomposition.
However, few of them integrate properties of the Human Visual System (HVS). In this paper, we propose to extract key-frames for video summarization by studying the variations of salient information between consecutive frames. For each frame, a saliency map simulating human visual attention is produced by a bottom-up (signal-dependent) approach. This approach comprises three parallel channels processing three early visual features: intensity, color, and temporal contrasts. For each channel, the variation of salient information between two consecutive frames is computed. These outputs are then combined to produce the global saliency variation, which determines the key-frames. Psychophysical experiments have been designed and conducted to assess the relevance of the proposed key-frame extraction algorithm.
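The selection scheme described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the per-channel saliency maps are assumed to be given as 2-D arrays, the per-channel variation is taken here as a mean absolute difference, the channel fusion as a simple average, and the key-frame decision as a fixed threshold on the global variation.

```python
import numpy as np

def channel_variation(prev_map, curr_map):
    # Variation of salient information between two consecutive frames
    # for one channel, measured here as a mean absolute difference
    # (illustrative choice, not the paper's exact measure).
    return float(np.abs(curr_map - prev_map).mean())

def select_keyframes(saliency_maps, threshold=0.1):
    """saliency_maps: dict mapping a channel name ('intensity', 'color',
    'temporal') to a list of 2-D saliency maps, one per frame.
    Returns the indices of the selected key-frames."""
    channels = list(saliency_maps)
    n_frames = len(saliency_maps[channels[0]])
    keyframes = [0]  # keep the first frame as an initial key-frame
    for t in range(1, n_frames):
        per_channel = [
            channel_variation(saliency_maps[c][t - 1], saliency_maps[c][t])
            for c in channels
        ]
        # Combine the three channel outputs into a global saliency
        # variation (simple averaging is an assumption here).
        global_variation = float(np.mean(per_channel))
        if global_variation > threshold:
            keyframes.append(t)
    return keyframes
```

A frame is retained whenever the combined saliency variation with respect to the previous frame exceeds the threshold, i.e. whenever the attended content changes noticeably.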
At high compression ratios, current lossy compression algorithms introduce distortions that are commonly exploited by no-reference quality assessment. For JPEG-2000 compressed images, blurring and ringing effects are the most annoying artifacts for a human observer. However, the Human Visual System does not perform a systematic, local search for these impairments over the whole image; rather, it identifies some regions of interest on which the perceptual quality judgment is based. In this paper, we propose to use both of these distortions (ringing and blurring effects), locally weighted by an importance map generated by a region-based attention model, to design a new reference-free quality metric for JPEG-2000 compressed images. For the blurring effect, the impairment measure depends on spatial information contained in the whole image, while for the ringing effect, only local information around strong edges is used. To predict the regions of the scene that are likely to attract human attention, one stage of the proposed metric generates an importance map derived from the region-based attention model defined by Osberger et al. [1]. First, explicit regions are obtained by color image segmentation. The segmented image is then analyzed with respect to several factors known to influence human attention. The resulting importance map is finally used to locally weight each distortion measure. The predicted scores have been compared, on the one hand, to subjective scores and, on the other hand, to previous results based solely on artifact measurement. This comparative study demonstrates the efficiency of the proposed quality metric.
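The importance-weighted pooling of the two distortion measures can be sketched as follows. Everything in this sketch is an assumption for illustration: the blur proxy (inverse gradient magnitude), the ringing proxy (deviation kept only in a band around a given strong-edge mask), the box dilation used to build that band, and the additive fusion are stand-ins for the paper's actual measures; only the overall structure (whole-image blur term, edge-local ringing term, both locally weighted by the importance map) follows the text.

```python
import numpy as np

def dilate(mask, radius):
    # Grow a boolean mask by `radius` pixels with a simple box expansion.
    m = mask.copy()
    for _ in range(radius):
        grown = m.copy()
        grown[1:, :] |= m[:-1, :]
        grown[:-1, :] |= m[1:, :]
        grown[:, 1:] |= m[:, :-1]
        grown[:, :-1] |= m[:, 1:]
        m = grown
    return m

def blur_map(image):
    # Per-pixel blur proxy over the whole image: inverse of the local
    # gradient magnitude (flat, low-gradient areas score high).
    gy, gx = np.gradient(image.astype(float))
    return 1.0 / (1.0 + np.hypot(gx, gy))

def ringing_map(image, edge_mask, radius=2):
    # Per-pixel ringing proxy kept only in a band around strong edges;
    # here a crude deviation-from-mean stands in for a ringing measure.
    dev = np.abs(image.astype(float) - image.mean())
    near_edge = dilate(edge_mask, radius) & ~edge_mask
    out = np.zeros_like(dev)
    out[near_edge] = dev[near_edge]
    return out

def weighted_pool(distortion_map, importance_map):
    # Locally weight a per-pixel distortion map by the importance map,
    # then pool to a scalar.
    w = importance_map / (importance_map.sum() + 1e-8)
    return float((distortion_map * w).sum())

def quality_score(image, edge_mask, importance_map):
    # Additive fusion of the two importance-weighted measures
    # (illustrative; lower means less distortion under these proxies).
    blur = weighted_pool(blur_map(image), importance_map)
    ring = weighted_pool(ringing_map(image, edge_mask), importance_map)
    return blur + ring
```

The key design point carried over from the text is that the importance map enters only at the pooling stage: each distortion is first measured in its natural support (the whole image for blur, a band around strong edges for ringing) and then re-weighted toward the regions predicted to attract attention.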