Automatic video summarization driven by a spatio-temporal attention model (4 March 2008)
Proceedings Volume 6806, Human Vision and Electronic Imaging XIII; 68060Q (2008)
Event: Electronic Imaging, 2008, San Jose, California, United States
According to the literature, automatic video summarization techniques can be classified into two categories according to the nature of their output: "video skims", which are generated from portions of the original video, and "key-frame sets", which correspond to images selected from the original video for their significant semantic content. The difference between these two categories narrows when automatic procedures are considered. Most published approaches are based on the image signal and use pixel characterization, histogram techniques, or block-based image decomposition. However, few of them integrate properties of the Human Visual System (HVS). In this paper, we propose to extract key-frames for video summarization by studying the variations of salient information between consecutive frames. For each frame, a saliency map simulating human visual attention is produced by a bottom-up (signal-dependent) approach. This approach comprises three parallel channels processing three early visual features: intensity, color, and temporal contrasts. For each channel, the variation of salient information between two consecutive frames is computed. These outputs are then combined to produce the global saliency variation, which determines the key-frames. Psychophysical experiments have been designed and conducted to assess the relevance of the proposed key-frame extraction algorithm.
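The pipeline the abstract describes (per-channel saliency, per-channel variation between consecutive frames, fusion into a global variation that selects key-frames) can be sketched as below. This is a minimal illustration, not the authors' implementation: the toy saliency measure, the raw-difference temporal channel, the equal-weight fusion, and the fixed threshold are all assumptions made for the sketch.

```python
import numpy as np

def channel_saliency(frame):
    """Toy per-channel saliency map: absolute deviation from the channel
    mean (a crude stand-in for a centre-surround contrast measure)."""
    return np.abs(frame - frame.mean())

def saliency_variation(prev, curr):
    """Mean absolute change in salient information between two frames."""
    return float(np.mean(np.abs(channel_saliency(curr) - channel_saliency(prev))))

def extract_key_frames(frames, threshold):
    """frames: list of dicts with 'intensity' and 'color' 2-D arrays.
    Combines intensity, color and temporal variations; a frame is kept as
    a key-frame when the global saliency variation exceeds `threshold`.
    The equal-weight average is an assumption of this sketch."""
    key_idx = []
    for t in range(1, len(frames)):
        v_int = saliency_variation(frames[t - 1]['intensity'], frames[t]['intensity'])
        v_col = saliency_variation(frames[t - 1]['color'], frames[t]['color'])
        # Temporal channel: raw inter-frame difference as a motion proxy.
        v_tmp = float(np.mean(np.abs(frames[t]['intensity'] - frames[t - 1]['intensity'])))
        global_variation = (v_int + v_col + v_tmp) / 3.0
        if global_variation > threshold:
            key_idx.append(t)
    return key_idx
```

For example, with a sequence of identical dark frames followed by identical bright frames, only the transition frame produces a large global variation and is returned as a key-frame.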
© 2008 Society of Photo-Optical Instrumentation Engineers (SPIE).
R. Barland and A. Saadane "Automatic video summarization driven by a spatio-temporal attention model", Proc. SPIE 6806, Human Vision and Electronic Imaging XIII, 68060Q (4 March 2008);
