Visual saliency maps already proved their efficiency in a large variety of image/video communication application fields, covering from selective compression and channel coding to watermarking. Such saliency maps are generally based on different visual characteristics (like color, intensity, orientation, motion,…) computed from the pixel representation of the visual content. This paper resumes and extends our previous work devoted to the definition of a saliency map solely extracted from the MPEG-4 AVC stream syntax elements. The MPEG-4 AVC saliency map thus defined is a fusion of static and dynamic map. The static saliency map is in its turn a combination of intensity, color and orientation features maps. Despite the particular way in which all these elementary maps are computed, the fusion techniques allowing their combination plays a critical role in the final result and makes the object of the proposed study. A total of 48 fusion formulas (6 for combining static features and, for each of them, 8 to combine static to dynamic features) are investigated. The performances of the obtained maps are evaluated on a public database organized at IRCCyN, by computing two objective metrics: the Kullback-Leibler divergence and the area under curve.
A saliency map provides information about the regions inside some visual content (image, video, ...) at which a human
observer will spontaneously look at. For saliency maps computation, current research studies consider the uncompressed
(pixel) representation of the visual content and extract various types of information (intensity, color, orientation, motion
energy) which are then fusioned. This paper goes one step further and computes the saliency map directly from the
MPEG-4 AVC stream syntax elements with minimal decoding operations. In this respect, an a-priori in-depth study on
the MPEG-4 AVC syntax elements is first carried out so as to identify the entities appealing the visual attention.
Secondly, the MPEG-4 AVC reference software is completed with software tools allowing the parsing of these elements
and their subsequent usage in objective benchmarking experiments. This way, it is demonstrated that an MPEG-4
saliency map can be given by a combination of static saliency and motion maps.
This saliency map is experimentally validated under a robust watermarking framework. When included in an m-QIM
(multiple symbols Quantization Index Modulation) insertion method, PSNR average gains of 2.43 dB, 2.15dB, and 2.37
dB are obtained for data payload of 10, 20 and 30 watermarked blocks per I frame, i.e. about 30, 60, and 90 bits/second,
respectively. These quantitative results are obtained out of processing 2 hours of heterogeneous video content.