Visual saliency maps have already proved their efficiency in a large variety of image/video communication application fields, ranging from selective compression and channel coding to watermarking. Such saliency maps are generally based on different visual characteristics (like color, intensity, orientation, motion, …) computed from the pixel representation of the visual content. This paper summarizes and extends our previous work devoted to the definition of a saliency map solely extracted from the MPEG-4 AVC stream syntax elements. The MPEG-4 AVC saliency map thus defined is a fusion of static and dynamic maps. The static saliency map is in its turn a combination of intensity, color and orientation feature maps. Regardless of the particular way in which all these elementary maps are computed, the fusion techniques allowing their combination play a critical role in the final result and are the object of the proposed study. A total of 48 fusion formulas (6 for combining static features and, for each of them, 8 for combining static and dynamic features) are investigated. The performances of the obtained maps are evaluated on a public database organized at IRCCyN, by computing two objective metrics: the Kullback-Leibler divergence and the area under curve.
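As an illustration of the two ingredients named above (map fusion and the Kullback-Leibler divergence), the following minimal Python sketch may help. It assumes that maps are small 2-D lists of non-negative values, that a plain weighted sum stands in for one of the 48 fusion formulas (which are not listed here), and that the divergence is computed between two maps normalized to sum to one. The names `fuse`, `normalize` and `kl_divergence` are hypothetical, and the evaluation protocol actually used on the IRCCyN database may differ.

```python
import math


def normalize(saliency_map, eps=1e-12):
    """Flatten a 2-D map and scale it so that its values sum to 1."""
    flat = [value + eps for row in saliency_map for value in row]
    total = sum(flat)
    return [value / total for value in flat]


def kl_divergence(p_map, q_map):
    """Kullback-Leibler divergence D(P || Q), in nats, between two maps."""
    p = normalize(p_map)
    q = normalize(q_map)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def fuse(static_map, dynamic_map, alpha=0.5):
    """Weighted-sum fusion (an assumed example formula, not the paper's)."""
    return [
        [alpha * s + (1.0 - alpha) * d for s, d in zip(row_s, row_d)]
        for row_s, row_d in zip(static_map, dynamic_map)
    ]


static_map = [[1.0, 2.0], [3.0, 4.0]]
dynamic_map = [[4.0, 3.0], [2.0, 1.0]]
fused = fuse(static_map, dynamic_map, alpha=0.5)
print(kl_divergence(static_map, dynamic_map))
```

The weight `alpha` is the only tuning parameter in this sketch; other fusion rules (maximum, product, and so on) would replace the body of `fuse` without touching the metric.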
By combining four different open standards belonging to the ISO/IEC JTC1/SC29 WG11 (a.k.a. MPEG) and W3C, this paper advances an architecture for mobile, medical-oriented virtual collaborative environments. The various users are represented according to MPEG-UD (MPEG User Description) while the security issues are dealt with by deploying the WebID principles. On the server side, irrespective of their elementary types (text, image, video, 3D, …), the medical data are aggregated into hierarchical, interactive multimedia scenes which are alternatively represented in the MPEG-4 BiFS or HTML5 standards. This way, each type of content can be optimally encoded according to its particular constraints (semantic, medical practice, network conditions, etc.). The mobile device only has to display the content (inside an MPEG player or an HTML5 browser) and capture the user interaction. The overall architecture is implemented and tested under the framework of the MEDUSA European project, in partnership with medical institutions. The testbed considers a server emulated by a PC and heterogeneous user devices (tablets, smartphones, laptops) running under iOS, Android and Windows operating systems. The connection between the users and the server is alternatively ensured by WiFi and 3G/4G networks.
This paper investigates three key issues related to full reference subjective quality evaluation tests for stereoscopic
video, namely, the number of quality levels on the grading scale, the number of observers in the evaluation panel,
and the inter-gender variability. It is theoretically demonstrated that the scores assigned by the observers on a
continuous grading scale can be a posteriori mapped to any discrete grading scale, with controlled statistical
accuracy. The experiments, performed in laboratory conditions, consider image quality, depth perception and visual
comfort. The original content (i.e. the full reference) is represented by the 3DLive corpus, composed of 2 hours 11
minutes of HD 3DTV content. The modified content (i.e. the content to be evaluated) is obtained by watermarking
this corpus with four methods. A panel of 60 observers (32 males and 28 females) was established, from which
randomly selected sub-panels of 30 and 15 observers were subsequently extracted. In order to simulate a
continuous scale, the subjective evaluation was carried out on 100 quality levels, which are a posteriori mapped to
discrete scales of q quality levels, with q between 2 and 9. The statistical investigation focused on the Mean Opinion
Score and considered three types of statistical inferences: outlier detection, confidence limits, and paired t-tests.
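The a posteriori mapping and the Mean Opinion Score confidence limits mentioned above can be sketched as follows. This is only an illustration under stated assumptions: scores lie in [0, 100], the q levels are obtained by uniform binning, and the confidence limits use the normal approximation (z = 1.96), whereas a Student quantile would be more appropriate for the smaller sub-panels. The function names `map_to_discrete` and `mos_with_ci` are hypothetical.

```python
import math
import statistics


def map_to_discrete(score, q, low=0.0, high=100.0):
    """Map a continuous score in [low, high] to a level in 1..q (uniform bins)."""
    if not low <= score <= high:
        raise ValueError("score outside the grading scale")
    level = int((score - low) / (high - low) * q) + 1
    return min(level, q)  # the top of the scale belongs to the last level


def mos_with_ci(scores, z=1.96):
    """Return the Mean Opinion Score and the half-width of its confidence interval."""
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


raw_scores = [72.0, 65.5, 80.0, 58.0, 91.0, 77.5]
levels = [map_to_discrete(s, q=5) for s in raw_scores]
mos, half_width = mos_with_ci(levels)
print(levels, mos, half_width)
```

Running the same panel through several values of q is then a matter of changing one argument, which is the practical appeal of grading on a quasi-continuous scale first.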
A saliency map provides information about the regions inside some visual content (image, video, ...) at which a human
observer will spontaneously look. For saliency map computation, current research studies consider the uncompressed
(pixel) representation of the visual content and extract various types of information (intensity, color, orientation, motion
energy) which are then fused. This paper goes one step further and computes the saliency map directly from the
MPEG-4 AVC stream syntax elements with minimal decoding operations. In this respect, an a priori in-depth study of
the MPEG-4 AVC syntax elements is first carried out so as to identify the entities attracting visual attention.
Secondly, the MPEG-4 AVC reference software is extended with software tools allowing the parsing of these elements
and their subsequent usage in objective benchmarking experiments. This way, it is demonstrated that an MPEG-4
saliency map can be given by a combination of static saliency and motion maps.
This saliency map is experimentally validated under a robust watermarking framework. When included in an m-QIM
(multiple symbols Quantization Index Modulation) insertion method, PSNR average gains of 2.43 dB, 2.15 dB, and 2.37
dB are obtained for data payloads of 10, 20 and 30 watermarked blocks per I frame, i.e. about 30, 60, and 90 bits/second,
respectively. These quantitative results are obtained out of processing 2 hours of heterogeneous video content.
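For readers unfamiliar with the transparency figure quoted above, a PSNR gain is simply the difference between two PSNR values computed against the same original. The sketch below assumes 8-bit samples (peak value 255) stored as flat lists; the function name `psnr` and the sample values are illustrative only and are not taken from the reported experiments.

```python
import math


def psnr(original, modified, peak=255.0):
    """Peak signal-to-noise ratio, in dB, between two equally long sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical content
    return 10.0 * math.log10(peak ** 2 / mse)


original = [10, 20, 30, 40]
marked_plain = [12, 18, 33, 38]    # hypothetical insertion without a saliency map
marked_guided = [11, 19, 31, 39]   # hypothetical saliency-guided insertion
gain_db = psnr(original, marked_guided) - psnr(original, marked_plain)
print(gain_db)
```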
While intra-frame drifting is a concern for all types of MPEG-4 AVC compressed-domain video processing applications, it has a particularly negative impact in watermarking. In order to avoid the drift drawbacks, two classes of solutions are currently considered in the literature. They try either to compensate the drift distortions at the expense of complex decoding/estimation algorithms or to restrict the insertion to the blocks which are not involved in the prediction, thus reducing the data payload. The present study follows a different approach. First, it algebraically models the drift distortion spread problem by considering the analytic expressions of the MPEG-4 AVC encoding operations. Secondly, it solves the underlying algebraic system under drift-free constraints. Finally, the advanced solution is adapted to take into account the watermarking peculiarities. The experiments consider an m-QIM semi-fragile watermarking method and a video surveillance corpus of 80 minutes. For prescribed data payload (100 bit/s), robustness (BER < 0.1 against transcoding at 50% in stream size), fragility (frame modification detection with accuracies of 1/81 from the frame size and 3s) and complexity constraints, the modified insertion results in gains in transparency of 2 dB in PSNR, of 0.4 in AAD, of 0.002 in IF, of 0.03 in SC, of 0.017 in NCC and of 22 in DVQ.
The present paper provides the proof of concept for the use of the MPEG-4 multimedia scene representations (BiFS and LASeR) as a virtualization tool for RDP-based applications (e.g. MS Windows applications). Two main applicative benefits are thus granted. First, any legacy application can be virtualized without additional programming effort. Second, heterogeneous mobile devices (different manufacturers, OS) can collaboratively enjoy full multimedia experiences. From the methodological point of view, the main novelty consists in (1) designing an architecture allowing the conversion of the RDP content into a semantic multimedia scene graph and its subsequent rendering on the client and (2) providing the underlying scene graph management and interactivity tools. Experiments consider 5 users and two RDP applications (MS Word and Internet Explorer), and benchmark our solution against two state-of-the-art technologies (VNC and FreeRDP). The visual quality is evaluated by six objective measures (e.g. PSNR<37dB, SSIM<0.99). The network traffic evaluation shows that: (1) for text editing, the MPEG-based solutions outperform VNC by a factor of 1.8 while being 2 times heavier than FreeRDP; (2) for Internet browsing, the MPEG solutions outperform both VNC and FreeRDP by factors of 1.9 and 1.5, respectively. The average round-trip times (less than 40ms) cope with real-time application constraints.
By reconsidering some approaches inherited from two-dimensional video and by adapting them to the stereoscopic video content and to the human visual system peculiarities, a new disparity map is designed. First, the inner relation between the left and the right views is modeled by some weights discriminating between the horizontal and vertical disparities. Second, the block matching operation is achieved by considering a visual related measure (normalized cross correlation) instead of the traditional pixel differences (mean squared error or sum of absolute differences). The advanced disparity map, 3-D Video-New Three Step Search (3DV-NTSS), is benchmarked against two state-of-the-art algorithms, namely NTSS and full-search MPEG (FS-MPEG), by successively considering two corpora. The first corpus was organized during the 3DLive French national project and groups 20 min of stereoscopic video sequences. The second one, of similar size, is provided by the MPEG community. The experimental results demonstrate the effectiveness of 3DV-NTSS in both reconstructed image quality (average gains between 3% and 7% in both PSNR and structural similarity, with a singular exception) and computational cost (search operation number reduced by average factors between 1.3 and 13). The 3DV-NTSS was finally validated by designing a watermarking method for high definition 3-D TV content protection.
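The idea of matching blocks with a normalized cross correlation inside a search window that favours horizontal displacements can be sketched as below. This is not the 3DV-NTSS algorithm: the search is exhaustive rather than a three-step search, the asymmetric window (`max_dx` larger than `max_dy`) is only a crude stand-in for the horizontal/vertical weights, and all function names and default values are assumptions made for illustration.

```python
import math


def ncc(block_a, block_b):
    """Normalized cross correlation between two equally sized blocks."""
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator if denominator else 0.0


def get_block(image, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def best_disparity(left_img, right_img, top, left, size=2, max_dx=3, max_dy=1):
    """Exhaustive NCC search; the window is wider horizontally than vertically."""
    reference = get_block(left_img, top, left, size)
    height, width = len(right_img), len(right_img[0])
    best, best_score = (0, 0), -1.0
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block falls outside the right view
            score = ncc(reference, get_block(right_img, t, l, size))
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best, best_score


left_view = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 9, 1, 7, 3, 8, 2, 6],
    [5, 1, 8, 2, 9, 3, 7, 1],
    [3, 6, 2, 9, 1, 7, 4, 8],
]
# Right view: the left view shifted two columns to the right, zero padded.
right_view = [[0, 0] + row[:-2] for row in left_view]
print(best_disparity(left_view, right_view, top=1, left=2))
```

Replacing the two nested loops with a coarse-to-fine candidate schedule is what would turn this exhaustive search into a three-step-search variant.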
The present paper is devoted to the MPEG-4 AVC (a.k.a. H.264) video stream protection by means of watermarking
techniques. The embedding process is carried out in the quantized index domain and relies on the m-QIM (m-ary
Quantisation Index Modulation) principles. In order to cope with the MPEG-4 AVC peculiarities, Watson's
perceptual model is reconsidered and discussed. The experimental results correspond to the MEDIEVALS (a French
National Project) corpus of 4 video sequences of about 15 minutes each, encoded at 512kbps. The transparency is
assessed by both subjective and objective measures. The transcoding (down to 64kbps) and geometric (StirMark) attacks
result in BER of 6.75% and 11.25%, respectively. In order to improve robustness, an MPEG-4 AVC syntax-driven
counterattack is considered: this way, the two above mentioned attacks lead to BER of 2% and 10%, respectively.
Finally, the overall theoretical relevance of these results is discussed by estimating the related channel capacities.
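The m-ary QIM principle referred to above can be illustrated with a scalar sketch: each of the m symbols owns a shifted copy of a uniform quantizer, embedding moves the host value onto the lattice of the symbol to be inserted, and detection picks the symbol whose lattice is closest to the received value. The version below is a textbook simplification, not the method of the paper: it works on a single real value rather than on MPEG-4 AVC quantized indices, it uses no secret key and no perceptual shaping, and the names `qim_embed`, `qim_detect` and the step value are assumptions.

```python
def quantize(value, step):
    """Round a value to the nearest multiple of step."""
    return step * round(value / step)


def qim_embed(host, symbol, m, step):
    """Move the host value onto the quantizer lattice assigned to `symbol`."""
    offset = symbol * step / m
    return quantize(host - offset, step) + offset


def qim_detect(received, m, step):
    """Return the symbol whose lattice lies closest to the received value."""
    def distance(symbol):
        offset = symbol * step / m
        return abs(received - (quantize(received - offset, step) + offset))
    return min(range(m), key=distance)


m, step = 5, 10.0
marked = qim_embed(37.3, symbol=2, m=m, step=step)
print(marked, qim_detect(marked + 0.4, m, step))
```

In this sketch a symbol survives any perturbation smaller than `step / (2 * m)`, which makes the usual trade-off explicit: a larger step or a smaller m improves robustness at the cost of transparency or payload.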
Watermarking has already established itself as an effective and reliable solution for conventional multimedia content
protection (image/video/audio/3D). By persistently (robustly) and imperceptibly (transparently) inserting some extra
data into the original content, the illegitimate use of data can be detected without imposing any annoying constraint
on a legal user. The present paper deals with stereoscopic image protection by means of watermarking techniques.
That is, we first investigate the peculiarities of the visual stereoscopic content from the transparency and robustness
point of view. Then, we advance a new watermarking scheme designed so as to reach the trade-off between
transparency and robustness while ensuring a prescribed quantity of inserted information. Finally, this method is
evaluated on two stereoscopic image corpora (natural image and medical data).
Consider a traditional mobile user wanting to connect to a remote multimedia server. In order to allow them to enjoy
the same user experience remotely (play, interact, edit, store and share capabilities) as in a traditional fixed LAN
environment, several deadlocks are to be dealt with: (1) heavy and heterogeneous content should be sent through a
bandwidth-constrained network; (2) the displayed content should be of good quality; (3) user interaction should be
processed in real time and (4) the complexity of the practical solution should not exceed the capabilities of the mobile
client in terms of CPU, memory and battery. The present paper takes up this challenge and presents a fully operational
MPEG-4 BiFS solution.
The main issue in this paper is to deploy a compression algorithm for heterogeneous content (text, graphics, image and video) with low-complexity decoding. Such an algorithm is involved in the core remote display problem for mobile thin clients: it allows the graphical content, computed on a remote server, to be displayed on the user's thin terminal, even when the network constraints (bandwidth, errors) are very strict. The paper is structured into three parts. First, a client-server architecture is presented. On the server side, the graphical content is parsed, converted and binary encoded into the MPEG-4 (BiFS, LASeR) format. This content is further streamed to the terminal, where it is played in a simple MPEG player. Secondly, this architecture is considered as a test bed for MPEG-4 performance assessment for various types of content (image, graphics, text). The quantitative results focus on bandwidth requirements and quality of experience. Finally, the conclusions are structured as a reference benchmarking of the MPEG (BiFS, LASeR) and non-MPEG (VNC) potential solutions for mobile remote display.
This paper presents a method able to optimize watermarking detection in the MPEG-4 AVC compressed domain. The
optimization was achieved by introducing a statistical counter-attack based on the MAP criterion applied on the noise
matrices corresponding to each attack.
As no statistical models are nowadays available for the attack effects in the MPEG-4 AVC domain, they had to be first
estimated. In this respect, 95% confidence limits for noise matrices, computed with relative errors lower than 15% have
been obtained for geometric attacks (StirMark, small rotations), and linear or non-linear filtering (Gaussian filtering,
sharpening). The viability of these statistical results was demonstrated by the watermarking experiments: the
counter-attack based on the attack model approaches its theoretical upper limit.
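A MAP decision driven by a noise matrix can be sketched in a few lines. The sketch assumes that `noise_matrix[s][r]` stores the estimated probability of receiving symbol `r` when symbol `s` was inserted, and that the prior probabilities of the inserted symbols are known; the numerical values are invented for illustration and are not the matrices estimated in the study.

```python
def map_decode(received, noise_matrix, priors):
    """Return the inserted symbol maximizing P(sent) * P(received | sent)."""
    return max(
        range(len(priors)),
        key=lambda sent: priors[sent] * noise_matrix[sent][received],
    )


# Hypothetical two-symbol channel describing the effect of one attack.
noise_matrix = [
    [0.9, 0.1],  # symbol 0 inserted
    [0.2, 0.8],  # symbol 1 inserted
]
print(map_decode(1, noise_matrix, priors=[0.5, 0.5]))
print(map_decode(1, noise_matrix, priors=[0.95, 0.05]))
```

With uniform priors the rule reduces to maximum likelihood; the second call shows how a strongly skewed prior can overturn the decision suggested by the received symbol alone.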
The watermarking state of the art features hybrid methods combining spread spectrum and side information principles.
The present study is focussed on speeding up such an algorithm (jointly patented by SFR - Vodafone Group and Institut
Telecom). The bottleneck of the reference method is first identified: the embedding module accounts for 90% of the running
time of the whole watermarking chain, and more than 99% of this time is spent on applying an attack procedure (required in
order to grant good robustness to this method). The main issue of the present study is to deploy Monte Carlo generators accurately
representing the watermarking attacks. In this respect, two difficulties should be overcome. First, accurate statistical models
for the watermarking attacks should be obtained. Secondly, efficient Monte Carlo simulators should be deployed for these
models. The last part of the study was devoted to the experimental validations. The mark is inserted in the (9,7) DWT
representation of the video sequence. Several types of attacks have been considered (linear and non-linear filters, geometrical
transformations, ...). The quantitative results proved that the data payload, transparency and robustness properties have
been inherited from the reference method. However, the watermarking speed was increased by a factor of 80.
Under the framework of the FP-7 European MobiThin project, the present study addresses the issue of remote display
representation for mobile thin clients. The main issue is to design a compression algorithm for heterogeneous content
(text, graphics, image and video) with low-complexity decoding. As a first step in this direction, we propose a novel
software architecture, based on BiFS - Binary Format for Scenes (MPEG-4 Part 11). On the server side, the graphical
content is parsed, converted and binary encoded into the BiFS format. This content is then streamed to the terminal,
where it is played on a simple MPEG player. The viability of this solution is validated by comparing it to the most
intensively used wired solutions, e.g. VNC - Virtual Network Computing.
With the emergence of the Knowledge Society, enriched video is nowadays a hot research topic, from both
academic and industrial perspectives. The principle consists in associating with the video stream some metadata of various
types (textual, audio, video, executable codes, ...). This new content is to be further exploited in a large variety of
applications, like interactive DTV, games, e-learning, and data mining. This paper highlights the
potential of watermarking techniques for such an application. By inserting the enrichment data into the very video
to be enriched, three main advantages are ensured. First, no additional complexity is required from the terminal and the
representation format point of view. Secondly, no backward compatibility issue is encountered, thus allowing a unique
system to accommodate services from several generations. Finally, the network adaptation constraints are alleviated. The
discussion is structured on both theoretical aspects (the accurate evaluation of the watermarking capacity in several real-life
scenarios) as well as on applications developed under the framework of the R&D contracts conducted at the
Nowadays, nobody doubts the huge economic benefits the watermarking solutions will one day bring. The paper
is devoted to the theoretical evaluation of the watermarking capacity, i.e. to finding out with mathematical rigour
the maximum amount of information which can be inserted into the DWT of natural video, for prescribed constraints of
transparency and robustness. The starting point is the accurate statistical model for the watermarking attacks the authors
already reported. In this paper, in addition to the classical Shannon solutions, the capacity is evaluated by two
approaches: (1) a method developed in order to increase speed and precision for watermarking evaluations and (2) the
general Blahut-Arimoto algorithm, adapted by Justin Dauwels for the discrete case. The experiments are run on a video
corpus of 10 video sequences of about 25 minutes each.
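For reference, the classical discrete Blahut-Arimoto iteration can be sketched as follows. It is the textbook version for a discrete memoryless channel described by a noise matrix, not the adapted variant mentioned above, and it ignores the side information and the transparency constraint that a full watermarking capacity evaluation would involve. The function name, tolerance and iteration cap are assumptions.

```python
import math


def blahut_arimoto(noise_matrix, tol=1e-9, max_iter=10000):
    """Capacity in bits per use of a channel with noise_matrix[x][y] = P(y | x).

    Returns a lower bound on the capacity (tight to `tol` nats at convergence)
    and the corresponding input distribution.
    """
    n_in, n_out = len(noise_matrix), len(noise_matrix[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * noise_matrix[x][y] for x in range(n_in))
             for y in range(n_out)]
        # d[x] = divergence between row x and the output distribution (nats).
        d = [sum(w * math.log(w / q[y]) for y, w in enumerate(row) if w > 0)
             for row in noise_matrix]
        c = [math.exp(v) for v in d]
        z = sum(pi * ci for pi, ci in zip(p, c))
        lower, upper = math.log(z), max(d)  # capacity lies between the two
        p = [pi * ci / z for pi, ci in zip(p, c)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


# Binary symmetric channel with a 10% symbol error probability.
capacity, best_input = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(capacity, best_input)
```

The gap between the two bounds gives a stopping rule that does not depend on knowing the capacity in advance, which is why it is preferred here over a fixed number of iterations.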
High speed, low complexity, and interoperability are just three of the main advantages turning the MPEG stream
watermarking into a hot research topic. Unfortunately, viable solutions (in terms of robustness, data payload and
transparency) are yet to be found. In their previous work, the authors computed general models for the watermarking
attack effects (StirMark, linear & nonlinear filtering, rotations) in the MPEG-4 AVC stream. These models (expressed as
noise matrices) are now the starting point for evaluating three classes of watermarking insertion techniques (substitutive,
additive, and multiplicative). For each class, a specific set of noise matrices is first computed by particularising the
general model. Secondly, the corresponding capacity values (i.e. the largest data payload which can be inserted for
prescribed transparency and robustness) are computed. The paper concludes with a comparison of the performances of these
methods. The experiments are run on a video corpus of 10 video sequences of about 25 minutes each.
Nowadays, robust watermarking has clearly identified its functionality within the multimedia production chain, from the content creation to the end-user consumption: property right identification and copy-maker tracking. In the quest for the speed required by today's real-time applications, compressed-domain watermarking has become a hot research topic. This study evaluates the watermarking capacity in the MPEG-4 AVC domain in order to establish whether and to what extent compressed-domain watermarking is viable. In this respect, the additive watermarking techniques are modelled by discrete noisy channels with non-causal side information at the transmitter. The study considers several attacks (linear and non-linear filtering, geometric) and computes the capacity of the corresponding channels. The experimental results are obtained by processing a natural video corpus of 10 video sequences belonging to different movies, each of them about 25 minutes long (35000 frames in each video sequence).
The main issue this paper addresses is to obtain the information sources characterising the video sequences represented in
the DWT domain and to discuss their relevance for practical applications. From the statistical point of view, this means
establishing whether the DWT coefficients can be approximated by random variables and, if so, computing the
corresponding probability density functions (pdf). The corpus considered in experiments is composed of 10 video
sequences, belonging to different movies, each of them about 25 minutes long, DivX coded at a very low rate.
The ever-increasing Internet distribution of video content is echoed in ever-increasing efforts to devise systems balancing
copyright protection and user rights. Watermarking is such an example: by persistently and imperceptibly associating some
data with the host video, it offers at the same time a reliable and user-friendly solution for copyright infringement tracking.
This paper takes a closer look at the apparent contradiction between watermarking (using the visual redundancy of the video
to embed the data) and compression (eliminating the visual redundancy in order to speed up distribution and to alleviate
storage requirements). In this respect, the viability of compressed domain watermarking is evaluated by analysing the visual
effects of the MPEG-4 AVC stream alteration. The corpus consists of 10 video sequences of about 25 minutes each, coded at
256 kbps and 64 kbps.
The explosion of VoD and HDTV services opened a new direction in watermarking applications: compressed domain
watermarking, promising at least a tenfold speed increase. While sound technical approaches to this emerging field are
already available in the literature, to the best of our knowledge the present paper is the first related theoretical study. It
considers the ISO/IEC 14496-10:2005 standard (also known as MPEG-4 AVC) and objectively describes with
information theory concepts (noisy channel, noise matrices) the effects of the real-life watermarking attacks (like
rotations, linear and non-linear filtering, StirMark). All the results are obtained on a heterogeneous corpus of 7 video
sequences summing up to about 3 hours.
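A noise matrix of the kind mentioned above can be estimated, in its simplest form, by counting symbol transitions between the original and the attacked streams. The sketch below assumes that both streams have already been reduced to aligned sequences of integer symbols in `0..n_symbols-1`; how the MPEG-4 AVC syntax elements are mapped to such symbols is not specified here, and the function name is hypothetical.

```python
def estimate_noise_matrix(sent, received, n_symbols):
    """Estimate P(received | sent) by relative frequencies.

    Row s of the result is the empirical distribution of the received symbol
    given that symbol s was sent; a symbol that was never sent yields a zero row.
    """
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for s, r in zip(sent, received):
        counts[s][r] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


sent = [0, 0, 0, 0, 1, 1]
received = [0, 0, 0, 1, 1, 1]  # hypothetical symbols observed after an attack
print(estimate_noise_matrix(sent, received, n_symbols=2))
```

Relative frequencies are only point estimates; confidence limits on each entry would be needed before feeding such a matrix into a capacity computation.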
Regardless of the final targeted application (compression, watermarking, texture analysis, indexation, ...), image/video modelling in the DCT domain is generally approached by tests of concordance with some well-known pdfs (like Gaussian, generalised Gaussian, Laplace, Rayleigh ...). Instead of forcing the images/videos to stick to such theoretical models, our study aims at estimating the true pdf characterising their behaviour. In this respect, we considered three intensively used ways of applying DCT, namely on whole frames, on 4x4 blocks, and on 8x8 blocks. In each case, we first prove that a law modelling the corresponding coefficients exists. Then, we estimate this law by Gaussian mixtures and finally we assess the generality of such a model with respect to the data on which it was computed and to the estimation method it relies on.
When considering the multimedia production chain from the content creation to the end-user consumption,
watermarking provides a well defined functionality: property right identification and copy-maker tracking. However, its
place within this chain is not yet clearly stated. The present paper describes an objective study aiming at establishing the
functional peculiarities in the cases when watermarking follows compression. First, some general limits concerning the
transparency, robustness and capacity in compressed (MPEG-4 AVC) domain watermarking are identified. Then, these
results are discussed and compared to the uncompressed domain watermarking case. The experiments were carried out
on a video corpus of 5 video sequences, each of them of 35000 frames (about 25 minutes each), coded at 256kbit/s.
Defined as the maximum amount of information which can be inserted in an original media for prescribed transparency
and robustness, watermarking capacity has been a challenging research topic in recent years. The present paper allows
several current limitations in this respect to be overcome. As the capacity strongly depends on the attack statistical
behaviour, the first part of our paper is devoted to their in-depth investigation. By advancing an original statistical
approach, it is pointed out that we may speak about probability density functions modelling several types of attacks
(filtering, small rotation, StirMark). Then, these new accurate models are considered as the starting points in the
probability evaluation. The experimental study is based on the watermarking methods inserting the mark in the hierarchy
of the coefficients corresponding to three types of wavelet transforms, namely the (2,2), (4,4) and (9,7). The video
corpus consisted of 10 video sequences of about 25 minutes each, with heterogeneous content.
Video watermarking enforces property right for digital video: a mark is transparently embedded into original data. The
true owner is identified by detecting this mark. The robust watermarking techniques allow the mark detection even when
the protected video is attacked. Generally, the better the transparency and robustness, the smaller the mark size. We
evaluate the maximum theoretical quantity of information which can be inserted into the 2D-DWT coefficient hierarchy,
for prescribed transparency and robustness constraints. In order to ensure accuracy in the capacity evaluation, our paper
does not rely on any assumption concerning the noise model. Instead, it carries out an in-depth analysis of the statistical
behaviour of real-life attacks (StirMark, Gaussian filtering, sharpening, rotation). The experiments are performed on
10 low rate video sequences of 30 minutes each and compare three types of bi-orthogonal DWT, namely
the (2,2), (4,4), and (9,7). The overall results (theoretical and experimental) are discussed not only for conventional
watermarking applications, but for hidden channel, indexing and retrieval applications, as well.
Nowadays, a large variety of emerging applications (clickable video, interactive high definition television, intelligent interfaces) process not only the multimedia content (audio, video, 3D, ...) but also some additional data directly connected to it. This enrichment information is usually transmitted and stored as an additional independent stream (metadata). Such an approach can sometimes be restrictive, mainly for networks/applications with strict bandwidth and/or protocol constraints. An alternative solution is advanced and discussed in this paper. The principle consists in transmitting the metadata via in-band channels obtained by means of data hiding (watermarking) techniques. The challenge is to design data hiding techniques reaching the trade-off among transparency (the enrichment process should not alter the perceptual quality of the host media), robustness (the possibility to recover the metadata at the end user even when high distortions occur through the channel) and data payload (the amount of metadata which can be inserted). The paper investigates the feasibility of such techniques by evaluating the maximal data payload (the watermarking capacity) under given robustness and transparency constraints. The results are compared to the resources needed by some existing enrichment applications. The experiments are carried out in collaboration with the French mobile service operator SFR (Vodafone Group) and consider video sequences watermarked in the DWT domain.
Audio watermarking aims at ensuring the property rights for digital audio (music, speech). In this respect, some extra information, referred to as mark or watermark, is embedded into the original (unmarked) clip. By detecting this information, the true owner should be identified and the copy maker should be tracked down. This paper starts by identifying the audio peculiarities under the watermarking framework. Then, the first method hybridising spread spectrum and side information principles for audio watermarking is advanced. This method meets today's challenging requirements of transparency, robustness, and data payload. The experiments were performed in collaboration with the French SFR (Vodafone Group) mobile service provider.
Watermarking aims at enforcing property rights for digital video: a mark is imperceptibly (transparently) embedded into the original data. The true owner is identified by detecting this mark. The robust watermarking techniques allow the mark detection even when the protected video is attacked. Transparency and robustness constraints restrict the mark size: the better the transparency and robustness, the smaller the data payload. The paper presents a method to evaluate the maximum quantity of information which can be theoretically inserted into the 2D-DCT coefficient hierarchy, for prescribed transparency and robustness. This approach relies on the noisy channel model for watermarking. Within this mathematical framework, the maximal data payload is expressed by the channel capacity. As any capacity evaluation procedure requires an intimate knowledge of the noise sources, the paper first describes the developed statistical approach making it possible (1) to properly handle the inner dependency existing among successive frames in a video sequence, and (2) to accurately check the Gaussian behaviour of each noise source. The experiments were carried out in partnership with the SFR mobile service provider in France (Vodafone group).
With the advent of the Information Society, video, audio, speech, and 3D media represent the source of huge economic benefits. Consequently, there is a continuously increasing demand for protecting their related intellectual property rights. The solution can be provided by robust watermarking, a research field which exploded in the last 7 years. However, the largest part of the scientific effort was devoted to video and audio protection, the 3D objects being quite neglected. In the absence of any standardisation attempt, the paper starts by summarising the approaches developed in this respect and by further identifying the main challenges to be addressed in the next years. Then, it describes an original oblivious watermarking method devoted to the protection of 3D objects represented by NURBS (Non-Uniform Rational B-Spline) surfaces. Applied to both free-form objects and CAD models, the method exhibited very good transparency (no visible differences between the marked and the unmarked model) and robustness (with respect to both traditional attacks and NURBS processing).
The cell phone expansion provides an additional direction for digital video content distribution: music clips, news and sport events are more and more often transmitted to mobile users. Consequently, from the watermarking point of view, a new challenge should be taken up: very low bitrate contents (e.g. as low as 64 kbit/s) are now to be protected. Within this framework, the paper approaches for the first time the mathematical models for two random processes, namely the original video to be protected and a very harmful attack any watermarking method should face: the StirMark attack. By applying an advanced statistical investigation (combining the Chi square, Ro, Fisher and Student tests) in the discrete wavelet domain, it is established that the popular Gaussian assumption can be used only very restrictively when describing the former process and has nothing to do with the latter. As these results can a priori determine the performances of several watermarking methods, both of spread spectrum and informed embedding types, they should be considered in the design stage.
Nowadays, alongside the traditional voice signal, music, video, and 3D characters tend to become common data to be run, stored and/or processed on mobile phones. Hence, protecting their related intellectual property rights also becomes a crucial issue. The video sequences involved in such applications are generally coded at very low bit rates. The present paper starts by presenting an accurate statistical investigation of such video as well as of a very dangerous attack (the StirMark attack). The obtained results are put into practice by adapting a spread spectrum watermarking method to such applications. The informed watermarking approach was also considered: an outstanding method belonging to this paradigm has been adapted and re-evaluated under the low rate video constraint. The experiments were conducted in collaboration with the SFR mobile services provider in France. They also allow a comparison between the spread spectrum and informed embedding techniques.
An original statistical approach making it possible to accurately model video sequences in the wavelet domain as Gaussian laws is presented. By partitioning the wavelet coefficients into classes with independent elements, we rigorously handle the dependency existing among the successive frames in the video sequence. Further on, four statistical tests are applied to each class in the partition, with the following purposes: (1) to check the Gaussian law; (2) to validate the data partition; and (3) to reveal homogeneous behaviour among the classes in the partition. Finally, the obtained results are fused so as to provide global information characterising the whole sequence. At the same time, an a posteriori proof of ergodic behaviour for video sequences is obtained. We integrated these results within a robust video watermarking scheme. The mark is generated according to a CDMA (Code Division Multiple Access) procedure, starting from a 64-bit message (a serial number, a logo, etc.). The embedding procedure is a weighted addition of the watermark to the wavelet coefficients featuring the Gaussian behaviour. The detection procedure is based on matched filters, the optimality of which is ensured under the considered framework. The experiments yield firm results concerning all the requirements currently stated: obliviousness, transparency, robustness, and probability of false alarm.
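The embedding and detection chain described above (CDMA mark generation, weighted addition, matched-filter detection) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's scheme: the host is a synthetic list of Gaussian values standing in for the selected wavelet coefficients, and the names `make_codes`, `embed`, `detect`, the key `1234`, the weight `alpha = 1.0` and the length 8192 are hypothetical choices made for the example.

```python
import random

def make_codes(key, n_bits, n_coeffs):
    """One pseudo-random +/-1 spreading code per message bit, seeded by a secret key."""
    rng = random.Random(key)
    return [[rng.choice((-1, 1)) for _ in range(n_coeffs)] for _ in range(n_bits)]

def embed(host, bits, codes, alpha):
    """Weighted addition: y = x + alpha * sum_i b_i * c_i, with b_i in {-1, +1}."""
    marked = list(host)
    for bit, code in zip(bits, codes):
        sign = 1 if bit else -1
        for k, chip in enumerate(code):
            marked[k] += alpha * sign * chip
    return marked

def detect(received, codes):
    """Matched filter: correlate with each code and keep the sign.

    Only the key-derived codes are needed, not the original host (obliviousness).
    """
    return [int(sum(r * c for r, c in zip(received, code)) > 0) for code in codes]

rng = random.Random(1)
message = [rng.randint(0, 1) for _ in range(64)]       # 64-bit message
host = [rng.gauss(0.0, 10.0) for _ in range(8192)]     # assumed Gaussian coefficients
codes = make_codes(key=1234, n_bits=64, n_coeffs=8192)

marked = embed(host, message, codes, alpha=1.0)
recovered = detect(marked, codes)
print(recovered == message)
```

The weight `alpha` trades transparency against robustness: a larger value makes each correlation peak stand out more clearly from the host and from the other 63 codes, at the cost of a stronger modification of the coefficients.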
This paper addresses the issue of oblivious robust watermarking, within the framework of colour still image database protection. We present an original method which complies with all the requirements currently imposed on watermarking applications: robustness (e.g. to low-pass filtering, print & scan, StirMark), transparency (both quality and fidelity), low probability of false alarm, obliviousness and multiple-bit recovery. The mark is generated from a 64-bit message (be it a logo, a serial number, etc.) by means of a Spread Spectrum technique and is embedded in the DWT (Discrete Wavelet Transform) domain, into certain low frequency coefficients, selected according to the hierarchy of their absolute values. The best results were provided by the (9,7) bi-orthogonal transform. The experiments were carried out on 1200 image sequences, each consisting of 32 images. Note that these sequences represented several types of images (natural, synthetic, medical, etc.) and each time we obtained the same good results. These results are compared with those we already obtained for the DCT domain, the differences being pointed out and discussed.
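The coefficient selection step mentioned above can be illustrated by a small sketch. Two assumptions are made loudly here: a one-level Haar transform on a single image row is swapped in for the (9,7) bi-orthogonal transform to keep the example short, and the largest-magnitude low-frequency coefficients are taken, whereas the abstract does not state which ranks in the hierarchy are actually used. The names `haar_level` and `select_by_magnitude` are hypothetical.

```python
import math

def haar_level(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def select_by_magnitude(coeffs, count):
    """Indices of the `count` largest coefficients in absolute value, largest first."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return order[:count]

row = [12, 14, 200, 198, 3, 5, 90, 94]     # toy image row
approx, detail = haar_level(row)            # approx holds the low-frequency band
chosen = select_by_magnitude(approx, 2)
print(chosen)  # [1, 3]
```

Because the selection depends only on the ranking of the magnitudes, an oblivious detector can recompute it from the received image, provided the attack does not reorder the largest coefficients.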
Many research studies have been devoted to image protection, either by cryptographic methods (e.g. Opt. Eng. 35, Sept. 1996) or by watermarking techniques (e.g. Proc. SPIE Vol. 3971). In a previous study, the authors reconsidered and improved the cryptographic mixing transformations proposed by C.E. Shannon for natural language, obtaining a strong cipher for images as well. The present paper goes deeper into the image protection problem: (1) by presenting some variants of the cryptographic mixing transformations which perform well even when burst errors appear in the cryptogram; (2) by enabling the use of an m-gram substitution in the mixing functions; (3) by advancing a bridge between cryptographic methods and watermarking techniques. The illustrations are obtained by processing: (a) computer-simulated random images obeying different probability laws and autocorrelation functions, (b) natural images, and (c) test images.
Several image encryption methods have been recently considered. This paper presents a new image enciphering method starting from the mixing transformations suggested by C.E. Shannon for natural language. There are two ways in which images are involved here: first, as a means of visual perception of the mixing transformation features (the very sophisticated diffusion and confusion); secondly, as the application field itself. Both the theoretical and experimental aspects pointed out here show the high quality of this method of image enciphering. The illustrations of the method were obtained either for simulated images with different probability laws and autocorrelation functions, or for natural images.