A number of technologies claim to be robust against content re-acquisition with a camera recorder, e.g., watermarking and content fingerprinting. However, the benchmarking campaigns required to evaluate the impact of the camcorder path are tedious and such evaluation is routinely overlooked in practice. Due to the interaction between numerous devices, camcording displayed content modifies the video essence in various ways, including geometric distortions, temporal transforms, non-uniform and varying luminance transformations, saturation, color alteration, etc. It is necessary to clearly understand the different phenomena at stake in order to design efficient countermeasures or to build accurate simulators which mimic these effects. As a first step in this direction, this study focuses solely on luminance transforms. In particular, we investigate three different alterations, namely: (i) the spatial non-uniformity, (ii) the steady-state luminance response, and (iii) the transient luminance response.
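To make the first two alterations concrete, here is a minimal Python sketch: spatial non-uniformity modeled with the classical cos^4 falloff law, and the steady-state response modeled as composed display and camera gamma curves. All parameter values are illustrative assumptions, not measurements from the study.

```python
import math

def steady_state_response(v, gamma_display=2.2, gamma_camera=0.45):
    """End-to-end steady-state luminance response modeled as the
    composition of a display gamma and a camera transfer curve.
    The gamma values are illustrative assumptions, not measured data."""
    luminance = v ** gamma_display      # display: code value -> emitted light
    return luminance ** gamma_camera    # camera: captured light -> code value

def spatial_falloff(r, f):
    """Spatial non-uniformity sketched with the classical cos^4 law:
    relative illumination at radial distance r for focal length f."""
    return math.cos(math.atan(r / f)) ** 4

print(steady_state_response(0.5))   # mid-gray survives almost unchanged
print(spatial_falloff(0.0, 50.0))   # 1.0 at the optical center
```

When the display and camera gammas nearly cancel (2.2 × 0.45 ≈ 1), mid-tones pass through almost unchanged; mismatched gammas instead compress shadows or highlights, which is one of the steady-state effects the study characterizes.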
Digital camcording in the premises of cinema theaters is the main source of pirate copies of newly released
movies. To trace such recordings, watermarking systems are exploited in order for each projection to be unique
and thus identifiable. The forensic analysis to recover these marks is different for digital and legacy cinemas. To
avoid running both detectors, a reliable oracle discriminating between cams originating from analog or digital
projections is required. This article details a classification framework relying on three complementary features:
the spatial uniformity of the screen illumination, the vertical (in)stability of the projected image, and the luminance
artifacts due to the interplay between the display and acquisition devices. The system has been tuned
with cams captured in a controlled environment and benchmarked against a medium-sized dataset (61 samples)
composed of real-life pirate cams. Reported experimental results demonstrate that such a framework yields over
80% classification accuracy.
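As a toy illustration of how such a three-feature oracle could combine its cues, the sketch below uses hypothetical thresholds and a simple majority vote; the paper's actual classifier is tuned on controlled captures and is not specified here.

```python
def classify_projection(features, thresholds):
    """Toy majority vote over the three features described above
    (spatial uniformity, vertical instability, luminance artifacts).
    Thresholds and the voting rule are illustrative assumptions,
    standing in for the tuned classifier of the article."""
    votes = sum(f > t for f, t in zip(features, thresholds))
    return "analog" if votes >= 2 else "digital"

# Two of three cues exceed their (hypothetical) thresholds -> "analog".
print(classify_projection((0.8, 0.6, 0.1), (0.5, 0.5, 0.5)))
```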
Fighting movie piracy often requires automatic content identification. The most common technique to achieve this uses
watermarking, but not all copyrighted content is watermarked. Video fingerprinting is an efficient alternative solution to
identify content, to manage multimedia files in UGC sites or P2P networks and to register pirated copies with master
content. When registering by matching copy fingerprints with master ones, a model of distortion can be estimated. In
case of in-theater piracy, the model of geometric distortion allows the estimation of the capture location. A further step is to determine, from passive image analysis alone, whether different pirated versions were captured with the same camcorder. In this paper we present three such fingerprinting-based forensic applications: UGC filtering, estimation of
capture location and source identification.
Pirate copies of feature films are proliferating on the Internet. DVD rip or screener recording methods involve the
duplication of officially distributed media whereas 'cam' versions are illicitly captured with handheld camcorders in
movie theaters. Several, complementary, multimedia forensic techniques such as copy identification, forensic tracking
marks or sensor forensics can deter those clandestine recordings. In the case of camcorder capture in a theater, the image
is often geometrically distorted, the main artifact being the trapezoidal effect, also known as 'keystoning', due to a
capture viewing axis not being perpendicular to the screen. In this paper we propose to analyze the geometric distortions
in a pirate copy to determine the camcorder viewing angle to the screen perpendicular and derive the approximate
position of the pirate in the theater. The problem is first of all geometrically defined, by describing the general projection
and capture setup, and by identifying unknown parameters and estimates. The estimation approach based on the
identification of an eight-parameter homographic model of the 'keystoning' effect is then presented. A validation
experiment based on ground truth collected in a real movie theater is reported, and the accuracy of the proposed method is assessed.
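For illustration, an 8-parameter homographic model of the kind described above maps image points as follows; this is a generic sketch of such a model (with the ninth parameter normalized to 1), not the paper's estimation code.

```python
def apply_homography(h, x, y):
    """Map a point through an 8-parameter homographic ('keystoning')
    model h = (h11, h12, h13, h21, h22, h23, h31, h32), with h33 = 1.
    Non-zero h31/h32 produce the trapezoidal perspective distortion."""
    h11, h12, h13, h21, h22, h23, h31, h32 = h
    w = h31 * x + h32 * y + 1.0           # projective scale factor
    return ((h11 * x + h12 * y + h13) / w,
            (h21 * x + h22 * y + h23) / w)

# The identity homography leaves points unchanged.
identity = (1, 0, 0, 0, 1, 0, 0, 0)
print(apply_homography(identity, 10.0, 5.0))  # (10.0, 5.0)
```

Estimating the eight parameters from point correspondences, and relating them to the camcorder viewing angle, is the substance of the paper; the mapping above only shows the model's form.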
The proliferation of pirate copies of feature films on peer-to-peer networks arouses great interest in countermeasures
such as the insertion of (invisible) forensic marks in projected movies, to deter their illegal capture. The registration of
pirate copies with the original content is however a prerequisite to the recovery of such embedded messages, as severe
geometric distortions often occur in illegally camcorded contents. After a brief state-of-the-art in image registration, the
paper details an algorithm for video registration, focusing on the compensation of geometric distortions. Control points
are automatically extracted in original and copy pictures, followed by pre- and post-matching filtering steps to discard
irrelevant control points and erroneously matched pairs of control points, respectively. This enables the accurate
numerical estimation of an 8-parameter homographic distortion model, used to register the copy frames with the original
reference grid. Such an image registration algorithm is inserted into a general video registration scheme. Results are
presented on both natural and synthetic test material.
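One common post-matching filter of the kind mentioned above keeps only pairs of control points that are mutual nearest neighbors. The sketch below uses plain tuples compared with squared Euclidean distance as stand-in descriptors; the paper's actual control-point extraction and filtering are not specified here.

```python
def mutual_matches(desc_a, desc_b):
    """Keep only control-point pairs that are mutual nearest neighbors,
    a classical filter to discard erroneously matched pairs."""
    def nearest(q, cands):
        return min(range(len(cands)),
                   key=lambda j: sum((u - v) ** 2 for u, v in zip(q, cands[j])))
    pairs = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        if nearest(desc_b[j], desc_a) == i:   # symmetric check
            pairs.append((i, j))
    return pairs

# Two points per image, listed in different orders, still pair up correctly.
print(mutual_matches([(0.0, 0.0), (5.0, 5.0)], [(5.0, 5.1), (0.2, 0.0)]))
```

The surviving pairs can then feed the numerical estimation of the 8-parameter homographic model described in the abstract.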
This paper presents an object detection framework applied to cinematographic post-processing of video sequences. Post-processing is done after production and before editing. At the beginning of each shot of a video, a slate (also called clapperboard) is shown. The slate notably contains an electronic audio timecode that is necessary
for audio-visual synchronization. To detect slates for automatic indexing and post-processing, the framework proceeds in five steps. The first two steps drastically reduce the video data to be analyzed: they ensure a high recall rate but have low precision. The first step detects images at the beginning of a shot that may show a slate, while the second step searches these images for candidate regions whose color distribution is similar to slates. The objective is not to miss any slate while eliminating long portions of video without slate appearances. The third and fourth steps apply statistical classification and pattern matching to detect and precisely locate slates in candidate regions. These steps ensure both high recall and high precision. The objective is to detect slates with very few false alarms, to minimize interactive corrections. In a last step, electronic
timecodes are read from slates to automate audio-visual synchronization. The presented slate detector has a recall rate of 89% and a precision of 97.5%. By temporal integration, well over 89% of shots in dailies are detected. By timecode coherence analysis, the precision can be raised further. Issues for future work are to accelerate the system beyond real-time and to extend the framework to several slate types.
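The reported rates follow the usual detection-metric definitions, sketched below; the counts are illustrative (chosen to reproduce the 89% recall), not the paper's actual tallies.

```python
def precision_recall(tp, fp, fn):
    """Recall = detected slates / all slates present;
    precision = correct detections / all detections raised."""
    return tp / (tp + fn), tp / (tp + fp)

# Illustrative counts: 89 of 100 slates found, with 2 false alarms.
recall, precision = precision_recall(tp=89, fp=2, fn=11)
print(recall, round(precision, 3))
```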
The recent proliferation of digital images captured by digital cameras and, as a result, the users’ needs for automatic
annotation tools to index huge multimedia databases have aroused renewed interest in face detection and recognition technologies. After a brief state-of-the-art, the paper details a model-based face detection algorithm for color images, based on skin color and face shape properties. We compare a stand-alone model-based approach with a hybrid approach in which this algorithm is used as a pre-processor providing candidate faces to a supervised SVM classifier.
Experimental results are presented and discussed on two databases of 250 and 689 pictures respectively. Application to a system to automatically annotate the photos of a personal collection is eventually discussed from the human factors point of view.
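A skin-color test of the kind underlying such model-based detection can be sketched per pixel; the chrominance box below uses the classical Chai and Ngan CbCr ranges with BT.601 conversion coefficients, which may differ from the skin model actually used in the paper.

```python
def is_skin(r, g, b):
    """Classify an 8-bit RGB pixel as skin-colored using a classical
    CbCr chrominance box (Chai & Ngan ranges, BT.601 conversion).
    This is an illustrative rule, not the paper's exact skin model."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173

print(is_skin(200, 120, 90))   # a typical skin tone -> True
print(is_skin(20, 200, 30))    # saturated green -> False
```

Connected regions of such pixels, filtered by face-shape properties, yield the candidate faces that the hybrid approach passes on to the SVM classifier.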
In this paper we propose a photo browsing system that uses image classification results in an error tolerant manner. Images are hierarchically classified into indoor/outdoor and further into city/landscape. We employ simple classifiers based on global color histogram, wavelet subband energies and contour directions having medium recall rates around 85%. This paper delivers two contributions to cope with classification errors in the context of image browsing. The first contribution is a method to associate confidence measures to classification results. A second contribution is a browsing tool that does not reveal classification results to the user. Instead, browsing options are generated. These browsing options are thumbnails representing semantic topics such as indoor and outdoor. User studies showed that thumbnails and semantic topics are highly demanded features for a photo-browsing tool. The thumbnails are representative images from the database with high confidence values. The thumbnails are chosen context-based such that they have class labels in common with currently displayed images or usage history.
Color features are reviewed and their effectiveness assessed in the application framework of key-frame clustering for abstracting unconstrained video. Existing color spaces and associated quantization schemes are first studied. Description of global color distribution by means of histograms is then detailed. In our work, twelve combinations of color space and quantization were selected, together with twelve histogram metrics. Their respective effectiveness with respect to picture similarity measurement was evaluated through a query-by-example scenario. For that purpose, a set of still-picture databases was built by extracting key-frames from several video clips, including news, documentaries, sports and cartoons. Classical retrieval performance evaluation criteria were adapted to the specificity of our testing methodology.
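One of the classical histogram metrics typically included in such comparisons is normalized histogram intersection (Swain and Ballard); the sketch below assumes already-binned histograms, with an arbitrary bin layout, and is not tied to any particular combination studied in the paper.

```python
def histogram_intersection(h_query, h_candidate):
    """Normalized histogram intersection: the fraction of the candidate
    histogram's mass also present in the query histogram (1.0 = identical
    distributions, 0.0 = disjoint)."""
    overlap = sum(min(a, b) for a, b in zip(h_query, h_candidate))
    return overlap / sum(h_candidate)

# Three-bin toy histograms of pixel counts.
print(histogram_intersection([10, 30, 60], [20, 30, 50]))  # 0.9
```

In a query-by-example scenario, the database key-frames are simply ranked by this score against the query picture's histogram.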
For the past decade, the region-based approach, that combines object segmentation and optical flow estimation, has emerged as the only one likely to provide automatically, at a reasonable computational cost, higher-quality descriptions of 2D apparent motion in video sequences, as compared to conventional pixel-based motion estimation. Within this framework, a hybrid algorithm, embedding classical dense motion-field estimation and color-based spatial segmentation, is presented. For each arbitrarily shaped, color-homogeneous region, a polynomial motion-parameter set is robustly estimated from pixel displacement vectors. Following a graph-based approach and starting from the initial color partition, neighboring regions are iteratively merged according to their mutual motion similarity. The obtained motion-homogeneous regions are eventually temporally tracked along the sequence. The region-based motion estimation algorithm is described in detail and its computational complexity is loosely evaluated through processing time statistics on a workstation. The partition maps and modeled motion fields obtained on three well-known test sequences--`Table Tennis', `Mobile and calendar' and `Flower Garden'--are displayed. Alternative approaches in the literature are then assessed, their results being compared with the above ones. Application of such an automatic `mid-level' image analysis tool to object-based representation, manipulation and coding as well as indexing of video is outlined at last.
A 'region-based' approach to the problem of motion estimation and segmentation in video sequences is presented. The devised algorithm requires an initial still-picture partition and a dense optical flow: affine region motion parameters are robustly estimated from pixel motion vectors on color-homogeneous regions, which are further merged on a motion-homogeneity criterion, and temporally tracked. Computer simulation results and comparisons with other approaches are given. Applications to object-based representation, manipulation and coding as well as indexing of video are discussed.
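The per-region affine motion model, and one simple way to compare two regions' models for merging, can be sketched as follows; the disagreement measure below is an illustrative stand-in for the motion-homogeneity criterion, which the abstract does not spell out.

```python
def affine_motion(params, x, y):
    """Displacement (u, v) predicted at pixel (x, y) by a 6-parameter
    affine motion model (a1..a6), as used per region in this approach."""
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y)

def motion_disagreement(params_a, params_b, pixels):
    """Mean squared disagreement between two regions' affine models over
    a pixel set -- a simple stand-in for the merging criterion: regions
    whose models predict nearly the same field are merge candidates."""
    err = 0.0
    for x, y in pixels:
        ua, va = affine_motion(params_a, x, y)
        ub, vb = affine_motion(params_b, x, y)
        err += (ua - ub) ** 2 + (va - vb) ** 2
    return err / len(pixels)

# Identical models disagree nowhere, so the regions would merge.
print(motion_disagreement((1, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0), [(0, 0), (1, 2)]))
```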
There is no doubt that in the near future a large number of image processing techniques will be based on motion compensation, thus making the cascading of several 'motion compensated' devices in the same image chain very common. A reference scheme for the optimum use of motion compensation in future image communication networks is presented. Motion estimation is performed once only, at a very early stage of the process chain; motion information is then encoded, transmitted in a separate data channel and distributed to the cascaded motion compensated processes. The distribution scenario must take into consideration the various transformations performed on the image signal since its origination, so that the motion information distributed is always consistent with the pictures to process. The problems of representing motion relative to a given source image signal, and of adjusting it to new frame rate environments, are especially addressed.
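The frame-rate adjustment mentioned at the end can be sketched under a linear-motion assumption: a vector estimated over one temporal distance is rescaled to the temporal distance of the new environment. This is an illustrative simplification, not the scheme's full adjustment procedure.

```python
def rescale_motion(vector, dt_source_ms, dt_target_ms):
    """Adjust a motion vector, estimated between frames dt_source_ms
    apart, to a new temporal distance dt_target_ms, assuming the motion
    is linear over that interval."""
    s = dt_target_ms / dt_source_ms
    return (vector[0] * s, vector[1] * s)

# A vector measured at 25 Hz (40 ms apart) halves when the downstream
# process works at 50 Hz (20 ms apart).
print(rescale_motion((8.0, -4.0), 40.0, 20.0))  # (4.0, -2.0)
```

This is the kind of consistency adjustment the distribution scenario must apply so that downstream motion compensated processes always receive motion matched to the pictures they actually handle.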