In video surveillance, the estimation of semantic traits such as gender and age has always been a debated topic because of the uncontrolled environment: while light and pose variations have been largely studied, defocused images are still rarely investigated. Recently, the emergence of new technologies such as plenoptic cameras has made it possible to address these problems by analyzing multi-focus images. Thanks to a microlens array arranged between the sensor and the main lens, light field cameras are able to record not only the RGB values but also information related to the direction of light rays: the additional data make it possible to render the image with different focal planes after acquisition. For our experiments, we use the GUC Light Field Face Database, which includes pictures from the First Generation Lytro camera. Taking advantage of light field images, we explore the influence of defocusing on gender recognition and age estimation. Evaluations are computed with up-to-date and competitive technologies based on deep learning algorithms. After studying the relationship between focus and gender recognition and between focus and age estimation, we compare the results obtained on images defocused by the Lytro software with those obtained on images blurred by more standard filters, in order to explore the difference between defocusing and blurring effects. In addition, we investigate the impact of deblurring on defocused images, with the goal of better understanding the different impacts of defocusing and standard blurring on gender and age estimation.
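The distinction between "standard blurring" and optical defocus drawn above can be illustrated with the two kernels most often used to model them: a Gaussian kernel for generic blur and a disk (pillbox) kernel as a simple defocus model. This is a minimal sketch under that assumption, not the paper's pipeline; the function names are illustrative.

```python
import numpy as np

def conv2(image, kernel):
    """Naive 2-D convolution with reflective padding (illustration only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros_like(image, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def gaussian_kernel(sigma, radius=None):
    """Normalized 2-D Gaussian kernel, a common model of generic blur."""
    radius = radius or int(3 * sigma)
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-ax**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def disk_kernel(radius):
    """Normalized pillbox kernel, a simple model of out-of-focus blur."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = ((x**2 + y**2) <= radius**2).astype(float)
    return k / k.sum()
```

Both kernels average neighboring pixels, but the pillbox weights them uniformly inside a disk, which is why defocus and Gaussian blur can affect a classifier differently at comparable strengths.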
Deep learning-based algorithms have become increasingly efficient in recognition and detection tasks, especially when they are trained on large-scale datasets. This recent success has led to speculation that deep learning methods are comparable to, or even outperform, the human visual system in its ability to detect and recognize objects and their features. In this paper, we focus on the specific task of gender recognition in images that have been processed by privacy protection filters (e.g., blurring, masking, and pixelization) applied at different strengths. Assuming a privacy protection scenario, we compare the performance of state-of-the-art deep learning algorithms with a subjective evaluation obtained via crowdsourcing to understand how privacy protection filters affect both machine and human vision.
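The three privacy filters named above (blurring, masking, pixelization) can be sketched in a few lines on a grayscale face region; the parameter names and strength conventions here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pixelize(image, block):
    """Pixelization: replace each block x block tile by its mean value."""
    h, w = image.shape
    out = image.astype(float).copy()
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i + block, j:j + block] = image[i:i + block, j:j + block].mean()
    return out

def mask(image, value=0.0):
    """Masking: replace the whole region with a constant value."""
    return np.full_like(image, value, dtype=float)

def blur(image, sigma):
    """Separable Gaussian blur via 1-D convolutions along rows then columns."""
    radius = max(1, int(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    tmp = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, tmp)
```

Filter strength maps naturally to `block` for pixelization and `sigma` for blurring, while masking is an all-or-nothing filter.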
The extensive adoption of video surveillance, affecting many aspects of our daily lives, alarms the public about the increasing invasion of personal privacy. To address these concerns, many tools have been proposed for the protection of personal privacy in images and video. However, little is understood about the effectiveness of such tools and especially their impact on the underlying surveillance tasks, leading to a tradeoff between the preservation of privacy offered by these tools and the intelligibility of activities under video surveillance. In this paper, we investigate this privacy-intelligibility tradeoff by proposing an objective framework for the evaluation of privacy filters. We apply the proposed framework to a use case where the privacy of people is protected by obscuring faces, assuming an automated video surveillance system. We use several popular privacy protection filters, such as blurring, pixelization, and masking, and apply them with varying strengths to people's faces from different public datasets of video surveillance footage. The accuracy of a face detection algorithm is used as a measure of intelligibility (a face should be detected to perform a surveillance task), and the accuracy of a face recognition algorithm as a measure of privacy (a specific person should not be identified). Under these conditions, after application of an ideal privacy protection tool, an obfuscated face would still be visible as a face but would not be correctly identified by the recognition algorithm. The experiments demonstrate that, in general, an increase in the strength of the privacy filters under consideration leads to an increase in privacy (i.e., a reduction in recognition accuracy) and to a decrease in intelligibility (i.e., a reduction in detection accuracy). Masking also proves to be the most favorable filter across all tested datasets.
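The evaluation logic just described reduces to two simple scores: intelligibility is detection accuracy, and privacy is the complement of recognition accuracy. A minimal sketch, with illustrative function names:

```python
def detection_accuracy(detected_flags):
    """Fraction of filtered faces still detected (intelligibility)."""
    return sum(detected_flags) / len(detected_flags)

def privacy_score(recognized_flags):
    """1 - recognition accuracy: higher means better identity protection."""
    return 1.0 - sum(recognized_flags) / len(recognized_flags)
```

Under this framework, an ideal privacy filter keeps `detection_accuracy` high while driving recognition accuracy, and hence `1 - privacy_score`, down to chance level.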
In this paper, we present a new approach for dense stereo matching which is mainly oriented towards the recovery of the depth map of an observed scene. The extraction of depth information from the disparity map is well understood, while the correspondence problem is still subject to errors. In our approach, we propose optimizing a correlation-based technique by detecting and rejecting mismatched points that occur in commonly challenging image regions such as textureless areas, occluded portions, and discontinuities. The missing values are completed by incorporating edge detection to avoid a window containing more than one object. This yields an efficient method for selecting a variable window size with adaptive shape, in order to obtain accurate results at depth discontinuities and in homogeneous areas while keeping the complexity of the whole system low. Experimental results on the Middlebury datasets demonstrate the validity of the presented approach. The main application domain for this study is the design of new functionalities in the context of mobile devices.
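The "well understood" step mentioned above, recovering depth from disparity, follows from triangulation on a rectified stereo pair: Z = f·B/d, with focal length f in pixels, baseline B, and disparity d in pixels. A minimal sketch (the zero-disparity handling is an illustrative choice):

```python
def depth_from_disparity(disparity, f, baseline):
    """Triangulated depth Z = f * B / d for a rectified stereo pair.

    disparity: pixel offset between matched points (d)
    f:         focal length in pixels
    baseline:  distance between the two camera centers
    Returns None where disparity is non-positive (no valid match).
    """
    if disparity <= 0:
        return None
    return f * baseline / disparity
```

This inverse relation is also why the mismatches targeted by the paper matter so much: a small disparity error translates into a large depth error for distant (small-disparity) points.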
People tracking faces many issues in video surveillance scenarios. One of the most challenging aspects is re-identifying people across different cameras. Humans, indeed, change appearance according to pose, clothes, and illumination conditions, and thus defining features that can robustly describe people moving in a camera network is not a trivial task. While color is widely exploited in the distinction and recognition of objects, most of the color descriptors proposed so far are not robust in complex applications such as video surveillance. A new color-based feature is introduced in this paper to describe the color appearance of the subjects.
For each target a probabilistic color histogram (PCH) is built by using a fuzzy K-Nearest Neighbors (KNN)
classifier trained on an ad-hoc dataset and is used to match two corresponding appearances of the same person
in different cameras of the network. The experimental results show that the defined descriptor is effective at
discriminating and re-identifying people across two different video cameras regardless of the viewpoint change
between the two views, and outperforms state-of-the-art appearance-based techniques.
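The core idea, softly voting each pixel over a set of reference colors and matching the resulting histograms, can be sketched as follows. The color prototypes, the inverse-distance fuzzy weighting, and the Bhattacharyya matching score are illustrative assumptions standing in for the paper's trained fuzzy KNN classifier.

```python
import numpy as np

# Hypothetical RGB prototypes for a few color names (black, white,
# red, green, blue, yellow); the paper trains a fuzzy KNN on an
# ad-hoc dataset instead.
PROTOTYPES = np.array([
    [0, 0, 0], [255, 255, 255], [255, 0, 0],
    [0, 255, 0], [0, 0, 255], [255, 255, 0],
], dtype=float)

def pch(pixels, eps=1e-6):
    """Probabilistic color histogram: soft-vote each pixel over prototypes."""
    pixels = np.asarray(pixels, dtype=float)
    d = np.linalg.norm(pixels[:, None, :] - PROTOTYPES[None, :, :], axis=2)
    w = 1.0 / (d + eps)                  # fuzzy membership ~ inverse distance
    w /= w.sum(axis=1, keepdims=True)    # each pixel's memberships sum to 1
    h = w.sum(axis=0)
    return h / h.sum()

def bhattacharyya(h1, h2):
    """Histogram similarity in [0, 1], used to match two appearances."""
    return float(np.sum(np.sqrt(h1 * h2)))
```

Two appearances of the same person seen by different cameras would then be matched by computing `bhattacharyya(pch(a), pch(b))` and keeping the highest-scoring candidate.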
Facial feature points are among the most important cues for many computer vision applications such as face normalization, registration, and model-based human face coding. Hence, automating the extraction of these points would have a wide range of uses. In this paper, we aim to automatically detect a subset of the Facial Definition Parameters (FDPs) defined in MPEG-4 by utilizing both 2D and 3D face data. The main assumption in this work is that the 2D images and the corresponding 3D scans are taken of frontal faces with neutral expressions. This limitation is realistic with respect to our scenario, in which enrollment is done in a controlled environment and the detected FDP points are to be used for the warping and animation of the enrolled faces, which justifies the choice of the MPEG-4 FDPs. For the extraction of the points, 2D data, 3D data, or both are used according to the distinctive information they carry in each particular facial region. As a result, a total of 29 interest points are detected. The method is tested on the neutral set of the Bosphorus database, which includes 105 subjects with registered 3D scans and color images.
In the face recognition problem, one of the most critical sources of variation is facial expression. This paper presents a
system to overcome this issue by utilizing facial expression simulations on realistic and animatable face models that are
in compliance with MPEG-4 specifications.
In our system, 3D frontal face scans of the users, in neutral expression and with closed mouth, are first taken for one-time enrollment. These rigid face models are then converted into animatable models by warping a generic animatable model using the Thin Plate Spline method. The warping is based on the facial feature points, and both 2D color and 3D shape data are exploited to automate their extraction. The obtained user models can be animated by a facial animation engine. This new capability allows us to bring the whole database into the same "expression state" detected in a test image for better recognition results, since the disadvantage of expression variations is eliminated.
In this paper, we describe some attack strategies we have applied to three different grayscale watermarked images in the particular context of the BOWS (Break Our Watermarking System) contest; we also propose a possible use of BOWS as a teaching tool for master's students.
We describe a novel framework for watermarking 3-D objects using their texture or silhouette information. Unlike most conventional 3-D object-watermarking techniques, for which both insertion and extraction of the mark are performed on the object itself (3-D/3-D approach), we propose an asymmetric 3-D/2-D procedure. It consists of watermarking 3-D objects and retrieving the mark from rendered 2-D images or videos that use the 3-D synthetic object, thus protecting the visual representations of the object. Two 3-D object watermarking schemes are presented: a texture-based approach for 3-D photorealistic objects and a silhouette-based approach for 3-D CAD (computer-aided design) objects.
In this paper, we describe a novel framework for watermarking 3-D objects via contour information. Unlike classical watermarking technologies for 3-D objects, which operate on the object itself to insert and extract the mark (3-D/3-D approach), the goal of our work is to retrieve information originally hidden in the apparent contour of the object from the resulting 2-D images or videos that use the 3-D synthetic object (3-D/2-D approach). We also propose an extension of a 2-D polygonal line watermarking algorithm to 3-D silhouettes.
In this article, we evaluate the effectiveness of a pre-classification scheme for the fast retrieval of faces in a large image database. The studied approach is based on a partitioning of the face space through a clustering of face images. Two main issues are discussed: how to perform clustering with a non-trivial probabilistic measure of similarity between faces, and how to assign face images to all clusters probabilistically to form a robust characterization vector. It is shown experimentally on the FERET face database that, with this simple approach, the cost of a search can be reduced by a factor of 6 or 7 with no significant degradation of performance.
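The probabilistic assignment idea above can be sketched as follows: each face descriptor gets a soft membership over all clusters, and a query is only compared against the few clusters that absorb most of its membership mass. The softmax-style weighting and function names are illustrative assumptions, not the article's probabilistic similarity measure.

```python
import numpy as np

def membership_vector(face, centroids, temperature=1.0):
    """Soft assignment of a face descriptor to every cluster (sums to 1)."""
    d = np.linalg.norm(centroids - face, axis=1)
    w = np.exp(-d / temperature)
    return w / w.sum()

def candidate_clusters(face, centroids, mass=0.9):
    """Smallest set of clusters covering `mass` of the membership,
    i.e. the only clusters that need to be searched for this query."""
    m = membership_vector(face, centroids)
    order = np.argsort(m)[::-1]
    kept, total = [], 0.0
    for idx in order:
        kept.append(int(idx))
        total += m[idx]
        if total >= mass:
            break
    return kept
```

The reported speed-up comes from the fact that, for most queries, `candidate_clusters` returns only a small fraction of the partitions.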
Digital watermarking was first introduced as a possible way to ensure intellectual property (IP) protection. However, fifteen years after its inception, it is still considered a young technology, and digital watermarking is far from being integrated into Digital Rights Management (DRM) frameworks. A possible explanation is that the research community has so far mainly focused on the robustness of the embedded watermark and has almost ignored security aspects. For IP protection applications such as fingerprinting and copyright protection, the watermark should provide a means to ensure some kind of trust in a non-secure environment. To this end, security against attacks from malicious users has to be considered. This paper will focus on collusion attacks to evaluate security in the context of video watermarking. In particular, security pitfalls will be exhibited when frame-by-frame embedding strategies are enforced for video watermarking. Two alternative attack strategies will be surveyed: eavesdropping on the watermarking channel to identify some redundant hidden structure, or jamming the watermarking channel to wash out the embedded watermark signal. Finally, the need for a new brand of watermarking schemes will be highlighted if the watermark is to be released in a hostile environment, which is typically the case for IP protection applications.
Digital watermarking was introduced during the last decade as a complementary technology to protect digital multimedia data. Watermarking digital video material has already been studied, but it is still usually regarded as watermarking a sequence of still images. However, it is well known that such straightforward frame-by-frame approaches result in low performance in terms of security. In particular, basic intra-video collusion attacks can easily defeat basic embedding strategies. In this paper, an extension of the simple temporal frame averaging attack will be presented, which relies on frame registration to enlarge the temporal averaging window. With this attack in mind, video processing, especially video mosaicing, will be considered to produce a temporally coherent watermark. In other words, an embedding strategy will be proposed which ensures that all the projections of a given 3D point of a movie set carry the same watermark sample along a video scene. Finally, there will be a discussion regarding the impact of this novel embedding strategy on the relevant parameters in digital watermarking, e.g., capacity, visibility, robustness, and security.
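The frame averaging attack at the heart of the two abstracts above is simple to state: averaging T registered frames of a static scene preserves the content but attenuates any independent, zero-mean per-frame watermark by roughly a factor of sqrt(T), while a watermark that is identical in every registered frame (the temporally coherent strategy proposed above) survives the average unchanged. A minimal sketch, assuming the frames are already registered:

```python
import numpy as np

def frame_average_attack(frames):
    """Temporal averaging collusion attack.

    frames: array-like of shape (T, H, W), assumed already registered
    (e.g., via the mosaicing step described above). Returns the mean
    frame, in which independent per-frame watermarks are attenuated.
    """
    return np.mean(np.asarray(frames, dtype=float), axis=0)
```

Enlarging the registration window (more frames of the same scene content) strengthens the attack, which is exactly why embedding the same watermark sample at every projection of a 3D scene point defeats it.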
In this paper, we describe a novel framework for watermarking 3D video objects via their texture information. Unlike classical existing algorithms that deal with 3D objects and operate on meshes in order to protect the object itself, the main goal of our work is to retrieve information originally hidden in the texture image of the object from resulting images or videos that use the 3D synthetic object. After developing the theory and practical details of our 3D object watermarking scheme, we present preliminary results of several experiments carried out in various conditions, ranging from ideal conditions (e.g., a priori known rendering parameters) to more realistic conditions (e.g., a rendering projection estimated from the 2D view) or in the context of possible attacks (e.g., mesh reduction).
In this paper, we investigate the improvement brought by turbo coding to the robustness of still image watermarking. We use an error-correcting scheme based on the concatenation of a BCH product code and a repetition code. The product code is iteratively (turbo) decoded using the Chase-Pyndiah algorithm. For this study, we set the watermarking distortion to around 38 dB and consider different payloads that can correspond to different applications and services. We compare different coding strategies (i.e., a repetition code only, and a concatenation of product and repetition codes) in terms of robustness to different photometric attacks, in particular additive noise and lossy compression. Typically, for a payload of 121 bits, the robustness gain for a given message error probability is significant: the first errors appear at a JPEG quality factor of about 25% with the new coding scheme, instead of about 50% when using only repetition codes.
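The repetition-code baseline compared above is the simplest channel code used in watermarking: each payload bit is repeated r times before embedding and recovered by majority vote after the attack channel. This sketch covers only that baseline; the BCH product code and its Chase-Pyndiah turbo decoding are beyond the illustration.

```python
def rep_encode(bits, r):
    """Repeat each payload bit r times before embedding."""
    return [b for b in bits for _ in range(r)]

def rep_decode(coded, r):
    """Majority vote over each group of r received (hard) bits."""
    return [1 if 2 * sum(coded[i:i + r]) > r else 0
            for i in range(0, len(coded), r)]
```

A repetition code of odd length r corrects up to (r - 1) // 2 bit errors per payload bit; the concatenated product code buys extra robustness at the same payload, which is the gain the experiments quantify.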
After a brief reminder of the real difficulties that digital watermarking software still has to tackle, especially random geometric attacks such as StirMark, we present an early overview of ongoing solutions to make the survival of the watermark possible.
This article deals with 3D scene analysis for coding purposes and is part of general research on 3D television. The method proposed here attempts, through dynamic monocular analysis, to estimate three-dimensional motion and to determine the structure of the observed objects. Motion and structure estimation is achieved by means of a differential method. Images are segmented according to spatio-temporal criteria, using a hierarchical method of quad-tree type with overlapping. Segmentation and estimation are performed jointly.
The article deals with 3D scene analysis for coding purposes and is part of general research into 3D television. The method proposed here attempts, through dynamic monocular analysis, to estimate three-dimensional motion and determine the structure of the observed objects. Motion and structure estimation are achieved by means of a differential method. A multipredictor scheme is used to guarantee correct initialisation of the algorithm. Images are segmented according to spatio-temporal criteria, using a hierarchical method based on a quad-tree with overlapping. Segmentation and estimation are performed jointly.