In free-viewpoint rendering systems, one of the most challenging goals is the creation of virtual views based on available color texture (RGB) and depth data. Conventional depth-image-based rendering (DIBR) approaches have assumed that the virtual camera can only be displaced horizontally, leading to fairly simple disocclusion artifacts. However, in free-viewpoint DIBR, the virtual camera can be positioned arbitrarily, and the respective disocclusion artifacts can exhibit complicated anisotropic appearances. Consequently, conventional approaches for compensating disocclusion holes usually fail under such arbitrary camera motion. We present a disocclusion compensation technique based on texture inpainting. We propose a layered representation of both the color and depth images into local foreground, background, and undefined segments (a trimap). This representation allows for employing an efficient alpha-matting approach for reconstructing the underlying opacity layer, followed by background compensation and layered rendering. The performance of the proposed method is evaluated against the state-of-the-art through objective and subjective tests. The achieved results, especially for large camera displacements, outperform the state-of-the-art. These results confirm the effectiveness of the proposed method and highlight the need for new quality metrics able to address the impairments of this type of content.
The robust design and adaptation of multimedia networks rely on studying the influence of potential network impairments on the perceived quality. Video quality may be affected by network impairments, such as delay, jitter, packet loss, and bandwidth limitations, and the perceptual impact of these impairments may vary according to the video content. The effects of packet loss and encoding artifacts on perceived quality have been widely addressed in the literature. However, the relationship between video content and network impairments on perceived video quality has not been deeply investigated. A detailed analysis of the ReTRiEVED test video dataset, designed by considering a set of potential network impairments, is presented, and the effects of transmission impairments on perceived quality are analyzed. Furthermore, the impact of the video content on perceived quality in the presence of transmission impairments is studied by using video content descriptors. Finally, the performance of well-known quality metrics is tested on the proposed dataset.
The use of 3D video is growing in several fields, such as entertainment, military simulations, and medical applications. However, the process of recording, transmitting, and processing 3D video is prone to errors, thus producing artifacts that may affect the perceived quality. Nowadays, a challenging task is the definition of a new metric able to predict the perceived quality with low computational complexity, so that it can be used in real-time applications. Research in this field is very active due to the complexity of analyzing the influence of stereoscopic cues. In this paper we present a novel stereoscopic metric based on the combination of relevant features, able to predict the subjective quality rating more accurately.
Face processing techniques for automatic recognition in broadcast video attract research interest because of their value in applications such as video indexing, retrieval, and summarization. In multimedia press review, the automatic annotation of broadcast news programs is a challenging task because people can appear with large appearance variations, such as hair styles, illumination conditions, and poses, that make the comparison between similar faces more difficult. In this paper, a technique for automatic face identification in TV broadcasting programs based on a gallery of faces downloaded from the Web is proposed. The approach is based on the joint use of the Scale Invariant Feature Transform descriptor and Eigenfaces-based algorithms, and it has been tested on video sequences using a database of images acquired starting from a web search. Experimental results show that the joint use of these two approaches improves the recognition rate for both Standard Definition (SD) and High Definition (HD) content.
In this article, the effects of video content on Quality of Experience (QoE) are presented. Delivering video
content with a high level of QoE over bandwidth-limited and error-prone networks is of crucial importance
for service providers. Therefore, it is of fundamental importance to analyse the impact of network
impairments and video content on perceived quality during QoE metric design. The major contributions of
the article lie in the study of: i) the impact of network impairments together with video content, ii) the impact
of the video content, and iii) the impact of video-content-related parameters (spatial-temporal perceptual
information, video content, and frame size) on QoE. The results show that when the impact of impairments
on perceived quality is low, the quality is significantly influenced by video content, and video content itself also
has a significant impact on QoE. Finally, the results strengthen the need for new parameter characterization for
better QoE metric design.
The automatic labeling of faces in TV broadcasting is still a challenging problem. The high variability in view points, facial expressions,
general appearance, and lighting conditions, as well as occlusions, rapid shot changes, and camera motions, produce
significant variations in image appearance. The application of automatic tools for face recognition is not yet fully established
and the human intervention is needed. In this paper, we deal with the automatic face recognition in TV broadcasting programs.
The target of the proposed method is to identify the presence of a specific person in a video by means of a set of images
downloaded from Web using a specific search key.
In this paper we present a novel image quality assessment technique for evaluating virtual synthesized views in the context of multi-view video. In particular, Free Viewpoint Videos are generated from uncompressed color views and their associated compressed depth maps by means of the View Synthesis Reference Software provided by MPEG. Prior to the synthesis step, the original depth maps are encoded with different coding algorithms, thus leading to additional artifacts in the synthesized views. The core of the proposed wavelet-based metric lies in the registration procedure performed to align the synthesized view with the original one, and in the applied skin detection, motivated by the observation that the same distortion is more annoying when visible on human subjects than on other parts of the scene. The effectiveness of the metric is evaluated by analyzing the correlation between the scores obtained with the proposed metric and Mean Opinion Scores collected by means of subjective tests. The achieved results are also compared against those of well-known objective quality metrics. The experimental results confirm the effectiveness of the proposed metric.
Person re-identification through a camera network deals with finding a correct link between consecutive observations
of the same target among different cameras in order to choose the most probable correspondence
among a set of possible matches. This task is particularly challenging in the presence of low-resolution camera
networks. In this work, a method for people re-identification in a low-resolution camera network
is presented. The proposed approach can be divided into two parts. First, the illumination changes of a target
while crossing the network are analyzed. The color structure is evaluated using a novel color descriptor, the
Color Structure Descriptor, which describes the differences of dominant colors between two regions of interest.
Afterwards, a new pruning system for the links, the Target Color Structure, is proposed. Results show that
the improvements achieved by applying the Target Color Structure control are up to 4% for the top rank and up to
16% when considering the first eleven most similar candidates.
Reversible data hiding deals with the insertion of auxiliary information into host data without causing any permanent degradation to the original signal. In this contribution, a high-capacity reversible data hiding scheme, based on the classical difference expansion insertion algorithm, is presented. The method exploits a prediction stage, followed by prediction error modification, both in the spatial domain and in the S-transform domain. This two-step embedding allows us to achieve high embedding capacity while preserving high image quality, as demonstrated in the experimental results.
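As an illustration of the classical difference-expansion baseline that such schemes build on, the following sketch embeds and recovers one bit per pixel pair (Tian's algorithm; the variable names are ours, and the overflow/underflow handling that a real implementation needs is omitted):

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y) by expanding their difference.
    Note: no range check is done here; real schemes must skip or map pairs
    whose expanded values would leave [0, 255]."""
    l = (x + y) // 2          # integer average, invariant under embedding
    h = x - y                 # difference between the two pixels
    h2 = 2 * h + bit          # expanded difference carries the payload bit
    x2 = l + (h2 + 1) // 2    # reconstruct the marked pair from (l, h2)
    y2 = l - h2 // 2
    return x2, y2

def de_extract(x2, y2):
    """Recover the payload bit and the original pair from the marked pair."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 & 1              # the bit sits in the LSB of the difference
    h = h2 // 2               # undo the expansion (floor division)
    x = l + (h + 1) // 2
    y = l - h // 2
    return x, y, bit
```

Floor division makes the mapping exactly invertible for negative differences as well, which is why the extraction recovers the original pair bit-for-bit.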
In this paper a methodology for digital image forgery detection by means of an unconventional use of image
quality assessment is addressed. In particular, the presence of differences in quality degradations impairing
the images is adopted to reveal the mixture of different source patches. The rationale behind this work is
the hypothesis that any image may be affected by artifacts, visible or not, caused by the processing steps:
acquisition (i.e., lens distortion, acquisition sensor imperfections, analog-to-digital conversion, single-sensor-to-color-pattern
interpolation), processing (i.e., quantization, storing, JPEG compression, sharpening, deblurring,
enhancement), and rendering (i.e., image decoding, color/size adjustment). These defects are generally spatially
localized and their strength strictly depends on the content. For these reasons, they can be considered a
fingerprint of each digital image. The proposed approach relies on a combination of image quality assessment
systems. The adopted no-reference metric does not require any information about the original image, thus
allowing an efficient and stand-alone blind system for image forgery detection. The experimental results show
the effectiveness of the proposed scheme.
The use of ear information for people identification has been tested for at least 100 years. However, it is still an open issue whether ears can be considered unique, or unique enough, to be used as a biometric feature. In this paper, a biometric system for human identification based on ear recognition is presented. The ear is modeled as a set of contours extracted from the ear image with an edge potential function. The matching algorithm has been tested in the presence of several image modifications. Two human ear databases have been used for the tests. The experimental results show the effectiveness of the proposed scheme.
'View plus depth' is an attractive compact representation format for 3D video compression and transmission. It
combines 2D video with a depth map sequence, aligned in a per-pixel manner, to represent the moving 3D scene
of interest. Any different-perspective view can be synthesized out of this representation through Depth-Image
Based Rendering (DIBR). However, such rendering is prone to disocclusion errors: regions originally covered by
foreground objects become visible in the synthesized view and have to be filled with perceptually meaningful content.
In this work, a technique for reducing the perceived artifacts by inpainting the disoccluded areas is proposed.
Based on Criminisi's exemplar-based inpainting algorithm, the developed technique recovers the disoccluded
areas by using pixels of similar blocks surrounding them. In the original work, a moving window is centered on the
boundaries between known and unknown parts ('target window'). The known pixels are used to select windows
which are most similar to the target one. When this process is completed, the unknown region of the target
patch is filled with a weighted combination of pixels from the selected windows.
In the proposed scheme, the priority map, which defines the rule for selecting the order of pixels to be filled,
has been modified to meet the requirement for disocclusion hole filling and a better non-local mean estimate
has been suggested accordingly. Furthermore, the search for similar patches has also been extended to previous
and following frames of the video under processing, thus improving both computational efficiency and the resulting visual quality.
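To make the priority map mentioned above concrete, here is a minimal sketch of the confidence term of Criminisi's priority, computed for each boundary pixel of the hole (the data term based on the isophote direction, which the full priority multiplies in, is omitted, and all names are ours, not the paper's):

```python
import numpy as np

def confidence_priorities(mask, patch=9):
    """For each hole-boundary pixel (mask == True inside the hole), return
    Criminisi's confidence term: the fraction of known pixels in the
    surrounding patch. Pixels with higher confidence are filled first."""
    h, w = mask.shape
    r = patch // 2
    known = (~mask).astype(float)
    pri = {}
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            # a boundary pixel lies inside the hole but has a known 4-neighbour
            nb = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if not any(0 <= j < h and 0 <= i < w and not mask[j, i] for j, i in nb):
                continue
            win = known[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            pri[(y, x)] = win.sum() / win.size
    return pri
```

Boundary pixels whose patch is mostly known receive a score near one and are filled first, which is what lets structure propagate inward from the hole border.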
The increasing use of digital image-based applications is resulting in huge databases that are often difficult to
use and prone to misuse and privacy concerns. These issues are especially crucial in medical applications. The
most commonly adopted solution is the encryption of both the image and the patient data in separate files
that are then linked. This practice is inefficient since, in order to retrieve patient data or analysis
details, it is necessary to decrypt both files.
In this contribution, an alternative solution for secure medical image annotation is presented. The proposed
framework is based on the joint use of a key-dependent wavelet transform, the Integer Fibonacci-Haar transform,
of a secure cryptographic scheme, and of a reversible watermarking scheme.
The system allows: i) the insertion of the patient data into the encrypted image without requiring knowledge
of the original image, ii) the encryption of annotated images without causing loss of the embedded information,
and iii) the recovery of the original image after mark removal, thanks to the complete reversibility of the
process. Experimental results show the effectiveness of the proposed scheme.
In this contribution a novel reversible data hiding scheme for digital images is presented. The proposed
technique allows the exact recovery of the original image upon extraction of the embedded information. Lossless
recovery of the original is achieved by adopting the histogram shifting technique in a novel wavelet domain: the
Integer Fibonacci-Haar Transform, which is based on a parameterized subband decomposition of the image. In
particular, the parameterization depends on a selected Fibonacci sequence. The use of this transform increases
the security of the proposed method. Experimental results show the effectiveness of the proposed scheme.
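A minimal sketch of the histogram-shifting primitive such a scheme applies may help; it is shown here on a flat integer array rather than on Integer Fibonacci-Haar subbands, all names are ours, and the location-map handling needed for saturated values is omitted:

```python
import numpy as np

def hs_embed(coeffs, bits):
    """Histogram-shifting embedding on an integer array. Find the
    histogram peak p, shift every value above p up by one to empty the
    bin p + 1, then let each occurrence of p carry one payload bit
    (p stays p for bit 0, becomes p + 1 for bit 1)."""
    c = coeffs.copy()
    vals, counts = np.unique(c, return_counts=True)
    p = int(vals[np.argmax(counts)])        # peak bin: capacity = its count
    c[c > p] += 1                           # open the gap at p + 1
    idx = np.flatnonzero(c == p)
    assert len(bits) <= len(idx), "payload exceeds capacity"
    for i, b in zip(idx, bits):
        c[i] += b
    return c, p

def hs_extract(marked, p):
    """Recover the bits and losslessly restore the original array."""
    bits = [int(v == p + 1) for v in marked if v in (p, p + 1)]
    c = marked.copy()
    c[c == p + 1] = p                       # undo the bit embedding
    c[c > p + 1] -= 1                       # undo the shift
    return c, bits
```

Because the shift is the only modification applied to non-peak values, extraction restores every coefficient exactly, which is the reversibility property the abstract refers to.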
In this contribution the robustness of a novel steganographic scheme based on the generalized Fibonacci sequence
against Chi-square attacks is investigated. In essence, an image is first represented in a basis defined by a
generalized Fibonacci sequence. Then the secret data are inserted by a substitution technique into selected bit
planes, preserving the first-order distributions, and finally, the inverse Fibonacci decomposition is applied to
obtain the stego-image. Secret data are scrambled before embedding to improve the security of the whole
system. In order to perform Chi-square attacks, knowledge of both parameters determining the binary
Fibonacci representation of an image is assumed. Experimental results show that no visual impairments are
introduced and the probability of detecting the presence of hidden data is small, even if a modest capacity loss is incurred.
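For reference, the first-order statistic behind the classical Westfeld-Pfitzmann Chi-square attack can be sketched as follows; it is shown for ordinary binary bit planes, whereas the attack considered above operates on the generalized Fibonacci representation with its parameters assumed known:

```python
import numpy as np

def chi_square_stat(values, levels=256):
    """Pairs-of-values Chi-square statistic for LSB embedding. Full LSB
    replacement equalises the counts of each value pair (2k, 2k + 1), so
    the statistic is small for a fully embedded image and large for a
    natural one. Returns the statistic and the degrees of freedom."""
    hist = np.bincount(values, minlength=levels)
    stat, dof = 0.0, 0
    for k in range(levels // 2):
        n = hist[2 * k] + hist[2 * k + 1]
        if n > 0:
            e = n / 2.0  # expected count if the LSBs were uniformly random
            stat += (hist[2 * k] - e) ** 2 / e + (hist[2 * k + 1] - e) ** 2 / e
            dof += 1
    return stat, dof
```

In a full attack the statistic is converted to a p-value via the Chi-square distribution and evaluated over growing prefixes of the image; a first-order-preserving embedding, like the one described above, keeps the pair counts balanced and therefore defeats exactly this test.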
This paper presents a novel spatial data hiding scheme based on the Least Significant Bit insertion. The bitplane
decomposition is obtained by using the (<i>p</i>, <i>r</i>)<i> Fibonacci</i> sequences. This decomposition depends on two
parameters, <i>p</i> and <i>r</i>. Those values increase the security of the whole system; without their knowledge it is
not possible to perform the same decomposition used in the embedding process and to extract the embedded
information. Experimental results show the effectiveness of the proposed method.
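The bitplane decomposition underlying such schemes can be illustrated with the classical case: the greedy Zeckendorf decomposition over the ordinary Fibonacci sequence. The (<i>p</i>, <i>r</i>) sequences of the paper generalise the recurrence, and <i>p</i> and <i>r</i> then act as a key; the sketch below covers only the un-keyed special case:

```python
def zeckendorf_digits(n, length=12):
    """Greedy Zeckendorf decomposition: write n as a sum of
    non-consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...), returning
    the binary digit vector. Each digit position plays the role of a
    'bitplane' in a Fibonacci-based LSB scheme."""
    fib = [1, 2]
    while len(fib) < length:
        fib.append(fib[-1] + fib[-2])
    digits = [0] * length
    for i in range(length - 1, -1, -1):   # greedy, from the largest term down
        if fib[i] <= n:
            digits[i] = 1
            n -= fib[i]
    return digits                         # digits[i] multiplies fib[i]
```

Because representations in this basis are redundant compared with base 2, flipping a low-order Fibonacci digit perturbs the pixel value less on average, and without the generating parameters an attacker cannot reproduce the same planes.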
In this paper a joint watermarking and ciphering scheme for digital images is presented. Both operations are
performed in a key-dependent transform domain. The commutative property of the proposed method makes it possible to
cipher a watermarked image without interfering with the embedded signal, or to watermark an encrypted image
still allowing a perfect deciphering. Furthermore, the key dependence of the transform domain increases the
security of the overall system. Experimental results show the effectiveness of the proposed scheme.
This paper proposes a novel data hiding scheme in which a payload is embedded into the discrete cosine transform domain. The characteristics of the Human Visual System (HVS) with respect to image viewing have been exploited to adapt the strength of the embedded data and have been integrated in the design of a digital image watermarking system. By using an HVS-inspired image quality metric, we study the relation between the amount of data that can be embedded and the resulting perceived quality. This study makes it possible to increase the robustness of the watermarked image without damaging the perceived quality or, as an alternative, to reduce the impairments produced by the watermarking process given a fixed embedding strength. Experimental results show the effectiveness and the robustness of the proposed solution.
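A generic sketch of additive embedding in mid-frequency DCT coefficients of an 8x8 block follows. The coefficient set, the sign-modulation rule, and the informed extraction are illustrative assumptions of ours, and the HVS-driven per-region adaptation of the strength alpha, which is the paper's actual contribution, is not reproduced:

```python
import numpy as np

# Orthonormal 8-point DCT-II matrix, built directly from its definition.
N = 8
_k = np.arange(N)[:, None]
_n = np.arange(N)[None, :]
D = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * _n + 1) * _k / (2 * N))
D[0, :] = np.sqrt(1.0 / N)

# Illustrative mid-frequency coefficient set (an assumption, not the paper's).
MIDBAND = [(2, 3), (3, 2), (3, 3), (2, 4), (4, 2)]

def embed_block(block, bits, alpha):
    """Embed one bit per selected coefficient by adding +alpha or -alpha.
    An HVS-adaptive scheme would modulate alpha per block; here it is
    simply a constant."""
    c = D @ block.astype(float) @ D.T       # forward 2D DCT
    for (u, v), b in zip(MIDBAND, bits):
        c[u, v] += alpha if b else -alpha
    return D.T @ c @ D                      # inverse 2D DCT

def extract_block(orig, marked):
    """Informed extraction: compare marked vs. original coefficients."""
    c0 = D @ orig.astype(float) @ D.T
    c1 = D @ marked @ D.T
    return [int(c1[u, v] > c0[u, v]) for (u, v) in MIDBAND]
```

Because the DCT matrix is orthonormal, a coefficient perturbation of +/-alpha survives the inverse transform exactly, so the bits can be read back by comparing coefficient signs against the original.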
In this paper a novel method for watermarking and ciphering color images is presented. The aim of the system is
to allow the watermarking of encrypted data without requiring the knowledge of the original data. By using this
method, it is also possible to cipher watermarked data without damaging the embedded signal. Furthermore, the
extraction of the hidden information can be performed without deciphering the cover data and it is also possible
to decipher watermarked data without removing the watermark. The transform domain adopted in this work is the Fibonacci-Haar wavelet transform. The experimental results show the effectiveness of the proposed scheme.
In this contribution, we present a novel technique for imperceptible and robust watermarking of digital images.
It is based on the two-level decomposition of the host image using the Fibonacci-Haar Transform (FHT) and on the
Singular Value Decomposition (SVD) of the transformed subbands. The main contributions of this approach are
the use of the FHT for hiding purposes, the flexibility in data hiding capacity, and the key-dependent secrecy of
the used transform. The experimental results show the effectiveness of the proposed approach both in perceived
quality of the watermarked image and in robustness against the most common attacks.