Today, several alternatives exist for the compression of digital pictures and video sequences. Besides
internationally recognized standard solutions, open-access options such as VP8 image and video compression
have recently appeared and are gaining popularity. In this paper, we present the methodology and the results
of the rate-distortion performance analysis of VP8. The analysis is based on the results of subjective quality
assessment experiments, which were carried out to compare the VP8 image and video compression algorithms with a set of state-of-the-art
image and video compression standards.
The success of 3D video, as one of the emerging multimedia formats, will largely depend on the improved quality
of experience that it provides to viewers when compared to conventional 2D video. Therefore, reliable methods
for 3D video quality assessment are crucial in order to optimize 3D video systems and services. The goal of this
paper is to review recent developments in 3D video quality assessment, and to discuss its future directions.
This paper describes the details and the results of the subjective quality evaluation performed at EPFL, as a
contribution to the effort of the Joint Collaborative Team on Video Coding (JCT-VC) for the definition of the
next-generation video coding standard. The performance of 27 coding technologies has been evaluated with
respect to two H.264/MPEG-4 AVC anchors, considering high definition (HD) test material. The test campaign
involved a total of 494 naive observers and took place over a period of four weeks. While similar tests have
been conducted as part of the standardization process of previous video coding technologies, the test campaign
described in this paper is by far the most extensive in the history of video coding standardization. The obtained
subjective quality scores show high consistency and support an accurate comparison of the performance of the
different coding solutions.
In this paper, we consider the use of object duplicate detection for the propagation of geotags from a small set of
images with location names (IPTC) to a large set of non-tagged images. The motivation behind this idea is that
images of individual locations usually contain specific objects such as monuments, buildings or signs. Therefore,
object duplicate detection can be used to establish the correspondence between tagged and non-tagged images.
Our recent graph-based object duplicate detection approach is adapted for this task. The effectiveness of the
approach is demonstrated through a set of experiments considering various locations.
While objective and subjective quality assessment of 2D images and video has been an active research topic in
recent years, emerging 3D technologies require new quality metrics and methodologies that take into account
the fundamental differences in human visual perception and the typical distortions of stereoscopic content.
Therefore, this paper presents a comprehensive stereoscopic video database that contains a large variety of scenes
captured using a stereoscopic camera setup consisting of two HD camcorders with different capture parameters.
In addition to the video, the database also provides subjective quality scores obtained using a tailored single
stimulus continuous quality scale (SSCQS) method. The resulting mean opinion scores can be used to evaluate
the performance of visual quality metrics as well as for the comparison and for the design of new metrics.
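
As a rough illustration of how mean opinion scores can be used to evaluate a quality metric, the following minimal sketch computes a MOS with an approximate 95% confidence interval per sequence and then the Pearson correlation between the MOS values and a metric's outputs. The ratings, the metric scores, and the function names are hypothetical and are not taken from the database; the normal-approximation interval is an assumption made here for brevity.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    mos = mean(ratings)
    # Normal approximation (1.96); a Student-t quantile suits small panels better.
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: one list of observer ratings per test sequence.
ratings_per_sequence = [
    [4, 5, 4, 5],
    [3, 3, 4, 2],
    [1, 2, 2, 1],
]
# Hypothetical outputs of an objective metric for the same sequences.
metric_scores = [0.95, 0.80, 0.55]

mos_values = [mean_opinion_score(r)[0] for r in ratings_per_sequence]
correlation = pearson(metric_scores, mos_values)
print(mos_values, correlation)
```

A rank correlation (for example Spearman) is commonly reported alongside the linear correlation when the relationship between metric and MOS is monotonic but not linear.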
For object analysis in videos such as in video surveillance systems, the preliminary segmentation step is very
important. Many segmentation methods for static cameras have been proposed in the last decade, but they
all suffer when objects are reflected, especially on the ground: the reflected regions are also segmented
as foreground. We present a new method that detects the border between the real object and its reflection.
Experiments show that an outstanding improvement of the segmentation results is obtained by removing the
reflection part of the over-segmented objects.
Speaker change detection (SCD) is a preliminary step for many audio applications such as speaker segmentation
and recognition. Its robustness is therefore crucial for good performance in the later steps. In particular,
misses (false negatives) degrade the results. For some applications, domain-specific characteristics can be used to
improve the reliability of the SCD. In broadcast news and discussions, the co-occurrence of shot boundaries and
change points provides a robust clue for speaker changes.
In this paper, two multimodal approaches are presented that utilize the results of a shot boundary detection
(SBD) step to improve the robustness of the SCD. Both approaches clearly outperform the audio-only approach
but are applicable only to TV broadcast news and plenary discussions.
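
The abstract does not specify how the two multimodal approaches combine the modalities, so the following is only one plausible sketch of the general idea: an audio change candidate is accepted with a relaxed threshold when a shot boundary lies nearby, and with a strict threshold otherwise. The function name, the thresholds, the tolerance, and the sample data are all assumptions made for illustration.

```python
def fuse_changes(audio_candidates, shot_boundaries,
                 strict=0.8, relaxed=0.5, tolerance=0.5):
    """Select speaker change points from scored audio candidates.

    audio_candidates: list of (time_in_seconds, confidence) pairs.
    shot_boundaries:  list of shot boundary times in seconds.
    A candidate close to a shot boundary only needs to reach the relaxed
    threshold; all other candidates must reach the strict one.
    """
    accepted = []
    for time, score in audio_candidates:
        near_shot = any(abs(time - sb) <= tolerance for sb in shot_boundaries)
        threshold = relaxed if near_shot else strict
        if score >= threshold:
            accepted.append(time)
    return accepted


# Hypothetical detector outputs.
candidates = [(1.0, 0.9), (5.2, 0.6), (9.0, 0.6), (12.1, 0.3)]
shots = [5.0, 12.0]
print(fuse_changes(candidates, shots))
```

Lowering the threshold near shot boundaries targets misses specifically, which matches the concern about false negatives, at the cost of relying on the visual cue being informative for the content type.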
This paper presents a novel approach for automatic and robust object detection. It utilizes a component-based approach that combines techniques from both the statistical and the structural pattern recognition domains. While the component detection relies on Haar-like features and an AdaBoost-trained classifier cascade, the topology verification is based on graph matching techniques. The system was applied to face detection, and the experiments show its outstanding performance in comparison to other face detection approaches. It yields higher robustness especially in the presence of partial occlusions, uneven illumination and out-of-plane rotations.
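
To make the notion of a Haar-like feature concrete, the sketch below evaluates a two-rectangle feature in constant time per window using an integral image. It illustrates the general technique only, not the detector described above; the function names and the tiny test image are assumptions.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical 2x4 image: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
print(two_rect_feature(ii, 0, 0, 4, 2))
```

In a boosted cascade, many such features are thresholded as weak classifiers; the integral image is what makes evaluating them at every window position affordable.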
A new segmentation approach usable with a fixed or motion-compensated camera is described. Instead of the commonly used RGB color space, we operate with the invariant Gaussian color model proposed by Geusebroek, together with temporal information that eliminates unsteady regions surrounded by the moving objects. The Gaussian color model has never been used in video segmentation. A comparison with several state-of-the-art methods, using both subjective and objective evaluation, demonstrates the good performance of the proposed method.
In the case of a static or motion-compensated camera, static background segmentation methods can be applied to
segment the interesting foreground objects from the background. Although a lot of methods have been proposed,
a general assessment of the state of the art is not available. An important issue is to compare various
state-of-the-art methods in terms of quality (accuracy) and computational complexity (time and memory consumption).
A representative set of recent techniques is chosen, implemented and compared to each other. An extensive set
of videos is used to achieve comprehensive results. Both indoor and outdoor videos with different environmental
conditions are used. While visual analysis is used for subjective assessment of the quality, pixel-based measures
derived from available ground truth data are used for the objective assessment. Furthermore, the computational
complexity is estimated by measuring the elapsed time and memory requirements of each algorithm. The paper
summarizes the experiments and considers the strengths and drawbacks of the various techniques. Moreover, it
gives hints for selecting the optimal approach for a specific environment and directions for further research in this
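
The abstract does not name the pixel-based measures that were used. As an illustrative assumption, the sketch below computes precision, recall and F1 of a binary foreground mask against a ground truth mask, and times a call with a wall-clock timer. All names and the sample masks are hypothetical.

```python
import time


def pixel_measures(predicted, ground_truth):
    """Precision, recall and F1 of a binary foreground mask (1 = foreground)."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1          # foreground correctly detected
            elif p and not g:
                fp += 1          # background labelled as foreground
            elif g and not p:
                fn += 1          # foreground missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical 2x3 masks.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
ground_truth_mask = [[1, 0, 0],
                     [0, 1, 1]]

measures, elapsed = timed(pixel_measures, predicted_mask, ground_truth_mask)
print(measures, elapsed)
```

Reporting precision and recall separately, rather than a single accuracy figure, keeps over-segmentation and under-segmentation distinguishable when foreground pixels are rare.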
This paper presents a novel approach to human body posture recognition based on the MPEG-7 contour-based shape descriptor and the widely used projection histogram. A combination of the two is used to recognize the main posture and the view of a human based on the binary object mask obtained by the segmentation process. The recognition is treated as a typical pattern recognition task and is carried out through a hierarchy of classifiers. Various structures, both hierarchical and non-hierarchical, in combination with different classifiers, are therefore compared to each other with respect to recognition performance and computational complexity. Based on this comparison, an optimal system design is achieved, with recognition rates of 95.59% for the main posture, 77.84% for the view and 79.77% in combination.
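
For readers unfamiliar with projection histograms, the following minimal sketch computes the row and column profiles of a binary object mask by counting foreground pixels along each axis. It shows the basic feature only; any normalization, binning, or combination with the MPEG-7 descriptor used in the paper is not reproduced here, and the names and sample mask are assumptions.

```python
def projection_histograms(mask):
    """Return (row_profile, column_profile) of a binary mask (1 = object)."""
    row_profile = [sum(row) for row in mask]            # pixels per image row
    column_profile = [sum(col) for col in zip(*mask)]   # pixels per image column
    return row_profile, column_profile


# Hypothetical 4x3 silhouette: head, arms, torso, legs.
silhouette = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
]
rows, cols = projection_histograms(silhouette)
print(rows, cols)
```

Because the profiles change markedly between, for example, standing and lying silhouettes, they are a cheap input to a posture classifier once scaled to a fixed length.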