To improve both coding efficiency and visual quality of video coding, in this paper, we present an adaptive loop filtering
design that exploits local directional characteristics exhibited in the video content. The design combines linear spatial
filtering and directional filtering with a similarity mapping function. We compute and compare multiple simple
directional features to classify blocks in a video frame into classes with different dominant orientations. Each class of
blocks adapts to a directional filter, with symmetric constraints imposed on the filter coefficients according to the
dominant orientation determined by the classification. To emphasize pixel similarity for explicit adaptation to edges, we
use a simple hard-threshold mapping function to avoid artifacts arising from across-edge filtering. Our design uses only
four filters per frame with a fixed 7×7 diamond-shaped filter support, while achieving better coding efficiency and improved
visual quality especially along edges, as compared to other approaches using up to 16 filters with up to 7 vertical × 9
horizontal diamond-shaped filter support.
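The block classification and the hard-threshold similarity mapping described above can be sketched as follows. This is a minimal illustration: the gradient-based features, the weights, and the threshold are assumptions for exposition, not the exact design in the paper.

```python
import numpy as np

def dominant_orientation(block):
    """Classify a block by comparing simple directional gradient sums
    (horizontal, vertical, and the two diagonals); the class with the
    smallest gradient energy gives the dominant orientation."""
    gh = np.abs(np.diff(block, axis=1)).sum()            # horizontal differences
    gv = np.abs(np.diff(block, axis=0)).sum()            # vertical differences
    gd1 = np.abs(block[1:, 1:] - block[:-1, :-1]).sum()  # 45-degree diagonal
    gd2 = np.abs(block[1:, :-1] - block[:-1, 1:]).sum()  # 135-degree diagonal
    return int(np.argmin([gh, gv, gd1, gd2]))

def hard_threshold_filter(center, neighbors, weights, thr):
    """Weighted average in which a neighbor contributes only when its
    value is within `thr` of the center pixel, so that pixels on the
    other side of an edge are excluded from the filtering."""
    out, wsum = float(center), 1.0
    for p, w in zip(neighbors, weights):
        if abs(p - center) < thr:
            out += w * p
            wsum += w
    return out / wsum
```

In this sketch, the neighbor at value 200 below is rejected by the hard threshold, so an edge pixel does not leak into the filtered output.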
New data formats that include both video and the corresponding depth maps, such as multiview plus depth
(MVD), enable new video applications in which intermediate video views (virtual views) can be generated using
the transmitted/stored video views (reference views) and the corresponding depth maps as inputs. We propose a
depth map coding method based on a new distortion metric, derived from the relationship between distortions
in the coded depth map and in the rendered view. In our experiments we use a codec based on H.264/AVC tools, where the
rate-distortion (RD) optimization for depth encoding makes use of the new distortion metric. Our experimental
results show the efficiency of the proposed method, with coding gains of up to 1.6 dB in interpolated frame
quality as compared to encoding the depth maps using the same coding tools but applying RD optimization
based on conventional distortion metrics.
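A minimal sketch of such a view-synthesis-aware distortion metric, assuming parallel cameras and the common 8-bit inverse-depth quantization. The parameter names and the texture-gradient approximation below are illustrative assumptions, not the paper's exact derivation.

```python
def rendering_position_error(depth_err, focal, baseline, z_near, z_far, levels=255):
    """Horizontal rendering position error induced by a depth-value
    coding error, for parallel cameras with 8-bit inverse-depth maps.
    Note the error is linear in the depth coding error."""
    return focal * baseline * (depth_err / levels) * (1.0 / z_near - 1.0 / z_far)

def rendered_view_distortion(depth_err, texture_grad, focal, baseline, z_near, z_far):
    """Approximate the squared distortion in the rendered view as the
    position error scaled by the local texture gradient; a metric of
    this form can replace depth-domain SSD inside RD optimization."""
    dp = rendering_position_error(depth_err, focal, baseline, z_near, z_far)
    return (texture_grad * dp) ** 2
```

A metric of this shape penalizes depth errors more heavily in textured regions, where a position shift is visible, and less in flat regions, where it is not.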
To facilitate new video applications such as three-dimensional video (3DV) and free-viewpoint video (FVV), the multiview plus depth (MVD) format, which consists of both video views and the corresponding per-pixel depth images, is being investigated. Virtual views can be generated using depth-image-based rendering (DIBR), which takes video and the corresponding depth images as input. This paper discusses view synthesis techniques based on DIBR, which include forward warping, blending, and hole filling. In particular, we emphasize the techniques brought to the MPEG view synthesis reference software (VSRS). Unlike in the field of computer graphics, ground-truth depth images for natural content are very difficult to obtain. The estimated depth images used for view synthesis typically contain different types of noise. Some robust synthesis modes to combat depth errors are also presented in this paper. In addition, we briefly discuss how to use synthesis techniques with minor modifications to generate the occlusion-layer information for layered depth video (LDV) data, which is another potential format for 3DV applications.
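The DIBR pipeline of forward warping, blending, and hole filling can be sketched on a single scanline. This is a deliberately simplified 1-D illustration under assumed conventions (negative values mark holes, disparity acts as the z-buffer key), not the actual VSRS implementation.

```python
import numpy as np

def forward_warp_1d(row, disp_row):
    """Forward warping of one scanline: each pixel moves by its
    disparity; when two pixels land on the same target, the one with
    the larger disparity (nearer to the camera) wins (z-buffering)."""
    w = len(row)
    out = np.full(w, -1.0)        # -1 marks a hole (disocclusion)
    zbuf = np.full(w, -np.inf)
    for x in range(w):
        tx = int(round(x + disp_row[x]))
        if 0 <= tx < w and disp_row[x] > zbuf[tx]:
            zbuf[tx] = disp_row[x]
            out[tx] = row[x]
    return out

def blend(left, right):
    """Blend two warped views: average where both are valid,
    otherwise take whichever view has a value."""
    out = left.copy()
    both = (left >= 0) & (right >= 0)
    out[both] = 0.5 * (left[both] + right[both])
    holes = left < 0
    out[holes] = right[holes]
    return out

def fill_holes(row):
    """Very simple hole filling: propagate the last valid pixel.
    VSRS uses more elaborate, background-biased inpainting."""
    out = row.copy()
    last = 0.0
    for i in range(len(out)):
        if out[i] < 0:
            out[i] = last
        else:
            last = out[i]
    return out
```

Warping with a second reference view and blending the two results fills most disocclusions; `fill_holes` handles whatever remains uncovered by both views.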
We consider the effect of depth-image compression artifacts on the quality of virtual views rendered using neighboring views. Such view rendering processes are utilized in new video applications such as 3D television (3DTV) and free viewpoint video (FVV). We first analyze how compression artifacts in compressed depth-images result in distortions in rendered views. We show that the rendering position error is a monotonic function of the coding error. For the scenario in which cameras are arranged with parallel optical axes, we further demonstrate specific properties of rendering position error. Exploiting special characteristics of depth-images, namely smooth regions separated by sharp edges, we investigate a possible solution to suppress compression artifacts by encoding depth-images with a recently published sparsity-based in-loop de-artifacting filter. Simulation results show that applying such techniques not only provides significantly higher coding efficiency for depth-image coding, but, more importantly, also improves the quality of rendered views in terms of PSNR and subjective quality.
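For the parallel-camera case mentioned above, the linearity (and hence monotonicity) of the rendering position error can be seen directly from the standard 8-bit inverse-depth mapping; the notation below is a common formulation, not necessarily the paper's exact one.

```latex
\frac{1}{z(v)} = \frac{v}{255}\left(\frac{1}{z_{\mathrm{near}}} - \frac{1}{z_{\mathrm{far}}}\right) + \frac{1}{z_{\mathrm{far}}},
\qquad
x' = x + \frac{f\,b}{z(v)},
```

so a depth coding error $\Delta v$ shifts the rendered position by

```latex
\Delta x' = \frac{f\,b\,\Delta v}{255}\left(\frac{1}{z_{\mathrm{near}}} - \frac{1}{z_{\mathrm{far}}}\right),
```

which is linear, and in particular monotonic, in $\Delta v$.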
In this paper, we analyze focus mismatches among cameras utilized in a multiview system, and propose techniques
to efficiently apply our previously proposed adaptive reference filtering (ARF) scheme to inter-view prediction in
multiview video coding (MVC). We show that, with heterogeneous focus settings, the differences exhibited in images
captured by different cameras can be represented in terms of the focus setting mismatches (view-dependency) and
the depths of objects (depth-dependency). We then analyze the performance of the previously proposed ARF
in MVC inter-view prediction. The gains in coding efficiency show a strong view-wise variation. Furthermore,
the estimated filter coefficients demonstrate strong correlation when the depths of objects in the scene remain
similar. By exploiting the properties derived from the theoretical and performance analysis, we propose two
techniques to achieve an efficient ARF coding scheme: i) view-wise ARF adaptation based on RD-cost prediction,
which determines whether ARF is beneficial for a given view, and ii) filter updating based on depth-composition
change, in which the same set of filters will be used (i.e., no new filters will be designed) until there is significant
change in the depth-composition within the scene. Simulation results show that significant complexity savings
are possible (e.g., the complete ARF encoding process needs to be applied to only 20% to 35% of the frames)
with negligible quality degradation (e.g., around 0.05 dB loss).
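The two complexity-reduction rules can be sketched as simple decision functions. The RD-gain predictor and the histogram-based summary of depth composition are illustrative placeholders for the paper's actual criteria.

```python
def should_run_arf(predicted_rd_gain, gain_threshold):
    """View-wise adaptation: run the full ARF design for a view only
    when the predicted RD gain justifies the filter-estimation cost."""
    return predicted_rd_gain > gain_threshold

def needs_filter_update(prev_depth_hist, cur_depth_hist, change_threshold):
    """Reuse the previously designed filters until the scene's depth
    composition (summarized here by a depth histogram) changes
    significantly; only then design a new set of filters."""
    l1 = sum(abs(a - b) for a, b in zip(prev_depth_hist, cur_depth_hist))
    total = sum(prev_depth_hist) or 1
    return l1 / total > change_threshold
```

Skipping the filter design whenever `needs_filter_update` is false is what limits the full ARF process to a small fraction of the frames.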
We consider the problem of coding multi-view video that exhibits mismatches in frames from different views.
Such mismatches could be caused by heterogeneous cameras and/or different shooting positions of the cameras.
In particular, we consider focus mismatches across views, i.e., different portions of a video frame can
undergo different blurriness/sharpness changes with respect to the corresponding areas in frames from the other
views. We propose an adaptive filtering approach for cross-view prediction in multi-view video coding. The
disparity fields are exploited as an estimate of scene depth. An expectation-maximization (EM) algorithm
is applied to classify the disparity vectors into groups. Based on the classification result, a video frame is
partitioned into regions with different scene-depth levels. Finally, for each scene-depth level, a two-dimensional
filter is designed to minimize the average residual energy of cross-view prediction for all blocks in the class.
The resulting filters are applied to the reference frames to generate better matches for cross-view prediction.
Simulation results show that, when encoding across views, the proposed method achieves up to 0.8 dB gain over
current H.264 video coding.
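The disparity-based depth classification can be sketched with a one-dimensional Gaussian mixture fitted by EM. The reduction to scalar disparity magnitudes and the fixed iteration count are simplifying assumptions for illustration; the per-class least-squares filter design that follows the classification is omitted here.

```python
import numpy as np

def em_classify(disparities, k=2, iters=50):
    """Fit a 1-D Gaussian mixture to (scalar) disparity values by EM
    and return a hard class label per vector; blocks in the same class
    are assumed to lie at a similar scene depth."""
    x = np.asarray(disparities, dtype=float)
    mu = np.linspace(x.min(), x.max(), k)      # spread initial means
    var = np.full(k, x.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        d = x[:, None] - mu[None, :]
        g = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = g / (g.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and variances
        n = r.sum(axis=0) + 1e-12
        pi = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu[None, :])**2).sum(axis=0) / n + 1e-6
    return r.argmax(axis=1)
```

Each resulting class then gets its own two-dimensional filter, designed to minimize the cross-view prediction residual over the blocks in that class.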
We consider the problem of complexity reduction in motion/disparity estimation for multiview video coding. We propose predictive fast search algorithms that, after either the motion field or the disparity field has been estimated, obtain with low complexity a good set of candidate vectors for the other field. The proposed scheme performs predictive motion search from view to view and predictive disparity search from one time instant to another. We also propose an efficient search pattern that starts with the candidate vectors from the proposed algorithms. Simulation results show a very significant reduction in encoding complexity, with only slight coding-efficiency degradation compared to full search in both motion and disparity estimation.
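A sketch of the predictive candidate generation and the refinement search. The consistency relation and the diamond pattern below are common formulations used for illustration, not necessarily the paper's exact algorithm; the vector names are assumptions.

```python
def predict_disparity(mv_cur, disp_prev, mv_other):
    """A common motion/disparity consistency relation for a block:
    DV_t ~= MV_cur + DV_{t-1} - MV_other, i.e., going through the
    previous time instant and the other view should land at (almost)
    the same place as the direct disparity vector."""
    return (mv_cur[0] + disp_prev[0] - mv_other[0],
            mv_cur[1] + disp_prev[1] - mv_other[1])

def fast_search(cost, candidates,
                pattern=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Evaluate the predicted candidate vectors, then refine the best
    one with a small diamond pattern instead of a full search."""
    best = min(candidates, key=cost)
    improved = True
    while improved:
        improved = False
        for dx, dy in pattern:
            cand = (best[0] + dx, best[1] + dy)
            if cost(cand) < cost(best):
                best, improved = cand, True
    return best
```

Because the predicted candidates are usually close to the true optimum, the local diamond refinement replaces an exhaustive window search at a small fraction of the cost.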