State-of-the-art light field (LF) image coding solutions usually rely on one of two LF data representation formats: Lenslet or 4D LF. While the Lenslet data representation is a more compact version of the LF, it requires additional camera metadata and processing steps prior to image rendering. In contrast, 4D LF data, consisting of a stack of sub-aperture images, provides a more redundant representation but requires minimal side information, thus facilitating image rendering. Recently, the JPEG Pleno guidelines on objective evaluation of LF image coding defined a processing chain that allows different 4D LF data codecs to be compared, aiming to facilitate codec assessment and benchmarking. Thus, any codec that does not rely on the 4D LF representation needs to undergo additional processing steps to generate an output comparable to a reference 4D LF image. These additional processing steps may affect the quality of the reconstructed LF image, especially if color subsampling format and bit depth conversions have been performed. Consequently, the influence of these conversions needs to be carefully assessed, as it may have a significant impact on a comparison between different LF codecs. Very few in-depth comparisons of the effects of using the existing LF representations have been reported. Therefore, following the JPEG Pleno guidelines, this paper presents an exhaustive comparative analysis of these two LF data representation formats in terms of LF image coding efficiency, considering different color subsampling formats and bit depths. These comparisons are performed by testing different processing chains to encode and decode the LF images. Experimental results show that, in terms of coding efficiency for different color subsampling formats, the Lenslet LF data representation is more efficient when using YUV 4:4:4 with 10 bit/sample, while the 4D LF data representation is more efficient when using YUV 4:2:0 with 8 bit/sample. The “best” LF data representation, in terms of coding efficiency, depends on several factors that are extensively analyzed in this paper, such as the objective metric used for comparison (e.g., average PSNR-Y or average PSNR-YUV), the type of LF content, and the color format. The maximum objective quality is also determined by evaluating the influence of each block of each processing chain on the objective quality of the reconstructed LF image. Experimental results show that, when the 4D LF data representation is not used, the maximum achieved objective quality is lower than 50 dB in terms of average PSNR-YUV.
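For reference, the average PSNR-YUV figure mentioned above is typically computed as a luma-weighted average of the per-component PSNR values; assuming the 6:1:1 weighting adopted, e.g., in the JPEG Pleno common test conditions, it reads

\mathrm{PSNR}_{\mathrm{YUV}} = \frac{6\,\mathrm{PSNR}_Y + \mathrm{PSNR}_U + \mathrm{PSNR}_V}{8}.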
Light field imaging based on a single-tier camera equipped with a microlens array – also known as integral, holoscopic, and plenoptic imaging – has recently emerged as a practical and promising approach for future visual applications and services. However, successfully deploying actual light field imaging applications and services will require adequate coding solutions to efficiently handle the massive amount of data involved in these systems. In this context, self-similarity compensated prediction is a non-local spatial prediction scheme based on block matching that has been shown to achieve high efficiency for light field image coding based on the High Efficiency Video Coding (HEVC) standard. As previously shown by the authors, this is possible by simply averaging two predictor blocks that are jointly estimated from a causal search window in the current frame itself, referred to as self-similarity bi-prediction. However, theoretical analyses of motion compensated bi-prediction have suggested that further rate-distortion performance improvements are still possible by adaptively estimating the weighting coefficients of the two predictor blocks. Therefore, this paper presents a comprehensive study of the rate-distortion performance of HEVC-based light field image coding when using different sets of weighting coefficients for self-similarity bi-prediction. Experimental results demonstrate that the previous theoretical conclusions extend to light field image coding and show that the proposed adaptive weighting coefficient selection leads to bit savings of up to 5% compared to the previous self-similarity bi-prediction scheme.
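As an illustration of the weighting idea discussed above, the following minimal Python sketch forms the bi-predicted block as a weighted average of the two predictor blocks and picks the weight pair that minimizes a simple distortion measure. The candidate set and the SSD criterion are illustrative assumptions (a real encoder would minimize a full Lagrangian RD cost), not the paper's exact design.

import numpy as np

# Illustrative candidate weight pairs; (0.5, 0.5) reproduces the plain
# averaging of the original self-similarity bi-prediction scheme.
WEIGHT_CANDIDATES = [(0.5, 0.5), (0.25, 0.75), (0.75, 0.25),
                     (0.375, 0.625), (0.625, 0.375)]

def bipredict(b0, b1, w0, w1):
    # Weighted average of the two jointly estimated predictor blocks.
    return w0 * b0 + w1 * b1

def best_weights(original, b0, b1):
    # Select the weight pair minimizing the SSD against the original block.
    return min(WEIGHT_CANDIDATES,
               key=lambda w: float(np.sum((original - bipredict(b0, b1, *w)) ** 2)))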
Light field imaging based on microlens arrays – also known as plenoptic, holoscopic and integral imaging – has recently emerged as a feasible and promising technology due to its ability to support functionalities not straightforwardly available in conventional imaging systems, such as post-production refocusing and depth-of-field adjustment. However, to gradually reach the consumer market and to provide interoperability with current 2D and 3D representations, a display scalable coding solution is essential.
In this context, this paper proposes an improved display scalable light field codec comprising a three-layer hierarchical coding architecture (previously proposed by the authors) that provides interoperability with 2D (Base Layer) and 3D stereo and multiview (First Layer) representations, while the Second Layer supports the complete light field content. To further improve the compression performance, novel exemplar-based inter-layer coding tools are proposed here for the Second Layer, namely: (i) an inter-layer reference picture construction relying on an exemplar-based optimization algorithm for texture synthesis, and (ii) a direct prediction mode based on exemplar texture samples from lower layers. Experimental results show that the proposed solution outperforms the tested benchmark solutions, including the authors’ previous scalable codec.
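For intuition only, the following Python sketch shows the kind of exemplar search underlying tools such as (i) and (ii): the reconstructed lower-layer picture is scanned for the patch that best matches a template taken from the block being predicted. The exhaustive SSD search is an assumption made for brevity; the paper's actual optimization algorithm for texture synthesis is more elaborate.

import numpy as np

def find_exemplar(lower_layer, template, patch_h, patch_w):
    """Return the lower-layer patch whose samples best match the given
    template (SSD criterion), to serve as an exemplar-based prediction."""
    best_patch, best_cost = None, np.inf
    H, W = lower_layer.shape
    for y in range(H - patch_h + 1):
        for x in range(W - patch_w + 1):
            patch = lower_layer[y:y + patch_h, x:x + patch_w]
            cost = float(np.sum((patch - template) ** 2))
            if cost < best_cost:
                best_patch, best_cost = patch.copy(), cost
    return best_patch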
Holoscopic imaging has become a promising glasses-free 3D technology for providing more natural 3D viewing experiences to the end user. Additionally, holoscopic systems allow new post-production degrees of freedom, such as controlling the plane of focus or the viewing angle presented to the user. However, to successfully introduce this technology into the consumer market, a display scalable coding approach is essential to achieve backward compatibility with legacy 2D and 3D displays. Moreover, to effectively transmit 3D holoscopic content over error-prone networks, e.g., wireless networks or the Internet, error resilience techniques are required to mitigate the impact of data impairments on the quality perceived by the user. Therefore, it is essential to understand in depth the impact of packet losses on decoded video quality for the specific case of 3D holoscopic content, notably when a scalable approach is used. In this context, this paper studies the impact of packet losses when using a previously proposed three-layer display scalable 3D holoscopic video coding architecture, where each layer represents a different level of display scalability (i.e., L0 - 2D, L1 - stereo or multiview, and L2 - full 3D holoscopic). For this, a simple error concealment algorithm is used, which exploits the inter-layer redundancy between multiview and 3D holoscopic content, as well as the inherent correlation of the 3D holoscopic content, to estimate lost data. Furthermore, a study of the influence of the 2D view generation parameters used in the lower layers on the performance of this error concealment algorithm is also presented.
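A minimal sketch of the kind of inter-layer concealment studied here, assuming the decoded lower layer has already been mapped (e.g., by view rendering) onto the holoscopic sampling grid; the array names are hypothetical:

import numpy as np

def conceal(decoded, lost_mask, lower_layer_estimate):
    """Fill samples flagged as lost with the co-located samples estimated
    from the decoded lower (multiview) layer; a fuller algorithm would also
    exploit the correlation between neighboring micro-images."""
    concealed = decoded.copy()
    concealed[lost_mask] = lower_layer_estimate[lost_mask]
    return concealed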
Holoscopic imaging, also known as integral imaging, has recently been attracting the attention of the research community as a promising glasses-free 3D technology, due to its ability to create a more realistic depth illusion than current stereoscopic or multiview solutions. However, in order to gradually introduce this technology into the consumer market and to efficiently deliver 3D holoscopic content to end-users, backward compatibility with legacy displays is essential. Consequently, to enable 3D holoscopic content to be delivered and presented on legacy displays, a display scalable 3D holoscopic coding approach is required.
Hence, this paper presents a display scalable architecture for 3D holoscopic video coding with a three-layer approach, where each layer represents a different level of display scalability: Layer 0 - a single 2D view; Layer 1 - 3D stereo or multiview; and Layer 2 - the full 3D holoscopic content. In this context, a prediction method is proposed that combines inter-layer prediction, aiming to exploit the existing redundancy between the multiview and 3D holoscopic layers, with self-similarity compensated prediction (previously proposed by the authors for non-scalable 3D holoscopic video coding), aiming to exploit the spatial redundancy inherent to the 3D holoscopic enhancement layer. Experimental results show that the proposed combined prediction can significantly improve the rate-distortion performance of scalable 3D holoscopic video coding with respect to the authors’ previously proposed solutions, where only inter-layer or only self-similarity prediction is used.
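As a sketch of how the combined prediction could be exercised per block, assuming each candidate mode (inter-layer or self-similarity) exposes its prediction signal and an estimated rate, the usual Lagrangian cost J = D + lambda * R decides between them; the names and structure below are illustrative, not the codec's actual implementation.

from collections import namedtuple
import numpy as np

# Hypothetical container: a mode's prediction signal and estimated rate.
Candidate = namedtuple("Candidate", ["name", "prediction", "rate_bits"])

def select_prediction(block, candidates, lam):
    # Pick the mode minimizing J = SSD(block, prediction) + lambda * rate.
    def cost(c):
        return float(np.sum((block - c.prediction) ** 2)) + lam * c.rate_bits
    return min(candidates, key=cost)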
This paper proposes a network-aware macroblock (MB) coding mode decision method that is both error resilient and coding efficient. This method differs from traditional mode decision methods in that MB mode decisions are made by simultaneously taking into account: i) their rate-distortion (RD) cost; and ii) their impact on error resilience, based on feedback information from the underlying network regarding its current error characteristics. By doing so, the number of Intra-coded MBs can be varied to better suit, in a cost-efficient way, the current state of the network and, therefore, to further improve the decoded video quality for a given packet loss rate. The proposed approach outperforms a network-aware version of the H.264/AVC reference software with cyclic MB Intra refresh, for typical test sequences encoded at various bit rates and for several error conditions in terms of packet loss rate.
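The core idea can be sketched as a loss-aware Lagrangian cost, assuming the network feedback provides an estimate p of the packet loss rate: the expected decoder-side distortion mixes the error-free distortion with the concealment distortion, so Intra modes (usually with lower concealment mismatch) are favored as p grows. This formulation is a common one in the error-resilient coding literature and is given here only as an illustration.

def network_aware_cost(d_enc, d_conceal, rate_bits, p, lam):
    """Loss-aware MB mode cost: expected distortion plus lambda * rate.
    d_enc     -- distortion if the MB arrives intact
    d_conceal -- distortion if the MB is lost and concealed
    p         -- packet loss rate fed back from the network
    lam       -- Lagrangian multiplier"""
    expected_distortion = (1.0 - p) * d_enc + p * d_conceal
    return expected_distortion + lam * rate_bits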
Rate and distortion models can play a very important role in real-time video encoding, since they can be used to obtain near-optimal operation in terms of the rate-distortion (RD) tradeoff without the drawback of having to encode the same Video Object Plane (VOP) multiple times to find the best combination of coding parameters. In the context of object-based video encoding, notably MPEG-4 video encoding, rate and distortion models characterize the relation between the average number of bits/pixel used to code a given VOP, the average VOP distortion, and the relevant coding parameters. These models are usually defined in terms of rate-quantization (RQ), distortion-quantization (DQ), and rate-distortion (RD) functions.
This paper addresses the problem of rate and distortion modeling for Intra and Inter coding in the context of object-based MPEG-4 video encoding. In the case of Intra coding, the VOP to encode does not depend on other past or future VOPs; therefore, its rate and distortion characteristics depend exclusively on the current quantizer parameter(s) and VOP statistics. In the case of Inter coding, the rate and distortion functions depend not only on the current VOP but also on its reference VOP(s); therefore, the rate and distortion functions become two-dimensional and, consequently, more difficult to estimate during encoding. In this paper, a new approach is proposed where the rate and distortion functions for Inter coding are modeled as one-dimensional functions plus an adaptation term.
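As an illustration of the functional forms involved, a widely used RQ model in the MPEG-4 rate control literature is the quadratic model

R(Q) = \frac{a_1\,\sigma}{Q} + \frac{a_2\,\sigma}{Q^2},

where \sigma is a residual complexity measure (e.g., the mean absolute difference) and a_1, a_2 are model parameters. Under the one-dimensional-plus-adaptation approach described above, the Inter rate function could be written, for instance, as

R_{\mathrm{inter}}(Q_c, Q_r) \approx R(Q_c) + \Delta(Q_r),

with Q_c and Q_r the current and reference quantizers; the exact adaptation term \Delta is the paper's, and this decomposition is offered only as a reading of the stated approach.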
Object-based coding approaches, such as that of the MPEG-4 standard, where a video scene is composed of several video objects (VOs), require rate control to be performed at two levels: scene-level rate control and object-level rate control. In this context, this paper presents a new scene-level and object-level rate control algorithm for low-delay MPEG-4 video encoding, capable of allocating bits to the several VOs in the scene, encoded at different Video Object Plane (VOP) rates, and aiming at a better trade-off between spatial and temporal quality for the overall scene. The proposed approach combines rate-distortion modeling, using model adaptation by least squares estimation, with adaptive bit allocation to 'shape' the encoded data so that the overall subjective quality of the encoded scene is maximized.
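A minimal sketch of the model adaptation step mentioned above, assuming the quadratic RQ model from the previous abstract and using ordinary least squares over the (Q, rate) pairs observed for past VOPs; the target-rate inversion is an illustrative companion, not the algorithm's actual bit allocation.

import numpy as np

def fit_rq_model(qs, rates):
    """Least squares fit of R(Q) = a1/Q + a2/Q**2 to observed (Q, rate) pairs."""
    qs = np.asarray(qs, dtype=float)
    A = np.column_stack([1.0 / qs, 1.0 / qs**2])
    (a1, a2), *_ = np.linalg.lstsq(A, np.asarray(rates, dtype=float), rcond=None)
    return a1, a2

def pick_quantizer(a1, a2, target_bits, q_range=range(1, 32)):
    # Choose the quantizer whose predicted rate best matches the VOP budget.
    return min(q_range, key=lambda q: abs(a1 / q + a2 / q**2 - target_bits))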
The advent of widespread mobile communications, together with the continuous development of image communication markets, has led to the idea of offering mobile image communications, particularly mobile videotelephony. Since very low bitrate video coding is still a largely unexplored subject, a large research effort is being devoted to the study of possible solutions. For the moment, two main approaches are foreseen, depending on whether or not compatibility with the CCITT H.261 hybrid coding scheme is maintained. This paper presents experiments related to the CCITT H.261-compatible solution in order to probe its limits, namely when typical mobile videotelephony sequences are used.