In existing video coding schemes with spatial scalability based on a pyramid frame representation, such as the ongoing H.264/MPEG-4 SVC (scalable video coding) standard, a video frame at a high resolution is predicted either from the lower-resolution image of the same frame or from temporally neighboring frames at the same resolution. Most of these prediction techniques fail to exploit the two correlations simultaneously and efficiently. This paper extends the in-scale prediction technique developed for wavelet video coding to a generalized in-scale motion compensation framework for H.264/MPEG-4 SVC. In this framework, for a video frame at a high-resolution layer, the lowpass content is predicted from information already coded in the lower-resolution layer, while the highpass content is predicted from neighboring frames at the current resolution. In this way, cross-resolution correlation and temporal correlation are exploited simultaneously, which leads to much more efficient prediction. Preliminary experimental results demonstrate that the proposed framework improves the spatial scalability performance of the current H.264/MPEG-4 SVC; the improvement is especially significant for high-fidelity video coding. In addition, unlike the wavelet-based in-scale scheme, the proposed framework supports arbitrary down-sampling and up-sampling filters.
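As a toy illustration of the in-scale split described above, the following sketch combines lowpass content from a decoded lower-resolution layer with highpass content from a motion-compensated temporal prediction. It is a minimal sketch under simplifying assumptions: `box_down` and `box_up` are stand-in 2x averaging/replication filters (the framework itself allows arbitrary filters), and the function names are invented for illustration, not taken from any codec.

```python
import numpy as np

def box_down(x):
    """Stand-in 2x down-sampling: average each 2x2 block (any filter is allowed)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def box_up(x):
    """Matching stand-in 2x up-sampling: replicate each sample into a 2x2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def in_scale_prediction(low_res_decoded, temporal_pred):
    """Predict a high-resolution frame by taking the lowpass content from the
    lower-resolution layer and the highpass content from the temporal
    (motion-compensated) prediction at the current resolution."""
    lowpass = box_up(low_res_decoded)                            # cross-resolution part
    highpass = temporal_pred - box_up(box_down(temporal_pred))   # temporal part
    return lowpass + highpass
```

When the temporal prediction and the lower-layer reconstruction disagree, the combination keeps the lower layer's lowpass and the temporal prediction's highpass, which is exactly the split the abstract describes.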
The shift-variant property of the discrete wavelet transform leads to intrinsic coupling across spatial subbands during motion-aligned temporal filtering. This poses a significant challenge: how to achieve a good trade-off between subband independence and motion alignment efficiency when providing spatial scalability in 3D wavelet video coding. This paper first investigates the issue of subband coupling in depth. By examining the analysis and synthesis filters, we verify the existence and causes of the subband coupling phenomenon and illustrate the subband leakage caused by motion shift. Furthermore, we propose a method to measure the strength of this coupling. Based on these investigations, we focus on schemes that preserve most of the subband coupling relationships and recommend that spatial highpass subbands not be dropped completely. The issue of rate allocation for spatial highpass subbands is also considered. An error propagation model is proposed to describe the effect of subband coupling on video reconstruction. The synthesis gain of each subband is estimated according to this model and is then used to guide the rate allocation algorithm. Experimental results demonstrate the promise of the proposed techniques in improving both the objective and subjective quality of low-resolution video, especially at middle and high bit rates, for 3D video coding schemes with spatial scalability.
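The idea of letting the estimated synthesis gain steer rate allocation can be sketched very crudely as follows. This is a toy proportional allocator, not the paper's algorithm: the function name is invented, the gains would in practice come from the error propagation model, and a real allocator would use a rate-distortion-optimal split rather than simple proportionality.

```python
def allocate_rate(total_bits, synthesis_gains, variances):
    """Split a bit budget across subbands in proportion to each subband's
    estimated synthesis gain times its signal variance. A crude stand-in
    for model-driven rate allocation: subbands whose errors propagate more
    strongly through synthesis (higher gain) receive more bits."""
    weights = [g * v for g, v in zip(synthesis_gains, variances)]
    total = sum(weights)
    return [total_bits * w / total for w in weights]
```

For example, a lowpass subband with four times the synthesis gain of a highpass subband of equal variance would receive four times the bits, rather than the highpass subband being dropped entirely.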
This paper presents a comparative study of various spatially scalable coding frameworks. Frameworks with multiple image-domain motion-aligned temporal filtering stages at various spatial resolutions, referred to as multi-T+2D, are the main focus. First, we investigate a multi-T+2D scheme based on a redundant frame representation, and discuss the cross-spatial-layer redundancy and prediction methods. This redundancy causes significant performance loss for schemes providing wide-range SNR scalability. To remove the redundancy produced by multi-resolution temporal filtering while retaining the advantage of spatial-domain motion compensation, a novel non-redundant multi-T+2D scheme is proposed. A performance comparison among the discussed frameworks shows that the proposed non-redundant multi-T+2D framework performs well for fully scalable video coding. We also verify that the redundant multi-T+2D framework with cross-spatial-layer reconstruction feedback is practical for providing narrow-range SNR scalability within each spatial layer.
This paper proposes an adaptive block-size motion alignment technique for 3D wavelet coding to further exploit temporal correlation across pictures. Similar to B pictures in traditional video coding, each macroblock can be motion-aligned in the forward and/or backward direction for temporal wavelet decomposition. In each direction, a macroblock may select its partition from one of seven modes - 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 - to allow accurate motion alignment. Furthermore, rate-distortion optimization criteria are proposed to select the motion mode, motion vectors, and partition mode. Although the proposed technique greatly improves the accuracy of motion alignment, it does not directly yield a coding efficiency gain because of the smaller block sizes and additional block boundaries. Therefore, an overlapped block motion alignment is further proposed to cope with block boundaries and suppress spatial high-frequency components. Experimental results show that the proposed adaptive block-size motion alignment combined with overlapped block motion alignment achieves up to 1.0 dB gain in 3D wavelet video coding. Our 3D wavelet coder outperforms MC-EZBC on most sequences by 1 to 2 dB and performs up to 1.5 dB better than H.264.
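The partition-mode decision described above follows the standard Lagrangian rate-distortion pattern, which can be sketched as below. The cost figures and function names here are hypothetical; a real encoder computes the distortion (e.g. SAD or SSD) and the actual bit cost of each candidate partition.

```python
# The seven partition modes available per direction, as listed in the abstract.
PARTITIONS = ["16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]

def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

def select_partition(costs, lam):
    """Pick the partition mode minimizing J = D + lambda * R.
    `costs` maps mode -> (distortion, rate) for that candidate."""
    return min(costs, key=lambda mode: rd_cost(*costs[mode], lam))
```

At small lambda (high target quality), fine partitions with low distortion but high motion-vector rate win; at large lambda, coarse partitions with cheaper motion information are preferred, which is why the RD criterion, rather than raw alignment accuracy, drives the mode choice.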