Owing to its open-loop structure and good decorrelation capability, motion-compensated temporal filtering (MCTF) provides a robust basis for highly efficient scalable video coding. Combining MCTF with spatial wavelet decomposition and embedded quantization yields a 3D wavelet video compression system that provides temporal, spatial, and SNR scalability. Recent results indicate that the overall coding performance of these systems is maximized when temporal filtering is performed in the spatial domain (the t+2D approach). However, compared to non-scalable video coding, the performance of t+2D systems may be unsatisfactory when spatial scalability must be provided. One important reason for this is the problem of spatial scalability of the motion information. In this paper we present a conceptually new approach to t+2D-based video compression with spatially scalable
motion information. We call our approach overcomplete MCTF, since multiple spatial-domain temporal filtering operations are needed to generate the lower spatial scales of the temporal subbands. Specifically, the encoder performs MCTF-based generation of reference sequences for the coarser spatial scales. We find that the newly generated reference sequences are of satisfactory quality. Compared to the conventional t+2D system, our approach allows the reconstruction quality at lower spatial scales to be optimized while limiting the impact on the reconstruction quality at high spatial scales/bitrates.
In contrast to predictive schemes such as hybrid video coding systems,
orthonormal transform coding systems are immune to error accumulation
when encoder and decoder become desynchronized. These systems
therefore allow drift-free data adaptation at the bit-stream level,
and thus scalability. In t+2D interframe wavelet video coding,
wavelet-based motion-compensated temporal filtering is employed,
followed by spatial wavelet decomposition and bit plane coding. This
allows for temporal, spatial, and SNR scalability. While motion
compensation seems to be essential in this scheme to achieve excellent
coding performance, it causes local violation of the orthonormality of
the temporal transform. In particular, motion-compensated interframe
wavelet systems employ predictive coding for certain occlusion areas;
in the case of a reference mismatch between encoder and decoder,
errors accumulate in these regions. In this paper we present an
approach that adapts the encoder operating point for predictively
coded regions, effectively eliminating the reference mismatch.
An iterative algorithm for computing the decoder
reference at the encoder side is presented for t+2D systems. We show
that this approach significantly increases overall coding performance,
gaining up to 1 dB in PSNR. Furthermore, the optimized quantization
algorithm presented in an earlier work can be applied more
effectively, leading to more even noise distribution.
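The iterative reference computation can be pictured as a fixed-point loop: the encoder repeatedly re-derives the reference the decoder will reconstruct from the quantized subbands and predicts from that reference until the two agree. The sketch below is a toy model only, using a motionless Haar lifting pair as the temporal transform and a uniform quantizer as a stand-in for embedded coding; all names and the quantizer model are illustrative assumptions, not the algorithm of the paper.

```python
import numpy as np

def quantize(x, step):
    """Uniform mid-tread quantizer, a toy stand-in for embedded bit-plane coding."""
    return np.round(x / step) * step

def compute_decoder_reference(ref, cur, step=8.0, n_iter=8):
    """Toy fixed-point loop (names illustrative, not the paper's):
    predict from the current reference estimate, quantize the temporal
    subbands, invert the lifting as the decoder would, and take the
    result as the next reference estimate."""
    ref_enc = ref.astype(float)
    for _ in range(n_iter):
        h = cur - ref_enc                 # predict step (temporal highpass)
        l = ref_enc + 0.5 * h             # update step (temporal lowpass)
        # the reference the decoder actually reconstructs from quantized data
        ref_dec = quantize(l, step) - 0.5 * quantize(h, step)
        if np.max(np.abs(ref_dec - ref_enc)) < 1e-9:
            break                          # fixed point: no reference mismatch
        ref_enc = ref_dec
    return ref_enc
```

At the fixed point, the reference used for prediction at the encoder equals the decoder's reconstruction, so the residual for predictively coded regions carries no drift.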
In interframe wavelet video coding, wavelet-based motion-compensated temporal filtering (MCTF) is combined with spatial wavelet decomposition, allowing for efficient spatio-temporal decorrelation as well as temporal, spatial, and SNR scalability. Contemporary interframe wavelet video coding concepts employ block-based motion estimation (ME) and compensation (MC) to exploit temporal redundancy between successive frames. Due to occlusion effects and imperfect motion modeling, block-based MCTF may generate temporal high-frequency subbands with block-wise varying coefficient statistics, and low-frequency subbands with block edges. Both effects may reduce the spatial transform gain and cause blocking artifacts. As a modification to MCTF, we present spatial highpass transition filtering (SHTF) and spatial lowpass transition filtering (SLTF), which introduce smooth transitions between motion blocks in the high- and low-frequency subbands, respectively. Additionally, we analyze the propagation of quantization noise in MCTF and present an optimized quantization strategy that compensates for variations in synthesis filtering across different block types. Combining these approaches leads to a reduction of blocking artifacts, smoother temporal PSNR behavior, and significantly improved coding efficiency.
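The idea of smooth transitions between motion blocks can be illustrated by cross-fading extended per-block signals with a window whose overlapping ramps sum to one, so flat regions pass through unchanged while block edges are blended. The raised-cosine window and the helper names below are illustrative assumptions; they are not the SHTF/SLTF filters of the paper.

```python
import numpy as np

def transition_window(block_size, overlap):
    """1-D weight: flat inside the block, raised-cosine ramps at both ends.
    Adjacent ramps satisfy ramp[i] + ramp[::-1][i] == 1, so overlapped
    windows sum to one across block boundaries."""
    w = np.ones(block_size + 2 * overlap)
    ramp = 0.5 - 0.5 * np.cos(np.pi * (np.arange(overlap) + 0.5) / overlap)
    w[:overlap] = ramp
    w[-overlap:] = ramp[::-1]
    return w

def blend_predictions(preds, block, overlap):
    """Cross-fade per-block 1-D signals (each extended by `overlap` samples
    on both sides) into one signal with smooth inter-block transitions."""
    n = len(preds) * block
    out = np.zeros(n + 2 * overlap)
    wsum = np.zeros_like(out)
    w = transition_window(block, overlap)
    for k, p in enumerate(preds):
        s = k * block
        out[s:s + block + 2 * overlap] += w * p
        wsum[s:s + block + 2 * overlap] += w
    core = slice(overlap, overlap + n)     # drop the outer half-ramps
    return out[core] / wsum[core]
```

Because the window ramps are complementary, a signal that is identical in neighboring blocks is reproduced exactly; only genuinely differing block predictions are smoothed at their common edge.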
To exploit temporal interdependencies between consecutive frames, existing 3D wavelet video coding concepts employ block-wise motion estimation (ME) and compensation (MC). Because of local object motion, rotation, or scaling, the processing of occlusion areas is problematic: in these regions, correct motion vectors (MVs) cannot always be calculated, and blocking artifacts may appear at the motion boundaries to the connected areas, for which uniquely referenced MVs can be estimated. To avoid this, smooth transitions can be introduced around the occlusion pixels, blurring out the block artifacts. The proposed algorithm is based on the MC-EZBC 3D wavelet video coder (Motion-Compensated Embedded video coding algorithm using ZeroBlocks of subband/wavelet coefficients and Context modeling), which employs a lifting approach for temporal filtering.
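The lifting approach splits the motion-compensated temporal transform into a predict step (forming the highpass frame as an MC difference) and an update step (forming the lowpass frame along the same motion paths), and is invertible by reversing the steps. The 1-D sketch below uses a Haar lifting pair with wrap-around integer motion so the pixel mapping stays one-to-one; real occlusions break this assumption, which is exactly where the unconnected-pixel handling discussed above is needed. All names are illustrative, not MC-EZBC code.

```python
import numpy as np

def mctf_lift_haar(ref, cur, mv):
    """Toy 1-D motion-compensated Haar lifting step.
    mv[i] is an integer displacement mapping sample i of `cur` to `ref`;
    wrap-around indexing keeps the map a permutation (no occlusions)."""
    n = len(cur)
    src = (np.arange(n) - mv) % n
    h = cur - ref[src]                 # predict: MC temporal highpass
    l = ref.astype(float)
    l[src] += 0.5 * h                  # update: lowpass along the same motion
    return h, l

def mctf_inverse(h, l, mv):
    """Invert the lifting by undoing update, then predict."""
    n = len(h)
    src = (np.arange(n) - mv) % n
    ref = l.astype(float)
    ref[src] -= 0.5 * h
    cur = h + ref[src]
    return ref, cur
```

As long as the motion mapping is one-to-one, the scheme reconstructs both frames perfectly; occlusion pixels violate this, motivating the predictive handling of unconnected regions described in the abstracts above.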