Distributed video coding (DVC) is an emerging video coding paradigm for systems that require low-complexity encoders supported by high-complexity decoders, as required, for example, in real-time video capture and streaming from one mobile phone to another. Under the assumption of an error-free transmission channel, the coding efficiency of current DVC systems is still below that of the latest conventional video codecs, such as H.264/AVC. To increase coding efficiency, we propose that either every second key frame or every Wyner-Ziv frame be downsampled by a factor of two in both dimensions prior to encoding and subsequent transmission. However, this necessitates upsampling coupled with interpolation at the decoder. Simple interpolation (e.g., a bilinear or bicubic filter) would not suffice because the high-frequency (HF) spatial image content would be missing. Instead, we propose the incorporation of a super-resolution (SR) technique based on example high-resolution images whose content is specific to the low-resolution scene whose HF content is to be recovered. This example-based, scene-specific SR technique adds computational complexity to the decoder side of the DVC system, which is allowable within the DVC framework. Rate-distortion curves show that this novel combination of SR and DVC improves the system's peak signal-to-noise ratio (PSNR) performance by up to several decibels and can actually exceed the performance of the H.264/AVC codec with GOP = IP for some video sequences.
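A small numerical sketch (ours, not the paper's code) of why plain interpolation is insufficient after downsampling: a 1-D signal is decimated by two and reconstructed by linear interpolation; the reconstruction error is negligible for low-frequency content but large for high-frequency content, which is exactly the HF loss the SR step is meant to repair.

```python
# Illustration only: downsample a 1-D signal by 2, reconstruct by linear
# interpolation, and compare the error for smooth vs. detailed content.
import math

def downsample2(x):
    return x[::2]

def upsample_linear(x):
    # Insert the average of each pair of neighbours between the samples.
    out = []
    for a, b in zip(x, x[1:]):
        out.extend([a, (a + b) / 2.0])
    out.append(x[-1])
    out.append(x[-1])  # repeat the last sample to restore the original length
    return out

def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

n = 64
smooth = [math.sin(2 * math.pi * 1 * i / n) for i in range(n)]   # low frequency
detail = [math.sin(2 * math.pi * 14 * i / n) for i in range(n)]  # high frequency

err_smooth = mse(smooth, upsample_linear(downsample2(smooth)))
err_detail = mse(detail, upsample_linear(downsample2(detail)))
assert err_detail > 10 * err_smooth  # HF content is lost by plain interpolation
```

The same effect, in two dimensions, is what makes bilinear or bicubic upsampling alone inadequate at the DVC decoder.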
Distributed video coding (DVC) is an emerging video coding paradigm for systems that require low-complexity encoders supported by high-complexity decoders. A typical real-world application of a DVC system is a mobile phone whose video-capture hardware has limited encoding capability, supported by a base station with high decoding capability. Generally speaking, a DVC system operates by dividing a source image sequence into two streams: key frames and Wyner-Ziv (W) frames. The key frames represent the source and are used to form an approximation to the W frames called S frames (where S stands for side information), while the W frames are used to correct the bit errors in the S frames. This paper presents an effective algorithm for reducing the bit errors in the side information of a DVC system. The algorithm is based on maximum-likelihood estimation to help predict future bits to be decoded. The reduction in bit errors in turn reduces the number of parity bits needed for error correction; thus, a higher coding efficiency is achieved because fewer parity bits need to be transmitted from the encoder to the decoder. The algorithm is called inter-bit prediction because it predicts the bit-plane to be decoded from previously decoded bit-planes, one bit-plane at a time, starting from the most significant bit-plane. Experiments using real-world image sequences show that the inter-bit prediction algorithm does indeed reduce the bit rate, by up to 13% for our test sequences. This bit-rate reduction corresponds to a PSNR gain of about 1.6 dB for the W frames.
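One way to picture the inter-bit idea (a hedged sketch based on our reading of the abstract, not the authors' algorithm): the bit-planes already decoded confine the true pixel value to an interval, and a maximum-likelihood guess for the next bit picks whichever half-interval lies closer to the co-located side-information value, assuming a noise model peaked at zero error.

```python
# Hypothetical sketch: predict the next bit-plane of a pixel from the
# already-decoded most-significant bits and the side-information value.

def predict_bit(decoded_msbs, plane, side_value, depth=8):
    """decoded_msbs: bits already decoded, MSB first.
    plane: index (0 = MSB) of the bit to predict."""
    base = 0
    for i, b in enumerate(decoded_msbs):
        base |= b << (depth - 1 - i)
    step = 1 << (depth - 1 - plane)  # weight of the bit being predicted
    # Midpoints of the two candidate intervals (next bit = 0 or 1);
    # the ML guess is the interval closer to the side information.
    mid0 = base + step // 2
    mid1 = base + step + step // 2
    return 0 if abs(side_value - mid0) <= abs(side_value - mid1) else 1

# True pixel value 150 = 10010110b; two MSBs (1, 0) are decoded and the
# side information is 148, close to the truth.
bit = predict_bit([1, 0], plane=2, side_value=148)
assert bit == 0  # matches the true third bit of 150
```

A correct prediction means the channel decoder needs fewer parity bits to fix the remaining errors, which is the source of the reported rate saving.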
This paper presents a fast implementation of a wavelet-based video codec. The codec consists of motion-compensated temporal filtering (MCTF), a 2-D spatial wavelet transform, and SPIHT for wavelet coefficient coding. It offers compression efficiency that is competitive with H.264. The codec is implemented in software running on a general-purpose PC, using the C programming language and Streaming SIMD Extensions (SSE) intrinsics, without assembly language. This high-level software implementation allows the codec to be ported to other general-purpose computing platforms. Testing on a Pentium 4 HT at 3.6 GHz (running under Linux and using the GCC compiler, version 4) shows that the software decoder is able to decode 4CIF video in real time, more than twice as fast as software written in C alone. This paper describes the structure of the codec, the fast algorithms chosen for the most computationally intensive elements of the codec, and the use of SIMD to implement these algorithms.
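The temporal lifting at the heart of MCTF can be summarized in a few lines. The sketch below (ours, in Python for readability; the codec itself uses C with SSE intrinsics, and it omits the motion compensation) shows the Haar predict/update steps applied to two co-located pixel rows, with integer arithmetic and perfect reconstruction.

```python
# Illustrative Haar lifting between two frames (no motion compensation).
def mctf_haar(frame_a, frame_b):
    # Predict step: high-pass band is the inter-frame difference.
    high = [b - a for a, b in zip(frame_a, frame_b)]
    # Update step: low-pass band is the first frame plus half the high-pass.
    low = [a + h // 2 for a, h in zip(frame_a, high)]
    return low, high

def mctf_haar_inverse(low, high):
    # Undo the lifting steps in reverse order; integer-exact.
    frame_a = [l - h // 2 for l, h in zip(low, high)]
    frame_b = [h + a for h, a in zip(high, frame_a)]
    return frame_a, frame_b

a = [10, 20, 30, 40]
b = [12, 18, 33, 39]
low, high = mctf_haar(a, b)
assert mctf_haar_inverse(low, high) == (a, b)  # perfect reconstruction
```

Because each lifting step is an identical elementwise operation over a whole row, it vectorizes naturally, which is what the SSE implementation exploits.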
All existing video coding standards are based on block-wise motion compensation and the block-wise DCT. At high levels of quantization, block-wise motion compensation and transform coding produce blocking artifacts in the decoded video, a form of distortion to which the human visual system is very sensitive. The latest video coding standard, H.264/AVC, introduces a deblocking filter to reduce these blocking artifacts. However, visible distortion remains after filtering when the result is compared to the original video. In this paper, we propose a non-conventional filter to further reduce the distortion and to improve decoded picture quality. Unlike conventional filters, the proposed filter is based on a machine-learning algorithm (the decision tree). The decision trees are used to classify the filter's inputs and select the best filter coefficients for each input. Experimental results with the 4 × 4 DCT indicate that the filter holds promise for improving the quality of H.264/AVC video sequences.
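The classify-then-filter structure can be sketched in a few lines. Everything below is hypothetical (the class names, thresholds, and coefficients are ours, chosen for illustration; the paper's trees are learned from data), but it shows the mechanism: a decision tree on local features selects which filter coefficients to apply, rather than applying one fixed deblocking filter everywhere.

```python
# Hypothetical sketch: a decision tree selects per-pixel filter coefficients.
FILTERS = {
    "flat": [0.25, 0.5, 0.25],  # strong smoothing for flat areas
    "edge": [0.0, 1.0, 0.0],    # pass-through to preserve real edges
    "mild": [0.1, 0.8, 0.1],    # light smoothing otherwise
}

def classify(left, centre, right, t_flat=2, t_edge=20):
    # A two-split decision tree on the local gradient magnitude.
    g = max(abs(centre - left), abs(right - centre))
    if g <= t_flat:
        return "flat"
    if g >= t_edge:
        return "edge"
    return "mild"

def filter_pixel(left, centre, right):
    c = FILTERS[classify(left, centre, right)]
    return c[0] * left + c[1] * centre + c[2] * right

# A genuine edge is left untouched; small blocking noise is smoothed out.
assert filter_pixel(10, 100, 100) == 100.0
assert filter_pixel(50, 52, 50) == 51.0
```

The appeal of the learned tree over hand-set thresholds like these is that the splits and coefficients are fit to minimize the residual distortion on training video.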
We investigate the issue of efficient data organization and representation of the curved wavelet coefficients [curved wavelet transform (WT)]. We present an adaptive zero-tree structure that exploits the cross-subband similarity of the curved wavelet transform. Whereas in the embedded zero-tree wavelet (EZW) coder and set partitioning in hierarchical trees (SPIHT) the parent-child relationship is defined such that a parent has four children restricted to a square of 2×2 pixels, the parent-child relationship in the adaptive zero-tree structure varies according to the curves along which the curved WT is performed. Five child patterns were determined based on different combinations of curve orientations. A new image coder was then developed based on this adaptive zero-tree structure and the set-partitioning technique. Experimental results using synthetic and natural images showed the effectiveness of the proposed adaptive zero-tree structure for encoding the curved wavelet coefficients. The coding gain of the proposed coder can be up to 1.2 dB in terms of peak SNR (PSNR) compared to the SPIHT coder. Subjective evaluation shows that the proposed coder preserves lines and edges better than the SPIHT coder.
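For background, the zero-tree test that both EZW/SPIHT and the adaptive structure rely on is simple to state in code. The sketch below (our illustration, not the paper's coder) represents the parent-child relation as an explicit map, which is exactly the degree of freedom the adaptive structure exploits: only the child lists change with the curve orientation, not the significance test itself.

```python
# Standard zero-tree significance test over an explicit parent-child map.
def is_zerotree(coeff, tree, node, T):
    """True if `node` and all its descendants are insignificant w.r.t. T."""
    if abs(coeff[node]) >= T:
        return False
    return all(is_zerotree(coeff, tree, c, T) for c in tree.get(node, []))

coeff = {0: 1, 1: 0, 2: 3, 3: 1}
tree = {0: [1, 2], 2: [3]}  # child lists could follow the five curve patterns
assert is_zerotree(coeff, tree, 0, T=4)      # everything below threshold 4
assert not is_zerotree(coeff, tree, 0, T=2)  # node 2 has magnitude 3 >= 2
```

A single zero-tree symbol then stands in for the whole insignificant subtree, which is where the coding gain comes from when the tree actually follows the image's curves.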
Recursive wavelet filters and an alternative algorithm for implementing the wavelet transform are presented in this paper. The recursive filters use previously calculated (past) wavelet coefficients as inputs to calculate the current wavelet coefficient, and provide the same transform results as convolutional FIR and lifting wavelet filters. The coefficients of the recursive filters are derived from those of conventional FIR wavelet filters. The wavelet transform with recursive filters requires a smaller amount of memory and is easy to implement in hardware. Another important advantage of the recursive filters is that perfect reconstruction can be easily achieved when the sequence of pixels to be transformed is extended by boundary pixel repetition, which can be more efficient than the widely used symmetric extension for image and video coding.
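A flavour of both ideas, reuse of past coefficients and boundary pixel repetition, can be seen in the standard integer 5/3 lifting transform (this is our own sketch of that well-known transform, not the authors' recursive filter derivation): the low-pass output s[k] reuses the already-computed high-pass coefficients d[k-1] and d[k], and repeating the boundary pixel still gives exact reconstruction because each lifting step is undone with the same extension.

```python
# Integer CDF 5/3 lifting with boundary repetition (even-length input).
def cdf53_forward(x):
    even, odd = x[::2], x[1::2]
    m = len(odd)
    ext = lambda arr, i: arr[min(max(i, 0), len(arr) - 1)]  # repeat boundary
    # Predict: high-pass d[k] from the neighbouring even samples.
    d = [odd[k] - (even[k] + ext(even, k + 1)) // 2 for k in range(m)]
    # Update: low-pass s[k] reuses past coefficients d[k-1] and d[k].
    s = [even[k] + (ext(d, k - 1) + d[k] + 2) // 4 for k in range(m)]
    return s, d

def cdf53_inverse(s, d):
    m = len(d)
    ext = lambda arr, i: arr[min(max(i, 0), len(arr) - 1)]
    even = [s[k] - (ext(d, k - 1) + d[k] + 2) // 4 for k in range(m)]
    odd = [d[k] + (even[k] + ext(even, k + 1)) // 2 for k in range(m)]
    x = [0] * (2 * m)
    x[::2], x[1::2] = even, odd
    return x

x = [3, 7, 1, 8, 2, 9, 4, 6]
s, d = cdf53_forward(x)
assert cdf53_inverse(s, d) == x  # perfect reconstruction with repetition
```

Reconstruction is exact by construction: each lifting step is inverted with the identical rounding and the identical boundary extension.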
The curved wavelet transform performs 1-D filtering along curves and exploits the orientation of edges and lines in an image to improve the compactness of the wavelet transform. This paper investigates the issue of efficient data organization and representation of the curved wavelet coefficients. We present an adaptive zero-tree structure that exploits the cross-subband similarity of the curved wavelet transform. The child positions in the adaptive zero-tree structure are not restricted to a square of 2×2 pixels; they vary with the curves along which the WT is performed. Five child patterns have been determined according to different combinations of curve orientations. A new image coder, using the curved wavelet transform, is then developed based on this adaptive zero-tree structure and the set-partitioning technique. Experimental results using synthetic and natural images show the effectiveness of the proposed adaptive zero-tree structure for encoding the curved wavelet coefficients. The coding gain of the proposed coder can be as high as 1.2 dB in terms of PSNR compared to the SPIHT coder.
Multi-dimensional rate control schemes, which jointly adjust two or three coding parameters, have been recently proposed to achieve a target bit rate while maximizing some objective measures of video quality. The objective measures used in these schemes are the peak signal-to-noise ratio (PSNR) or the sum of absolute errors (SAE) of the decoded video. These objective measures of quality may differ substantially from subjective quality, especially when changes of spatial resolution and frame rate are involved. The proposed schemes are, therefore, not optimal in terms of human visual perception. We have investigated the impact on subjective video quality of the three coding parameters: spatial resolution, frame rate, and quantization parameter (QP). To this end, we have conducted two experiments using the H.263+ codec and five video sequences. In Experiment 1, we evaluated the impact of jointly adjusting QP and frame rate on subjective quality and bit rate. In Experiment 2, we evaluated the impact of jointly adjusting QP and spatial resolution. From these experiments, we suggest several general rules and guidelines that can be useful in the design of an optimal multi-dimensional rate control scheme. The experiments also show that PSNR and SAE do not adequately reflect perceived video quality when changes in spatial resolution and frame rate are involved, and are therefore not adequate for assessing quality in a multi-dimensional rate control scheme. This paper describes the method and results of the investigation.
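For reference, the two objective measures discussed above are straightforward to compute per frame; the sketch below (our own helper, assuming 8-bit samples) defines both. The experiments' point is that neither number tracks perceived quality once spatial resolution or frame rate changes, even though both are exact and easy to optimize.

```python
# The two objective quality measures used by the rate-control schemes.
import math

def sae(ref, dec):
    # Sum of absolute errors over co-located samples.
    return sum(abs(r - d) for r, d in zip(ref, dec))

def psnr(ref, dec, peak=255):
    # Peak signal-to-noise ratio in dB for 8-bit samples by default.
    mse = sum((r - d) ** 2 for r, d in zip(ref, dec)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

ref = [100, 120, 140, 160]
dec = [101, 118, 141, 159]
assert sae(ref, dec) == 5
assert 40 < psnr(ref, dec) < 60
```

Note that comparing PSNR across different spatial resolutions already requires a choice (upsample the decoded frame, or downsample the reference), which is one reason the measure breaks down in the multi-dimensional setting.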
In low bit rate coding applications, high quantization levels might be needed to achieve a target bit rate. However, such high levels of quantization are likely to decrease picture quality. A possible solution is to reduce temporal resolution by dropping selected frames, for instance, thereby lessening the need for high quantization levels and thus improving video quality. Similarly, the spatial resolution of the encoded video could also be manipulated to achieve the target bit rate. Therefore, it might be possible to maximize picture quality by dynamically adjusting these three parameters while still meeting bit rate constraints. To do so effectively, the relationship between these parameters, alone or in combination, and subjective picture quality must be known. In this paper, we investigated the effect on subjective quality of: quantization alone (Experiment 1); a reduction in spatial resolution, either alone or combined with moderate levels of quantization (Experiment 2); and a reduction in temporal resolution, either alone or combined with moderate levels of quantization (Experiment 3). The results suggest that at very low bit rates, reductions in spatial or temporal resolution combined with moderate levels of quantization might be an effective means of reducing bit rate without further loss in video quality.
A field-sequential stereoscopic acquisition system based on off-the-shelf equipment and on in-house-developed software for interpolating fields to interlaced frames is described. The software relies on object-based image analysis, and the scheme is relatively robust across different types of scenes, including those with relatively fast motion and those with occluded and newly exposed areas. The off-the-shelf hardware consisted of a Sony DSR-PD1 with a Nu-View SX2000 Adapter. The adapter is a lens attachment that allows two views to be recorded: a view through the lens and a view displaced from the lens. Thus, a left-eye view is recorded in the odd (even) field and the right-eye view is recorded in the even (odd) field. After processing, the stereoscopic images could be played back at a 120 Hz field rate and viewed without flicker and with smooth motion using standard electronic shutter glasses synchronized with the display of the odd and even fields.
A hybrid algorithm for estimating true motion fields is proposed in this paper. This algorithm consists of three steps: block-based initial motion estimation, image segmentation, and object-based correction of wrong motion vectors. Hierarchical block-matching algorithms are improved for the initial motion estimation: the improved algorithm uses an adaptive technique to propagate motion vectors between hierarchical levels. It produces an accurate motion field everywhere except in areas of motion occlusion. To correct wrong motion vectors in the areas of motion occlusion, the current image is segmented into objects and an object-based method is proposed to process the estimated motion fields. With the object-based method, wrong motion vectors are detected by approximating the estimated motion field in each object with a motion model, and are corrected using an object-adaptive interpolator. The object-adaptive interpolator is also used to increase the density of the motion field. Experimental results show that the improved hierarchical block-matching algorithm outperforms conventional hierarchical block-matching algorithms. The proposed algorithm produces dense motion fields that are smooth within every object, discontinuous between objects of different motion, and very close to the true motion fields.
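The basic step that the hierarchical scheme refines is full-search block matching. The sketch below (our minimal illustration; the paper's algorithm adds the hierarchy, adaptive vector propagation, and object-based correction) finds, for one block of the current frame, the displacement in the reference frame that minimizes the sum of absolute differences (SAD).

```python
# Minimal full-search block matching on tiny integer frames.
def sad(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, y, x, n):
    return [row[x:x + n] for row in frame[y:y + n]]

def best_vector(cur, ref, y, x, n=2, r=1):
    """Best (dy, dx) within search radius r for the n-by-n block at (y, x)."""
    target = block(cur, y, x, n)
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= len(ref) - n and 0 <= xx <= len(ref[0]) - n:
                cost = sad(target, block(ref, yy, xx, n))
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]

ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
cur = [[9, 8, 0, 0],  # the 2x2 patch moved up-left, i.e. it sits at (1, 1) in ref
       [7, 6, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
assert best_vector(cur, ref, 0, 0) == (1, 1)
```

Minimizing SAD alone yields wrong vectors precisely in occluded areas, where no matching content exists in the reference frame, which is the failure mode the object-based correction step targets.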
In natural video sequences, object movement causes regions to be covered or uncovered. Conventional algorithms for region-based motion estimation do not take these regions fully into account, and uncovered regions seriously decrease the accuracy of motion estimation. This paper presents an algorithm for increasing motion-estimation accuracy. The algorithm detects uncovered regions and uses them to improve image segmentation. Experimental results show that the presented algorithm is effective in reducing the displaced frame difference, without introducing any extra information for coding applications.
Global motion estimation and compensation are important research issues in video compression. The main difficulty in global motion estimation resides in the disturbance caused by independently moving objects. The algorithm presented in this paper exploits global motion information not only from stationary objects and the image background, but also from independently moving objects. Simulation results show that the new algorithm is more robust to the disturbance of independently moving objects, and computationally faster, than an algorithm based on least-squares approximation.
This paper presents an approach to realistic motion field estimation. In this approach, an image is first segmented into homogeneous regions using a new multiscale gradient algorithm followed by the watershed transformation. The multiscale gradient algorithm efficiently solves the over-segmentation problem of the watershed transformation, increases segmentation accuracy, and reduces the computational cost. The motion field is then estimated using block-matching with a consistency constraint. The consistency constraint function is defined by the neighboring motion vectors and the segmentation map. Simulation results show that the motion fields generated by block-matching with the consistency constraint are very smooth within each object, approaching realistic motion fields, even when a small block size is used.
In this paper, the performances of basic morphological filters are evaluated. Very simple output distribution expressions for erosions and openings are given for independent, non-identically distributed inputs. The output means and variances for input signals plus white Gaussian, bi-exponential, and uniform noise are then analyzed and computed. These results are used to compare the performances of morphological filters with those of median filters, α-trimmed mean filters, ranked-order filters, and running mean filters. The comparisons show that morphological filters achieve the best edge preservation for all three kinds of noise. As regards noise suppression, morphological filters are best for uniform noise, median filters are optimal for bi-exponential noise, and running mean filters are optimal for Gaussian noise. The performances of α-trimmed mean filters lie between those of median and linear filters, while the performances of ranked-order filters are compromises between those of erosions (or dilations) and median filters.
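The edge-preservation property is easy to demonstrate concretely. The sketch below (our example, not part of the paper's statistical analysis) applies a grey-scale opening with a flat length-3 structuring element to a 1-D signal: a positive noise spike is removed while a step edge passes through exactly, something a running mean cannot do.

```python
# Grey-scale morphology on a 1-D signal with a flat length-k window.
def erode(x, k=3):
    h = k // 2
    return [min(x[max(i - h, 0):i + h + 1]) for i in range(len(x))]

def dilate(x, k=3):
    h = k // 2
    return [max(x[max(i - h, 0):i + h + 1]) for i in range(len(x))]

def opening(x, k=3):
    # Erosion followed by dilation: removes positive structures
    # narrower than the structuring element.
    return dilate(erode(x, k), k)

step_with_spike = [0, 0, 9, 0, 0, 5, 5, 5]  # spike at index 2, edge at index 5
opened = opening(step_with_spike)
assert opened == [0, 0, 0, 0, 0, 5, 5, 5]   # spike removed, edge intact
```

The dual operation (closing, i.e., dilation followed by erosion) removes narrow negative spikes in the same way; applying both gives symmetric impulse-noise suppression.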