Reduced representations have been used to decrease the memory bandwidth requirements of fast motion estimation schemes. Usually, this is achieved on special-purpose architectures that exploit the reduced representations to perform several distortion calculations in parallel. In this paper, we present a generic fast implementation that is suitable for various general-purpose architectures. The algorithm uses a novel data structure based on packing and 'overlapping' the reduced representation data into the native word size of the processor. We develop efficient motion estimation schemes that exploit this data structure to minimize the memory bandwidth between the processor and the cache. These schemes can be tailored with ease to suit different general-purpose processors and media processors.
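The packing idea can be illustrated with the simplest reduced representation, a one-bit-per-pixel transform (an assumption for illustration; the paper's representation may use more bits). A minimal sketch: pack a row of binary pixels into one processor word, then compute the number-of-mismatches distortion for a whole row at once with a single XOR and a population count, instead of one comparison per pixel.

```python
def pack_bits(bits):
    """Pack a list of 0/1 reduced-representation pixels into one integer word."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word

def hamming_distortion(word_a, word_b):
    """Number-of-mismatches distortion for two packed rows, computed
    word-at-a-time: XOR marks differing pixels, popcount tallies them."""
    return bin(word_a ^ word_b).count("1")

# Two 8-pixel binary rows compared with one word operation instead of eight.
row_a = pack_bits([1, 0, 1, 0, 1, 0, 1, 0])
row_b = pack_bits([1, 1, 1, 1, 0, 0, 0, 0])
d = hamming_distortion(row_a, row_b)  # pixels 1, 3, 4, 6 differ
```

On real hardware the same effect comes from the processor's native XOR and popcount instructions on 32- or 64-bit words, which is what makes the distortion calculation parallel "for free" on general-purpose architectures.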
Compression and interpolation each require, given part of an image, or part of a collection or stream of images, the ability to predict other parts. Compression is achieved by transmitting part of the imagery along with instructions for predicting the rest of it; of course, the instructions are usually much shorter than the unsent data. Interpolation is just a matter of predicting part of the way between two extreme images; however, whereas in compression the original image is known at the encoder, so that the residual can be calculated, compressed, and transmitted, in interpolation the actual intermediate image is not known, so it is not possible to improve the final image quality by adding back the residual image. Practical 3D-video compression methods typically use a system with four modules: (1) coding one of the streams (the main stream) using a conventional method (e.g., MPEG), (2) calculating the disparity map(s) between corresponding points in the main stream and the auxiliary stream(s), (3) coding the disparity maps, and (4) coding the residuals. It is natural and usually advantageous to integrate motion compensation with the disparity calculation and coding. Efficient coding and transmission of the residuals is usually the only practical way to handle occlusions, and the ultimate performance of end-to-end systems is usually dominated by the cost of this coding. In this paper we summarize the background principles, explain the innovative features of our implementation steps, and provide quantitative measures of component and system performance.
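The distinction drawn above, that an encoder can correct its prediction with a residual while an interpolator cannot, can be sketched in a few lines (hypothetical helper names; the actual predictor would be disparity- or motion-compensated):

```python
def encode_residual(original, predicted):
    """Encoder side: the original is known, so the prediction error
    (residual) can be computed and transmitted alongside the prediction
    instructions."""
    return [o - p for o, p in zip(original, predicted)]

def decode_with_residual(predicted, residual):
    """Decoder side: adding the residual back recovers the original
    exactly (before any lossy coding of the residual itself)."""
    return [p + r for p, r in zip(predicted, residual)]

# An interpolator has only `predicted`; a compression decoder also
# receives `residual` and can reconstruct `original` exactly.
original  = [12, 15, 20, 18]
predicted = [10, 16, 19, 18]
residual  = encode_residual(original, predicted)
recovered = decode_with_residual(predicted, residual)
```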
A binocular disparity based segmentation scheme to compactly represent one image of a stereoscopic image pair given the other image was proposed earlier by us. That scheme adapted the excess bit count, needed to code the additional image, to the binocular disparity detail present in the image pair. This paper addresses the issue of extending such a segmentation in the temporal dimension to achieve efficient stereoscopic sequence compression. The easiest conceivable temporal extension would be to code one of the sequences using an MPEG-type scheme while the frames of the other stream are coded based on the segmentation. However, such independent compression of one of the streams fails to take advantage of the segmentation or the additional disparity information available. To achieve better compression by exploiting this additional information, we propose the following scheme. Each frame in one of the streams is segmented based on disparity. An MPEG-type frame structure is used for motion compensated prediction of the segments in this segmented stream. The corresponding segments in the other stream are encoded by reversing the disparity map obtained during the segmentation. Areas without correspondence in this stream, arising from binocular occlusions and disparity estimation errors, are filled in using a disparity-map based predictive error concealment method. Over a test set of several different stereoscopic image sequences, high perceived stereoscopic image quality was achieved at an excess bandwidth roughly 40% above that of a highly compressed monoscopic sequence. Stereo perception can be achieved at significantly smaller excess bandwidths, albeit with a perceivable loss in image quality.
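The disparity-map reversal and hole concealment steps can be sketched on a single scanline (a simplified one-dimensional illustration with hypothetical names; the paper's segments are two-dimensional and its concealment method is disparity-map based and predictive, not the nearest-neighbour fill used here):

```python
def predict_other_view(coded_row, segments):
    """Predict the other view's scanline by reversing per-segment
    disparities. `segments` is a list of (start, end, disparity) spans in
    the target view's coordinates (an assumed layout for illustration)."""
    out = [None] * len(coded_row)
    for start, end, d in segments:
        for x in range(start, end):
            src = x + d  # reverse the disparity shift
            if 0 <= src < len(coded_row):
                out[x] = coded_row[src]
    # Areas without correspondence (occlusions, estimation errors) remain
    # None; conceal them here by copying the nearest filled neighbour.
    for x in range(len(out)):
        if out[x] is None:
            out[x] = out[x - 1] if x > 0 and out[x - 1] is not None else 0
    return out

# Two segments: pixels 0-2 have disparity 1, pixels 3-4 have disparity 0.
row = predict_other_view([10, 20, 30, 40, 50], [(0, 3, 1), (3, 5, 0)])
```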
Stereoscopic image sequence transmission over existing monocular digital transmission channels, without seriously affecting the quality of one of the image streams, requires very low bit-rate coding of the additional stream. Fixed block-size disparity estimation schemes cannot achieve such low bit-rates without causing severe edge artifacts. Also, textureless regions lead to spurious matches that hamper the efficient coding of block disparities. In this paper, we propose a novel disparity-based segmentation approach to achieve an efficient partition of the image into regions of more or less fixed disparity. The partitions are edge based, in order to minimize the edge artifacts after disparity compensation. The scheme yields disparity fields that preserve disparity discontinuities yet are smoother and more accurate than those of fixed block-size schemes. The smoothness and the reduced number of block disparities lead to efficient coding of one image of a stereo pair given the other. The segmentation is achieved by performing a quadtree decomposition, with the disparity compensated error as the splitting criterion. The multiresolutional recursive decomposition offers a computationally efficient and non-iterative means of improving the disparity estimates while preserving the disparity discontinuities. The segmented regions can be tracked temporally to achieve very high compression ratios on a stereoscopic image stream.
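The quadtree decomposition with disparity-compensated error as the splitting criterion can be sketched as follows (the recursion structure is from the text; `error_fn`, the threshold, and the minimum block size are hypothetical parameters standing in for the paper's actual criterion):

```python
def quadtree_segment(error_fn, x, y, size, threshold, min_size, blocks):
    """Recursively split a square block while its best disparity-compensated
    error exceeds the threshold. `error_fn(x, y, size)` is assumed to return
    that error for the block at (x, y); leaves are collected in `blocks` as
    (x, y, size) tuples of roughly constant disparity."""
    if size <= min_size or error_fn(x, y, size) <= threshold:
        blocks.append((x, y, size))  # homogeneous enough: keep as one region
        return
    half = size // 2
    for dy in (0, half):          # split into four quadrants and recurse
        for dx in (0, half):
            quadtree_segment(error_fn, x + dx, y + dy, half,
                             threshold, min_size, blocks)

# Toy criterion: blocks larger than 2 pixels are "inhomogeneous".
toy_error = lambda x, y, s: 100 if s > 2 else 0
leaves = []
quadtree_segment(toy_error, 0, 0, 4, threshold=10, min_size=1, blocks=leaves)
```

Because each split is decided locally from the compensated error, the decomposition is non-iterative: a single recursive pass produces the segmentation.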
We exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image frame storage and image stream transmission. In particular, in image stream transmission, we can find extremely high correlations between left-right frames offset in time such that perspective-induced disparity between viewpoints and motion-induced parallax from a single viewpoint are nearly identical; we coin the term 'worldline correlation' for this condition. We test these ideas in two implementations: straightforward computation of blockwise cross-correlations, and multiresolution hierarchical matching using a wavelet-based compression method. We find that good 3D-stereoscopic imagery can be had for only a few percent more storage space or transmission bandwidth than is required for the corresponding flat imagery.
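The first, straightforward implementation can be sketched on a single scanline: slide a block of the right view over the left view within a search range and keep the shift that matches best. A minimal sketch, using sum of absolute differences as the matching score (an assumption for brevity; the paper scores with blockwise cross-correlation):

```python
def best_disparity(left, right, x0, block, search):
    """Find the horizontal shift d that best aligns the right-view block
    starting at x0 with the left view, within [-search, +search].
    Scored by sum of absolute differences (stand-in for cross-correlation)."""
    best_d, best_cost = 0, float("inf")
    for d in range(-search, search + 1):
        if x0 + d < 0 or x0 + d + block > len(left):
            continue  # candidate block falls outside the left image
        cost = sum(abs(left[x0 + d + i] - right[x0 + i])
                   for i in range(block))
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# The right view sees the same edge shifted left by 2 pixels, so the
# best-matching block in the left view sits 2 pixels to the right.
left = [0, 0, 5, 9, 5, 0, 0, 0]
right = [5, 9, 5, 0, 0, 0, 0, 0]
d = best_disparity(left, right, x0=0, block=4, search=3)
```

For frames offset in time, the same search applied across the time-offset pair is what locates the near-identical 'worldline-correlated' blocks described above.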