1 August 2011 Texture video-assisted motion vector predictor for depth map coding
A texture video-assisted motion vector predictor for depth map coding is proposed in this letter. Based on an analysis of the motion similarity between texture videos and their corresponding depth maps, the proposed approach jointly uses the motion vectors of the texture videos and the median predictor, and selects the optimal predicted motion vector for depth map coding with a rate-distortion (R-D) criterion. Experimental results demonstrate that, compared with the median predictor used in H.264/AVC, the proposed method achieves maximum and average bit rate savings of 4.89% and 3.68%, respectively, while maintaining the quality of the synthesized virtual views.



Three-dimensional video (3DV) has attracted attention as an emerging medium. Owing to limited transmission bandwidth, efficient compression of 3DV, which consists of texture video coding and depth map coding, is a key enabling factor for its applications. In Ref. 1, texture video compression was studied by exploiting inter-view correlations between different viewpoints in addition to temporal and spatial correlations. Many recent works have also focused on depth map coding.2,3

In Ref. 2, the motion estimation process was skipped by directly reusing the motion vectors (MVs) of texture videos, thus reducing the coding cost for depth maps, while the method in Ref. 3 generated three candidate modes and MVs for depth map coding from the motion information of texture videos. Inspired by these works, this letter uses the motion information of texture videos to improve depth map coding efficiency.

In H.264/AVC, to improve the coding efficiency of inter prediction, the median of the MVs of neighboring coded blocks is used to predict the MV of the current block, so the accuracy of the predicted motion vector (PMV) strongly affects the coding efficiency of motion-compensated prediction.4 Among the methods proposed for improving the predictive coding of MVs, the competition-based PMV selection method of Laroche et al.5 has been adopted into the Key Technology Area (KTA) because of its significant bit rate reduction for texture videos. Motivated by the MV-competition method,5 this letter proposes a texture video-assisted motion vector predictor for depth map coding.
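As an illustration of the median predictor mentioned above, the following minimal Python sketch takes the component-wise median of three neighboring MVs and forms the motion vector difference that would then be coded. The function names and the sample vectors are hypothetical, and the special cases of the standard (unavailable neighbors, 16×8 and 8×16 partitions) are deliberately left out.

```python
from statistics import median

def median_pmv(mv_left, mv_top, mv_topright):
    """Component-wise median of three neighboring MVs, each an (x, y) tuple.

    Simplified sketch: the special cases of H.264/AVC are not handled.
    """
    xs, ys = zip(mv_left, mv_top, mv_topright)
    return (median(xs), median(ys))

def mvd(mv, pmv):
    """Motion vector difference between the actual MV and its predictor."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])

# Hypothetical neighbor MVs, for illustration only.
pmv = median_pmv((4, -2), (6, 0), (5, 8))
print(pmv)               # (5, 0)
print(mvd((5, 1), pmv))  # (0, 1)
```

The closer the predictor is to the actual MV, the smaller the difference that has to be transmitted, which is why predictor accuracy matters for coding efficiency.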


Motion Correlation Analyses Between Texture Videos and Depth Maps


Motion Similarity Analysis between Texture Videos and Depth Maps

For an object covered by the same macroblock, the MVs of the texture video and the depth map should physically be the same. However, because of texture complexity, the texture video MVs do not always represent the physical displacement of the object. The MVs of the texture video and the depth map therefore coincide when both reflect the displacement of the same object. In other cases, either because the pixels in the coding block belong to different objects or, even when they lie within one object, because of texture complexity, the two MVs differ and may not even be similar, as shown in the following experiment.

In the experiment, texture videos and depth maps were encoded with H.264/AVC using four quantization parameters (QPs): 27, 32, 37, and 42. The MVs of the texture videos and depth maps were extracted for each 4×4 block and analyzed. An MV similarity criterion is defined in Eq. 1

Eq. 1

\begin{equation}
E_{mv}(i,j,k) = \left\{ \begin{array}{ll} 1, & \left\| \mathbf{MV}_t(i,j,k) - \mathbf{MV}_d(i,j,k) \right\| < T \\ 0, & \mathrm{otherwise} \end{array} \right. ,
\end{equation}
where MVt(i, j, k) and MVd(i, j, k) are the MVs for block (i, j) in the k’th picture of the texture video and the depth map, respectively, and T is a predefined threshold, set to 2. Note that T is used only in this motion similarity analysis and not in the following experiments. When ∥MVt(i, j, k)−MVd(i, j, k)∥<T, Emv is set to 1, meaning that the MVs of the two blocks are similar; otherwise Emv is set to 0, meaning that the MVs of the two blocks may differ considerably. The percentage of similar MVs of corresponding blocks between texture videos and depth maps was then computed via Eq. 2

Eq. 2

\begin{equation}
P_{td} = \frac{16 \times \sum_{k=1}^{N_f} \sum_{i=1}^{H/4} \sum_{j=1}^{W/4} E_{mv}(i,j,k)}{H \times W \times N_f},
\end{equation}
where Nf represents the total number of pictures and H×W represents the image size. All the Ptd values are shown in Table 1.

Table 1

Percentage of similar MVs between texture videos and depth maps.

Ptd (%)

| Sequences | QP = 27 | QP = 32 | QP = 37 | QP = 42 |
|---|---|---|---|---|
| Breakdancers view 02 | 59.02 | 67.80 | 76.36 | 84.60 |
| Breakdancers view 04 | 59.42 | 68.91 | 77.39 | 85.71 |
| Ballet view 02 | 75.64 | 80.84 | 86.70 | 90.12 |
| Ballet view 04 | 77.01 | 82.42 | 87.40 | 89.81 |
| Bookarrival view 08 | 77.47 | 81.39 | 83.89 | 86.35 |
| Bookarrival view 10 | 75.17 | 79.39 | 83.48 | 86.51 |

As seen from Table 1, Ptd increases with QP for every sequence. This is because depth map coding produces many more skipped macroblocks (MBs) and 16×16 modes than texture video coding,3 while in texture video coding the number of such MBs and modes grows at higher QPs, which raises the percentage of similar MVs between texture videos and depth maps. In summary, when coding depth maps, the texture video MVs of some blocks can be used as PMVs for the associated depth map blocks, whereas for blocks with relatively large MV differences the texture video MVs should not be used.
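The similarity measure of Eqs. 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the norm is taken to be Euclidean (the letter does not say which norm is used), the MV fields are given as nested lists indexed as mvs[k][i][j] with one (x, y) tuple per 4×4 block, and the sample data are invented.

```python
import math

T = 2  # threshold used in the motion similarity analysis

def e_mv(mv_t, mv_d, threshold=T):
    """Eq. 1: 1 if the texture and depth MVs differ by less than the threshold."""
    # Assumption: Euclidean norm of the difference vector.
    diff = math.hypot(mv_t[0] - mv_d[0], mv_t[1] - mv_d[1])
    return 1 if diff < threshold else 0

def p_td(texture_mvs, depth_mvs, threshold=T):
    """Eq. 2 expressed as a percentage of similar 4x4 blocks.

    Counting blocks directly is equivalent to the 16 / (H * W * N_f)
    normalization, since each picture holds (H / 4) * (W / 4) blocks.
    """
    similar = 0
    blocks = 0
    for frame_t, frame_d in zip(texture_mvs, depth_mvs):
        for row_t, row_d in zip(frame_t, frame_d):
            for mv_t, mv_d in zip(row_t, row_d):
                similar += e_mv(mv_t, mv_d, threshold)
                blocks += 1
    return 100.0 * similar / blocks

# Hypothetical MV fields: one picture with 2x2 blocks.
texture = [[[(0, 0), (4, 0)], [(1, 1), (8, 8)]]]
depth = [[[(0, 0), (0, 0)], [(2, 1), (8, 7)]]]
print(p_td(texture, depth))  # 75.0
```

In the toy data, three of the four block pairs differ by less than T, so 75% of the blocks count as similar.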


Accuracy Analysis of Employing Texture Video MV as PMV for Depth Map

We compared the accuracy of the median PMV and the texture video MV when each is used as the PMV for depth maps; the results are given in Table 2.

Table 2

Comparison results of motion vector difference.

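Ptm (%)

| Sequences | QP = 27 | QP = 32 | QP = 37 | QP = 42 |
|---|---|---|---|---|
| Breakdancers view 02 | 15.55 | 25.93 | 34.73 | 46.19 |
| Breakdancers view 04 | 18.93 | 28.68 | 42.26 | 53.50 |
| Ballet view 02 | 62.92 | 75.67 | 86.89 | 92.35 |
| Ballet view 04 | 63.28 | 79.51 | 87.86 | 92.63 |
| Bookarrival view 08 | 77.82 | 84.51 | 87.98 | 92.10 |
| Bookarrival view 10 | 72.03 | 77.83 | 85.05 | 90.54 |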

In the experiment, the MV of a depth map block is denoted as MVd, and the median PMV and the PMV based on the texture video MV are denoted as PMVm and PMVt, respectively. ∥MVDm∥ is the magnitude of the vector difference between MVd and PMVm, and ∥MVDt∥ is the magnitude of the vector difference between MVd and PMVt. In Table 2, Ptm is the percentage of blocks in the picture for which ∥MVDt∥−∥MVDm∥≤0, i.e., for which the texture-based PMV is at least as accurate as the median PMV. Table 2 shows that Ptm increases with QP for every sequence, which is mainly attributable to the larger number of skipped MBs and 16×16 modes in texture video coding at higher QPs. We therefore conclude that, for some depth map blocks, the MV of the corresponding texture video block is a more accurate PMV than the median PMV.
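A minimal sketch of how Ptm could be computed is given below. It assumes a Euclidean norm and flat per-block lists of (x, y) tuples; the function names and the sample vectors are hypothetical and not taken from the letter.

```python
import math

def norm(v):
    """Euclidean length of a 2-D vector (assumed norm)."""
    return math.hypot(v[0], v[1])

def p_tm(depth_mvs, texture_pmvs, median_pmvs):
    """Percentage of blocks where ||MVD_t|| - ||MVD_m|| <= 0.

    All three arguments are flat lists with one (x, y) tuple per block.
    """
    if not depth_mvs:
        raise ValueError("at least one block is required")
    wins = 0
    for mv_d, pmv_t, pmv_m in zip(depth_mvs, texture_pmvs, median_pmvs):
        mvd_t = norm((mv_d[0] - pmv_t[0], mv_d[1] - pmv_t[1]))
        mvd_m = norm((mv_d[0] - pmv_m[0], mv_d[1] - pmv_m[1]))
        if mvd_t - mvd_m <= 0:  # ties count in favor of the texture MV
            wins += 1
    return 100.0 * wins / len(depth_mvs)

# Hypothetical data for four blocks.
mv_d = [(4, 0), (0, 0), (2, 2), (6, -2)]
pmv_t = [(4, 0), (3, 0), (2, 1), (0, 0)]
pmv_m = [(2, 0), (0, 0), (2, 4), (6, -2)]
print(p_tm(mv_d, pmv_t, pmv_m))  # 50.0
```

Because the condition is "less than or equal to zero", blocks where both predictors are equally accurate are counted for the texture-based PMV.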


Texture Video-Assisted Motion Vector Predictor for Depth Map Coding

Motivated by the analyses in Sec. 2, a texture video-assisted motion vector predictor for depth map coding is proposed. In the proposed algorithm, both the median PMV and the texture video MVs are used to predict MVs for depth maps.

To choose between the median PMV and the texture video MV-based PMV for depth maps, the rate-distortion optimization criterion5 is employed to select the PMV with the minimum rate-distortion (R-D) cost. The Lagrangian cost function is given by

Eq. 3

\begin{equation}
J(\mathbf{S},\mathbf{I}|\lambda) = D(\mathbf{S},\mathbf{I}) + \lambda R(\mathbf{S},\mathbf{I}),
\end{equation}
where S is the depth map data, I is the coding parameter set including coding mode, motion information, etc., λ indicates the Lagrangian multiplier, D(S, I) and R(S, I) are the total distortion and rate, respectively, resulting from the coding of S with a particular combination of coding options I. It should be noted that the selection of PMV is included in I. The optimal PMV can be obtained via

Eq. 4

\begin{equation}
\mathbf{PMV}^* = \arg\min \{ J[\mathbf{S},\mathbf{I}(\mathbf{PMV}_i |_{i \in \{t,m\}})|\lambda] \}.
\end{equation}
Similar to the MV-competition method in Ref. 5, the selected optimal PMV gives the minimum R-D cost for skip mode, while for the other modes it guarantees the minimum motion vector difference.
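The selection in Eqs. 3 and 4 can be illustrated for the non-skip case, where the distortion does not depend on the predictor and the cost reduces to the bits spent on the motion vector difference. The sketch below is an assumption-laden illustration rather than the letter's implementation: it uses the signed Exp-Golomb code length as a rate proxy (the experiments in the letter use CABAC), ignores any bits needed to signal which predictor was chosen, omits skip mode, and breaks ties in favor of the median PMV. All function names are hypothetical.

```python
def se_bits(v):
    """Length in bits of the signed Exp-Golomb code se(v) for integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1

def mvd_bits(mv, pmv):
    """Rate proxy: bits for both components of the MV difference."""
    return se_bits(mv[0] - pmv[0]) + se_bits(mv[1] - pmv[1])

def select_pmv(mv_d, pmv_t, pmv_m, distortion=0.0, lam=1.0):
    """Pick the candidate with the smaller J = D + lambda * R.

    For non-skip modes D is the same for both candidates, so the choice
    is driven by the MVD rate. "m" is listed first so ties go to the median.
    """
    candidates = {"m": pmv_m, "t": pmv_t}
    costs = {name: distortion + lam * mvd_bits(mv_d, pmv)
             for name, pmv in candidates.items()}
    best = min(costs, key=costs.get)
    return best, candidates[best], costs[best]

# Hypothetical integer MVs, for illustration only.
print(select_pmv((5, 1), pmv_t=(5, 0), pmv_m=(2, 3)))  # ('t', (5, 0), 4.0)
```

In this example the texture video MV is one unit away from the depth MV and costs 4 bits, while the median PMV costs 10 bits, so the texture-based candidate is selected.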


Experimental Results and Analyses

Experiments were carried out using the three 3DV sequences listed in Table 3. For each sequence, the left and right views were used to synthesize the virtual view at the position of the central view. Experiments were performed on the KTA test platform version 2.6r1 (KTA2.6r1). Context-based adaptive binary arithmetic coding (CABAC) and four QPs (27, 32, 37, 42) were used.

Table 3

Sequence parameters.

| Sequences | Resolution | Left view | Central view | Right view |
|---|---|---|---|---|
| Breakdancers | 1024×768 | view 02 | view 03 | view 04 |
| Ballet | 1024×768 | view 02 | view 03 | view 04 |
| Bookarrival | 1024×768 | view 10 | view 09 | view 08 |

The performance of the proposed method, H.264/AVC, and the MV-competition (temporal PMV + median PMV) method5 is compared in Table 4. In Table 4, the Bjontegaard delta (BD) bit rate6 denotes the average bit rate saving in percent at the same coding quality, while the BD peak signal-to-noise ratio (PSNR)6 indicates the average PSNR increase at the same bit rate. As seen from Table 4, the average BD bit rate is −3.68% compared with H.264/AVC. Table 4 also gives the percentage of blocks for which the texture video MV was selected as the optimal PMV, separately for skip and 16×16 MBs and for the other block types in depth map coding. In depth map coding, block types other than skip and 16×16 MBs mostly occur at object edges with discontinuous depth values, where the MVs of neighboring blocks differ considerably. For these block types, the texture video MV is therefore often more accurate than the median PMV and is selected as the optimal PMV for depth maps, which saves bits for coding the MV difference. Moreover, Table 4 shows that the proposed method outperforms the MV-competition method, with BD bit rates between −0.74% and −1.50%.
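For readers unfamiliar with the BD bit rate, the following sketch shows the commonly used form of the calculation: a cubic polynomial is fitted to the logarithm of the bit rate as a function of PSNR for each codec, both fits are averaged over the common PSNR range, and the difference is converted back to a percentage. It assumes exactly four rate-distortion points per curve (one per QP, with distinct PSNR values), so the cubic fit reduces to interpolation. The rate-distortion points below are invented for illustration and are not results from the letter.

```python
import math

def _cubic_through(xs, ys, x):
    """Evaluate at x the cubic interpolating four (xs, ys) points (Lagrange form)."""
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def _mean_log_rate(points, low, high):
    """Average of the fitted log10(bit rate) over the PSNR range [low, high]."""
    psnrs = [p for _, p in points]
    logs = [math.log10(r) for r, _ in points]
    f = lambda x: _cubic_through(psnrs, logs, x)
    # Simpson's rule integrates a cubic polynomial exactly.
    integral = (high - low) / 6.0 * (f(low) + 4.0 * f((low + high) / 2.0) + f(high))
    return integral / (high - low)

def bd_rate(anchor, test):
    """BD bit rate in percent; inputs are lists of four (bit_rate, psnr) pairs."""
    low = max(min(p for _, p in anchor), min(p for _, p in test))
    high = min(max(p for _, p in anchor), max(p for _, p in test))
    diff = _mean_log_rate(test, low, high) - _mean_log_rate(anchor, low, high)
    return (10.0 ** diff - 1.0) * 100.0

# Hypothetical R-D points (kbit/s, dB); the test codec uses 10% fewer bits.
anchor = [(700.0, 47.7), (400.0, 44.5), (230.0, 41.2), (130.0, 38.0)]
test = [(0.9 * r, p) for r, p in anchor]
print(round(bd_rate(anchor, test), 2))  # about -10
```

A negative value means that the tested codec needs fewer bits than the anchor for the same PSNR, which is the sign convention used in Table 4.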

Table 4

Coding efficiency of different test sequences.

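| Sequences | H.264/AVC PSNR (dB) | H.264/AVC bit rate (kbit/s) | MV-competition PSNR (dB) | MV-competition bit rate (kbit/s) | Proposed PSNR (dB) | Proposed bit rate (kbit/s) | Texture MV, skip & 16×16 MBs (%) | Texture MV, other block types (%) | BD PSNR vs MV-competition (dB) | BD bit rate vs MV-competition (%) | BD PSNR vs H.264/AVC (dB) | BD bit rate vs H.264/AVC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Breakdancers view 02 | 47.74 | 707.32 | 47.72 | 705.24 | 47.74 | 702.22 | 0.84 | 27.34 | 0.05 | −0.91 | 0.15 | −2.78 |
| Breakdancers view 04 | 47.68 | 705.43 | 47.67 | 700.03 | 47.64 | 699.25 | 0.88 | 27.66 | 0.03 | −0.74 | 0.13 | −2.56 |
| Ballet view 02 | 47.55 | 605.08 | 47.57 | 602.25 | 47.58 | 601.60 | 1.02 | 29.10 | 0.10 | −1.50 | 0.24 | −3.94 |
| Ballet view 04 | 47.9 | 561.69 | 47.93 | 551.83 | 47.92 | 551.53 | 1.05 | 30.85 | 0.05 | −1.09 | 0.20 | −3.52 |
| Bookarrival view 08 | 45.89 | 336.48 | 45.85 | 329.25 | 45.87 | 327.02 | 1.14 | 33.72 | 0.06 | −1.34 | 0.24 | −4.89 |
| Bookarrival view 10 | 46.1 | 461.51 | 46.13 | 458.36 | 46.11 | 457.77 | 1.81 | 30.81 | 0.06 | −1.30 | 0.21 | −4.39 |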
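The average and maximum savings quoted for the comparison with H.264/AVC follow directly from the last column of Table 4, as the short check below shows. The dictionary name is chosen here for illustration; the values are the BD bit rates from the table.

```python
# BD bit rate (%) of the proposed method compared with H.264/AVC (Table 4).
bd_rate_vs_h264 = {
    "Breakdancers view 02": -2.78,
    "Breakdancers view 04": -2.56,
    "Ballet view 02": -3.94,
    "Ballet view 04": -3.52,
    "Bookarrival view 08": -4.89,
    "Bookarrival view 10": -4.39,
}

values = list(bd_rate_vs_h264.values())
average = sum(values) / len(values)
largest_saving = min(values)  # most negative value = largest saving

print(f"{average:.2f}")   # -3.68
print(largest_saving)     # -4.89
```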

To further verify the effectiveness of the proposed method, the quality of the synthesized virtual views was also compared, as shown in Table 5. As seen from Table 5, the BD PSNR of the synthesized virtual views obtained with the proposed method improved by 0.13 dB on average, and by up to 0.20 dB, compared with the median PMV used in H.264/AVC.

Table 5

PSNR comparison of the synthesized virtual view.

| Sequences | BD PSNR (dB) |
|---|---|
| Breakdancers virtual view 03 | 0.08 |
| Ballet virtual view 03 | 0.11 |
| Bookarrival virtual view 09 | 0.20 |
| Average | 0.13 |



Based on the analyses of motion correlations between texture videos and depth maps, the texture video MV can be used as a candidate PMV for depth map coding in the proposed method. Experimental results have verified that the proposed method can achieve higher coding efficiency than the median PMV utilized in H.264/AVC.


The work was supported by Xidian-ZTE Research Funds.



1. P. Merkle, A. Smolic, K. Müller, and T. Wiegand, “Efficient prediction structures for multiview video coding,” IEEE Trans. Circuits Syst. Video Technol., 17 (11), 1461–1473 (2007).

2. S. Grewatch and E. Müller, “Sharing of motion vectors in 3D video coding,” 3271–3274 (2004).

3. H. Oh and Y. S. Ho, “H.264-based depth map sequence coding using motion information of corresponding texture video,” 898–907 (2006).

4. H. Yuan, Y. Chang, Z. Lu, and Y. Ma, “Model based motion vector predictor for zoom motion,” IEEE Signal Process. Lett., 17 (9), 787–790 (2010).

5. G. Laroche, J. Jung, and B. P. Popescu, “RD optimized coding for motion vector predictor selection,” IEEE Trans. Circuits Syst. Video Technol., 18 (12), 1681–1691 (2008).

6. G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” (2001).
©(2011) Society of Photo-Optical Instrumentation Engineers (SPIE)
XiaoXian Liu, Yilin Chang, Zhibin Li, and Junyan Huo "Texture video-assisted motion vector predictor for depth map coding," Optical Engineering 50(8), 080504 (1 August 2011).
Published: 1 August 2011

