1 December 2008 Joint motion vector encoding scheme with a pooled macroblock type
Optical Engineering, 47(12), 120501 (2008). doi:10.1117/1.3046713
Motion vectors correlate very well with neighboring motion vectors. Thus, many macroblocks have zero residual motion vectors within their blocks after differential pulse-code modulation using their individually predicted motion vectors. Motivated by this observation, we develop a new joint encoding scheme for motion vectors by defining a new macroblock coding mode, called pooled zero motion vector difference coding, to jointly code such cases more efficiently. Experimental results with several well-known video test sequences verify that the proposed method improves the coding efficiency by up to 6.2% compared to H.264|advanced video coding (AVC).
Jung and Jeon: Joint motion vector encoding scheme with a pooled macroblock type



In the H.264∣advanced video coding (AVC) standard, the motion-compensation process selects the best motion-compensation block size among many different sizes [1], ranging from 16×16 down to 4×4. To signal the selected block size, the inter_coding_mode defines and signals the following macroblock types: SKIP, P16×16, P16×8, P8×16, and P8×8. The inter_coding_mode of P8×8 can be further divided into P8×4, P4×8, and P4×4 in a hierarchical fashion. The Joint Model (JM) H.264∣AVC encoder [2, 3] estimates motion vectors for each block size and selects the one that performs best in rate-distortion optimization. This fine block structure brings higher coding efficiency, but it generates up to 16 motion vectors per macroblock, so there is more motion vector information to encode. In general, decreasing the block size for motion compensation also decreases the prediction error. However, rate-distortion optimization does not often select small motion-compensation block sizes because of the increased motion vector data. Motion vector coding in H.264∣AVC is still inefficient in this respect: it spends at least two bits to signal a zero residual motion vector, that is, a motion vector identical to its predictor. Furthermore, the zero residual motion vectors are encoded individually.

We propose a new joint motion vector coding scheme that efficiently represents the case in which the motion vector of every block equals its individually predicted motion vector. This is achieved by a pooled representation of the zero residual motion vectors using a new macroblock_type called PallzeroDMV mode.


Proposed Method

The compressed bitstream consists mainly of three kinds of information: texture information, motion information, and other information, such as headers. Usually, the texture information, which represents the transformed coefficients of pixel data, takes up most of the compressed bits. However, the number of bits spent on motion information is not negligible unless the motion is extremely simple or the texture is very complex and coded with a low quantization parameter (QP).

Figure 1 shows the share of motion information in the total compressed bits for four different QP values. The sequence Foreman has a lot of motion, so it requires a large number of bits to represent that motion, and the relative share increases as the bit rate decreases. When QP = 40, about 47% of the total bits represent the motion information. The motion vectors of H.264∣AVC are coded differentially with respect to their motion vector predictors, which are the medians of three motion vectors of the spatially adjacent causal blocks, as shown in Fig. 2.

Fig. 1

Bit proportion of motion vector data in compressed bitstream (H.264∣AVC@baseline).


Fig. 2

Motion vector prediction.


H.264∣AVC codes motion vectors differentially. The motion vector residual mvd is calculated as

$$\mathrm{mvd} = \mathrm{mv}_E - \mathrm{pmv}, \tag{1}$$

where mv_E is the motion vector of the current block E, and pmv is the median of its neighboring motion vectors, as shown in Fig. 2. If block C is not available when computing Eq. 1, then it is replaced by block D. Since motion vectors correlate closely with their spatial neighbors, more often than not the mvd is a zero vector. Because this case occurs so frequently, encoding a zero mvd effectively is critical to maximizing the coding efficiency of motion vectors. H.264∣AVC currently spends 2 bits per block on this case (one bit to indicate a zero value of the x component of mvd, and one bit for its y component).
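The prediction and residual steps described above can be sketched as follows. This is a simplified illustration, not the JM implementation: the function names are hypothetical, the neighbor handling covers only the C-to-D fallback mentioned in the text, and the bit count assumes the signed Exp-Golomb code used for mvd components under CAVLC.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]  # (x, y) motion vector components


def median3(a: int, b: int, c: int) -> int:
    """Return the median of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a: MV, mv_b: MV, mv_c: Optional[MV], mv_d: MV) -> MV:
    """Component-wise median of neighbors A, B, C; D replaces C if C is unavailable."""
    third = mv_c if mv_c is not None else mv_d
    return (median3(mv_a[0], mv_b[0], third[0]),
            median3(mv_a[1], mv_b[1], third[1]))


def mv_residual(mv_e: MV, pmv: MV) -> MV:
    """mvd = mv_E - pmv, as in Eq. 1."""
    return (mv_e[0] - pmv[0], mv_e[1] - pmv[1])


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword for v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def mvd_bits(mvd: MV) -> int:
    """Bits needed for both components of one mvd."""
    return se_bits(mvd[0]) + se_bits(mvd[1])


# Illustrative neighbor vectors (made-up values)
pmv = predict_mv((4, -2), (6, 0), (5, -1), (0, 0))
mvd = mv_residual((5, -1), pmv)
print(pmv, mvd, mvd_bits(mvd))
```

When the current vector equals its predictor, the residual is the zero vector and still costs one bit per component.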

In the JM H.264∣AVC reference software [4], motion vector selection is achieved by minimizing the rate-distortion optimization criterion:


where R_mv is the rate of the motion vector difference (i.e., mvd), and R_ref is the rate of the reference frame index. The macroblock mode is decided by minimizing the rate-distortion cost among all block partitions (16×16, …, 4×4).
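A minimal sketch of such a selection, assuming the usual Lagrangian form J = D + λ·(R_mv + R_ref). The distortion term D, the multiplier λ, the helper names, and the candidate numbers are all illustrative assumptions rather than values from the reference software.

```python
def motion_cost(distortion: float, r_mv: int, r_ref: int, lam: float) -> float:
    """Assumed Lagrangian cost: distortion plus weighted motion side-information rate."""
    return distortion + lam * (r_mv + r_ref)


def best_candidate(candidates: list, lam: float) -> dict:
    """Pick the candidate with the smallest Lagrangian cost."""
    return min(candidates,
               key=lambda c: motion_cost(c["D"], c["R_mv"], c["R_ref"], lam))


# Two hypothetical candidates for one block:
# a cheap zero-mvd vector and a refined vector with lower distortion.
candidates = [
    {"name": "zero_mvd", "D": 1200, "R_mv": 2, "R_ref": 1},
    {"name": "refined", "D": 1150, "R_mv": 10, "R_ref": 1},
]

print(best_candidate(candidates, lam=4.0)["name"])
print(best_candidate(candidates, lam=20.0)["name"])
```

With a small λ (high bit rate) the lower-distortion vector wins; with a large λ (low bit rate) the cheaper zero-mvd vector wins, which is the regime where motion vector rate matters most.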

We note that in low bit-rate applications, motion vector data may take up a larger share of the bits than texture data. In this case, inefficient motion vector coding leads to significantly lower coding efficiency. In the H.264∣AVC standard, the DIRECT mode in B frames need not transmit the motion vector because the decoder can infer it from the motion vectors of previously decoded macroblocks. Similarly, the P_SKIP mode in P frames sends neither the motion vector nor residual texture information. The motion vector is inferred from the median motion vector predictor of the 16×16 macroblock in the same way. This is possible because the motion vector of the current macroblock correlates well with those of neighboring macroblocks. These ideas inspired us to develop our new motion vector coding method based on a pooled representation of zero residual motion vectors. For a macroblock partitioned into sixteen 4×4 motion-compensation blocks, we pool the 16 zero mvds into a single representation. Because smaller blocks yield smaller prediction error, partitioning into the smallest block size is selected often only when its gain outweighs the rate burden of the additional motion vector data. We signal the pooled mvd by defining a new macroblock_type mode in P (predictive) slices. This mode is called PallzeroDMV, and it indicates that the following conditions are jointly satisfied:

  • The block size for motion compensation is 4×4.

  • The reference frame is the one closest in the reference frame memory to the current picture.

  • mvd = 0 for each of the 16 blocks in the macroblock.
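The three conditions can be checked jointly as in the sketch below. The function name and argument layout are hypothetical, and the "closest reference frame" is assumed to correspond to reference index 0.

```python
from typing import List, Tuple


def is_pallzero_dmv(block_size: Tuple[int, int],
                    ref_idx: int,
                    mvds: List[Tuple[int, int]]) -> bool:
    """True if a macroblock satisfies all three PallzeroDMV conditions."""
    uses_4x4 = block_size == (4, 4)            # condition 1: 4x4 partitioning
    closest_ref = ref_idx == 0                 # condition 2: assumed reference index 0
    all_zero = len(mvds) == 16 and all(m == (0, 0) for m in mvds)  # condition 3
    return uses_4x4 and closest_ref and all_zero


# Hypothetical macroblocks
print(is_pallzero_dmv((4, 4), 0, [(0, 0)] * 16))
print(is_pallzero_dmv((4, 4), 0, [(0, 0)] * 15 + [(1, 0)]))
```

A single nonzero mvd, a different partition size, or a different reference frame disqualifies the macroblock, which then falls back to the regular modes.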

For the PallzeroDMV mode, the rate-distortion criterion is proposed as follows:


where D_PallzeroDMV is the distortion introduced by all sixteen 4×4 motion-compensation blocks in the PallzeroDMV mode. Here, R_mv and R_ref have zero rate, since the PallzeroDMV mode sends neither a motion vector nor a reference frame index to the decoder.

The proposed PallzeroDMV mode is easily accommodated in the H.264 encoder by adding one additional mode, as shown in Table 1. If each of the sixteen 4×4 block motion vectors equals its predicted motion vector, H.264∣AVC needs 16 × 2 = 32 bits to represent all the motion vectors. With the proposed macroblock mode PallzeroDMV, the same information is represented using only the 3 bits of the macroblock_type. This is possible because the zero differential motion vectors are pooled and jointly represented by a single macroblock_type. Under the PallzeroDMV mode, a decoder infers the 16 motion vectors as their individual predicted motion vectors (PMVs) and the reference index as zero. Note that the proposed PallzeroDMV mode does not require any extra motion search to calculate its rate-distortion cost.
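The bit counts quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions: each mvd component is assumed to use a signed Exp-Golomb codeword, the 3-bit macroblock_type cost is taken directly from the text, and other syntax elements are ignored.

```python
def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword for v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


BLOCKS_PER_MACROBLOCK = 16   # sixteen 4x4 blocks
POOLED_MODE_BITS = 3         # macroblock_type cost stated in the text

# Individual coding: one zero x component and one zero y component per block
bits_per_zero_mvd = se_bits(0) + se_bits(0)
individual_bits = BLOCKS_PER_MACROBLOCK * bits_per_zero_mvd

saved_bits = individual_bits - POOLED_MODE_BITS
print(individual_bits, POOLED_MODE_BITS, saved_bits)
```

Under these assumptions the pooled mode replaces 32 bits of motion vector data with 3 bits of mode signaling for each qualifying macroblock.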

Table 1

Proposed inter_coding_mode of macroblock in P-slices.

mb_type    Code         No. of bits   Name of mb_type
0          “1”          1             P16×16
1          “010”        3             PallzeroDMV
2          “011”        3             P16×8
3          “00100”      5             P8×16
4          “00101”      5             P8×8
5          “00110”      5             P8×8ref0
Inferred   Run coding                 P_SKIP
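The codewords in Table 1 follow the unsigned Exp-Golomb pattern that H.264∣AVC uses for mb_type under CAVLC. The sketch below regenerates them; the function name is hypothetical.

```python
def ue_codeword(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword: leading zeros followed by binary of code_num + 1."""
    value = code_num + 1
    prefix_zeros = value.bit_length() - 1
    return "0" * prefix_zeros + format(value, "b")


for mb_type in range(6):
    word = ue_codeword(mb_type)
    print(mb_type, word, len(word))
```

Code numbers 1 and 2 are the only ones with 3-bit codewords, so a mode placed near the top of the table is cheap to signal.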


Experimental Results

To evaluate the performance of the proposed pooled coding scheme, we implemented it in the JM12.3 encoder reference software [3]. In the simulation, we used the baseline profile; three reference frames; context-adaptive variable-length coding (CAVLC) for entropy coding; the IPPP coding structure; full motion search with a ±32 search range; QP values of 28, 32, 36, and 40; and a total of 150 encoded frames. We used the Foreman, Coastguard, Paris, and Silent sequences of common intermediate format (CIF) size; the Jets, Night, and Bigships sequences of 720×576 standard definition (SD) size; and the Night, ShuttleStart, and Crew sequences of 720p high definition (HD) size. The coding efficiency of the proposed method is compared with that of the H.264∣AVC encoder under identical conditions. For a numerical comparison, we use the Bjontegaard Delta bit rate (BDBR) and Bjontegaard Delta peak signal-to-noise ratio (BDPSNR) [5]. A plus sign of BDPSNR and a minus sign of BDBR indicate better PSNR and better bit rate reduction, respectively, of the proposed method compared to its anchor.

Table 2 shows that the proposed motion vector coding method is better than that of the current H.264∣AVC by an average of 3.83% in BDBR and 0.15 dB in BDPSNR. In particular, the proposed method has much higher coding efficiency than the H.264∣AVC baseline for the Foreman and ShuttleStart sequences. Based on the bit share statistics in Fig. 1, this is somewhat expected: Foreman has a lot of motion, and its motion information consumes a relatively large portion of the compressed bit budget. Therefore, a more efficient motion vector coding method makes a distinct difference. These results verify that our proposed motion vector encoding approach using the new pooled zero differential motion vector coding is more efficient than the H.264∣AVC video coding standard.

Table 2

Coding performance of proposed method.

Sequence                     BDPSNR (dB)   BDBR (%)
Foreman (CIF, 30 fps)        0.28          5.91
Coastguard (CIF, 30 fps)     0.09          2.58
Paris (CIF, 15 fps)          0.08          1.53
Silent (CIF, 15 fps)         0.17          3.99
Jets (SD, 30 fps)            0.20          4.33
Night (SD, 30 fps)           0.13          3.09
Bigships (SD, 30 fps)        0.18          4.52
Night (HD, 30 fps)           0.15          3.56
ShuttleStart (HD, 30 fps)    0.19          6.16
Crew (HD, 30 fps)            0.10          2.64
Average                      0.15          3.83
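The summary figures can be rechecked from the per-sequence rows of Table 2. The sketch below simply recomputes the mean and the range of the listed values; the variable names are illustrative.

```python
# Per-sequence values copied from Table 2, in row order
bdpsnr_db = [0.28, 0.09, 0.08, 0.17, 0.20, 0.13, 0.18, 0.15, 0.19, 0.10]
bdbr_pct = [5.91, 2.58, 1.53, 3.99, 4.33, 3.09, 4.52, 3.56, 6.16, 2.64]

mean_bdpsnr = sum(bdpsnr_db) / len(bdpsnr_db)
mean_bdbr = sum(bdbr_pct) / len(bdbr_pct)

print(f"mean BDPSNR = {mean_bdpsnr:.3f} dB")
print(f"mean BDBR   = {mean_bdbr:.2f} %")
print(f"BDBR range  = {min(bdbr_pct):.2f} % to {max(bdbr_pct):.2f} %")
```

The BDBR mean reproduces the 3.83% average, and the range matches the smallest (Paris) and largest (ShuttleStart) gains. The BDPSNR rows as printed average slightly above the listed 0.15 dB, which may reflect rounding of the per-sequence entries.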



We propose a new motion vector coding method using a pooled zero differential motion vector representation. It is realized by defining a new macroblock_type called the PallzeroDMV mode. Experimental results show that the proposed coding method achieves a BDBR gain of 1.53% to 6.16% with no increase in complexity.


This work was supported in part by the Korea Science and Engineering Foundation (KOSEF) NRL program grant funded by the Korean government [MEST; ROA-2006-000-10826-0(2008)].


1. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, “Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264∣ISO/IEC 14496-10 AVC),” Document JVT-G050r1 (2003).

2. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol. 13(7), 688–703 (2003). doi:10.1109/TCSVT.2003.815168

3. JM12.3 Test Model CODEC, ISO/IEC MPEG and ITU-T VCEG Joint Video Team (Online). Available: http://bs.hhi.de/suehring/tml/download/old_jm/jm12.3.zip

4. K.-P. Lim, G. Sullivan, and T. Wiegand, “Text description of joint model reference encoding methods and decoding concealment methods,” ISO/IEC MPEG and ITU-T VCEG Joint Video Team, Document JVT-N046 (2005).

5. G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ISO/IEC MPEG and ITU-T VCEG Joint Video Team, Document VCEG-M33, Austin, TX (2001).

Bong-Soo Jung, Byeungwoo Jeon, "Joint motion vector encoding scheme with a pooled macroblock type," Optical Engineering 47(12), 120501 (1 December 2008). https://doi.org/10.1117/1.3046713
