In the H.264/AVC (advanced video coding) standard, the motion-compensation process selects the best motion-compensation block size among many different sizes [1], ranging between 16×16 and 4×4. To signal the selected block size, the inter_coding_mode defines and signals the following macroblock types: SKIP, 16×16, 16×8, 8×16, and 8×8. The inter_coding_mode of 8×8 can be further divided into 8×4, 4×8, and 4×4 in a hierarchical fashion. The Joint Model (JM) H.264/AVC encoder [2, 3] estimates motion vectors for each block size and selects the one that performs best in rate-distortion optimization. This fine block structure brings higher coding efficiency, but it generates up to 16 motion vectors per macroblock, which increases the amount of motion vector data to encode. In general, decreasing the block size for motion compensation also decreases the prediction error. However, rate-distortion optimization does not often select small motion-compensation block sizes because of the increased motion vector data. The current H.264/AVC method for coding motion vectors is still inefficient because it uses at least two bits to signal a zero residual motion vector, even when the motion vector is the same as its predicted motion vector. Furthermore, the zero residual motion vectors are encoded individually.
We propose a new joint motion vector coding scheme that very efficiently represents the case in which the motion vector of each block is the same as its own predicted motion vector. This is achieved by a pooled representation of the zero residual motion vectors using a new macroblock_type called PallzeroDMV mode.
The compressed bitstream consists mainly of three kinds of information: texture information, motion information, and other information, such as headers. Usually, the texture information that represents transformed coefficients of pixel data takes up most of the compressed bits. However, the bit amount of motion information is not negligible unless the motion is extremely simple or the texture is very complex with low quantization parameter (QP).
Figure 1 shows the share of motion information in the total compressed bits for four different QP values. The sequence Foreman has a lot of motion, so it requires a large number of bits to represent that motion, and the relative share increases as the bit rate decreases. In one of these cases, about 47% of the total bits represent the motion information. The motion vectors of H.264/AVC are coded differentially with respect to their motion vector predictors, which are the medians of three motion vectors of the spatially adjacent causal blocks, as shown in Fig. 2.
H.264/AVC codes motion vectors differentially. The motion vector residual is calculated as

$$\mathrm{DMV} = \mathrm{MV}_E - \mathrm{PMV}_E, \qquad \mathrm{PMV}_E = \mathrm{median}(\mathrm{MV}_A, \mathrm{MV}_B, \mathrm{MV}_C), \tag{1}$$

where $\mathrm{MV}_E$ is the motion vector of the current block E, and $\mathrm{PMV}_E$ is the median of its neighboring motion vectors, as shown in Fig. 2. If block C is not available when computing Eq. 1, then it is replaced by block D. Since motion vectors correlate closely with their spatial neighbors, more often than not the DMV is a zero vector. To maximize the coding efficiency of motion vectors, it is critical to encode the zero-DMV case effectively, because it happens very frequently. The current H.264/AVC standard spends two bits per block on this case (one bit to indicate a zero value of the x component of the DMV, and one bit for its y component).
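The prediction step described above can be sketched as follows. This is a simplified illustration, not the normative derivation: it assumes motion vectors are integer (x, y) pairs, takes the median per component, and ignores the special cases the standard defines for partially available neighbors and for 16×8 or 8×16 partitions. The function names are hypothetical.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]  # (x, y) motion vector; integer units are an assumption


def median3(a: int, b: int, c: int) -> int:
    """Return the middle value of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a: MV, mv_b: MV, mv_c: Optional[MV], mv_d: MV) -> MV:
    """Component-wise median of neighbors A, B, C; D stands in for a missing C."""
    third = mv_c if mv_c is not None else mv_d
    return (
        median3(mv_a[0], mv_b[0], third[0]),
        median3(mv_a[1], mv_b[1], third[1]),
    )


def differential_mv(mv_e: MV, pmv: MV) -> MV:
    """Residual (differential) motion vector: DMV = MV_E - PMV_E."""
    return (mv_e[0] - pmv[0], mv_e[1] - pmv[1])


# Illustrative values only: block C is unavailable, so D is used instead.
pmv = predict_mv((4, -2), (6, 0), None, (5, -2))
dmv = differential_mv((5, -2), pmv)
```

With these illustrative inputs the current motion vector equals its predictor, so the residual is the zero vector, which is the case the proposed scheme targets.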
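In the JM H.264/AVC reference software [4], motion vector selection is achieved by minimizing the rate-distortion optimization criterion

$$J = D + \lambda\,(R_{\mathrm{DMV}} + R_{\mathrm{REF}}), \tag{2}$$

where $D$ is the distortion of the motion-compensated prediction, $\lambda$ is the Lagrange multiplier, $R_{\mathrm{DMV}}$ is the rate of the motion vector difference (i.e., DMV), and $R_{\mathrm{REF}}$ is the rate of the reference frame index. The macroblock mode is decided by minimizing the rate-distortion cost among all block partitions.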
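A minimal sketch of this selection rule is given below, assuming a Lagrangian cost of the form distortion plus lambda times the motion rate (motion vector difference bits plus reference index bits). The class, the function names, and all numbers are hypothetical and chosen only to show how a larger multiplier penalizes partitions that carry many motion vectors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str          # partition label, e.g. "16x16"
    distortion: float  # prediction distortion (illustrative value)
    dmv_bits: int      # bits for all motion vector differences
    ref_bits: int      # bits for reference frame indices


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian cost: distortion + lambda * (DMV rate + reference rate)."""
    return c.distortion + lam * (c.dmv_bits + c.ref_bits)


def best_partition(cands: List[Candidate], lam: float) -> Candidate:
    """Pick the candidate with the smallest rate-distortion cost."""
    return min(cands, key=lambda c: rd_cost(c, lam))


# Made-up candidates: the 4x4 partition predicts better but costs more bits.
candidates = [
    Candidate("16x16", distortion=1200.0, dmv_bits=6, ref_bits=1),
    Candidate("4x4", distortion=900.0, dmv_bits=64, ref_bits=4),
]
low_lambda_choice = best_partition(candidates, lam=2.0).name
high_lambda_choice = best_partition(candidates, lam=10.0).name
```

In this toy setting the 4×4 partition wins at the small multiplier and loses at the large one, which mirrors the observation that small blocks are rarely selected when motion vector bits are expensive.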
We note that in low bit-rate applications, motion vector data may take up a larger share of the bits than texture data. In this case, inefficient motion vector coding leads to significantly lower coding efficiency. In the H.264/AVC standard, the DIRECT mode coding in B frames need not transmit the motion vector because the decoder can infer it from the motion vectors of previously decoded macroblocks. Similarly, the P_SKIP mode in P frames sends neither the motion vector nor residual texture information. The motion vector is inferred from the median motion vector predictor of the macroblock in the same way. This is possible because the motion vector of the current macroblock correlates well with those of neighboring macroblocks. These ideas inspired us to develop our new motion vector coding method based on a pooled representation of zero residual motion vectors. For the sixteen 4×4 motion compensation blocks, we make a pooled representation of the 16 zero DMVs. Macroblock segmentation into the smallest block size is selected frequently only if the smaller prediction error it brings outweighs the rate burden of the motion vector data. We signal the pooled zero DMVs by defining a new macroblock_type mode in P (predictive) slices. This mode is called PallzeroDMV, and it indicates that the following conditions are jointly satisfied:
• The block size for motion compensation is 4×4.
• The reference frame is the one closest in the reference frame memory to the current picture.
• DMV = 0 for each of the sixteen 4×4 blocks in the macroblock.
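The three conditions can be checked jointly, as in the sketch below. It assumes the encoder already holds the 16 estimated motion vectors and their 16 predictors for the macroblock, and that the closest reference frame has index 0; the function name and argument layout are hypothetical.

```python
from typing import Sequence, Tuple

MV = Tuple[int, int]


def qualifies_for_all_zero_dmv(
    block_size: Tuple[int, int],
    ref_idx: int,
    mvs: Sequence[MV],
    pmvs: Sequence[MV],
) -> bool:
    """True when all three pooled-mode conditions hold for one macroblock."""
    if block_size != (4, 4):              # condition 1: 4x4 partitioning
        return False
    if ref_idx != 0:                      # condition 2: closest reference frame
        return False
    if len(mvs) != 16 or len(pmvs) != 16:
        return False
    # Condition 3: every differential motion vector is zero (MV == PMV).
    return all(mv == pmv for mv, pmv in zip(mvs, pmvs))


# Illustrative data: all sixteen vectors equal their predictors.
example_mvs = [(1, 0)] * 16
example_ok = qualifies_for_all_zero_dmv((4, 4), 0, example_mvs, example_mvs)
```

A single nonzero residual, a different partition size, or a different reference index is enough to rule the mode out.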
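For the PallzeroDMV mode, the rate-distortion criterion is proposed as follows:

$$J_{\mathrm{PallzeroDMV}} = D_{\mathrm{PallzeroDMV}} + \lambda\,(R_{\mathrm{DMV}} + R_{\mathrm{REF}}), \tag{3}$$

where $D_{\mathrm{PallzeroDMV}}$ is the distortion introduced by all sixteen 4×4 motion compensation blocks in the PallzeroDMV mode and $\lambda$ is the Lagrange multiplier. Here, the motion vector difference rate $R_{\mathrm{DMV}}$ and the reference frame index rate $R_{\mathrm{REF}}$ are both zero, since the PallzeroDMV mode sends neither a motion vector nor a reference frame index to the decoder.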
The proposed PallzeroDMV mode is easily accommodated in the H.264/AVC encoder by adding it as an additional mode, as shown in Table 1. If all sixteen 4×4 block motion vectors are equal to their predicted motion vectors, then H.264/AVC needs a total of 32 bits (16 blocks × 2 bits) to represent all motion vectors. With the proposed macroblock mode PallzeroDMV, however, they are represented by the macroblock_type code alone. This is possible because the zero differential motion vectors are pooled and jointly represented by only one macroblock_type. Under the PallzeroDMV mode, a decoder can infer the 16 motion vectors as the individual predicted motion vectors (PMVs), and the reference index as zero. Note that the proposed PallzeroDMV mode coding does not require any extra motion search to calculate the rate-distortion cost of the PallzeroDMV mode.
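The decoder-side inference and the baseline bit count can be sketched as follows. The sketch assumes two bits per zero differential motion vector in the conventional scheme, and it takes the predictor as a caller-supplied function because the actual median prediction depends on neighboring blocks that are not modeled here. All names are hypothetical.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]

BLOCKS_PER_MACROBLOCK = 16  # sixteen 4x4 blocks
BITS_PER_ZERO_DMV = 2       # one bit per component in the conventional scheme


def baseline_zero_dmv_bits() -> int:
    """Bits spent on 16 individually coded zero differential motion vectors."""
    return BLOCKS_PER_MACROBLOCK * BITS_PER_ZERO_DMV


def decode_all_zero_dmv(
    predict: Callable[[int, List[MV]], MV],
) -> Tuple[List[MV], int]:
    """Rebuild 16 motion vectors from predictors alone; reference index is 0."""
    mvs: List[MV] = []
    for block_idx in range(BLOCKS_PER_MACROBLOCK):
        pmv = predict(block_idx, mvs)  # may depend on already decoded blocks
        mvs.append(pmv)                # zero residual, so MV equals PMV
    return mvs, 0


# Toy predictor (an assumption): reuse the previous block's vector.
def toy_predict(block_idx: int, decoded: List[MV]) -> MV:
    return decoded[-1] if decoded else (3, -1)


decoded_mvs, decoded_ref_idx = decode_all_zero_dmv(toy_predict)
```

Because no residual and no reference index are parsed from the bitstream, the only cost of the mode on the motion side is its macroblock_type codeword.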
Table 1. Proposed inter_coding_mode of macroblock in P-slices.
|mb_type||Code||No. of bits||Name of mb_type|
To evaluate the performance of the proposed pooled coding scheme, we implemented it on the JM12.3 encoder reference software [3]. In the simulation, we considered the baseline profile; three reference frames; context-adaptive variable-length coding (CAVLC) entropy coding; the IPPP coding structure; full motion search; QP values of 28, 32, 36, and 40; and a total encoding of 150 frames. We used the Foreman, Coastguard, Paris, and Silent sequences of common intermediate format (CIF) size; the Jets, Night, and Bigships sequences of standard definition (SD) size; and the Night, ShuttleStart, and Crew sequences of high definition (HD) size. The coding efficiency of the proposed method is compared with that of the H.264/AVC encoder under identical conditions. For a numerical comparison, we use the Bjontegaard Delta bit rate (BDBR) and the Bjontegaard Delta peak signal-to-noise ratio (BDPSNR) [5]. A plus sign for BDPSNR and a minus sign for BDBR indicate better PSNR and a bit rate reduction, respectively, for the proposed method relative to its anchor.
Table 2 shows that the proposed motion vector coding method is better than that of the current H.264/AVC by an average of 3.83% in BDBR, with a corresponding improvement in BDPSNR. In particular, the proposed method has much higher coding efficiency than the H.264/AVC baseline for the Foreman and ShuttleStart sequences. Based on the bit portion statistics in Fig. 1, this is somewhat expected: Foreman has a lot of motion, and its motion information consumes a relatively large portion of the compressed bit budget. Therefore, a more efficient motion vector coding method makes a distinct difference. These results verify that our proposed motion vector encoding approach using the new pooled zero differential motion vector coding is more efficient than that of the H.264/AVC video coding standard.
Table 2. Coding performance of proposed method.
|Sequence||BDPSNR (dB)|
|Foreman (CIF)||0.28|
|Coastguard (CIF)||0.09|
|Paris (CIF)||0.08|
|Silent (CIF)||0.17|
|Jets (SD)||0.20|
|Night (SD)||0.13|
|Bigships (SD)||0.18|
|Night (HD)||0.15|
|ShuttleStart (HD)||0.19|
|Crew (HD)||0.10|
We propose a new motion vector coding method using a pooled zero differential motion vector representation. It is realized by defining a new macroblock_type called the PallzeroDMV mode. Experimental results show that the proposed coding method achieves an average BDBR gain of 3.83% with no increase in complexity.
This work was supported in part by the Korea Science and Engineering Foundation (KOSEF) NRL program grant funded by the Korean government [MEST; ROA-2006-000-10826-0(2008)].