Joint motion vector encoding scheme with a pooled macroblock type

Bong-Soo Jung; Byeungwoo Jeon

doi:10.1117/1.3046713


Optical Engineering, Vol. 47, Issue 12, 120501 (December 2008). https://doi.org/10.1117/1.3046713

Abstract

Motion vectors correlate very well with neighboring motion vectors. Thus, many macroblocks have zero residual motion vectors within their blocks after differential pulse code modulation using their individually predicted motion vectors. Motivated by this observation, we develop a new joint encoding scheme for motion vectors by defining a new macroblock coding mode, called pooled zero motion vector difference coding, to code such cases jointly and more efficiently. Experimental results with several well-known video test sequences verify that the proposed method improves the coding efficiency by up to 6.2% compared to H.264∣advanced video coding (AVC).

1. Introduction

In the H.264∣advanced video coding (AVC) standard, the motion-compensation process selects the best motion-compensation block size among many different sizes¹ ranging between $16 \times 16$ and $4 \times 4$. To signal the selected block size, the inter_coding_mode defines and signals the following macroblock types: SKIP, $P16 \times 16$, $P16 \times 8$, $P8 \times 16$, and $P8 \times 8$. The inter_coding_mode of $P8 \times 8$ can be further divided into $P8 \times 4$, $P4 \times 8$, and $P4 \times 4$ in a hierarchical fashion. The Joint Model (JM) H.264∣AVC encoder²,³ estimates motion vectors for each block size and selects the one that performs best in rate-distortion optimization. This fine block structure brings higher coding efficiency, but it generates up to 16 motion vectors per macroblock, which means that there are more data to encode for motion vector information. In general, decreasing the block size for motion compensation also decreases the prediction error. However, small motion-compensation block sizes are not frequently selected by rate-distortion optimization because of the increased motion vector data. The current H.264∣AVC method for coding motion vectors is still inefficient because it spends at least two bits to signal a zero residual motion vector, that is, even when the motion vector of a block is identical to its predicted motion vector. Furthermore, the zero residual motion vectors are encoded individually.

We propose a new joint motion vector coding scheme that efficiently represents the case in which the motion vector of every block is the same as its individually predicted motion vector. This is achieved by a pooled representation of the zero residual motion vectors using a new macroblock_type called PallzeroDMV mode.

2. Proposed Method

The compressed bitstream consists mainly of three kinds of information: texture information, motion information, and other information, such as headers. Usually, the texture information, which represents the transformed coefficients of pixel data, takes up most of the compressed bits. However, the bit amount of motion information is not negligible unless the motion is extremely simple or the texture is very complex and coded with a low quantization parameter (QP).

Figure 1 shows the relative bit portions of motion information in total compressed bits with four different QP values. The sequence Foreman has a lot of motion, so it requires a large number of bits to represent that motion, and its relative portion increases as the bit rate decreases. When $QP = 40$ , about 47% of the total bits represent the motion information. The motion vectors of H.264∣AVC are coded differentially with respect to their motion vector predictors, which are the medians of three motion vectors of the spatially adjacent causal blocks, as shown in Fig. 2.

Fig. 1

Bit proportion of motion vector data in compressed bitstream (H.264∣AVC@baseline).

Fig. 2

Motion vector prediction.

H.264∣AVC does differential motion vector coding. The motion vector residual $mvd$ is calculated as:

$$mvd = mv_{E} - pmv, \qquad pmv = \mathrm{median}(mv_{A}, mv_{B}, mv_{C}), \tag{1}$$

where $mv_{E}$ is the motion vector of the current block E, and $pmv$ is the median of its neighboring motion vectors, as shown in Fig. 2. If block C is not available when computing Eq. 1, then it is replaced by block D. Since motion vectors correlate closely with their spatial neighbors, more often than not the $mvd$ is a zero vector. To maximize the coding efficiency of motion vectors, it is critical to effectively encode the cases that result in a zero vector of $mvd$ because it happens very frequently. The current standard in H.264∣AVC in this regard is to use 2 bits per block (one bit to indicate a zero value of the $x$ component of $mvd$, and the other one bit for its $y$ component).
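The differential coding of Eq. 1 can be sketched as follows. This is a minimal illustration, not the JM implementation: it assumes motion vectors are integer `(x, y)` tuples, takes the median component-wise, and ignores the special prediction rules that H.264∣AVC applies to some partitions. The function names are hypothetical. The bit count uses the signed Exp-Golomb code that the baseline profile uses for motion vector differences, in which the value 0 costs one bit.

```python
from typing import Tuple

MV = Tuple[int, int]  # (x, y) motion vector; units are an assumption (e.g. quarter-pel)


def median3(a: int, b: int, c: int) -> int:
    """Middle value of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a: MV, mv_b: MV, mv_c: MV) -> MV:
    """Component-wise median of the neighbouring vectors (pmv in Eq. 1)."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def mv_difference(mv_e: MV, pmv: MV) -> MV:
    """Residual mvd = mv_E - pmv (Eq. 1)."""
    return (mv_e[0] - pmv[0], mv_e[1] - pmv[1])


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword for v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def mvd_bits(mvd: MV) -> int:
    """Bits spent on both components of one motion vector difference."""
    return se_bits(mvd[0]) + se_bits(mvd[1])
```

For example, with neighbours `(4, -2)`, `(6, 0)`, and `(5, -1)` the predictor is `(5, -1)`; a current vector equal to that predictor gives `mvd = (0, 0)`, which still costs `mvd_bits((0, 0)) == 2` bits when coded on its own.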

In the JM H.264∣AVC reference software,⁴ motion vector selection is achieved by minimizing the rate-distortion optimization criterion:

$$J = D + \lambda R, \qquad R = R_{mv} + R_{ref}, \tag{2}$$

where $R_{mv}$ is the rate of the motion vector difference (i.e., $mvd$), and $R_{ref}$ is the rate of the reference frame index. The macroblock mode is decided by minimizing the rate-distortion cost among all block partitions $(16 \times 16, \dots, 4 \times 4)$.
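The selection rule of Eq. 2 can be illustrated with a short sketch. The candidate names and all distortion and rate figures used with it are made up for illustration, and the rate is reduced to a single number of bits per candidate; the JM encoder accounts for more syntax elements than this.

```python
from typing import Dict, NamedTuple


class Candidate(NamedTuple):
    distortion: float  # D, e.g. a sum of squared differences
    rate_bits: float   # R = R_mv + R_ref, in bits


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R (Eq. 2)."""
    return c.distortion + lam * c.rate_bits


def best_mode(candidates: Dict[str, Candidate], lam: float) -> str:
    """Name of the candidate with the smallest rate-distortion cost."""
    return min(candidates, key=lambda name: rd_cost(candidates[name], lam))
```

With hypothetical candidates `P16x16 = (1200, 6)`, `P8x8 = (900, 40)`, and `P4x4 = (800, 96)`, a small multiplier such as `lam = 0.5` selects `P4x4`, `lam = 2` selects `P8x8`, and `lam = 20` selects `P16x16`. This mirrors the observation in the text that small partitions are chosen only when their distortion gain outweighs their motion vector rate.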

We note that in low bit-rate applications, motion vector data may take a more dominant bit portion than texture data. In this case, inefficient motion vector coding will lead to significantly lower coding efficiency. In the H.264∣AVC standard, the DIRECT mode coding in B frames need not transmit the motion vector because the decoder can infer it from the motion vectors of previously decoded macroblocks. Similarly, the P_SKIP mode in P frames sends neither the motion vector nor residual texture information; the motion vector is inferred from the median motion vector predictor of the $16 \times 16$ macroblock in the same way. This is possible because the motion vector of the current macroblock correlates well with those of neighboring macroblocks. These ideas inspired us to develop our new motion vector coding method based on a pooled representation of zero residual motion vectors. For a macroblock partitioned into sixteen $4 \times 4$ motion-compensation blocks, we make a pooled representation of the 16 zero $mvd$s. Because a smaller motion-compensation block size yields a smaller prediction error, segmentation into the smallest block size is selected frequently only when that gain outweighs the rate burden of the motion vector data. We signal the pooled $mvd$ by defining a new macroblock_type mode in P (predictive) slices. This mode is called PallzeroDMV, and it indicates that the following conditions are jointly satisfied:

• The block size for motion compensation is $4 \times 4$ .
• The reference frame is the one closest in the reference frame memory to the current picture.
• $m v d = 0$ for each of the 16 blocks in the macroblock.
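The three conditions can be checked jointly as in the sketch below. It is an illustration under stated assumptions: the function name and argument layout are hypothetical, motion vectors are `(x, y)` tuples, and the closest reference frame is taken to be reference index 0.

```python
from typing import Sequence, Tuple

MV = Tuple[int, int]


def is_pallzero_dmv(block_mvs: Sequence[MV],
                    block_pmvs: Sequence[MV],
                    ref_idx: int,
                    block_size: Tuple[int, int] = (4, 4)) -> bool:
    """True when a macroblock meets all three PallzeroDMV conditions."""
    if block_size != (4, 4):      # condition 1: 4x4 motion compensation
        return False
    if ref_idx != 0:              # condition 2: closest reference frame (assumed index 0)
        return False
    if len(block_mvs) != 16 or len(block_pmvs) != 16:
        return False
    # condition 3: mvd = mv - pmv is zero for every one of the 16 blocks
    return all(mv == pmv for mv, pmv in zip(block_mvs, block_pmvs))
```

A single block whose motion vector differs from its predictor is enough to disqualify the macroblock, since the mode carries no per-block information.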

For the PallzeroDMV mode, the rate-distortion criterion is proposed as follows:

$$J_{PallzeroDMV} = D_{PallzeroDMV} + \lambda R_{PallzeroDMV}, \qquad R_{PallzeroDMV} = R_{mv} + R_{ref} = 0, \tag{3}$$

where $D_{PallzeroDMV}$ is the distortion introduced by all sixteen $4 \times 4$ motion compensation blocks in the PallzeroDMV mode. Here, $R_{mv}$ and $R_{ref}$ have zero rate, since PallzeroDMV mode does not send a motion vector nor a reference frame index to the decoder.

The proposed PallzeroDMV mode is easily accommodated in the H.264 encoder by adding the additional mode, as shown in Table 1. If all sixteen $4 \times 4$ block motion vectors are equal to their respective predicted motion vectors, then H.264∣AVC needs a total of 32 bits to represent all the motion vector differences. With the proposed macroblock mode PallzeroDMV, however, they can be represented using only the 3 bits of the macroblock_type. This is possible because the zero differential motion vectors are pooled and jointly represented by only one macroblock_type. Under the PallzeroDMV mode, a decoder can infer the 16 motion vectors as the individual predicted motion vectors (PMVs), and the reference index as zero. Note that the proposed PallzeroDMV mode does not require any extra motion search to calculate its rate-distortion cost.
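As a worked check of the bit counts quoted above, the arithmetic can be written out as follows. The symbols $R_{\text{H.264}}$ and $\Delta R$ are introduced here for illustration only, and the count follows the accounting used in the text: one bit per zero $mvd$ component, with only the motion vector difference bits counted for the conventional mode.

```latex
\begin{aligned}
R_{\text{H.264}} &= 16 \text{ blocks} \times 2 \text{ components} \times 1 \text{ bit} = 32 \text{ bits},\\
R_{\text{PallzeroDMV}} &= 3 \text{ bits (macroblock type codeword ``010'')},\\
\Delta R &= 32 - 3 = 29 \text{ bits per macroblock}.
\end{aligned}
```

The conventional mode would also spend bits on its own macroblock type, which this count leaves out, so the figure should be read as a rough comparison rather than an exact saving.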

Table 1

Proposed inter_coding_mode of macroblock in P-slices.

mb_type	Code	No. of bits	Name of mb_type
0	“1”	1	$P16 \times 16$
1	“010”	3	PallzeroDMV
2	“011”	3	$P16 \times 8$
3	“00100”	5	$P8 \times 16$
4	“00101”	5	$P8 \times 8$
5	“00110”	5	$P8 \times 8$ ref0
Inferred	Run coding	—	P_SKIP
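The codewords in Table 1 coincide with the unsigned Exp-Golomb codewords for the values 0 to 5, so they form a prefix-free code. The sketch below checks that property and round-trips a sequence of macroblock types. It is illustrative only: the dictionary keys are shorthand names chosen here, and a real bitstream interleaves other syntax elements between consecutive macroblock types, which this sketch ignores.

```python
from typing import Dict, Iterable, List

# Codewords copied from Table 1; the key spellings are shorthand for this sketch.
MB_TYPE_CODES: Dict[str, str] = {
    "P16x16": "1",
    "PallzeroDMV": "010",
    "P16x8": "011",
    "P8x16": "00100",
    "P8x8": "00101",
    "P8x8ref0": "00110",
}


def is_prefix_free(codes: Iterable[str]) -> bool:
    """True if no codeword is a prefix of another (checked on sorted neighbours)."""
    words = sorted(codes)
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


def encode_mb_types(names: Iterable[str]) -> str:
    """Concatenate the codewords of a sequence of macroblock types."""
    return "".join(MB_TYPE_CODES[name] for name in names)


def decode_mb_types(bits: str) -> List[str]:
    """Split a bit string back into macroblock type names."""
    inverse = {code: name for name, code in MB_TYPE_CODES.items()}
    decoded, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            decoded.append(inverse[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return decoded
```

Because the code is prefix-free, the decoder can recognize the 3-bit PallzeroDMV codeword without any look-ahead. Inserting it at index 1 shifts the existing partitioned modes to later, and in some cases longer, codewords.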

3. Experimental Results

To evaluate the performance of the proposed pooled coding scheme, we implemented it in the JM12.3 encoder reference software.³ In the simulation, we used the baseline profile; three reference frames; context-adaptive variable-length coding (CAVLC) for entropy coding; the IPPP coding structure; full motion search with a $\pm 32$ search range; QP values of 28, 32, 36, and 40; and a total of 150 encoded frames. We used the Foreman, Coastguard, Paris, and Silent sequences of common intermediate format (CIF) size; the Jets, Night, and Bigships sequences of $720 \times 576$ standard definition (SD) size; and the Night, ShuttleStart, and Crew sequences of 720p high definition (HD) size. The coding efficiency of the proposed method is compared with that of the H.264∣AVC encoder under identical conditions. For a numerical comparison, we use the Bjontegaard Delta bit rate (BDBR) and the Bjontegaard Delta peak signal-to-noise ratio (BDPSNR).⁵ A positive BDPSNR and a negative BDBR indicate, respectively, a PSNR gain and a bit-rate reduction of the proposed method relative to its anchor.
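For readers unfamiliar with the BDBR metric, the following is a minimal sketch of the usual Bjontegaard calculation: fit a cubic to log10(bit rate) as a function of PSNR for each curve, integrate both fits over the common PSNR range, and convert the mean difference to a percentage. It assumes exactly four rate points per curve (matching the four QP values above), so the cubic fit reduces to exact interpolation. The function names are hypothetical, and this is not the script used by the authors.

```python
import math
from typing import List, Sequence


def fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Coefficients c0..c3 of the cubic through four (x, y) points."""
    n = 4
    # Augmented Vandermonde system [1, x, x^2, x^3 | y], solved by Gauss-Jordan.
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def integrate_poly(coeffs: Sequence[float], upper: float) -> float:
    """Integral of the polynomial from 0 to upper."""
    return sum(c * upper ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))


def bd_rate(rate_anchor: Sequence[float], psnr_anchor: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Average bit-rate difference in percent; negative means the test saves bits."""
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))

    def fit(rates: Sequence[float], psnrs: Sequence[float]) -> List[float]:
        # Shift PSNR by lo to keep the linear system well conditioned.
        return fit_cubic([p - lo for p in psnrs],
                         [math.log10(r) for r in rates])

    span = hi - lo
    diff = integrate_poly(fit(rate_test, psnr_test), span) \
        - integrate_poly(fit(rate_anchor, psnr_anchor), span)
    return (10 ** (diff / span) - 1) * 100
```

As a sanity check on made-up data, a test curve that reaches the same PSNR values as the anchor at 0.9 times the bit rate yields a BDBR of about -10%, and two identical curves yield 0%.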

Table 2 shows that the proposed motion vector coding method is better than that of the current H.264∣AVC by an average of 3.83% in BDBR and 0.15 dB in BDPSNR. In particular, the proposed method has much higher coding efficiency than the H.264∣AVC baseline for the Foreman and ShuttleStart sequences. Based on the bit portion statistics in Fig. 1, this is somewhat expected: Foreman has a lot of motion, and its motion information consumes a relatively large portion of the compressed bit budget. Therefore, a more efficient motion vector coding method makes a distinctive difference. These results verify that our proposed motion vector encoding approach using the new pooled zero differential motion vector coding is more efficient than the H.264∣AVC video coding standard.

Table 2

Coding performance of proposed method.

Sequence	BDPSNR (dB)	BDBR (%)
Foreman (CIF, 30 fps)	0.28	-5.91
Coastguard (CIF, 30 fps)	0.09	-2.58
Paris (CIF, 15 fps)	0.08	-1.53
Silent (CIF, 15 fps)	0.17	-3.99
Jets (SD, 30 fps)	0.20	-4.33
Night (SD, 30 fps)	0.13	-3.09
Bigships (SD, 30 fps)	0.18	-4.52
Night (HD, 30 fps)	0.15	-3.56
ShuttleStart (HD, 30 fps)	0.19	-6.16
Crew (HD, 30 fps)	0.10	-2.64
Average	0.15	-3.83

4. Conclusion

We propose a new motion vector coding method using a pooled zero differential motion vector representation. It is realized by defining a new macroblock_type called the PallzeroDMV mode. Experimental results show that the proposed coding method achieves a BDBR gain of 1.53% to 6.16% without requiring any extra motion search.

Acknowledgments

This work was supported in part by the Korea Science and Engineering Foundation (KOSEF) NRL program grant funded by the Korean government [MEST; ROA-2006-000-10826-0(2008)].

References

1. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, “Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264∣ISO/IEC 14496-10 AVC),” (2003).

2. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., 13 (7), 688–703 (2003). https://doi.org/10.1109/TCSVT.2003.815168

3. JM12.3 Test Model CODEC, ISO/IEC MPEG and ITU-T VCEG Joint Video Team (Online). Available: http://bs.hhi.de/suehring/tml/download/old_jm/jm12.3.zip

4. K.-P. Lim, G. Sullivan, and T. Wiegand, “Text description of joint model reference encoding methods and decoding concealment methods,” (2005).

5. G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” (2001).

Published: 1 December 2008
