## 1.

## Introduction

In inter-predicted video coding, a motion vector (MV) gives the spatial offset from a block in the current picture to a block in the reference picture. More accurate inter prediction therefore requires more MV information. To minimize the number of bits required to represent MV information, the H.264/AVC standard applies predictive coding using predictive motion vectors (PMVs), which are calculated as the median of three spatially neighboring MVs.^{1} The median PMV is effective at reducing the number of compressed MV bits, since it is very similar to the MV in most cases. However, the median PMV is not always optimal for minimizing the number of MV bits. If a PMV more precise than the median PMV exists, even more bits can be saved.
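As a minimal sketch of the median prediction described above, the snippet below takes the component-wise median of three neighboring MVs and forms the residual that is actually coded. The function names, the tuple representation, and the sample values are illustrative assumptions, not taken from the standard or the reference software.

```python
from statistics import median
from typing import Tuple

MV = Tuple[int, int]  # (x, y) components; the unit (e.g. quarter-pel) is an assumption


def median_pmv(mv_a: MV, mv_b: MV, mv_c: MV) -> MV:
    """Component-wise median of three neighboring MVs."""
    return (median([mv_a[0], mv_b[0], mv_c[0]]),
            median([mv_a[1], mv_b[1], mv_c[1]]))


def mv_residual(mv: MV, pmv: MV) -> MV:
    """Difference between the MV and its predictor; only this is coded."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])


# Hypothetical neighbors: the median is taken per component, so the
# resulting PMV need not equal any single neighboring MV.
pmv = median_pmv((4, -2), (6, 0), (5, 7))   # (5, 0)
residual = mv_residual((5, 1), pmv)         # (0, 1)
```

The closer the PMV is to the true MV, the smaller the residual and the fewer bits it costs, which is why a better predictor than the median can save bits.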

To overcome this problem, several approaches have been taken.^{2, 3, 4} In Chen and Willson’s work,^{2} MVs located in spatially and temporally neighboring blocks are additionally considered to select a more precise PMV. A spatial or temporal PMV is selected according to the value of the predictors. However, this method cannot ensure selection of the optimal PMV. In Kim and Ra’s work,^{3} an optimal PMV is selected by using a distance measure function between the MV and PMV. In this method, however, additional information is required to indicate which candidate PMV is the optimal one. Finally, in Laroche et al.’s work,^{4} several candidate PMVs are generated by combining spatial and temporal neighboring MVs. A more precise PMV is then selected by a rate-distortion (RD) competition scheme. As in Kim and Ra’s work,^{3} in some cases this method requires additional information to determine which predictor is to be used. This means that the benefit of using the optimal PMV may not be fully realized, because the encoder's choice of the optimal PMV must be signaled to the decoder.

Therefore, in this paper, we propose a new motion vector coding method with optimal PMV selection (MVOP) that uses the optimal PMV without the need for additional signaling information. First, the encoder defines the set of possible candidate PMVs by using neighboring MVs. To minimize the number of bits for the MV information, an optimal PMV is selected from the candidate PMV set. If the decoder can estimate the optimal PMV by using decoder-side estimation, the encoder selects it as the PMV. Otherwise, the encoder selects the median PMV in the same manner as the H.264/AVC standard. In the worst case of the proposed method, only 1 bit of additional information is required to signal whether the decoder can estimate the optimal PMV or not. Simulation results show that the proposed method reduces the average Bjontegaard delta bit rate (BDBR) by about 2.97% and increases the average Bjontegaard delta peak signal-to-noise ratio (BDPSNR) by about 0.14 dB compared with the H.264/AVC standard.

## 2.

## Proposed Method

## 2.1.

### PMV Candidate Set

As shown in Fig. 1, the candidate set (CS) is defined to select a more precise PMV than the median PMV. The CS, which is a group of possible and distinct candidate PMVs for the current block, is composed of a combination of horizontal and vertical components of spatial neighboring MVs. The CS is defined by

$$\mathrm{CS}=\text{combination of}\ \{\mathbf{mv}^{L},\mathbf{mv}^{U},\mathbf{mv}^{R}\}=\{(\mathrm{mv}_{x}^{L},\mathrm{mv}_{y}^{L}),(\mathrm{mv}_{x}^{L},\mathrm{mv}_{y}^{U}),\dots,(\mathrm{mv}_{x}^{R},\mathrm{mv}_{y}^{U}),(\mathrm{mv}_{x}^{R},\mathrm{mv}_{y}^{R})\}. \tag{1}$$

## 2.2.

### Optimal PMV Selection at the Encoder Side

To select an optimal PMV among the CS, we define an optimal PMV selection function $f(\cdot)$, which is given by

$$f(\mathbf{pmvc}^{C}\mid\mathbf{mv}^{C})=r(\mathbf{dmv}^{C})=r(\mathrm{mv}_{x}^{C}-\mathrm{pmvc}_{x}^{C},\ \mathrm{mv}_{y}^{C}-\mathrm{pmvc}_{y}^{C}), \tag{2}$$

$$\mathbf{pmv}^{C(\mathrm{opt})}=\underset{\mathbf{pmvc}^{C}\in\mathrm{CS}}{\arg\min}\ f(\mathbf{pmvc}^{C}\mid\mathbf{mv}^{C}). \tag{3}$$

## 2.3.

### Optimal PMV Estimation at the Decoder Side

To estimate an optimal PMV at the decoder with known information of the DMV $\mathbf{dmv}^{C}$, template matching^{5} is applied with a matching criterion function $g(\cdot)$:

$$g(\mathbf{pmvc}^{C}\mid\mathbf{dmv}^{C})=\sum_{i\in\mathrm{TMS}}\left[\mathrm{Ref}(\mathbf{pmvc}^{C}+\mathbf{dmv}^{C},i)-\mathrm{Cur}(i)\right]^{2}. \tag{4}$$

In the decoding process, all PMV candidates in the CS are tested by template matching on the pixels indicated by the TMS, and the candidate with the minimum matching error is taken as the optimal PMV:

$$\mathbf{pmv}^{C(\mathrm{dec})}=\underset{\mathbf{pmvc}^{C}\in\mathrm{CS}}{\arg\min}\ g(\mathbf{pmvc}^{C}\mid\mathbf{dmv}^{C}). \tag{5}$$

The decoder is regarded as able to estimate the optimal PMV when its estimate coincides with the PMV selected at the encoder:

$$\mathbf{pmv}^{C(\mathrm{dec})}=\mathbf{pmv}^{C(\mathrm{opt})}. \tag{6}$$
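The decoder-side estimation can be illustrated with a small sketch. It assumes that the candidate set of Sec. 2.1 is the set of distinct pairs formed from the neighboring horizontal and vertical components, that MVs are integer-pel, and that frames are plain nested lists. The names `candidate_set`, `matching_error`, and `estimate_pmv`, as well as the toy frames and template positions, are hypothetical and not part of the proposed method's implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple

MV = Tuple[int, int]               # (x, y) displacement, integer-pel (assumption)
Pixel = Tuple[int, int]            # (x, y) position inside a frame
Frame = Sequence[Sequence[int]]    # frame[y][x] sample values


def candidate_set(mv_l: MV, mv_u: MV, mv_r: MV) -> List[MV]:
    """Distinct (x, y) pairs built from the neighbors' x and y components."""
    xs = [mv_l[0], mv_u[0], mv_r[0]]
    ys = [mv_l[1], mv_u[1], mv_r[1]]
    return list(dict.fromkeys(product(xs, ys)))  # de-duplicate, keep order


def matching_error(ref: Frame, cur: Frame, pmvc: MV, dmv: MV,
                   template: Sequence[Pixel]) -> int:
    """Sum of squared differences over the template pixels.

    The caller must keep all displaced positions inside the reference
    frame; no boundary handling is done in this sketch.
    """
    dx, dy = pmvc[0] + dmv[0], pmvc[1] + dmv[1]
    return sum((ref[y + dy][x + dx] - cur[y][x]) ** 2 for x, y in template)


def estimate_pmv(ref: Frame, cur: Frame, candidates: Sequence[MV], dmv: MV,
                 template: Sequence[Pixel]) -> MV:
    """Candidate with the minimum template matching error."""
    return min(candidates,
               key=lambda c: matching_error(ref, cur, c, dmv, template))


# Toy frames: the current picture is the reference shifted by (2, 1).
ref = [[x * x + 10 * y for x in range(10)] for y in range(10)]
cur = [[ref[y + 1][x + 2] for x in range(8)] for y in range(8)]
template = [(3, 3), (4, 3), (5, 3), (3, 4), (3, 5)]  # hypothetical template pixels

cs = candidate_set((2, 0), (0, 1), (1, 1))
pmv_dec = estimate_pmv(ref, cur, cs, (0, 0), template)  # (2, 1)
```

In this toy case the decoder receives a zero DMV, and only the candidate (2, 1) reproduces the template exactly, so the estimate matches the PMV an encoder would have chosen for a true MV of (2, 1).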

## 2.4.

### Encoding Mode Decision for Motion Vector Coding

Three different modes are considered as follows. Firstly, when neighboring MVs are all unavailable $(|\mathrm{CS}|=0)$ or identical $(|\mathrm{CS}|=1)$, there is only one choice for PMV selection. Therefore, in this case, called the *exceptional mode*, the encoder has to use the available PMV. Because the decoder can recognize this situation for itself, no other information is sent. Secondly, the case of the *fallback mode*, in which the optimal PMV is the same as the median PMV, can also be autonomously recognized by the decoder; thus no extra signaling for this mode to the decoder is needed either. The decoder uses the median as the predictor to reconstruct the motion vector. Finally, if a block does not belong to either of those two modes, the decoder recognizes it as belonging to the *competing mode*, which requires the decoder to be informed whether it should use the estimated optimal PMV or not. Thus, 1 bit of additional information, called mvop_flag, is needed. If mvop_flag is 1, the DMV is decoded using an optimal PMV obtained by the decoder using the template matching. If mvop_flag is 0, the decoder uses the median PMV in decoding the DMV. In the *competing mode*, finer macroblock partition for better motion compensation could be implemented. For an encoder to make an RD-optimized macroblock partition decision, a slightly modified RD measure function $J$ is used.
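The encoder-side classification into the three modes can be sketched as follows. Several points are assumptions made only for illustration. The rate function $r$ is not specified in this section, so the signed Exp-Golomb code length is used as a stand-in. The zero predictor for an empty candidate set is a placeholder. A tie in rate between the median and the best candidate is treated as the fallback mode. The decoder-side template matching is represented by a hypothetical callable `decoder_estimate`. How the decoder itself detects the fallback mode, and the modified RD cost $J$, are not covered here.

```python
from itertools import product
from statistics import median
from typing import Callable, List, Optional, Tuple

MV = Tuple[int, int]


def se_golomb_bits(v: int) -> int:
    """Bit length of the signed Exp-Golomb code for v (stand-in for r)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def rate(dmv: MV) -> int:
    return se_golomb_bits(dmv[0]) + se_golomb_bits(dmv[1])


def sub(a: MV, b: MV) -> MV:
    return (a[0] - b[0], a[1] - b[1])


def choose_mode(mv: MV, neighbours: List[MV],
                decoder_estimate: Callable[[List[MV], MV], MV]
                ) -> Tuple[str, MV, Optional[int]]:
    """Return (mode, PMV to use, mvop_flag or None).

    `neighbours` holds either three MVs or none (all unavailable).
    """
    xs = [n[0] for n in neighbours]
    ys = [n[1] for n in neighbours]
    cs = list(dict.fromkeys(product(xs, ys)))     # distinct candidates
    if len(cs) <= 1:                              # |CS| = 0 or 1
        return "exceptional", (cs[0] if cs else (0, 0)), None
    med = (median(xs), median(ys))
    opt = min(cs, key=lambda c: rate(sub(mv, c)))
    if rate(sub(mv, med)) == rate(sub(mv, opt)):  # median is already optimal
        return "fallback", med, None
    if decoder_estimate(cs, sub(mv, opt)) == opt:
        return "competing", opt, 1                # decoder can recover opt
    return "competing", med, 0                    # signal use of the median


nb = [(4, -2), (6, 0), (5, 7)]
hit = choose_mode((5, 7), nb, lambda cs, dmv: (5, 7))    # decoder finds opt
miss = choose_mode((5, 7), nb, lambda cs, dmv: cs[0])    # decoder misses opt
```

Here `hit` is `("competing", (5, 7), 1)` and `miss` is `("competing", (5, 0), 0)`: the flag costs one bit in both cases, but only the first case benefits from the cheaper zero DMV.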

## 3.

## Experimental Results

To evaluate the performance of the proposed method, we modified the reference software of the H.264/AVC standard. Joint model (JM) version 12.2 reference software was used for modification and comparison. All sequences [“Coastguard,” “Foreman,” “Carphone,” and “TableTennis” (QCIF, 15 Hz) and “Coastguard,” “Foreman,” “Paris,” and “TableTennis” (CIF, 30 Hz)] have their first 300 frames encoded with four quantization parameters (QPs) of 28, 32, 38, and 40. To obtain a more precise MV, no fast motion estimation process was used. The performance of the proposed method was evaluated in terms of the BDBR and BDPSNR.^{6} Those quantities give the average bit rate and PSNR difference of the proposed method compared to the H.264/AVC standard, which always uses median PMV to encode MVs.

Table 1 shows the BDBR and BDPSNR of the proposed method compared to the H.264/AVC standard. The proposed method decreases the number of bits by about 2.97% on average, because it selects a more precise PMV than the H.264/AVC standard. In particular, some sequences with fast, nonlinear motion such as “Foreman” and “TableTennis” show better performance when a precise PMV is used. The proposed method also works better at higher QP values: MV information accounts for a larger share of the bits at lower bit rates, so there is more room to improve the coding efficiency.

## Table 1

Coding performance of the proposed method.

| Format | Sequence | BDPSNR (dB) | BDBR (%) |
|---|---|---|---|
| QCIF | “Coastguard” | 0.096 | $-2.606$ |
| | “Foreman” | 0.191 | $-3.463$ |
| | “Carphone” | 0.133 | $-2.719$ |
| | “TableTennis” | 0.176 | $-3.394$ |
| CIF | “Coastguard” | 0.086 | $-2.411$ |
| | “Foreman” | 0.123 | $-2.799$ |
| | “Paris” | 0.152 | $-2.875$ |
| | “TableTennis” | 0.150 | $-3.494$ |
| Average | | 0.138 | $-2.970$ |
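The averages in the last row of Table 1 can be reproduced from the per-sequence values with a short check. The variable names are illustrative, and the values are copied from the table in row order.

```python
# Per-sequence results from Table 1 (QCIF rows first, then CIF rows).
bdpsnr_db = [0.096, 0.191, 0.133, 0.176, 0.086, 0.123, 0.152, 0.150]
bdbr_pct = [-2.606, -3.463, -2.719, -3.394, -2.411, -2.799, -2.875, -3.494]

# Plain arithmetic means, rounded to the three decimals used in the table.
avg_bdpsnr = round(sum(bdpsnr_db) / len(bdpsnr_db), 3)   # 0.138
avg_bdbr = round(sum(bdbr_pct) / len(bdbr_pct), 3)       # -2.97
```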

## 4.

## Conclusions

In this paper, we have proposed a new motion vector coding method using optimal PMV selection. By selecting the optimal PMV, which requires minimal bits to encode MV information, the proposed method can decrease the number of compressed MV bits compared with H.264/AVC. In particular, the proposed method is effective on sequences with fast and nonlinear motion activities. If more candidate PMVs are used, the proposed method can be even more effective without requiring additional signaling information.

## Acknowledgment

This work was supported by a Korea Science and Engineering Foundation (KOSEF) NRL Program grant funded by the Korean government (MEST) (ROA-2006-000-10826-0(2008)).

## References

*ITU-T Rec. H.264, ISO/IEC 14496-10 AVC* (2003).

*ITU-T SG16, VCEG-M13* (2001).