## 1. Introduction

In block-based video coders, motion estimation (ME) is of great significance because of its impact on compression efficiency. In order to achieve high compression efficiency, ME is performed with quarter-pel (Qpel) accuracy, as well as half-pel (Hpel) and integer-pel accuracies, in the H.264/AVC standard.^{1} Even though ME with high accuracy generally reduces the bits required for encoding the difference frame, it often compromises the total bit rate, because the bits required for encoding the motion vector (MV) grow as the motion vector accuracy (MVA) increases.

Various methods of obtaining the optimal MVA have been introduced in the literature.^{2, 3} In Ref. 2, an optimal MVA is derived for each macroblock (MB) and for each frame. The optimal MVA formula in Ref. 2 reveals that the MVA depends on the texture and the interframe noise of the MB. In Ref. 3, the MVA is adaptively determined for each MB by examining all possible MVAs and selecting the one with the minimum Lagrange cost. However, the coding gain of these methods is limited, since additional bits indicating the MVA need to be encoded.

In this letter, a novel MVA decision algorithm for H.264/AVC is presented. The proposed method determines the validity of Qpel ME for each MB. Since no additional bit is required to indicate the MVA, the proposed algorithm can be implemented without modifying the syntax of the H.264/AVC standard. In order to translate this decision into a coding gain, we also propose an MV encoding technique that adaptively changes the variable length coding (VLC) table according to the MVA of the MB.

## 2. Proposed Algorithm

The proposed algorithm consists of two techniques. We first present an adaptive MV encoding technique for H.264/AVC. Then, based on the proposed MV coding technique, we also propose an MVA decision technique.

In H.264/AVC, not the original MV itself but the difference between the original MV and the predicted motion vector (PMV),^{1} called the motion vector difference (MVD), is encoded. Let $\Delta V$ denote the MVD, defined as follows:

## Eq. 1

$$\Delta V={V}_{0}-{V}_{p},$$

where ${V}_{0}$ is the MV of the current MB and ${V}_{p}$ is the PMV. For notational simplicity, we first define three motion vector sets:

## Eq. 2

$${\mathit{MV}}_{\mathit{all}}=\{(u,v)\mid u,v\in\{0,-1/4,1/4,-1/2,1/2,\dots\}\},$$

## Eq. 3

$${\mathit{MV}}_{\mathit{Qpel}}=\{(u,v)\mid u,v\in\{0,-1/4,1/4,-3/4,3/4,\dots\}\},$$

## Eq. 4

$${\mathit{MV}}_{\mathit{Hpel}}=\{(u,v)\mid u,v\in\{0,-1/2,1/2,-1,1,\dots\}\}.$$

Assume that ME is performed up to Hpel accuracy for the current MB, i.e., ${V}_{0}\in{\mathit{MV}}_{\mathit{Hpel}}$. Then, $\Delta V$ is an element of either ${\mathit{MV}}_{\mathit{Hpel}}$ or ${\mathit{MV}}_{\mathit{Qpel}}$, depending on ${V}_{p}$, as follows:

## Eq. 5

$$\Delta V\in\begin{cases}{\mathit{MV}}_{\mathit{Hpel}}, & \text{if}\ {V}_{p}\in{\mathit{MV}}_{\mathit{Hpel}},\\ {\mathit{MV}}_{\mathit{Qpel}}, & \text{if}\ {V}_{p}\in{\mathit{MV}}_{\mathit{Qpel}}.\end{cases}$$

Note that the number of possible $\Delta V$ values is halved if Qpel ME is not applied. Therefore, instead of a VLC table that includes all possible MVD values, ${\mathit{VLC}}_{\mathit{All}}$, a reduced-size VLC table containing only Hpel or only Qpel MVD values, ${\mathit{VLC}}_{\mathit{Hpel}}$ or ${\mathit{VLC}}_{\mathit{Qpel}}$, can be used. By adaptively changing the VLC table according to the MVA of the MB, the bits required to encode the MVD in H.264/AVC can be effectively reduced.
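H.264/AVC CAVLC encodes each MVD component with a signed exponential-Golomb code. The following Python sketch illustrates the bit saving of the reduced tables; the helper names and the reduced-table index mappings are our own illustration, not part of the standard.

```python
# MV components are held in quarter-pel units: even values are Hpel
# positions, odd values are Qpel offsets.

def ue_bits(n):
    """Length in bits of an unsigned exp-Golomb codeword for n >= 0."""
    return 2 * (n + 1).bit_length() - 1

def se_bits(k):
    """Length in bits of a signed exp-Golomb codeword (H.264 mapping)."""
    return ue_bits(2 * abs(k) - (1 if k > 0 else 0))

def mvd_bits_full(dv):
    """VLC_All: code the MVD component directly in quarter-pel units."""
    return se_bits(dv)

def mvd_bits_hpel(dv):
    """VLC_Hpel: dv is an even quarter-pel count, so code dv/2."""
    assert dv % 2 == 0
    return se_bits(dv // 2)

def mvd_bits_qpel(dv):
    """VLC_Qpel: dv is odd; enumerate 1, -1, 3, -3, ... and code the rank."""
    assert dv % 2 == 1
    rank = 2 * ((abs(dv) - 1) // 2) + (0 if dv > 0 else 1)
    return ue_bits(rank)

# When Qpel ME is skipped, an MVD of -1/2 pel (dv = -2) costs 3 bits with
# the reduced Hpel table instead of 5 bits with the full table.
print(mvd_bits_full(-2), mvd_bits_hpel(-2))
```

Halving the candidate set shifts every codeword one step up the exp-Golomb table, which is where the rate saving of the adaptive MVD coding comes from.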

The bits required for encoding the MV can thus be reduced by skipping Qpel ME. However, Qpel ME should be skipped only when it has a negligible impact on ME performance. In order to determine whether Qpel ME is necessary, we consider two cases. In the first case (C1), we skip Qpel ME for all MBs and encode the MVDs using ${\mathit{VLC}}_{\mathit{Hpel}}$. In the second case (C2), we allow Qpel ME for all MBs, and the resulting MVDs are encoded using ${\mathit{VLC}}_{\mathit{All}}$. By comparing C1 and C2, the effect of Qpel ME can be analyzed.

Let *dRDcost* denote the difference between the rate distortion costs (RDcosts) of C1 and C2. If *dRDcost* is positive, Qpel ME can be considered necessary for the MB, because the loss caused by skipping Qpel ME is larger than the gain achieved by the proposed MVD coding technique. Otherwise, Qpel ME can be considered unnecessary. The remaining problem is then to identify the factors that affect *dRDcost*.

Motivated by the optimal MVA formula,^{2} we claim that the necessity of Qpel ME increases as the spatial and temporal complexity of the MB increases. In our work, we estimate the spatial and temporal complexity of the current MB from the MB in the reference frame indicated by ${V}_{p}$, which is also available at the decoder. First, we measure the sum of the gradients of the luminance component as a spatial complexity metric.^{4} Let $\Delta {Y}_{h}$ and $\Delta {Y}_{v}$ denote the horizontal and vertical gradients, defined as

## Eq. 6

$$\Delta {Y}_{h}(x,y)=|Y(x,y)-Y(x-1,y)|,\qquad \Delta {Y}_{v}(x,y)=|Y(x,y)-Y(x,y-1)|,$$

where $Y(x,y)$ denotes the luminance value at position $(x,y)$. Then, the spatial complexity metric ${D}_{p}$ is given by

## Eq. 7

$${D}_{p}=\frac{1}{{N}^{2}}\sum _{i=0}^{N-1}\sum _{j=0}^{N-1}\mathrm{max}[\Delta {Y}_{h}({x}_{p}+i,{y}_{p}+j),\Delta {Y}_{v}({x}_{p}+i,{y}_{p}+j)],$$

where $({x}_{p},{y}_{p})$ denotes the top-left position of the reference MB indicated by ${V}_{p}$ and $N$ is the width of the MB. In addition, the magnitude of the PMV, $|{V}_{p}|$, is used as the temporal complexity metric.

Based on these spatial and temporal complexity metrics, we examine their relation to *dRDcost*. Figure 1 shows an example of the *dRDcost* result obtained using the 80th frame of the Foreman test sequence. In this example, the quantization parameter (QP) is set to 36, one reference frame is used with CABAC coding, and the other experimental conditions are given in Table 1. For each MB, *dRDcost* is computed by performing C1 and C2, and the symbol $o$ or $x$ is assigned according to whether its value is positive or not. We can see that if the temporal or spatial complexity of the MB is high, Qpel ME tends to be advantageous. In the other case, where the spatial and temporal complexities are both low, skipping Qpel ME is beneficial in most cases. This tendency is consistent with the results in Ref. 2. Based on this observation, we define an exponential curve that determines whether Qpel ME is advantageous:

## Eq. 9

$$f(|{V}_{p}|)=a\,\mathrm{exp}(b|{V}_{p}|).$$

By using the test sequences in Ref. 6 and the QPs in Table 1, $a$ and $b$ are obtained as 9.29 and $-2.99\times {10}^{-2}$, respectively.^{5} Then, Qpel ME is applied only when the estimated complexity point, $(|{V}_{p}|,{D}_{p})$, is located above the curve of Eq. 9, where Qpel ME tends to improve the coding efficiency.
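The complexity estimation of Eq. 7 and the curve test can be sketched in Python as follows. This is a minimal sketch, assuming `Y` is a list of luminance rows with valid neighbors at `x-1` and `y-1`; the function names are ours.

```python
import math

# Fitted curve parameters reported in the letter.
A, B = 9.29, -2.99e-2

def spatial_complexity(Y, xp, yp, N):
    """D_p of Eq. 7: mean of max(horizontal, vertical) luminance gradients
    over the N x N reference MB whose top-left corner is (xp, yp)."""
    total = 0
    for j in range(N):
        for i in range(N):
            x, y = xp + i, yp + j
            dyh = abs(Y[y][x] - Y[y][x - 1])   # horizontal gradient
            dyv = abs(Y[y][x] - Y[y - 1][x])   # vertical gradient
            total += max(dyh, dyv)
    return total / (N * N)

def use_qpel(dp, vp_mag):
    """Qpel ME is applied when (|V_p|, D_p) lies above the curve of Eq. 9."""
    return dp >= A * math.exp(B * vp_mag)
```

A textured MB (large `dp`) or a slowly moving one (small `vp_mag`, so a high threshold is easier to exceed only for genuinely complex content) lands above the curve and keeps Qpel ME; flat, fast-moving MBs fall below it and skip Qpel ME.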

## Table 1

Experimental conditions.

| Item | Setting |
|---|---|
| Software | JM 16.0 (Ref. 7) |
| Profile | High |
| Prediction structure | IPPP |
| Sequence resolution | CIF ($352\times 288$), 720p ($1280\times 720$) |
| Number of encoded frames | 100 |
| Number of reference frames | 3 |
| QP | 32, 36, 40, 44 |
| ME | Exhaustive ($1/4$ resolution) |
| ME search range | 32 (CIF), 64 (720p) |
| Rate distortion optimization | Enabled |
| Entropy coding | CAVLC, CABAC |

Figure 2 summarizes the procedure of the proposed algorithm at the encoder. When encoding an MB, ${D}_{p}$ and $f(|{V}_{p}|)$ are calculated using the PMV of the INTER-$16\times 16$ mode, and Qpel ME is applied if ${D}_{p}\ge f(|{V}_{p}|)$. This Qpel ME decision is shared by all subpartitioned blocks; in other words, we do not allow each subpartitioned block to have a different MVA. At the decoder, ${D}_{p}$ is obtained in the same manner by Eq. 7. Then, if ${D}_{p}<f(|{V}_{p}|)$, the received MVD bits are decoded using either ${\mathit{VLC}}_{\mathit{Qpel}}$ or ${\mathit{VLC}}_{\mathit{Hpel}}$, depending on the PMV. Since the same PMV should be used at both the encoder and the decoder, the PMV of the INTER-$16\times 16$ mode is also used at the decoder. Otherwise, the MVD bits are decoded using ${\mathit{VLC}}_{\mathit{All}}$, as in the conventional decoder.
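The decoder-side branching can be summarized by the following Python sketch. The string table names are placeholders, MV components are in quarter-pel units, and the per-component parity test for Eq. 5 membership is our simplification.

```python
import math

A, B = 9.29, -2.99e-2   # curve parameters reported in the letter

def select_vlc_table(dp, pmv):
    """Choose the MVD table for one MB, mirroring the encoder decision.
    dp is the spatial complexity D_p; pmv is the predicted MV (x, y)
    of the INTER-16x16 mode in quarter-pel units."""
    if dp < A * math.exp(B * math.hypot(pmv[0], pmv[1])):
        # Below the curve: Qpel ME was skipped, so the MVD lies in a
        # reduced set whose parity follows the PMV (Eq. 5).
        even = pmv[0] % 2 == 0 and pmv[1] % 2 == 0
        return "VLC_Hpel" if even else "VLC_Qpel"
    return "VLC_All"    # above the curve: Qpel ME was applied
```

Because the rule uses only ${D}_{p}$ and the PMV, both of which the decoder already has, the encoder and decoder reach the same table choice without any signaled MVA bit.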

## 3. Experimental Results and Conclusion

In order to evaluate the performance of the proposed algorithm, it is compared with the conventional algorithm in Ref. 3. A PC with an Intel Core2 Quad 2.67-GHz CPU and 8 GB of RAM is used. The detailed experimental conditions are given in Table 1. The changes of the Bjontegaard Delta (BD) rate $(\Delta \text{bit})$, encoding time $(\Delta {T}_{E})$, and decoding time $(\Delta {T}_{D})$ are used to measure the performance.^{8}

Table 2 indicates that the proposed algorithm improves the coding efficiency of the original JM 16.0 by 2.97% and 2.77% for CABAC and CAVLC, respectively.^{8} Since the proposed algorithm does not require any overhead bits to indicate the MVA, superior coding efficiency is obtained compared to the conventional algorithm. It should be noted that the Bigships and Jets sequences are encoded using the 151st to 250th and the 301st to 400th frames, respectively, where shot changes occur. Since intracoding outperforms intercoding in such cases, the performance of the proposed algorithm does not deteriorate.

## Table 2

Performance of the proposed algorithm. Each entry is given as CABAC/CAVLC; the first three data columns are for Ref. 3 and the last three for the proposed algorithm.

| | Sequence | Δbit (%) | ΔTE (%) | ΔTD (%) | Δbit (%) | ΔTE (%) | ΔTD (%) |
|---|---|---|---|---|---|---|---|
| CIF | Crew | -0.69/-0.28 | 0.75/1.24 | 0.15/0.04 | -2.72/-2.03 | -4.90/-4.54 | -3.15/-4.15 |
| | Foreman | -0.07/-0.19 | 1.01/0.19 | 0.28/0.21 | -3.15/-3.15 | -5.01/-4.17 | -5.28/-4.58 |
| | Salesman | -0.49/-0.18 | -0.15/-0.36 | 0.17/0.09 | -2.63/-2.18 | -5.41/-4.45 | -3.45/-1.67 |
| | Soccer | -0.23/-0.42 | 0.23/0.22 | 0.13/-0.12 | -2.65/-2.80 | -3.91/-4.92 | -4.93/-4.24 |
| 720p | Bigships | -1.12/-0.69 | 1.88/1.02 | 0.48/0.31 | -2.11/-2.66 | -1.97/-1.83 | -5.78/-6.12 |
| | City | -1.23/-1.36 | 0.39/1.03 | 0.75/0.41 | -2.46/-2.08 | -2.01/-1.89 | -4.57/-3.99 |
| | Jets | -1.20/-1.06 | 0.44/1.29 | 1.23/0.89 | -3.62/-2.51 | -2.31/-3.17 | -4.78/-4.45 |
| | Raven | -2.03/-2.70 | 0.92/0.35 | 0.69/0.98 | -4.43/-4.72 | -2.02/-2.32 | -5.46/-4.96 |
| | Average | -0.88/-0.86 | 0.68/0.71 | 0.48/0.34 | -2.97/-2.77 | -3.57/-3.41 | -4.68/-4.27 |

From the viewpoint of computational complexity, the conventional algorithm examines all MVAs and selects the best one,^{3} so the complexity of the original JM 16.0 encoder and decoder is maintained or slightly increased. In the proposed algorithm, by skipping unnecessary Qpel ME, the additional computation for ${D}_{p}$ at the encoder is compensated, and even a slight encoding time saving of 3.57% and 3.41% is achieved for CABAC and CAVLC, respectively. In addition, although the decoder must compute ${D}_{p}$ for each MB, the decoding time is also reduced by 4.68% and 4.27% on average for CABAC and CAVLC, respectively. This is because the interpolation time for motion compensation decreases when unnecessary Qpel ME is skipped at the encoder.

In this letter, we first presented an adaptive MVD coding scheme. Then, in order to apply the adaptive MVD coding technique effectively, we also proposed an algorithm that selectively performs Qpel ME based on the spatial and temporal complexity of the MB. The experimental results demonstrated that the proposed algorithm improves coding efficiency without additional computational overhead at either the encoder or the decoder.

## Acknowledgments

This research was supported by Seoul Future Contents Convergence (SFCC) Cluster established by Seoul R&BD Program. This work was also supported by a Korea Science and Engineering Foundation (KOSEF) Grant funded by the Korean Government (MEST) (No. 2009-0080547).

## References

1. "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003). https://doi.org/10.1109/TCSVT.2003.815165
2. "Optimizing motion vector accuracy in block-based video coding," IEEE Trans. Circuits Syst. Video Technol. 11(4), 497–511 (2001). https://doi.org/10.1109/76.915356
3. "Benefits of adaptive motion accuracy in H.26L video coding," Proc. IEEE ICIP 1, 1012–1015 (2000).
4. "Gradient-based synthesized multiple exposure time color HDR image," IEEE Trans. Instrum. Meas. 57(8), 1779–1785 (2008). https://doi.org/10.1109/TIM.2008.925715
5. "Least squares support vector machine classifiers," Neural Process. Lett. 9, 293–300 (1999). https://doi.org/10.1023/A:1018628609742
6. "Recommended simulation common conditions for coding efficiency experiments revision 4" (2008).
7. H.264/AVC reference software JM 16.0.
8. "Calculation of average PSNR differences between RD-curves" (2001).