## 1.

## Introduction

The H.264 advanced video coding (AVC) standard^{1} is the newest standard from the ITU-T video coding experts group and the ISO/IEC moving pictures experts group. Its main advantages are the great variety of applications in which it can be used and its versatile design. This standard has shown significant rate distortion (RD) improvements, as compared to previous standards for video compression.

Although the standard has shown significant RD improvements, it has also increased the overall encoding complexity due to the very refined motion estimation (ME) and mode decision processes where variable block size ME is employed. In H.264, there are seven different block sizes that can be used in intermode coding (16×16 = mode 1, 16×8 = mode 2, 8×16 = mode 3, 8×8 = mode 4, 8×4 = mode 5, 4×8 = mode 6, and 4×4 = mode 7). In addition, the SKIP mode (mode 0), direct mode, and two intramodes (INTRA_4 and INTRA_16) are also supported. To achieve the highest coding efficiency, the encoder tries all the possible modes and selects the best one which minimizes the RD cost.

However, this method is not computationally efficient, and consequently limits the use of H.264 encoders in real-time applications. Therefore, algorithms which can reduce computational complexity of H.264 encoding without compromising coding efficiency are very desirable for real-time implementation of H.264 encoders.

Several fast mode decisions^{2, 3, 4, 5, 6} have been proposed in the literature. This section provides a review of some existing fast intermode decision techniques and their limitations.

In Ref. 2 the mean of absolute difference between the current and the co-located block in the reference frames have been used to predict the modes. This scheme achieved up to 48% computational cost reduction. A similar algorithm has been proposed in Ref. 3, where the sum of the absolute difference value of the current MB is calculated and compared to a threshold. Based on the comparison results, the modes are selected adaptively.

In Ref. 4, the motion vector information has been used to predict the modes and the scheme utilizes the spatial property of the motion vector to predict the modes efficiently.

Another fast intermode decision algorithm based on temporal correlation of modes in *P* slices was proposed in Ref. 5. A time reduction of 57% on average was claimed, with a bitrate increment of 0.07% and a loss of 0.05 dB, as compared to the standard. However, if the local temporal information is unreliable, for example, when the scene changes, the RD performance will be degraded because of mode misprediction.

A recently developed algorithm was proposed in Ref. 6. This scheme achieves up to 63% time savings when compared to the standard reference software. However, the algorithm is based on heuristic analysis obtained from a set of video sequences which can lead to a significant RD degradation if the algorithm is used to encode sequences with different characteristics. Furthermore, the spatial correlations between MBs have been exploited and this correlation is unreliable for sequences with a complex background.

From the information above, it can be seen that fast intermode decision algorithms can achieve time savings in the range of 40% to 65% with some RD performance degradations. It also can be noticed that all the fast intermode decision schemes are based on spatial domain ME information.

Recently, there has been a lot of interest in motion estimation techniques operating in the frequency domain. These are commonly based on the principle of cyclic correlation and offer well-documented advantages in terms of computational efficiency due to the employment of fast algorithms. One of the best-known methods in this class is phase correlation,^{7} which has become one of the ME methods of choice for a wide range of professional studio and broadcasting applications.^{8} In addition to computational efficiency, phase correlation offers key advantages in terms of its strong response to edges and salient picture features, its immunity to illumination changes and moving shadows, and its ability to measure large displacements. Several attempts^{9, 10} have been proposed to adapt the phase correlation to the standard. In Ref. 9, the authors proposed an adaptive block size phase correlation ME, which has been compared to the full search block matching (FSBM) algorithm.^{11} The comparison results indicated a significant increase in the bitrate. Furthermore, block sizes up to 32×32 were used to estimate the motion which increases the computational complexity. In Ref. 10, the authors used the phase correlation to predict the ME block size by generating a binary matrix, and then selected the mode from the binary matrix. Although the authors claimed a 50% reduction in the ME time, the algorithm showed significant RD performance degradation for slow video sequences.

In this paper, we propose a novel fast mode decision algorithm. In addition to saving up to 97% of the ME time for similar RD performances, our algorithm differs from the above-mentioned algorithms as it preprocesses the macroblock in the frequency domain using 16×16 phase correlation, and based on these results, we directly predict the mode and the search range.

The rest of the paper is organized as follows. Section 2 describes the proposed mode decision algorithm. Section 3 contains a comprehensive list of experiments and a discussion. Section 4 concludes the paper.

## 2.

## Proposed Scheme

In video compression, knowledge of motion helps to exploit similarity between adjacent and nearby frames in the sequence, and remove the temporal redundancy between neighboring frames in addition to the spatial and spectral redundancies.^{12} The phase correlation method measures the movement between the two fields directly from their phases. The basic principles are described below.

Assuming a translational shift between the two frames:

## 1

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} s_t \left( {x,y} \right) = s_{t + 1} \left( {x + \Delta x,y + \Delta y} \right). \end{equation}\end{document} $${s}_{t}\left(x,y\right)={s}_{t+1}\left(x+\Delta x,y+\Delta y\right).$$## 2

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} S_t \left( {f_1,f_2 } \right) = S_{t + 1} \left( {f_1,f_2 } \right)\exp \left[ {2j\pi \left( {f_1 \Delta x + f_2 \Delta y} \right)} \right]. \end{equation}\end{document} $${S}_{t}\left({f}_{1},{f}_{2}\right)={S}_{t+1}\left({f}_{1},{f}_{2}\right)\mathrm{exp}\left[2j\pi \left({f}_{1}\Delta x+{f}_{2}\Delta y\right)\right].$$## 3

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} C_{t,t + 1} \left( {f_1,f_2 } \right) = S_{t + 1} \left( {f_1,f_2 } \right) \cdot S_t \left( {f_1,f_2 } \right). \end{equation}\end{document} $${C}_{t,t+1}\left({f}_{1},{f}_{2}\right)={S}_{t+1}\left({f}_{1},{f}_{2}\right)\xb7{S}_{t}\left({f}_{1},{f}_{2}\right).$$## 4

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} R_{t,t + 1} \left( {f_1,f_2 } \right) = \frac{{S_{t + 1} \left( {f_1,f_2 } \right) \cdot S_t^* \left( {f_1,f_2 } \right)}}{{\left| {S_{t + 1} \left( {f_1,f_2 } \right) \cdot S_t^* \left( {f_1,f_2 } \right)} \right|}}. \end{equation}\end{document} $${R}_{t,t+1}\left({f}_{1},{f}_{2}\right)=\frac{{S}_{t+1}\left({f}_{1},{f}_{2}\right)\xb7{S}_{t}^{*}\left({f}_{1},{f}_{2}\right)}{\left|{S}_{t+1}\left({f}_{1},{f}_{2}\right)\xb7{S}_{t}^{*}\left({f}_{1},{f}_{2}\right)\right|}.$$## 5

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} R_{t,t + 1} \left( {f_1,f_2 } \right) = \exp \left[ { - 2j\pi \left( {f_1 \Delta x + f_2 \Delta y} \right)} \right]. \end{equation}\end{document} $${R}_{t,t+1}\left({f}_{1},{f}_{2}\right)=\mathrm{exp}\left[-2j\pi \left({f}_{1}\Delta x+{f}_{2}\Delta y\right)\right].$$## 6

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} c_{t,t + 1} \left( {x_1,y_1 } \right) = \delta \left( {x_1 - \Delta x,y_1 - \Delta y} \right). \end{equation}\end{document} $${c}_{t,t+1}\left({x}_{1},{y}_{1}\right)=\delta \left({x}_{1}-\Delta x,{y}_{1}-\Delta y\right).$$Using the above insights, we developed the following algorithm: if the correlation value is equal to 1, then we choose the SKIP mode as the best mode. Otherwise, if the correlation value is greater than or equal to 0.8, we limit the mode selection process to modes {0, 1, 2, and 3}. Additionally, we limit the search range to 8. Finally, if the correlation value is less than 0.8, we enable all the modes and ME is performed using the defined search range. The proposed algorithm is shown in flowchart form in Fig. 1.

## 3.

## Experimental Results

To assess the proposed algorithm, a comprehensive set of experiments for eight kinds of video sequences with different motion characteristics was performed.

The chosen search range was 32 pixels for the full ME. The configuration file for the encoder had the following settings: RD optimization ON, IPPP structure, CABAC coding, and the number of reference slices was 1.

In these experiments, the source code for the H.264 Reference Software Version JM14.2 (Ref. 11) was used. Four sizes, QCIF (176×144), CIF (352×288), (640×480), and (1024×768) were used in an Intel Core 2 CPU 6420 @ 2.13 GHz with 2.0 GB RAM. The Intel VTune performance analyzer was used to measure the number of machine cycles differences, reflecting the total encoding time.

Table 1 shows the percentage cycle savings, the percentage search point savings, the Bjontegaard Delta bit rate (BDBR) percentage differences, and the Bjontegaard delta peak signal-to-noise ratio (BDPSNR) differences (in decibels)^{13} between the JM software and the proposed new algorithm, and between the proposed algorithm and the algorithm proposed in Ref. 13. In the first comparison, Table 1 shows that the BDBR differences are in the range of 0.2 to 1.3, while the BDPSNR differences are in the range of −0.08 to −0.01. The minus signs denote PSNR degradation and bitrate savings, respectively. This clearly shows that the proposed algorithm has very similar RD performance to H.264/AVC reference software. Furthermore, ME time savings up to 97% and percentage cycle savings up to 67% are observed. It also can be seen that the reduction in the CPU cycles depends on the characteristics of the image sequences. For a slow image sequence with a simple background, the reduction is much more significant than for fast image sequences or sequences with a more complex background. The reason for this is that in slow video sequences, the number of big block sizes increases significantly.

## Table 1

Comparison on BDPSNR and BDBR cycle differences and ME time saving between the proposed algorithm and JM software and the algorithm proposed in Ref. 3.

Against the JM software | Against the algorithm proposed in Ref. 6 | |||||||
---|---|---|---|---|---|---|---|---|

Sequence | Size | BDPSNR (dB) | BDBR (%) | Cycles Saving (%) | ME Time Saving (%) | BDPSNR (dB) | BDBR (%) | ME Time Saving (%) |

Akiyo | QCIF | −0.08 | +1.3 | 66.98 | 97.02 | 0.03 | −0.4 | 48.12 |

CIF | −0.04 | +0.96 | 57.36 | 86.49 | 0.04 | 0.24 | 42.65 | |

Foreman | QCIF | −0.05 | +1.22 | 36.43 | 45.17 | 0.06 | −0.7 | 22.53 |

CIF | −0.04 | +1.14 | 35.74 | 44.68 | 0.02 | −0.05 | 21.45 | |

Tempete | QCIF | −0.01 | +0.61 | 39.75 | 44.31 | 0.04 | 0.03 | 24.09 |

CIF | −0.05 | +0.76 | 40.07 | 46.8 | 0.01 | −0.45 | 26.76 | |

Silent | QCIF | −0.01 | +0.36 | 60.53 | 80.9 | 0.03 | 0.51 | 50.32 |

CIF | −0.03 | +0.77 | 52.69 | 75.02 | 0.06 | 0.25 | 47.22 | |

Stefan | QCIF | −0.03 | +0.3 | 30.54 | 36.53 | 0.05 | 0.61 | 19.66 |

CIF | −0.04 | +0.5 | 29.39 | 35.81 | 0.06 | −0.32 | 18.55 | |

Mobile | QCIF | −0.02 | +0.2 | 26.06 | 34.88 | 0.07 | 0.01 | 22.49 |

CIF | −0.05 | +0.6 | 27.4 | 32.74 | 0.01 | 0.83 | 20.87 | |

Rena | 640×480 | −0.05 | +0.8 | 42.5 | 64.5 | −0.03 | 0.9 | 29.8 |

Uli | 1024×768 | −0.03 | +0.6 | 39.6 | 51.8 | 0.05 | −0.04 | 26.72 |

Average | −0.04 | +0.7 | 41.8 | 55.4 | 0.04 | 0.1 | 30.08 |

The second comparison in Table 1 indicates that the proposed algorithm consistently outperforms a recently proposed approach^{3} in all aspects; an average of 30% encoding time savings, 0.04 dB PSNR improvement, and 0.1% total bit rate reduction.

Moreover, when comparing the results to the results in Ref. 10, in addition to the significant time reduction gain (40%), the RD performance is maintained similar to the JM software for the various sequences, while in Ref. 10, the performances have been degraded rather significantly for some of the sequences.

## 4.

## Conclusion

The H.264/AVC increases memory bandwidth and spends a significant amount of processing time for the motion estimation process in order to determine the optimal motion vector. As a means of increasing the coding efficiency, in this paper, we proposed a fast mode decision and a motion estimation scheme with rate distortion performance similar to the standard. Our technique can reduce up to 97% of the ME time (67% in CPU cycles), resulting in significant time/cycle savings as compared to H.264/AVC. It is very relevant to low complexity video coding systems.