## 1.

## Introduction

Block matching motion estimation is the core process in various video coding standards because of its efficiency in removing the temporal redundancy. However, motion estimation is very computationally intensive, if the exhaustive fast search algorithm (FSA) is employed to find the optimal matched block in the reference frame. To alleviate the heavy computation, many fast algorithms, such as the well-known three-step search (3SS),^{1} the four-step search (4SS),^{2} the diamond search (DS),^{3} and the hexagon-based search (HEXBS),^{4} have been proposed. Except for the successive elimination algorithm (SEA),^{5} these algorithms achieve fewer computations by sampling the search points in the search range, i.e., searching over a subset of possible candidate points with certain search patterns. Although algorithms in this category considerably reduce the computational complexity, they occasionally produce ambiguous search directions, and hence result in a local minimum.

This letter proposes a new fast motion estimation algorithm, which focuses on reducing computation not by sampling the search points but by reducing the size of the search range. Based on the assumption of convexity of the unimodal error surface, the proposed method makes use of the fact that the search center is probably close to the optimal matching point when the initial cost at the search center is efficiently low. Thus, in such cases, it’s possible to reduce the size of the search range for increased speed without much increase in matching error, which is the basic concept of the proposed method.

## 2.

## Proposed Method

In general, motion vectors of spatially adjacent blocks are highly correlated if they belong to the same moving area. Having a predictive motion vector
$\left({\mathrm{MV}}_{\mathrm{p}}\right)$
for a current block as the search center, as in the H.264/AVC JM,^{6, 7} the initial cost at the search center
$J\left({\mathrm{MV}}_{p}\right)$
is a good estimate for correlation between the motion vector of the current block
$\left({\mathrm{MV}}_{\mathrm{c}}\right)$
and
${\mathrm{MV}}_{\mathrm{p}}$
. If
$J\left({\mathrm{MV}}_{\mathrm{p}}\right)$
is low, it means that
${\mathrm{MV}}_{\mathrm{p}}$
is probably close to
${\mathrm{MV}}_{\mathrm{c}}$
, and hence the size of the search range can be reduced to a proper size without missing the global minimum. On the other hand, if
$J\left({\mathrm{MV}}_{\mathrm{p}}\right)$
is high, it means that
${\mathrm{MV}}_{\mathrm{p}}$
is not so correlated with
${\mathrm{MV}}_{\mathrm{c}}$
, and thus the size of the search range should be larger in order to include the global minimum in the search range. Based on this background, the block matching process for each block is performed as follows:

1. The ${\mathrm{MV}}_{\mathrm{p}}$ for a current block is determined in the same way as H.264/AVC, and the search center is set to ${\mathrm{MV}}_{\mathrm{p}}$ .

2. Using ${\mathrm{MV}}_{\mathrm{p}}$ , $J\left({\mathrm{MV}}_{\mathrm{p}}\right)$ is measured according to the definition of the cost.

3. With $J\left({\mathrm{MV}}_{\mathrm{p}}\right)$ , the size of the search range for a current block, $S{R}_{var}$ , is adjusted as

where $S{R}_{fixed}$ denotes the size of the fixed search range used in JM, and $L+1$ represents the number of search range sizes in the adaptive search range adjustment (ASRA)–based method. Note that $S{R}_{var}$ reduces to the power of 2 as $J\left({\mathrm{MV}}_{\mathrm{p}}\right)$ decreases. For $S{R}_{fixed}=16$ and $L=2$ , there are three search range sizes of $\pm 16$ , $\pm 8$ , and $\pm 4$ .## 1

$$S{R}_{var}=S{R}_{fixed}\u2215{2}^{L-k},\phantom{\rule{1em}{0ex}}\mathrm{if}\phantom{\rule{0.3em}{0ex}}{T}_{k}<J\left({\mathrm{MV}}_{\mathrm{p}}\right)\u2a7d{T}_{k+1},\phantom{\rule{1em}{0ex}}k=0,1,\dots ,L,$$4. The FSA is performed to estimate ${\mathrm{MV}}_{\mathrm{c}}$ in the adjusted search range.

## 3.

## Threshold Decision

With $S{R}_{fixed}=16$ and $L=2$ , two thresholds ${T}_{1}$ and ${T}_{2}$ $({T}_{1}<{T}_{2})$ must be given to discriminate the ranges of $J\left({\mathrm{MV}}_{\mathrm{p}}\right)$ for the three kinds of search range such as

## 2

$$S{R}_{\mathrm{var}}=\{\begin{array}{c}4,\phantom{\rule{1em}{0ex}}\mathrm{if}\phantom{\rule{0.3em}{0ex}}J\left({\mathrm{MV}}_{\mathrm{P}}\right)<{T}_{1}\hfill \\ 8,\phantom{\rule{1em}{0ex}}\mathrm{else}\phantom{\rule{0.3em}{0ex}}\mathrm{if}\phantom{\rule{0.3em}{0ex}}J\left({\mathrm{MV}}_{\mathrm{P}}\right)<{T}_{2}\hfill \\ 16,\phantom{\rule{1em}{0ex}}\mathrm{else}\hfill \end{array}.$$## 4.

## Experimental Results

To evaluate the performance of the proposed technique, Foreman, Stefan, and Soccer sequences with the CIF format were tested with H.264/AVC JM9.4. The encoding parameters are as follows: number of frames—200, frame rate— $15\phantom{\rule{0.3em}{0ex}}\mathrm{Hz}$ , target bit rate— $256\phantom{\rule{0.3em}{0ex}}\mathrm{kbit}\u2215\mathrm{s}$ ; RD optimization off; $16\times 16$ , $16\times 8$ , $8\times 16$ , and $8\times 8$ block modes on; number of reference frames—1.

Table 1 shows the performance comparison between the proposed algorithm and the conventional FSA. Note that the spiral search method with ${\mathrm{MV}}_{\mathrm{p}}$ as the initial search point is used in both methods. The computational complexity is measured relative to the FSA with a $\pm 16$ search range. The results are summarized as follows:

• In terms of computational complexity, the proposed method has a significant gain. When $\alpha =2.0$ , the computational complexity is approximately 30% compared to the FSA with a $\pm 16$ search range and nearly equivalent to the FSA with a $\pm 8$ search range.

• In terms of reconstructed image quality, the proposed method keeps up with the FSA with a $\pm 16$ search range. The average PSNR decreases $0.2\phantom{\rule{0.3em}{0ex}}\mathrm{dB}$ at most. In contrast, the FSA with a $\pm 8$ search range exhibits severe degradation from $-0.4\phantom{\rule{0.3em}{0ex}}\mathrm{dB}$ to $-1.1\phantom{\rule{0.3em}{0ex}}\mathrm{dB}$ . This is because those blocks having large motion cannot be accurately motion-compensated with a small fixed search range. On the contrary, for such cases, the proposed ASRA-based approach may provide the full search range of $\pm 16$ , so the accuracy of the motion estimation can be maintained. Table 2 shows the distributions of selected search ranges in the proposed method.

## Table 1

Performance comparison between the proposed approach and the conventional FSA.

Foreman | Stefan | Soccer | ||||
---|---|---|---|---|---|---|

Method | PSNR | Complexity | PSNR | Complexity | PSNR | Complexity |

$\mathrm{FSA}(\pm 16)$ | 35.34 | 1 | 28.39 | 1 | 31.52 | 1 |

$\mathrm{FSA}(\pm 8)$ | 34.98 | 0.265 | 27.07 | 0.265 | 30.95 | 0.265 |

$\mathrm{ASRA}(\alpha =1.0)$ | 35.29 | 0.490 | 28.29 | 0.503 | 31.43 | 0.506 |

$\mathrm{ASRA}(\alpha =1.5)$ | 35.27 | 0.343 | 28.20 | 0.348 | 31.36 | 0.333 |

$\mathrm{ASRA}(\alpha =2.0)$ | 35.25 | 0.273 | 28.18 | 0.261 | 31.32 | 0.275 |

## Table 2

Distribution of selected search ranges in the proposed approach.

Foreman (%) | Stefan (%) | Soccer (%) | |||||||
---|---|---|---|---|---|---|---|---|---|

Method | ±16 | ±8 | ±4 | ±16 | ±8 | ±4 | ±16 | ±8 | ±4 |

$\mathrm{ASRA}(\alpha =1.0)$ | 34.4 | 36.1 | 29.5 | 26.1 | 37.3 | 36.6 | 36.2 | 35.7 | 28.1 |

$\mathrm{ASRA}(\alpha =1.5)$ | 22.0 | 23.0 | 55.0 | 16.2 | 23.7 | 60.1 | 21.6 | 20.3 | 58.1 |

$\mathrm{ASRA}(\alpha =2.0)$ | 16.5 | 14.8 | 68.7 | 10.7 | 15.4 | 73.9 | 17.1 | 12.3 | 70.6 |

Figure 2 shows PSNR and complexity graphs for a range of $\alpha $ . We can observe that as $\alpha $ grows, the reduction ratio of the complexity decreases, and it is nearly saturated over $\alpha =2.0$ , whereas the PSNR drops continually, although the degree of drop is negligible. From this result, a value about 2.0 seems proper as the empirical value for $\alpha $ .

## 5.

## Conclusions

This letter has presented a fast motion estimation algorithm based on the ASRA. The size of the search range is adaptively adjusted according to the initial cost at the search center. Experimental results demonstrate that the proposed method saves a significant part of computations while providing comparable performance to the conventional FSA. If we combine our method with a fast FSA such as SEA, additional reduction in computation can be simply obtained. In addition, the ASRA-based approach has a strong point with respect to the hardware implementation, since it has a regular structure like the FSA.

## Acknowledgments

This research was supported by the Ministry of Information and Communication (MIC), Korea, under the Information Technology Research Center (ITRC) support program supervised by the Institute of Information Technology Advancement (IITA), IITA-2006-(C1090-0603-0002).