1 March 2009 Fast subpixel motion estimation based on the interpolation effect on different block sizes for H264/AVC
Abstract
We propose a fast subpixel motion estimation algorithm for the H.264 advanced video coding (AVC) standard. The algorithm exploits the correlation of the spatial interpolation effect on the full-pixel motion estimation best matches between different block sizes in order to reduce the computational cost of the overall motion-estimation process. Experimental results show that the proposed algorithm reduces the CPU cycles of various motion estimation schemes by up to 16% with similar rate-distortion performance compared to the H.264/AVC standard.

1. Introduction

The H.264 advanced video coding (AVC) standard1 is the newest joint standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. Its main advantages are its versatile design and the great variety of applications in which it can be used. The standard has shown significant rate-distortion (RD) improvements compared to previous video compression standards.

The standard provides great flexibility in the selection of block sizes for motion estimation/compensation, with a minimum luma block size as small as 4×4. Although most prior standards support at most half-pixel motion vector accuracy, H.264/AVC further allows quarter-pixel accuracy for improved performance. While the standard achieves significant RD improvements, it also increases the overall encoding complexity because of its very refined motion-estimation (ME) process. The ME process consists of two stages: integer-pixel motion search and fractional-pixel motion search. Because the complexity of integer-pixel ME has been greatly reduced by numerous fast ME algorithms,2, 3 the computational overhead of fractional-pixel ME has become relatively significant.

Different fast fractional-pixel ME algorithms3, 4, 5, 6 have been proposed, and some of them are used by the JM reference software.7 Their common idea is to simplify the search pattern by applying very refined prediction algorithms and improved adaptive threshold schemes to terminate unnecessary search positions.

In this paper, we focus on decreasing the complexity of fractional-pixel ME by applying an effective two-step algorithm. First, we examine the fractional-pixel ME best match of the 16×16 macroblock and, based on the outcome, eliminate the fractional-pixel motion search for the 16×8 and 8×16 macroblock partitions. Likewise, in the second step, we examine the fractional-pixel ME best matches of the 8×8 macroblock partitions and, based on the outcome, eliminate the fractional-pixel motion search for the 8×4, 4×8, and 4×4 partitions.

Our algorithm differs from the previous methods in two aspects: (i) It uses the similarities between the interpolation effect on the macroblock and its partitions to completely eliminate the fractional-pixel ME. (ii) The proposed algorithm is adaptive and can be applied to any combination of integer and fractional-pixel ME schemes.

The rest of the paper is organized as follows. Section 2 gives a brief overview of the ME algorithms proposed in the H.264/AVC. Section 3 describes the proposed ME algorithm. Section 4 contains a comprehensive list of experiments and a discussion. Section 5 concludes the letter.

2. ME in the H.264/AVC

In the first stage of ME, integer-pixel motion search is performed for each square block of the slice to be encoded in order to find one (or more) displacement vector(s) within a search range. The best match is the position that minimizes the Lagrangian cost function Jmotion

1

Jmotion = Dmotion + λmotion·Rmotion
where λmotion is the Lagrangian multiplier, Dmotion is an error measure between the candidate macroblock taken from the reference frame(s) and the current macroblock, and Rmotion is the number of bits required to encode the difference between the motion vector(s) and their prediction from the neighboring macroblocks (differential coding). A functional similar to Eq. 1 is used to decide the optimal block size for ME. The most common error measures are the sum of absolute differences (SAD) and the sum of absolute transformed differences (SATD).
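As an illustration of Eq. 1 (a sketch, not the JM implementation), the cost can be evaluated with SAD as the distortion term; the rate term below is a hypothetical estimate based on the signed exp-Golomb code lengths that H.264 entropy coding uses for motion-vector differences:

```python
def sad(cur, ref):
    """Sum of absolute differences between two equal-size pixel blocks."""
    return sum(abs(c - r) for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))

def mv_bits(dx, dy):
    """Hypothetical rate estimate: bits needed to code the motion-vector
    difference (dx, dy) with signed exp-Golomb codes."""
    def se_bits(v):
        code_num = 2 * abs(v) - (1 if v > 0 else 0)  # signed-to-unsigned mapping
        return 2 * (code_num + 1).bit_length() - 1   # exp-Golomb code length
    return se_bits(dx) + se_bits(dy)

def j_motion(cur, ref, mvd, lam):
    """Lagrangian motion cost J = D + lambda * R of Eq. 1."""
    dx, dy = mvd
    return sad(cur, ref) + lam * mv_bits(dx, dy)
```

The best match is simply the candidate position minimizing this cost over the search range.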

After the integer-pixel motion search finds the best match, the values at half-pixel positions around the best match are interpolated by applying a one-dimensional six-tap finite impulse response (FIR) filter horizontally and vertically. The values at quarter-pixel positions are then generated by averaging pixels at integer- and half-pixel positions. Figure 1 illustrates the interpolated fractional-pixel positions: uppercase letters indicate pixels on the full-pixel grid, numbers indicate half-pixel positions, and lowercase letters indicate quarter-pixel positions in between.
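A minimal one-dimensional sketch of this interpolation, assuming 8-bit samples and edge replication at the picture borders (the standard applies the same tap set horizontally and vertically):

```python
def half_pel_1d(samples):
    """Interpolate half-pixel values between full-pixel samples with the
    H.264 six-tap FIR filter (1, -5, 20, 20, -5, 1), with rounding offset 16,
    right-shift by 5, and clipping to the 8-bit range."""
    n = len(samples)
    def s(i):                      # edge-replicated sample access
        return samples[min(max(i, 0), n - 1)]
    halves = []
    for i in range(n - 1):         # half-pel position between s(i) and s(i+1)
        acc = s(i-2) - 5*s(i-1) + 20*s(i) + 20*s(i+1) - 5*s(i+2) + s(i+3)
        halves.append(min(max((acc + 16) >> 5, 0), 255))
    return halves

def quarter_pel(a, b):
    """Quarter-pixel value: rounded average of two neighboring
    integer- or half-pixel samples."""
    return (a + b + 1) >> 1
```

On a flat signal the filter reproduces the input, and at an edge it produces the expected intermediate value.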

Fig. 1

Fractional pixel search positions.


For example, in Fig. 1, if the integer best match is position E, the half-pixel positions 1–8 are searched using Eq. 1. Suppose position 7 is the best match of the half pixel search. Then the quarter-pixel positions a–h are searched, again using Eq. 1.
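This two-level refinement can be sketched generically as follows; positions are in quarter-pixel units, and the cost callback stands in for Eq. 1 (a simplified illustration, not the JM search code):

```python
def fractional_refine(cost, ix, iy):
    """Refine an integer-pel best match (ix, iy), given in quarter-pel units,
    first over its 8 half-pel neighbours (offset 2), then over the 8
    quarter-pel neighbours (offset 1) of the half-pel winner.
    `cost` maps a quarter-pel position to its Lagrangian cost."""
    neighbours = [(-1, -1), (0, -1), (1, -1), (-1, 0),
                  (1, 0), (-1, 1), (0, 1), (1, 1)]
    best = (ix, iy)
    for step in (2, 1):            # 2 = half-pel pass, 1 = quarter-pel pass
        cx, cy = best
        cand = [(cx + step * dx, cy + step * dy) for dx, dy in neighbours]
        best = min([best] + cand, key=lambda p: cost(*p))
    return best
```

Note that each pass keeps the current best as a candidate, so the search never moves to a worse position.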

3. Proposed Scheme

In slow-motion video sequences, or in the slow-motion segments of fast video sequences, the ME process may find a best-match position during the integer-pixel motion search that does not change after the subsequent fractional-pixel motion search. Furthermore, if the integer-pixel ME best match for a bigger block size does not change during the fractional-pixel motion search, it is highly likely that the integer-pixel ME results of that block's partitions will also not change during their fractional-pixel motion searches. How likely depends on the difference in block sizes, as demonstrated below. These observations are quantified in Table 1.

Table 1

Evaluation of the conditional probabilities.

Probability    Average
PROB(1)        70%
PROB(2)        59%
PROB(3)        70%

Table 1 has three rows. The first row shows the probability that the 16×8 and 8×16 macroblock partitions have the same best match in integer- and fractional-pixel ME, given that the 16×16 macroblock has the same best match in integer- and fractional-pixel ME. We call this probability PROB(1). Similarly, the second row shows the probability of the 8×8, 8×4, 4×8, and 4×4 blocks having the same best match in integer- and fractional-pixel ME, given that the 16×16 macroblock has the same best match in integer- and fractional-pixel ME. We call this probability PROB(2). The third row shows the probability of the 8×4, 4×8, and 4×4 block partitions having the same best match in integer- and fractional-pixel ME, given that the 8×8 blocks have the same best match in integer- and fractional-pixel ME. We call this probability PROB(3). These probabilities are averaged across sequences with different motion characteristics and are shown in the second column of Table 1.
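For illustration, conditional probabilities such as PROB(1)–PROB(3) can be estimated offline by relative frequency over per-macroblock encoder logs; the record field names below are hypothetical:

```python
def cond_prob(records, given, event):
    """Estimate P(event | given) by relative frequency over log records.
    Each record is a dict of boolean flags, e.g.
    {'mb16_same': True, 'parts16_same': False}, where a flag marks that
    integer- and fractional-pel ME found the same best match."""
    hits = [r for r in records if r[given]]
    if not hits:
        return 0.0
    return sum(r[event] for r in hits) / len(hits)
```

Averaging such estimates over sequences with different motion characteristics yields the values in Table 1.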

From Table 1, it can be seen that the conditional probabilities are reasonably high (70%) only when the macroblock/block and its partitions do not differ much in size. For example, we cannot safely assume that the 8×4, 4×8, and 4×4 partitions find the same best match in the integer- and fractional-pixel motion searches just because the enclosing 16×16 macroblock does: the difference in size is large, since the 16×16 macroblock is 8, 8, and 16 times bigger than these block sizes, respectively. Using the above insights, we have developed the following scheme:

If the 16×16 macroblock finds the same best match in the integer- and fractional-pixel motion searches, we disable the fractional-pixel motion search for all the enclosed 16×8 and 8×16 blocks. Thus, we save all the fractional-pixel search, SAD, and Hadamard transform calculations for these blocks. Otherwise, the fractional-pixel motion search is performed.

Similarly, if the 8×8 block partitions of the 16×16 macroblock find the same best match in the integer- and fractional-pixel motion searches, we disable the fractional-pixel motion search for all the enclosed 8×4 , 4×8 , and 4×4 blocks. Otherwise, the fractional-pixel motion search is performed.
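Putting the two steps together, the skipping rule can be sketched as follows; `integer_me` and `fractional_me` are hypothetical placeholders for the encoder's existing search routines, each returning a motion vector for a block:

```python
def motion_search(mb, integer_me, fractional_me):
    """Two-step skip rule of the proposed scheme for one macroblock `mb`."""
    mvs = {}
    mv16_int = integer_me(mb.block16x16)
    mv16 = fractional_me(mb.block16x16, mv16_int)
    mvs['16x16'] = mv16
    # Step 1: if fractional search did not move the 16x16 best match,
    # skip fractional search for the 16x8 and 8x16 partitions.
    skip_large = (mv16 == mv16_int)
    for name, blocks in (('16x8', mb.blocks16x8), ('8x16', mb.blocks8x16)):
        mvs[name] = [integer_me(b) if skip_large
                     else fractional_me(b, integer_me(b)) for b in blocks]
    # Step 2: the same test on the four 8x8 partitions gates the
    # 8x4, 4x8, and 4x4 searches.
    mv8_int = [integer_me(b) for b in mb.blocks8x8]
    mv8 = [fractional_me(b, m) for b, m in zip(mb.blocks8x8, mv8_int)]
    mvs['8x8'] = mv8
    skip_small = (mv8 == mv8_int)
    for name, blocks in (('8x4', mb.blocks8x4), ('4x8', mb.blocks4x8),
                         ('4x4', mb.blocks4x4)):
        mvs[name] = [integer_me(b) if skip_small
                     else fractional_me(b, integer_me(b)) for b in blocks]
    return mvs
```

When the gating test fails, the scheme falls back to the normal integer-plus-fractional search for every partition, so no search quality is lost in fast-motion regions.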

4. Experiments

To assess the proposed algorithm, a comprehensive set of experiments on video sequences with different motion characteristics was performed. The experiments used the source code of the H.264 reference software version JM12.2,7 running on a Pentium 4 PC at 2.8 GHz with 1.0 GB of RAM. Table 2 lists the experimental conditions.

Table 2

Encoder experiment conditions

Parameter              Value         Parameter                 Value
Profile                100 (Main)    YUV format                4:2:0
Level IDC              40            B-frames                  Not used
Entropy coding         CABAC         Frame skip                0
Reference frames       5             Search range              32
ME metric (level 0)    SAD           ME metric (levels 1&2)    Hadamard SAD

Table 3 shows the percentage cycle savings, the Bjontegaard delta bit rate (BDBR) differences (in percent), and the Bjontegaard delta peak signal-to-noise ratio (BDPSNR) differences (in decibels)8 between the H.264/AVC reference and the proposed algorithm when full search (FS), enhanced predictive zonal search (EPZS),2 and unsymmetrical-cross multi-hexagon-grid search (UMHEX)3 are used as full- and fractional-pixel ME schemes.

Table 3

Experimental results.

Each entry gives BDPSNR (dB), BDBR (%), and cycle savings (%); the column headings list the full-pixel/subpixel ME scheme pair.

Sequence           Size      FS/FS               UMHEX/UMHEX         UMHEX/FS            EPZS/EPZS           EPZS/FS
Akiyo              QCIF      +0.02 −0.45  6.50   −0.01 +0.34 14.32    0.00 +0.01  4.85   −0.01 +0.24 10.90   −0.01 +0.11 14.90
                   CIF       −0.04 −0.12  8.20   −0.01 +0.43 16.06   −0.02 +0.49 15.89   −0.01 +0.25 11.04   −0.02 +0.49 15.14
Foreman            QCIF      −0.05 +1.20  2.45   −0.02 +0.49  2.94   −0.02 +0.67  3.00   −0.03 +0.84  3.61   −0.01 +0.24  6.70
                   CIF       −0.01 +0.28  2.77   −0.03 +0.72  3.17   −0.02 +0.60  3.23   −0.04 +0.81  3.23   −0.01 +0.29  3.35
Mobile             QCIF      −0.03 +0.43  1.98   −0.05 +0.51  2.09   −0.04 +0.35  2.00   −0.03 +0.39  2.03   −0.03 +0.28  1.30
                   CIF       −0.01 +0.12  1.68   −0.02 +0.31  1.08   −0.01 +0.17  1.21   −0.01 +0.16  1.36   −0.02 +0.38  1.29
Stefan             QCIF      −0.03 +0.51  1.88   −0.03 +0.52  2.20   −0.04 +0.01  2.10   −0.13 +0.20  2.07   −0.02 +0.40  2.40
                   CIF       −0.03 +0.51  1.87   −0.01 +0.12  1.70   −0.01 +0.23  2.00   −0.01 +0.20  1.89   −0.01 +0.24  2.20
Silent             QCIF      −0.02 +0.48  6.20   −0.01 +0.11  3.16   −0.03 +0.48 11.60   −0.02 +0.43  8.92   +0.01 −0.14 11.71
                   CIF       −0.02 +0.55  6.01   −0.01 +0.37 11.79   −0.02 +0.47 12.65   −0.02 +0.46  9.37   −0.02 +0.43 12.60
Tempete            QCIF      −0.01 +0.20  1.90   −0.02 +0.25  4.50   −0.03 +0.33  1.92   −0.03 +0.37  2.14   −0.01 +0.29  1.44
                   CIF       −0.01 −0.01  1.15   −0.01 +0.16  1.08    0.00 +0.06  1.40   −0.01 +0.02  1.34   −0.01 +0.18  3.05
Opening ceremony   720×480   −0.01 +0.20  2.89   −0.01 +0.22  3.67   −0.01 +0.17  4.10   −0.01 +0.22  2.56   −0.01 +0.15  3.20
Driving            720×480   −0.01 +0.16  1.30   −0.01 +0.12  1.45    0.00 +0.05  1.90    0.00 +0.05  1.80   −0.02 +0.08  1.10

The Intel VTune performance analyzer was used to measure the differences in machine cycles. Table 3 shows that the BDBR differences are in the range of [−0.5%, +1.2%], while the BDPSNR differences are in the range of [−0.04 dB, +0.02 dB]. Minus signs denote bit-rate savings and PSNR degradation, respectively.

This clearly shows that the proposed algorithm has very similar RD performance to the H.264/AVC. Furthermore, percentage cycle savings up to 16% are observed. It also can be seen that the reduction in the CPU cycles depends on the characteristics of the image sequences. For a slow image sequence with a simple background, the reduction is much more significant than for a fast image sequence or sequences with a more complex background.

5. Conclusion

In conclusion, we proposed a fast subpixel ME algorithm based on the interpolation effect on different block sizes for the H.264/AVC standard. With RD performance very similar to that of the standard, the proposed technique reduces the CPU cycles required by different ME schemes by up to 16%. Our scheme is highly relevant to low-complexity video-coding systems.

References

1. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–577 (2003).

2. A. Tourapis, "Enhanced predictive zonal search for single and multiple frame motion estimation," in Proc. VCIP 2002, pp. 1069–1079, SPIE, Jan. 2002.

3. Z. B. Chen, P. Zhou, and Y. He, "Fast integer pel and fractional pel motion estimation for JVT," JVT-F017, 6th meeting, Awaji, Japan, Dec. 5–13, 2002.

4. P. Yin, H. Y. C. Tourapis, A. M. Tourapis, and J. Boyce, "Fast mode decision and motion estimation for JVT/H.264," in Proc. ICIP 2003, pp. 853–856, IEEE, Sep. 2003.

5. Z. B. Chen, C. Du, J. H. Wang, and Y. He, "PPFPS—a paraboloid prediction based fractional pixel search strategy for H.26L," in Proc. ISCAS 2002, pp. 9–12, IEEE, May 2002.

6. Z. B. Chen and Y. He, "Prediction based directional refinement (PDR) algorithm for fractional pixel motion search strategy," JVT-D069, 4th meeting, Klagenfurt, Austria, July 22–26, 2002.

7. K. Sühring, H.264/AVC Reference Software Version JM12.2, http://iphome.hhi.de/suehring/tml/download/, Joint Video Team (2003).

8. G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," Doc. VCEG-M33, Apr. 2001.

© (2009) Society of Photo-Optical Instrumentation Engineers (SPIE)
Abdelrahman Abdelazim, Mingyuan Yang, and Christos Grecos, "Fast subpixel motion estimation based on the interpolation effect on different block sizes for H264/AVC," Optical Engineering 48(3), 030504 (1 March 2009). https://doi.org/10.1117/1.3095795