1 April 2010 Fast and efficient intraprediction method for H.264/AVC
Abstract
A new intraprediction method is proposed to reduce the coding complexity and improve the coding efficiency of H.264/AVC. The proposed rules determine whether the predicted values of different prediction modes are similar. When they are, the average of those values is used to predict the current block without invoking the rate-distortion optimization process. Experimental results show that integrating the proposed method into the H.264/AVC codec saves 39.25% of coding time, while the average bit-rate reduction is 2.62% at the same peak signal-to-noise ratio of the reconstructed videos.
Yuan, Chang, Lu, and Li: Fast and efficient intraprediction method for H.264/AVC

## Introduction

H.264/AVC (Ref. 1) achieves higher coding efficiency than other video coding standards because it provides many complex modes for both intraframe and interframe coding. Rate-distortion optimization (RDO, Ref. 2) is used to choose the best mode when coding a block. In this work, intraframe coding of H.264/AVC is investigated. Motivated by the observation that different prediction methods can give similar predicted values, a new intraprediction method is proposed to decrease the coding complexity and enhance the coding efficiency of H.264/AVC.

## Proposed Intraprediction Mode

In H.264/AVC, a picture is divided into many macroblocks (MB). Each MB can be partitioned into blocks with different sizes, such as $4×4$ , $8×8$ , or $16×16$ . For each block, different intraprediction methods can all be modeled as a mathematical expression, as shown in Eq. 1,

## 1

${t}_{y}=\frac{\sum _{x}\left({\omega }_{\gamma ,x}×{s}_{x}\right)}{\sum _{x}{\omega }_{\gamma ,x}},$
where $t$ and $s$ represent the predicted and prediction pixels, ${t}_{y}$ and ${s}_{x}$ represent the $y$ ’th predicted pixel and the $x$ ’th prediction pixel, respectively, ${\omega }_{\gamma ,x}$ is the corresponding weighted coefficient of the $x$ ’th prediction pixel under the $\gamma$ ’th prediction mode, and ${\omega }_{\gamma }$ represents a set of coefficients of the $\gamma$ ’th prediction mode.

In Eq. 1, when all ${s}_{x}$ are the same, all ${t}_{y}$ are the same. Even if the ${s}_{x}$ differ, the ${t}_{y}$ may still be similar, because the coefficients ${\omega }_{\gamma }$ also differ. Therefore, different prediction modes may give similar predicted values. In that case, the residual signals derived by the different prediction modes are similar, and so are the distortions. Accordingly, the RDO process can be skipped, and there is no need to spend bits on labeling the prediction mode.
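As a rough illustration of Eq. 1, the sketch below evaluates two simplified weight sets on the same reference pixels. The reference layout (four pixels above and four to the left of a $4×4$ block), the names `predict`, `w_dc`, and `w_vert`, and the sample pixel values are assumptions made for this example. The "DC-like" and "vertical-like" weights only approximate the corresponding H.264/AVC modes, which also involve integer rounding.

```python
import numpy as np

def predict(weights, s):
    """Eq. 1: each predicted pixel t_y is a normalized weighted sum of references s_x.

    weights: array of shape (num_predicted, num_reference), one row per t_y.
    s:       array of shape (num_reference,), the reference (prediction) pixels.
    """
    weights = np.asarray(weights, dtype=float)
    s = np.asarray(s, dtype=float)
    return (weights @ s) / weights.sum(axis=1)

N = 4  # block is N x N; s[0:N] are the pixels above, s[N:2N] the pixels to the left (assumed layout)

# DC-like mode: every predicted pixel weights all 2N references equally.
w_dc = np.ones((N * N, 2 * N))

# Vertical-like mode: the pixel in column `col` copies the reference directly above it.
w_vert = np.zeros((N * N, 2 * N))
for row in range(N):
    for col in range(N):
        w_vert[row * N + col, col] = 1.0

flat = np.array([100, 101, 100, 101, 100, 100, 101, 101])  # nearly uniform neighborhood
edge = np.array([20, 20, 200, 200, 20, 20, 20, 20])        # strong vertical edge above the block

diff_flat = np.abs(predict(w_dc, flat) - predict(w_vert, flat)).max()
diff_edge = np.abs(predict(w_dc, edge) - predict(w_vert, edge)).max()
print("flat neighborhood, max |DC - vertical| =", diff_flat)
print("edge neighborhood, max |DC - vertical| =", diff_edge)
```

With nearly uniform references the two modes differ by at most half a gray level, whereas across an edge they differ by more than a hundred, which is the situation in which mode selection still matters.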

To determine whether different predicted blocks are similar, features of the predicted blocks must be extracted; for easy implementation, means and variances are used. First, the predicted blocks of all the prediction modes are obtained. The mean and variance of each predicted block are then derived by Eqs. 2, 3,

## 2

${\mathrm{avg}}_{i}=\frac{1}{W×H}\sum _{m=1}^{W}\sum _{n=1}^{H}{\mathrm{pred}}_{i}\left(m,n\right),$

## 3

${\mathrm{var}}_{i}=\frac{1}{W×H}\sum _{m=1}^{W}\sum _{n=1}^{H}{\left[{\mathrm{pred}}_{i}\left(m,n\right)-{\mathrm{avg}}_{i}\right]}^{2},$
where $W$ and $H$ are the width and height of the block, respectively, ${\mathrm{pred}}_{i}\left(m,n\right)$ is the predicted pixel of the $i$ ’th prediction mode at position $\left(m,n\right)$ of the predicted block, and ${\mathrm{avg}}_{i}$ and ${\mathrm{var}}_{i}$ are the mean and variance of all the predicted pixels in the predicted block under the $i$ ’th prediction mode. Furthermore, Eqs. 4, 5 are then used to compute the variance of ${\mathrm{avg}}_{i}$ and ${\mathrm{var}}_{i}$ , respectively,

## 4

${\mathrm{VAR}}_{\mathrm{avg}}=\frac{1}{M}\sum _{i=0}^{M-1}{\left({\mathrm{avg}}_{i}-\frac{1}{M}\sum _{j=0}^{M-1}{\mathrm{avg}}_{j}\right)}^{2},$

## 5

${\mathrm{VAR}}_{\mathrm{var}}=\frac{1}{M}\sum _{i=0}^{M-1}{\left({\mathrm{var}}_{i}-\frac{1}{M}\sum _{j=0}^{M-1}{\mathrm{var}}_{j}\right)}^{2},$
where $M$ is the number of prediction modes of the block, and ${\mathrm{VAR}}_{\mathrm{avg}}$ and ${\mathrm{VAR}}_{\mathrm{var}}$ represent the variance of ${\mathrm{avg}}_{i}$ and ${\mathrm{var}}_{i}$ , respectively. Different prediction modes are considered to give similar predicted values when both ${\mathrm{VAR}}_{\mathrm{avg}}$ and ${\mathrm{VAR}}_{\mathrm{var}}$ are less than a threshold $\mathit{TH}$ . A block that has similar predicted values under different prediction modes is named a decoder side prediction method derivable (DSPMD) block, because the decoder can derive the prediction method without labeling information. For a DSPMD block, any of the predicted values could be chosen as the final value. However, one cannot guarantee which prediction mode is best. Therefore, to get a better predicted value, the average of the different predicted values is used as the final predicted value of the block, as shown in Eq. 6,

## 6

${\mathrm{PRED}}_{\text{final}}\left(m,n\right)=\frac{1}{M}\sum _{i=0}^{M-1}{\mathrm{pred}}_{i}\left(m,n\right),$
where ${\mathrm{PRED}}_{\text{final}}\left(m,n\right)$ is the average of different predicted values at position $\left(m,n\right)$ of the current block.
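The steps in Eqs. 2 to 6 can be sketched as follows. This is a minimal illustration, not the JM implementation: the function name `dspmd_predict`, the array layout (one predicted block per mode, stacked along the first axis), and the sample blocks are assumptions made for the example. The sketch assumes that both ${\mathrm{VAR}}_{\mathrm{avg}}$ and ${\mathrm{VAR}}_{\mathrm{var}}$ must fall below the same threshold.

```python
import numpy as np

def dspmd_predict(pred_blocks, th):
    """Decide whether a block is a DSPMD block and, if so, return its averaged prediction.

    pred_blocks: array of shape (M, H, W), the predicted block of each of the M modes.
    th:          threshold TH applied to both VAR_avg and VAR_var (assumption of this sketch).
    Returns (is_dspmd, pred_final); pred_final is None when the block is not a DSPMD block.
    """
    pred_blocks = np.asarray(pred_blocks, dtype=float)

    avg = pred_blocks.mean(axis=(1, 2))  # Eq. 2: mean of each predicted block
    var = pred_blocks.var(axis=(1, 2))   # Eq. 3: population variance of each predicted block

    var_avg = avg.var()                  # Eq. 4: variance of the M means
    var_var = var.var()                  # Eq. 5: variance of the M variances

    is_dspmd = bool(var_avg < th and var_var < th)
    pred_final = pred_blocks.mean(axis=0) if is_dspmd else None  # Eq. 6: per-pixel average
    return is_dspmd, pred_final

# Three modes that predict almost the same flat block.
similar = np.stack([np.full((4, 4), v) for v in (100.0, 101.0, 102.0)])
# Two modes that disagree strongly.
different = np.stack([np.full((4, 4), 20.0), np.full((4, 4), 200.0)])

similar_flag, similar_pred = dspmd_predict(similar, th=5.0)
different_flag, different_pred = dspmd_predict(different, th=5.0)
print(similar_flag, different_flag)
```

`np.var` uses the population form (division by the number of samples) by default, which matches the $1∕\left(W×H\right)$ and $1∕M$ normalizations in Eqs. 3 to 5.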

For a given picture, the number of DSPMD blocks increases as $\mathit{TH}$ increases, but ${\mathrm{PRED}}_{\text{final}}$ of the DSPMD blocks also moves farther from the optimal predicted values determined by the RDO process. When the quantization step $\left({Q}_{\text{step}}\right)$ is large, however, $\mathit{TH}$ should be increased, because even if the predicted values are a little farther from each other, the quantized signals (derived from the discrete cosine transform of the residual signals and the succeeding quantization) are still similar. To adapt to ${Q}_{\text{step}}$ , $\mathit{TH}$ is set empirically to ${Q}_{\text{step}}∕2$ .
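The threshold rule can be illustrated with a short sketch that maps a quantization parameter (QP) to ${Q}_{\text{step}}$ and then to $\mathit{TH}$ . The helper names `qstep` and `dspmd_threshold` are hypothetical. The mapping relies on the H.264/AVC property that ${Q}_{\text{step}}$ doubles for every increase of 6 in QP, starting from the six base values listed in the code.

```python
# Quantization step sizes for QP 0..5 in H.264/AVC; Qstep doubles for every +6 in QP.
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep(qp):
    """Return the H.264/AVC quantization step size for an integer QP in [0, 51]."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return _QSTEP_BASE[qp % 6] * (1 << (qp // 6))

def dspmd_threshold(qp):
    """TH = Qstep / 2, the empirical setting described in the text."""
    return qstep(qp) / 2.0

for qp in (24, 28, 32, 36):
    print(f"QP={qp}: Qstep={qstep(qp):g}, TH={dspmd_threshold(qp):g}")
```

For the four QPs used in the experiments, this gives ${Q}_{\text{step}}$ values of 10, 16, 26, and 40, and thresholds of 5, 8, 13, and 20.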

Figure 1 shows the proportions of DSPMD blocks for the Foreman sequence. The proportions of luminance and chrominance DSPMD blocks can reach as high as 60% and 80%, respectively. Therefore, detecting DSPMD blocks can avoid much of the RDO process and much of the prediction mode information.

## Fig. 1

Proportions of DSPMD blocks in intraframes for the Foreman sequence.

In an H.264/AVC bit stream, the mode information of all the blocks of an MB is followed by a coded block pattern (CBP) and coefficient bits, as shown in Fig. 2(a). This means that the mode information of all the blocks must be decoded first, regardless of whether a block is a DSPMD block. To ensure that DSPMD blocks can be decoded, the bit stream of an MB is changed to the layout in Fig. 2(b), where the CBP of the MB is decoded first, and then whether each block is a DSPMD block is determined. For a DSPMD block, ${\mathrm{PRED}}_{\text{final}}$ , shown in Eq. 6, is used as the prediction result, and the following bits are decoded as coefficients.

## Fig. 2

Bit stream comparison between (a) H.264/AVC and (b) the proposed method.

## Experimental Results

In the experiments, the H.264/AVC reference software Joint Model (JM) 15.1 was used to encode ten sequences with various representative contents. Each sequence was encoded at four different quantization parameters (QP), i.e., 24, 28, 32, and 36. The corresponding ${Q}_{\text{step}}$ values are 10, 16, 26, and 40, respectively. Coding complexity is evaluated by the percentage of average encoding time saved, and coding efficiency by the percentage of average bit rate saved at the same PSNR of the reconstructed videos (BD bit rate, Ref. 3). Statistical results for all the sequences are shown in Table 1. Integrating the proposed method into JM15.1 saves 39.25% of coding time, and the BD bit rate is $-2.62\mathrm{%}$ on average.

## Table 1

Coding efficiency of different test sequences.

| Sequence | BD bit rate (%) | ΔTime (%) | Sequence | BD bit rate (%) | ΔTime (%) |
|---|---|---|---|---|---|
| Carphone (cif) | $-2.86$ | $-36.82$ | Bigship (720p) | $-5.35$ | $-59.62$ |
| Claire (cif) | $-1.87$ | $-27.41$ | City (720p) | $-1.93$ | $-30.59$ |
| Container (cif) | $-1.02$ | $-20.76$ | Crew (720p) | $-2.68$ | $-44.56$ |
| Foreman (cif) | $-3.72$ | $-45.01$ | Night (720p) | $-1.85$ | $-30.35$ |
| PeopleOnStreet (1080p) | $-2.45$ | $-41.54$ | Traffic (1080p) | $-2.51$ | $-55.83$ |
| Average of all the sequences | $-\mathbf{2.62}$ | $-\mathbf{39.25}$ | | | |
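
The averages in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the reported averages are plain arithmetic means over the ten sequences; the dictionary name `table1` is introduced only for this example.

```python
# (BD bit rate %, delta time %) per sequence, copied from Table 1.
table1 = {
    "Carphone": (-2.86, -36.82),
    "Claire": (-1.87, -27.41),
    "Container": (-1.02, -20.76),
    "Foreman": (-3.72, -45.01),
    "PeopleOnStreet": (-2.45, -41.54),
    "Bigship": (-5.35, -59.62),
    "City": (-1.93, -30.59),
    "Crew": (-2.68, -44.56),
    "Night": (-1.85, -30.35),
    "Traffic": (-2.51, -55.83),
}

# Unweighted means over all sequences.
avg_bd = sum(bd for bd, _ in table1.values()) / len(table1)
avg_time = sum(dt for _, dt in table1.values()) / len(table1)
print(f"average BD bit rate: {avg_bd:.2f}%")
print(f"average time change: {avg_time:.2f}%")
```

Under that assumption, the unweighted means round to the $-2.62\mathrm{%}$ and $-39.25\mathrm{%}$ reported in the last row.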

Figure 3 compares the bits of each component between the proposed method and JM15.1. The coding-efficiency gains come from bit reductions in both mode information and coefficients, while the coding quality stays the same. The bit reductions in the coefficients benefit from the combination of all the predicted values, as shown in Eq. 6. To show the results clearly, RD curves for the Bigship sequence are presented in Fig. 4, where the proposed method outperforms JM15.1.

## Fig. 3

Different bit components in the stream of the first frame of Bigship.

## Fig. 4

RD curves for Bigship sequence.

Moreover, the proposed method was compared with the estimation-based method in Ref. 4 and the pixel-based direction detection (PDD) method in Ref. 5. Compared with the method in Ref. 4, the BD bit rate of the proposed method is $-1.05\mathrm{%}$ , and 73.47% of coding time is saved. Compared with the PDD method, the average coding time increases by 23.11%, but the BD bit rate of the proposed method reaches $-5.33\mathrm{%}$ . Although the proposed method takes longer to encode than the PDD method, its coding efficiency is better.

## Conclusions

A fast and efficient intraprediction method is proposed. Each block is first tested to determine whether it is a DSPMD block, and all the predicted values of a DSPMD block are then averaged to get the final predicted value. The RDO process can be skipped for DSPMD blocks, and the identifiers of the prediction methods can be omitted as well. Experimental results show that coding complexity is decreased and coding efficiency is improved.

## Acknowledgments

The work was supported by the National Natural Science Foundation Research Program of China numbers 60772134, 60902081, and 60902052, the 111 Project (B08038), the Natural Science Basic Research Plan in Shaanxi Province of China (program number SJ08F03), and the Basic Science Research Fund in Xidian University (72105457). We would like to thank the editors and anonymous reviewers for their valuable comments.

## References

1. Telecommunication Standardization Sector of ITU, “Series H: audiovisual and multimedia systems, infrastructure of audiovisual services–coding of moving video,” Recommendation ITU-T H.264 (Mar. 2009).

2. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol. 13(7), 688–703 (Jul. 2003). https://doi.org/10.1109/TCSVT.2003.815168

3. G. Bjøntegaard, “Improvements of the BD-PSNR model,” ITU-T SG16 Q.6 Document, VCEG-AI11, Berlin (Jul. 2008).

4. C. S. Park and S. J. Ko, “Estimation-based intra prediction algorithm for H.264/AVC,” Opt. Eng. 48(3), 030506–1–3 (2009). https://doi.org/10.1117/1.3101375

5. A. C. Tsai, J. F. Wang, J. F. Yang, and W. G. Lin, “Effective subblock-based and pixel-based fast direction detections for H.264 intra prediction,” IEEE Trans. Circuits Syst. Video Technol. 18(7), 975–982 (Jul. 2008). https://doi.org/10.1109/TCSVT.2008.920742

© (2010) Society of Photo-Optical Instrumentation Engineers (SPIE)
Hui Yuan, Yilin Chang, Zhaoyang Lu, and Ming Li, "Fast and efficient intraprediction method for H.264/AVC," Optical Engineering 49(4), 040501 (1 April 2010). https://doi.org/10.1117/1.3377968