In block-based video coders, motion estimation (ME) is of great significance because of its impact on compression efficiency. To achieve high compression efficiency, the H.264/AVC standard performs ME with quarter-pel (Qpel) accuracy in addition to half-pel (Hpel) and integer-pel accuracy (Ref. 1). Even though more accurate ME generally reduces the bits required to encode the difference frame, it can compromise the total bit rate, because the bits required to encode the motion vector (MV) grow as the motion vector accuracy (MVA) increases.
Various methods of obtaining the optimal MVA have been introduced in the literature (Refs. 2 and 3). In Ref. 2, an optimal MVA is derived for each macroblock (MB) and for each frame. The optimal MVA formula in Ref. 2 reveals that the MVA depends on the texture and the interframe noise of the MB. In Ref. 3, the MVA is adaptively determined for each MB by examining all possible MVAs and selecting the one with the minimum Lagrange cost. However, the coding gain of these methods is limited, since additional bits indicating the MVA need to be encoded.
In this letter, a novel MVA decision algorithm for H.264/AVC is presented. The proposed method determines the validity of Qpel ME for each MB. Since no additional bit is required to indicate the MVA, the proposed algorithm can be implemented without modifying the syntax of the H.264/AVC standard. Then, in order to achieve the coding gain, we also propose an MV encoding technique that adaptively changes the variable length coding (VLC) table according to the MVA of the MB.
The proposed algorithm consists of two techniques. We first present an adaptive MV encoding technique for H.264/AVC. Then, based on the proposed MV coding technique, we also propose an MVA decision technique.
In H.264/AVC, the original MV itself is not encoded. Instead, the difference between the original MV and the predicted motion vector (PMV) (Ref. 1), called the motion vector difference (MVD), is encoded. Let $\mathbf{mvd}$ denote the MVD, defined as $\mathbf{mvd} = \mathbf{mv} - \mathbf{pmv}$, where $\mathbf{mv}$ and $\mathbf{pmv}$ represent the original MV and the PMV, respectively. In H.264/AVC, the horizontal and vertical elements of the MVD are encoded independently by using a common VLC table, without considering the MVA of the MB.
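For notational simplicity, we first define three motion vector sets: the Hpel set, which contains the MVs of up to Hpel accuracy; the Qpel set, which contains the additional MVs that require Qpel accuracy; and the full set, which is the union of the Hpel and Qpel sets. If ME is performed up to Hpel accuracy, the resulting MV belongs to the Hpel set. If we allow Qpel ME, the additional Qpel set is required to express the MV.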
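Assume that ME is performed up to Hpel accuracy for the current MB, i.e., the MV has at most Hpel accuracy. Then each element of the MVD is either an Hpel value or a Qpel value, depending on the PMV: because the MVD is the MV minus the PMV, the MVD element has at most Hpel accuracy if the corresponding PMV element does, and otherwise it lies at a Qpel position.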
Note that the number of possible MVD values is halved if Qpel ME is not applied. Therefore, instead of a VLC table that includes all possible MVD values, a reduced-size VLC table that contains only the Hpel or only the Qpel MVD values can be used. By adaptively changing the VLC table according to the MVA of the MB, the bits required to encode the MVD in H.264/AVC can be effectively reduced.
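The halving described above can be illustrated with a minimal sketch. It assumes MV components stored in quarter-pel units (so Hpel-accurate values are even) and uses the signed Exp-Golomb code, which H.264/AVC applies to MVDs in CAVLC mode, as the VLC. The function names and the index mapping `(mvd - parity) // 2` are hypothetical choices made for illustration, not the table design of the letter.

```python
def se_bits(k: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword for integer k."""
    code_num = 2 * abs(k) - (1 if k > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def reduce_mvd(mvd: int, pmv: int) -> int:
    """Map an MVD component of an Hpel-only MB to a reduced table index.

    Assumption: values are in quarter-pel units, so an Hpel-accurate MV is
    even and the MVD has the same parity as the PMV.
    """
    parity = pmv % 2  # Python's % yields 0 or 1 even for negative pmv
    if (mvd - parity) % 2 != 0:
        raise ValueError("MVD parity does not match an Hpel-accurate MV")
    return (mvd - parity) // 2


def restore_mvd(index: int, pmv: int) -> int:
    """Inverse of reduce_mvd; the decoder knows the PMV, hence the parity."""
    return 2 * index + pmv % 2


# Example: MVD of 6 quarter-pel units with an even PMV component.
full_bits = se_bits(6)                    # 7 bits with the full table
reduced_bits = se_bits(reduce_mvd(6, 0))  # 5 bits with the reduced table
print(full_bits, reduced_bits)
```

Because the decoder derives the parity from the PMV alone, the mapping needs no side information in this sketch.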
The bits required to encode the MV can be reduced by skipping Qpel ME. However, Qpel ME should be skipped only when it has a negligible impact on ME performance. To determine whether Qpel ME is necessary, we consider two cases. In the first case (C1), we skip Qpel ME for all MBs and encode the MVDs using the reduced-size VLC table. In the second case (C2), we allow Qpel ME for all MBs and encode the resulting MVDs using the full VLC table. By comparing C1 and C2, the effect of Qpel ME can be analyzed.
Let dRDcost denote the difference between the rate-distortion costs (RDcosts) of C1 and C2, i.e., the RDcost of C1 minus that of C2. If dRDcost is positive, Qpel ME is regarded as required for the MB, because the loss caused by skipping Qpel ME is larger than the gain achieved by the proposed MVD coding technique. Otherwise, Qpel ME can be considered unnecessary. The remaining problem is to find the factors that affect dRDcost.
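The sign convention of dRDcost can be made explicit with a small sketch. It assumes the usual Lagrangian form of the RDcost, distortion plus lambda times rate; the function names and the numbers in the example are hypothetical.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def qpel_me_needed(cost_c1: float, cost_c2: float) -> bool:
    """True when skipping Qpel ME (C1) costs more than allowing it (C2)."""
    d_rd_cost = cost_c1 - cost_c2
    return d_rd_cost > 0


# Hypothetical MB: skipping Qpel ME saves MVD bits but raises distortion.
c1 = rd_cost(distortion=540.0, rate_bits=20.0, lam=4.0)  # Qpel ME skipped
c2 = rd_cost(distortion=500.0, rate_bits=26.0, lam=4.0)  # Qpel ME allowed
print(qpel_me_needed(c1, c2))
```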
Motivated by the optimal MVA formula (Ref. 2), we claim that the necessity of Qpel ME increases as the spatial and temporal complexity of the MB increases. In our work, we estimate the spatial and temporal complexity of the current MB from the MB in the reference frame indicated by the PMV, which is also available at the decoder. First, we measure the sum of the gradients of the luminance component as a spatial complexity metric (Ref. 4). The horizontal and vertical gradients are computed from the luminance component of the reference frame at each pixel coordinate. The spatial complexity of the MB is then obtained by averaging the gradient values inside the MB, where the averaging is normalized by the size of the MB, the pixel coordinates are determined by the PMV, and max(⋅,⋅) returns the larger of two values. The temporal complexity of the MB is simply defined as the magnitude of the PMV, computed from its horizontal and vertical elements.
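A minimal sketch of the two metrics follows. The exact gradient operator and magnitude are not specified here, so the sketch assumes forward differences, takes the larger absolute gradient per pixel, and uses the Euclidean norm of the PMV. The function names, the block origin `(x0, y0)` (which the caller would derive from the PMV), and the default block size of 16 are all illustrative assumptions.

```python
import math


def spatial_complexity(luma, x0, y0, n=16):
    """Average of max(|Gh|, |Gv|) over an n x n block of a luminance frame.

    Assumptions: luma is a list of rows, gradients are forward differences,
    and the frame extends at least one pixel beyond the block to the right
    and below.
    """
    total = 0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            gh = abs(luma[y][x + 1] - luma[y][x])  # horizontal gradient
            gv = abs(luma[y + 1][x] - luma[y][x])  # vertical gradient
            total += max(gh, gv)
    return total / (n * n)


def temporal_complexity(pmv_x, pmv_y):
    """Magnitude of the PMV (Euclidean norm assumed)."""
    return math.hypot(pmv_x, pmv_y)


# A horizontal ramp has a constant horizontal gradient of 2 and no vertical one.
ramp = [[2 * x for x in range(18)] for _ in range(18)]
print(spatial_complexity(ramp, 1, 1), temporal_complexity(3, 4))
```

Both functions read only the reference frame and the PMV, so a decoder could evaluate them without any transmitted side information.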
Based on the preceding spatial and temporal complexity metrics, we examine the relation between dRDcost and the complexity metrics. Figure 1 shows an example of the dRDcost result obtained by using the 80th frame of the Foreman test sequence. In this example, the quantization parameter (QP) is set to 36, one reference frame is used with CABAC coding, and the other experimental conditions are given in Table 1. For each MB, dRDcost is computed by performing C1 and C2, and one of two markers is assigned to represent whether its value is positive or not. We can see that if the temporal or spatial complexity of the MB is high, Qpel ME tends to be advantageous. In the other case, where the spatial and temporal complexities are both low, skipping Qpel ME is beneficial in most cases. This tendency is consistent with the results in Ref. 2. Accordingly, we define an exponential curve (Eq. 9) with two modeling parameters to determine whether Qpel ME is advantageous. These parameters are obtained by taking the logarithm of Eq. 9 and applying least-squares linear classification, so that they minimize the squared classification error (Ref. 5). By using the test sequences in Ref. 6 and the different QPs in Table 1, the two parameters are obtained as 9.29 and […], respectively. Then, Qpel ME is applied only when the estimated complexity point, given by the spatial and temporal complexities of the MB, is located above the curve of Eq. 9, where Qpel ME tends to improve the coding efficiency.
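The decision rule can be sketched as follows. The exact form of Eq. 9 is not reproduced here, so the sketch assumes a decaying exponential boundary in the plane of spatial complexity (horizontal axis) and temporal complexity (vertical axis). The function name and the parameter values `a` and `b` are hypothetical placeholders, not the fitted values of the letter.

```python
import math


def use_qpel_me(c_s: float, c_t: float, a: float, b: float) -> bool:
    """Apply Qpel ME only if the point (c_s, c_t) lies above the curve.

    Assumed boundary: c_t = a * exp(-b * c_s). Taking the logarithm gives
    ln(c_t) = ln(a) - b * c_s, which is linear in c_s and is why a linear
    least-squares classifier can fit the two parameters.
    """
    return c_t > a * math.exp(-b * c_s)


A, B = 8.0, 0.5  # illustrative values only

print(use_qpel_me(0.0, 1.0, A, B))   # low spatial and temporal complexity
print(use_qpel_me(10.0, 1.0, A, B))  # high spatial complexity
print(use_qpel_me(0.0, 9.0, A, B))   # high temporal complexity
```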
Table 1. Experimental conditions.
| Software | JM 16.0 (Ref. 7) |
| Sequence resolution | CIF, […] |
| Number of encoded frames | 100 |
| Number of reference frames | 3 |
| QP | 32, 36, 40, 44 |
| ME | Exhaustive ([…] resolution) |
| ME search range | 32 (CIF), 64 |
| Rate distortion optimization | Enabled |
| Entropy coding | CAVLC, CABAC |
Figure 2 summarizes the procedure of the proposed algorithm at the encoder. When encoding the MB, the spatial and temporal complexities are calculated by using the PMV of INTER- mode, and Qpel ME is applied if the resulting complexity point lies above the curve of Eq. 9. This Qpel ME decision is shared by all subpartitioned blocks. In other words, we do not allow the subpartitioned blocks to have different MVAs. At the decoder, the complexity of the MB is similarly obtained by Eq. 7. Then, if the decision is to skip Qpel ME, the received MVD bits are decoded using either the Hpel or the Qpel reduced-size VLC table, depending on the PMV. Since the same PMV should be used at both the encoder and the decoder, the PMV of INTER- mode is used at the decoder. In the other case, the MVD bits are decoded by using the full VLC table, as in the conventional decoder.
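The table selection shared by the encoder and the decoder can be sketched as below. It assumes PMV components in quarter-pel units, so an even component lies on the Hpel grid. The function name and the table labels are hypothetical, and the selection would run once per MVD component.

```python
def select_mvd_table(qpel_enabled: bool, pmv_component: int) -> str:
    """Choose the VLC table for one MVD component of the current MB.

    qpel_enabled is the MB-level decision derived from the complexity
    metrics, so it is identical at the encoder and the decoder.
    """
    if qpel_enabled:
        return "full"  # conventional table with all MVD values
    # Qpel ME skipped: the MVD component shares the parity of the PMV.
    return "hpel" if pmv_component % 2 == 0 else "qpel"


print(select_mvd_table(True, 3))
print(select_mvd_table(False, 4))
print(select_mvd_table(False, -3))
```

Since both inputs are available at the decoder before the MVD is parsed, this choice needs no additional syntax element.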
Experimental Results and Conclusion
To evaluate its performance, the proposed method is compared with the conventional algorithm in Ref. 3. A PC with an Intel Core2 Quad CPU and 8 GB of RAM is used. The detailed experimental conditions are given in Table 1. The changes in Bjontegaard delta (BD) rate, encoding time, and decoding time are used to measure the performance (Ref. 7).
Table 2 indicates that the proposed algorithm improves the coding efficiency of the original JM 16.0 by 2.97% and 2.77% for CABAC and CAVLC, respectively (Ref. 8). Since the proposed algorithm does not require any overhead bits to indicate the MVA, it achieves superior coding efficiency compared with the conventional algorithm. Note that the Bigships and Jets sequences are encoded by using the 151st to 250th and the 301st to 400th frames, respectively, where shot changes occur. Since intra coding outperforms inter coding in such cases, the performance of the proposed algorithm does not deteriorate.
Table 2. Performance of the proposed algorithm.
| Sequence | Ref. 2 (CABAC/CAVLC) | Proposed (CABAC/CAVLC) |
From the viewpoint of computational complexity, since the method of Ref. 3 examines all MVAs and selects the best one, it maintains or slightly increases the complexity of the original JM 16.0 encoder and decoder. In the proposed algorithm, skipping unnecessary Qpel ME compensates for the additional computation of the complexity metric at the encoder and even yields a slight encoding time saving of 3.57% and 3.41% for CABAC and CAVLC, respectively. In addition, although the decoder must compute the complexity metric for each MB, the decoding time is also reduced by 4.68% and 4.27% on average for CABAC and CAVLC, respectively. This is because the interpolation time for motion compensation decreases when unnecessary Qpel ME is skipped at the encoder.
In this letter, we first presented an adaptive MVD coding scheme. Then, to apply the adaptive MVD coding technique effectively, we also proposed an algorithm that selectively performs Qpel ME based on the spatial and temporal complexity of the MB. The experimental results demonstrated that the proposed algorithm improves coding efficiency without computational overhead at either the encoder or the decoder.
This research was supported by Seoul Future Contents Convergence (SFCC) Cluster established by Seoul R&BD Program. This work was also supported by a Korea Science and Engineering Foundation (KOSEF) Grant funded by the Korean Government (MEST) (No. 2009-0080547).