Global-local correlation-based early large-size mode decision for multiview video coding

Wei Zhu, Yayu Zheng, Peng Chen, Jiani Xie

Journal of Electronic Imaging 23(1), 013027 (19 February 2014)
Abstract
Multiview video coding (MVC) is a recent extension of H.264/AVC, and selecting the optimal macroblock (MB) mode among candidate modes of different sizes consumes a huge amount of encoding time. Compared with the small-size mode (Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4), the large-size mode (Skip/Direct, Inter16×16, and Intra16×16) accounts for most of the MB modes while requiring much less computational complexity. Thus, if the large-size mode can be decided early as the optimal MB mode, the complexity of mode decision can be reduced effectively. In this work, an early large-size mode decision algorithm is proposed based on the global correlation of rate-distortion (RD) costs between neighboring views and the local correlation of RD costs among candidate modes. The average RD costs of large-size and small-size MB modes in the neighboring view are employed as a global reference for the early decision threshold, and the RD costs of the estimated modes are used to calculate a local adjustment to the threshold. Experimental results demonstrate that the proposed algorithm significantly reduces the whole encoding time while maintaining an RD performance similar to that of the original MVC encoder.

1.

Introduction

Multiview video is captured from a set of viewpoints, and it is useful in many multimedia applications, such as three-dimensional (3-D) television, free viewpoint television, and glasses-free portable 3-D displays. Multiview video coding (MVC) was developed for the storage and transmission of the very large amount of multiview video data, and it has been standardized as an extension of H.264/AVC.1 Figure 1 shows an illustration of the basic prediction structure in the MVC reference software JMVC, where the group of pictures (GOP) length is eight for each view. In terms of backward compatibility, all views can be classified into a base view and nonbase views. Only the base view (S0 in Fig. 1) is backward compatible with H.264/AVC; nonbase views (S1 and S2 in Fig. 1) are encoded with new coding tools to provide complete multiview video bitstreams.2 Every GOP in each view includes one anchor frame and several nonanchor frames. Frames at time instants T0 and T8 in Fig. 1 are anchor frames, and the other frames are nonanchor frames. For random access to the video bitstream, anchor frames are not allowed to adopt temporal prediction. To improve compression performance, nonanchor frames adopt both temporal prediction and interview prediction.3 In Fig. 1, solid arrows represent temporal prediction and dotted arrows represent interview prediction. According to the prediction directions of their nonanchor frames, all views can also be classified into temporal views and interview views. Nonanchor frames employ both temporal prediction and interview prediction in interview views, while they employ only temporal prediction in temporal views. In Fig. 1, views S0 and S2 are temporal views, and view S1 is an interview view.

Fig. 1

Illustration of the basic prediction structure in JMVC. S0, S1, and S2 represent three views; T0 to T8 represent nine time instants. In each view, I, B, and P represent frame types, and their subscript values represent the corresponding temporal levels.


Like H.264/AVC, MVC needs to select the optimal macroblock (MB) mode among multiple candidate modes for each frame, and it has five intermodes (Skip/Direct, Inter16×16, Inter16×8, Inter8×16, and Inter8×8) and three intramodes (Intra16×16, Intra8×8, and Intra4×4). Except for Skip/Direct, the intermodes require a large amount of time for motion/disparity estimation over reference frames. In the MVC reference software JMVC, an exhaustive mode decision algorithm is used to select the optimal MB mode, checking all candidate modes in sequence. Moreover, due to the use of interview prediction, the computational complexity of mode decision for a single view is greater than that of H.264/AVC. Thus, the computational complexity of MVC is very high, which hinders its practical use in real-time and mobile applications.

Many state-of-the-art fast mode decision algorithms have been developed for H.264/AVC. Wu et al.4 presented a fast intermode decision using the spatial homogeneity and temporal stationarity characteristics of video objects. Hu et al.5 proposed a fast intermode decision algorithm based on rate-distortion (RD) cost characteristics, which includes an early skip mode decision and a three-stage mode prediction. Sung and Wang6 introduced a multiphase classification scheme that builds a mode decision tree according to the clustering of RD costs. These algorithms can be employed to speed up the mode decision of MVC, and their ideas can also inspire the design of fast mode decision algorithms for MVC. However, the complexity of MVC remains very high, and it can be further reduced by exploiting characteristics specific to MVC. To address this issue, various mode decision algorithms have been studied.7–19 These algorithms presented effective optimization techniques, such as adaptive termination strategies,7,10 candidate mode selection,10,13 prediction direction selection,12,14 and early Skip/Direct mode decision.14–18 To reduce the whole complexity, Shen et al.13 combined candidate mode selection with fast motion estimation and prediction direction selection; meanwhile, Khattak et al.19 provided a complete framework that includes not only mode decision but also reference frame selection and fast motion/disparity estimation. In these algorithms, the correlation of coding information between neighboring views and the RD costs of MB modes in MVC are usually employed to arrive at a faster mode decision. As the Skip/Direct mode occupies the largest proportion of MB modes with negligible computational complexity, several algorithms focused only on early Skip/Direct mode decision. Zeng et al.15 introduced an early decision algorithm using the RD costs of nearby MBs. Zatt et al.16 proposed an early decision algorithm based on mode correlation in the 3-D neighborhood. Shen et al.17 presented an early decision algorithm based on an analysis of the prediction mode distribution of the corresponding MBs in the neighboring view. Zhang et al.18 proposed an efficient statistical Skip/Direct mode termination model named SDMET that adaptively adjusts the RD cost threshold using statistical information of coded MBs. As mentioned above, early Skip/Direct mode decision algorithms can reduce the complexity effectively with high RD performance, and they can also be combined with fast algorithms for motion/disparity estimation and multireference frame selection19–23 to further reduce the complexity. However, since they mainly utilize the local correlation of coding information between neighboring views, the global correlation of coding information has not been exploited. Moreover, they do not perform very well for video scenes with fast motions and large disparities, and they have not considered complexity reduction for anchor frames.

In this article, an early large-size mode decision algorithm based on the global-local correlation of RD costs is proposed to reduce the computational complexity of MVC. According to mode sizes and RD properties, all candidate modes in mode decision are classified into two types: the large-size mode and the small-size mode. The large-size mode includes Skip/Direct, Inter16×16, and Intra16×16, and the small-size mode includes Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4, where Inter8×8 further contains four submodes (sub8×8, sub8×4, sub4×8, and sub4×4). Compared with the small-size mode, the large-size mode accounts for a much larger proportion of MB modes while requiring much less computational complexity. Because it includes the Inter16×16 and Intra16×16 modes, the large-size mode also accounts for a larger proportion than the Skip/Direct mode alone, especially for frames with fast motions and large disparities. Therefore, the proposed algorithm focuses on the early decision of the large-size mode instead of the early decision of the Skip/Direct mode. The global correlation of RD costs between views is adopted to calculate the basis portion of the early decision threshold. For each MB, the local correlation of RD costs among different size modes is employed to calculate the adjustable portion of the threshold, and the minimal RD cost of the large-size mode is compared with the threshold to terminate mode decision early. In addition, the proposed algorithm considers the optimization of frames in the base view by using the nearest forward-coded frame in the temporal direction. Therefore, the proposed algorithm can be applied to all interframes of all views to effectively reduce the whole computational complexity of MVC.

The rest of this article is organized as follows: In Sec. 2, the characteristics of large-size and small-size modes are analyzed. Then, an early large-size mode decision based on the global-local correlation of RD costs is proposed in Sec. 3. Experimental results and conclusions are given in Secs. 4 and 5, respectively.

2.

Motivation and Analysis

As an extension of H.264/AVC, MVC employs the RD optimization technique24 to select the optimal MB mode with the minimum RD cost among the candidate inter- and intramodes, and the RD cost of each mode comprises a rate portion and a distortion portion. Both intermodes and intramodes include modes of different sizes, which have distinct rate and distortion characteristics. By providing more precise motion/disparity estimation, small-size intermodes (Inter16×8, Inter8×16, and Inter8×8) can obtain smaller distortion portions of RD costs than large-size intermodes (Skip/Direct and Inter16×16). However, because small-size intermodes need more bits to encode the motion vector of each partition block, they have larger rate portions than large-size intermodes. For MBs with complex motion/disparity, small-size intermodes can obtain much smaller distortion portions than large-size intermodes, and they are more likely to be selected as the optimal MB mode. For MBs with smooth motion/disparity, large-size intermodes usually provide distortion portions at the same level as small-size intermodes, and they have smaller rate portions because only 16×16 partition information is encoded. Similar to the intermodes, small-size intramodes (Intra8×8 and Intra4×4) also obtain smaller distortion portions and larger rate portions than the large-size intramode (Intra16×16). For MBs with complex textures, small-size intramodes often have lower RD costs than the large-size intramode; for MBs with smooth textures, the large-size intramode usually obtains a lower RD cost than the small-size intramodes. In this article, according to the RD properties of different size modes, all intermodes and intramodes are classified into the large-size mode (Skip/Direct, Inter16×16, and Intra16×16) and the small-size mode (Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4).
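For reference, the per-mode cost referred to throughout this section is the standard Lagrangian RD cost of Ref. 24; the formula below merely restates that common form and is not specific to the proposed method:

$$J_{\text{mode}} = D + \lambda_{\text{mode}} \cdot R,$$

where $D$ is the distortion portion (e.g., the sum of squared differences between the original and reconstructed MB), $R$ is the rate portion (the bits spent on mode, motion/disparity, and residual information), and $\lambda_{\text{mode}}$ is the Lagrange multiplier determined by the QP.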

To verify the above theoretical analysis, JMVC 8.0 was used to investigate the statistical characteristics of large-size and small-size modes, and three typical test sequences (“Exit,” “Ballroom,” and “Race1”) with eight views (S0 to S7) were chosen. The GOP length was set to 12, and five GOPs were selected for each view. For each frame, the default coding conditions were used: a maximum of two reference frames for the forward and backward reference lists, respectively; only one interview reference frame allowed for each reference list; and the “TZ search” method enabled with a search range of 96. Table 1 gives the MB proportions of large-size and small-size modes, where the experimental basis quantization parameter (QP) is 32. It can be seen that the large-size mode occupies a proportion of 85% to 97%, which is much larger than that of the small-size mode. The mode proportions inside the large-size MB mode under different basis QPs for the “Race1” sequence are further shown in Fig. 2. It can be observed that Skip/Direct occupies the largest proportion, Inter16×16 occupies the second largest proportion, and Intra16×16 occupies a considerable proportion. Furthermore, the encoding time proportions of large-size and small-size modes are given in Table 2. The large-size mode consumes 7% to 15% of the encoding time for the three sequences over eight views, whereas the small-size mode consumes 83% to 86%, which is much greater. From Tables 1 and 2, it can be found that the large-size mode occupies the majority of MB modes but consumes only a small part of the encoding time. Thus, if the large-size mode can be identified early as the optimal MB mode, the estimation of the small-size mode can be skipped. In addition, the results indicate that the statistical proportions of MB modes are similar among the various views of each sequence. This is because the videos of different views originate from the same scene, and the effect of occlusions between views is small in most cases. Hence, the statistical information of neighboring views can be adopted for the early decision of the large-size mode.

Table 1

Macroblock (MB) mode proportion for “Exit,” “Ballroom,” and “Race1” sequences (large-size/small-size, %).

| View IDs | Exit | Ballroom | Race1 |
|---|---|---|---|
| S0 | 97.3/2.7 | 89.8/10.2 | 89.7/10.3 |
| S1 | 95.9/4.1 | 89.5/10.5 | 92.0/8.0 |
| S2 | 94.0/6.0 | 85.6/14.4 | 87.8/12.2 |
| S3 | 95.3/4.7 | 89.8/10.2 | 91.5/8.5 |
| S4 | 92.8/7.2 | 85.4/14.6 | 87.2/12.8 |
| S5 | 94.0/6.0 | 89.6/10.4 | 91.4/8.6 |
| S6 | 89.7/10.3 | 85.0/15.0 | 86.6/13.4 |
| S7 | 90.6/9.4 | 86.8/13.2 | 90.0/10.0 |
| Avg. | 93.7/6.3 | 87.7/12.3 | 89.5/10.5 |

Fig. 2

Mode proportion inside large-size macroblock (MB) mode under different basis quantization parameters (QPs) for view S1 of “Race1” sequence.


Table 2

Encoding time proportion for “Exit,” “Ballroom,” and “Race1” sequences (large-size mode/small-size mode, %).

| View IDs | Exit | Ballroom | Race1 |
|---|---|---|---|
| S0 | 14.4/83.3 | 13.3/84.9 | 7.7/84.7 |
| S1 | 14.6/85.0 | 13.8/85.5 | 9.0/85.1 |
| S2 | 14.8/83.0 | 13.7/85.2 | 10.1/84.5 |
| S3 | 14.4/84.8 | 13.7/85.2 | 12.7/85.7 |
| S4 | 13.9/83.4 | 13.7/85.3 | 10.1/84.6 |
| S5 | 14.0/85.0 | 14.2/85.4 | 11.8/85.5 |
| S6 | 13.8/83.8 | 13.5/85.3 | 8.9/84.6 |
| S7 | 13.2/85.3 | 14.8/85.3 | 11.5/84.9 |
| Avg. | 14.1/84.2 | 13.8/85.3 | 10.2/85.0 |

Note: Experimental conditions are the same as in Table 1.

Further studies on the RD cost characteristics of the large-size and small-size modes were performed under the same experimental conditions as in Table 1. The distribution of the RD costs of large-size and small-size MB modes in a single frame is shown in Fig. 3. Most large-size MB mode RD costs are smaller than the average RD cost of the small-size MB mode in Fig. 3(a), and most small-size MB mode RD costs are larger than the average RD cost of the large-size MB mode in Fig. 3(b). This indicates that most large-size MB modes have relatively low RD costs and most small-size MB modes have relatively high RD costs, which is consistent with the preceding theoretical analysis. It follows that the average RD costs of large-size and small-size MB modes can be used to calculate thresholds for the early decision of the large-size mode.

Fig. 3

Distribution of MB mode rate-distortion (RD) costs on view S1 frame 6 for “Ballroom” sequence. (a) RD costs of large-size MB mode and the average RD cost of small-size MB mode. (b) RD costs of small-size MB mode and the average RD cost of large-size MB mode.


Moreover, the RD cost gap between the large-size mode and the small-size mode is further analyzed. Figure 4 shows the sorted large-size mode RD costs as the red curve, together with the corresponding small-size mode RD costs at the same sorting indexes. Most large-size mode RD costs are smaller than their corresponding small-size mode RD costs, which is consistent with the statistical results in Table 1 and Fig. 3. If the large-size mode RD cost is relatively low, the large-size mode is likely to be selected as the optimal MB mode. Furthermore, the RD cost gap between the large-size mode and the small-size mode increases with the large-size mode RD cost. Therefore, if the large-size mode RD cost is small, the misjudgment cost of early termination is also small.

Fig. 4

Sorting of large-size mode RD costs and their corresponding small-size mode RD costs on view S1 frame 6 for the “Ballroom” sequence.


Characteristics of large-size and small-size modes in MVC are summarized as follows:

  • 1. The large-size mode accounts for most MB modes while consuming only a small part of the encoding time. The small-size mode accounts for a smaller proportion of MB modes but consumes most of the encoding time.

  • 2. Although Skip/Direct occupies the largest proportion within the large-size MB mode, the proportions of Inter16×16 and Intra16×16 are also considerable.

  • 3. Compared with small-size mode RD costs, most large-size mode RD costs are relatively low, and the gap between them is small when the RD costs are low.

  • 4. Different test sequences have various MB mode proportions, while different views of the same test sequence have similar MB mode proportions because they originate from the same scene. Thus, there is a global correlation of coding information between views, and the average RD costs of large-size and small-size MB modes in the neighboring view can be used for the mode decision of the current view.

3.

Proposed Early Large-Size Mode Decision Algorithm

According to the analysis in the previous section, the proposed algorithm focuses on the early decision of the large-size mode: the large-size mode is estimated first, and then its RD cost is compared with a global-local adaptive threshold to terminate mode decision early. The detailed process is introduced as follows.

Based on the motivation in Fig. 3, the average RD cost of large-size MB modes (AvgJLarge) and the average RD cost of small-size MB modes (AvgJSmall) of the coded frame are employed. For nonbase views, the coded frame in the forward neighboring view with the same time instant as the current frame is selected to calculate AvgJLarge as follows:

Eq. (1)

$$\mathrm{AvgJ_{Large}} = \frac{\sum_{i=1}^{N_{\mathrm{Large}}} J_{\mathrm{Large}}(i)}{N_{\mathrm{Large}}},$$
where JLarge(i) is the large-size mode RD cost of MB i whose optimal MB mode is a large-size mode, and NLarge is the number of such MBs. AvgJSmall is calculated in the same way. Owing to the global correlation between views, AvgJLarge and AvgJSmall are adaptive to the video scene and coding features. Therefore, AvgJLarge and AvgJSmall of the neighboring view can be employed as a global measure for the early decision in the current view. The curves of AvgJLarge and AvgJSmall under different basis QPs for the “Ballroom” sequence are shown in Fig. 5. Both AvgJLarge and AvgJSmall increase with the basis QP. Because the calculation of RD cost depends on the QP, AvgJLarge and AvgJSmall adapt to QP changes, and they are therefore suitable as the global reference. For each MB, the estimation of Skip/Direct, Inter16×16, and Intra16×16 is performed first, and the minimum RD cost of these estimated modes is selected as the large-size mode RD cost JLarge. Then the early decision of the large-size mode is determined by Eq. (2):

Eq. (2)

$$\mathrm{EarlyDecision}(n) = \begin{cases} 1, & \text{if } J_{\mathrm{Large}}(n) < \mathrm{EarlyTH} \\ 0, & \text{otherwise}, \end{cases}$$
where n is the index of the current MB and EarlyTH is the early decision threshold. If JLarge is smaller than EarlyTH, the optimal large-size mode with the minimum RD cost is selected early as the final MB mode, and the mode decision process is terminated. The selection of EarlyTH directly affects the performance of the proposed algorithm, and its calculation is analyzed as follows.

Fig. 5

Average MB mode RD costs on view S1 under different basis QPs for “Ballroom” sequence.


First, AvgJLarge of the coded frame can be adopted as a reference for JLarge in the current frame, and it is multiplied by a parameter α to form EarlyTH. To study the relation between the parameter α and the RD performance variation caused by the misjudgment of the large-size mode, the increments of total RD costs for MBs that select the small-size mode as the optimal mode are shown in Fig. 6, where the experimental basis QP is 32. The increments of RD costs are very small when α is 1, while they grow noticeably when α is larger than 1. Thus, AvgJLarge (i.e., α equal to 1) can be employed as the basis portion of the threshold. Although AvgJLarge is adaptive to the video content of the current frame, the gap between AvgJLarge and AvgJSmall also changes with the basis QP, as shown in Fig. 5. If the threshold were calculated simply by multiplying AvgJLarge by a fixed parameter, the performance of early termination would not be stable: the threshold may exceed AvgJSmall under small basis QPs, in which case the RD performance of the early decision drops dramatically due to low decision accuracy, and the threshold may be much smaller than AvgJSmall under large basis QPs, which leads to low time savings because of a small early termination ratio. Therefore, a threshold calculation method that solves this problem is needed in the proposed algorithm.

Fig. 6

The relationship between the parameter α and the increment of total RD costs for MBs with small-size MB mode.


Second, the threshold is further adjusted by utilizing the local feature of the MB to improve the computational performance while maintaining high RD performance. After the estimation of the large-size mode, its RD costs reflect the local feature of the current MB and can be used to predict the probability that the small-size mode is optimal. To reduce the misjudgment of the large-size MB mode, MBs that selected the small-size mode as the final MB mode are employed to study the relation between the large-size and small-size modes. For these MBs, the Skip/Direct mode RD cost (JSkip/Direct) and the Inter16×16 mode RD cost (JInter16×16) are compared, and the ratios of the smaller RD cost are given in Table 3, where the experimental conditions are the same as in Sec. 2. It can be seen that JInter16×16 is the smaller cost in the large majority of cases. Therefore, if JInter16×16 is less than JSkip/Direct, the current MB is more likely to select the small-size mode, and the threshold should be decreased to maintain RD performance. Conversely, if JSkip/Direct is less than JInter16×16, the threshold can be increased to achieve more time saving.

Table 3

Proportions of the smaller rate-distortion (RD) cost between JSkip/Direct and JInter16×16.

| Sequences | Basis QP | JSkip/Direct is smaller (%) | JInter16×16 is smaller (%) |
|---|---|---|---|
| Exit | 24 | 15.5 | 84.5 |
| | 28 | 14.3 | 85.7 |
| | 32 | 14.8 | 85.2 |
| | 36 | 15.6 | 84.4 |
| Ballroom | 24 | 17.6 | 82.4 |
| | 28 | 17.0 | 83.0 |
| | 32 | 17.5 | 82.5 |
| | 36 | 18.0 | 82.0 |
| Race1 | 24 | 18.2 | 81.8 |
| | 28 | 17.6 | 82.4 |
| | 32 | 16.9 | 83.1 |
| | 36 | 17.9 | 82.1 |
| Average | | 16.7 | 83.3 |

Based on the above analysis, EarlyTH in Eq. (2) is finally calculated as follows for each MB:

Eq. (3)

$$\mathrm{EarlyTH} = \mathrm{AvgJ_{Large}} + \frac{J_{\mathrm{Inter16\times16}}}{J_{\mathrm{Inter16\times16}} + J_{\mathrm{Skip/Direct}}} \times \left(\mathrm{AvgJ_{Small}} - \mathrm{AvgJ_{Large}}\right),$$
where AvgJLarge is the basis portion of EarlyTH, and the proportional factor calculated from JInter16×16 and JSkip/Direct scales the gap between AvgJSmall and AvgJLarge according to the local feature of the current MB. In Eq. (3), if JInter16×16 is less than JSkip/Direct, EarlyTH is closer to AvgJLarge; if JSkip/Direct is less than JInter16×16, EarlyTH is closer to AvgJSmall. Thus, EarlyTH ranges from AvgJLarge to AvgJSmall, and its value depends on the local MB coding feature. For some video sequences, a few frames may have no small-size MB mode under high basis QPs. In this particular case, AvgJSmall is replaced with 5×AvgJLarge, a value chosen through extensive experiments for preferable performance.
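The threshold and test of Eqs. (2) and (3) reduce to a few lines of code. The following is a minimal Python sketch, assuming the per-MB RD costs are available as plain numbers; the function names and the None convention for a reference frame without small-size MBs are illustrative, not JMVC interfaces:

```python
def early_threshold(avg_j_large, avg_j_small, j_skip_direct, j_inter16x16):
    """Eq. (3): global basis AvgJ_Large plus a locally weighted share of the
    gap to AvgJ_Small."""
    if avg_j_small is None:               # reference frame had no small-size MBs
        avg_j_small = 5.0 * avg_j_large   # fallback chosen experimentally
    w = j_inter16x16 / (j_inter16x16 + j_skip_direct)
    return avg_j_large + w * (avg_j_small - avg_j_large)

def early_decision(j_large, early_th):
    """Eq. (2): True (terminate early) when J_Large < EarlyTH, else False."""
    return j_large < early_th
```

Note that the weight w is below 0.5 exactly when JInter16×16 < JSkip/Direct, pulling the threshold toward AvgJLarge as described above.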

In addition, the proposed algorithm is extended to the base view by using the AvgJSmall and AvgJLarge of the nearest forward-coded frame in the current view, so it reduces the computational complexity of all views. Besides, owing to the use of the global correlation, the proposed algorithm does not need to store all MB RD costs of the coded frame; only the two classified average RD costs of the current frame are stored for the optimization of the following frames.

To verify the effectiveness of the proposed algorithm, simulations were conducted under the same test conditions as in Sec. 2. Table 4 gives the decision accuracies and termination ratios of the proposed algorithm under different thresholds (EarlyTH, AvgJLarge, and AvgJSmall). The decision accuracy and termination ratio are defined as follows:

Eq. (4)

$$\begin{cases} \mathrm{Accuracy}\,(\%) = \dfrac{N_{\mathrm{Hit}}}{N_{\mathrm{Early}}} \times 100 \\[6pt] \mathrm{Ratio}\,(\%) = \dfrac{N_{\mathrm{Early}}}{N_{\mathrm{MB}}} \times 100, \end{cases}$$
where NMB is the number of MBs, NEarly is the number of early terminations in Eq. (2), and NHit is the number of MBs that select the large-size MB mode early in Eq. (2) and whose optimal MB mode is indeed a large-size mode. For the threshold EarlyTH, the average decision accuracy is 96.5%, and the average termination ratio is 81.6%, which is close to the average proportion of the large-size MB mode. This indicates that EarlyTH achieves large termination ratios with high decision accuracies. For the threshold AvgJLarge, the average decision accuracy is 98.3%, while the average termination ratio is only 56.2%, which is 31.8% less than the average proportion of the large-size MB mode. For the threshold AvgJSmall, all termination ratios are larger than the corresponding proportions of the large-size MB mode, which leads to an average decision accuracy of only 93.1%. These results demonstrate that EarlyTH, which is calculated using global-local RD costs, is more suitable for the proposed algorithm than thresholds using only global RD costs.

Table 4

Decision accuracies and termination ratios of large-size MB mode under different thresholds.

| Sequences | Basis QP | Decision accuracy (%): EarlyTH / AvgJLarge / AvgJSmall | Termination ratio (%): EarlyTH / AvgJLarge / AvgJSmall | Large-size MB mode proportion (%) |
|---|---|---|---|---|
| Exit | 24 | 96.8 / 98.6 / 92.7 | 79.3 / 50.2 / 90.1 | 86.7 |
| | 28 | 98.4 / 99.6 / 95.2 | 83.8 / 55.0 / 93.4 | 90.9 |
| | 32 | 99.0 / 99.7 / 96.7 | 86.7 / 58.7 / 95.4 | 93.7 |
| | 36 | 99.3 / 99.8 / 97.9 | 89.0 / 61.2 / 96.6 | 95.8 |
| Ballroom | 24 | 93.5 / 96.8 / 87.9 | 76.1 / 51.4 / 86.9 | 79.2 |
| | 28 | 95.8 / 98.4 / 90.8 | 78.9 / 54.2 / 89.5 | 83.7 |
| | 32 | 97.3 / 99.3 / 93.3 | 81.6 / 56.3 / 91.6 | 87.7 |
| | 36 | 98.3 / 99.6 / 95.5 | 84.1 / 58.9 / 93.4 | 91.3 |
| Race1 | 24 | 90.8 / 93.6 / 86.8 | 74.7 / 55.1 / 85.4 | 79.4 |
| | 28 | 94.4 / 96.8 / 90.9 | 78.6 / 56.8 / 88.9 | 85.3 |
| | 32 | 96.6 / 98.5 / 93.6 | 81.8 / 57.7 / 91.9 | 89.5 |
| | 36 | 97.7 / 99.2 / 95.4 | 84.4 / 58.5 / 93.9 | 92.4 |
| Average | | 96.5 / 98.3 / 93.1 | 81.6 / 56.2 / 91.4 | 88.0 |
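For completeness, the statistics of Eq. (4) can be tallied as below. This is a sketch only; the per-MB record layout (a pair of booleans) is an assumed bookkeeping format rather than anything produced by JMVC:

```python
def accuracy_and_ratio(mb_records):
    """mb_records: list of (early_terminated, optimal_is_large_size) booleans
    gathered per MB; returns (Accuracy %, Ratio %) as defined in Eq. (4)."""
    n_mb = len(mb_records)
    n_early = sum(1 for early, _ in mb_records if early)
    n_hit = sum(1 for early, large in mb_records if early and large)
    accuracy = 100.0 * n_hit / n_early if n_early else 0.0
    ratio = 100.0 * n_early / n_mb if n_mb else 0.0
    return accuracy, ratio
```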

A flow diagram of the proposed algorithm is illustrated in Fig. 7, and the detailed steps of the proposed algorithm are given as follows (a condensed sketch of the per-frame loop follows the list):

  • 1. If the current frame is an intraframe, perform the estimation of all intramodes, then go to step 7. Otherwise, get AvgJLarge and AvgJSmall from the coded frame. If the current frame belongs to a nonbase view, the coded frame with the same time instant in the forward neighboring view is selected; if it belongs to the base view, the nearest forward-coded frame in the current view is employed.

  • 2. Perform the estimation of the large-size mode (Skip/Direct, Inter16×16, and Intra16×16) for the current MB, and obtain the Skip/Direct mode RD cost (JSkip/Direct), the Inter16×16 mode RD cost (JInter16×16), and the Intra16×16 mode RD cost. Then select the optimal large-size mode with the minimal RD cost (JLarge).

  • 3. Calculate the threshold EarlyTH in Eq. (3) by utilizing AvgJLarge, AvgJSmall, JSkip/Direct, and JInter16×16.

  • 4. Perform the early large-size mode decision in Eq. (2). If JLarge is less than EarlyTH, the optimal large-size mode is decided as the final MB mode; go to step 6.

  • 5. Perform the estimation of the small-size mode (Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4). Then select the final optimal mode with the minimal RD cost among all estimated modes.

  • 6. If all MBs of the current frame have been processed, go to step 7; else go to step 2 for the mode decision of the next MB.

  • 7. Generate AvgJLarge and AvgJSmall of the current frame for the early large-size mode decision of subsequent encoding frames.
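The per-frame loop below condenses steps 1 to 7 into a Python sketch. The estimation routine, the frame/MB containers, and the mode labels are hypothetical stand-ins for the corresponding JMVC modules; only the control flow, the threshold of Eq. (3), and the two stored averages follow the steps above:

```python
LARGE_MODES = ("Skip/Direct", "Inter16x16", "Intra16x16")
SMALL_MODES = ("Inter16x8", "Inter8x16", "Inter8x8", "Intra8x8", "Intra4x4")

def encode_interframe(frame, ref_stats, estimate_mode):
    """ref_stats: (AvgJ_Large, AvgJ_Small) from the coded reference frame
    (neighboring view for nonbase views, nearest forward frame for the base
    view). estimate_mode(mb, mode) -> RD cost. Returns the new averages."""
    avg_j_large, avg_j_small = ref_stats
    if avg_j_small is None:                    # reference had no small-size MBs
        avg_j_small = 5.0 * avg_j_large
    large_costs, small_costs = [], []
    for mb in frame.macroblocks:
        # Steps 2 and 3: estimate large-size modes and build the threshold.
        costs = {m: estimate_mode(mb, m) for m in LARGE_MODES}
        j_large = min(costs.values())
        w = costs["Inter16x16"] / (costs["Inter16x16"] + costs["Skip/Direct"])
        early_th = avg_j_large + w * (avg_j_small - avg_j_large)
        # Step 4: early large-size mode decision of Eq. (2).
        if j_large >= early_th:
            # Step 5: no early termination, estimate the small-size modes too.
            costs.update({m: estimate_mode(mb, m) for m in SMALL_MODES})
        mb.mode = min(costs, key=costs.get)    # final optimal mode
        mb.cost = costs[mb.mode]
        (large_costs if mb.mode in LARGE_MODES else small_costs).append(mb.cost)
    # Step 7: only the two classified averages are kept for subsequent frames.
    avg = lambda xs: sum(xs) / len(xs) if xs else None
    return avg(large_costs), avg(small_costs)
```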

Fig. 7

Flow diagram of the proposed early large-size mode decision algorithm for multiview video coding.


4.

Experimental Results

The proposed algorithm was implemented on the MVC reference software JMVC 8.0, and the detailed configuration is given in Table 5. JMVC adopts the I-B-P prediction structure for the view coding order and the hierarchical B picture prediction structure for the temporal coding order.3,25 Eight typical test sequences were chosen for the simulation: three 640×480 sequences (“Exit,” “Ballroom,” and “Race1”), four 1024×768 sequences (“Breakdancers,” “Ballet,” “Doorflowers,” and “Lovebird1”), and one 1280×960 sequence (“Dog”). These sequences are representative in terms of camera setups, video scenes, and frame rates, and eight views (S0 to S7) were chosen for each sequence. Compared with the exhaustive mode decision in JMVC, the encoding time saving (ΔTime) was calculated to evaluate the computational performance, and the change of peak signal-to-noise ratio (ΔPSNR) and the change of bit rate (ΔBits) were calculated to evaluate the RD performance under each QP. To evaluate the overall RD performance over the four basis QPs, the Bjontegaard delta peak signal-to-noise ratio (BDPSNR) and Bjontegaard delta bit rate (BDBR)26 were also employed. A negative BDPSNR or a positive BDBR indicates a coding loss and is not preferred.
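As a side note, the BD metrics of Ref. 26 are computed by fitting third-order polynomials to the two RD curves and integrating their difference. The sketch below is one common way to compute BDBR, assuming four (rate, PSNR) points per curve and using only numpy; it is illustrative, not the evaluation code used here:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bit rate (%): average bit-rate change of the test
    curve against the anchor over their overlapping PSNR interval."""
    log_ra, log_rt = np.log(rate_anchor), np.log(rate_test)
    # Fit log-rate as a third-order polynomial function of PSNR.
    p_a = np.polyfit(psnr_anchor, log_ra, 3)
    p_t = np.polyfit(psnr_test, log_rt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over [lo, hi] and average the difference.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

BDPSNR is obtained analogously by fitting PSNR as a function of log-rate and averaging the PSNR difference.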

Table 5

Experimental configuration of JMVC.

| Parameter | Value |
|---|---|
| Basis QP | 24, 28, 32, 36 |
| Delta QP values | 0, 3, 4, 5, 6, 7 |
| View order | 0-2-1-4-3-6-5-7 |
| Group of pictures (GOP) length | 12 |
| Frames coded | 5 GOPs × 8 views |
| Entropy coding | CABAC |
| RDO | On |
| Search mode | 4 (TZ search) |
| Search range | 96 |
| Bi-prediction iterations | 4 |
| Iteration search range | 8 |
| Reference number | 2 |

Because interview views employ not only temporal prediction but also interview prediction for nonanchor frames, they consume more encoding time than temporal views. Table 6 gives the performance of the proposed algorithm for temporal views and interview views separately. For the temporal views of the eight sequences, the proposed algorithm reduces the encoding time by 49.1% to 71.3%, with a PSNR loss of 0.013 to 0.042 dB and a bit rate change from 0.35% to 0.05%. The average time saving over the eight sequences is 61.5%, while the average PSNR loss is only 0.028 dB with a 0.15% decrement of the average bit rate. For the interview views of the eight sequences, the proposed algorithm reduces the encoding time by 65.4% to 75.7%, with a PSNR loss of 0.022 to 0.059 dB and a bit rate change from 0.49% to 0.42%. The average time saving over the eight sequences is 71.4%, while the average PSNR loss is only 0.035 dB with a 0.08% decrement of the average bit rate. These results demonstrate that the proposed algorithm achieves significant time savings while maintaining an RD performance similar to that of the original encoder. Table 6 also shows that the proposed algorithm achieves about 10% more average time saving for interview views than for temporal views; this is because interview views originally consume more encoding time than temporal views. It can further be found that the proposed algorithm achieves larger encoding time savings for sequences with only a few motions, such as the “Ballet,” “Doorflowers,” “Lovebird1,” and “Dog” sequences, because the original proportions of the large-size MB mode in these sequences are larger than in the other sequences, and the proposed algorithm adaptively saves more encoding time. For the “Ballroom,” “Race1,” “Exit,” and “Breakdancers” sequences, which contain a lot of motion, the proposed algorithm still reduces the encoding time significantly, by 65.4% to 73.5% for interview views.

Table 6

Performance of proposed algorithm under different basis QPs.

| Sequence | Basis QP | Temporal views | Interview views |
|---|---|---|---|
| Exit | 24 | 48.5/0.018/0.19 | 67.9/0.020/0.24 |
| | 28 | 53.6/0.019/0.26 | 69.7/0.024/0.09 |
| | 32 | 58.0/0.019/0.09 | 70.7/0.031/0.08 |
| | 36 | 60.2/0.012/0.10 | 71.3/0.038/0.53 |
| | Avg. | 55.1/0.017/0.16 | 69.9/0.028/0.12 |
| Ballroom | 24 | 45.2/0.035/0.00 | 66.3/0.044/0.36 |
| | 28 | 47.7/0.040/0.06 | 66.7/0.047/0.05 |
| | 32 | 50.5/0.043/0.18 | 67.0/0.045/0.06 |
| | 36 | 53.1/0.044/0.14 | 66.9/0.045/0.09 |
| | Avg. | 49.1/0.040/0.10 | 66.7/0.045/0.06 |
| Race1 | 24 | 63.4/0.050/0.08 | 73.2/0.067/0.03 |
| | 28 | 64.7/0.051/0.02 | 74.0/0.066/0.40 |
| | 32 | 65.6/0.033/0.31 | 73.9/0.045/0.63 |
| | 36 | 66.1/0.032/0.33 | 73.1/0.057/0.95 |
| | Avg. | 65.0/0.042/0.14 | 73.5/0.059/0.49 |
| Ballet | 24 | 60.7/0.019/0.15 | 74.6/0.023/0.19 |
| | 28 | 62.6/0.014/0.10 | 73.8/0.025/0.12 |
| | 32 | 63.6/0.014/0.12 | 72.6/0.023/0.09 |
| | 36 | 63.9/0.005/0.18 | 70.9/0.023/0.24 |
| | Avg. | 62.7/0.013/0.06 | 73.0/0.024/0.00 |
| Breakdancers | 24 | 54.4/0.033/0.05 | 64.6/0.044/0.07 |
| | 28 | 57.1/0.028/0.20 | 65.6/0.031/0.28 |
| | 32 | 58.6/0.012/0.03 | 66.1/0.021/0.42 |
| | 36 | 58.5/0.009/0.04 | 65.3/0.017/0.02 |
| | Avg. | 57.2/0.020/0.08 | 65.4/0.028/0.16 |
| Doorflowers | 24 | 69.0/0.017/0.10 | 75.6/0.027/0.53 |
| | 28 | 69.6/0.026/0.21 | 74.5/0.028/0.39 |
| | 32 | 69.1/0.016/0.16 | 73.4/0.016/0.42 |
| | 36 | 68.9/0.012/0.14 | 72.3/0.015/0.33 |
| | Avg. | 69.2/0.018/0.05 | 74.0/0.022/0.42 |
| Lovebird1 | 24 | 69.6/0.020/0.01 | 79.0/0.020/0.25 |
| | 28 | 71.4/0.030/0.32 | 77.7/0.024/0.04 |
| | 32 | 72.0/0.045/0.55 | 74.3/0.038/0.07 |
| | 36 | 72.5/0.034/0.48 | 72.0/0.030/0.21 |
| | Avg. | 71.3/0.032/0.34 | 75.7/0.028/0.00 |
| Dog | 24 | 60.6/0.026/0.22 | 73.8/0.031/0.10 |
| | 28 | 62.4/0.036/0.29 | 73.6/0.036/0.32 |
| | 32 | 63.3/0.047/0.44 | 72.5/0.050/0.38 |
| | 36 | 63.9/0.044/0.46 | 70.8/0.056/0.52 |
| | Avg. | 62.6/0.038/0.35 | 72.6/0.043/0.33 |
| Overall Avg. | | 61.5/0.028/0.15 | 71.4/0.035/0.08 |

Note: Entries are ΔTime (%)/ΔPSNR (dB)/ΔBit (%).

The BD metrics26 were adopted to assess the RD performance of the proposed algorithm against state-of-the-art mode decision algorithms. Table 7 gives the encoding time saving, BDPSNR, and BDBR over the four basis QPs for the proposed algorithm and the state-of-the-art early Skip/Direct mode decision algorithm SDMET.18 For the temporal views of the eight sequences, SDMET achieves a 46.8% time saving on average with a 0.04-dB BDPSNR loss and a 1.24% BDBR increment, and the proposed algorithm achieves 14.7% more time saving with slightly better RD performance than SDMET. For the interview views of the eight sequences, SDMET achieves a 54.7% time saving on average with a 0.04-dB BDPSNR loss and a 1.31% BDBR increment, and the proposed algorithm achieves 16.7% more time saving, also with slightly better RD performance than SDMET. Table 7 also gives results for the fast intermode decision algorithm based on textural segmentation and correlations (denoted FIMD) presented in our previous work.14 FIMD includes an early Skip/Direct mode decision method and two assistant methods (selection of disparity estimation and reduction of Inter8×8 mode estimation), and it was implemented on interview views. As shown in Table 7, FIMD achieves a 58.5% time saving on average with a 0.00-dB BDPSNR loss and a 0.13% BDBR increment; its RD performance remains almost the same as that of the original JMVC, while the proposed algorithm achieves 12.9% more time saving with a similar RD performance.

Table 7

Performance of the proposed algorithm and state-of-the-art algorithms under four basis QPs.

| Sequences | Proposed (temporal) | SDMET18 (temporal) | Proposed (interview) | SDMET18 (interview) | FIMD14 (interview) |
|---|---|---|---|---|---|
| Exit | 55.1/0.01/0.51 | 44.1/0.02/0.82 | 69.9/0.02/0.87 | 57.2/0.02/0.97 | 62.4/0.00/0.19 |
| Ballroom | 49.1/0.04/0.91 | 34.1/0.03/0.68 | 66.7/0.05/1.20 | 51.6/0.03/0.78 | 52.6/0.01/0.37 |
| Race1 | 65.0/0.04/0.88 | 39.5/0.06/1.34 | 73.5/0.04/0.92 | 42.4/0.08/1.82 | 52.6/0.01/0.15 |
| Ballet | 62.7/0.01/0.33 | 54.5/0.02/0.95 | 73.0/0.02/0.67 | 65.6/0.02/0.97 | 65.0/0.00/0.03 |
| Breakdancers | 57.2/0.02/0.83 | 34.4/0.06/2.93 | 65.4/0.02/0.93 | 42.4/0.05/2.14 | 43.3/0.00/0.05 |
| Doorflowers | 69.2/0.02/0.71 | 64.3/0.06/2.20 | 74.0/0.04/1.19 | 65.6/0.07/2.55 | 67.2/0.00/0.09 |
| Lovebird1 | 71.3/0.02/0.59 | 53.1/0.01/0.27 | 75.7/0.03/0.84 | 55.1/0.01/0.35 | 68.2/0.00/0.04 |
| Dog | 62.6/0.02/0.66 | 50.2/0.03/0.74 | 72.6/0.03/0.91 | 57.6/0.03/0.86 | 56.9/0.01/0.16 |
| Average | 61.5/0.02/0.68 | 46.8/0.04/1.24 | 71.4/0.03/0.94 | 54.7/0.04/1.31 | 58.5/0.00/0.13 |

Note: Entries are ΔTime (%)/BDPSNR (dB)/BDBR (%). SDMET is from Ref. 18; FIMD is from Ref. 14.

For better observation, Fig. 8 shows a histogram comparison of the time savings of the proposed algorithm, SDMET, and FIMD for the interview views of the different sequences. The proposed algorithm achieves larger time savings than both SDMET and FIMD over all sequences. For sequences with many fast motions and large disparities, such as “Race1” and “Breakdancers,” the time savings of SDMET and FIMD are less significant than those of the proposed algorithm. For the “Race1” sequence, the proposed algorithm achieves up to 30% more time saving than SDMET and up to 20% more than FIMD. For sequences with few motions and simple textures, such as “Ballet” and “Doorflowers,” the time savings of SDMET and FIMD are close to those of the proposed algorithm. These comparisons indicate that the proposed algorithm also provides more stable time savings than the two state-of-the-art algorithms. This is because the original proportion of the large-size MB mode is larger than the proportion of the Skip/Direct MB mode for all sequences, especially for sequences with many fast motions; thus, the performances of SDMET and FIMD are more sensitive to motion than that of the proposed algorithm. Additionally, SDMET and FIMD do not consider the optimization of anchor frames, which also affects their computational performance.

Fig. 8

Encoding time saving ratios of SDMET, FIMD, and the proposed algorithm for interview views of the eight sequences.


Moreover, Fig. 9 illustrates the time saving curves of the proposed algorithm, SDMET, and FIMD for interview views under different QPs. The proposed algorithm achieves more stable and larger time savings than SDMET over the different QPs. For the “Race1” sequence, which has fast global motions, the proposed algorithm obtains about 30% more time saving than SDMET under all QPs. For the “Dog” sequence, which has slow motions and small disparities, the time saving of SDMET is close to that of the proposed algorithm under higher QPs, while it is smaller under lower QPs. Due to the complex textures and picture noise in the static background of the “Dog” sequence, the distribution of the Skip/Direct MB mode is dispersive under lower QPs, and SDMET obtains lower decision ratios to ensure small RD degradation. As the QP increases, more MBs select Skip/Direct as the optimal MB mode, and SDMET then achieves much larger time savings than under lower QPs. Owing to the use of global RD costs obtained from neighboring views, the proposed algorithm takes both the QP and the video content into consideration, and its performance is not sensitive to either. Compared with FIMD from our previous work,14 the proposed algorithm achieves about 20% more time saving under all QPs for the “Race1” sequence and about 15% more under all QPs for the “Dog” sequence. This is because FIMD employs stricter decision conditions to maintain the same RD performance as the original encoder, which leads to smaller time savings. Besides, the computational performance of FIMD depends largely on the proportion of the Skip/Direct MB mode, which is originally smaller than the proportion of the large-size mode. Therefore, the proposed algorithm obtains larger time savings.

Fig. 9

Time saving curves of proposed algorithm, SDMET, and FIMD under different basis QPs for interview views. (a) Time saving curves for “Race1” sequence. (b) Time saving curves for “Dog” sequence.


Although the proposed algorithm can reduce about 71% of the encoding time for interview views, the computational complexity of interview views is still very high due to the disparity estimation for interview prediction. To further reduce the complexity, the proposed algorithm can be combined with state-of-the-art fast search algorithms; here, it was integrated with the fast disparity estimation algorithm (denoted FDE) from our previous work21 for interview views. Table 8 gives the performance of the integrated scheme. The integrated scheme achieves an 86.3% average time saving, which is about 15% more than the proposed algorithm alone. The average coding efficiency loss over all sequences is a 0.04-dB PSNR loss and a 0.23% bit rate decrement (a 0.03-dB BDPSNR loss and a 0.94% BDBR increment), which is the same as the average performance of the proposed algorithm in Table 7. The view-adaptive motion estimation and disparity estimation (VAMEDE) scheme,13 which includes mode size decision, fast motion estimation, and selective disparity estimation, was also implemented on JMVC for interview views; it achieves a 79.6% average time saving with a 0.04-dB PSNR loss and a 1.79% bit rate increment (a 0.10-dB BDPSNR loss and a 3.16% BDBR increment). The average performance of VAMEDE in Table 8 is consistent with that reported in Ref. 13, and the reason for the slight coding efficiency degradation is as follows: VAMEDE was developed on the JMVM platform, the MVC reference software that preceded JMVC. JMVM adopts the motion skip mode to substantially improve the overall coding performance; however, the motion skip mode is excluded from JMVC, so VAMEDE causes a relatively larger coding efficiency loss on JMVC than on JMVM. Compared with VAMEDE, our integrated scheme achieves about 6% more time saving on average with the same level of PSNR loss and smaller bit rate increments. For the “Lovebird1” and “Doorflowers” sequences, which contain only a few smooth motions, our integrated scheme achieves about 3% more time saving than VAMEDE with a similar coding efficiency loss. For the “Race1” and “Ballroom” sequences, which contain a lot of complex motion, our integrated scheme achieves about 10% more time saving than VAMEDE with a smaller coding efficiency loss.

Table 8

Performance of the proposed integrated scheme and VAMEDE under four basis QPs for interview views.

| Sequences | Proposed algorithm + FDE (Ref. 21) | VAMEDE (Ref. 13) |
|---|---|---|
| Exit | 87.5 / 0.03 / 0.04 / 0.03 / 1.06 | 82.0 / 0.06 / 1.95 / 0.11 / 4.04 |
| Ballroom | 83.2 / 0.05 / 0.31 / 0.06 / 1.48 | 72.7 / 0.05 / 3.19 / 0.18 / 4.67 |
| Race1 | 85.7 / 0.06 / 0.92 / −0.01 / 0.34 | 75.4 / 0.10 / 4.05 / 0.27 / 6.87 |
| Ballet | 89.3 / 0.05 / 0.22 / 0.04 / 1.12 | 83.5 / 0.03 / 0.24 / 0.04 / 1.11 |
| Breakdancers | 81.3 / 0.03 / 0.28 / 0.03 / 1.06 | 71.1 / 0.04 / 2.17 / 0.10 / 3.98 |
| Doorflowers | 89.5 / 0.03 / 0.22 / 0.03 / 1.13 | 86.7 / 0.02 / 1.51 / 0.06 / 2.16 |
| Lovebird1 | 91.1 / 0.03 / 0.10 / 0.03 / 0.80 | 87.9 / 0.01 / 0.13 / 0.02 / 0.46 |
| Dog | 82.7 / 0.05 / 0.78 / 0.02 / 0.55 | 77.3 / 0.03 / 1.08 / 0.07 / 1.97 |
| Average | 86.3 / 0.04 / 0.23 / 0.03 / 0.94 | 79.6 / 0.04 / 1.79 / 0.10 / 3.16 |

Note: Entries are ΔTime (%) / ΔPSNR (dB) / ΔBit (%) / BDPSNR (dB) / BDBR (%).

5.

Conclusion

To reduce the computational complexity of MVC, this work presents a fast mode decision algorithm that focuses on the early decision of the large-size mode. Based on the global correlation of RD costs between views and the local correlation of RD costs among candidate modes, the average RD costs of the large-size and small-size MB modes in the neighboring view are combined with the large-size mode RD costs of the current MB to decide early whether the large-size mode is the optimal MB mode. Compared with the exhaustive mode decision, experimental results show that the proposed algorithm saves encoding time significantly with a negligible loss of RD performance, and it achieves better performance than state-of-the-art algorithms, especially for test sequences with fast motions and large disparities. Moreover, the proposed algorithm was integrated with the FDE of our previous work, and the integrated scheme also achieves better performance than the state-of-the-art algorithm VAMEDE.

Acknowledgments

This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grants LQ12F01008 and Y1110532 and the Natural Science Foundation of China under Grant 61303139. The authors would like to thank journal reviewers for their insightful comments and valuable suggestions.

References

1. “Advanced video coding for generic audiovisual services,” ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 AVC) (2010).
2. A. Vetro, T. Wiegand, and G. J. Sullivan, “Overview of the stereo and multiview video coding extensions of the H.264/AVC standard,” Proc. IEEE 99(4), 626–642 (2011). http://dx.doi.org/10.1109/JPROC.2010.2098830
3. P. Merkle et al., “Efficient prediction structures for multiview video coding,” IEEE Trans. Circuits Syst. Video Technol. 17(11), 1461–1473 (2007). http://dx.doi.org/10.1109/TCSVT.2007.903665
4. D. Wu et al., “Fast intermode decision in H.264/AVC video coding,” IEEE Trans. Circuits Syst. Video Technol. 15(7), 953–958 (2005). http://dx.doi.org/10.1109/TCSVT.2005.848304
5. S. Hu et al., “Fast inter-mode decision based on rate-distortion cost characteristics,” in Pacific-Rim Conf. Multimedia, pp. 145–155 (2010).
6. Y. Sung and J. Wang, “Fast mode decision for H.264/AVC based on rate-distortion clustering,” IEEE Trans. Multimedia 14(3), 693–702 (2012). http://dx.doi.org/10.1109/TMM.2012.2186793
7. L. Ding et al., “Content-aware prediction algorithm with inter-view mode decision for multiview video coding,” IEEE Trans. Multimedia 10(8), 1553–1564 (2008). http://dx.doi.org/10.1109/TMM.2008.2007314
8. B. Zatt et al., “A multi-level dynamic complexity reduction scheme for multiview video coding,” in IEEE Int. Conf. Image Process., pp. 749–752 (2011).
9. T. Zhao et al., “Multiview coding mode decision with hybrid optimal stopping model,” IEEE Trans. Image Process. 22(4), 1598–1609 (2013). http://dx.doi.org/10.1109/TIP.2012.2235451
10. L. Shen et al., “Low-complexity mode decision for MVC,” IEEE Trans. Circuits Syst. Video Technol. 21(6), 837–843 (2011). http://dx.doi.org/10.1109/TCSVT.2011.2130310
11. H. Zeng, K. Ma, and C. Cai, “Fast mode decision for multiview video coding using mode correlation,” IEEE Trans. Circuits Syst. Video Technol. 21(11), 1659–1666 (2011). http://dx.doi.org/10.1109/TCSVT.2011.2133350
12. L. Shen et al., “Selective disparity estimation and variable size motion estimation based on motion homogeneity for multi-view coding,” IEEE Trans. Broadcast. 55(4), 761–766 (2009). http://dx.doi.org/10.1109/TBC.2009.2030453
13. L. Shen et al., “View-adaptive motion estimation and disparity estimation for low complexity multiview video coding,” IEEE Trans. Circuits Syst. Video Technol. 20(6), 925–930 (2010). http://dx.doi.org/10.1109/TCSVT.2010.2045910
14. W. Zhu et al., “Fast inter mode decision based on textural segmentation and correlations for multiview video coding,” IEEE Trans. Consumer Electron. 56(3), 1696–1704 (2010). http://dx.doi.org/10.1109/TCE.2010.5606315
15. H. Zeng, K. Ma, and C. Cai, “Mode-correlation-based early termination mode decision for multi-view video coding,” in IEEE Int. Conf. Image Process., pp. 3405–3408 (2010).
16. B. Zatt et al., “An adaptive early skip mode decision scheme for multiview video coding,” in Picture Coding Symp., pp. 42–45 (2010).
17. L. Shen et al., “Early SKIP mode decision for MVC using inter-view correlation,” Signal Process.: Image Commun. 25(2), 88–93 (2010). http://dx.doi.org/10.1016/j.image.2009.11.003
18. Y. Zhang et al., “Statistical early termination model for fast mode decision and reference frame selection in multiview video coding,” IEEE Trans. Broadcast. 58(1), 10–23 (2012). http://dx.doi.org/10.1109/TBC.2011.2174282
19. S. Khattak et al., “Fast encoding techniques for multiview video coding,” Signal Process.: Image Commun. 28(6), 569–580 (2013). http://dx.doi.org/10.1016/j.image.2012.12.010
20. L. Shen et al., “Macroblock-level adaptive search range algorithm for motion estimation in multiview video coding,” J. Electron. Imaging 18(3), 033003 (2009). http://dx.doi.org/10.1117/1.3167850
21. W. Zhu et al., “Fast disparity estimation using spatio-temporal correlation of disparity field for multiview video coding,” IEEE Trans. Consumer Electron. 56(2), 957–964 (2010). http://dx.doi.org/10.1109/TCE.2010.5506026
22. Y. Zhang et al., “Efficient multi-reference frame selection algorithm for hierarchical B pictures in multiview video coding,” IEEE Trans. Broadcast. 57(1), 15–23 (2011). http://dx.doi.org/10.1109/TBC.2010.2082670
23. Z. Deng et al., “Iterative search strategy with selective bi-directional prediction for low complexity multiview video coding,” J. Vis. Commun. Image Represent. 23, 522–534 (2012). http://dx.doi.org/10.1016/j.jvcir.2012.01.016
24. T. Wiegand et al., “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol. 13(7), 688–703 (2003). http://dx.doi.org/10.1109/TCSVT.2003.815168
25. A. Vetro et al., “Joint Multiview Video Model (JMVM) 8.0,” ISO/IEC JTC1/SC29/WG11 and ITU-T Q6/SG16, Geneva (2008).
26. G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T Q6/SG16 (2001).

Biography

Wei Zhu received his BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2004 and 2010, respectively. He is currently a lecturer at Zhejiang University of Technology. His major research fields are video coding, video analysis, and parallel processing.

Yayu Zheng received his BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2002 and 2008, respectively. He is currently an associate professor at Zhejiang University of Technology. His major research fields are networking multimedia systems and video coding.

Peng Chen received his BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2003 and 2009, respectively. He is currently an associate professor at Zhejiang University of Technology. His major research fields are signal processing and system design for multimedia.

Jiani Xie received her BSc and MS degrees from Zhejiang University, Hangzhou, China, in 2004 and 2008, respectively. She is currently with the State Intellectual Property Office. Her major research field is 3-D video processing.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Wei Zhu, Yayu Zheng, Peng Chen, and Jiani Xie "Global-local correlation-based early large-size mode decision for multiview video coding," Journal of Electronic Imaging 23(1), 013027 (19 February 2014). https://doi.org/10.1117/1.JEI.23.1.013027
Published: 19 February 2014
KEYWORDS: Computer programming; Video coding; Video; Distortion; Motion estimation; Optimization (mathematics); 3D displays
