Multiview video is captured from a set of viewpoints, and it is useful in many multimedia applications, such as three-dimensional (3-D) television, free viewpoint television, and glasses-free portable 3-D displays. Multiview video coding (MVC) was developed for the storage and transmission of very large multiview video data, and it has been standardized as an extension of H.264/AVC.1 Figure 1 illustrates the basic prediction structure in the MVC reference software JMVC, where the group of pictures (GOP) length is eight for each view. In terms of backward compatibility, all views can be classified into a base view and several nonbase views. Only the base view (S0 in Fig. 1) is backward compatible with H.264/AVC; nonbase views (S1 and S2 in Fig. 1) are encoded with new coding tools to provide complete multiview video bitstreams.2 Every GOP in each view includes one anchor frame and several nonanchor frames. Frames at time instants T0 and T8 in Fig. 1 are anchor frames, and the other frames are nonanchor frames. To allow random access to video bitstreams, anchor frames are not allowed to adopt temporal prediction. To improve compression performance, nonanchor frames adopt both temporal prediction and interview prediction.3 In Fig. 1, solid arrows represent temporal prediction and dotted arrows represent interview prediction. According to the prediction directions of nonanchor frames, all views can also be classified into temporal views and interview views. Nonanchor frames employ both temporal prediction and interview prediction in interview views, while they employ only temporal prediction in temporal views. In Fig. 1, views S0 and S2 are temporal views, and view S1 is an interview view.
Like H.264/AVC, MVC also needs to select the optimal macroblock (MB) mode among multiple candidate modes for each frame; it has five intermodes (Skip/Direct, Inter16×16, Inter16×8, Inter8×16, and Inter8×8) and three intramodes (Intra16×16, Intra8×8, and Intra4×4). Except for Skip/Direct, the other intermodes consume a lot of time for motion/disparity estimation in reference frames. In the MVC reference software JMVC, an exhaustive mode decision algorithm is used to select the optimal MB mode, checking all candidate modes in sequence. Besides, due to the use of interview prediction, the computational complexity of mode decision for a single view is greater than that of H.264/AVC. Thus, the computational complexity of MVC is very high, which has hindered its practical use in real-time and mobile applications.
Many state-of-the-art fast mode decision algorithms have been developed for H.264/AVC. Wu et al.4 presented a fast intermode decision using the spatial homogeneity and temporal stationarity characteristics of video objects. Hu et al.5 proposed a fast intermode decision algorithm based on rate-distortion (RD) cost characteristics, which includes an early skip mode decision and a three-stage mode prediction. Sung and Wang6 introduced a multiphase classification scheme that builds a mode decision tree according to the clustering of RD costs. These algorithms can be employed to speed up the mode decision of MVC, and their ideas can also enlighten the design of fast mode decision algorithms for MVC. However, the complexity of MVC is still very high, and it can be further reduced by using the characteristics of MVC. To address this issue, various mode decision algorithms have been studied.7–19 These algorithms presented some effective optimization techniques, such as adaptive termination strategies,7–10 candidate mode selection,10–13 prediction direction selection,12–14 and early Skip/Direct mode decision.14–18 To reduce the overall complexity, Shen et al.13 combined candidate mode selection with fast motion estimation and prediction direction selection; meanwhile, Khattak et al.19 provided a complete framework that includes not only mode decision but also reference frame selection and fast motion/disparity estimation. In these algorithms, the correlation of coding information between neighboring views and the RD costs of MB modes in MVC are usually employed to arrive at a faster mode decision. As the Skip/Direct mode occupies the largest proportion of MB modes with negligible computational complexity, several algorithms focused only on early Skip/Direct mode decision. Zeng et al.15 introduced an early decision algorithm using the RD costs of nearby MBs.
Zatt et al.16 proposed an early decision algorithm based on mode correlation in the 3-D neighborhood. Shen et al.17 presented an early decision algorithm based on the analysis of the prediction mode distribution of the corresponding MBs in the neighboring view. Zhang et al.18 proposed an efficient statistical Skip/Direct mode termination model named SDMET, which adjusts the RD cost threshold adaptively using the statistical information of coded MBs. As mentioned above, early Skip/Direct mode decision algorithms can reduce the complexity effectively with high RD performance, and they can also be combined with fast algorithms for motion/disparity estimation and multireference frame selection19–23 to further reduce the complexity. However, since they mainly utilize the local correlation of coding information between neighboring views, the global correlation of coding information has not been exploited. Moreover, they do not perform very well for video scenes with fast motions and large disparities, and they have not considered complexity reduction for anchor frames.
In this article, an early large-size mode decision algorithm based on the global-local correlation of RD costs is proposed to reduce the computational complexity of MVC. According to mode sizes and RD properties, all candidate modes in mode decision are classified into two types: the large-size mode and the small-size mode. The large-size mode includes Skip/Direct, Inter16×16, and Intra16×16, and the small-size mode includes Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4, where Inter8×8 further contains four submodes (8×8, 8×4, 4×8, and 4×4). Compared with the small-size mode, the large-size mode accounts for a much larger proportion of MB modes with much less computational complexity. Because it includes the Inter16×16 and Intra16×16 modes, the large-size mode also occupies a larger proportion than the Skip/Direct mode alone, especially for frames with fast motions and large disparities. Therefore, the proposed algorithm focuses on the early decision of the large-size mode instead of the early decision of the Skip/Direct mode. The global correlation of RD costs between views is adopted to calculate the basis portion of the early decision threshold. For each MB, the local correlation of RD costs among different size modes is employed to calculate the adjustable portion of the threshold, and the minimal RD cost of the large-size mode is compared with the threshold to terminate mode decision early. In addition, the proposed algorithm also considers the optimization of frames in the base view by using the nearest forward-coded frame in the temporal direction. Therefore, the proposed algorithm can be applied to all interframes of all views to effectively reduce the overall computational complexity of MVC.
The rest of this article is organized as follows: In Sec. 2, the characteristics of large-size and small-size modes are analyzed. Then, an early large-size mode decision based on the global-local correlation of RD costs is proposed in Sec. 3. Experimental results and conclusions are given in Secs. 4 and 5, respectively.
Motivation and Analysis
As an extension of H.264/AVC, MVC also employs the RD optimization technique24 to select the optimal MB mode with the minimum RD cost among candidate inter- and intramodes, and the RD cost of each mode includes a rate portion and a distortion portion. Both inter- and intramodes include modes of different sizes, which have distinct rate and distortion characteristics. By providing more precise motion/disparity estimation, small-size intermodes, which include Inter16×8, Inter8×16, and Inter8×8, can obtain smaller distortion portions of RD costs than large-size intermodes (Skip/Direct and Inter16×16). However, because small-size intermodes need more bits to encode the motion vector of each partition block, they have larger rate portions of RD costs than large-size intermodes. For MBs with complex motion/disparity, small-size intermodes can obtain a much smaller distortion portion of the RD cost than large-size intermodes, and they are more likely to be selected as the optimal MB mode. For MBs with smooth motion/disparity, large-size intermodes can usually provide the same level of distortion as small-size modes, and they have smaller rate portions than small-size intermodes because only partition information is encoded. Similar to intermodes, small-size intramodes (Intra8×8 and Intra4×4) can also obtain smaller distortion portions and larger rate portions than the large-size intramode (Intra16×16). For MBs with complex textures, small-size intramodes often have smaller RD costs than the large-size intramode. For MBs with smooth textures, the large-size intramode usually obtains a smaller RD cost than small-size intramodes. In this article, according to the RD properties of different size modes, all intermodes and intramodes are classified into the large-size mode (including Skip/Direct, Inter16×16, and Intra16×16) and the small-size mode (including Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4).
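As a compact summary of this classification, the grouping can be sketched as follows (a minimal illustration; mode names are written in ASCII, e.g., Inter16x16, and the helper name is ours):

```python
# Mode classification used throughout this section. The partition names are
# the standard H.264/AVC MB modes; the grouping follows the text's definition.
LARGE_SIZE_MODES = {"Skip/Direct", "Inter16x16", "Intra16x16"}
SMALL_SIZE_MODES = {"Inter16x8", "Inter8x16", "Inter8x8", "Intra8x8", "Intra4x4"}
# Inter8x8 is itself refined into four sub-partitions during estimation.
INTER8X8_SUBMODES = ("8x8", "8x4", "4x8", "4x4")

def is_large_size(mode: str) -> bool:
    """Return True if the candidate MB mode belongs to the large-size class."""
    return mode in LARGE_SIZE_MODES
```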
In order to verify the above theoretical analysis, JMVC 8.0 was used to investigate the statistical characteristics of large-size and small-size modes, and three typical test sequences (“Exit,” “Ballroom,” and “Race1”) with eight views (S0 to S7) were chosen. The GOP length was set to 12, and five GOPs were selected for each view. For each frame, the default coding conditions were used: a maximum of two reference frames is available for the forward and backward reference lists, respectively; only one interview reference frame is allowed for each reference list; and the search method “TZ search” is enabled with a search range of 96. Table 1 gives the MB proportions of large-size and small-size modes, where the experimental basis quantization parameter (QP) is 32. It can be seen that the large-size mode occupies a proportion of 85% to 97%, which is much larger than that of the small-size mode. The mode proportions inside the large-size MB mode under different basis QPs for the “Race1” sequence are further shown in Fig. 2. It can be observed that Skip/Direct occupies the largest proportion, while Inter16×16 occupies the second largest proportion and Intra16×16 occupies a considerable proportion. Furthermore, the encoding time proportions of large-size and small-size modes are given in Table 2. The large-size mode consumes 7% to 15% of the encoding time for the three sequences over eight views, whereas the small-size mode consumes 83% to 86%, which is much greater. From Tables 1 and 2, it can be found that the large-size mode occupies the majority of the mode proportion but consumes only a small part of the encoding time. Thus, if the large-size mode can be identified early as the optimal MB mode, the estimation of the small-size mode can be skipped. In addition, the results also indicate that the statistical proportions of MB modes are similar among the various views of each sequence.
This is because the videos of the different views originate from the same scene, and the effect of occlusions between views is small in most cases. Therefore, the statistical information of neighboring views can be adopted for the early decision of the large-size mode.
Macroblock (MB) mode proportion for “Exit,” “Ballroom,” and “Race1” sequences.
|View IDs||Proportion of large-size MB mode (%)||Proportion of small-size MB mode (%)|
Encoding time proportion for “Exit,” “Ballroom,” and “Race1” sequences.
|View IDs||Encoding time proportion of large-size mode (%)||Encoding time proportion of small-size mode (%)|
Note: Experimental conditions are the same as Table 1.
Further studies on the RD cost characteristics of the large-size and small-size modes were performed under the same experimental conditions as in Table 1. The RD cost distributions of large-size and small-size MB modes in a single frame are shown in Fig. 3. It can be seen that most RD costs of the large-size MB mode are less than the average RD cost of the small-size MB mode in Fig. 3(a), and most RD costs of the small-size MB mode are larger than the average RD cost of the large-size MB mode in Fig. 3(b). This indicates that most large-size MB modes have relatively low RD costs and most small-size MB modes have relatively high RD costs, which is consistent with the theoretical analysis above. It can thus be concluded that the average RD costs of large-size and small-size MB modes can be used to calculate thresholds for the early decision of the large-size mode.
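The frame-level averages discussed here can be sketched as follows (a minimal illustration, assuming each coded MB is summarized by its optimal-mode class and RD cost; the function and variable names are ours):

```python
def frame_average_rd_costs(coded_mbs):
    """Compute the per-frame averages J_L and J_S of a coded frame.

    coded_mbs: iterable of (is_large_size, rd_cost) pairs, one per MB, where
    is_large_size tells whether the MB's optimal mode was a large-size mode.
    Returns (J_L, J_S); an empty class yields 0.0 as a neutral placeholder.
    """
    large = [cost for is_large, cost in coded_mbs if is_large]
    small = [cost for is_large, cost in coded_mbs if not is_large]
    j_l = sum(large) / len(large) if large else 0.0
    j_s = sum(small) / len(small) if small else 0.0
    return j_l, j_s
```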
Moreover, the RD cost gap between the large-size mode and the small-size mode is further analyzed. Figure 4 shows the sorted large-size mode RD costs as the red curve, together with the corresponding small-size mode RD costs under the same sorting indexes. It can be seen that most large-size mode RD costs are smaller than their corresponding small-size mode RD costs, which is consistent with the statistical results in Table 1 and Fig. 3. If the large-size mode RD cost is relatively low, the large-size mode is likely to be selected as the optimal MB mode. The RD cost gap between the large-size mode and the small-size mode increases with the large-size mode RD cost. Therefore, if the large-size mode RD cost is small, the misjudgment cost of early termination is also small.
Characteristics of large-size and small-size modes in MVC are summarized as follows:
1. The large-size mode occupies most of the proportion of MB modes, while it consumes only a small part of the encoding time. The small-size mode occupies a smaller proportion of MB modes, but consumes most of the encoding time.
2. Although Skip/Direct occupies the largest proportion inside the large-size MB mode, the proportions of Inter16×16 and Intra16×16 are also considerable.
3. Compared with small-size mode RD costs, most of large-size mode RD costs are relatively low, and their gaps are small when RD costs are low.
4. Different test sequences have various MB mode proportions, while different views in the same test sequence have similar MB mode proportions as they originate from the same scene. Thus, there is a global correlation of coding information between views, and the average RD cost of large-size MB modes and the average RD cost of small-size MB modes in neighboring views can be used for the mode decision of current view.
Proposed Early Large-Size Mode Decision Algorithm
According to the analysis in the previous section, the proposed algorithm focuses on the early decision of the large-size mode: the large-size mode is estimated first, and then its RD cost is compared with a global–local adaptive threshold to terminate mode decision early. The detailed processes are introduced as follows.
Based on the motivation in Fig. 3, the average RD cost of large-size MB modes (J_L) and the average RD cost of small-size MB modes (J_S) of a coded frame are employed. For nonbase views, the coded frame in the forward neighboring view with the same time instant as the current frame is selected to calculate J_L and J_S, as in Eq. (1): each average is obtained by summing the RD costs of the MBs whose optimal mode belongs to the corresponding mode class and dividing by the number of such MBs. The values of J_L and J_S under different basis QPs are shown in Fig. 5. It can be seen that both J_L and J_S increase with the basis QP. Because the calculation of the RD cost is related to the QP, J_L and J_S adapt to changes of the QP, and they are suitable to be employed as the global reference. For each MB, the estimation of Skip/Direct, Inter16×16, and Intra16×16 is performed, and the minimum RD cost of these estimated modes is selected as the large-size mode RD cost J_min. Then, the early decision of the large-size mode is determined by Eq. (2): if J_min is less than the threshold EarlyTH, the mode decision is terminated early.
First, J_L of the coded frame can be adopted as a measurement of the large-size mode RD costs in the current frame, and it is multiplied by a parameter β to form EarlyTH. To study the relation between the parameter β and the variation of RD performance caused by the misjudgment of the large-size mode, the increments of total RD costs for MBs that select the small-size mode as the optimal mode are shown in Fig. 6, where the experimental basis QP is 32. It can be seen that the increments of RD costs are very small when β is 1, while the increments grow obviously when β is larger than 1. Thus, β·J_L (with β equal to 1) can be employed as the basis portion of the threshold. Although J_L is adaptive to the video content of the current frame, the gap between J_L and J_S also changes with the basis QP, as shown in Fig. 5. If the threshold were directly calculated by multiplying J_L by a fixed parameter, the performance of early termination would not be stable. The threshold might be larger than J_S under small basis QPs, and the RD performance of the early decision would then drop dramatically due to low decision accuracy. The threshold might also be much less than J_S under large basis QPs, which would lead to low time savings because of the small early termination ratio. Therefore, a threshold calculation method that solves this problem is needed in the proposed algorithm.
Second, the threshold is further adjusted by utilizing the local feature of the MB to improve the computational performance while maintaining a high RD performance. After the estimation of the large-size mode, its RD costs reflect the local feature of the current MB and can be used to predict the probability of a small-size MB mode. To reduce the misjudgment of the large-size MB mode, MBs that selected the small-size mode as the final MB mode are employed to study the relation between the large-size mode and the small-size mode. For these MBs, the Skip/Direct mode RD cost (J_Skip/Direct) and the Inter16×16 mode RD cost (J_Inter16×16) are compared, and the ratios of the smaller RD cost are given in Table 3, where the experimental conditions are the same as in Sec. 2. It can be seen that J_Inter16×16 occupies the majority of the proportion. Therefore, if J_Inter16×16 is less than J_Skip/Direct, the current MB is more likely to select the small-size mode, and the threshold should be decreased to maintain RD performance. Conversely, if J_Skip/Direct is less than J_Inter16×16, the threshold can be increased to achieve more time savings.
Proportions of the smaller rate-distortion (RD) cost between J_Skip/Direct and J_Inter16×16.
|Sequences||Basis quantization parameters (QPs)||Proportions of the smaller RD cost (%)|
Based on the above analysis, EarlyTH in Eq. (2) is finally calculated for each MB as in Eq. (3), which interpolates the threshold between J_L and J_S according to the comparison of J_Skip/Direct and J_Inter16×16. In Eq. (3), if J_Inter16×16 is less than J_Skip/Direct, EarlyTH is closer to J_L, and if J_Skip/Direct is less than J_Inter16×16, EarlyTH is closer to J_S. Thus, the range of EarlyTH is from J_L to J_S, and its value depends on the local MB coding feature. For different video sequences, a few frames may have no small-size MB mode under high basis QPs. In this particular case, J_S is unavailable, and it is replaced with a substitute value determined through extensive experiments for a preferable performance.
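Since the exact form of Eq. (3) is not reproduced in this extraction, the following sketch shows one plausible realization consistent with the described behavior: EarlyTH is interpolated between J_L and J_S, pulled toward J_S when J_Skip/Direct is the smaller local cost and toward J_L when J_Inter16×16 is smaller. The weighting scheme and function name are assumptions, not necessarily the paper's exact formula:

```python
def early_threshold(j_l, j_s, j_skip_direct, j_inter16x16):
    """Interpolate EarlyTH between the global averages j_l and j_s.

    Assumed stand-in for Eq. (3): when J_Skip/Direct is the smaller
    large-size cost, EarlyTH moves toward j_s (looser, more early
    terminations); when J_Inter16x16 is smaller, it moves toward j_l
    (stricter, fewer misjudgments). The result is bounded by [j_l, j_s].
    """
    denom = j_skip_direct + j_inter16x16
    if denom == 0.0:
        return j_s
    w = j_skip_direct / denom          # small when Skip/Direct is cheap
    return (1.0 - w) * j_s + w * j_l
```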
In addition, the proposed algorithm is also extended to the base view by using the J_L and J_S of the nearest forward-coded frame in the current view. Thus, it can reduce the computational complexity of all views. Besides, owing to the use of global correlation, the proposed algorithm does not need to store all MB RD costs of the coded frame; only the two classified average RD costs of the current frame are stored for the optimization of subsequent frames.
To verify the effectiveness of the proposed algorithm, simulations have been conducted under the same test conditions as in Sec. 2. Table 4 gives the decision accuracies and termination ratios of the proposed algorithm under different thresholds (EarlyTH, J_L, and J_S). The termination ratio is defined as the number of MBs that early select the large-size MB mode in Eq. (2) divided by the total number of MBs, and the decision accuracy is defined as the proportion of those early-terminated MBs whose optimal MB mode is indeed a large-size mode. For the threshold EarlyTH, the average decision accuracy is 96.5%, and the average termination ratio is 81.6%, which is close to the average proportion of the large-size MB mode. This indicates that EarlyTH can achieve large termination ratios with high decision accuracies. For the threshold J_L, the average decision accuracy is 98.3%, while the average termination ratio is only 56.2%, which is 31.8% less than the average proportion of the large-size MB mode. For the threshold J_S, all termination ratios are larger than the corresponding proportion of the large-size MB mode, which leads to an average decision accuracy of only 93.1%. These results demonstrate that EarlyTH, which is calculated using global-local RD costs, is more suitable for the proposed algorithm than thresholds using only global RD costs.
Decision accuracies and termination ratios of large-size MB mode under different thresholds.
|Sequences||Basis QPs||Decision accuracies of large-size MB mode under different thresholds (%)||Termination ratios of large-size MB mode under different thresholds (%)||Proportion of large-size MB mode|
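The two quantities reported in Table 4 can be computed as follows (a sketch; the counter names are ours):

```python
def decision_metrics(n_total_mbs, n_early_terminated, n_early_correct):
    """Decision accuracy and termination ratio as defined for Table 4.

    accuracy: early-terminated MBs whose optimal mode really is large-size,
              divided by all early-terminated MBs;
    ratio:    early-terminated MBs divided by all processed MBs.
    """
    accuracy = n_early_correct / n_early_terminated if n_early_terminated else 1.0
    ratio = n_early_terminated / n_total_mbs if n_total_mbs else 0.0
    return accuracy, ratio
```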
A flow diagram of the proposed algorithm is illustrated in Fig. 7, and the detailed steps of the proposed algorithm are given as follows:
1. If the current frame is an intraframe, perform the estimation of all intramodes and then go to step 7. Otherwise, obtain J_L and J_S from the coded frame for the current frame. If the current frame belongs to a nonbase view, the coded frame with the same time instant in the forward neighboring view is selected. If the current frame belongs to the base view, the nearest forward-coded frame in the current view is employed.
2. Perform the estimation of the large-size mode (Skip/Direct, Inter16×16, and Intra16×16) for the current MB, and obtain the Skip/Direct mode RD cost (J_Skip/Direct), the Inter16×16 mode RD cost (J_Inter16×16), and the Intra16×16 mode RD cost. Then select the optimal large-size mode with the minimal RD cost (J_min).
3. Calculate the threshold EarlyTH in Eq. (3) by utilizing J_L, J_S, J_Skip/Direct, and J_Inter16×16.
4. Perform the early large-size mode decision in Eq. (2). If J_min is less than EarlyTH, the optimal large-size mode is decided as the final MB mode; go to step 6.
5. Perform the estimation of the small-size mode (Inter16×8, Inter8×16, Inter8×8, Intra8×8, and Intra4×4). Then select the final optimal mode with the minimal RD cost among all estimated modes.
6. If all MBs of current frame have been processed, go to step 7, else go to step 2 for the mode decision of the next MB.
7. Generate the J_L and J_S of the current frame for the early large-size mode decision of subsequent encoding frames.
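The per-MB steps 2 to 5 can be sketched end to end as follows. The threshold interpolation is an assumed stand-in for Eq. (3) (same assumption as the earlier sketch), and `rd_cost` is a hypothetical callable standing in for the encoder's mode estimation:

```python
LARGE = ("Skip/Direct", "Inter16x16", "Intra16x16")
SMALL = ("Inter16x8", "Inter8x16", "Inter8x8", "Intra8x8", "Intra4x4")

def decide_mb_mode(rd_cost, j_l, j_s):
    """One pass of steps 2-5 for a single MB.

    rd_cost: callable mapping a mode name to its RD cost (hypothetical
    stand-in for mode estimation); j_l, j_s: global averages from step 1.
    Returns (final_mode, early_terminated).
    """
    large_costs = {m: rd_cost(m) for m in LARGE}            # step 2
    best_large = min(large_costs, key=large_costs.get)
    j_min = large_costs[best_large]

    # Step 3: global-local threshold, bounded by [j_l, j_s] (assumed form).
    denom = large_costs["Skip/Direct"] + large_costs["Inter16x16"]
    w = large_costs["Skip/Direct"] / denom if denom else 0.0
    early_th = (1.0 - w) * j_s + w * j_l

    if j_min < early_th:                                    # step 4 / Eq. (2)
        return best_large, True

    all_costs = dict(large_costs)                           # step 5
    all_costs.update({m: rd_cost(m) for m in SMALL})
    return min(all_costs, key=all_costs.get), False
```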
The proposed algorithm was implemented on the MVC reference software JMVC 8.0, and its detailed configuration is given in Table 5. JMVC adopts the I-B-P prediction structure for the view coding order and the hierarchical B picture prediction structure for the temporal coding order.3,25 Eight typical test sequences were chosen for the simulation: “Exit,” “Ballroom,” and “Race1”; “Breakdancers,” “Ballet,” “Doorflowers,” and “Lovebird1”; and “Dog.” These sequences are representative in camera setups, video scenes, and frame rates, and eight views (S0 to S7) were chosen for each sequence. Compared with the exhaustive mode decision in JMVC, the encoding time saving (ΔTime) was calculated to evaluate the computational performance, and the change of peak signal-to-noise ratio (ΔPSNR) and the change of bit rate (ΔBit) were calculated to evaluate the RD performance under each QP. To evaluate the overall RD performance under four basis QPs, the Bjontegaard delta peak signal-to-noise ratio (BDPSNR) and Bjontegaard delta bit rate (BDBR)26 were also employed. A negative BDPSNR or a positive BDBR indicates a coding loss and is not preferred.
Experimental configuration of JMVC.
|Basis QP||24, 28, 32, 36|
|Delta QP values||0, 3, 4, 5, 6, 7|
|Group of picture (GOP) length||12|
|Frames coded||5 GOPs×8 views|
|Search mode||4 (TZ search)|
|Bi prediction iteration||4|
|Iteration search range||8|
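The time-saving metric ΔTime reported in the following tables is the relative reduction with respect to the exhaustive JMVC mode decision; as a sketch (the function name is ours):

```python
def delta_time_percent(t_exhaustive, t_proposed):
    """Encoding time saving (%) relative to the exhaustive mode decision."""
    return (t_exhaustive - t_proposed) / t_exhaustive * 100.0
```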
Because interview views employ not only temporal prediction but also interview prediction for nonanchor frames, they consume more encoding time than temporal views. Table 6 gives the performance of the proposed algorithm for temporal views and interview views separately. For the temporal views of the eight sequences, the proposed algorithm reduces the encoding time by 49.1% to 71.3%, with a PSNR loss from 0.013 to 0.042 dB and a bit-rate change of at most 0.05%. The average time saving over the eight sequences is 61.5%, while the average PSNR loss is only 0.028 dB with a 0.15% decrement of the average bit rate. For the interview views of the eight sequences, the proposed algorithm reduces the encoding time by 65.4% to 75.7%, with a PSNR loss from 0.022 to 0.059 dB and a bit-rate change of at most 0.42%. The average time saving over the eight sequences is 71.4%, while the average PSNR loss is only 0.035 dB with a 0.08% decrement of the average bit rate. These results demonstrate that the proposed algorithm achieves significant time savings while maintaining an RD performance similar to that of the original encoder. In Table 6, it can be seen that the proposed algorithm achieves about 10% more average time saving for interview views than for temporal views. This is because interview views originally consume more encoding time than temporal views. From Table 6, it can also be found that the proposed algorithm achieves more encoding time saving for sequences with only a few motions, such as the “Ballet,” “Doorflowers,” “Lovebird1,” and “Dog” sequences. This is because the original proportions of the large-size MB mode in these sequences are larger than in the other sequences, and the proposed algorithm adaptively saves more encoding time. For the “Ballroom,” “Race1,” “Exit,” and “Breakdancers” sequences, which have a lot of motion, the proposed algorithm also reduces the encoding time significantly, by 65.4% to 73.5% for interview views.
Performance of proposed algorithm under different basis QPs.
|Sequences||ΔTime (%)/ΔPSNR (dB)/ΔBit (%)||Sequences||ΔTime (%)/ΔPSNR (dB)/ΔBit (%)|
|Basis QP||Temporal views||Interview views||Basis QP||Temporal views||Interview views|
|Average performance of temporal views||Average performance of interview views|
The BD-metric26 was adopted to assess the RD performance of the proposed algorithm and state-of-the-art mode decision algorithms. Table 7 gives the encoding time saving, BDPSNR, and BDBR under four basis QPs for the proposed algorithm and the state-of-the-art early Skip/Direct mode decision algorithm SDMET.18 For the temporal views of the eight sequences, SDMET achieves a 46.8% time saving on average with a 0.04-dB BDPSNR loss and a 1.24% BDBR increment, and the proposed algorithm achieves a 14.7% greater time saving with a slightly better RD performance than SDMET. For the interview views of the eight sequences, SDMET achieves a 54.7% time saving on average with a 0.04-dB BDPSNR loss and a 1.31% BDBR increment, and the proposed algorithm achieves a 16.7% greater time saving, also with a slightly better RD performance than SDMET. Table 7 also gives the results of the fast intermode decision algorithm based on textural segmentation and correlations (denoted FIMD), which was presented in our previous work.14 FIMD includes an early Skip/Direct mode decision method and two assistant methods (the selection of disparity estimation and the reduction of mode estimation), and it was implemented on interview views. In Table 7, it can be seen that FIMD achieves a 58.5% time saving on average with a 0.00-dB BDPSNR loss and a 0.13% BDBR increment. The RD performance of FIMD remains almost the same as that of the original JMVC, while the proposed algorithm achieves a 12.9% greater time saving with a similar RD performance.
Performance of the proposed algorithm and state-of-the-art algorithms under four basis QPs.
|Sequences||ΔTime (%)/BDPSNR (dB)/BDBR (%)|
|Temporal views||Interview views|
|Proposed||SDMET in Ref. 18||Proposed||SDMET in Ref. 18||FIMD in Ref. 14|
For a better observation, Fig. 8 shows a histogram comparison of the time savings among the proposed algorithm, SDMET, and FIMD for the interview views of different sequences. It can be observed that the proposed algorithm achieves larger time savings than both SDMET and FIMD over all sequences. For sequences with many fast motions and large disparities, such as “Race1” and “Breakdancers,” the time savings of SDMET and FIMD are less significant than those of the proposed algorithm. For the “Race1” sequence, the proposed algorithm achieves up to 30% more time saving than SDMET, and up to 20% more time saving than FIMD. For sequences with few motions and simple textures, such as “Ballet” and “Doorflowers,” the time savings of SDMET and FIMD are close to those of the proposed algorithm. These comparisons indicate that the proposed algorithm also has more stable time savings than the two state-of-the-art algorithms. This is because the original proportion of the large-size MB mode is larger than the proportion of the Skip/Direct MB mode for all sequences, especially for sequences with many fast motions. Thus, the performances of SDMET and FIMD are more sensitive to motion than the proposed algorithm. Additionally, SDMET and FIMD do not consider the optimization of anchor frames, which also affects their computational performance.
Moreover, Fig. 9 illustrates the time-saving curves of the proposed algorithm, SDMET, and FIMD for interview views under different QPs. It can be seen that the proposed algorithm achieves more stable and larger time savings than SDMET over different QPs. For the “Race1” sequence, which has fast global motions, the proposed algorithm obtains about 30% more time saving than SDMET under all QPs. For the “Dog” sequence, which has slow motions and small disparities, the time saving of SDMET is close to that of the proposed algorithm under higher QPs, while it is smaller under lower QPs. Due to the complex textures and picture noise in the static background of the “Dog” sequence, the distribution of the Skip/Direct MB mode is dispersive under lower QPs, and SDMET has smaller termination ratios in order to ensure small RD degradation. As the QP increases, more MBs select Skip/Direct as the optimal MB mode, and SDMET achieves much greater time savings than under lower QPs. Owing to the use of the global RD costs obtained from neighboring views, the proposed algorithm takes both the QP and the video content into consideration, and its performance is not sensitive to them. Compared with FIMD in our previous work,14 the proposed algorithm achieves about 20% more time saving under all QPs for the “Race1” sequence and about 15% more time saving under all QPs for the “Dog” sequence. This is because FIMD employs stricter decision conditions to maintain the same RD performance as the original encoder, which leads to smaller time savings. Besides, the computational performance of FIMD depends largely on the proportion of the Skip/Direct MB mode, which is originally smaller than the proportion of the large-size mode. Therefore, the proposed algorithm can obtain greater time savings.
Although the proposed algorithm can reduce about 71% of the encoding time for interview views, the computational complexity of interview views is still very large due to the disparity estimation for interview prediction. To further reduce the complexity, the proposed algorithm can be combined with state-of-the-art fast search algorithms; here, it was integrated with the fast disparity estimation algorithm (denoted FDE) of our previous work21 for interview views. Table 8 gives the performance of the integrated scheme. It can be seen that the integrated scheme achieves an 86.3% average time saving, which is about 15% more than the proposed algorithm alone. The average coding efficiency loss over all sequences is a 0.04-dB PSNR loss and a 0.23% bit rate decrement (a 0.03-dB BDPSNR loss and a 0.94% BDBR increment), which is at the same level as the average performance of the proposed algorithm in Table 7. The view-adaptive motion estimation and disparity estimation (VAMEDE),13 which includes mode size decision, fast motion estimation, and selective disparity estimation, was also implemented on JMVC for interview views; it achieves a 79.6% average time saving with a 0.04-dB PSNR loss and a 1.79% bit rate increment (a 0.10-dB BDPSNR loss and a 3.16% BDBR increment). The average performance of VAMEDE in Table 8 is consistent with that reported in Ref. 13, and the reason for the slight coding efficiency degradation is as follows. VAMEDE was developed on the JMVM platform, the MVC reference software that preceded JMVC. JMVM adopts the motion skip mode to greatly improve the overall coding performance. However, the motion skip mode is excluded from JMVC, so VAMEDE causes a relatively larger coding efficiency loss on JMVC than on JMVM. Compared with VAMEDE, our integrated scheme achieves about 6% more time saving on average with the same level of PSNR loss and a smaller bit rate increment.
For the "Lovebird1" and "Doorflowers" sequences, which contain only a few smooth motions, our integrated scheme achieves about 3% more time saving than VAMEDE with a similar coding efficiency loss. For the "Race1" and "Ballroom" sequences, which contain many complex motions, our integrated scheme achieves about 10% more time saving than VAMEDE with a smaller coding efficiency loss.
Performance of the proposed integrated scheme and VAMEDE under four basis QPs for interview views.
|Sequences|Proposed algorithm + FDE in Ref. 21|||||VAMEDE in Ref. 13|||||
||ΔTime (%)|ΔPSNR (dB)|ΔBit (%)|BDPSNR (dB)|BDBR (%)|ΔTime (%)|ΔPSNR (dB)|ΔBit (%)|BDPSNR (dB)|BDBR (%)|
To reduce the computational complexity of MVC, this work has presented a fast mode decision algorithm that focuses on the early decision of large-size modes. Based on the global correlation of RD costs between views and the local correlation of RD costs among candidate modes, the average RD costs of the large-size and small-size MB modes in the neighboring view are combined with the large-size mode RD costs of the current MB to select a large-size mode early as the optimal MB mode. Experimental results show that, compared with the exhaustive mode decision, the proposed algorithm saves encoding time significantly with negligible loss of RD performance, and it also outperforms state-of-the-art algorithms, especially for test sequences with fast motions and large disparities. Moreover, the proposed algorithm was integrated with the FDE of our previous work, and the integrated scheme also outperforms the state-of-the-art algorithm VAMEDE.
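The early large-size mode decision summarized above can be sketched as follows. This is a simplified illustration only: it assumes a single decision threshold placed halfway between the neighboring view's average large-size and small-size RD costs, and all function names, mode labels, and cost values are hypothetical; the paper's actual decision conditions and threshold form may differ.

```python
def early_large_size_decision(j_large_cur: float,
                              j_large_nb: float,
                              j_small_nb: float) -> bool:
    """Return True if small-size modes can be skipped, i.e., a large-size
    mode is selected early as the optimal MB mode.

    j_large_cur -- best large-size mode RD cost of the current MB
    j_large_nb  -- average large-size mode RD cost in the neighboring view
    j_small_nb  -- average small-size mode RD cost in the neighboring view
    """
    # Global threshold derived from the neighboring view (assumed form):
    # halfway between the average large-size and small-size RD costs.
    threshold = 0.5 * (j_large_nb + j_small_nb)
    return j_large_cur <= threshold

def mode_decision(j_modes: dict, j_large_nb: float, j_small_nb: float) -> str:
    """Pick the optimal MB mode; small-size modes are evaluated only when
    the early decision fails."""
    large = {m: j for m, j in j_modes.items()
             if m in ("Skip", "16x16", "16x8", "8x16")}
    best_large = min(large, key=large.get)
    if early_large_size_decision(large[best_large], j_large_nb, j_small_nb):
        return best_large  # small-size modes skipped -> time saving
    return min(j_modes, key=j_modes.get)  # fall back to exhaustive check
```

With hypothetical RD costs, a current MB whose best large-size cost lies below the neighboring view's threshold terminates early with that mode, while one above it falls through to the full candidate set.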
This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grants LQ12F01008 and Y1110532 and the Natural Science Foundation of China under Grant 61303139. The authors would like to thank journal reviewers for their insightful comments and valuable suggestions.
ITU-T and ISO/IEC JTC 1, "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 AVC) (2010).
A. Vetro, T. Wiegand, and G. J. Sullivan, "Overview of the stereo and multiview video coding extensions of the H.264/AVC standard," Proc. IEEE 99(4), 626–642 (2011). http://dx.doi.org/10.1109/JPROC.2010.2098830
P. Merkle et al., "Efficient prediction structures for multiview video coding," IEEE Trans. Circuits Syst. Video Technol. 17(11), 1461–1473 (2007). http://dx.doi.org/10.1109/TCSVT.2007.903665
D. Wu et al., "Fast intermode decision in H.264/AVC video coding," IEEE Trans. Circuits Syst. Video Technol. 15(7), 953–958 (2005). http://dx.doi.org/10.1109/TCSVT.2005.848304
S. Hu et al., "Fast inter-mode decision based on rate-distortion cost characteristics," in Pacific-Rim Conf. on Multimedia, pp. 145–155, Springer, Shanghai (2010).
Y. Sung and J. Wang, "Fast mode decision for H.264/AVC based on rate-distortion clustering," IEEE Trans. Multimedia 14(3), 693–702 (2012). http://dx.doi.org/10.1109/TMM.2012.2186793
L. Ding et al., "Content-aware prediction algorithm with inter-view mode decision for multiview video coding," IEEE Trans. Multimedia 10(8), 1553–1564 (2008). http://dx.doi.org/10.1109/TMM.2008.2007314
B. Zatt et al., "A multi-level dynamic complexity reduction scheme for multiview video coding," in IEEE Int. Conf. on Image Process., pp. 749–752, IEEE, Brussels (2011).
T. Zhao et al., "Multiview coding mode decision with hybrid optimal stopping model," IEEE Trans. Image Process. 22(4), 1598–1609 (2013). http://dx.doi.org/10.1109/TIP.2012.2235451
H. Zeng, K. Ma, and C. Cai, "Fast mode decision for multiview video coding using mode correlation," IEEE Trans. Circuits Syst. Video Technol. 21(11), 1659–1666 (2011). http://dx.doi.org/10.1109/TCSVT.2011.2133350
L. Shen et al., "Selective disparity estimation and variable size motion estimation based on motion homogeneity for multi-view coding," IEEE Trans. Broadcast. 55(4), 761–766 (2009). http://dx.doi.org/10.1109/TBC.2009.2030453
L. Shen et al., "View-adaptive motion estimation and disparity estimation for low complexity multiview video coding," IEEE Trans. Circuits Syst. Video Technol. 20(6), 925–930 (2010). http://dx.doi.org/10.1109/TCSVT.2010.2045910
W. Zhu et al., "Fast inter mode decision based on textural segmentation and correlations for multiview video coding," IEEE Trans. Consumer Electron. 56(3), 1696–1704 (2010). http://dx.doi.org/10.1109/TCE.2010.5606315
H. Zeng, K. Ma, and C. Cai, "Mode-correlation-based early termination mode decision for multi-view video coding," in IEEE Int. Conf. on Image Process., pp. 3405–3408, IEEE, Hong Kong (2010).
B. Zatt et al., "An adaptive early skip mode decision scheme for multiview video coding," in Picture Coding Symp., pp. 42–45, IEEE, Nagoya (2010).
L. Shen et al., "Early SKIP mode decision for MVC using inter-view correlation," Signal Process.: Image Commun. 25(2), 88–93 (2010). http://dx.doi.org/10.1016/j.image.2009.11.003
Y. Zhang et al., "Statistical early termination model for fast mode decision and reference frame selection in multiview video coding," IEEE Trans. Broadcast. 58(1), 10–23 (2012). http://dx.doi.org/10.1109/TBC.2011.2174282
S. Khattak et al., "Fast encoding techniques for multiview video coding," Signal Process.: Image Commun. 28(6), 569–580 (2013). http://dx.doi.org/10.1016/j.image.2012.12.010
L. Shen et al., "Macroblock-level adaptive search range algorithm for motion estimation in multiview video coding," J. Electron. Imaging 18(3), 033003 (2009). http://dx.doi.org/10.1117/1.3167850
W. Zhu et al., "Fast disparity estimation using spatio-temporal correlation of disparity field for multiview video coding," IEEE Trans. Consumer Electron. 56(2), 957–964 (2010). http://dx.doi.org/10.1109/TCE.2010.5506026
Y. Zhang et al., "Efficient multi-reference frame selection algorithm for hierarchical B pictures in multiview video coding," IEEE Trans. Broadcast. 57(1), 15–23 (2011). http://dx.doi.org/10.1109/TBC.2010.2082670
Z. Deng et al., "Iterative search strategy with selective bi-directional prediction for low complexity multiview video coding," J. Vis. Commun. Image Represent. 23, 522–534 (2012). http://dx.doi.org/10.1016/j.jvcir.2012.01.016
T. Wiegand et al., "Rate-constrained coder control and comparison of video coding standards," IEEE Trans. Circuits Syst. Video Technol. 13(7), 688–703 (2003). http://dx.doi.org/10.1109/TCSVT.2003.815168
A. Vetro et al., "Joint Multiview Video Model (JMVM) 8.0," ISO/IEC JTC1/SC29/WG11 and ITU-T Q6/SG16, Doc. JVT-AA207, Geneva (2008).
G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU-T Q6/SG16, Doc. VCEG-M33, Austin (2001).
Wei Zhu received his BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2004 and 2010, respectively. He is currently a lecturer at Zhejiang University of Technology. His major research fields are video coding, video analysis, and parallel processing.
Yayu Zheng received his BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2002 and 2008, respectively. He is currently an associate professor at Zhejiang University of Technology. His major research fields are networking multimedia systems and video coding.
Peng Chen received his BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2003 and 2009, respectively. He is currently an associate professor at Zhejiang University of Technology. His major research fields are signal processing and system design for multimedia.