Global-local correlation-based early large-size mode decision for multiview video coding

Abstract. Multiview video coding (MVC) is a recent extension of H.264/AVC, and it consumes a large amount of encoding time selecting the optimal macroblock (MB) mode among candidate modes of different sizes. Compared with the small-size mode (Inter16 × 8, Inter8 × 16, Inter8 × 8, Intra8 × 8, and Intra4 × 4), the large-size mode (Skip/Direct, Inter16 × 16, and Intra16 × 16) accounts for most of the MB mode proportion with much less computational complexity. Thus, if the large-size mode can be decided early as the optimal MB mode, the complexity of mode decision can be reduced effectively. In this work, an early large-size mode decision algorithm is proposed based on the global correlation of rate-distortion (RD) costs between neighboring views and the local correlation of RD costs among candidate modes. The average RD costs of large-size and small-size MB modes in the neighboring view are employed as a global reference for the early decision threshold, and the RD costs of the estimated modes are used to calculate a local adjustment to the threshold. Experimental results demonstrate that the proposed algorithm significantly reduces the overall encoding time while maintaining an RD performance similar to that of the original MVC encoder.


Introduction
Multiview video is captured from a set of viewpoints, and it is useful in many multimedia applications, such as three-dimensional (3-D) television, free viewpoint television, and glass-free portable 3-D displays. Multiview video coding (MVC) was developed for the storage and transmission of very large multiview video data, and it has been standardized as the extension of H.264/AVC. 1 Figure 1 shows an illustration of the basic prediction structure in the MVC reference software JMVC, where the group of pictures (GOP) length is eight for each view. For backward compatibility, all views are classified into a base view and some nonbase views. Only the base view (S0 in Fig. 1) is backward compatible with H.264/AVC; nonbase views (S1 and S2 in Fig. 1) are encoded with new coding tools to provide complete multiview video bitstreams. 2 Every GOP in each view includes one anchor frame and some nonanchor frames. Frames at time instants T0 and T8 in Fig. 1 are anchor frames, and the other frames are nonanchor frames. For the random access of video bitstreams, anchor frames are not allowed to adopt temporal prediction. To improve compression performance, nonanchor frames adopt both temporal prediction and interview prediction. 3 In Fig. 1, solid arrows represent temporal prediction and dotted arrows represent interview prediction. According to the prediction directions of their nonanchor frames, all views can also be classified into temporal views and interview views. Nonanchor frames employ both temporal prediction and interview prediction in interview views, while they employ only temporal prediction in temporal views. In Fig. 1, views S0 and S2 belong to the temporal views, and view S1 belongs to the interview views.
Like H.264/AVC, MVC also needs to select the optimal macroblock (MB) mode among multiple candidate modes for each frame, and it has five intermodes (Skip/Direct, Inter16 × 16, Inter16 × 8, Inter8 × 16, and Inter8 × 8) and three intramodes (Intra16 × 16, Intra8 × 8, and Intra4 × 4). Except for Skip/Direct, the intermodes require a large amount of time for motion/disparity estimation in reference frames. In the MVC reference software JMVC, an exhaustive mode decision algorithm is used to select the optimal MB mode, and it checks all candidate modes in sequence. Besides, due to the use of interview prediction, the computational complexity of mode decision for a single view is greater than that of H.264/AVC. Thus, the computational complexity of MVC is very high, and it has hindered practical use in real-time and mobile applications.
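As a reference point, the exhaustive search described above can be sketched as follows. This is an illustrative simplification, not the JMVC implementation; rd_cost stands in for the encoder's full RD evaluation of one mode, including any motion/disparity estimation it entails.

```python
# Illustrative sketch of exhaustive mode decision; rd_cost(mode) is a
# placeholder for the encoder's RD evaluation, not a JMVC API.
CANDIDATE_MODES = [
    "Skip/Direct", "Inter16x16", "Inter16x8", "Inter8x16", "Inter8x8",
    "Intra16x16", "Intra8x8", "Intra4x4",
]

def exhaustive_mode_decision(rd_cost):
    """Evaluate every candidate mode and keep the one with minimal RD cost."""
    best_mode, best_cost = None, float("inf")
    for mode in CANDIDATE_MODES:
        cost = rd_cost(mode)  # expensive: estimation is run per candidate mode
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

Because all eight candidates are always evaluated, the cost of the small-size modes is paid even when a large-size mode wins, which is what the early termination proposed in this article avoids.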
Many state-of-the-art fast mode decision algorithms have been developed for H.264/AVC. Wu et al. 4 presented a fast intermode decision that uses the spatial homogeneity and the temporal stationarity characteristics of video objects. Hu et al. 5 proposed a fast intermode decision algorithm based on rate-distortion (RD) cost characteristics, which includes an early skip mode decision and a three-stage mode prediction. Sung and Wang 6 introduced a multiphase classification scheme that builds a mode decision tree according to the clustering of RD costs. These algorithms can be employed to speed up the mode decision of MVC, and their ideas can also inform the design of fast mode decision algorithms for MVC. However, the complexity of MVC is still very high, and it can be further reduced by exploiting characteristics specific to MVC. To address this issue, various mode decision algorithms have been studied, including Refs. 7 to 19. For the reduction of the whole complexity, Shen et al. 13 combined candidate mode selection with fast motion estimation and prediction direction selection; meanwhile, Khattak et al. 19 provided a complete framework that includes not only mode decision but also reference frame selection and fast motion/disparity estimation. In these algorithms, the correlation of coding information between neighboring views and the RD costs of MB modes in MVC are usually employed to arrive at a faster mode decision. As the Skip/Direct mode occupies the largest proportion of MB modes with negligible computational complexity, several algorithms focus only on early Skip/Direct mode decision. Zeng et al. 15 introduced an early decision algorithm using the RD costs of nearby MBs. Zatt et al. 16 proposed an early decision algorithm based on mode correlation in the 3-D neighborhood. Shen et al. 17 presented an early decision algorithm based on the analysis of the prediction mode distribution of the corresponding MBs in the neighboring view. Zhang et al. 18 proposed an efficient statistical Skip/Direct mode termination model named SDMET that adjusts the RD cost threshold adaptively by using statistical information of coded MBs. As mentioned above, early Skip/Direct mode decision algorithms can reduce the complexity effectively with high RD performance, and they can also be combined with fast algorithms for motion/disparity estimation and multireference frame selection 19-23 to further reduce the complexity. However, since they mainly utilize the local correlation of coding information between neighboring views, the global correlation of coding information has not been exploited. Moreover, they do not perform very well for video scenes with fast motions and large disparities, and they have not considered complexity reduction for anchor frames.
In this article, an early large-size mode decision algorithm based on the global-local correlation of RD costs is proposed to reduce the computational complexity of MVC. According to mode sizes and RD properties, all candidate modes in mode decision are classified into two types: the large-size mode and the small-size mode. The large-size mode includes Skip/Direct, Inter16 × 16, and Intra16 × 16, and the small-size mode includes Inter16 × 8, Inter8 × 16, Inter8 × 8, Intra8 × 8, and Intra4 × 4, where Inter8 × 8 further contains four submodes (sub8 × 8, sub8 × 4, sub4 × 8, and sub4 × 4). Compared with the small-size mode, the large-size mode accounts for a much larger proportion of MB modes with much less computational complexity. Because it includes the Inter16 × 16 and Intra16 × 16 modes, the large-size mode also occupies a larger proportion than the Skip/Direct mode alone, especially for frames with fast motions and large disparities. Therefore, the proposed algorithm focuses on the early decision of the large-size mode instead of the early decision of the Skip/Direct mode. The global correlation of RD costs between views is adopted to calculate the basis portion of the early decision threshold. For each MB, the local correlation of RD costs among different-size modes is employed to calculate the adjustable portion of the threshold, and the minimal RD cost of the large-size mode is compared with the threshold to terminate mode decision early. In addition, the proposed algorithm also considers the optimization of frames in the base view by using the nearest forward-coded frame in the temporal direction. Therefore, the proposed algorithm can be applied to all interframes of all views to effectively reduce the whole computational complexity of MVC.
The rest of this article is organized as follows: In Sec. 2, the characteristics of large-size and small-size modes are analyzed. Then, an early large-size mode decision based on the global-local correlation of RD costs is proposed in Sec. 3. Experimental results and conclusions are given in Secs. 4 and 5, respectively.

Motivation and Analysis
As an extension of H.264/AVC, MVC also employs the RD optimization technique 24 to select the optimal MB mode with the minimum RD cost among the candidate inter- and intramodes, and the RD cost of each mode includes a rate portion and a distortion portion. Both intermodes and intramodes include modes of different sizes, which have distinct rate and distortion characteristics. By providing more precise motion/disparity estimation, the small-size intermodes (Inter16 × 8, Inter8 × 16, and Inter8 × 8) can obtain smaller distortion portions of RD costs than the large-size intermodes (Skip/Direct and Inter16 × 16). However, because the small-size intermodes need more bits to encode the motion vector of each partition block, they have larger rate portions of RD costs than the large-size intermodes. For MBs with complex motion/disparity, the small-size intermodes can obtain a much smaller distortion portion of the RD cost than the large-size intermodes, and they are more likely to be selected as the optimal MB mode. For MBs with smooth motion/disparity, the large-size intermodes usually provide the same level of distortion as the small-size modes, and they have smaller rate portions than the small-size intermodes because only 16 × 16 partition information is encoded. Similar to the intermodes, the small-size intramodes (Intra8 × 8 and Intra4 × 4) also obtain smaller distortion portions and larger rate portions than the large-size intramode (Intra16 × 16). For MBs with complex textures, the small-size intramodes often have smaller RD costs than the large-size intramode. For MBs with smooth textures, the large-size intramode usually obtains a smaller RD cost than the small-size intramodes. In this article, according to the RD properties of the different-size modes, all intermodes and intramodes are classified into the large-size mode (Skip/Direct, Inter16 × 16, and Intra16 × 16) and the small-size mode (Inter16 × 8, Inter8 × 16, Inter8 × 8, Intra8 × 8, and Intra4 × 4).
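The two-way classification above can be written down directly. The set contents come from the text; the helper function name is only for illustration.

```python
# Large-size vs. small-size mode classification, as defined in the text.
LARGE_SIZE_MODES = {"Skip/Direct", "Inter16x16", "Intra16x16"}
SMALL_SIZE_MODES = {"Inter16x8", "Inter8x16", "Inter8x8", "Intra8x8", "Intra4x4"}
# Inter8x8 further contains the submodes sub8x8, sub8x4, sub4x8, and sub4x4.

def is_large_size(mode: str) -> bool:
    """Return True when a mode belongs to the large-size category."""
    return mode in LARGE_SIZE_MODES
```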
In order to verify the above theoretical analysis, JMVC 8.0 was used to investigate the statistical characteristics of large-size and small-size modes, and three typical test sequences ("Exit," "Ballroom," and "Race1") with eight views (S0 to S7) were chosen. The GOP length was set to 12, and five GOPs were selected for each view. For each frame, the default coding conditions were used: a maximum of two reference frames is available for the forward and backward reference lists, respectively; only one interview reference frame is allowed for each reference list; and the search method "TZ search" is enabled with a search range of 96. Table 1 gives the MB proportions of large-size and small-size modes, where the experimental basis quantization parameter (QP) is 32. It can be seen that the large-size mode occupies a proportion of 85% to 97%, which is much larger than that of the small-size mode. The mode proportions inside the large-size MB mode under different basis QPs for the "Race1" sequence are further shown in Fig. 2. It can be observed that Skip/Direct occupies the largest proportion, while Inter16 × 16 occupies the second largest proportion and Intra16 × 16 occupies a considerable proportion. Furthermore, the encoding time proportions of large-size and small-size modes are given in Table 2. The large-size mode consumes 7% to 15% of the encoding time for the three sequences over eight views, whereas the small-size mode consumes 83% to 86% of the encoding time, which is much greater than that of the large-size mode. From Tables 1 and 2, it can be found that the large-size mode occupies a majority of the MB mode proportion but consumes only a small part of the encoding time. Thus, if the large-size mode can be identified early as the optimal MB mode, the estimation of the small-size mode can be skipped. In addition, the results also indicate that the statistical proportions of MB modes are similar among the various views of each sequence. This is because the videos of the different views originate from the same scene, and the effect of occlusions between views is small in most cases. Therefore, the statistical information of neighboring views can be adopted for the early decision of the large-size mode.
Further studies on the RD cost characteristics of the large-size and small-size modes were performed under the same experimental conditions as Table 1. The RD cost distributions of large-size and small-size MB modes in a single frame are shown in Fig. 3. It can be seen that most large-size MB mode RD costs are less than the average RD cost of the small-size MB mode in Fig. 3(a), and most small-size MB mode RD costs are larger than the average RD cost of the large-size MB mode in Fig. 3(b). These results indicate that most large-size MB modes have relatively low RD costs and most small-size MB modes have relatively high RD costs, which is consistent with the preceding theoretical analysis, and it can be concluded that the average RD costs of large-size and small-size MB modes can be used to calculate the thresholds for the early decision of the large-size mode. Moreover, the RD cost gap between the large-size mode and the small-size mode was further analyzed. Figure 4 shows the large-size mode RD costs sorted in ascending order (red curve), together with the corresponding small-size mode RD costs at the same sorting indexes. It can be seen that most large-size mode RD costs are smaller than their corresponding small-size mode RD costs, which is consistent with the statistical results in Table 1 and Fig. 3. If the large-size mode RD cost is relatively low, the large-size mode is likely to be selected as the optimal MB mode. The RD cost gap between the large-size mode and the small-size mode increases with the large-size mode RD cost. Therefore, if the large-size mode RD cost is small, the misjudgment cost of early termination is also small.
The characteristics of large-size and small-size modes in MVC are summarized as follows: (1) the large-size mode occupies most of the MB mode proportion while consuming only a small part of the encoding time, whereas the small-size mode occupies a smaller proportion of MB modes but consumes most of the encoding time; and (2) most large-size MB modes have relatively low RD costs, while most small-size MB modes have relatively high RD costs.

Proposed Early Large-Size Mode Decision Algorithm
According to the analysis in the previous section, the proposed algorithm focuses on the early decision of the large-size mode: the large-size mode is estimated first, and then its RD cost is compared with a global-local adaptive threshold to terminate mode decision early. The detailed processes are introduced as follows.
Based on the motivation in Fig. 3, the average RD cost of large-size MB modes (AvgJ_Large) and the average RD cost of small-size MB modes (AvgJ_Small) of a coded frame are employed. For nonbase views, the coded frame in the forward neighboring view with the same time instant as the current frame is selected to calculate AvgJ_Large as follows:

AvgJ_Large = (1 / N_Large) × Σ_i J_Large(i),  (1)

where J_Large(i) is the large-size mode RD cost of MB i whose optimal MB mode is the large-size mode, and N_Large is the number of such MBs. AvgJ_Small can be calculated in the same way. Due to the global correlation between views, AvgJ_Large and AvgJ_Small are adaptive to the video scene and coding features. Therefore, the AvgJ_Large and AvgJ_Small of neighboring views can be employed as a global measure in the early decision algorithm for the current view. The curves of AvgJ_Large and AvgJ_Small under different basis QPs for the "Ballroom" sequence are shown in Fig. 5. It can be seen that both AvgJ_Large and AvgJ_Small increase with the basis QP. Because the calculation of RD cost depends on the QP, AvgJ_Large and AvgJ_Small adapt to changes of QP, and they are suitable to be employed as the global reference. For each MB, the estimation of Skip/Direct, Inter16 × 16, and Intra16 × 16 is performed, and the minimum RD cost of these estimated modes is selected as the large-size mode RD cost J_Large. Then, the early decision of the large-size mode is determined by Eq. (2):

J_Large(n) < EarlyTH,  (2)

where n is the index of the current MB and EarlyTH is the early decision threshold. If J_Large is smaller than the termination threshold EarlyTH, the optimal large-size mode with the minimum RD cost is selected early as the final MB mode, and the mode decision process is terminated. The selection of EarlyTH directly affects the performance of the proposed algorithm, and its calculation is analyzed as follows. First, the AvgJ_Large of the coded frame can be adopted as a measurement for J_Large in the current frame, and it is multiplied by a parameter α to form EarlyTH. To study the relation between the parameter α and the variation of RD performance caused by the misjudgment of the large-size mode, the increments of the total RD costs for MBs that select the small-size mode as the optimal mode are shown in Fig. 6, where the experimental basis QP is 32. It can be seen that the increments of RD costs are very small when α is 1, while the increments grow markedly when α is larger than 1. Thus, AvgJ_Large (α equal to 1) can be employed as the basis portion of the threshold. Although AvgJ_Large is adaptive to the video content of the current frame, the gap between AvgJ_Large and AvgJ_Small also changes with the basis QP in Fig. 5. If the threshold is calculated directly by multiplying AvgJ_Large by a fixed parameter, the performance of early termination will not be stable. The threshold may be larger than AvgJ_Small under small basis QPs, and then the RD performance of the early decision will drop dramatically due to low decision accuracy. The threshold may also be much less than AvgJ_Small under large basis QPs, which will lead to low time savings because of the small early termination ratio. So, a threshold calculation method that solves the above problem is needed in the proposed algorithm.
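Equations (1) and (2) can be sketched as follows. The frame representation, a list of (is_large, rd_cost) pairs, is an assumed encoding for illustration, not a JMVC data structure.

```python
def average_rd_costs(coded_frame_mbs):
    """Eq. (1): average RD costs of the large-size and small-size optimal MBs.

    coded_frame_mbs: list of (is_large_mode, rd_cost) pairs, one per MB of
    the coded reference frame (an assumed representation).
    """
    large = [c for is_large, c in coded_frame_mbs if is_large]
    small = [c for is_large, c in coded_frame_mbs if not is_large]
    avg_j_large = sum(large) / len(large) if large else 0.0
    # Per the text, frames with no small-size MBs fall back to 5 * AvgJ_Large.
    avg_j_small = sum(small) / len(small) if small else 5.0 * avg_j_large
    return avg_j_large, avg_j_small

def early_terminate(j_large, early_th):
    """Eq. (2): decide the large-size mode early when J_Large < EarlyTH."""
    return j_large < early_th
```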
Fig. 5 Average MB mode RD costs on view S1 under different basis QPs for "Ballroom" sequence.
Fig. 6 The relationship between the parameter α and the increment of total RD costs for MBs with small-size MB mode.
Second, the threshold is further adjusted by utilizing the local features of the MB to improve the computational performance while maintaining high RD performance. After the estimation of the large-size mode, its RD costs reflect the local features of the current MB, and they can be used to predict the probability of a small-size MB mode. To reduce the misjudgment of the large-size MB mode, MBs that select the small-size mode as the final MB mode are employed to study the relation between the large-size mode and the small-size mode. For these MBs, the Skip/Direct mode RD cost (J_Skip/Direct) and the Inter16 × 16 mode RD cost (J_Inter16×16) are compared, and the ratios of the smaller RD cost are given in Table 3, where the experimental conditions are the same as in Sec. 2. It can be seen that J_Inter16×16 is the smaller cost in the majority of cases. Therefore, if J_Inter16×16 is less than J_Skip/Direct, the current MB is more likely to select the small-size mode, and the threshold should be decreased to maintain RD performance. Conversely, if J_Skip/Direct is less than J_Inter16×16, the threshold can be increased to achieve more time savings.
Based on the above analysis, the EarlyTH in Eq. (2) is finally calculated for each MB as in Eq. (3), where AvgJ_Large is the basis portion of EarlyTH and a proportional divisor calculated from J_Inter16×16 and J_Skip/Direct is used to adjust the gap between AvgJ_Small and AvgJ_Large according to the local features of the current MB. In Eq. (3), if J_Inter16×16 is less than J_Skip/Direct, EarlyTH is closer to AvgJ_Large, and if J_Skip/Direct is less than J_Inter16×16, EarlyTH is closer to AvgJ_Small. Thus, the range of EarlyTH is from AvgJ_Large to AvgJ_Small, and its value depends on the local MB coding features. For different video sequences, a few frames may have no small-size MB mode under high basis QPs. In this particular case, AvgJ_Small is replaced with 5 × AvgJ_Large, a setting found through extensive experiments to give preferable performance.
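The exact form of Eq. (3) did not survive extraction, so the sketch below uses one assumed interpolation that reproduces the stated behavior: EarlyTH lies between AvgJ_Large and AvgJ_Small, moves toward AvgJ_Large when J_Inter16×16 is smaller than J_Skip/Direct, and toward AvgJ_Small otherwise. The divisor 1 + J_Skip/Direct / J_Inter16×16 is a hypothetical choice, not the paper's formula.

```python
def early_threshold(avg_j_large, avg_j_small, j_skip_direct, j_inter16x16):
    """Assumed sketch of Eq. (3): divide the gap between AvgJ_Large and
    AvgJ_Small by a proportional divisor of the two large-size RD costs.
    The divisor form below is hypothetical."""
    gap = avg_j_small - avg_j_large
    divisor = 1.0 + j_skip_direct / j_inter16x16
    # divisor > 2 when Inter16x16 is cheaper -> EarlyTH nearer AvgJ_Large;
    # divisor < 2 when Skip/Direct is cheaper -> EarlyTH nearer AvgJ_Small.
    return avg_j_large + gap / divisor
```

Any divisor with this monotone behavior keeps EarlyTH strictly inside (AvgJ_Large, AvgJ_Small), matching the range stated in the text.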
In addition, the proposed algorithm is also extended to the base view by using the AvgJ_Small and AvgJ_Large of the nearest forward-coded frame in the current view. Thus, it can reduce the computational complexity of all views. Besides, because of the use of global correlation, the proposed algorithm does not need to store all MB RD costs of the coded frame; only the two classified average RD costs of the current frame are stored for the optimization of the following frames.
To verify the effectiveness of the proposed algorithm, simulations were conducted under the same test conditions as in Sec. 2. Table 4 gives the decision accuracies and termination ratios for different thresholds, where the decision accuracy is N_Hit / N_Early and the termination ratio is N_Early / N_MB; here, N_MB is the number of MBs, N_Early is the number of early terminations in Eq. (2), and N_Hit is the number of MBs that select a large-size MB mode early in Eq. (2) and whose optimal MB modes are indeed large-size modes. For the threshold EarlyTH, the average decision accuracy is 96.5%, and the average termination ratio is 81.6%, which is close to the average proportion of the large-size MB mode. This indicates that EarlyTH can achieve large termination ratios with high decision accuracies. For the threshold AvgJ_Large, the average decision accuracy is 98.3%, while the average termination ratio is only 56.2%, which is 31.8% less than the average proportion of the large-size MB mode. For the threshold AvgJ_Small, all termination ratios are larger than the corresponding proportions of the large-size MB mode, which leads to an average decision accuracy of only 93.1%. These results demonstrate that EarlyTH, which is calculated using global-local RD costs, is more suitable for the proposed algorithm than thresholds using only global RD costs. A flow diagram of the proposed algorithm is illustrated in Fig. 7, and the detailed steps of the proposed algorithm are given as follows: 1. If the current frame is an intraframe, perform the estimation of all intramodes, then go to step 7. Otherwise, get AvgJ_Large and AvgJ_Small from the coded frame for the current frame. If the current frame belongs to the nonbase views, the coded frame with the same time instant in the forward neighboring view is selected. If the current frame belongs to the base view, the nearest forward-coded frame in the current view is employed.
2. Perform the estimation of the large-size mode (Skip/Direct, Inter16 × 16, and Intra16 × 16) for the current MB, and obtain the Skip/Direct mode RD cost (J_Skip/Direct), the Inter16 × 16 mode RD cost (J_Inter16×16), and the Intra16 × 16 mode RD cost. Then select the optimal large-size mode with the minimal RD cost (J_Large).
3. Calculate the early decision threshold EarlyTH using Eq. (3). 4. If J_Large is less than EarlyTH, the optimal large-size mode is decided as the final MB mode; go to step 6. 5. Otherwise, perform the estimation of the small-size mode (Inter16 × 8, Inter8 × 16, Inter8 × 8, Intra8 × 8, and Intra4 × 4). Then select the final optimal mode with the minimal RD cost among all estimated modes. 6. If all MBs of the current frame have been processed, go to step 7; else go to step 2 for the mode decision of the next MB. 7. Generate the AvgJ_Large and AvgJ_Small of the current frame for the early large-size mode decision of subsequent frames.
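The per-MB portion of the flow above can be condensed into one routine. This is a sketch: rd_cost is a placeholder for the encoder's RD evaluation, and the Eq. (3) divisor form used here is an assumption, since the paper's exact formula was lost in extraction.

```python
LARGE = ("Skip/Direct", "Inter16x16", "Intra16x16")
SMALL = ("Inter16x8", "Inter8x16", "Inter8x8", "Intra8x8", "Intra4x4")

def decide_mb_mode(rd_cost, avg_j_large, avg_j_small):
    """Per-MB early large-size mode decision (flow steps 2 to 5, simplified).

    rd_cost(mode) is a placeholder for the encoder's RD evaluation; the
    threshold below uses an assumed form of Eq. (3)."""
    large_costs = {m: rd_cost(m) for m in LARGE}                 # step 2
    best_large = min(large_costs, key=large_costs.get)
    j_large = large_costs[best_large]

    # Step 3: EarlyTH between AvgJ_Large and AvgJ_Small (assumed divisor form).
    divisor = 1.0 + large_costs["Skip/Direct"] / large_costs["Inter16x16"]
    early_th = avg_j_large + (avg_j_small - avg_j_large) / divisor

    if j_large < early_th:                                       # step 4, Eq. (2)
        return best_large, j_large                               # small-size modes skipped
    all_costs = dict(large_costs)                                # step 5
    all_costs.update({m: rd_cost(m) for m in SMALL})
    best = min(all_costs, key=all_costs.get)
    return best, all_costs[best]
```

When the early test fires, the five small-size modes (and the Inter8 × 8 submode search they imply) are never evaluated, which is where the reported time saving comes from.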

Experimental Results
The proposed algorithm was implemented in the MVC reference software JMVC 8.0, and the detailed configuration is given in Table 5. JMVC adopts the I-B-P prediction structure for the view coding order and the hierarchical B picture prediction structure for the temporal coding order. 3,25 Eight typical test sequences were chosen for the simulation, including three 640 × 480 sequences ("Exit," "Ballroom," and "Race1"), four 1024 × 768 sequences ("Breakdancers," "Ballet," "Doorflowers," and "Lovebird1"), and one 1280 × 960 sequence ("Dog"). These sequences are representative in camera setups, video scenes, and frame rates, and eight views (S0 to S7) were chosen for each sequence.
Compared with the exhaustive mode decision in JMVC, the encoding time saving (ΔTime) was calculated to evaluate the computational performance, and the change of peak signal-to-noise ratio (ΔPSNR) and the change of bit rate (ΔBits) were calculated to evaluate the RD performance under each QP. To evaluate the overall RD performance under four basis QPs, the Bjontegaard delta peak signal-to-noise ratio (BDPSNR) and the Bjontegaard delta bit rate (BDBR) 26 were also employed. A negative BDPSNR or a positive BDBR indicates a coding loss and is not preferred. Because interview views employ not only temporal prediction but also interview prediction for nonanchor frames, they consume more encoding time than temporal views. Table 6 gives the performance of the proposed algorithm for temporal views and interview views separately. For the temporal views of the eight sequences, the proposed algorithm reduces the encoding time by 49.1% to 71.3%, with a PSNR loss from 0.013 to 0.042 dB and a change of bit rate from −0.35% to 0.05%. The average time saving over the eight sequences is 61.5%, while the average PSNR loss is only 0.028 dB with a 0.15% decrement of the average bit rate. For the interview views of the eight sequences, the proposed algorithm reduces the encoding time by 65.4% to 75.7%, with a PSNR loss from 0.022 to 0.059 dB and a change of bit rate from −0.49% to 0.42%. The average time saving over the eight sequences is 71.4%, while the average PSNR loss is only 0.035 dB with a 0.08% decrement of the average bit rate. These results demonstrate that the proposed algorithm achieves significant time savings while maintaining an RD performance similar to that of the original encoder. In Table 6, it can be seen that the proposed algorithm achieves about 10% more average time savings for interview views than for temporal views. This is because interview views originally consume more encoding time than temporal views. From Table 6, it can also be found that the proposed algorithm
achieves more encoding time savings for sequences with little motion, such as the "Ballet," "Doorflowers," "Lovebird1," and "Dog" sequences. This is because the original proportions of the large-size MB mode in these sequences are larger than in the other sequences, and the proposed algorithm adaptively saves more encoding time. For the "Ballroom," "Race1," "Exit," and "Breakdancers" sequences, which contain considerable motion, the proposed algorithm still reduces the encoding time significantly, by 65.4% to 73.5% for interview views.
The BD metric 26 was adopted to assess the RD performance of the proposed algorithm and state-of-the-art mode decision algorithms. Table 7 gives the encoding time saving, BDPSNR, and BDBR under four basis QPs for the proposed algorithm and the state-of-the-art early Skip/Direct mode decision algorithm SDMET. 18 For the temporal views of the eight sequences, SDMET achieves a 46.8% time saving on average with a 0.04-dB BDPSNR loss and a 1.24% BDBR increment, and the proposed algorithm achieves a 14.7% greater time saving with a slightly better RD performance than SDMET. For the interview views of the eight sequences, SDMET achieves a 54.7% time saving on average with a 0.04-dB BDPSNR loss and a 1.31% BDBR increment, and the proposed algorithm achieves a 16.7% greater time saving, also with a slightly better RD performance than SDMET. Table 7 also gives the results of the fast intermode decision algorithm based on textural segmentation and correlations (denoted FIMD), which was presented in our previous work. 14 FIMD includes an early Skip/Direct mode decision method and two assistant methods (the selection of disparity estimation and the reduction of Inter8 × 8 mode estimation), and it was implemented on interview views.
In Table 7, it can be seen that FIMD achieves a 58.5% time saving on average with a 0.00-dB BDPSNR loss and a 0.13% BDBR increment. The RD performance of FIMD remains almost the same as that of the original JMVC, while the proposed algorithm obtains a 12.9% greater time saving with a similar RD performance. For a better comparison, Fig. 8 shows a histogram of the time savings of the proposed algorithm, SDMET, and FIMD for the interview views of different sequences. It can be observed that the proposed algorithm achieves larger time savings than both SDMET and FIMD over all sequences. For sequences with fast motions and large disparities, such as "Race1" and "Breakdancers," the time savings of SDMET and FIMD are less significant than those of the proposed algorithm. For the "Race1" sequence, the proposed algorithm achieves up to 30% more time saving than SDMET and up to 20% more time saving than FIMD. For sequences with little motion and simple textures, such as "Ballet" and "Doorflowers," the time savings of SDMET and FIMD are close to those of the proposed algorithm. The above comparisons indicate that the proposed algorithm delivers more stable time savings than these two state-of-the-art algorithms. This is because the original proportion of the large-size MB mode is larger than the proportion of the Skip/Direct MB mode for all sequences, especially for sequences with fast motions. Thus, the performances of SDMET and FIMD are more sensitive to motion than the proposed algorithm. Additionally, SDMET and FIMD do not consider the optimization of anchor frames, which also limits their computational performance.
Moreover, Fig. 9 illustrates the time saving curves of the proposed algorithm, SDMET, and FIMD for interview views under different QPs. It can be seen that the proposed algorithm achieves more stable and larger time savings than SDMET over different QPs. For the "Race1" sequence, which has fast global motions, the proposed algorithm obtains about 30% more time savings than SDMET under all QPs. For the "Dog" sequence, which has slow motions and small disparities, the time saving of SDMET is close to that of the proposed algorithm under higher QPs, while it is smaller under lower QPs. Due to the complex textures and picture noise in the static background of the "Dog" sequence, the distribution of the Skip/Direct MB mode is dispersed under lower QPs, and SDMET adopts lower decision ratios to ensure small RD degradation. As the QP increases, more MBs select Skip/Direct as the optimal MB mode, and SDMET achieves much larger time savings than under lower QPs. Owing to the use of global RD costs obtained from neighboring views, the proposed algorithm takes both the QP and the video content into consideration, and its performance is not sensitive to either. Compared with the FIMD of our previous work, 14 the proposed algorithm achieves about 20% more time saving under all QPs for the "Race1" sequence and about 15% more time saving under all QPs for the "Dog" sequence. This is because FIMD employs stricter decision conditions to maintain the same RD performance as the original encoder, which leads to smaller time savings. Besides, the computational performance of FIMD depends largely on the proportion of the Skip/Direct MB mode, which is originally smaller than the proportion of the large-size mode. Therefore, the proposed algorithm can obtain more time savings.
Although the proposed algorithm reduces about 71% of the encoding time for interview views, the computational complexity of interview views remains very large due to the disparity estimation required for interview prediction. To further reduce this complexity, the proposed algorithm can be combined with state-of-the-art fast search algorithms; here it was integrated with the fast disparity estimation algorithm (denoted FDE) of our previous work21 for interview views. Table 8 gives the performance of the integrated scheme. It can be seen that the integrated scheme achieves 86.3% average time saving, about 15% more than the proposed algorithm alone. The average coding efficiency loss over all sequences is a 0.04-dB PSNR loss and a 0.23% bit rate decrement (0.03-dB BDPSNR loss and 0.94% BDBR increment), which is the same as the average performance of the proposed algorithm in Table 7. The view-adaptive motion estimation and disparity estimation (VAMEDE),13 which includes mode size decision, fast motion estimation, and selective disparity estimation, was also implemented on JMVC for interview views; it achieves 79.6% average time saving with a 0.04-dB PSNR loss and a 1.79% bit rate increment (0.10-dB BDPSNR loss and 3.16% BDBR increment). The average performance of VAMEDE in Table 8 is consistent with that reported in Ref. 13, and the reason for the slight coding efficiency degradation is as follows. VAMEDE was developed on the JMVM platform, the MVC reference software preceding JMVC. JMVM adopts the motion skip mode, which greatly improves the overall coding performance. However, the motion skip mode is excluded from JMVC, so VAMEDE causes a relatively larger coding efficiency loss on JMVC than on JMVM. Compared with VAMEDE, our integrated scheme achieves about 6% more time saving on average with the same level of PSNR loss and smaller bit rate increments. For the "Lovebird1" and "Doorflowers" sequences, which contain only a few smooth motions, our integrated scheme achieves about 3% more time saving than VAMEDE with similar coding efficiency loss. For the "Race1" and "Ballroom" sequences, which contain many complex motions, our integrated scheme achieves about 10% more time saving than VAMEDE with less coding efficiency loss.
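The BDPSNR and BDBR figures quoted above are Bjøntegaard delta metrics, which summarize the average gap between two rate-distortion curves. As a minimal sketch (using the common four-point, cubic-fit formulation; the exact rate points and fitting details used for the tables are not restated here), BD-PSNR can be computed as follows:

```python
import numpy as np

def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta PSNR: average PSNR gap (in dB) between two
    RD curves, each given as rates (kbps) and PSNR values (dB),
    typically one point per basis QP."""
    lr_a = np.log(rate_anchor)
    lr_t = np.log(rate_test)
    # Fit third-order polynomials PSNR = f(log rate) to each curve.
    p_a = np.polyfit(lr_a, psnr_anchor, 3)
    p_t = np.polyfit(lr_t, psnr_test, 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(min(lr_a), min(lr_t))
    hi = min(max(lr_a), max(lr_t))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    # Average vertical distance between the two fitted curves.
    return (int_t - int_a) / (hi - lo)
```

A curve uniformly 0.5 dB above the anchor at the same rates yields a BD-PSNR of +0.5 dB, which is a quick sanity check for the implementation.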

Conclusion
To reduce the computational complexity of MVC, this work presents a fast mode decision algorithm that focuses on the early decision of the large-size mode. Based on the global correlation of RD costs between views and the local correlation of RD costs among candidate modes, the average RD costs of large-size and small-size MB modes in the neighboring view are combined with the large-size mode RD costs of the current MB to decide early whether a large-size mode is the optimal MB mode. Experimental results show that, compared with exhaustive mode decision, the proposed algorithm saves encoding time significantly with negligible loss of RD performance, and it also outperforms state-of-the-art algorithms, especially for test sequences with fast motions and large disparities. Moreover, the proposed algorithm was integrated with the FDE of our previous work, and the integrated scheme also achieves better performance than the state-of-the-art algorithm VAMEDE.
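The decision logic summarized above can be sketched as follows. This is only an illustrative outline: the threshold here blends the neighbor view's average large-size and small-size RD costs with a single hypothetical factor `alpha`, whereas the paper derives its threshold and the local adjustment (from RD costs of already-estimated modes) more precisely.

```python
def early_large_size_decision(j_large, avg_j_large_nb, avg_j_small_nb, alpha=0.5):
    """Illustrative early large-size mode decision.

    j_large        : best RD cost among the large-size modes
                     (Skip/Direct, Inter16x16, Intra16x16) of the current MB
    avg_j_large_nb : average large-size mode RD cost in the neighbor view
                     (the global reference)
    avg_j_small_nb : average small-size mode RD cost in the neighbor view
    alpha          : hypothetical blending factor standing in for the
                     paper's threshold derivation and local adjustment
    """
    # Global reference: a threshold placed between the neighbor view's
    # average large-size and small-size RD costs.
    threshold = avg_j_large_nb + alpha * (avg_j_small_nb - avg_j_large_nb)
    # Early termination: if the current MB's best large-size RD cost is
    # already below the threshold, skip evaluating the small-size modes
    # (Inter16x8, Inter8x16, Inter8x8, Intra8x8, Intra4x4).
    return j_large <= threshold


# Example: a large-size RD cost well below the neighbor-view reference
# allows the small-size modes to be skipped.
skip_small = early_large_size_decision(1000.0, 1200.0, 2000.0)
```

In the actual algorithm the threshold is additionally adjusted per MB using the local RD costs of modes already estimated, so the decision adapts to local content rather than relying on the global reference alone.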

Fig. 1
Fig. 1 Illustration of the basic prediction structure in JMVC. S0, S1, and S2 represent three views; T0 to T8 represent nine time instants. In each view, I, B, and P represent frame types, and their subscript values represent the corresponding temporal levels.

Fig. 3
Fig. 3 Distribution of MB mode rate-distortion (RD) costs on view S1, frame 6, for the "Ballroom" sequence. (a) RD costs of the large-size MB mode and the average RD cost of the small-size MB mode. (b) RD costs of the small-size MB mode and the average RD cost of the large-size MB mode.

Fig. 4
Fig. 4 Sorting of large-size mode RD costs and their corresponding small-size mode RD costs on view S1, frame 6, for the "Ballroom" sequence.

Fig. 7
Fig. 7 Flow diagram of the proposed early large-size mode decision algorithm for multiview video coding.

Fig. 8
Fig. 8 Encoding time saving ratios of SDMET, FIMD, and the proposed algorithm for interview views of the eight sequences.

Fig. 9
Fig. 9 Time-saving curves of the proposed algorithm, SDMET, and FIMD under different basis QPs for interview views. (a) Time-saving curves for the "Race1" sequence. (b) Time-saving curves for the "Dog" sequence.
Note: Experimental conditions are the same as in Table 1.
2. Although Skip/Direct occupies the largest proportion within the large-size MB mode, the proportions of Inter16×16 and Intra16×16 are also considerable.
3. Compared with small-size mode RD costs, most large-size mode RD costs are relatively low, and the gaps between them are small when RD costs are low.

Table 3
Proportions of the smaller rate-distortion (RD) cost between J_Skip/Direct and J_Inter16×16.

Table 4
Decision accuracies and termination ratios of large-size MB mode under different thresholds.

Table 5
Experimental configuration of JMVC.

Table 6
Performance of the proposed algorithm under different basis QPs.

Table 7
Performance of the proposed algorithm and state-of-the-art algorithms under four basis QPs.

Table 8
Performance of the proposed integrated scheme and VAMEDE under four basis QPs for interview views.