The proliferation of ubiquitous communication infrastructures has made video services increasingly popular on portable multimedia devices. Although recent technological advances enable real-time video playback on many such devices, high-resolution and/or high-quality video content, especially on mobile devices, still poses challenges for real-time playback because portable devices have limited battery capacity, processor speed, and memory.
The problem addressed in this paper differs slightly from the conventional usage scenario of video decoders: it asks what an H.264/AVC decoder should do when given a compressed bitstream that exceeds its level specification. The level in a video coding standard specifies the minimum resources with which a standard-conformant decoder shall be equipped. Encountering video content that exceeds a decoder's level specification is not rare in practice, owing to the recent ubiquity of communication networks and the coexistence of portable devices with widely varying computational capacity. Video content available on mobile networks is accessible to essentially every kind of device, and some content may carry a higher H.264/AVC level specification than that of the receiving decoder. In such cases, a conventional decoder currently just refuses to decode. It would be far friendlier to users if the decoder could instead show the decoded pictures at a slightly lower quality, doing the best its available resources allow. To do so, a video decoder must be able to flexibly decode a bitstream exceeding its level specification according to its available computing resources. This kind of simplified decoding capability is also essential when a decoder knows a priori that fully compliant decoding is unnecessary, e.g., when fast-forwarding video, making thumbnails, or skimming through a playback. In short, the issue is complexity scalable video decoding in accordance with the resources available to a decoder.
The aforementioned playback capability is, in a sense, already implementable if playback quality is of little concern. However, such a careless approach would be practically useless because of the significant objective and subjective quality degradation caused by the processing mismatch between the encoder and the decoder. Distortion introduced by a complexity-reduced decoding process propagates to subsequent pictures, and before long the quality degradation becomes unbearable. It is therefore very important to carefully design a complexity scalable decoding algorithm that manages optimal complexity control according to the resource availability of the device.
Note that various complexity scalable video decoding algorithms have already been developed.1–19 A major approach to complexity scalable decoding is to control the computational complexity of one or two decoding processes. Peng1 proposed a discrete cosine transform (DCT)-based complexity scalable video decoder which controlled the decoding complexity by pruning out some DCT data in order to skip the inverse discrete cosine transform (IDCT) process. Peng and Zhong2 proposed a selective B-residual decoding method based on computational resources and the energy level of B-residual blocks. Chen et al.3 realized complexity scalability by using both IDCT pruning and a simpler interpolation filter based on the frame type. Lei et al.4 proposed a complexity scalable algorithm for the audio and video coding standard in China (AVS) by using a loop filter along with a luminance interpolation filter scaling method; in this approach, the encoder sends complexity information about the loop filter and interpolation filter to the decoder for complexity control. Meanwhile, Ji et al.5 developed an energy scalable video decoding strategy for multimedia devices. Lee et al.6 also worked toward complexity scalability by controlling the complexity of motion compensation and of the deblocking filter. Mahjoub et al.7 proposed a complexity reduction method for the deblocking filter, and Lu et al.8 optimized the context adaptive variable length coding (CAVLC) lookup tables to reduce the decoding complexity. de Oliveira et al.9 optimized the inverse transform (IT) matrix for each frame, as a function of both content and quantization noise.
Several approaches designed complexity models of the decoder for complexity scalable video decoding.10–13 These include complexity models of H.264/AVC decoding parts such as motion compensation,10,11 entropy decoding,12 or the whole H.264/AVC decoder.13
As another approach to complexity scalable algorithms, Park et al.14 reduced the energy consumption of the MPEG-4 decoding process through a re-quantization process that reduced the amount of data to be processed. Nam et al.15 proposed a method based on spatially downsized decoding.
Some other approaches addressed the reduction of decoding complexity from the viewpoint of hardware architecture design.16–19 Chao et al.16 designed an optimized IT architecture to support multistandard video coding applications, and Wei et al.17 proposed parallel decoding algorithms for multicore processors. Tsai et al.18 designed a parallel level decoding method for CAVLC, and Sze and Chandrakasan19 proposed a parallel context adaptive binary arithmetic coding (CABAC) decoding method. However, such hardware-oriented approaches are ill-suited to flexible control because of their hardwired characteristics.
In this paper, an algorithmic complexity scalable video decoding scheme for the H.264/AVC decoder on portable multimedia devices is investigated. First, each decoding element of H.264/AVC was studied in terms of both its complexity-versus-degradation behavior and its complexity control parameters. By adjusting the complexity control parameters in a complexity-distortion (C-D) optimized way, the optimum complexity control levels giving the largest complexity reduction with minimal quality degradation were found.
The rest of this paper is structured as follows. Section 2 investigates the video decoding elements from the viewpoint of complexity control and develops a complexity reduction method. Section 3 presents the proposed complexity scalable decoding method of each control element and the optimal complexity control level. Section 4 presents experimental results, and Sec. 5 concludes with discussion and some directions for possible future work.
Complexity Control of Video Decoding Elements
The H.264/AVC decoder is composed of several decoding elements, as depicted in Fig. 1: a variable-length decoder (VLD), an inverse quantizer (IQ), an IT, motion compensation, intra prediction, reconstruction, and a deblocking filter. In previous research,6 the complexity of the H.264/AVC decoding elements in the main profile was evaluated; the profiling result6 is summarized in Table 1. It shows that the most complex decoding elements are motion compensation and variable-length decoding (CABAC decoding), and that the next most complex element is the deblocking filter. "Others" in Table 1 represents operating-system overheads such as file I/O for reading bitstreams from the file system.20 Based on this profiling result, motion compensation and deblocking filtering were chosen in this paper as the targets for complexity control.
Complexity profile result of H.264/AVC decoding elements (Ref. 6).
| Decoding elements | Complexity (%) |
| --- | --- |
| Variable-length decoding (VLD) | 25.19 |
| Inverse quantizer/inverse transform (IQ/IT) | 10.65 |
However, noting the lossless nature of the entropy decoder, simplification of CABAC is not considered, since even a small error introduced by compromised entropy decoding can result in fatal decoding failures. Since the complexity control range offered by the motion compensation and deblocking filter alone was not sufficient, a macroblock (MB) decoding skipping method was further developed to provide more flexibility in the complexity control.
Complexity Reduction of the Motion Compensation
In H.264/AVC motion compensation, predicted values at half-pel sample positions are generated by horizontal or vertical one-dimensional 6-tap finite impulse response (FIR) interpolation filtering, while quarter-pel samples are generated by averaging the nearest half-pel and integer-pel samples.21 Figure 2 depicts the quarter-pel sample positions for luma components. Since the interpolation process accounts for most of the complexity of motion compensation, its complexity for the luma component can be modeled by counting the filtering operations. Table 2 shows the interpolation filtering process for the quarter-pel sample positions in Fig. 2 and their complexity normalized with respect to the integer position G(0,0).6 Note that the computational complexity is not identical: it depends on the sample position. The quarter-pel positions that require both horizontal and vertical half-pel intermediates are the most complex, requiring seven 6-tap filterings and one 2-tap filtering, whereas the half-pel samples aligned with the integer-pel grid are the least complex, requiring just a single 6-tap filtering. This observation shows that the motion compensation complexity depends critically on the complexity of the interpolation filter. Therefore, to reduce the complexity, the 6-tap interpolation filter can be replaced by a 2-tap or 4-tap filter, depending on the subpel position, with the filter coefficients shown in Table 3. Note that the 4-tap filter coefficients are those used for inter-layer intra prediction in the scalable extension of H.264/AVC,22 and the 2-tap filter coefficients are obtained by simplifying the H.264/AVC interpolation filter using adjacent integer-pel samples.6
Complexity comparison of interpolation in H.264/AVC decoding.
| Sample position | Interpolation operation | Normalized complexity w.r.t. G(0,0) |
| --- | --- | --- |
| G(0,0) | Integer (pixel copy) | 1.00 |
Complexity reduction method for interpolation filtering.
| Position | 1-D filter coefficients |
| --- | --- |
Prediction values at chroma sample positions are generated by bilinear interpolation of the four neighboring integer samples, weighted according to the fractional sample offsets.
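The filtering arithmetic discussed above can be sketched as follows. The 6-tap coefficients (1, -5, 20, 20, -5, 1)/32 are those of the H.264/AVC half-pel filter; the 2-tap form is the simplified bilinear replacement described in the text; the function names and the `clip8` helper are our own.

```python
def clip8(x):
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, x))

def half_pel_6tap(p):
    """H.264/AVC half-pel interpolation: 6-tap FIR (1, -5, 20, 20, -5, 1)/32
    over six neighboring integer-pel samples, with rounding before the shift."""
    v = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]
    return clip8((v + 16) >> 5)

def half_pel_2tap(a, b):
    """Simplified 2-tap (bilinear) replacement applied at reduced
    complexity levels, averaging two adjacent integer-pel samples."""
    return (a + b + 1) >> 1
```

On a flat region the two filters agree exactly; the complexity saving comes from replacing six multiplies/adds per sample with a single rounded average.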
Complexity Reduction of the Deblocking Filter
Since H.264/AVC performs block-based transform and lossy quantization of integer DCT coefficients, blocking artifacts occur in the reconstructed picture. To eliminate them, a deblocking filter is applied in both the encoder and the decoder. The deblocking filter has two processes: a block boundary strength (Bs) decision and the actual pixel filtering using the determined Bs. At each block boundary, Bs is set to an integer between 0 and 4 according to rules23 based on whether the boundary is an MB boundary, whether the adjacent blocks use intra or inter prediction, whether they contain nonzero DCT coefficients, their motion vectors, their reference picture indices, etc. The actual filtering is then applied to each block boundary depending on the selected Bs: when Bs = 4, a special filter is applied under specific conditions,23 and when Bs is from 1 to 3, a normal filter is applied.23 The computational complexity of the deblocking filter is therefore modeled as the sum of the Bs decision cost and the filtering cost.
To reduce this complexity, a simplified Bs decision process was proposed based on observations from previous research.6,24 The proposed simplified Bs decision process keeps the same decision rules as the previous research6 but assigns different Bs values to those rules, as depicted in Fig. 3.
A filtering method slightly different from the previous research6 was also designed by adjusting the filter tap size and the number of filtered samples according to Bs. When Bs = 4, the proposed simplified filtering method is the same as that of H.264/AVC. Otherwise, only the nearest pixel on each side of the block boundary (i.e., p0 and q0) is filtered;23 that is, unlike H.264/AVC, the second pixels from the block boundary (p1 and q1) are not filtered.23 In that case, the filtering complexity is further reduced by applying a 2-tap FIR filter only to the p0 and q0 samples.
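The reduced-sample filtering can be sketched as below. The p0/q0/p1/q1 boundary-sample naming follows the H.264/AVC deblocking convention; the specific 2-tap weights here are an illustrative simplification of ours, not the paper's exact coefficients.

```python
def deblock_edge_simplified(p1, p0, q0, q1, bs):
    """Sketch of simplified edge filtering: only the nearest samples p0 and
    q0 on each side of the block boundary are modified, using a short
    weighted 2-tap filter. bs == 0 means no filtering; the full-strength
    Bs == 4 path (identical to the H.264/AVC strong filter) is omitted."""
    if bs == 0:
        return p0, q0
    # 2-tap smoothing across the boundary, nearest samples only;
    # p1 and q1 are left untouched, unlike the full H.264/AVC filter.
    new_p0 = (3 * p0 + q0 + 2) >> 2
    new_q0 = (p0 + 3 * q0 + 2) >> 2
    return new_p0, new_q0
```

Pulling p0 and q0 a quarter of the way toward each other softens the block edge while leaving the interior samples untouched, which is what removes most of the per-boundary arithmetic.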
Complexity Reduction by MB Decoding Skipping
For further complexity scalability, an MB is skipped from the whole decoding process after its VLD is executed. This method is similar to frame skipping for frame-rate control in transcoders.25–28 A reduction in temporal resolution due to frame skipping may cause noticeably perceivable motion jerkiness and, consequently, significant subjective quality loss. To prevent this, existing approaches selectively skip frames that satisfy certain conditions, for example, scene change,25 motion activity,26,27 or motion continuity.28 Similar skipping conditions were adopted for the proposed complexity scalable video decoder: the decoding of an MB in a B slice was skipped when the MB satisfied three conditions on its coding type, coded block pattern (CBP) value, and motion activity. If an MB in an inter-coded slice or picture is intra-coded, its correlation with the blocks in its reference picture(s) must be low; such an MB should not be skipped, since otherwise its pixel values cannot be faithfully reproduced. On the other hand, if an inter-coded MB has no nonzero coded coefficients (that is, CBP = 0), it is safe to assume that the MB is highly correlated with its reference block and can be estimated quite faithfully from it. Motion activity indicates whether the motion of the current MB is fast or slow; if an MB with fast motion is skipped from decoding, its estimated reconstruction is highly likely to exhibit noticeable motion jerkiness. Therefore, the skip decision also checks the motion activity of the current MB, which can be calculated from its motion vectors.29–31 Skipped MBs were reconstructed by motion compensation using the motion vectors and reference index of the current MB. For example, if the current MB was predicted from list 0, its reconstruction was motion-compensated from the reference slice in the list 0 memory.
If the current MB was bi-predicted, the reconstructed signal was formed by averaging the motion-compensated signals from the list 0 and list 1 memory.
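The three-condition skip test described above can be sketched as a single predicate. The `activity_threshold` parameter is a tunable value of our choosing; the paper derives the motion activity from the MB's motion vectors.

```python
def can_skip_mb_decoding(is_intra, cbp, motion_activity, activity_threshold):
    """Sketch of the skip-decoding decision for a B-slice macroblock."""
    if is_intra:
        return False          # intra MBs cannot be estimated from references
    if cbp != 0:
        return False          # nonzero residual: skipping would cause drift
    # skip only slow-motion MBs to avoid visible motion jerkiness
    return motion_activity < activity_threshold
```

Only MBs passing all three checks are reconstructed purely by motion compensation from the reference memory, with no residual decoding.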
Proposed Complexity Scalable Decoding Scheme
The complexity scalable video decoding scheme should satisfy the following optimality criterion: for a given amount of complexity reduction, the resulting quality degradation should be minimized.
Complexity Scalable Method for the Motion Compensation
To reduce the complexity of motion compensation, simplified interpolation filtering methods were developed as described in Sec. 2. The most effective complexity scalability for motion compensation is achieved by selectively applying these simplified interpolation filters according to a specified scalability level. The proposed four motion compensation complexity reduction (MCR) levels are shown in Table 4. At MCR level 0, motion compensation has maximum complexity, identical to conventional H.264/AVC motion compensation. At MCR level 1 or 2, the simplified methods are applied to B slices only; this gives minimal degradation of video quality, since degradation in a B slice does not propagate to other slices unless the stored B slice is used as a reference. At MCR level 3, simplified motion compensation is used for both P slices (4-tap filter) and B slices (2-tap filter); an encoder-decoder mismatch can then occur, but in return the decoder achieves a significant complexity reduction. For chroma components, the simplified chroma method of Sec. 2.1 is applied at the highest reduction level.
Motion compensation complexity reduction (MCR) levels.
| MCR level | Interpolation filter |
| --- | --- |
| 0 | 6-tap (no reduction) |
| 1 | 4-tap (B slice only) |
| 2 | 2-tap (B slice only) |
| 3 | 4-tap (P slice), 2-tap (B slice) |
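The level-to-filter mapping of Table 4 can be written as a small lookup; the function name and the string slice-type keys are our own (I slices use no motion compensation and are omitted).

```python
def interp_tap_count(mcr_level, slice_type):
    """Tap count of the luma interpolation filter for a given MCR level
    and slice type ("P" or "B"), following Table 4."""
    table = {
        0: {"P": 6, "B": 6},   # no reduction
        1: {"P": 6, "B": 4},   # 4-tap in B slices only
        2: {"P": 6, "B": 2},   # 2-tap in B slices only
        3: {"P": 4, "B": 2},   # simplified in both P and B slices
    }
    return table[mcr_level][slice_type]
```

Keeping P slices at 6 taps until level 3 is what confines the encoder-decoder mismatch to non-reference pictures at the lower levels.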
Complexity Scalable Method for the Deblocking Filter
To control the complexity of the deblocking filter, the simplified Bs decision and simplified filtering method are applied as shown in Table 5, according to six deblocking filter complexity reduction (DFR) levels. At DFR level 0, the conventional H.264/AVC deblocking filter is used without any complexity reduction. As the DFR level increases, the simplified deblocking methods of Sec. 2 are applied one by one. At DFR level 4, deblocking is applied only to MB boundaries, using the simplified Bs decision and filtering method of Sec. 2. To achieve the maximum complexity reduction, the deblocking filter is switched off for whole slices at DFR level 5.
Deblocking filter complexity reduction (DFR) levels.
| DFR level | Bs decision | Deblocking filtering |
| --- | --- | --- |
| 0 | H.264/AVC method | H.264/AVC method |
| 1 | H.264/AVC method | Simplified method |
| 2 | Simplified method | H.264/AVC method |
| 3 | Simplified method | Simplified method |
| 4 | Simplified method (MB boundary only) | Simplified method (MB boundary only) |
| 5 | Forced deblocking filter off | Forced deblocking filter off |
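Table 5 can likewise be expressed as a lookup; the string labels and the third "scope" field are our own shorthand for the table's annotations.

```python
def dfr_config(dfr_level):
    """(Bs decision, filtering, scope) for each DFR level, per Table 5."""
    std, simp = "standard", "simplified"
    table = {
        0: (std,  std,  "all boundaries"),
        1: (std,  simp, "all boundaries"),
        2: (simp, std,  "all boundaries"),
        3: (simp, simp, "all boundaries"),
        4: (simp, simp, "MB boundaries only"),
        5: ("off", "off", "none"),        # deblocking fully disabled
    }
    return table[dfr_level]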
Complexity Scalable Method for MB Decoding Skipping
MB decoding skipping has three levels, described in terms of the MB decoding reduction (MDR) level, as in Table 6. At MDR level 1, MBs satisfying the skip conditions (inter-coded MB, CBP = 0, low motion activity) are not decoded, that is, they are skipped from decoding. At MDR level 2, all inter-coded MBs are skipped from the decoding process, saving a tremendous amount of computation.
Macroblock decoding reduction (MDR) levels for MB decoding skipping.

| MDR level | Macroblock decoding |
| --- | --- |
| 0 | No skipping (full decoding) |
| 1 | Conditional skip of macroblock decoding |
| 2 | Forced skip of inter-coded macroblock decoding |
Proposed Complexity Scalable Decoding Scheme
The previous subsections discussed how to realize decoding complexity scalability individually for each key decoding element. This subsection addresses the optimal scalable control of the total video decoding by adjusting the three control parameters, the MCR, DFR, and MDR levels, together. A block diagram of the proposed complexity scalable video decoder is shown in Fig. 4.
Let us define a joint complexity reduction (JCR) level as a function of the three control levels.6,32 The performance at each level was compared using AST (%) and ΔPSNR (dB), which are defined in Eqs. (7) and (8), respectively.
The C-D curves are shown in Fig. 5; each curve represents the complexity reduction versus quality degradation obtained by controlling one level parameter independently (the MCR level in Table 4, the DFR level in Table 5, and the MDR level in Table 6, respectively). Controlling each parameter alone reduces the decoder complexity by up to 13.6% (MCR), 12.9% (DFR), and 25.9% (MDR); the corresponding quality degradations at the maximum reduction of each parameter are shown in Fig. 5. This verifies that all three parameters are suitable for complexity control of a decoder.
The joint C-D curve over all three control parameters together (MCR, DFR, and MDR levels) is shown in Fig. 6. The optimal complexity control points, which give the maximum complexity reduction for minimum distortion, are specified by the line in Fig. 6; a decoder that adjusts its control parameters along this line attains optimal decoding complexity scalability. Table 7 shows the control level of each parameter for each JCR level, with the corresponding relative complexity reduction and quality degradation shown in Fig. 6. As shown in Table 7, the expected maximum complexity reduction is up to 41% compared with no reduction (JCR level 0). Since the highest setting in Table 7, compared with the one just below it, reduced the complexity by less than 1% while incurring about 0.6 dB of additional quality degradation, the maximum complexity reduction level was set to 14 instead of 15. The decoder complexity is thus adjustable in up to 15 steps: at JCR level 0 the decoder performs no complexity reduction, i.e., it has the same complexity as a conventional H.264/AVC decoder, and as the JCR level increases, the control level of each parameter is adjusted according to Table 7.
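The selection of the optimal control points on the joint C-D plot amounts to a Pareto-frontier filter over measured (complexity reduction, distortion) pairs. The sketch below is our own formulation of that selection; the sample data in the test are illustrative, not the paper's measurements.

```python
def pareto_optimal(points):
    """Keep the C-D points that are not dominated: a point is discarded if
    some other point achieves at least as much complexity reduction with
    strictly less distortion, or strictly more reduction with no extra
    distortion. points: iterable of (reduction_pct, distortion_db)."""
    pts = list(points)
    kept = []
    for c, d in pts:
        dominated = any((c2 >= c and d2 < d) or (c2 > c and d2 <= d)
                        for c2, d2 in pts)
        if not dominated:
            kept.append((c, d))
    return sorted(kept)
```

Sorting the surviving points by complexity reduction yields the operating line along which the JCR levels are enumerated.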
Joint complexity reduction level.
To assess the performance of the proposed scheme, it was implemented on the JM18.0 H.264/AVC reference software. The test bitstreams were coded for the H.264/AVC main profile with an IBPBP group-of-pictures (GOP) structure (here I, B, and P denote intra, bi-predictive, and predictive pictures, respectively). The number of reference frames was 5, and each picture was coded as one slice. The quantization parameter (QP) was set to 22, 27, 32, and 37. The video sequences used for evaluation were Bigships, City_corr, Night, and Crew. Performance was measured by AST (%) in Eq. (7) and ΔPSNR (dB) in Eq. (8).
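Since Eqs. (7) and (8) are not reproduced here, the sketch below uses our assumed forms: AST as the relative saving in decoding time against the full-complexity decoder, and ΔPSNR as the usual PSNR difference for 8-bit video.

```python
import math

def ast_percent(time_full, time_scaled):
    """Average saved time (AST, %) relative to the full-complexity decoder
    (assumed form of Eq. (7))."""
    return 100.0 * (time_full - time_scaled) / time_full

def delta_psnr(mse_full, mse_scaled):
    """PSNR difference in dB for 8-bit samples, negative meaning quality
    loss (assumed form of Eq. (8))."""
    psnr = lambda mse: 10.0 * math.log10(255.0 ** 2 / mse)
    return psnr(mse_scaled) - psnr(mse_full)
```

With these conventions, a result reported as "AST 40%, ΔPSNR -0.5 dB" means the scaled decoder ran in 60% of the reference time at half a decibel of objective quality loss.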
Figure 7 shows how the subjective quality changes when each complexity control parameter (MCR, DFR, and MDR level) is adjusted individually. Figure 7(a) corresponds to the maximum complexity (JCR level 0, i.e., no reduction in any parameter), while Figs. 7(b)-7(d) correspond to setting one parameter to its maximum reduction value [Fig. 7(b): maximum MCR level; Fig. 7(c): maximum DFR level; Fig. 7(d): maximum MDR level]. While the maximum complexity reduction of deblocking filtering in Fig. 7(c) gives subjective quality similar to the no-reduction case of Fig. 7(a), picture sharpness is degraded when the MCR or MDR level is set to its minimum-complexity value. The sharpness degradation appears in all regions in Fig. 7(b) but is less apparent in the low-motion area in Fig. 7(d) (see the middle building in the picture). As depicted in Fig. 7, although each control parameter introduces some degradation at the minimum complexity level, the overall subjective quality remains acceptable.
Table 8 shows the experimental results of the proposed scheme, indicating that it attains complexity scalability. The proposed scheme reduces the decoding complexity by up to 44% (City_corr sequence, maximum JCR level) compared with the conventional H.264/AVC decoder. Table 9 lists the target complexity reduction together with the complexity control level chosen to achieve it. The reduced complexity reaches the target in most cases; however, it falls short for the Crew sequence. This is because the proposed scheme does not consider intra-coded MBs as candidates for complexity control, so the adjustable complexity range can be insufficient when there are many intra MBs. Figure 8 shows the many intra-coded MBs (boxed areas) occurring in an inter slice of the Crew sequence; this behavior is due to flashlights in the scene. Handling such cases is one direction for future extension of the proposed method.
Experimental results of the proposed scheme (ΔPSNR [dB] and AST (%) for each of the four test sequences).
Performance comparison of the proposed scheme with previous methods (Refs. 4, 6).
| Target (%) | Method | ΔPSNR [dB] | AST (%) | ΔPSNR [dB] | AST (%) | ΔPSNR [dB] | AST (%) | ΔPSNR [dB] | AST (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | Method 1 (Ref. 4) | −1.33 | 11.83 | −0.99 | 10.74 | −1.63 | 14.36 | −0.01 | 6.56 |
| | Method 2 (Ref. 6) | −0.40 | 8.73 | −0.70 | 12.89 | −0.31 | 9.37 | −0.06 | 5.91 |
| 20 | Method 1 (Ref. 4) | −4.94 | 18.91 | −7.54 | 16.27 | −9.78 | 20.17 | −2.22 | 15.48 |
| | Method 2 (Ref. 6) | −2.23 | 23.89 | −1.33 | 22.84 | −2.64 | 24.88 | −2.13 | 21.29 |
Table 9 shows the distortion and complexity reduction of the proposed method compared with previous research.4,6 For a fair comparison, the previous methods4,6 were also implemented on the JM18.0 H.264/AVC reference software; in Table 9, methods 1 and 2 denote the methods of Refs. 4 and 6, respectively. The complexity reduction achieved for each target and the distortion relative to the conventional H.264/AVC decoder were measured. Since method 2 reduces the complexity by only up to 25%, and method 1 exploits a picture-decoding skip to reach reductions of more than 30%, only target reductions of 10% and 20% were considered in this experiment. Compared with methods 1 and 2, the proposed method incurs less objective quality loss. Because those methods reduce the complexity of interpolation filtering regardless of picture type, including pictures used as references for following pictures, their errors propagate to subsequent pictures and their objective quality loss is much higher than that of the proposed method. The complexity control of the proposed method is also more accurate: its achieved reduction is closer to the target than that of the two previous methods.
Figure 9 shows a subjective quality comparison over the JCR levels reported in Table 8, using the 118th frame (the last decoded frame in one GOP) of the City_corr sequence coded with QP 22 (high quality). Even as the JCR level increases, the decoded pictures retain acceptable subjective quality, although the objective quality degradation grows.
To verify the proposed scheme on mobile devices, it was also implemented on a mobile device. Figure 10 shows the subjective quality at the maximum complexity (JCR level 0) and the minimum complexity (JCR level 14) on the device. Since the mobile device cannot display the original resolution of the video sequence, it shows a downsized picture after a downsizing process. As shown in Fig. 10, the subjective quality of the picture decoded at minimum complexity remains acceptable compared with the maximum-complexity result.
This paper presented a complexity scalable H.264/AVC decoding scheme for portable multimedia devices. The proposed method controls the motion compensation, deblocking filtering, and MB decoding skipping process by adjusting these three complexity control parameters. Its C-D performance was evaluated according to the controlling parameters, and the optimal complexity control levels for each parameter were sought. The proposed scheme can control the decoding complexity with variable complexity control levels without significant subjective quality loss. Since the current scheme can adjust the decoding complexity of inter MBs only, future work may extend it to include also the decoding complexity controlling capability of intra MBs.
This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-001-7578).
Hoyoung Lee received the BS degree in electronic and electrical engineering from Sungkyunkwan University, Suwon, Korea, in 2007. He is currently a PhD candidate in the Digital Media Laboratory at Sungkyunkwan University. His research interests include video compression, resource-aware video coding, and mobile multimedia frameworks.
Younghyeon Park received the BS degree in electronic and electrical engineering from Sungkyunkwan University, Suwon, Korea, in 2011. He is currently a PhD candidate in the Digital Media Laboratory at Sungkyunkwan University. His research interests include video compression and compressed sensing.
Byeungwoo Jeon received his BS degree in 1985 and an MS degree in 1987 from the Department of Electronics Engineering, Seoul National University, Seoul, Korea. He received his PhD degree in 1992 from the School of Electrical Engineering at Purdue University, Indiana, United States. From 1993 to 1997 he was in the Signal Processing Laboratory at Samsung Electronics in Korea, where he worked on video compression algorithms, designing digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. Since September 1997, he has been with the faculty of the School of Information and Communication Engineering, Sungkyunkwan University, Korea, where he is currently a professor. His research interests include multimedia signal processing, video compression, statistical pattern recognition, and remote sensing.