This research studies the performance improvement of the in-loop deblocking filter module of the H.264/AVC video coding standard in embedded systems. A novel prediction scheme is presented to reduce the complexity of the filter selection process and hence increase overall performance. We first examine the H.264/AVC deblocking filters by studying their correlation in terms of the filter type and pattern among a sequence of consecutive P frames and I frames. The experimental results show a high correlation of the filter skip rate and the filter pattern between different P frames and their leading I frame. Based on the correlation analysis, a binary history table predictor (the BHT predictor) and a complete history table predictor (the CHT predictor) are proposed to facilitate the deblocking filter selection process while maintaining good subjective and objective visual quality. We further present a hybrid filter prediction scheme that integrates both BHT and CHT to improve the prediction results.
The energy consumption of the H.264 video decoder on VLIW embedded processors is profiled using the Trimaran simulator. Based on this study, we observe that the branch operations in the quarter-pixel (QP) interpolation and the DCT slow down the issue rate of the VLIW processors. Several new instructions are then proposed to address this issue. These new instructions can be used to speed up the issue rate and reduce the total energy consumption. Finally, experimental results of the proposed instruction-level power-efficient strategies on the TI C6416 processor are reported and discussed.
The power consumption of battery-supplied DSP-embedded multimedia systems is analyzed in this research on a test platform, the TI C64x. We focus on the behavior of some frequently used compression/decompression functional modules. In particular, an MPEG-4 simple-profile decoder consisting of these modules is evaluated at the highest compiler optimization level so as to understand power allocation in embedded multimedia systems. Two DCT schemes are examined to determine which has the better power behavior. The integer DCT achieves a 47% power saving compared with an implementation of the floating-point DCT. Overall, our studies provide a better understanding of system-level power modeling and consumption estimation for embedded multimedia applications, and suggest some optimization methods.
Elliptic curve cryptography (ECC) is an excellent candidate for secure embedded multimedia applications due to its small key size and high security protection. The performance profiling of the ECC implementation, such as execution time and data cache stalls, on TriMedia TM1300 and Intel Pentium 4 is conducted in this research. Based on this study, we identify the main bottlenecks of the ECC implementation and propose some favorable micro-architecture features for this application. Moreover, several integer multiplication schemes are presented for the TM1300 processor for performance enhancement. In particular, the FIR-based multiplication is built with the special FIR instruction provided by TM1300. The performance improvement of the proposed schemes is reported and discussed. Overall, we aim at providing a good understanding of the system architecture of secure embedded multimedia applications and of hardware and software cryptography implementation, with ECC as an example.
Due to the rising complexity of modern embedded media applications (EMAs), instruction level parallelism (ILP) alone is not sufficient to meet their performance needs. Compilers must have the capability to exploit superword level parallelism (SLP), which can expose more of the concurrency in applications, minimize the latency created by memory access and hence produce more efficient code. The loop is a good candidate for SLP extraction because of its parallel structure between iterations. This work analyzes the memory access patterns found in EMAs and presents our method of loop unrolling to fully utilize these patterns to generate efficient Single Instruction Multiple Data (SIMD) instructions. Experimental results obtained on the TriMedia TM-1300 processor for the H.264 encoder show performance improvement by a factor ranging from 3 to 30 times with an average of 12 times.
We investigate the encoding speed improvement for H.264 with a special focus on fast intra-prediction mode selection in this work. It is possible to adopt the rate-distortion (RD) optimized mode in H.264 to maximize the coding gain at the cost of a very high computational complexity. To reduce the complexity associated with the intra-prediction mode selection, we propose a two-step fast algorithm. In the first step, we make a coarse-level decision to split all possible candidate modes into two groups: the group to be examined further and the group to be ignored. The sizes of these two groups are adaptively determined based on the block activities. Then, in the second step, we focus on the group of interest and consider an RD model for final decision-making. Experimental results demonstrate that the proposed scheme performs 5 to 7 times faster than the current H.264 encoder (JM5.0c) with little degradation in the coding gain.
A detailed study of the impact of memory bank conflict on the performance of EMAs is presented. Based on the study, novel schemes utilizing SIMD and array padding are described to solve the memory bank conflict problem. Since the parameter in array padding has a great impact on the overall behavior of the memory system, how to achieve optimal padding is an important research topic. Here, we analyze the padding effect and develop a probabilistic model to determine the optimal padding distance. Preliminary experimental results are given to verify the correctness of this model.
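The effect of array padding on bank conflicts can be illustrated with a minimal sketch. It assumes a simple interleaved memory in which the bank index is the word address modulo the number of banks; the function names, the bank count and the array dimensions are hypothetical choices for illustration and do not reproduce the probabilistic model of the abstract.

```python
from collections import Counter


def banks_touched(row_words, num_rows, column, num_banks=8, pad=0):
    """Bank index hit by each row when walking down one column.

    Assumes a row-major array and an interleaved memory where
    bank = word_address % num_banks (an illustrative model only).
    `pad` is the number of unused words appended to every row.
    """
    stride = row_words + pad
    return [(r * stride + column) % num_banks for r in range(num_rows)]


def max_conflicts(banks):
    """Largest number of accesses that land in the same bank."""
    return max(Counter(banks).values())


# Column walk over a 16-word-wide array with 8 banks.
unpadded = banks_touched(row_words=16, num_rows=8, column=0)
padded = banks_touched(row_words=16, num_rows=8, column=0, pad=1)
print(max_conflicts(unpadded), max_conflicts(padded))
```

When the row length is a multiple of the bank count, every access in a column walk falls into the same bank; under this model, padding each row by one word spreads the same accesses over all banks.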
In this work, we investigate the congestion control problem for layered video multicast in IP networks with active queue management (AQM) using a simple random early detection (RED) queue model. AQM support from networks improves the visual quality of video streaming but makes network adaptation more difficult for existing layered video multicast protocols that use the event-driven timer-based approach. We perform a simplified analysis of the response of the RED algorithm to burst traffic. The analysis shows that the primary problem lies in the weak correlation between the network feedback and the actual network congestion status when the RED queue is driven by burst traffic. Finally, a design guideline for the layered multicast protocol is proposed to overcome this problem.
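The weak coupling between RED feedback and a short burst can be seen in a minimal sketch of the textbook RED rule: an exponentially weighted moving average of the queue length and a linear marking ramp between two thresholds. The parameter values, the function names and the omission of the count-based correction are assumptions made for illustration; this is not the analysis carried out in the work.

```python
def red_update_avg(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_mark_probability(avg, min_th=5.0, max_th=15.0, max_p=0.1):
    """Basic RED marking/drop probability for a given average queue."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


# A burst: the instantaneous queue jumps to 50 packets for 10 arrivals.
avg = 0.0
for _ in range(10):
    avg = red_update_avg(avg, queue_len=50)

# The average is still far below min_th, so no feedback is generated
# even though the real queue is heavily loaded.
print(round(avg, 3), red_mark_probability(avg))
```

With these illustrative parameters, the averaged queue length lags well behind the instantaneous queue during the burst, which is one way to picture the mismatch between feedback and actual congestion described above.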
The distributed Multiplayer Online Game (MOG) system is complex since it involves technologies in computer graphics, multimedia, artificial intelligence, computer networking, embedded systems, etc. Due to the large scope of this problem, the design of MOG systems has not yet been widely addressed in the literature. In this paper, we review, analyze and evaluate the current MOG system architecture. Furthermore, we propose a clustered-server architecture to provide a scalable solution together with a region-oriented allocation strategy. Two key issues, i.e. interest management and synchronization, are discussed in depth. Some preliminary ideas to deal with the identified problems are described.
A new approach to estimating surface curvatures from 3D triangular mesh surfaces, based on the geometric interpretation of the Gaussian curvature, is proposed in this work. Unlike previous work, the proposed method does not use local surface fitting, partial derivative computation, or oriented normal vector recovery. Instead, the Gaussian curvature is estimated at a vertex as the area of its small neighborhood under the Gaussian map divided by the area of that neighborhood. The proposed approach can handle vertices with zero Gaussian curvature uniformly without localizing them in a separate process. The performance is further improved with the local Bezier curve approximation and subdivision. The effectiveness of the proposed approach for meshes with a large range of coarseness is demonstrated by experiments. The application of the proposed method to 3D surface segmentation and 3D mesh feature extraction is also discussed.
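The area-ratio estimate described above follows the classical geometric definition of the Gaussian curvature, which can be written as below. The symbols $G$ (the Gaussian map sending each surface point to its unit normal on the unit sphere), $N_p$ (a small neighborhood of the point $p$) and $N_v$ (the discrete neighborhood of a mesh vertex $v$) are notation assumed here for illustration, and the area of the image is understood as a signed area.

```latex
K(p) \;=\; \lim_{\operatorname{diam}(N_p) \to 0}
      \frac{\operatorname{Area}\bigl(G(N_p)\bigr)}{\operatorname{Area}(N_p)},
\qquad
K(v) \;\approx\; \frac{\operatorname{Area}\bigl(G(N_v)\bigr)}{\operatorname{Area}(N_v)} .
```

For a flat neighborhood all normals coincide, so the image under $G$ has zero area and the estimate is zero without any special-case handling.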
A layered video multicast framework for differentiated service (DS) networks, which provides various levels of QoS guarantee for heterogeneous users with improved performance in network congestion adaptation, is examined in this research. The proposed system consists of three key components: extended active queue management, hierarchical priority marking, and receiver-driven layered multicast with ECN (RLME). In particular, we introduce the RLME protocol, which effectively utilizes advanced features of active queues such as random early detection (RED) and explicit congestion notification (ECN) in DS networks. The RLME protocol quantitatively estimates the network congestion level via ECN and packet loss to improve its capability to adapt to network congestion, and utilizes the priority service of DS networks to minimize the effect of packet loss on the reconstructed video quality. The simulation shows that the proposed system successfully achieves a stable and controllable video QoS guarantee for heterogeneous video clients over DS networks.