This work proposes a novel video compression scheme, based on second-order-residual (SOR) coding, for
high-bit-rate video applications. We first study the limitations of today's high-performance video
coding standard, H.264/AVC, and show that it is not effective in the coding of small image features and variations
for high-bit-rate video content. For low- to medium-quality video streams, these small image features can be
removed by the quantization process. However, when the quantization step size becomes small in high-bit-rate
video, their presence degrades the rate-distortion coding performance significantly. To address this problem, we
propose a coding scheme that decomposes the residual signals into two layers: the first-order-residual (FOR) and
the second-order-residual (SOR). The FOR contains the low-frequency residuals, while the SOR contains the
high-frequency residuals. We adopt H.264/AVC for the FOR coding and propose two schemes, called SOR-freq
and SOR-bp, for the SOR coding. Experimental results show that the proposed FOR/SOR scheme
outperforms H.264/AVC by a significant margin (about 20% bit rate saving) in high-bit-rate video.
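
The two-layer idea can be illustrated with a toy sketch. It assumes a plain uniform scalar quantizer for both layers and a one-dimensional residual; the names `quantize` and `two_layer_code` are hypothetical, and the sketch models neither the H.264/AVC FOR coder, the low/high frequency split, nor the SOR-freq and SOR-bp schemes. It only shows that coding what the first layer leaves behind with a finer step recovers small variations that a coarse step discards.

```python
def quantize(values, step):
    """Uniform scalar quantizer (an assumption): snap each value to a multiple of step."""
    return [step * round(v / step) for v in values]


def two_layer_code(residual, coarse_step, fine_step):
    """Toy two-layer coding of a residual signal.

    first  : coarse first-layer reconstruction (stands in for the FOR layer)
    second : quantized second-order residual (stands in for the SOR layer)
    recon  : sum of both layers
    """
    first = quantize(residual, coarse_step)
    # The second-order residual is whatever the first layer failed to capture.
    second_order = [r - f for r, f in zip(residual, first)]
    second = quantize(second_order, fine_step)
    recon = [f + s for f, s in zip(first, second)]
    return first, second, recon


# Hypothetical prediction residual with small variations around larger values.
residual = [10, -3, 7, 1]
first, second, recon = two_layer_code(residual, coarse_step=8, fine_step=1)
```

In this toy setting the first layer alone flattens the small values to zero, while the sum of both layers reproduces the input to within half the fine step.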
This research studies how to improve the performance of the in-loop deblocking filter module of the H.264/AVC video coding standard in embedded systems. A novel prediction scheme is presented to reduce the complexity of the filter selection process and hence increase overall performance. We first examine the H.264/AVC deblocking filters by studying their correlation, in terms of filter type and pattern, among a sequence of consecutive P frames and I frames. The experimental results show a high correlation of the filter skip rate and the filter pattern between different P frames and their leading I frame. Based on the correlation analysis, a binary history table predictor (the BHT predictor) and a complete history table predictor (the CHT predictor) are proposed to facilitate the deblocking filter selection process while maintaining good subjective and objective visual quality. We also present a hybrid filter prediction scheme that integrates both BHT and CHT to further improve prediction results.
Due to the rising complexity of modern embedded media applications (EMAs), instruction-level parallelism (ILP) alone is no longer sufficient to meet their performance demands. Compilers must be able to exploit superword-level parallelism (SLP), which can expose more of the concurrency present in applications, reduce the latency of memory accesses, and hence produce more efficient code. Loops are good candidates for SLP extraction because of their parallel structure across iterations. This work analyzes the memory access patterns found in EMAs and presents a loop unrolling method that fully exploits these patterns to generate efficient Single Instruction Multiple Data (SIMD) instructions. Experiments on the TriMedia TM-1300 processor with the H.264 encoder show performance improvements by factors ranging from 3 to 30, with an average of 12.
A detailed study of the impact of memory bank conflicts on the performance of EMAs is presented. Based on this study, novel schemes utilizing SIMD and array padding are described to solve the memory bank conflict problem. Since the padding parameter has a great impact on the overall behavior of the memory system, determining the optimal padding is an important research topic. Here, we analyze the padding effect and develop a probabilistic model to determine the optimal padding distance. Preliminary experimental results are given to verify the correctness of this model.
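
The effect of array padding on bank conflicts can be seen in a small sketch. It assumes a simple word-interleaved memory in which word address `a` maps to bank `a mod num_banks`, and a row-major two-dimensional array; the function names and the sizes used are hypothetical and are not taken from the study. It does not reproduce the probabilistic model, only the basic mechanism that padding changes the access stride and therefore the set of banks touched.

```python
def bank_of(address, num_banks):
    """Assumed mapping: consecutive words are interleaved across banks."""
    return address % num_banks


def column_banks(rows, row_width, pad, num_banks, col=0):
    """Banks touched when walking down one column of a row-major array.

    Each row occupies row_width + pad words, so successive column
    elements are row_width + pad words apart.
    """
    stride = row_width + pad
    return [bank_of(r * stride + col, num_banks) for r in range(rows)]


def distinct_banks(rows, row_width, pad, num_banks):
    """Number of different banks a column walk spreads over."""
    return len(set(column_banks(rows, row_width, pad, num_banks)))


# Hypothetical configuration: 8 banks, rows of 64 words, 8 rows accessed.
unpadded = distinct_banks(rows=8, row_width=64, pad=0, num_banks=8)
padded = distinct_banks(rows=8, row_width=64, pad=1, num_banks=8)
```

Under these assumptions the unpadded column walk hits a single bank on every access, because the row width is a multiple of the bank count, while one word of padding per row spreads the same accesses over all eight banks.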