In movies and TV shows, it is common for several scenes to repeat alternately. Such videos are characterized by long-term temporal correlation, which can be exploited to improve video coding efficiency. However, in applications supporting random access (RA), a video is typically divided into a number of RA segments (RASs) by RA points (RAPs), and different RASs are coded independently. As a result, the long-term temporal correlation among RASs with similar scenes cannot be used. We present a scene-library-based video coding scheme for the coding of videos with repeated scenes. First, a compact scene library is built by clustering similar scenes and extracting representative frames from the video being encoded. Then, the video is coded using a layered scene-library-based coding structure, in which the library frames serve as long-term reference frames. The scene library is not cleared by RAPs, so the long-term temporal correlation between RASs from similar scenes can be exploited. Furthermore, the RAP frames are coded as interframes that reference only library frames, which improves coding efficiency while maintaining the RA property. Experimental results show that the coding scheme can achieve significant coding gain over state-of-the-art methods.
3D video systems based on Depth-Image-Based Rendering rely on high-quality depth data. Errors distributed randomly
in depth map sequences induce annoying temporal noise, such as flickering and object shifting. Prior studies on video
quality assessment focused mainly on spatial quality of the tested sequence and often ignored its temporal performance.
In synthesized sequences, a large number of tiny geometric distortions and illumination differences are temporally
constant and perceptually invisible. Dynamic noise impairs the subjective quality of the sequences more severely than
static spatial noise. Temporal quality plays a dominant role in the overall quality assessment of synthesized sequences
with temporal instability problems. We propose a simple full-reference metric, Peak Signal to Perceptible Temporal Noise
Ratio, to evaluate quality of synthesized sequences by measuring the perceptible temporal noise in them.
Variable-length coding (VLC) is widely used in video coding to improve compression efficiency. However, because it suffers from loss of synchronization, a VLC bit stream is much more sensitive to random errors than a fixed-length coding (FLC) bit stream. Error-resilient entropy coding (EREC) is an effective tool for combating random errors in a VLC bit stream. Due to its intrinsic property of error propagation, when EREC is applied to a video bit stream, the blocks placed later become much more likely to be lost. We propose a simple method to further improve the error robustness of a video bit stream by interleaving the transform coefficients of blocks so that low-frequency information is always placed ahead of high-frequency information. Thus, low-frequency information of greater significance is less likely to be lost. Experimental results demonstrate the superiority of the proposed method. In addition, block interleaving can also be used with ease in a data-partitioned video bit stream.
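The coefficient interleaving idea can be illustrated with a minimal sketch. It assumes that every block's coefficients are already in low-to-high frequency (for example zigzag) order and that all blocks have the same length; the function names are hypothetical, and the way the paper combines interleaving with EREC slots is not given in the abstract and is not modeled here.

```python
def interleave_coefficients(blocks):
    """Emit coefficient 0 of every block, then coefficient 1 of every block, and so on.

    Assumption: each block is a list ordered from low to high frequency.
    """
    if not blocks:
        return []
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same number of coefficients")
    return [block[k] for k in range(length) for block in blocks]


def deinterleave_coefficients(stream, num_blocks):
    """Invert interleave_coefficients for a known number of blocks."""
    if num_blocks <= 0 or len(stream) % num_blocks != 0:
        raise ValueError("stream length must be a positive multiple of num_blocks")
    return [list(stream[b::num_blocks]) for b in range(num_blocks)]


blocks = [[10, 1, 0], [20, 2, 0]]
stream = interleave_coefficients(blocks)
print(stream)
print(deinterleave_coefficients(stream, len(blocks)) == blocks)
```

With this ordering, a loss that truncates the tail of the stream removes high-frequency coefficients from all blocks rather than every coefficient of the last blocks.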
This paper describes fixed-point design methodologies and several resulting implementations of the Inverse
Discrete Cosine Transform (IDCT) contributed by the authors to MPEG's work on defining the new 8x8 fixed
point IDCT standard - ISO/IEC 23002-2. The algorithm currently specified in the Final Committee Draft (FCD)
of this standard is also described herein.
This paper presents a straightforward multiplier-less approximation of the forward and inverse Discrete Cosine Transform
(DCT) with low complexity and high accuracy. The implementation, design methodology, complexity and performance
tradeoffs are discussed. In particular, the proposed IDCT implementations, despite their simplicity, comply with and
can reach far beyond the MPEG IDCT accuracy specification ISO/IEC 23002-1, and also reduce drift favorably compared
to other existing IDCT implementations.
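As a generic illustration of what "multiplier-less" means, the sketch below replaces one constant multiplication with shifts and additions. The constant 181/256 (an approximation of cos(π/4)), the rounding offset, and the function name are assumptions chosen for illustration; they are not the factorization or constants used in the paper.

```python
def mul_cos_pi_4(x):
    """Approximate x * cos(pi/4) as (x * 181) / 256 using only shifts and adds.

    181 = 128 + 32 + 16 + 4 + 1, so x * 181 needs four additions and four shifts.
    """
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x  # x * 181
    return (acc + 128) >> 8  # add half of 256 to round, then divide by 256


for value in (0, 256, 1000, -1000):
    print(value, mul_cos_pi_4(value))
```

Using fewer terms in the shift-and-add expansion lowers complexity at the cost of accuracy, which is the kind of tradeoff the abstract refers to.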
This paper analyzes the drift phenomenon that occurs between video encoders and decoders that employ different
implementations of the Inverse Discrete Cosine Transform (IDCT). Our methodology utilizes MPEG-2, MPEG-4
Part 2, and H.263 encoders and decoders to measure drift occurring at low QP values for CIF resolution video
sequences. Our analysis is conducted as part of the effort to define specific implementations for the emerging ISO/IEC
23002-2 Fixed-Point 8x8 IDCT and DCT standard. Various IDCT implementations submitted as proposals for the new
standard are used to analyze drift. Each of these implementations complies with both the IEEE Standard 1180 and the
new MPEG IDCT precision specification ISO/IEC 23002-1. Reference implementations of the IDCT/DCT, and
implementations from well-known video encoders/decoders are also employed. Our results indicate that drift is
eliminated entirely only when the implementations of the IDCT in both the encoder and decoder match exactly. In this
case, the precision of the IDCT has no influence on drift. In cases where the implementations are not identical, then the
use of a highly precise IDCT in the decoder will reduce drift in the reconstructed video sequence only to the extent that
the IDCT used in the encoder is also precise.
In this paper, we propose a VLSI architecture for an AVS intra-frame encoder. The reconstruction loop hinders the
exploitation of parallelism and becomes the critical path in an intra-frame encoder. A First Selection Then Prediction (FSTP) method is
proposed to break the loop and enable parallel processing of intra mode selection and reconstruction on neighboring
blocks. In addition, area-efficient modules were developed. A configurable intra predictor supports all the intra
prediction modes. A CA-2D-VLC engine with an area-efficient Exp-Golomb encoder was developed to meet the
encoding speed demand at comparatively low hardware cost. Synthesized with a 0.18 µm CMOS standard-cell library, the
overall hardware cost of the proposed intra-frame encoder is 89k logic gates at a clock frequency constraint of 125 MHz.
The proposed encoder can perform real-time encoding of 720x576 4:2:0 25 fps video at a working frequency of 54 MHz.
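For readers unfamiliar with Exp-Golomb codes, the following is a minimal software sketch of the order-0 unsigned code. The order actually used by the CA-2D-VLC engine and its hardware structure are not stated in the abstract, so this only illustrates the code family; the function names are hypothetical.

```python
def exp_golomb_encode(value):
    """Return the order-0 Exp-Golomb codeword of a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    info = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(info) - 1) + info    # prefix of len(info) - 1 zeros


def exp_golomb_decode(bits):
    """Decode one codeword from the start of bits; return (value, bits_consumed)."""
    zeros = 0
    while zeros < len(bits) and bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    if end > len(bits):
        raise ValueError("truncated codeword")
    return int(bits[zeros:end], 2) - 1, end


for v in range(4):
    print(v, exp_golomb_encode(v))
```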
H.264/AVC achieves higher compression efficiency than previous video coding standards. However, this comes at the
cost of increased complexity due to the use of variable block size motion estimation and long-term memory motion
compensated prediction (LTMCP). In this paper, an efficient multi-frame dynamic search range motion estimation
algorithm is proposed. This algorithm can adjust the spatial search range and temporal search range according to the
video content dynamically. It can be applied on top of many other fast motion estimation (Fast ME) algorithms.
Compared with the constant search range scheme used by the multi-frame UMHexagonS algorithm, the proposed algorithm
can be 4.86 times faster, with negligible degradation of video quality.
The Audio Video coding Standard (AVS) is established by the working group of China of the same name. AVS-video is an application-driven coding standard. AVS Part 2 targets high-definition digital video broadcasting and high-density storage media, and AVS Part 7 targets low-complexity, low-picture-resolution mobile applications. Integer transform, intra- and inter-picture prediction, an in-loop deblocking filter, and context-based two-dimensional variable-length coding are the major compression tools in AVS-video, and they are well tuned for the target applications. AVS-video achieves performance similar to H.264/AVC at lower cost.
Adaptive Block-size Transforms (ABT) have been widely used in image/video coding, since they exploit the maximum feasible signal length for transform coding. However, if the transforms in an ABT coding system are Integer Cosine Transforms (ICT), not only separate transform units but also different scaling matrices are required, which consume a vast amount of resources in practical implementations. In this paper, a new approach to compatible ABT is presented, by which 8x8, 8x4, 4x8 and 4x4 transforms can be processed in one transform unit. Furthermore, with the Pre-scaled Integer Transform (PIT), the scaling matrices can be made compatible, especially for the 8x4 and 4x8 ICT, so that only a single scaling matrix is required. Simulation results and analysis reveal that this approach greatly saves hardware resources and makes the implementation of ABT much easier without loss of performance.
In this paper, we describe a VLSI architecture of a video decoder for AVS (Audio Video Coding Standard). The system architecture, as well as the design of the major function-specific processing units (Variable Length Decoder, Deblocking Filter), is discussed. By analyzing the architecture of the decoder system and the features of each processing unit, we develop a system controller that combines centralized and decentralized control schemes, which provides highly efficient communication between the processing units and minimizes the size of the interconnecting buffers. A bus-arbitration algorithm named the Token Ring algorithm is designed to control the allocation of the SDRAM bus. This algorithm avoids conflicts on the bus and reduces the internal buffer size, and its control logic is simple. Our simulation shows that this architecture can meet the requirements of AVS Jizhun Profile@4.0 level real-time decoding without a high cost in hardware and clock rate. Moreover, some design ideas in the AVS decoder can be extended to H.264 because of the similarity between the two video coding standards.
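The abstract does not detail the Token Ring algorithm, so the sketch below only illustrates token-passing bus arbitration in general: a token circulates among the processing units, the first requesting unit at or after the token position is granted the bus, and the token then moves to the next unit. The function name, the request representation, and the one-grant-per-call policy are assumptions.

```python
def token_ring_arbitrate(requests, token):
    """Grant the bus to the first requesting unit at or after the token position.

    requests: list of booleans, one per processing unit (hypothetical model).
    token: index of the unit currently holding the token.
    Returns (granted_unit or None, next_token_position).
    """
    n = len(requests)
    for offset in range(n):
        unit = (token + offset) % n
        if requests[unit]:
            return unit, (unit + 1) % n  # token moves past the granted unit
    return None, token                   # nobody requests; token stays put


token = 0
for cycle_requests in ([False, True, True], [False, True, True], [True, False, False]):
    granted, token = token_ring_arbitrate(cycle_requests, token)
    print(granted, token)
```

Because exactly one unit can hold the token, two units never drive the bus at once, and the arbitration logic reduces to a circular scan.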
High-accuracy motion estimation and compensation decrease the prediction error and thereby improve the performance of video compression. Two fast algorithms for fractional-pixel accuracy video motion estimation are proposed in this paper. After half-pixel accuracy motion estimation, a higher-accuracy motion estimate can be calculated from the intermediate results. The algorithm is based on the intermediate results of half-pixel accuracy motion estimation, and traditional fast block matching algorithms can also be implemented at half-pixel accuracy. Arbitrary fractional-pixel accuracy motion estimation can be achieved directly, at the cost of a small computational overhead. The approach described in this paper eliminates the systematic limitations of conventional block matching. Experimental results using typical video sequences show that the proposed algorithm achieves better PSNR and lower bit rates at higher fractional-pixel accuracy than half-pixel motion estimation and compensation. These fast motion estimation algorithms provide methods for studying higher pixel accuracy motion estimation in video compression coding. Choosing a proper fractional-pixel accuracy for motion estimation and compensation by truncating the precise results will be the best way to achieve more efficient compensation and higher image quality.
With the increasing complexity of VLSI designs, especially the SoC (System on Chip) of an MPEG-2 video decoder with HDTV scalability, simulation and verification of the full design, even at a level as high as the behavioral level in HDL, often prove to be very slow and costly, and full verification is difficult to perform until late in the design process. They therefore become the bottleneck of the HDTV video decoder design procedure and strongly affect its time-to-market. In this paper, the architecture of the hardware/software interface of an HDTV video decoder is studied, and a Hardware-Software Mixed Simulation (HSMS) platform, based on the MPEG-2 video decoding algorithm, is proposed to check and correct errors in the early design stage. HSMS can be applied to the target system by employing several approaches introduced in the paper. These approaches speed up the simulation and verification task without decreasing performance.
Facial Animation Parameters (FAPs) are defined in MPEG-4 to animate a facial object. The algorithm proposed in this paper to extract these FAPs is applied to very low bit-rate video communication, in which the scene is composed of a head-and-shoulder object with a complex background. This paper addresses an algorithm that automatically extracts all FAPs needed to animate a generic facial model and estimates the 3D motion of the head from points. The proposed algorithm extracts the human facial region by color segmentation and by intra-frame and inter-frame edge detection. The facial structure and the edge distribution of facial features, such as vertical and horizontal gradient histograms, are used to locate the facial feature regions. Parabola and circle deformable templates are employed to fit the facial features and extract one part of the FAPs. A special data structure is proposed to describe the deformable templates in order to reduce the time spent computing energy functions. The other part of the FAPs, the 3D rigid head motion vectors, is estimated by a corresponding-points method. A 3D head wire-frame model provides facial semantic information for the selection of proper corresponding points, which helps to increase the accuracy of 3D rigid object motion estimation.
In this paper, a large-range motion estimation algorithm based on hierarchical motion vector estimation is proposed. Aimed at objects that move over a large area in image sequences, the proposed algorithm addresses the problems caused by large-range search, such as huge computational complexity and relatively poor performance. Computer simulation results are also given for various test sequences.
KEYWORDS: Data storage, Video, Multiplexing, Dielectrophoresis, Data processing, Video processing, Time division multiplexing, Electronics engineering, Information science, Distributed interactive simulations
In this paper, a time division multiplexed (TDM) task scheduling scheme for an HDTV video decoder is proposed. There are three tasks on the bus: fetching decoded data from SDRAM for display (DIS), reading reference data from SDRAM for motion compensation (REF), and writing the motion-compensated data back to SDRAM (WB). The proposed schedule is based on a novel four-bank interlaced SDRAM storage structure, which results in less overhead on read/write time. Two 64 Mbit SDRAMs (4Bank×512K×32bit) are used. Compared with two banks, the four-bank storage strategy reads and writes data in 45% less time. Therefore, the processing data rates required for the three tasks are reduced. TDM is developed with round-robin scheduling and fixed slot allocation. There are both MB slots and task slots. As a result, conflicts on the bus are avoided, and the buffer size is reduced by 48% compared with priority bus scheduling. Moreover, there is a compact bus schedule for the worst case of stuffing, owing to the reduced execution time of the tasks. The size of the buffer is reduced and the control logic is simplified.
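A fixed-slot, round-robin bus schedule of this kind can be sketched as follows. The task names DIS, REF and WB come from the abstract; the per-task cycle counts, the helper names, and the assumption that one macroblock (MB) slot contains one task slot per task are hypothetical and chosen only for illustration.

```python
def build_mb_slot(task_cycles):
    """Lay out fixed task slots back to back inside one MB slot.

    task_cycles: list of (task_name, cycles) pairs; the cycle counts are assumed values.
    Returns a list of (task_name, start_cycle, end_cycle) with end exclusive.
    """
    timeline = []
    start = 0
    for task, cycles in task_cycles:
        timeline.append((task, start, start + cycles))
        start += cycles
    return timeline


def bus_owner(cycle, timeline):
    """Return the task that owns the bus at an absolute cycle; the MB slot repeats."""
    position = cycle % timeline[-1][2]  # position inside the repeating MB slot
    for task, start, end in timeline:
        if start <= position < end:
            return task
    raise AssertionError("timeline must cover the whole MB slot")


timeline = build_mb_slot([("DIS", 4), ("REF", 8), ("WB", 4)])
print([bus_owner(c, timeline) for c in (0, 5, 12, 16)])
```

Because bus ownership depends only on the cycle count, no two tasks can contend for the bus, and the worst-case waiting time of each task is bounded by the MB slot length, which is what allows buffers to be sized tightly.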
Model-based coding has been studied mostly on video sequences with only one person. Normally, however, there are several persons in videoconference scenes. In this paper, model-based coding for multi-object (or multi-person) sequences is explored. A multi-scale 3D wire-frame of the head has been constructed to fit different head sizes in the scene. The object is composed of several components such as head, shoulder and arm. Wire-frames of the shoulder and arm are built in this contribution. By connecting the different components according to their physical structure, a more complete wire-frame of the upper body with the head is built. The ways in which motion parameters are transferred through the different components are determined. The bit-rate of a three-object sequence is compared with that of a one-object sequence with the same image format. The model-based coding efficiency of the multi-object sequence is higher than that of the one-object sequence.
In MPEG-4, Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs) are defined to animate a facial object. Most previous facial animation reconstruction systems focused on synthesizing animation from manually or automatically generated FAPs rather than from FAPs extracted from natural video scenes. In this paper, an analysis-synthesis MPEG-4 visual communication system is established, in which facial animation is reconstructed from FAPs extracted from a natural video scene.
In MPEG-4, two sets of parameters are defined: Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs). The FDPs are used to customize the proprietary face model of the decoder to a particular face or to download a face model along with the information about how to animate it. The FAPs are based on the study of minimal facial actions and are closely related to muscle actions. They represent a complete set of basic facial actions and therefore allow the representation of most facial expressions. In this paper, we propose a simple key-point displacement-controlling muscle model, which describes how the adjacent facial tissue moves with the key points, to reconstruct facial animation using FAPs.
Model-based image coding has received extensive attention due to its high subjective image quality and low bit-rates. However, the estimation of object motion parameters is still a difficult problem, and there are no proper error criteria for quality assessment that are consistent with visual properties. This paper presents an algorithm for facial motion parameter estimation based on feature point correspondence and gives motion parameter error criteria. The facial motion model comprises three parts. The first part is the global 3-D rigid motion of the head, the second part is the non-rigid translation motion in the jaw area, and the third part consists of the local non-rigid expression motion in the eye and mouth areas. The feature points are automatically selected by a function of edges, brightness and end-nodes outside the blocks of the eyes and mouth. The number of feature points is adjusted adaptively. The jaw translation motion is tracked by the changes in the positions of the jaw feature points. The areas of non-rigid expression motion can be rebuilt using a block-pasting method. An approach for estimating the motion parameter error based on the quality of the reconstructed image is suggested, and an area error function and an error function of the contour transition-turn rate are used as quality criteria. The criteria properly reflect the geometric distortion of the image caused by errors in the estimated motion parameters.
Model-based image coding is a well-known solution for image communication at very low bit-rates. However, these systems involve very complex techniques and a large amount of computation. It is especially difficult to automatically extract the Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs), which are defined in MPEG-4, from 2D images in order to represent 3D moving objects. In this paper, an algorithm that uses intra- and inter-frame information to estimate feature parameters is proposed. It utilizes spatial information (edge information) as well as the temporal difference between successive frames. Combining the two kinds of information makes the system more robust. Physiological symmetry and proportion are another kind of knowledge used here to make the system less computationally intensive.