Depth-based 3D formats are currently being developed as extensions to both AVC and HEVC standards. The availability
of depth information facilitates the generation of intermediate views for advanced 3D applications and displays, and also
enables more efficient coding of the multiview input data through view synthesis prediction techniques. This paper
outlines several approaches that have been explored to realize view synthesis prediction in modern video coding
standards such as AVC and HEVC. The benefits and drawbacks of various architectures are analyzed in terms of
performance, complexity, and other design considerations. It is concluded that block-based view synthesis prediction for
multiview video provides attractive coding gains at a complexity comparable to that of traditional motion/disparity-compensated prediction.
This paper presents a coding mode decision algorithm for MPEG-2 spatial transcoding. The optimization of coding mode and quantization scale is formulated in an operational rate-distortion sense and solved by the Lagrange multiplier method. Experimental results show that the proposed transcoder with optimized coding mode and quantizer achieves better quality and a lower bit rate than a cascaded transcoder or the MPEG-2 TM5 encoder.
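As a rough sketch of this operational rate-distortion formulation: each macroblock carries a table of candidate (rate, distortion) outcomes for its mode/quantizer choices, and a Lagrange multiplier is searched until the rate budget is met. The candidate tables, function names, and bisection search below are illustrative assumptions, not the paper's actual procedure.

```python
def select_modes(options_per_mb, lam):
    """For each macroblock, pick the (rate, dist) option minimizing D + lam*R."""
    total_r, total_d, picks = 0, 0.0, []
    for options in options_per_mb:
        r, d = min(options, key=lambda rd: rd[1] + lam * rd[0])
        picks.append((r, d))
        total_r += r
        total_d += d
    return picks, total_r, total_d

def meet_budget(options_per_mb, r_budget, iters=40):
    """Bisect the Lagrange multiplier until the total rate fits the budget."""
    lo, hi = 0.0, 1e6
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        _, r, _ = select_modes(options_per_mb, lam)
        if r > r_budget:
            lo = lam          # too many bits: penalize rate harder
        else:
            hi = lam          # feasible: try a smaller multiplier
    return select_modes(options_per_mb, hi)
```

The multiplier trades rate against distortion per macroblock; the feasible endpoint of the bisection yields a solution on the operational R-D hull.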
In practice, interlaced video sequences are typically coded with either a frame-only or field-only structure, irrespective of the content. However, coding in this way will not provide the best coding
efficiency. This paper proposes an adaptive picture-level field/frame coding scheme with corresponding rate control. First, a two-pass field/frame decision scheme is proposed. In this scheme, we formulate the field/frame decision as a constrained optimization problem.
The actual rate and distortion data are collected, and the optimal picture-level coding decision is determined from these data. An effective rate control for the proposed two-pass algorithm is also presented. However, since the complexity of the two-pass scheme
is relatively high, as motion estimation must be performed for both the frame-based picture and the field-based picture, we also propose a one-pass field/frame decision scheme. This one-pass scheme calculates the variance of each macroblock in a field and estimates
the correlation between two fields. Based on the correlation, a decision to code the picture as a frame or as fields is made. A rate control method for the proposed one-pass scheme is also presented. Simulation results demonstrate that our scheme outperforms frame-only and field-only coding for several sequences coded at a wide range of bit-rates, and the proposed one-pass scheme obtains similar performance as the proposed two-pass scheme.
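A minimal sketch of the one-pass decision idea, assuming a plain Pearson correlation between co-located top/bottom-field samples as the inter-field correlation measure; the paper's per-macroblock variance estimate is not reproduced, and the 0.9 threshold is an arbitrary placeholder.

```python
def split_fields(frame):
    """Separate an interlaced frame (list of pixel rows) into its two fields."""
    return frame[0::2], frame[1::2]

def field_correlation(frame):
    """Pearson correlation between co-located top- and bottom-field samples,
    used here as a cheap proxy for inter-field motion."""
    top, bot = split_fields(frame)
    xs = [p for row in top for p in row]
    ys = [p for row in bot for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 1.0

def choose_picture_coding(frame, threshold=0.9):
    """High inter-field correlation suggests little motion between fields,
    so frame coding is preferred; otherwise code the fields separately."""
    return 'frame' if field_correlation(frame) >= threshold else 'field'
```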
MPEG-2 compressed digital video content is used in a number of products, including DVDs, camcorders, digital TV, and HDTV. The ability to access this widely available MPEG-2 content on low-power end-user devices such as PDAs and mobile phones depends on effective techniques for transcoding the MPEG-2 content to a more appropriate, low-bitrate video format such as MPEG-4. In this paper, we present the software and algorithmic optimizations performed in developing a real-time MPEG-2 to MPEG-4 video transcoder. A brief overview of the transcoding architectures is also provided. The details of the transcoding architectures for MPEG-2 to MPEG-4 video transcoding can be found in. The transcoder was targeted at and optimized for Windows PCs with Intel Pentium-4 processors, and the optimizations exploit the SIMD parallelism these processors offer. The transcoder consists of two distinct components: the MPEG-2 video decoder and the MPEG-4 video transcoder. The MPEG-2 video decoder is based on the MPEG-2 Software Simulation Group’s reference implementation, while the MPEG-4 transcoder is developed from scratch, with portions taken from the MOMUSYS implementation of the MPEG-4 video encoder. The optimizations include: 1) generic block-processing optimizations that affect both the MPEG-2 decoder and the MPEG-4 transcoder, and 2) optimizations specific to the MPEG-2 video decoder and the MPEG-4 video transcoder. The optimizations resulted in significant improvements in both MPEG-2 decoding and MPEG-4 transcoding: the total time spent by the transcoder was reduced by over 82%, with MPEG-2 decoding reduced by over 56% and MPEG-4 transcoding reduced by over 86%.
In this paper, we present a real-time adaptive streaming video platform. The platform is fully compliant with the Internet Streaming Media Alliance Implementation Specification and has been used for experiments on real-time video streaming and transcoding via unicast and multicast over heterogeneous networks. An example of streaming video over a lossy channel is given, and a simple, efficient scheme for packet loss recovery is presented.
This paper proposes an optimal rate allocation scheme for Fine-Granular Scalability (FGS) coded bitstreams that can achieve constant-quality reconstruction of frames under a dynamic rate budget constraint, while also minimizing the overall distortion. To achieve this, we propose a novel R-D labeling scheme to characterize the R-D relationship of the source coding process. Specifically, sets of R-D points are extracted during the encoding process, and linear interpolation is used to estimate the actual R-D curve of the enhancement layer signal. The extracted R-D information is then used by an enhancement layer transcoder to determine the bits that should be allocated per frame. A sliding-window rate allocation method is proposed to realize constant quality among frames. The scheme is first developed for a single FGS coded source, then extended to operate on multiple sources. With the proposed scheme, the rate allocation can be performed in a single pass, so the complexity is quite low. Experimental results confirm the effectiveness of the proposed scheme under static and dynamic bandwidth conditions.
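The constant-quality idea can be sketched as follows: interpolate each frame's extracted R-D points to obtain rate as a function of distortion, then search for the single distortion level whose total rate fits the window budget. The piecewise-linear model and the bisection on the common distortion level are assumptions for illustration, not the paper's exact algorithm.

```python
def rate_for_distortion(rd, d):
    """Linearly interpolate the rate needed to reach distortion d.
    rd: (rate, distortion) pairs, rate increasing, distortion decreasing."""
    if d >= rd[0][1]:
        return rd[0][0]
    if d <= rd[-1][1]:
        return rd[-1][0]
    for (r0, d0), (r1, d1) in zip(rd, rd[1:]):
        if d1 <= d <= d0:
            t = (d0 - d) / (d0 - d1)
            return r0 + t * (r1 - r0)

def allocate_constant_quality(frames_rd, budget, iters=50):
    """Find the common distortion level whose total rate meets the budget,
    then return the per-frame rates at that level (constant quality)."""
    lo = min(p[1] for rd in frames_rd for p in rd)   # best achievable quality
    hi = max(p[1] for rd in frames_rd for p in rd)   # worst quality
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(rate_for_distortion(rd, mid) for rd in frames_rd)
        if total > budget:
            lo = mid          # over budget: accept coarser quality
        else:
            hi = mid          # feasible: try finer quality
    return [rate_for_distortion(rd, hi) for rd in frames_rd]
```

Extending to multiple sources amounts to pooling all frames' R-D curves under one shared budget.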
This paper discusses the problem of reduced-resolution transcoding of compressed video bitstreams. An analysis of drift errors is provided to identify the sources of quality degradation when transcoding to a lower spatial resolution. Two types of drift error are considered: a reference picture error, which has been identified in previous works, and an error due to the non-commutative property of motion compensation and down-sampling, which is unique to this work. To overcome these sources of error, four novel architectures are presented. One architecture attempts to compensate for the reference picture error in the reduced resolution, while another attempts to do the same in the original resolution. We present a third architecture that attempts to eliminate the second type of drift error, and a final architecture that relies on an intra-block refresh method to compensate for all types of errors. All of these architectures require a variety of macroblock-level conversions, such as motion vector mapping and texture down-sampling, which are discussed in detail. Another important issue for the transcoder is rate control. This is especially important for the intra-refresh architecture, since it must balance the number of intra blocks used to compensate for errors against the associated rate-distortion characteristics of the low-resolution signal. The complexity and quality of the architectures are compared. Based on the results, we find that the intra-refresh architecture offers the best trade-off between quality and complexity, and is also the most flexible.
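One of the macroblock-level conversions mentioned above, motion vector mapping, can be sketched for 2:1 down-sampling: the four full-resolution macroblock vectors covered by one reduced-resolution macroblock are merged and rescaled. Component-wise averaging is only one candidate mapping (a median is another); the choice below is an illustrative assumption.

```python
def map_motion_vector(mvs):
    """Map the four full-resolution MB motion vectors that cover one
    half-resolution MB to a single MV: average component-wise, then
    scale by the 2:1 down-sampling factor."""
    n = len(mvs)
    avg_x = sum(v[0] for v in mvs) / n
    avg_y = sum(v[1] for v in mvs) / n
    return (round(avg_x / 2), round(avg_y / 2))
```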
We describe a technique for video summarization that uses motion descriptors computed in the compressed domain to speed up conventional color-based video summarization techniques. The basic hypothesis of the work is that the intensity of motion activity of a video segment is a direct indication of its 'summarizability,' and we present experimental verification of this hypothesis. We are thus able to quickly identify easy-to-summarize segments of a video sequence, since they have a low intensity of motion activity. Moreover, the compressed-domain extraction of motion activity intensity is much simpler than the color-based calculations. We summarize these segments by simply choosing a key frame at random from each low-activity segment, and then apply conventional color-based summarization techniques to the remaining segments. We thus speed up color-based summarization techniques by reducing the number of segments on which the computationally more expensive color-based computation is needed.
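The partitioning step above can be sketched directly; the segment representation and the activity threshold are illustrative assumptions.

```python
import random

def summarize(segments, activity_threshold):
    """segments: list of (frame_ids, motion_activity) pairs.
    Low-activity segments get a random key frame; the rest are deferred
    to the more expensive color-based summarization."""
    key_frames, needs_color = [], []
    for frames, activity in segments:
        if activity < activity_threshold:
            key_frames.append(random.choice(frames))
        else:
            needs_color.append(frames)
    return key_frames, needs_color
```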
In this paper we present a new descriptor for the spatial distribution of motion activity in video sequences. We use the magnitude of the motion vectors as a measure of the intensity of motion activity in a macroblock. We construct a matrix Cmv consisting of the motion vector magnitude for each macroblock of a given P frame. We compute the average motion vector magnitude per macroblock, Cavg, and then use Cavg as a threshold on Cmv by setting the elements of Cmv that are less than Cavg to zero. We classify the runs of zeros into three categories based on length and count the number of runs of each category in Cmv. Our activity descriptor for a frame thus consists of four parameters: the average magnitude of the motion vectors and the numbers of runs of short, medium, and long length. Since the feature extraction is performed in the compressed domain and is simple, it is extremely fast. We have tested it on the MPEG-7 test content set, which consists of approximately 14 hours of MPEG-1 encoded video content of different kinds. We find that our descriptor enables fast and accurate indexing of video. It is robust to noise and to changes in encoding parameters such as frame size, frame rate, encoding bit rate, and encoding format. It is a low-level non-semantic descriptor that gives semantic matches within the same program, and is thus well suited to applications such as video program browsing. We also find that indirect and computationally simpler measures of motion vector magnitude, such as the bits taken to encode the motion vectors, though less effective, can also be used in our run-length framework.
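A minimal sketch of the four-parameter descriptor follows. Runs are counted within each row of the thresholded matrix, and the short/medium/long boundaries are placeholder assumptions; the abstract does not specify them.

```python
def activity_descriptor(cmv, short_max=2, medium_max=5):
    """cmv: matrix of per-macroblock motion-vector magnitudes for a P frame.
    Returns (avg_magnitude, n_short, n_medium, n_long): elements below the
    average are zeroed, and the zero-runs are counted by length category."""
    flat = [m for row in cmv for m in row]
    c_avg = sum(flat) / len(flat)
    runs = []
    for row in cmv:
        run = 0
        for m in row:
            if m < c_avg:          # this element is zeroed by the threshold
                run += 1
            elif run:              # run of zeros just ended
                runs.append(run)
                run = 0
        if run:                    # run extending to the end of the row
            runs.append(run)
    n_short = sum(1 for r in runs if r <= short_max)
    n_medium = sum(1 for r in runs if short_max < r <= medium_max)
    n_long = sum(1 for r in runs if r > medium_max)
    return c_avg, n_short, n_medium, n_long
```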
In this paper, we present a fade detection technique for indexing of MPEG-2 and MPEG-4 compressed video sequences. We declare a fade-in if the number of positive residual dc coefficients in P frames exceeds a certain percentage of the total number of non-zero dc coefficients consistently over several consecutive frames. Our fade detection technique has fair accuracy and is very simple, since it uses only entropy decoding and avoids computationally expensive inverse DCTs.
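The detection rule can be sketched as below; the ratio and the number of consecutive frames required are illustrative assumptions, since the abstract does not give the actual thresholds.

```python
def is_fade_in(dc_residuals_per_frame, ratio=0.6, min_frames=3):
    """dc_residuals_per_frame: per P-frame lists of residual dc coefficients.
    Declares a fade-in when positive dc residuals dominate the non-zero
    ones over at least `min_frames` consecutive frames."""
    streak = 0
    for dcs in dc_residuals_per_frame:
        nonzero = [d for d in dcs if d != 0]
        pos = sum(1 for d in nonzero if d > 0)
        if nonzero and pos / len(nonzero) >= ratio:
            streak += 1                 # luminance consistently rising
            if streak >= min_frames:
                return True
        else:
            streak = 0                  # consistency broken: reset
    return False
```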
In this paper, we make use of true motion vectors for better error concealment. Error concealment in video is intended to recover losses due to channel noise by utilizing available picture information. In our work, we do not change the syntax, so no additional bits are required. This work focuses on improving error concealment with transmitted true motion vectors: we propose 'true' motion estimation at the encoder together with a post-processing error concealment scheme that exploits motion interpolation at the decoder. Given the location of the lost regions and various temporal error concealment techniques, we demonstrate that our true motion vectors perform better than the motion vectors found by minimal-residue block matching. Additionally, we propose a new error concealment technique that improves reconstruction quality when the previous frame has been heavily damaged. We have observed that, in the case of a heavily damaged frame, better predictions can be made from the past reference frame rather than from the current, damaged reference frame. This is accomplished by extrapolating the decoded motion vectors so that they correspond to the past reference frame.
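The extrapolation step can be sketched as a simple rescaling under a linear-motion assumption; the distance parameters below are illustrative, not taken from the paper.

```python
def extrapolate_mv(mv, cur_ref_dist=1, past_ref_dist=2):
    """Extrapolate a decoded motion vector so it references the older,
    intact frame instead of the damaged immediate reference, assuming
    motion continues linearly across frames."""
    s = past_ref_dist / cur_ref_dist
    return (round(mv[0] * s), round(mv[1] * s))
```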
In this paper, we present a new, computationally efficient and effective technique for detection of abrupt scene changes in MPEG-4/2 compressed video sequences. We combine the dc image-based approach of Yeo with the bit allocation-based approach of Feng, Lo, and Mehrpour. The bit allocation-based approach has the advantage of computational simplicity, since it requires only entropy decoding of the sequence. Since extraction of dc images from I-frames/objects is simple, the dc image-based technique of Yeo is a good alternative for comparison of I-frames/objects; for P-frames/objects, however, Yeo's algorithm requires additional computation. We find that the bit allocation-based approach is prone to false detection, particularly with intra-coded objects in MPEG-4 sequences. However, if a suspected scene/object change has been located within a group of consecutive frames/objects, the bit allocation-based technique quickly and accurately locates the cut point therein. This motivates us to use dc image-based detection between successive I-frames/objects to identify the subsequences containing scene/object changes, and then use bit allocation-based detection to find the cut point therein. Our technique thus has only marginally greater complexity than the purely bit allocation-based technique, but greater accuracy. It is applicable to both MPEG-2 sequences and MPEG-4 multiple-object sequences. In the MPEG-4 multiple-object case, we use a weighted sum of the change in each object of the frame, with the area of the object as the weight.
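The two stages can be sketched as follows: dc-image differences flag the intervals between consecutive I-frames that may contain a cut, and within a flagged interval the cut is placed at the largest relative change in bits per frame. The distance measures and thresholds are illustrative assumptions.

```python
def suspect_gops(i_frame_dc_images, dc_thresh):
    """Stage 1: flag intervals whose bounding I-frame dc images differ
    strongly (sum of absolute differences above a threshold)."""
    return [i for i, (a, b) in enumerate(zip(i_frame_dc_images,
                                             i_frame_dc_images[1:]))
            if sum(abs(x - y) for x, y in zip(a, b)) > dc_thresh]

def locate_cut(bits_per_frame):
    """Stage 2: within a suspect interval, place the cut at the largest
    relative jump in bits spent per frame."""
    jumps = [abs(b1 - b0) / max(b0, 1)
             for b0, b1 in zip(bits_per_frame, bits_per_frame[1:])]
    return 1 + max(range(len(jumps)), key=jumps.__getitem__)
```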
The most straightforward approach to obtaining a down-converted image sequence is to decimate each frame after it has been fully decoded. To reduce the memory requirements and other costs incurred by this approach, a down-conversion decoder performs the decimation within the decoding loop. In this way, predictions are made from a low-resolution reference that has experienced a considerable loss of information; additionally, the predictions must be made from a set of motion vectors that correspond to the full-resolution image sequence. Given these conditions, it is desirable to optimize the performance of the motion compensation process. In this paper, we show that the optimal set of filters for performing the low-resolution motion compensation depends on the choice of down-conversion, and expressions for these filters are provided. To demonstrate the usefulness of these results, a sample set of motion compensation filters for each class of down-conversion is calculated. The results are incorporated into a low-resolution decoder, and comparisons of each down-conversion class are made. Simulation results reveal that the filters based on multiple-block down-conversion can reduce the prediction drift found with single-block down-conversion by as much as 35 percent.
A novel method of developing test patterns for digital video coding systems is presented. The method is illustrated for an MPEG-2 encoder where, subject to fixed encoding parameters, the input frames are identical to the expected output frames of reconstructed video. It is based upon a method of developing test bitstreams for MPEG decoders, which is also described. Both methods rely upon bit-accurate modeling of the system under test by a software codec, usually a C program. Natural video may be used to test digital video systems; however, encoded natural images seldom provide adequate coverage of either fixed-length or variable-length binary codeword spaces. Test bitstreams may be readily constructed for decoders to cover specific codeword subspaces, but they decode as noticeably artificial video frames. 'Artificial' test patterns are obtained for an encoder by first decoding a specially constructed test bitstream, then holding its higher-level encoding parameters fixed and iteratively encoding the 'artificial' video frames until convergence is reached, i.e., input frames equal output frames. Convergence occurs after a few to dozens of iterations, and the resulting 'artificial' test patterns retain substantial coverage.
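The convergence iteration can be sketched generically: `encode_decode` below stands in for one bit-accurate encode-then-decode pass with fixed higher-level parameters, and is a hypothetical placeholder for the actual codec.

```python
def find_fixed_point(encode_decode, frames, max_iters=100):
    """Iteratively re-encode the decoded frames with fixed parameters until
    the input equals the reconstructed output (the convergence criterion)."""
    cur = frames
    for i in range(max_iters):
        nxt = encode_decode(cur)
        if nxt == cur:               # input frames equal output frames
            return cur, i
        cur = nxt
    return cur, max_iters
```

With an idempotent codec model (e.g. plain quantization), the iteration reaches a fixed point in one pass; real codecs may take a few or dozens of iterations, as the abstract notes.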
This paper presents a general procedure for determining the optimal MPEG coding strategy in terms of the selection of macroblock coding modes and quantizer scales. The two processes of coding mode decision and rate control are intimately related and should be determined jointly to achieve optimal coding performance. We formulate the constrained optimization problem and present solutions based upon rate-distortion characteristics, or R(D) curves, for all the macroblocks that compose the picture being coded. The distortion of the entire picture is assumed to be decomposable and expressible as a function of the individual macroblock distortions, and this is the objective function to minimize. The determination of the optimal solution is complicated by the MPEG differential encoding of motion vectors and dc coefficients, which introduces dependencies that carry over from macroblock to macroblock for a duration equal to the slice length. Once the upper bound on performance is calculated, it can be used to assess how well practical suboptimal methods perform.
This paper presents an algorithm that combines both intraframe and interframe information to reconstruct macroblocks lost to imperfect communication channels when decoding an MPEG bitstream. The algorithm is a POCS-based (projection onto convex sets) iterative restoration algorithm incorporating both temporal and spatial constraints derived from a set of analyses performed on the picture sequence. The use of temporal information in the restoration process is often complicated by scene changes or large random motion. To utilize the temporal information reliably, we formulate a series of tests to determine its usefulness; in addition, the tests yield a temporal constraint when the temporal information is deemed good. Along with the spatial constraints, the temporal constraint is used in the proposed iterative restoration algorithm.
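A toy 1-D sketch of the alternating-projection idea follows. The two constraint sets here (a temporal band around the co-located previous-frame samples, and a relaxed pull toward the mean of surviving border pixels as a crude spatial-smoothness step) are illustrative stand-ins for the paper's actual constraints.

```python
def restore_block(prev_block, border_mean, delta, iters=20):
    """Alternate between (a) a relaxed spatial step that pulls samples
    toward the mean of surviving neighbor pixels and (b) projection onto
    the temporal set |x - prev| <= delta, starting from the temporal
    replacement. Toy 1-D version of a POCS iteration."""
    x = list(prev_block)                                 # temporal initial estimate
    for _ in range(iters):
        x = [xi + 0.5 * (border_mean - xi) for xi in x]  # spatial step (relaxed)
        x = [min(max(xi, p - delta), p + delta)          # temporal projection
             for xi, p in zip(x, prev_block)]
    return x
```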
The MPEG (Moving Picture Experts Group) video coding standard has emerged to facilitate the rapid growth of full-motion video compression for digital storage media and digital communication. As new applications arise, the problems caused by noisy channels need to be solved, and several error resilience techniques have been proposed to address them. However, in MPEG compressed video there are data elements within the picture header that are absolutely crucial to decoding; without them, no decoding can be accomplished. Previously proposed error resilience techniques can only handle this kind of loss by replacing the whole frame with the previously decoded frame. In this paper, an error concealment strategy is proposed for the case of losing picture-header information during transmission of the compressed MPEG bitstream. The proposal is motivated by the fact that not all bits within the video bitstream are of equal importance. The basic idea of the strategy is a redundant picture-header concept, which allows redundant transmission of these highly sensitive data within the MPEG-2 video headers. The strategy must be supported by an appropriate ATM-type transport structure: the redundant picture header is assigned to a different cell than the original picture header, protecting against loss of the picture header and thereby improving performance over noisy channels.
This paper presents an adaptive error concealment technique for MPEG (Moving Picture Experts Group) compressed video. Error concealment algorithms are essential for many practical video transmission scenarios characterized by occasional data loss due to thermal noise, channel impairments, network congestion, etc. Such scenarios of current importance include terrestrial (simulcast) HDTV, teleconferencing via packet networks, and TV/HDTV over fiber-optic ATM (asynchronous transfer mode) systems. In view of the increasing importance of MPEG video for many of these applications, a number of error concealment approaches for MPEG have been developed and are currently being evaluated in terms of their complexity vs. performance trade-offs. Here, we report the results of recent work on a specific adaptive algorithm that provides excellent robustness properties for MPEG-1 video transmitted on either one- or two-tier transmission media. Receiver error concealment is intended to ameliorate the impact of lost video data by exploiting available redundancy in the decoded picture. The concealment process must be supported by an appropriate transport format that helps to identify the image pixel regions corresponding to lost video data. Once the image regions (i.e., macroblocks, slices, etc.) to be concealed are identified, a combination of temporal and spatial replacement techniques may be applied to fill in the lost picture elements. The specific details of the concealment procedure will depend upon the compression algorithm being used and on the level of algorithmic complexity permissible within the decoder. Simulation results obtained from a detailed end-to-end model that incorporates MPEG compression/decompression and a custom cell-relay (ATM-type) transport format are reported briefly.
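The temporal/spatial replacement combination can be sketched on a simplified model in which each macroblock is reduced to one value in a raster-scan list; the adaptive switch on an `intra_like` flag is a simplification of the paper's scheme, not its actual decision rule.

```python
def conceal(frame, prev, lost, intra_like, width):
    """Conceal lost macroblocks: temporal replacement (copy the co-located
    MB from the previous picture) for predicted regions, spatial
    interpolation from vertical neighbors for intra-like regions.
    frame/prev: raster-scan lists of MB values; width: MBs per row."""
    out = list(frame)
    for i in lost:
        if intra_like[i]:
            above = out[i - width] if i - width >= 0 else None
            below = out[i + width] if i + width < len(out) else None
            nbrs = [v for v in (above, below) if v is not None]
            out[i] = sum(nbrs) / len(nbrs) if nbrs else prev[i]
        else:
            out[i] = prev[i]                   # temporal replacement
    return out
```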
An intraframe block coding scheme that exploits interblock redundancy and reduces blocking effects is proposed. The scheme is based upon a combination of nonlinear interpolation with transform and vector quantization techniques. Interblock redundancy is removed by extracting a pixel from each block as a common seed from which the remaining pixels of the neighboring blocks are interpolated. The estimation error signals are reduced by the nonlinear interpolation, which often leaves some blocks requiring no further processing; this saving compensates for the cost of transmitting the estimation parameters as overhead.
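The seed-based prediction can be sketched in 1-D with a linear interpolator standing in for the paper's nonlinear one; the function and its parameters are illustrative assumptions.

```python
def predict_block(seed_left, seed_right, block_len):
    """Predict a block's pixels by interpolating between the seed pixels
    extracted from the two neighboring blocks (1-D sketch; the paper uses
    a nonlinear interpolator, linear is used here for simplicity)."""
    return [seed_left + (seed_right - seed_left) * (k + 1) / (block_len + 1)
            for k in range(block_len)]
```

The residual between the true pixels and this prediction is what would then be transform/vector quantized; when the residual is negligible, the block needs no further processing.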