The Grand Alliance was formed to define and construct a system for the delivery of HDTV over terrestrial broadcast channels. The system combines the best components of the previously competing systems before the FCC. MPEG-2 syntax is used with novel encoding techniques to deliver a set of video scanning formats for a variety of applications. This paper describes the important features and concepts embodied in the Grand Alliance system.
Differential Vector Quantization (DVQ) is a variable-length lossy compression technique for digital video. An Artificial Neural Network (ANN) is used to develop entropy-biased codebooks which yield substantial data compression without entropy coding and are very robust with respect to transmission channel errors. Huffman coding is a variable-length lossless compression technique in which data with a high probability of occurrence are represented by short codewords, while data with a lower probability of occurrence are assigned longer codewords. We discuss how these codebooks can be used to realize variable bit-rate coders in the DVQ case, and we also address Huffman coding in the extreme case where the data are highly predictable and differential coding can be applied. Two methods for variable bit-rate coding are presented. In the first, we use the DVQ algorithm, and both the encoder and the decoder have multiple codebooks of different sizes. In the second, we address the issues of real-time transmission using Huffman coding. This algorithm is based on differential pulse code modulation (DPCM), but additionally utilizes a non-uniform, multi-level Huffman coder to reduce the data rate substantially below that achievable with conventional DPCM. We compare the performance of these two approaches over error-free and error-prone channels. Our results show that one coding technique is more resistant to channel errors than the other, and yields no visible degradation in picture quality at moderate compression rates.
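As an illustration of the variable-length principle described above (short codewords for likely symbols, long codewords for rare ones), the sketch below derives Huffman codeword lengths from a symbol-frequency table; the frequencies are invented for the example and do not come from the paper.

```python
import heapq

def huffman_code_lengths(freqs):
    """Build Huffman codeword lengths from symbol frequencies.

    Frequent symbols receive short codewords and rare symbols long
    ones. Returns {symbol: bit_length}.
    """
    # Each heap entry: (frequency, tie_breaker, list_of_symbols)
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, i, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to its members
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, i, s1 + s2))
    return lengths

# Skewed source: 'a' is very likely, so it gets the shortest codeword.
print(huffman_code_lengths({'a': 50, 'b': 20, 'c': 20, 'd': 10}))
```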
The number of operations in the coding part of adaptive arithmetic coding is independent of the number of symbols. The number of operations in a traditional implementation of the adaptive part, however, increases linearly with the number of symbols. Therefore, the adaptive updating of the model consumes the vast majority of computational operations if the number of symbols is large, as is typical in image coding. This paper presents a fast alternative that implements the adaptive part in a hierarchical fashion, so that the number of operations depends only logarithmically on the number of symbols.
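The paper's hierarchical scheme is not reproduced here, but a Fenwick (binary indexed) tree is one standard way to obtain the logarithmic behavior described: both the adaptive count update and the cumulative-frequency query that arithmetic coding needs cost O(log n) rather than O(n). A minimal sketch:

```python
class FenwickFreqs:
    """Cumulative symbol frequencies with O(log n) update and query.

    Arithmetic coding needs cumulative frequencies at every step; a
    flat array costs O(n) per adaptive update, while this tree costs
    O(log n), matching the hierarchical idea described above.
    """
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta=1):
        """Add delta to the count of symbol i (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def cum_freq(self, i):
        """Total count of symbols 0..i inclusive."""
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

f = FenwickFreqs(256)
for sym in [65, 65, 66]:
    f.update(sym)
print(f.cum_freq(65), f.cum_freq(255))   # counts up to symbol 65, and total
```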
In implementation of hybrid compression algorithms, the Huffman or Modified Huffman codec can consume a significant portion of silicon real-estate or CPU cycles. To reduce this cost, several schemes have been published that take advantage of one or more inherent properties of the variable length code tables. This paper examines some of these properties and their corresponding architectural components which can be pieced together to form custom hybrids suited to specific applications. Hardware architectural classifications include: serial and parallel Trees, Content Addressable Memory, Programmable Logic Arrays, and parallel comparators schemes that resemble flash A/D architectures. Assessment criteria include: bit rate vs. symbolic rate performance, clock cycle ratios, latencies, pre-buffering and post- buffering, codebook and source channel statistical dependencies, encoder and decoder circuitry sharing, pre-processing of codebooks, critical path, register use in software, breakdown between memory and logical operators, custom vs. standard cells, and code word order. Finally, the performance and size of current industrial implementations for specific application (JPEG, MPEG) are summarized.
Huffman coding is one of the most common forms of lossless data compression. Many lossy image compression standards, for example MPEG and JPEG, use Huffman coding as the back-end entropy compressor because of its relatively good compression performance and simple hardware implementation. However, the decoding speed is limited by a feedback loop. For applications that require high speed decoding, such as High Definition Television at about 100 Mbyte/s, this feedback loop can be prohibitively slow. The highest speed conventional `parallel' Huffman decoders decode one complete codeword per look-up table memory cycle. This paper describes three different hardware designs that break through this limit. All three depend on probabilistic modeling of the coded data stream to predict, or speculate, on the values of adjacent codewords. One design uses a single fully specified memory with enough width for two, or more, output tokens. The other two designs use multiple memories each fed by a different portion of the code stream. This superscalar approach leads to average decode rates twice, or more, that of a conventional `parallel' decoder for a simulation of JPEG Huffman token data. The relative performance versus hardware cost is described for each design.
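For reference, the conventional `parallel' decoder that the abstract takes as its baseline can be sketched as a full-width lookup table that decodes exactly one complete codeword per memory access. The code table below is a toy example, not one from the paper.

```python
def build_decode_table(codes):
    """Expand a prefix code into a full lookup table.

    codes maps symbol -> bit string. The table is indexed by the next
    max_len input bits and yields (symbol, codeword_length), so one
    memory access decodes one complete codeword -- the per-cycle limit
    that the speculative designs described above try to beat.
    """
    max_len = max(len(c) for c in codes.values())
    table = [None] * (1 << max_len)
    for sym, code in codes.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        for fill in range(1 << pad):        # all completions share the prefix
            table[base + fill] = (sym, len(code))
    return table, max_len

def decode(bits, table, max_len, n_symbols):
    out, pos = [], 0
    bits = bits + '0' * max_len             # pad so the final slice is full width
    while len(out) < n_symbols:
        sym, length = table[int(bits[pos:pos + max_len], 2)]
        out.append(sym)
        pos += length
    return out

# Toy prefix code: a=0, b=10, c=11
table, w = build_decode_table({'a': '0', 'b': '10', 'c': '11'})
print(decode('010110', table, w, 4))       # 0|10|11|0 -> a b c a
```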
The major artifacts of JPEG compressed images are blocking and ringing, which are mainly due to the quantization of low frequency and high frequency DCT components, respectively. The ringing artifacts are particularly dominant in document images, where the sharp edges common in text and graphics are frequently encountered. In this paper we describe a decompression algorithm that reduces both ringing and blocking artifacts.
Post-processing can alleviate or remove artifacts introduced by compression. However, without a priori information, image enhancement schemes may fail: what is noise in one image may be important data in another. Fortunately, in image compression we have an advantage: before an image is stored or transmitted, we have access to both the original and the distorted versions. The enhanced image is compared to the original block by block to determine which blocks have been improved by the enhancement. These blocks are then flagged for post-processing in a way that is compliant with the JPEG standard and adds nothing to the compressed image's bandwidth. A single JPEG coefficient is adjusted so that the sum of the coefficients carries the flag for post-processing as the parity of the block. Half of the blocks already have the correct parity. In the other blocks, a coefficient that is close to halfway between two values is chosen and rounded in the other direction. This distorts the image by a very tiny amount. The end result is a compressed image that can be decompressed on any standard JPEG decompressor, but that can be enhanced by a sophisticated decompressor.
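A minimal sketch of the parity-flagging idea: if the coefficient sum already has the desired parity, nothing changes; otherwise the coefficient whose quantization was closest to the halfway point is rounded the other way. The argument names and the selection rule are assumptions for illustration, not the paper's exact procedure.

```python
def embed_parity_flag(coeffs, raw, flag):
    """Force the parity of sum(coeffs) to equal flag (0 or 1).

    coeffs: quantized integer DCT coefficients of one block.
    raw:    pre-quantization values on the same scale, used to find
            the coefficient whose rounding was closest to halfway.
    """
    if sum(coeffs) % 2 == flag:
        return coeffs                       # half the blocks need no change
    # Pick the coefficient cheapest to round the other way.
    best = min(range(len(coeffs)),
               key=lambda i: abs(abs(raw[i] - coeffs[i]) - 0.5))
    step = 1 if raw[best] >= coeffs[best] else -1
    out = list(coeffs)
    out[best] += step                       # tiny, JPEG-compliant distortion
    return out

q = [12, -3, 4, 0]                          # sum = 13, parity 1
r = [12.1, -3.4, 4.2, 0.1]
print(embed_parity_flag(q, r, 0))           # sum becomes even
```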
In this paper, we propose to use a class of nonlinear filters, called weighted median filters, as preprocessors for JPEG-based image coding. A theory for designing weighted median filters that maximally smooth the image subject to preserving a set of important image details is presented. An efficient algorithm is provided, and simulations are given to demonstrate the performance of these preprocessors.
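A minimal sketch of a weighted median (each sample counted "weight" times): a plain median smooths away an isolated sample, while a large center weight preserves it. The window and weights below are illustrative, not a design produced by the paper's theory.

```python
def weighted_median(values, weights):
    """Weighted median: each sample counts 'weight' times."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    acc = 0
    for v, w in pairs:
        acc += w
        if 2 * acc >= total:
            return v

window = [10, 200, 12]                   # isolated bright sample at center
print(weighted_median(window, [1, 1, 1]))  # plain median smooths it: 12
print(weighted_median(window, [1, 3, 1]))  # center weight preserves it: 200
```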
The block DCT transform plays an important role in the JPEG still image compression standard. In this research, we investigate replacing the DCT block transform in JPEG with the fully decomposed wavelet packet transform, while keeping all other building blocks the same, with the objective of improving JPEG's performance. Different wavelet packet bases have been used in experiments. Most of them give better performance than the DCT-based JPEG scheme in both the mean square error measure and visual appearance.
An efficient image source coding technique gives good compression performance at low computational complexity. This research introduces an efficient coding technique, based on pyramid coding, that involves transforming an image into an equivalent, lower entropy form prior to lossless coding. The proposed method is also a multiresolution technique that facilitates progressive image transmission.
We present a method for progressive lossless compression of still grayscale images that combines the speed of our earlier FELICS method with the progressivity of our earlier MLP method. We use MLP's pyramid-based pixel sequence, with image and error modeling and coding based on that of FELICS. In addition, we introduce a new prefix code with some advantages over the previously used Golomb and Rice codes. Our new progressive method gives compression ratios and speeds similar to those of non-progressive FELICS and those of JPEG lossless mode, also a non-progressive method. The image model in Progressive FELICS is based on a simple function of four nearby pixels. We select two of the four nearest known pixels, using the two with the middle (non-extreme) values. Then we code the pixel's intensity relative to the selected pixels, using single bits, adjusted binary codes, and simple prefix codes like Golomb codes, Rice codes, or the new family of prefix codes introduced here. We estimate the coding parameter adaptively for each context, the context being the absolute value of the difference of the predicting pixels; we adjust the adaptation statistics at the beginning of each level in the progressive pixel sequence.
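Two pieces of the model described above can be sketched directly: the middle-two-of-four context selection, and a Rice code as one of the simple prefix codes mentioned (the new prefix family itself is not reproduced here).

```python
def middle_two(neighbors):
    """Of the four nearest known pixels, keep the two middle values.

    Their range [lo, hi] is the most likely interval for the current
    pixel in the FELICS/Progressive FELICS model described above.
    """
    s = sorted(neighbors)
    return s[1], s[2]

def rice_encode(n, k):
    """Rice code of nonnegative n with parameter k:
    unary-coded quotient, then k binary remainder bits."""
    q = n >> k
    r = n & ((1 << k) - 1)
    bits = '1' * q + '0'                    # quotient in unary, 0-terminated
    if k:
        bits += format(r, '0{}b'.format(k)) # remainder in k bits
    return bits

lo, hi = middle_two([40, 52, 47, 45])
print(lo, hi)                # context interval: 45 47
print(rice_encode(9, 2))     # quotient 2, remainder 01 -> '110' + '01'
```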
A simple yet efficient image data compression method is presented in this paper. This method is based on coding only those segments of the image that are perceptually significant to the reconstruction of the image. Sequences of image pixels whose gray level differences from the pixels of the previous row exceed two prespecified thresholds are considered significant. These pixels are coded using a DPCM scheme with recursively indexed nonuniform quantizers. Simulation results show that this scheme can produce subjectively satisfactory reconstructed images at low bit rates. It is also computationally very simple, which makes it amenable to fast implementation.
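A sketch of the recursively indexed idea for nonnegative inputs, under simplifying assumptions (uniform steps, no dead zone, scalar rather than the paper's nonuniform design): a value beyond the quantizer's range is emitted as a run of the extreme index followed by a remainder index.

```python
def ri_quantize(x, step, qmax):
    """Recursively indexed scalar quantizer (nonnegative-input sketch).

    A uniform quantizer with only qmax + 1 indices: while the value
    exceeds the top level, emit the top index and subtract its value,
    so large differences become short runs of the extreme index.
    """
    idx = []
    while x >= qmax * step:
        idx.append(qmax)
        x -= qmax * step
    idx.append(round(x / step))
    return idx

def ri_reconstruct(idx, step):
    """Decoder: sum the levels of the emitted indices."""
    return sum(i * step for i in idx)

codes = ri_quantize(57, 4, 5)     # step 4, indices 0..5
print(codes, ri_reconstruct(codes, 4))   # [5, 5, 4] reconstructs to 56
```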
The Partitioned Iterated Function Systems (PIFS) approach involves partitioning an image into non-overlapping range blocks and overlapping domain blocks. For each range block, a matching domain block is found by applying a contractive affine transformation. The set of transformations uniquely defines an approximation of the original image. In this paper, we investigate the quadtree partition technique for generating range blocks. In this technique, a range block is recursively partitioned into four equally sized sub-squares when it is not covered well enough by a domain block. This method carries an inherent penalty: the existence of range blocks of different sizes requires storing the sizes and coordinates of the range blocks. We present a scheme that stores the required decoding information with very little overhead. One of the properties of fractals is that detail exists at every scale. This has led us to investigate the possibility of exploiting the scalability property of fractals to improve the compression time and ratio.
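The recursive splitting itself is easy to sketch. The domain-matching test is abstracted here into a caller-supplied predicate, which is an assumption for illustration, not the paper's matching criterion.

```python
def quadtree_partition(size, covers_well, x=0, y=0, min_size=4):
    """Recursively split a square range block into four equal
    sub-squares until covers_well(x, y, size) accepts it or the
    minimum block size is reached. Returns (x, y, size) tuples.
    """
    if covers_well(x, y, size) or size <= min_size:
        return [(x, y, size)]
    h = size // 2
    blocks = []
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        blocks += quadtree_partition(h, covers_well, x + dx, y + dy, min_size)
    return blocks

# Toy predicate: only the top-left quadrant of a 16x16 image is "flat",
# so it stays one block while the rest splits down to the minimum size.
flat = lambda x, y, s: x + s <= 8 and y + s <= 8
print(quadtree_partition(16, flat))
```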
This paper describes a segmentation-based approach for image compression. The image to be compressed is represented as regions and a contour map, with each coded separately. The proposed method is applicable to monochrome, color, and mixed image data.
The huge amount of storage needed for document images is a major hindrance to widespread use of document image processing (DIP) systems. Although current DIP systems store document images in a compressed form, there is much room for improvement. In this paper, a nearly-lossless document image compression method is investigated which preserves the relevant information of a document. The proposed approach is based on the segmentation of a document image into different blocks that are classified into one of several block classes and compressed by a block-class-specific data compression method. Whereas image and graphics blocks are compressed using standard image compression methods, text blocks are fed into a text and font recognition module and converted into their textual representation. Finally, text blocks are compressed by encoding their textual representation and enough formatting information to be able to render them as faithfully as possible to the original document. Preliminary results show that (1) the achievable compression ratios compare favorably with standard document image compression methods for all document images tested and (2) the quality of the decompressed image depends on the recognition accuracy of the text recognition module.
The discrete cosine transform (DCT) is well known for highly efficient coding performance, and it is widely used in many image compression applications. However, in low-bit-rate coding, it produces undesirable block artifacts that are visually unpleasing. In addition, in many applications, faster computation and easier VLSI implementation of the DCT are also important issues. The removal of the block artifacts and faster DCT computation are therefore of practical interest. In this paper, we outline a modified DCT computation scheme that provides a simple, efficient solution to the reduction of the block artifacts while achieving faster computation. We also derive a similar solution for the efficient computation of the inverse DCT. We have applied the new approach to the low-bit-rate coding and decoding of images. Initial simulation results on real images have verified the improved performance obtained using the proposed method over the standard JPEG method.
It is popular to use multiresolution systems for image coding and compression. However, general-purpose techniques such as filter banks and wavelets are linear. While these systems are rigorous, nonlinear features in the signals cannot be utilized within a single entity for compression. Linear filters are known to blur edges. Thus, the low-resolution images are typically blurred, carrying little information. We propose and demonstrate that edge-preserving filters such as median filters can be used to generate a multiresolution system based on the Laplacian pyramid. The signals in the detail images are small and localized in the edge areas. Principal component vector quantization (PCVQ) is used to encode the detail images. PCVQ is a tree-structured VQ which allows fast codebook design and encoding/decoding. In encoding, the quantization error at each level is fed back through the pyramid to the previous level so that ultimately all the error is confined to the first level. With simple coding methods, we demonstrate that images with a PSNR of 33 dB can be obtained at 0.66 bpp without the use of entropy coding. When the rate is decreased to 0.25 bpp, a PSNR of 30 dB can still be achieved. Combined with an earlier result, our work demonstrates that nonlinear filters can be used for multiresolution systems and image coding.
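A 1-D sketch of a pyramid built with a median in place of a linear low-pass filter: the step edge survives the smoothing, so the detail signal is zero away from the edge and nonzero only at it, as described above. The PCVQ coding and error-feedback stages are omitted, and the 3-tap window is an assumption for illustration.

```python
def median3(x):
    """Edge-preserving smoother: 3-tap median with replicated borders."""
    pad = [x[0]] + list(x) + [x[-1]]
    return [sorted(pad[i:i + 3])[1] for i in range(len(x))]

def median_pyramid(x, levels):
    """Laplacian-style pyramid built with a median filter (1-D sketch).

    Each detail signal is the difference between a level and its
    median-smoothed, downsampled-then-upsampled version.
    """
    details = []
    for _ in range(levels):
        low = median3(x)[::2]                     # smooth, then decimate
        up = [low[i // 2] for i in range(len(x))] # nearest-neighbor upsample
        details.append([a - b for a, b in zip(x, up)])
        x = low
    return details, x                             # detail images + coarse base

# A step edge: the detail image is zero except right at the edge.
d, base = median_pyramid([5, 5, 5, 90, 90, 90, 90, 90], 1)
print(d, base)
```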
In this paper, we present the results of image compression using smooth block transforms. It is shown that such smooth block transforms solve the blocking effect problem very well, especially at high compression ratios. We also discuss the quasi-optimal smooth block transform, though it leaves an open question: computational complexity.
We study the tradeoffs involved in choosing the bit allocation in a multiresolution remote image retrieval system. Such a system uses a multiresolution image coding scheme so that a user accessing the database will first see a coarse version of the images and will be able to accept or discard a given image faster, without needing to receive all the image data. We formalize the problem of choosing the bit allocation (e.g., in the two resolution case, how many bits should be given to the coarse image and the additional information, respectively?) so that the overall delay in the query is minimized. We provide analytical methods to find the optimal solution under different configurations and show how a good choice of the bit allocation results in a reduction of the overall delay in the query.
In this paper we present an overview of techniques used to implement various image and video compression algorithms using parallel processing. Approaches used can largely be divided into four areas. The first is the use of special purpose architectures designed specifically for image and video compression. An example of this is the use of an array of DSP chips to implement a version of MPEG1. The second approach is the use of VLSI techniques. These include various chip sets for JPEG and MPEG1. The third approach is algorithm driven, in which the structure of the compression algorithm describes the architecture, e.g. pyramid algorithms. The fourth approach is the implementation of algorithms on high performance parallel computers. Examples of this approach are the use of a massively parallel computer such as the MasPar MP-1 or the use of a coarse-grained machine such as the Intel Touchstone Delta.
Segmenting a block motion field into two regions has been proposed to reduce prediction errors in block-based video coding algorithms. Because the segmentation cannot be computed at the decoder without the frame to be coded, it was also proposed to compute a predicted segmentation from the previously coded frames. The performance of such an approach highly depends on the difference between the predicted and the true segmentation. Although under translational motion the predicted and true segmentation are identical, coding noise introduces distortions in the past images which result in a predicted segmentation with errors along the boundaries between the different regions. Small deviations from the hypothesis of translational motion also have a similar effect. In this paper, we propose to take into account the uncertainty in the predicted segmentation and thereby improve the motion compensated prediction. This is accomplished by optimizing the predicted values of the pixels in a segmented region with respect to the segmentation uncertainty. By also taking into account the uncertainty on the region boundaries when the segmentation is computed, we further improve the motion compensated prediction. The proposed algorithm significantly reduces the MSE and bit rate when compared to a standard block matching algorithm.
For a long time, most codecs for moving images have been of the hybrid DCT type. However, the use of the DCT as a method to code motion compensated difference images is largely based on tradition rather than proof. The DCT has proved to be nearly optimal for a certain kind of statistics that matches the real statistics of images rather well. But motion compensated difference images do not fall under this category, as their local spatial correlation is much lower. Therefore, there is no solid reason to use the DCT for coding them, other than the fact that the DCT is well known and that hardware for it is readily available. In this paper, we propose a straightforward spatial-domain way of coding images, here called optimal level allocation. It proves to be suitable for both intra and inter images.
In most motion compensated video coding schemes, the compression factor deteriorates very rapidly as the signal to noise ratio (SNR) of the decoded data stream is increased. In this research, we propose an alternative approach to block based motion compensation in which both a dense motion vector field and the compensation residual are coded at different resolutions. The new approach offers several advantages over some of the well known compression techniques such as the MPEG. First, since the signal is coded at different resolutions, a low resolution signal can be extracted from the data stream without decoding the fine resolution signal. This scalable feature offers real time decoding by a wide range of different receivers with limited computational capabilities and fast forward preview at a full frame rate. Second, for moderately low SNR range, we obtain much better compression than is possible with the conventional techniques.
We propose a spatio-amplitude scalable extension to the MPEG-2 algorithm for coding 10-bit video. The proposed scalable scheme utilizes the existing MPEG-2 tools and VLC tables. It has a base and an enhancement layer, where both layers employ 8-bit DCT/IDCT. In the proposed scheme, an 8-bit version of a given 10-bit video is coded in the base layer; the difference between the given 10-bit video and its prediction formed from the decoded 8-bit base layer data (and optionally from the decoded 10-bit data) is coded in the enhancement layer. Our results show that if a given bitrate is properly allocated between the base and enhancement layers, the 10-bit amplitude scalable scheme can significantly outperform the usual 8-bit scheme operated at the given bitrate.
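The layering above can be illustrated, with the prediction and VLC stages omitted, as a lossless split of 10-bit samples into an 8-bit base and a 2-bit residual; this is a simplified sketch, not the full MPEG-2 scalable scheme.

```python
def split_layers(x10):
    """Split 10-bit samples into an 8-bit base version and an
    enhancement residual, mimicking the two-layer structure
    described above."""
    base = [v >> 2 for v in x10]                    # 8-bit version
    enh = [v - (b << 2) for v, b in zip(x10, base)] # 2-bit residual
    return base, enh

def merge_layers(base, enh):
    """Decoder side: recombine the layers into 10-bit samples."""
    return [(b << 2) + e for b, e in zip(base, enh)]

x = [1023, 512, 7, 300]
b, e = split_layers(x)
print(b, e, merge_layers(b, e) == x)   # exact reconstruction
```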
Studio-quality video cassette recorders used for editing need intraframe compression schemes that can compress/decompress an image many times without appreciable degradation. They also require fast-forward playback of a low-quality image using only a small part of the compressed data. A system with these features is reported, based on image segmentation. A segmented approximation to the original image is losslessly encoded and a residual image is created. The lossless coding scheme is motivated by examining the constraints caused by channel errors and trick play modes. The residual image is coded by a conventional data compression technique. By preserving the main image features with a lossless code, the multi-generation performance is stabilized. Because segmentation is shift-invariant, edits that include x-y shifting between generations do not degrade performance. This paper reports a new segmentation algorithm. It has high resolution, obtained in part by a newly designed edgeness measurement. Using this, images are segmented to pixel accuracy by an algorithm which is only O(n) in complexity. Multi-generation simulations of sequences with 4:1 compression ratios are presented at both normal and shuttle speeds, using both sub-band and DCT residual coders.
A novel approach to video coding at very low bitrates is presented, which differs significantly from most previous approaches in that it uses a spline-like interpolation scheme in the spatiotemporal domain. This operator is applied to a non-uniform 3D grid (built on sets of consecutive frames) so as to allocate the information adaptively. The proposed method allows a full exploitation of intra/inter-frame correlations and a good objective and visual quality of the reconstructed sequence.
Keywords: image processing; trilinear interpolation; 3D splitting; splines; adaptive approximation; image data compression; video coding.
This paper presents a new motion compensated subband video coding algorithm with scene adaptive motion interpolation. The work builds on temporal segmentation for determining the reference frame positions, and multi-resolution motion estimation in the subband domain. In the proposed approach, the reference frames for motion estimation are adaptively selected using the temporal segmentation of the lowest spatial subband. Motion compensation is used after subband filtering because it produces better performance than subband filtering after motion compensation. The proposed scene adaptive scheme, temporally adaptive motion interpolation (TAMI), determines the number and the positions of the reference frames for motion estimation using two types of temporal segmentation algorithms. The input video is split into seven spatial subbands using a pair of low-pass and high-pass biorthogonal filters, and the TAMI algorithm is applied to the lowest subband. Motion vectors for each subband are generated by a hierarchical motion estimation approach. Block-wise DPCM and a uniform quantizer are used only for the lowest subband of an intra frame, and all the other subbands are coded by PCM with a dead-zone quantizer. Simulation results show that the scene adaptive scheme compares favorably with the fixed interpolation structure.