The difficulty of parallelizing entropy coding is increasingly limiting the data throughputs achievable in media compression. In this work we analyze the fundamental limitations, using finite-state-machine models to identify the best way of separating tasks that can be processed independently while minimizing compression losses. The analysis confirms previous work showing that effective parallelization is feasible only if the compressed data is organized in a proper way, which is quite different from conventional formats. The proposed new formats exploit the fact that optimal compression is not affected by the arrangement of the coded bits, and go further in exploiting the decreasing cost of data processing and memory. Additional advantages include the ability to use increasingly complex data-modeling techniques within this framework, and the freedom to mix different types of coding. We confirm the effectiveness of the parallelization with coding simulations running on multi-core processors, show how throughput scales with the number of cores, and analyze the additional bit-rate overhead.
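As a rough illustration of the kind of data organization that enables this parallelism, the sketch below (not the paper's format) splits the source into blocks that are compressed independently on separate cores, with a small index recording the coded block sizes; the index and the loss of cross-block context are the bit-rate overhead paid for parallel decoding. Here zlib merely stands in for the actual entropy coder.

```python
import zlib
from concurrent.futures import ProcessPoolExecutor

def parallel_compress(data: bytes, n_blocks: int):
    """Compress `data` as independently decodable blocks (hypothetical format:
    a list of block sizes followed by the concatenated coded blocks)."""
    size = max(1, -(-len(data) // n_blocks))          # ceiling division
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor() as pool:               # one block per core
        coded = list(pool.map(zlib.compress, blocks))
    index = [len(c) for c in coded]                   # overhead: block index
    return index, b"".join(coded)

def parallel_decompress(index, payload):
    """Each block could be handed to a different core; decoded sequentially
    here for simplicity."""
    out, pos = [], 0
    for n in index:
        out.append(zlib.decompress(payload[pos:pos + n]))
        pos += n
    return b"".join(out)
```

Throughput grows with the number of blocks processed concurrently, while the compressed size grows slightly because each block restarts its statistics.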
Buffer- and counter-based techniques are adequate for dealing with carry propagation in software implementations of arithmetic coding, but they create problems in hardware implementations because of the difficulty of handling worst-case scenarios, defined by very long propagations. We propose a new technique for constraining carry propagation, similar to “bit-stuffing,” but designed for encoders that generate data as bytes instead of individual bits. It is based on the fact that the encoder and decoder can maintain the same state, and both can identify the situations in which it is desirable to limit carry propagation. The new technique adjusts the coding interval in a way that corresponds to coding an unused data symbol, selected to minimize overhead. Our experimental results demonstrate that the loss in compression can be made very small using regular precision for the arithmetic operations.
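The general idea of avoiding carries through interval adjustment can be illustrated with the well-known “carryless” range-coder renormalization (Subbotin style), sketched below; this is not the paper's byte-oriented scheme, and the constants and names are illustrative. Whenever the interval becomes too small while its top byte is still undecided, the range is shrunk to end at a convenient boundary, which amounts to reserving (never coding) a small unused slice of the interval.

```python
MASK32 = 0xFFFFFFFF
TOP = 1 << 24     # a byte can be emitted once the top 8 bits are settled
BOT = 1 << 16     # below this range, force an interval adjustment

def renormalize(low, range_, out):
    """Emit settled bytes; shrink the interval whenever a carry could still
    reach bytes that were already emitted (carryless renormalization)."""
    while True:
        if ((low ^ (low + range_)) & MASK32) < TOP:
            pass                                  # top byte is settled
        elif range_ < BOT:
            range_ = (-low) & (BOT - 1)           # give up the interval slice
                                                  # that could cause a carry
        else:
            break
        out.append((low >> 24) & 0xFF)            # this byte can never change
        low = (low << 8) & MASK32
        range_ = (range_ << 8) & MASK32
    return low, range_
```

The forfeited slice is the compression overhead; the decoder performs the identical adjustment, so encoder and decoder stay in the same state.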
Light field (multi-view) displays are often designed to support horizontal parallax only (HPO), since this significantly reduces complexity compared to full parallax and is commonly assumed to cause only small losses in 3D perceptual quality. In reality, all HPO displays can produce severe geometric distortions because they use different projections in the horizontal and vertical directions. These distortions depend on the observer's position, and can be eliminated only at pre-defined viewing distances. In this paper we extend previous theoretical analyses of the problem to create tools to manage it, enabling creators of multi-view 3D content to keep the distortion within acceptable ranges for all objects in a 3D scene and all expected viewing positions. We present examples of simulated views of HPO displays, which demonstrate how the distortions can affect visual appearance and how they can be managed.
From a signal processing perspective, we examine the main factors defining the visual quality of autostereoscopic 3-D displays, which are beginning to reproduce the plenoptic function with increasing accuracy. We propose using intuitive visual tools and ray-tracing simulations to gain insight into the signal processing aspects, and we demonstrate the advantages of analyzing what we call mixed spatial-angular spaces. With this approach we can intuitively demonstrate some basic limitations of displays that use anisotropic diffusers or lens arrays. Furthermore, we propose new schemes for improved performance.
Part 7 of MPEG-21, entitled Digital Item Adaptation (DIA), is an emerging metadata standard defining protocols and descriptions that enable content adaptation for a wide variety of networks and terminals, with attention to format-independent mechanisms. The descriptions standardized in DIA provide a uniform interface not only to a variety of format-specific adaptation engines, but also to format-independent adaptation engines for scalable bit-streams. A fully format-independent engine contains a decision-taking module, operating in a media-type- and context-independent manner, cascaded with a bit-stream adaptation module that models the adaptation process as an XML transformation operating on a high-level syntax description of the bit-stream, with parameters derived from the decisions taken. In this paper, we describe the DIA descriptions and underlying mechanisms that enable such fully format-independent scalable bit-stream adaptation. Further, a new model-based, compact, and lightweight transformation language for scalable bit-streams is described for use in the bit-stream adaptation module. Fully format-independent adaptation mechanisms lead to universal adaptation engines that substantially reduce the cost of adopting new media types and formats, because the same delivery and adaptation infrastructure can be used for different types of scalable media, including proprietary and encrypted content.
It is possible to improve the features supported by devices with embedded systems by increasing the processor computing power, but this always results in higher cost, complexity, and power consumption. An interesting alternative is to use the growing networking infrastructure for remote processing and visualization, with the embedded system mainly responsible for communications and user interaction. This makes devices appear much more “intelligent” to users, at very low cost and power. In this article we explain how compression can make some of these solutions more bandwidth-efficient, enabling devices to simply decompress very rich graphical information and user interfaces that have been rendered elsewhere. The mixture of natural images and video with text, graphics, and animations in the same frame is called compound video. We present a new method for compressing compound images and video that efficiently identifies the different components during compression and applies an appropriate coding method to each. Our system uses lossless compression for graphics and text, and lossy compression with dynamically varying quality for natural images and highly detailed parts. Because it was designed for embedded systems with very limited resources, it has a small executable size and low complexity for classification, compression, and decompression. Other compression methods (e.g., MPEG) can do the same, but are very inefficient for compound content. High-level graphics languages can be bandwidth-efficient, but are much less reliable (e.g., for supporting Asian fonts) and are many orders of magnitude more complex. Numerical tests show the very significant compression gains achieved by this system.
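As a toy illustration of the kind of block classification such a codec performs (not the actual classifier of the system described above), the sketch below labels blocks with few distinct colors as text/graphics, to be coded losslessly, and the remaining blocks as natural image, to be coded lossily; the block size and color-count threshold are arbitrary choices for the example.

```python
import numpy as np

def classify_blocks(frame, block=16, max_colors=8):
    """Label each `block`x`block` tile of an RGB frame of shape (H, W, 3):
    0 = text/graphics (few distinct colors, code losslessly),
    1 = natural image (many colors, code lossily)."""
    h, w, _ = frame.shape
    labels = np.zeros((h // block, w // block), dtype=np.uint8)
    for by in range(h // block):
        for bx in range(w // block):
            tile = frame[by * block:(by + 1) * block,
                         bx * block:(bx + 1) * block]
            n_colors = len(np.unique(tile.reshape(-1, 3), axis=0))
            labels[by, bx] = 0 if n_colors <= max_colors else 1
    return labels
```

A codec built on such a map would then route each block to the lossless or lossy coder, keeping sharp text and graphics artifact-free while spending fewer bits on photographic regions.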
Recently, a methodology for representation and adaptation of arbitrary scalable bit-streams in a fully content-non-specific manner has been proposed, based on a universal model for all scalable bit-streams called the Structured Scalable Meta-format (SSM). According to this model, elementary scalable bit-streams are naturally organized in a symmetric multi-dimensional logical structure. The model parameters for a specific bit-stream, along with information guiding decision-making among possible adaptation choices, are represented in a binary or XML descriptor that accompanies the bit-stream flowing downstream. The capabilities and preferences of receiving terminals flow upstream and are also specified in binary or XML form, representing the constraints that guide adaptation. By interpreting the descriptor and the constraint specifications, a universal adaptation engine sitting on a network node can adapt the content appropriately to suit the specified needs and preferences of recipients, without knowledge of the specifics of the content, its encoding, and/or encryption. In this framework, different adaptation infrastructures are no longer needed for different types of scalable media. In this work, we show how this framework can be used to adapt fully scalable video bit-streams, specifically ones obtained with the fully scalable MC-EZBC video coding system. MC-EZBC uses a 3-D subband/wavelet transform that exploits correlation by filtering along motion trajectories, producing a compact bit-stream that combines temporal, spatial, and SNR scalability. Several adaptation use cases are presented to demonstrate the flexibility and advantages of a fully scalable video bit-stream when used in conjunction with a network adaptation engine for transmission.
The radio plays a song that you like but do not recognize. How do you find the title and the artist? Previous approaches to finding a song in a database are based on pattern recognition: in some of this previous work, features are extracted from a hummed song and decision rules are used to retrieve probable candidates from the database. Feature matching, however, has not resulted in reliable searches from microphone samples. In this work, to find the song, we process a short, microphone-recorded sample from it. Both a feature vector and a signal are precomputed for each song in a database, and both are also extracted from the recording. The database songs are first sorted by feature distance to the recording. Then normalized cross-correlation, even though nonlinear, is applied using overlap-save FFT convolution. A decision rule presents likely matches to the user for confirmation while controlling the number of false alarms shown. This system, tested using hundreds of recordings, is reliable because signals are matched. The addition of the feature-ordered search and the decision rule results in database searches five times faster than signal matching alone.
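A minimal sketch of the matching step, assuming mono signals at the same sample rate: the query is correlated against each candidate song with FFT-based convolution, and the score at every lag is normalized by the local energy of the song segment (the local mean is ignored for brevity; the exact normalization and the overlap-save implementation in the paper may differ).

```python
import numpy as np
from scipy.signal import fftconvolve

def normalized_xcorr(song, query):
    """Return the energy-normalized cross-correlation of `query` at every
    starting position inside `song`; the peak value scores the match."""
    n = len(query)
    q = query - query.mean()
    q /= np.linalg.norm(q) + 1e-12
    corr = fftconvolve(song, q[::-1], mode="valid")       # FFT convolution
    csum2 = np.concatenate(([0.0], np.cumsum(song.astype(np.float64) ** 2)))
    energy = csum2[n:] - csum2[:-n]                        # per-window energy
    return corr / (np.sqrt(energy) + 1e-12)

# best_offset = np.argmax(normalized_xcorr(song, query))
```

Because this signal-level match is expensive, candidates are tried in order of feature distance and the search stops as soon as the decision rule accepts a match.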
This paper motivates and develops an end-to-end methodology for representation and adaptation of arbitrary scalable content in a fully content-non-specific manner. Scalable bit-streams are naturally organized in a symmetric multi-dimensional logical structure, and any adaptation is essentially a downward manipulation of this structure. Higher logical constructs are defined on top of this multi-tier structure to make the model more generally applicable to a variety of bit-streams involving rich media. The resultant composite model is referred to as the Structured Scalable Meta-format (SSM). Apart from the implicit bit-stream constraints that must be satisfied to make a scalable bit-stream SSM-compliant, two other elements need to be formalized to build a complete adaptation and delivery infrastructure based on SSM: a binary or XML description of the structure of the bit-stream resource and of how it is to be manipulated to obtain various adapted versions; and a binary or XML specification of outbound constraints derived from the capabilities and preferences of receiving terminals. By interpreting the descriptor and the constraint specifications, a universal adaptation engine can adapt the content appropriately to suit the specified needs and preferences of recipients, without knowledge of the specifics of the content, its encoding, and/or encryption. With universal adaptation engines, different adaptation infrastructures are no longer needed for different types of scalable media.
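The sketch below illustrates the general idea of adaptation as a downward manipulation of a layered, multi-dimensional structure; it is a generic illustration, not the SSM descriptor syntax. A bit-stream is modeled as atoms indexed by layer coordinates, and adaptation simply keeps the atoms whose indices satisfy the receiver's constraints, without inspecting their contents.

```python
def adapt(atoms, limits):
    """`atoms` maps a tuple of layer indices, e.g. (temporal, spatial, snr),
    to a chunk of coded bytes; `limits` gives the highest layer the receiver
    can use in each dimension.  Adaptation keeps only the allowed atoms,
    without decoding (or decrypting) their contents."""
    kept = {idx: data for idx, data in atoms.items()
            if all(i <= m for i, m in zip(idx, limits))}
    return b"".join(kept[idx] for idx in sorted(kept))

# Example: keep two temporal layers, three spatial layers, base SNR only.
# adapted = adapt(atoms, limits=(1, 2, 0))
```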
High-quality video compression is necessary both for reducing transmission bandwidth and for archiving applications. We propose a compression scheme which, depending on the available bandwidth, can vary from lossless to lossy compression, but always with guaranteed quality. In the case of lossless compression, the customer receives the original content without any loss. Even the lower compression ratios obtained with lossless compression can represent significant savings in communication bandwidth. In the case of lossy compression, the maximum error between the recovered and the original video is mathematically bounded, and the amount of compression achieved is a function of the error bounds. Furthermore, the errors are statistically independent of the video content, and thus guaranteed not to create any type of artifact. As a result, the recovered video has the same quality, visually indistinguishable from the original, at all times and under all motion conditions.
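One standard way to obtain a mathematically bounded maximum error, as in near-lossless coding (the actual scheme proposed here may differ), is uniform quantization with step 2*delta+1, sketched below for integer samples: the reconstruction error never exceeds delta, and delta = 0 gives lossless coding.

```python
import numpy as np

def quantize(x, delta):
    """Map integer samples to indices; the reconstruction error is <= delta."""
    step = 2 * delta + 1
    return np.floor_divide(x + delta, step)

def dequantize(q, delta):
    step = 2 * delta + 1
    return q * step

# |x - dequantize(quantize(x, d), d)| <= d  for integer x and d >= 0
```

Larger error bounds give coarser indices and therefore higher compression, which is how the rate can be traded against the guaranteed quality.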
We propose a new low-complexity entropy-coding method for coding waveform signals. It is based on the combination of two schemes: (1) an alphabet partitioning method that reduces the complexity of the entropy-coding process; and (2) a new recursive set partitioning entropy-coding process that achieves rates smaller than the first-order entropy even with fast adaptive Huffman codecs. Numerical results from its application to lossy and lossless image compression show the efficacy of the new method, which is comparable to the best known methods.
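The first of the two schemes can be illustrated with a simple sketch (the actual partition design in the paper may differ): a large-alphabet symbol is split into a set number, which is entropy-coded, and an offset within the set, which is sent in raw bits; grouping values into sets of size 2^k keeps the coded alphabet small at a negligible rate cost.

```python
def partition(value):
    """Split a non-negative integer into (set_index, extra_bits, offset):
    set 0 = {0}, set k = [2**(k-1), 2**k), so the offset needs k-1 raw bits."""
    if value == 0:
        return 0, 0, 0
    set_index = value.bit_length()            # 1, 2, 3, ...
    extra_bits = set_index - 1
    offset = value - (1 << extra_bits)        # position inside the set
    return set_index, extra_bits, offset

def unpartition(set_index, offset):
    return 0 if set_index == 0 else (1 << (set_index - 1)) + offset

# The small set_index alphabet is entropy-coded (e.g., adaptive Huffman);
# the offset is stored verbatim in `extra_bits` bits.
```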
Wavelet-based image compression is proving to be a very effective technique for medical images, giving significantly better results than the JPEG algorithm. A novel scheme for encoding wavelet coefficients, termed set partitioning in hierarchical trees, has recently been proposed and yields significantly better compression than more standard methods. We report the results of experiments comparing such coding to more conventional wavelet compression and to JPEG compression on several types of medical images.
In this paper a new image transformation suited for reversible (lossless) image compression is presented. It uses a simple pyramid multiresolution scheme that is enhanced via predictive coding. The new transformation is similar to a subband decomposition, but it uses only integer operations. The number of bits required to represent the transformed image is kept small through careful scaling and truncations. The resulting lossless compression rates are smaller than those obtained with predictive coding of equivalent complexity. It is also shown that the new transform can be used effectively, with the same coding algorithm, for both lossless and lossy compression. When used for lossy compression, its rate-distortion performance is comparable to that of other efficient lossy compression methods.
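The integer-only base of such a pyramid can be illustrated with the one-dimensional S step on a pair of samples (the transform described above adds a prediction stage on top of this): the average is rounded down, the difference is kept exactly, and the pair can be recovered without any loss.

```python
def s_forward(a, b):
    """Integer 'S' step: low-pass = floor mean, high-pass = difference."""
    low = (a + b) >> 1
    high = a - b
    return low, high

def s_inverse(low, high):
    a = low + ((high + 1) >> 1)   # undoes the floor in the forward step
    b = a - high
    return a, b

# s_inverse(*s_forward(a, b)) == (a, b) for any integers a, b
```

Applying such a step recursively to rows and columns, and adding prediction of the high-pass terms, yields an integer multiresolution representation suitable for both lossless and lossy coding.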