Significant research has been performed on the use of directional transforms for the compression of still imagery, in
particular on the application of block-based and segmentation driven directional adaptive discrete wavelet transforms.
However, all of the proposed methodologies suffer from the extra side-information that needs to be encoded; this
encoding overhead and the added complexity are unfortunately not negligible. This paper describes various considerations
and trade-offs made in the search for a practical solution for using directional adaptive transforms in
still image coding. We propose two codec instantiations respectively based upon quadtree-coding (QT-L) and
JPEG 2000's EBCOT engine and discuss various experimental results.
The H.264/AVC video coding standard currently represents the state-of-the-art in video compression technology. The
initial version of the standard only supported a single quantization step size for all the coefficients in a transformed
block. Later, support for custom quantization tables was added, which makes it possible to specify the quantization
step size for each coefficient in a transformed block independently. In this way, different quantization can be applied to the high-frequency
and low-frequency coefficients, reflecting the human visual system's different sensitivity to high-frequency
and low-frequency spatial variations in the signal.
In this paper, we design custom quantization tables taking into account the properties of the human visual system as well
as the viewing conditions. Our proposed design is based on a model for the human visual system's contrast sensitivity
function, which specifies the contrast sensitivity as a function of the spatial frequency of the signal. By calculating the
spatial frequencies corresponding to each of the transform's basis functions, taking into account viewing distance and dot
pitch of the screen, the sensitivity of the human visual system to variations in the transform coefficient corresponding to
each basis function can be determined and used to define the corresponding quantization step size. Experimental results,
in which the video quality is measured using VQM, show that the designed quantization tables yield improved
performance compared to uniform quantization and to the default quantization tables provided as part of the reference software.
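The step-size derivation described above can be sketched as follows. This is a minimal illustration, not the paper's exact design: the Mannos-Sakrison-style CSF model, the 4x4 block size, the dot pitch, the viewing distance and the sensitivity floor are all assumed values.

```python
import math

def csf(f):
    # Contrast sensitivity at spatial frequency f (cycles/degree);
    # a Mannos-Sakrison-style model, used here purely as an illustration.
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-(0.114 * f) ** 1.1)

def cycles_per_degree(u, block=4, dot_pitch_mm=0.25, distance_mm=500.0):
    # Basis function index u corresponds to u / (2*block) cycles per pixel;
    # dot pitch and viewing distance convert this to cycles per degree.
    cyc_per_mm = (u / (2.0 * block)) / dot_pitch_mm
    mm_per_degree = 2.0 * distance_mm * math.tan(math.radians(0.5))
    return cyc_per_mm * mm_per_degree

def quant_table(block=4, base_step=16.0):
    # Quantization step inversely proportional to the sensitivity at each
    # basis function's radial frequency; the floor keeps low-frequency
    # (notably DC) steps bounded.
    table = []
    for v in range(block):
        row = []
        for u in range(block):
            f = math.hypot(cycles_per_degree(u, block),
                           cycles_per_degree(v, block))
            row.append(round(base_step / max(csf(f), 0.05)))
        table.append(row)
    return table
```

With these illustrative parameters, coefficients near the CSF peak (a few cycles/degree) receive the smallest steps, while the highest-frequency coefficients are quantized more coarsely.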
In this paper, we examine the rate-distortion performance in terms of perceptual quality of JPEG XR (ISO/IEC 29199-2 |
ITU-T Rec. T.832) [1] and earlier standardized image compression algorithms such as JPEG (ISO/IEC 10918-1 | ITU-T
Rec. T.81) [2] and JPEG 2000 (ISO/IEC 15444-1 | ITU-T Rec. T.800) [3]. Unfortunately, objective visual quality metrics (like
MSE, PSNR, VQM, SSIM, etc.) do not always correlate well with the actual perceived image quality. In some specific
cases, it is even possible that certain visible coding artifacts remain undetectable by these objective visual quality tests.
As such, we conducted a series of subjective visual quality assessment tests to measure the visual performance of JPEG
XR, JPEG 2000 and JPEG. This paper describes the design of the subjective visual quality assessment experiments,
addressing the encountered difficulties and potential pitfalls. Our results indicate that for high bit-rates (i.e. more than 1
bpp) all three codecs have a more or less equal overall performance. However, as expected, at lower bit-rates JPEG
performs significantly worse than JPEG 2000 and JPEG XR for every tested image. On the other hand, both JPEG 2000
and JPEG XR appear to be very competitive in these low bit-rate ranges. Only for specific image content types (e.g.
smooth gradient surfaces) does JPEG XR appear to have some difficulties. Nevertheless, setting aside the fact that JPEG 2000
offers more functionality features than JPEG XR, the latter performed very well for most images, almost on par with
JPEG 2000. In conclusion, the results of the subjective visual quality assessment tests show that JPEG XR
successfully passed our verification experiments for low dynamic range imagery.
The rapid dissemination of media technologies has led to an increase in unauthorized copying and distribution
of digital media. Digital watermarking, i.e. embedding information in the multimedia signal in a robust and
imperceptible manner, can tackle this problem. Recently, there has been a huge growth in the number of
different terminals and connections that can be used to consume multimedia. To tackle the resulting distribution
challenges, scalable coding is often employed. Scalable coding allows the adaptation of a single bit-stream to
varying terminal and transmission characteristics. As a result of this evolution, watermarking techniques that
are robust against scalable compression become essential in order to control illegal copying. In this paper, a
watermarking technique resilient against scalable video compression using the state-of-the-art H.264/SVC codec
is therefore proposed and evaluated.
In medical networked applications, the server-generated application view, consisting of medical image content and
synthetic text/GUI elements, must be compressed and transmitted to the client. To adapt to the local content
characteristics, the application view is divided into rectangular patches, which are classified into content classes: medical
image patches, synthetic image patches consisting of text on a uniform/natural/medical image background and synthetic
image patches consisting of GUI elements on a uniform/natural/medical image background. Each patch is thereafter
compressed using a technique yielding perceptually optimal performance for the identified content class. The goal of this
paper is to identify this optimal technique, given a set of candidate schemes. For this purpose, a simulation framework is
used which simulates different types of compression and measures the perceived differences between the compressed
and original images, taking into account the display characteristics. In a first experiment, JPEG is used to code all
patches and the optimal chroma subsampling and quantization parameters are derived for different content classes. The
results show that 4:4:4 chroma subsampling is the best choice, regardless of the content type. Furthermore, frequency-dependent
quantization yields better compression performance than uniform quantization, except for content containing a
significant number of very sharp edges. In a second experiment, each patch can be coded using JPEG, JPEG XR or JPEG
2000. On average, JPEG 2000 outperforms JPEG and JPEG XR for most medical images and for patches containing text.
However, for histopathology or tissue patches and for patches containing GUI elements, classical JPEG compression
outperforms the other two techniques.
The Joint Photographic Experts Group (JPEG) committee is a joint working group of the International Organization for
Standardization (ISO) and the International Electrotechnical Commission (IEC). The word "Joint" in JPEG however does
not refer to the joint efforts of ISO and IEC, but to the fact that the JPEG activities are the result of an additional
collaboration with the International Telecommunication Union (ITU). Inspired by technology and market evolutions, i.e.
the advent of wavelet technology and the need for additional functionality such as scalability, the JPEG committee launched
in 1997 a new standardization process that would result, in 2000, in a new standard: JPEG 2000. JPEG 2000 is a
collection of standard parts, which together shape the complete toolset. Currently, the JPEG 2000 standard is composed
of 13 parts. In this paper, we review these parts and additionally address recent standardization initiatives within the
JPEG committee such as JPSearch, JPEG-XR and AIC.
In the past decade the use of digital data has increased significantly. The advantages of digital data are, amongst others, easy editing; fast, cheap and cross-platform distribution; and compact storage. The most crucial disadvantages are unauthorized copying and copyright infringement, through which authors and license holders can suffer
considerable financial losses. Many inexpensive methods are readily available for editing digital data and, unlike analog information, the reproduction in the digital case is simple and robust. Hence, there is great interest in developing technology that helps to protect the integrity of a digital work and the copyrights of its owners. Watermarking, which is the embedding of a signal (known as the watermark) into the original digital data, is one method that has been proposed for the protection of digital media elements such as audio, video and images. In this article, we examine watermarking schemes for still images, based on selective quantization of the coefficients of a wavelet transformed image, i.e. sparse quantization-index modulation (QIM) watermarking. Different grouping schemes for the wavelet coefficients are evaluated and experimentally verified for robustness against several attacks. Wavelet tree-based grouping schemes yield a slightly improved performance over block-based
grouping schemes. Additionally, the impact of the deployment of error correction codes on the most promising configurations is examined. The utilization of BCH-codes (Bose, Ray-Chaudhuri, Hocquenghem) results in an improved robustness as long as the capacity of the error codes is not exceeded (cliff-effect).
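The embedding principle behind QIM can be illustrated with a scalar sketch: a coefficient is quantized onto one of two interleaved lattices depending on the watermark bit, and detection picks the nearest lattice. This is a generic illustration only; the paper applies QIM sparsely to groups of wavelet coefficients, and the step size delta below is an assumed value.

```python
import math

def qim_embed(coeff, bit, delta=8.0):
    # Quantize the coefficient onto one of two interleaved lattices,
    # offset by +/- delta/4 depending on the watermark bit.
    d = delta / 4.0 if bit else -delta / 4.0
    return delta * math.floor((coeff - d) / delta + 0.5) + d

def qim_detect(coeff, delta=8.0):
    # Decide which lattice the (possibly attacked) coefficient is closest to.
    e0 = abs(qim_embed(coeff, 0, delta) - coeff)
    e1 = abs(qim_embed(coeff, 1, delta) - coeff)
    return int(e1 < e0)
```

The bit survives any perturbation smaller than delta/4, which is the basic trade-off between robustness and imperceptibility that the choice of delta controls.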
Using the scalable video coding (SVC) extension of the H.264/AVC video coding standard, encoding a video sequence
yields a quality, resolution, and frame-rate scalable bit-stream. This means that a version of the video sequence with a
lower resolution, quality and/or frame rate can be obtained by extracting selected parts of the scalable bit-stream, without
the need for re-encoding. In this way, easy adaptation of the video material to the end-users' device (computational
power and display resolution) and channel characteristics is possible. In this paper, the use of unequal error protection
(UEP) for error-resilient transmission of H.264/SVC-encoded video sequences is studied. By using unequal error
protection, graceful degradation of the video quality is achieved when the targeted packet loss probability is exceeded. In
contrast, using equal error protection (EEP), an immediate and dramatic drop in quality is observed under these conditions.
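The graceful-degradation effect can be shown with a small numeric sketch, assuming Reed-Solomon-style erasure codes where an RS(n, k) protected layer remains decodable if at most n-k of its n packets are lost. The code parameters and loss rate below are illustrative, not the paper's configuration.

```python
from math import comb

def decode_prob(n, k, p):
    # Probability that at most n-k of n packets are lost, i.e. that an
    # RS(n, k) protected layer can still be decoded at loss rate p.
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(n - k + 1))

# UEP: stronger protection (more parity) for the base layer;
# EEP: the same total parity spread equally over all layers.
uep = [(16, 10), (16, 12), (16, 14)]     # (n, k) per layer, base layer first
eep = [(16, 12), (16, 12), (16, 12)]

p_loss = 0.20                            # above the targeted loss probability
print("UEP:", [round(decode_prob(n, k, p_loss), 3) for n, k in uep])
print("EEP:", [round(decode_prob(n, k, p_loss), 3) for n, k in eep])
```

When the loss rate exceeds the design target, the heavily protected base layer in the UEP configuration is still decodable with high probability while enhancement layers fail first, whereas with EEP all layers approach failure together, producing the abrupt quality drop.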
In this paper, a novel compressed-domain motion detection technique, operating on MPEG-2-encoded video, is
combined with H.264 flexible macroblock ordering (FMO) to achieve efficient, error-resilient MPEG-2 to H.264
transcoding. The proposed motion detection technique first extracts the motion information from the MPEG-2-encoded
bit-stream. Starting from this information, moving regions are detected using a region growing approach. The
macroblocks in these moving regions are subsequently encoded separately from those in background regions using FMO.
This can be used to increase error resilience and/or to realize additional bit-rate savings compared to traditional transcoding approaches.
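The region-growing step can be sketched on a grid of per-macroblock motion magnitudes extracted from the MPEG-2 stream. This is a simplified illustration: the threshold, 4-connectivity and single-threshold growing rule are assumptions, not the paper's exact procedure.

```python
def grow_moving_regions(motion, thresh=1.0):
    # motion: 2-D grid of per-macroblock motion magnitudes. Blocks above
    # the threshold seed 4-connected region growing; the returned label
    # map assigns each moving macroblock to a region (0 = background).
    h, w = len(motion), len(motion[0])
    labels = [[0] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if motion[y][x] > thresh and labels[y][x] == 0:
                regions += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if (0 <= cy < h and 0 <= cx < w
                            and labels[cy][cx] == 0
                            and motion[cy][cx] > thresh):
                        labels[cy][cx] = regions
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return labels, regions
```

The resulting label map determines which macroblocks are assigned to foreground slice groups via FMO, separating moving regions from the background.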
In this paper, two systems for low-complexity MPEG-2 to H.264 transcoding are presented. Both approaches reuse the
MPEG-2 motion information in order to avoid computationally expensive H.264 motion estimation. In the first approach,
inter- and intra-coded macroblocks are treated separately. Since H.264 applies intra-prediction, while MPEG-2 does not,
intra-blocks are completely decoded and re-encoded. For inter-coded macroblocks, the MPEG-2 macroblock types and
motion vectors are first converted to their H.264 equivalents. Thereafter, the quantized DCT coefficients of the
prediction residuals are dequantized and translated to equivalent H.264 IT coefficients using a single-step DCT-to-IT
transform. The H.264 quantization of the IT coefficients is steered by a rate-control algorithm enforcing a constant bit-rate.
While this system is computationally very efficient, it suffers from encoder-decoder drift due to its open-loop architecture.
The second transcoding solution eliminates encoder-decoder drift by performing full MPEG-2 decoding followed by
rate-controlled H.264 encoding using the motion information present in the MPEG-2 source material. This closed-loop
solution additionally allows dyadic resolution scaling by performing downscaling after the MPEG-2 decoding and
appropriate MPEG-2 to H.264 macroblock type and motion vector conversion.
Experimental results show that, in terms of PSNR, the closed-loop transcoder significantly outperforms the open-loop
solution. The latter introduces drift, mainly as a result of the difference in sub-pixel interpolation between H.264 and
MPEG-2. Complexity-wise, the closed-loop transcoder requires on average 30 % more processing time than the open-loop
system. The closed-loop transcoder is shown to deliver compression performance comparable to standard MPEG-2 decoding followed by full H.264 encoding.
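The single-step DCT-to-IT conversion used in the open-loop transcoder can be sketched for a 4x4 block: since the orthonormal DCT matrix C satisfies C.T @ C = I, applying A = H @ C.T directly to the DCT coefficients yields the H.264 transform coefficients without reconstructing pixels. Note this is an illustration only; MPEG-2 actually uses an 8x8 DCT, so the real kernel maps one 8x8 DCT block onto four 4x4 IT blocks.

```python
import numpy as np

def dct_matrix(n=4):
    # Orthonormal DCT-II matrix (C @ C.T == I).
    C = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            a = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
            C[k, i] = a * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return C

# H.264 forward 4x4 integer core transform.
H = np.array([[1, 1, 1, 1],
              [2, 1, -1, -2],
              [1, -1, -1, 1],
              [1, -2, 2, -1]], dtype=float)

C = dct_matrix()
A = H @ C.T          # single-step conversion kernel: Y = A @ X @ A.T

x = np.arange(16, dtype=float).reshape(4, 4)   # a sample residual block
X = C @ x @ C.T                                # its DCT coefficients
Y_direct = A @ X @ A.T                         # DCT-domain conversion
Y_ref = H @ x @ H.T                            # decode-and-retransform reference
assert np.allclose(Y_direct, Y_ref)
```

The identity A @ X @ A.T = H @ (C.T @ X @ C) @ H.T shows why the single step is exact: the inner product reconstructs the spatial block implicitly.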
Modern video coding applications require transmission of video data over variable-bandwidth channels to a variety of terminals with different screen resolutions and available computational power. Scalable video coding is needed to optimally support these applications. Recently proposed wavelet-based video codecs employing spatial domain motion compensated temporal filtering (SDMCTF) provide quality, resolution and frame-rate scalability while delivering compression performance comparable to that of the state-of-the-art non-scalable H.264-codec. These codecs require scalable coding of the motion vectors in order to support a large range of bit-rates with optimal compression efficiency. Scalable motion vector coding algorithms based on the integer wavelet transform followed by embedded coding of the wavelet coefficients were recently proposed. In this paper, a new and fundamentally different scalable motion vector codec (MVC) using median-based motion vector prediction is proposed. Extensive experimental results demonstrate that the proposed MVC systematically outperforms the wavelet-based state-of-the-art solutions. To be able to take advantage of the proposed scalable MVC, a rate allocation mechanism capable of optimally dividing the available rate among texture and motion information is required. Two rate allocation strategies are proposed and compared. The proposed MVC and rate allocation schemes are incorporated into an SDMCTF-based video codec and the benefits of scalable motion vector coding are experimentally demonstrated.
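The median-based prediction at the heart of the proposed MVC can be sketched as follows. This is a generic illustration of the predictor only; the codec's exact neighbour selection and the entropy coding of the residuals are not reproduced here.

```python
def predict_mv(left, top, topright):
    # Component-wise median of the three causal neighbouring motion
    # vectors, in the style of the classic H.264 median predictor.
    return tuple(sorted(c)[1] for c in zip(left, top, topright))

def mv_residual(mv, left, top, topright):
    # Only the (typically small) prediction residual needs to be
    # entropy-coded, which is where the rate savings come from.
    pred = predict_mv(left, top, topright)
    return (mv[0] - pred[0], mv[1] - pred[1])
```

Because neighbouring motion vectors are highly correlated, the residuals cluster around zero, giving the entropy coder a much more compact distribution than the raw vectors.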
Error protection and concealment of motion vectors are of prime concern when video is transmitted over variable-bandwidth error-prone channels, such as wireless channels. In this paper, we investigate the influence of corrupted motion vectors in video coding based on motion-compensated temporal filtering, and develop various error protection and concealment mechanisms for this class of codecs. The experimental results show that our proposed motion vector coding technique significantly increases the robustness against transmission errors and generates performance gains of up to 7 dB compared with the original coding technique at the cost of less than 4% in terms of rate. It is also shown that our proposed spatial error-concealment mechanism leads to additional performance gains of up to 4 dB.
Recently, scalable video codecs based upon motion compensated temporal filtering (MCTF) have received a lot of attention. These video coding schemes perform on par with H.264, the current state-of-the-art video coding standard, while providing quality, resolution and frame-rate scalability. In this paper we aim to evaluate the rate-distortion performance of MCTF-based codecs when applied to volumetric data sets. In our experiments the lossy coding efficiency of an MCTF-based codec is compared to that of the 3D QT-L codec, which represents the state-of-the-art in volumetric coding. The results show that the MCTF-based coder does not provide better PSNR performance than the 3D QT-L codec. However, if the rate spent to code the motion vector information is not taken into account, the performance of the MCTF-based codec at low rates is on par or better than the performance of the 3D QT-L. This leads us to conclude that possible distortion improvements obtained by using MCTF instead of a regular wavelet-transform in the temporal direction do not outweigh the extra rate needed to encode the motion vector information.
Video transmission over variable-bandwidth networks requires instantaneous bit-rate adaptation at the server site to provide an acceptable decoding quality. For this purpose, recent developments in video coding aim at providing a fully embedded bit-stream with seamless adaptation capabilities in bit-rate, frame-rate and resolution. A new promising technology in this context is wavelet-based video coding. Wavelets have already demonstrated their potential for quality and resolution scalability in still-image coding. This led to the investigation of various schemes for the compression of video, exploiting similar principles to generate embedded bit-streams. In this paper we present scalable wavelet-based
video-coding technology with competitive rate-distortion behavior compared to standardized non-scalable technology.
Recently, the JPEG2000 committee (ISO/IEC JTC1/SC29/WG1) decided to start up a new standardization activity to support the encoding of volumetric and floating-point data sets: Part 10 - Coding Volumetric and Floating-point Data (JP3D). This future standard will support functionalities like resolution and quality scalability and region-of-interest coding, while exploiting the entropy in the additional third dimension to improve the rate-distortion performance. In this paper, we give an overview of the markets and application areas targeted by JP3D, the imposed requirements and the considered algorithms with a specific focus on the realization of the region-of-interest functionality.
Techniques for full scalability with motion-compensated temporal filtering (MCTF) in the wavelet-domain (in-band) are presented in this paper. The application of MCTF in the wavelet domain is performed after the production of the overcomplete discrete wavelet transform from the critically-sampled decomposition, a process that occurs at both the encoder and decoder side. This process, which is a complete-to-overcomplete discrete wavelet transform, is critical for the efficiency of the system with respect to scalability, coding performance and complexity. We analyze these aspects of the system and set the necessary constraints for drift-free video coding with in-band MCTF. As a result, the proposed architecture permits the independent operation of MCTF within different resolution levels or even different subbands of the transform and allows the successive refinement of the video information in resolution, frame-rate and quality.
The increasing use of three-dimensional imaging modalities triggers the need for efficient techniques to transport and store the related volumetric data. Desired properties like quality and resolution scalability, region-of-interest coding, lossy-to-lossless coding and excellent rate-distortion characteristics at both low and high bit-rates are inherently supported by wavelet-based compression tools. In this paper a new 3D wavelet-based compression engine is proposed and compared against a classical 3D JPEG-based coder and a state-of-the-art 3D SPIHT coder for different medical imaging modalities. Furthermore, we evaluate the performance of a selected set of lossless integer lifting kernels. We demonstrate that the performance of the proposed coder is superior for lossless coding, and competitive with 3D SPIHT at lower bit-rates.
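One of the best-known lossless integer lifting kernels of the kind evaluated above is the reversible 5/3 filter used in JPEG 2000. A one-dimensional sketch (with simple index clamping at the boundaries rather than full symmetric extension, which is a simplification):

```python
def lift53_forward(x):
    # Reversible 5/3 lifting: predict odd samples from even neighbours,
    # then update even samples; integer-to-integer and exactly invertible.
    even, odd = x[0::2], x[1::2]
    d = [odd[i] - ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
         for i in range(len(odd))]
    s = [even[i] + ((d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) >> 2)
         for i in range(len(even))]
    return s, d                      # lowpass and highpass subbands

def lift53_inverse(s, d):
    # Undo the lifting steps in reverse order: the identical integer
    # operations are subtracted back, so reconstruction is exact.
    even = [s[i] - ((d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) >> 2)
            for i in range(len(s))]
    odd = [d[i] + ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
           for i in range(len(d))]
    x = [0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd
    return x
```

Because every lifting step is an integer operation that the inverse undoes exactly, the kernel supports lossy-to-lossless coding: truncating the coded subbands gives a lossy approximation, while decoding them fully reproduces the original samples bit-exactly.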