Codecs such as H.264/AVC involve computationally intensive tasks that often prohibit real-time implementation. It
has been observed that the complexity of such video encoders can be tuned gracefully to a desired level through the use
of a smaller set of macroblock types in mode decision and a lower motion vector precision in motion estimation. The
rate-distortion performance, however, is affected as a consequence. In this paper, we propose a flexible syntax
mechanism (FSM) to tune the encoder complexity while maintaining a sufficient rate-distortion performance. The key
idea behind the proposed FSM is twofold: first, the specification, at a higher level of the bitstream syntax, of both the subset of macroblock types and the precision of the motion vectors to be evaluated by the encoder; and second, the redesign of the entropy coders accordingly to effectively represent the selected macroblock types and the motion
vectors. Since the entropy coding is optimized in terms of the bitrate consumption specifically for the subset of
macroblock modes and the motion vector precision, the rate-distortion performance will be enhanced compared to the
scenario where identical entropy codes are adopted regardless. Another advantage of our approach is the intrinsic
scalability in complexity for the application of video encoding under different complexity constraints. The proposed
approach may be considered for the next generation of video codecs with flexible complexity profiles.
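A minimal sketch of the kind of rate-distortion-optimized mode decision the FSM would restrict is given below. This is an illustrative Python fragment, not the paper's implementation: the mode names and the `evaluate` callback are assumptions standing in for the actual encoder.

```python
# Hypothetical sketch: RD-optimized mode decision over a syntax-signaled
# subset of macroblock modes. Restricting `allowed_modes` lowers encoder
# complexity; the redesigned entropy coder would then need to represent
# only this subset.

FULL_MODE_SET = ["SKIP", "16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]

def choose_mode(block, allowed_modes, lmbda, evaluate):
    """Pick the allowed mode minimizing J = D + lambda * R.

    `evaluate(block, mode)` returns (distortion, rate) for the mode;
    it is an assumed callback standing in for the actual encoder.
    """
    best_mode, best_cost = None, float("inf")
    for mode in allowed_modes:  # fewer candidate modes => lower complexity
        d, r = evaluate(block, mode)
        cost = d + lmbda * r
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

Shrinking `allowed_modes` trades rate-distortion performance for speed, which is exactly the trade-off the FSM exposes at the syntax level.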
In this paper, we investigate the efficacy of the weighted prediction feature provided within the H.264/AVC video coding
standard for error resilient video streaming. Leaky prediction has been proposed for scalable and non-scalable video
coding to effectively combat transport errors. However, all prior results are based on non-standard coding methods, and no results have been reported on the effectiveness of leaky prediction as supported by a video coding standard. The
weighted prediction feature in H.264/AVC was originally designed to improve coding efficiency, especially in the
presence of fading in video sequences. This paper presents a performance analysis of the H.264/AVC weighted prediction
feature that balances the trade-off between coding efficiency and error resilience. A theoretical analysis of rate-distortion
performance of leaky prediction is provided and closed-form rate-distortion functions are derived for the error free and
error drift scenarios. The theoretical results conform well to the operational results with respect to different choices of
the leaky factor.
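The leaky prediction at the heart of this trade-off can be sketched in a few lines. This is an illustrative Python fragment under the usual formulation, not the standard's implementation; `alpha` plays the role of the leaky factor.

```python
import numpy as np

def leaky_reference(base_recon, enh_recon, alpha):
    """Form the motion-compensated prediction reference as a leaky
    combination of the base-layer and enhancement-layer reconstructions.

    alpha = 0 uses only the base layer (most error resilient);
    alpha = 1 uses the full enhancement layer (most efficient, but
    prediction drift propagates when enhancement data is lost).
    """
    assert 0.0 <= alpha <= 1.0
    return (1.0 - alpha) * base_recon + alpha * enh_recon
```

Intermediate values of `alpha` attenuate any drift geometrically over successive frames, which is why the leaky factor controls the efficiency/resilience balance.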
Motion estimation is the most important step in video compression. Most of the current video compression
systems use forward motion estimation, where motion information is derived at the encoder and sent to the
decoder over the channel. Backward motion estimation does not derive an explicit representation of motion
at the encoder. Instead, the encoder implicitly embeds the motion information in an alternative subspace.
Most recently, an algorithm that adopts least-square prediction (LSP) for backward motion estimation has
shown great potential to further improve coding efficiency. Forward and backward motion estimation each have their own advantages and disadvantages, and each is suited to handling a specific category of
patterns. In this paper, we propose a novel approach that combines both forward motion estimation and backward
motion estimation in one framework to adaptively exploit the local motion characteristics in an arbitrary video
sequence, thus achieving better coding efficiency. We refer to this as Content-Adaptive Motion Estimation
(CoME). The encoder in the proposed system is able to adjust the motion estimation method in a rate-distortion
optimized manner. According to the experimental results, CoME reduces the data rate in both lossless and lossy coding.
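The per-block forward/backward choice described above can be sketched as follows. The cost callbacks and the one-bit signaling overhead are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: rate-distortion-optimized choice between forward
# motion estimation (explicit motion vector transmitted) and backward
# estimation via LSP (motion derived implicitly at the decoder).

def choose_estimation(block, lmbda, cost_forward, cost_backward, flag_bits=1):
    """Return the estimation method minimizing J = D + lambda * R.

    `cost_forward(block)` and `cost_backward(block)` are assumed
    callbacks returning (distortion, rate); the forward rate includes
    the motion-vector bits, the backward rate does not.  One flag bit
    per block is assumed to signal the choice to the decoder.
    """
    d_f, r_f = cost_forward(block)
    d_b, r_b = cost_backward(block)
    j_f = d_f + lmbda * (r_f + flag_bits)
    j_b = d_b + lmbda * (r_b + flag_bits)
    return ("forward", j_f) if j_f <= j_b else ("backward", j_b)
```

At low `lmbda` (rate is cheap) the lower-distortion forward mode tends to win; at high `lmbda` the backward mode's saved motion bits tip the balance, which is the content-adaptive behavior the abstract describes.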
A robust, invisible watermarking scheme is proposed for digital images, where the watermark is embedded using the block-based lapped orthogonal transform (LOT). The embedding process follows a spread spectrum watermarking approach. In contrast to the use of transforms such as discrete cosine transform, our LOT watermarking scheme allows larger watermark embedding energy while maintaining the same level of subjective invisibility. In particular, the use of LOT reduces block artifacts caused by the insertion of the watermark in a block-by-block manner, hence obtaining a better balance between invisibility and robustness. Moreover, we use a human visual system (HVS) model to adaptively adjust the energy of the watermark during embedding. In our HVS model, each block is categorized into one of four classes (texture, fine-texture, edge, and plain-area) by using a feature known as the texture masking energy. Blocks with edges are also classified according to the edge direction. The block classification is used to adjust the watermark embedding parameters for each block.
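A minimal sketch of additive spread spectrum embedding and correlation detection follows, assuming a single bit per block and a fixed pseudo-random chip seed; the LOT itself and the HVS block classification are omitted, and all names are illustrative.

```python
import numpy as np

def embed_spread_spectrum(coeffs, bit, alpha, seed=7):
    """Additively embed one watermark bit (+1 or -1), spread over a
    block of transform coefficients by a pseudo-random +/-1 chip
    sequence.  `alpha` is the embedding strength, which an HVS model
    like the paper's would raise for texture blocks and lower for
    plain-area blocks (the classification itself is omitted here)."""
    rng = np.random.default_rng(seed)
    chips = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return coeffs + alpha * bit * chips

def detect_bit(marked, original, seed=7):
    """Non-blind correlation detector: regenerate the chips and take
    the sign of their correlation with the embedded residual."""
    rng = np.random.default_rng(seed)
    chips = rng.choice([-1.0, 1.0], size=original.shape)
    corr = float(np.sum((marked - original) * chips))
    return 1 if corr > 0 else -1
```

Because the residual is `alpha * bit * chips`, the correlation equals `alpha * bit * N` for an N-coefficient block, so the sign recovers the bit; larger `alpha` (permitted by the LOT's reduced block artifacts) directly increases robustness.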
Leaky prediction layered video coding (LPLC) incorporates a scaled
version of the enhancement layer in the motion compensated prediction (MCP) loop, by using a leaky factor between 0 and 1, to
balance between coding efficiency and error resilience performance. In this paper, we present a theoretical analysis of LPLC using two different approaches: one based on rate distortion theory and one based on quantization noise modeling. In both approaches, an alternative block diagram of LPLC is first developed, which significantly simplifies the analysis. We consider two scenarios of LPLC, with and without prediction drift in the enhancement layer, and obtain two sets of closed-form rate distortion functions for both scenarios. We evaluate both closed-form expressions, which are shown to conform well with the operational results.
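As background (this is standard rate distortion material, not the paper's specific derivation), closed-form analyses of this kind are typically built from the classical rate distortion function of a memoryless Gaussian source:

```latex
D(R) = \sigma^2 \, 2^{-2R},
\qquad
R(D) = \frac{1}{2}\log_2\frac{\sigma^2}{D},
\qquad 0 \le D \le \sigma^2 .
```

Here $\sigma^2$ is the variance of the source being coded, i.e., the motion-compensated prediction residual. In LPLC the effective residual variance depends on the leaky factor: a factor closer to 1 removes more enhancement-layer redundancy from the residual (better efficiency) but amplifies drift when the enhancement layer is lost, which is how the leaky factor enters both closed-form scenarios.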
Low complexity video encoding shifts the computational complexity
from the encoder to the decoder, and is developed for applications characterized by scarce resources at the encoder. The Wyner-Ziv and Slepian-Wolf theorems provide the theoretical basis for low complexity video encoding. In this paper, we propose low complexity video encoding using B-frame direct modes. We extend the direct-mode idea that was originally developed for encoding B frames, and design new B-frame direct modes. Motion vectors for B-frames are obtained at the decoder and transmitted back to the encoder over a feedback channel, so no motion estimation is needed at the encoder to encode any B frame. Experimental results, obtained by modifying the ITU-T H.26L software, show that our approach achieves competitive rate distortion performance compared to conventional high complexity video encoding.
In some video applications such as video surveillance, a simple encoder is preferred and the computationally intensive tasks should be left to the decoder. It is pointed out in Wyner and Ziv’s paper that this goal is achievable by exploiting video source statistics only at the decoder. In many existing Wyner-Ziv video coding schemes, many frames must be intra coded so that the decoder can derive sufficiently accurate side information from the I frames. In this paper, we present a new network-driven Wyner-Ziv method using forward prediction. The basic idea is to perform motion estimation at the decoder and send the motion information back to the encoder through a feedback channel. We implement our approach by modifying the H.264 video codec JM8.0 with different configurations. The results show that our proposed approach improves coding efficiency compared to other Wyner-Ziv video coding schemes.
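Decoder-side motion estimation of the kind described can be sketched as a plain full-search block matcher. This is an illustrative fragment, not the JM8.0 modification itself; in the described scheme this search would run at the decoder, with the winning vector fed back to the encoder.

```python
import numpy as np

def full_search_me(cur_block, ref_frame, top, left, search_range=4):
    """Exhaustive block matching around position (top, left) in the
    reference frame; returns the SAD-minimizing motion vector (dy, dx)
    and its SAD.  Candidates falling outside the frame are skipped."""
    h, w = cur_block.shape
    cur = cur_block.astype(np.int32)
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block outside the reference frame
            sad = int(np.abs(ref_frame[y:y + h, x:x + w].astype(np.int32) - cur).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

Running this at the decoder shifts the dominant encoder cost (motion search) across the channel, at the price of feedback latency, which is the trade-off the abstract exploits.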
In this paper, we propose a new region-based method for detecting mass tumors in digital mammograms. Our method uses principal component analysis (PCA) techniques to reduce the image data into a subspace with significantly reduced dimensionality using an optimal linear transformation. After the transformation, classification in the subspace is performed using a nearest neighbor classifier. We consider the detection of only mass abnormalities in this study; microcalcifications, spiculated lesions, and other abnormalities are not considered. We implemented our method and achieved a 93% correct detection rate for mass abnormalities in our tests.
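The PCA-plus-nearest-neighbor pipeline can be sketched as follows. The data layout (one flattened region per row) and the labels are illustrative assumptions, and the mammogram region extraction step is omitted entirely.

```python
import numpy as np

def pca_fit(X, k):
    """Learn a k-dimensional PCA subspace from row-vector samples X
    (one sample per row) via SVD of the mean-centered data."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]          # rows of Vt are principal directions

def pca_project(X, mean, components):
    """Project samples into the learned subspace."""
    return (X - mean) @ components.T

def nn_classify(z, train_Z, train_y):
    """1-nearest-neighbor classification in the PCA subspace."""
    dists = np.linalg.norm(train_Z - z, axis=1)
    return train_y[int(np.argmin(dists))]
```

Classifying in the low-dimensional subspace rather than on raw pixels is what makes the nearest neighbor search tractable for image-sized feature vectors.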
Leaky prediction layered video coding (LPLC) partially includes the enhancement layer in the motion compensated prediction loop, by using a leaky factor between 0 and 1, to balance the coding efficiency and error resilience performance. In this paper, rate distortion functions are derived for LPLC from rate distortion theory. Closed-form expressions are obtained for two scenarios of LPLC, one where the enhancement layer stays intact and the other where the enhancement layer suffers from data rate truncation. The rate distortion performance of LPLC is then evaluated with respect to different choices of the leaky factor, demonstrating that the theoretical analysis conforms well with the operational results.
Two types of scalability exist in current scalable video streaming schemes: (1) nested scalability, in which different representations (i.e., descriptions) of each frame are generated using layered scalable coding and have to be decoded in a fixed sequential order, and (2) parallel scalability, which is used in multiple description coding (MDC), where different descriptions are mutually refinable and independently decodable. In this paper, we present a general framework that includes both types of scalability and demonstrate the similarity between leaky-prediction-based layered coding and an MDC scheme that uses motion compensation. Based on this framework, we introduce nested scalability into each description of the MDC stream and propose a fine granularity scalability (FGS) based MDC approach. We also develop a scalable video coding structure characterized by dual-leaky prediction to balance the trade-off between the coding efficiency and error resilience performance of the coded bit stream.