The latest video compression standard, high-efficiency video coding (HEVC), uses a quadtree structure of coding units (CUs) with four coding tree depths to improve coding efficiency. However, the HEVC encoder raises computational complexity to levels unsuitable for video applications on power-constrained devices. This work therefore proposes a complexity control method for the low-delay P-frame configuration of the HEVC encoder. The complexity control mechanism operates at three layers: the group of pictures (GOP) layer, the frame layer, and the CU layer. Each coding layer uses a distinct method for complexity allocation. Furthermore, the steps in the prediction unit encoding procedure are reordered. By allocating complexity to each coding layer of HEVC, the proposed method can simultaneously satisfy the entire complexity constraint (ECC) for encoding the entire sequence and the instant complexity constraint (ICC) for each frame during real-time encoding. Experimental results showed that when the target complexity was reduced to 80% under the ECC and to 60% under the ICC, the average Bjøntegaard delta peak signal-to-noise ratio decreased by ∼0.1 dB and the Bjøntegaard delta rate increased by 1.9%. The complexity control error was ∼4.3% under both the ECC and the ICC.
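The abstract does not state the allocation formulas, so the following is only a minimal Python sketch of a frame-layer budget split that respects an overall ECC target and a per-frame ICC cap. The function `allocate_frame_budgets`, the proportional-share rule, and the `spent_fractions` feedback are hypothetical. The CU-layer allocation and the PU reordering are not modeled.

```python
def allocate_frame_budgets(full_costs, ecc_ratio, icc_ratio, spent_fractions):
    """Hypothetical frame-layer complexity allocation (not the paper's exact rule).

    full_costs:      estimated unconstrained encoding cost of each frame
    ecc_ratio:       target fraction of the whole sequence's cost (entire complexity constraint)
    icc_ratio:       cap on each frame's budget as a fraction of its own cost (instant constraint)
    spent_fractions: fraction of its budget each frame actually consumed (feedback from the CU layer)
    """
    remaining = ecc_ratio * sum(full_costs)  # sequence-level budget still available
    budgets = []
    for i, cost in enumerate(full_costs):
        remaining_full = sum(full_costs[i:])  # unconstrained cost of the frames not yet encoded
        share = remaining * cost / remaining_full  # proportional share of what is left
        budget = min(share, icc_ratio * cost)  # never exceed the per-frame (instant) cap
        budgets.append(budget)
        remaining -= budget * spent_fractions[i]  # deduct what the encoder actually used
    return budgets


# Four equal frames, 80% overall target, no per-frame cap, every budget fully spent.
print(allocate_frame_budgets([100, 100, 100, 100], 0.8, 1.0, [1.0] * 4))
# → [80.0, 80.0, 80.0, 80.0]
```

Each frame's share is computed from what is actually left. As a result, an early frame that underspends (for example, `spent_fractions[0] = 0.5`) raises the budgets of later frames, which is one simple way to keep the sequence-level total close to the target.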
The new video coding standard, high-efficiency video coding (HEVC), adopts a quadtree structure to provide variable transform sizes in the transform coding process. Compared with previous video coding standards, the heuristic examination of transform unit (TU) modes substantially increases the computational complexity. Thus, efficiently reducing the number of TU candidate modes is crucial. In the proposed similarity-check scheme, sub-TU blocks are categorized as either a strongly similar case or a weakly similar case, and the early TU termination or early TU splitting procedure is performed accordingly. For the strongly similar case, a property called zero-block inheritance is combined with a zero-block detection technique to terminate the TU search process early. For the weakly similar case, the gradients of the residuals, which represent the similarity of the coefficients, are used to skip the current TU mode or stop the TU splitting process. In particular, the computation time is further reduced because all the information required by the proposed mode decision criteria is derived before the transform coding is performed. The experimental results revealed that the proposed algorithm saves ~64% of the TU encoding time on average in inter prediction, with a negligible rate-distortion loss.
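To illustrate the similarity check, the sketch below uses mean absolute residual gradients to measure how differently the four sub-TUs of a residual block behave. It combines this measure with a crude zero-block predictor. The gradient measure, the test that compares the SAD against the quantizer step, the thresholds, and the returned labels are all assumptions chosen for illustration. Zero-block inheritance across depths is not modeled.

```python
def mean_abs_gradient(block):
    """Average absolute horizontal and vertical difference inside a residual block."""
    n = len(block)
    total, count = 0, 0
    for y in range(n):
        for x in range(n):
            if x + 1 < n:
                total += abs(block[y][x + 1] - block[y][x])
                count += 1
            if y + 1 < n:
                total += abs(block[y + 1][x] - block[y][x])
                count += 1
    return total / count if count else 0.0


def split_quadrants(block):
    """Split a square block into its four sub-TU quadrants (TL, TR, BL, BR)."""
    h = len(block) // 2
    return [[row[c:c + h] for row in block[r:r + h]] for r in (0, h) for c in (0, h)]


def is_zero_block(block, qstep, factor=1.0):
    """Crude zero-block predictor: residual SAD below a quantizer-dependent threshold (assumed)."""
    return sum(abs(v) for row in block for v in row) < factor * qstep


def tu_decision(block, qstep, strong_thr=0.5, split_thr=4.0):
    """Hypothetical early decision for one TU; thresholds are illustrative only."""
    grads = [mean_abs_gradient(q) for q in split_quadrants(block)]
    spread = max(grads) - min(grads)  # how differently the four sub-TUs behave
    if spread <= strong_thr:  # strongly similar sub-TUs
        return "terminate" if is_zero_block(block, qstep) else "full_search"
    if spread > split_thr:  # weakly similar and clearly heterogeneous
        return "split"  # skip the current TU size and go to the smaller TUs
    return "full_search"  # otherwise evaluate the TU candidates normally
```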
Down-sampling coding, which sub-samples the image and encodes the smaller images, is one way to raise image quality when the bit rate is insufficient. In this work, we propose an Adaptive Down-Sampling (ADS) coding scheme for H.264/AVC. The overall system distortion can be analyzed as the sum of the down-sampling distortion and the coding distortion. The down-sampling distortion is mainly the loss of high-frequency components, which depends strongly on the spatial difference. The coding distortion can be derived from classical rate-distortion theory. For a given rate and video sequence, the optimal down-sampling resolution ratio can be derived by applying optimization theory to minimize the system distortion, based on the models of the two distortions. This optimal resolution ratio is used in both the down-sampling and up-sampling processes of the ADS coding scheme. As a result, the rate-distortion performance of ADS coding is always higher than that of fixed-ratio coding or H.264/AVC, by 2 to 4 dB at low to medium bit rates.
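The distortion trade-off can be sketched numerically. Below, the coding distortion uses the classical exponential model D = σ²·2^(−2R), with R in bits per coded pixel. The down-sampling distortion uses a hypothetical linear model in the fraction of discarded pixels, where `alpha` stands in for the high-frequency energy that the abstract ties to spatial difference. A grid search replaces the paper's analytical optimization, and the function names are illustrative.

```python
def system_distortion(s, bits, pixels, sigma2, alpha):
    """Total distortion for a linear down-sampling ratio s (0 < s <= 1), under assumed models."""
    kept = s * s * pixels  # pixels left after down-sampling both dimensions
    bpp = bits / kept  # bits available per coded pixel
    d_coding = sigma2 * 2 ** (-2 * bpp)  # classical exponential D(R) model
    d_down = alpha * (1 - s * s)  # hypothetical high-frequency loss model
    return d_coding + d_down


def optimal_ratio(bits, pixels, sigma2, alpha):
    """Grid search for the ratio that minimizes the modeled system distortion."""
    candidates = [k / 20 for k in range(10, 21)]  # 0.50, 0.55, ..., 1.00
    return min(candidates, key=lambda s: system_distortion(s, bits, pixels, sigma2, alpha))
```

Under these assumed models, a low budget such as `optimal_ratio(10, 100, 1000.0, 100.0)` (0.1 bit per pixel) selects a ratio below 1.0. A generous budget of 8 bits per pixel keeps full resolution.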
Media encryption technologies act as the first line of defense in securing access to multimedia data. Traditional cryptographic encryption can achieve provable security, but it is sensitive to even a single bit error. Such an error causes the unreliable packet to be dropped, which results in packet loss. To achieve robust media encryption, error resilience in media encryption can be treated as equivalent to error resilience in media transmission. This study proposes an embedded block hash searching scheme at the decoder side that performs motion estimation and recovers lost packets while maintaining format compliance and provable cryptographic security. The proposed framework is a form of joint error-resilient video transmission/encryption and copyright protection.
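The following is a minimal sketch of decoder-side block hash searching. It assumes that the encoder embeds a short hash of each block's coarsely quantized pixels (here SHA-256 from Python's `hashlib`, truncated). It also assumes that the decoder treats the offset of the matching block in the previous frame as the recovered motion vector. The hash construction, the search window, and the names `block_hash` and `recover_block` are hypothetical, and the sketch does not show how the hash is carried alongside the encrypted payload.

```python
import hashlib


def block_hash(frame, x, y, size, shift=4, nbytes=2):
    """Hypothetical block hash: SHA-256 of coarsely quantized pixels, truncated to nbytes."""
    data = bytes(frame[y + j][x + i] >> shift for j in range(size) for i in range(size))
    return hashlib.sha256(data).digest()[:nbytes]


def recover_block(prev_frame, target_hash, x, y, size, search=4, shift=4, nbytes=2):
    """Search prev_frame around (x, y) for a block with a matching hash.

    Returns the motion vector (dx, dy) closest to zero among the matches, or None.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            if not (0 <= px <= w - size and 0 <= py <= h - size):
                continue  # candidate block would fall outside the frame
            if block_hash(prev_frame, px, py, size, shift, nbytes) != target_hash:
                continue
            if best is None or abs(dx) + abs(dy) < abs(best[0]) + abs(best[1]):
                best = (dx, dy)
    return best
```

Truncated hashes can collide, so the sketch prefers the match nearest to zero motion. A coarser quantization (larger `shift`) tolerates small pixel differences but produces more false matches.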
In this paper, a selective weighting method is used for data embedding to achieve blind watermark detection. In the proposed system, block polarity and activity index modulation are used for the selective weighting. The block polarity is determined by the number of coefficients that are larger than the median value. The block activity index is the pseudo-quantized block activity, represented by the sum of absolute differences (SAD) between each coefficient and the median value. The block activity index modulation is performed based on the XOR of the randomized watermark and the randomized wavelet block polarity. In the block activity index modulation, a coefficient located very close to the median is vulnerable to attacks because its polarity can easily be changed. In such cases, the coefficient is forced to shift by the just-noticeable-difference (JND) amount toward the positive or negative end to enhance robustness. The watermark embedding itself is performed by the activity index modulation, which modifies each coefficient value by a small amount to force the activity to be quantized into a specific region. Simulation results show that the proposed method performs extremely well against Checkmark non-geometric attacks, such as linear filtering, remodulation, denoising, and compression. The proposed scheme is also robust against image cropping, downsampling, rotation, and column removal attacks.
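The polarity and activity-index steps can be sketched as follows. Several details are assumptions for illustration: the polarity rule (more than half of the coefficients above the median), the quantization step, and encoding the bit `watermark XOR polarity` as the parity of the activity index. The randomization of the watermark and of the wavelet blocks is omitted, and `jnd` must be positive.

```python
def block_polarity(block, median):
    """1 if more than half of the coefficients exceed the median, else 0 (assumed rule)."""
    above = sum(1 for c in block if c > median)
    return 1 if 2 * above > len(block) else 0


def activity_index(block, median, step):
    """Pseudo-quantized activity: SAD of the coefficients to the median, divided by step."""
    return int(sum(abs(c - median) for c in block) // step)


def embed_bit(block, median, wm_bit, step, jnd):
    """Force the activity-index parity to wm_bit XOR polarity (hypothetical modulation rule)."""
    target = wm_bit ^ block_polarity(block, median)
    # Push coefficients within one JND of the median away from it, keeping their side.
    out = [c if abs(c - median) >= jnd else (median + jnd if c > median else median - jnd)
           for c in block]
    sad = sum(abs(c - median) for c in out)
    q = int(sad // step)
    if q % 2 != target:
        # Move to the neighboring quantization cell nearer to the current SAD.
        q = q + 1 if (q == 0 or sad - q * step >= step / 2) else q - 1
    scale = (q + 0.5) * step / sad  # aim for the center of the chosen cell
    # Scaling deviations by a positive factor keeps every coefficient on its side of the median.
    return [median + (c - median) * scale for c in out]
```

Scaling the deviations from the median by a positive factor changes the activity without moving any coefficient across the median. The detector can therefore recompute the same polarity and recover the bit from the activity-index parity.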
Watermarking methods resistant to geometric attacks can be divided into three categories. The first category embeds the watermark into a geometrically invariant domain. The second uses a template or inserts a periodic watermark pattern for resynchronization. The third, called the “feature-based watermarking scheme”, uses feature points detected in the original image to form local regions for both embedding and detection. However, the major weakness of these methods is their limited resistance to both extensive geometric distortions and the watermark-estimation attack (WEA). In view of this, we propose a mesh-based, content-dependent image watermarking method that can withstand geometric distortions and WEA. The first category is restricted to affine invariance, and periodic patterns are easily removed in the second category, so our investigation found the third category to be the best choice. Our method consists of three main components: (i) robust mesh generation and mesh-based embedding for resisting geometric distortions; (ii) improved fidelity using a modified Noise Visibility Function (NVF); and (iii) construction of a hash-based content-dependent watermark (CDW) for resisting WEA. Experimental results obtained with a standard benchmark confirm the robustness of our method.
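Component (iii) can be illustrated with a short sketch: a ±1 watermark sequence seeded by a SHA-256 hash of a secret key and coarsely quantized content descriptors. Because the sequence depends on the image content, a watermark estimated from one image does not fit another. The descriptor list (`features`, for example mean intensities of mesh regions), the quantization step, and the function name are hypothetical. Mesh generation and NVF weighting are not shown.

```python
import hashlib
import random


def content_dependent_watermark(features, key, length, quant=8):
    """Hypothetical hash-based CDW: a ±1 sequence tied to both the secret key and the content.

    features: coarse content descriptors (e.g. mean intensities of mesh regions), assumed here
    quant:    quantization step so that mild distortions map to the same descriptor values
    """
    q = bytes((int(f) // quant) % 256 for f in features)
    digest = hashlib.sha256(key.encode("utf-8") + q).digest()
    rng = random.Random(int.from_bytes(digest, "big"))  # seed the PRNG from the hash
    return [rng.choice((-1, 1)) for _ in range(length)]
```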
The objective of this paper is to develop a robust error-resilient algorithm, called Synchronous Backward Error Tracking (SBET), that completely terminates error propagation in error-prone environments for H.264 video coding. The motivation is as follows. If the state of the decoder is available to the encoder, i.e., if the encoder can synchronize its state to that of the decoder, error propagation can be entirely terminated because all predictions are based on the same references. We therefore assume that a feedback channel is available and that the encoder knows the decoder's error concealment method through some external means. The pixel-based Precise Backward Error Tracking (PBET) is modified and used to track the error locations and to reconstruct the decoder's state in the encoder. To achieve encoder-decoder synchronization, the proposed method involves only memory access and simple addition and multiplication operations on the error-contaminated pixels. Simulation results show that the rate-distortion performance of the proposed algorithm is always better than that of conventional algorithms. Specifically, SBET outperforms PBET by up to 1.21 dB at a 3% slice error rate for the QCIF Foreman sequence. In addition, because SBET does not rely on forced INTRA refreshing, it avoids bursts in bit rate.
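The core of backward error tracking can be sketched as propagating a per-pixel error map (decoder value minus encoder value) along the motion vectors. This minimal version assumes integer-pel motion, so each error is simply copied. With sub-pixel motion, the error would be a weighted sum of reference errors, which requires the additions and multiplications mentioned above. The concealment error at lost pixels is assumed to be known to the encoder through the feedback channel, and all names are hypothetical.

```python
def track_errors(prev_error, mvs, lost, concealment_error):
    """Propagate the decoder-minus-encoder error map by one frame (integer-pel sketch).

    prev_error[y][x]:        error of the previous frame
    mvs[y][x]:               integer motion vector (dx, dy) of each pixel in the current frame
    lost[y][x]:              True where the decoder reported the pixel's slice as lost
    concealment_error[y][x]: error produced by the decoder's concealment at lost pixels
    """
    h, w = len(prev_error), len(prev_error[0])
    err = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if lost[y][x]:
                err[y][x] = concealment_error[y][x]
            else:
                dx, dy = mvs[y][x]
                rx = min(max(x + dx, 0), w - 1)  # clamp the reference position to the frame
                ry = min(max(y + dy, 0), h - 1)
                # Both sides add the same residual, so the reference error is carried over.
                err[y][x] = prev_error[ry][rx]
    return err


def decoder_state(encoder_recon, error):
    """Reconstruct what the decoder holds: encoder reconstruction plus tracked error."""
    return [[e + d for e, d in zip(er, dr)] for er, dr in zip(encoder_recon, error)]
```

Once the encoder uses `decoder_state(...)` as its reference instead of its own reconstruction, both sides predict from identical data, and the tracked errors no longer propagate.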
In this paper, we present a novel energy compaction method, selective block reordering, which is used with SPIHT (SBR-SPIHT) in low-rate video coding to enhance the coding efficiency of motion-compensated residuals. Inter-frame coding consists of three major parts: motion estimation, motion compensation, and motion-compensated residual coding. The motion estimation and overlapped block motion compensation (OBMC) methods of H.263 are used to reduce temporal redundancy. The motion-compensated residuals are encoded in the wavelet domain. The block-mapping reorganization exploits the wavelet zerotree relationship, which jointly represents the wavelet coefficients at the same spatial location from the lowest subband to the high-frequency subbands. It groups each wavelet tree with all its descendants into a wavelet block. Block reordering based on a threshold scan rearranges the significant blocks in descending order of energy. The block reordering technique then recursively reorders the wavelet sub-blocks according to the energy of each sub-block. This yields the maximum energy compaction and allows SPIHT coding to operate efficiently on the motion-compensated residuals. Simulation results demonstrate that SBR-SPIHT outperforms H.263 on average by 1.28 dB to 0.69 dB for various video sequences at very low bit rates, from 48 kbps down to 10 kbps.
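The reordering steps can be sketched on blocks that are assumed to be already formed from wavelet trees. Blocks whose maximum coefficient magnitude reaches the threshold are moved to the front in descending energy order. Each such block is then reordered recursively by quadrant energy. The quadrant-based recursion and the function names are assumptions. A real codec would also have to signal the permutation to the decoder, which is not shown.

```python
def energy(block):
    return sum(c * c for row in block for c in row)


def quadrants(block):
    """Split a square block into its TL, TR, BL, BR quadrants."""
    h = len(block) // 2
    return [[row[c:c + h] for row in block[r:r + h]] for r in (0, h) for c in (0, h)]


def assemble(q):
    """Rebuild a square block from four equally sized quadrants (TL, TR, BL, BR)."""
    top = [a + b for a, b in zip(q[0], q[1])]
    bottom = [a + b for a, b in zip(q[2], q[3])]
    return top + bottom


def reorder_recursive(block):
    """Recursively place the most energetic sub-blocks first (hypothetical recursion)."""
    if len(block) == 1:
        return block
    qs = sorted((reorder_recursive(q) for q in quadrants(block)), key=energy, reverse=True)
    return assemble(qs)


def reorder_blocks(blocks, threshold):
    """Threshold scan: significant blocks first, in descending energy, each reordered inside."""
    def peak(b):
        return max(abs(c) for row in b for c in row)

    significant = sorted((b for b in blocks if peak(b) >= threshold), key=energy, reverse=True)
    insignificant = [b for b in blocks if peak(b) < threshold]
    return [reorder_recursive(b) for b in significant] + insignificant
```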
In this paper, we propose a key-based video watermarking system for MPEG-2 in which watermark embedding and video encoding are performed simultaneously. Because the watermark information propagates to inter frames through motion-compensated coding, the watermark is embedded in a single intra frame but can be extracted from all frames in the same group of pictures (GOP). The watermark is embedded in the low-frequency DCT coefficients of the intra frames based on the block polarity. The block polarity is combined with the watermark by a Tri-state Exclusive-OR (TXOR) operation to generate the secret key, which labels the block locations of the embedded watermark. At the decoder, the block polarity over a GOP is calculated by a weighted voting procedure according to the frame weighting. Finally, the watermark over a GOP is obtained by a TXOR operation on the key and the block polarity. Simulation results show that the system is highly imperceptible: the PSNRs of the watermarked frames are almost identical to those of the unwatermarked ones. A more accurate normalized correlation (NC) is also obtained.
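The following is a small sketch of the key generation and GOP-level extraction. It assumes a TXOR that returns a third "don't care" state (`X`, represented by `None`) whenever a block's polarity is ambiguous, and a weighted vote of ±1 polarities across the frames of a GOP. The abstract gives neither the exact TXOR truth table nor the frame weights, so both are hypothetical here.

```python
X = None  # hypothetical third state: "no watermark bit at this block"


def txor(a, b):
    """Tri-state XOR (assumed semantics): X if either input is X, otherwise ordinary XOR."""
    if a is X or b is X:
        return X
    return a ^ b


def make_key(polarities, watermark):
    """Secret key: TXOR of each block's polarity with the watermark bit assigned to it."""
    return [txor(p, w) for p, w in zip(polarities, watermark)]


def gop_polarity(per_frame_polarities, weights):
    """Weighted vote of each block's polarity (0/1) across the frames of a GOP."""
    result = []
    for b in range(len(per_frame_polarities[0])):
        score = sum(w * (1 if frame[b] == 1 else -1)
                    for frame, w in zip(per_frame_polarities, weights))
        result.append(1 if score > 0 else 0)
    return result


def extract(key, polarities):
    """Recover the watermark bits; X marks blocks that carry no bit."""
    return [txor(k, p) for k, p in zip(key, polarities)]
```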
This paper presents an error-resilient H.263 video compression system for noisy channels. We develop a video segment regulation algorithm at the decoder to efficiently identify and correct erroneous start codes and block addresses. In addition, a parity-embedded error detection technique is implemented to enhance the decoder's error detection capability at the macroblock layer. Using these two techniques, the decoder can report the accurate addresses of detected corrupt blocks back to the encoder via a feedback channel. Based on these negative acknowledgments, a precise error tracking algorithm at the encoder calculates and traces the propagated errors so that the contaminated blocks can be INTRA refreshed. Simulation results show that the proposed system yields significant video quality improvements over motion-compensated concealment, with PSNR gains of 4 to 6 dB at bit rates around 32 kbps in error-prone DECT environments. In particular, this system complies with the H.263 standard. It also has low memory requirements and low computational complexity, which make it suitable for practical real-time implementation.
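One way to realize a parity-embedded check is sketched below. When necessary, the encoder makes the sum of each macroblock's quantized levels even by shrinking the magnitude of the last nonzero level by one. The decoder then flags any macroblock whose sum is odd. This particular parity rule is an assumption and not necessarily the paper's. It detects a change of one level by an odd amount but misses errors that preserve parity.

```python
def embed_parity(levels):
    """Make the sum of a macroblock's quantized levels even (hypothetical scheme)."""
    out = list(levels)
    if sum(out) % 2 == 0:
        return out
    for i in range(len(out) - 1, -1, -1):  # last nonzero level in scan order
        if out[i] != 0:
            out[i] -= 1 if out[i] > 0 else -1  # shrink its magnitude by one
            return out
    return out  # not reached: an all-zero block already has an even sum


def parity_ok(levels):
    """Decoder-side check: an odd sum means the macroblock is corrupted."""
    return sum(levels) % 2 == 0
```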
In this paper, we propose an image watermarking system that is highly robust against various attacks without perceptible image degradation. The cover image is first transformed by the discrete wavelet transform (DWT), and the low and middle subbands are then divided into wavelet blocks. A selective watermark embedding method chooses a DWT block for watermark embedding only when its coefficients clearly indicate the block polarity. Instead of the original image, a key is used in watermark extraction to indicate the locations where watermark bits are embedded. The key is generated by a Tri-state Exclusive-OR (TXOR) operation on the randomized watermark and the randomized DWT coefficients of the original image. Finally, a deadzone evacuation procedure is performed to ensure an adequate noise margin: if a DWT coefficient is very close to the polarity threshold, e.g., the median, it is forced to shift to the positive or negative end of the deadzone, depending on its polarity. Simulation results show that the proposed key-based method achieves excellent performance against Checkmark non-geometric attacks, such as filtering, compression, and copy attacks. The proposed scheme is also robust against image cropping at different positions.
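The selective embedding test and the deadzone evacuation can be sketched as follows. Two parameters are hypothetical: the polarity-clarity criterion `margin` (how far the fraction of coefficients above the median must be from one half) and the deadzone half-width `delta`. Key generation with TXOR is omitted.

```python
def polarity_if_clear(block, median, margin):
    """Return 1 or 0 when the block clearly indicates its polarity, else None (block skipped)."""
    frac = sum(1 for c in block if c > median) / len(block)
    if frac >= 0.5 + margin:
        return 1
    if frac <= 0.5 - margin:
        return 0
    return None


def evacuate_deadzone(block, median, delta):
    """Push coefficients within ±delta of the median to the deadzone edge on their own side."""
    out = []
    for c in block:
        if abs(c - median) < delta:
            c = median + delta if c > median else median - delta
        out.append(c)
    return out
```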
The objective of this work is to reconstruct high-quality gray-level images from bi-level halftone images. We develop optimal inverse halftoning methods for several commonly used halftoning techniques, including dispersed-dot ordered dither, clustered-dot ordered dither, and error diffusion. First, the least-mean-square (LMS) adaptive filtering algorithm is applied to train inverse halftoning filters, and the optimal mask shapes are computed for the various halftoning techniques. Next, we further reduce the computational complexity by using lookup tables designed by the minimum mean square error (MMSE) method. The optimal masks obtained from the LMS method are used as the default filter masks. Finally, we propose an enhanced MMSE inverse halftoning algorithm. It normally uses the MMSE table lookup method for its speed. When an empty table cell is referenced, the LMS method is used to reconstruct the gray-level value. Consequently, the proposed method offers both excellent reconstruction quality and fast speed. In the experiments, error diffusion yields the best reconstruction quality among the three halftoning techniques.
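The enhanced lookup can be sketched as a table that maps each binary halftone pattern under the mask to the mean gray level seen in training, which is the MMSE estimate for that pattern. A linear filter serves as the fallback for patterns never seen in training. The pattern packing and the function names are illustrative, and `lms_weights` is assumed to come from a prior LMS training run.

```python
def pattern_index(window):
    """Pack a binary halftone window (list of 0/1) into an integer table index."""
    idx = 0
    for bit in window:
        idx = (idx << 1) | bit
    return idx


def build_mmse_table(training_pairs):
    """MMSE table: the mean original gray level observed for each halftone pattern."""
    sums, counts = {}, {}
    for window, gray in training_pairs:
        k = pattern_index(window)
        sums[k] = sums.get(k, 0) + gray
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def inverse_halftone_pixel(window, table, lms_weights):
    """Use the table when the pattern was seen in training; otherwise fall back to the LMS filter."""
    k = pattern_index(window)
    if k in table:
        return table[k]
    return sum(w * (255 * b) for w, b in zip(lms_weights, window))  # empty cell: linear filter
```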
We present an error-concealed embedded wavelet (ECEW) video coding system for transmission over the Internet or wireless networks. The system uses two types of frames: intra (I) frames and inter, or predicted (P), frames. Inter frames consist of residual frames formed by variable block-size multiresolution motion estimation (MRME). Motion vectors are compressed by arithmetic coding. The image data of intra frames and residual frames are coded by error-resilient embedded zerotree wavelet (ER-EZW) coding. ER-EZW coding partitions the wavelet coefficients into several groups and codes each group independently, so the propagation of an error is confined to a single group. In EZW coding, by contrast, any single error may render the entire bitstream undecodable. To further reduce error damage, we apply error concealment at the decoder. In intra frames, erroneous wavelet coefficients are replaced by their neighbors. In inter frames, erroneous blocks of wavelet coefficients are replaced by data from the previous frame. Simulations show that ECEW outperforms ECEW without error concealment by approximately 7 to 8 dB at an error rate of 10⁻³ in intra frames. The improvement is still approximately 2 to 3 dB at a higher error rate of 10⁻² in inter frames.
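The partitioning and the intra-frame concealment can be sketched together. Suppose lowest-subband positions are assigned to four groups in an interleaved 2x2 pattern. Then every coefficient's horizontal and vertical neighbors belong to other groups, so a lost group can be concealed from surviving neighbors. The interleaving pattern and the neighbor averaging are assumptions. The abstract states only that erroneous coefficients are replaced by neighbors.

```python
def group_of(i, j):
    """Interleaved group of a lowest-subband position (hypothetical 2x2 pattern, 4 groups).

    In ER-EZW, each group would also carry the whole zerotree rooted at these positions.
    """
    return (i % 2) * 2 + (j % 2)


def conceal_lowband(ll, lost_groups):
    """Replace coefficients of lost groups by the mean of their surviving 4-neighbors."""
    h, w = len(ll), len(ll[0])
    out = [row[:] for row in ll]
    for i in range(h):
        for j in range(w):
            if group_of(i, j) not in lost_groups:
                continue
            nbrs = [ll[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < h and 0 <= b < w and group_of(a, b) not in lost_groups]
            out[i][j] = sum(nbrs) / len(nbrs) if nbrs else 0
    return out
```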
We present a robust real-time video coding scheme that complies with the H.263 standard. Using a feedback channel, the encoder accurately evaluates and precisely tracks the macroblocks (MBs) corrupted by transmission errors. Because the dependency trees do not spread into unaffected areas, error propagation is terminated completely by INTRA refreshing only the affected MBs. Our simulations show significant video quality improvements in error-prone environments.
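The MB-level tracking described above can be sketched as follows: a current MB is marked as affected whenever its motion-compensated reference block overlaps a contaminated MB of the previous frame. The integer-pel motion, the corner-based overlap test, and the function names are assumptions. Finer, pixel-level tracking would mark fewer MBs.

```python
MB = 16  # macroblock size in pixels


def overlapping_mbs(x, y, mb_cols, mb_rows):
    """MBs (row, col) overlapped by a 16x16 reference block whose top-left pixel is (x, y).

    A 16x16 block overlaps at most 2x2 MBs, so checking its four corners is enough.
    """
    hits = set()
    for yy in (y, y + MB - 1):
        for xx in (x, x + MB - 1):
            r, c = yy // MB, xx // MB
            if 0 <= r < mb_rows and 0 <= c < mb_cols:
                hits.add((r, c))
    return hits


def track_contamination(contaminated_prev, mvs, mb_cols, mb_rows, lost_now=()):
    """Return the MBs of the current frame to be INTRA refreshed.

    contaminated_prev: set of (row, col) MBs known to be wrong in the previous frame
    mvs[row][col]:     integer-pel motion vector (dx, dy) of each current MB
    lost_now:          MBs of the current frame reported lost via the feedback channel
    """
    affected = set(lost_now)
    for r in range(mb_rows):
        for c in range(mb_cols):
            dx, dy = mvs[r][c]
            refs = overlapping_mbs(c * MB + dx, r * MB + dy, mb_cols, mb_rows)
            if refs & contaminated_prev:  # predicted from a corrupted area
                affected.add((r, c))
    return affected
```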
The objective of this work is to reconstruct high-quality gray-level images from halftone images, a process known as inverse halftoning. We develop high-performance halftone reconstruction methods for several commonly used halftoning techniques. To improve reconstruction quality, images are classified by halftoning technique before reconstruction, so that the reconstruction process can be fine-tuned for each technique. The classification is based on an enhanced 1D correlation of the halftone images and is performed by a three-layer backpropagation neural network. In our experiments, this classification method reached 100 percent accuracy on a limited set of images processed by dispersed-dot ordered dithering, clustered-dot ordered dithering, constrained average, and error diffusion. For image reconstruction, we apply the least-mean-square (LMS) adaptive filtering algorithm to find the optimal filter weights and mask shapes, which yields very good reconstructed image quality. Error diffusion yields the best reconstructed quality among the halftoning methods. In addition, the LMS method generates optimal masks that differ significantly for each halftoning method. These masks can also be applied as default filter masks in more sophisticated reconstruction methods.
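The LMS adaptation can be sketched as the standard update w ← w + μ·e·x, applied over training pairs of halftone windows and original gray levels. The step size, the epoch count, and the function name are assumptions, and the mask-shape search is not shown.

```python
def lms_train(samples, n_taps, mu=5e-6, epochs=50):
    """LMS adaptation of inverse-halftoning filter weights (sketch).

    samples: (window, target) pairs; window holds the halftone values (0 or 255) under the
             mask, and target is the original gray level of the center pixel
    mu:      step size; it must be small relative to the window energy for stability
    """
    w = [0.0] * n_taps
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # current filter output
            e = d - y  # estimation error
            w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # LMS update: w <- w + mu * e * x
    return w
```

For instance, training on two-tap windows whose targets are the average of the two halftone values drives both weights toward 0.5.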
Error propagation caused by cell loss in MPEG video over ATM networks can seriously degrade video quality and reduce the effectiveness of error concealment. We present an efficient block interleaving and error concealment method for burst cell loss. At the transmitter, video information is interleaved and then packetized separately to reduce error damage. At the receiver, effective error concealment techniques are applied to I, P, and B frames, respectively. Simulation results show satisfactory performance.
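Block interleaving can be sketched as writing blocks row by row into an array with `depth` rows and transmitting them column by column. A burst of consecutive lost cells then maps back to blocks that are far apart and still have intact neighbors for concealment. The array layout and the function names are assumptions, and the abstract does not give the interleaving depth.

```python
def interleave(blocks, depth):
    """Write blocks row-wise into a depth x width array and read them out column-wise."""
    width = -(-len(blocks) // depth)  # ceiling division
    order = [r * width + c for c in range(width) for r in range(depth)
             if r * width + c < len(blocks)]
    return [blocks[i] for i in order], order


def deinterleave(received, order):
    """Put each received block back at its original index (None marks a lost block)."""
    out = [None] * len(order)
    for pos, idx in enumerate(order):
        out[idx] = received[pos]
    return out
```

For example, with 12 blocks and `depth=3`, losing transmitted positions 3 to 5 damages blocks 1, 5, and 9 rather than three adjacent blocks.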
Packet-switched video conferencing has emerged as one of the most important multimedia applications. Lip synchronization can be disrupted in a packet network by several network properties: packet delay jitter at the capture end, network delay jitter, packet loss, out-of-sequence packet arrival, local clock mismatch, and video playback overlay with the graphics system. The real-time and multiparty requirements of video conferencing make the synchronization problem more demanding. Some of these problems can be solved by more advanced network architectures, as ATM has promised. This paper presents some solutions to these problems that can be used at the end-station terminals of today's massively deployed packet-switching networks. The playback scheme in the end station consists of two units: a compression-domain buffer management unit and a pixel-domain buffer management unit. The pixel-domain buffer management unit is responsible for removing the annoying frame shearing effect in the display. The compression-domain buffer management unit is responsible for parsing the incoming packets to identify the complete data blocks in the compressed data stream that can be decoded independently. It is also responsible for concealing the effects of clock mismatch, lip synchronization errors, packet loss, out-of-sequence arrival, and network jitter. This scheme can also be applied to multiparty teleconferencing environments. Some of the schemes presented in this paper have been implemented in the Multiparty Multimedia Teleconferencing (MMT) system prototype at the IBM Watson Research Center.
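The packet-parsing role of the compression-domain buffer can be sketched in three steps. The buffer reassembles out-of-sequence packets into frames and releases only frames whose packets are all present. It also reports due frames that are incomplete or missing so that they can be concealed. The packet tuple layout, the frame-level granularity, and the function name are hypothetical. Clock-mismatch handling and lip synchronization are not modeled.

```python
def release_frames(packets, deadline_frame):
    """Hypothetical compression-domain buffer step.

    packets: (frame_id, index, count, payload) tuples in arrival order, possibly out of sequence
    Frames with frame_id <= deadline_frame are due for decoding now.
    Returns (complete, lost): complete frames as (frame_id, joined payload) in frame order,
    and the ids of due frames that are incomplete or missing entirely.
    """
    frames = {}
    for fid, idx, count, payload in packets:
        parts = frames.setdefault(fid, [None] * count)
        parts[idx] = payload  # place each packet at its position inside the frame
    complete, lost = [], []
    for fid in sorted(frames):
        if fid > deadline_frame:
            continue  # not yet due; a real buffer would keep these packets
        parts = frames[fid]
        if all(p is not None for p in parts):
            complete.append((fid, b"".join(parts)))
        else:
            lost.append(fid)  # hand to concealment instead of decoding a partial block
    if frames:
        first = min(frames)
        lost.extend(f for f in range(first, deadline_frame + 1) if f not in frames)
    return complete, sorted(lost)
```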