Light Field (LF) image/video data provide both spatial and angular information of a scene, but at the cost of a tremendous data volume for storage and transmission. At the moment, MPEG Multi-view Video Coding (MVC) is one of the most promising compression solutions for LF video data, so it deserves further investigation into better prediction structures that effectively reduce the redundancy in LF video data. Several prediction structures have been investigated, but only with limited experimental evaluation due to the lack of datasets and non-identical test configurations. This practical problem can now be mitigated by the availability of new datasets and a common test condition recently proposed by MPEG. As a first step toward designing a good compression method for LF video data, in this paper we evaluate the performance of existing prediction structures for MVC-based LF video coding following the MPEG common test condition and its dataset.
Motion blur inevitably occurs in images of fast-moving objects, and its removal, known as motion deblurring, is one of the most well-known ill-posed problems. In this paper, we investigate motion deblurring using modulated external light. Noting that motion blur depends on both the ambient light and the modulated external light, we investigate how to design a motion deblurring method that considers not only the external light but also the ambient light. The deblurring performance of the proposed method is compared with that of the conventional method, which considers only the modulated external light.
To realize a movable display that projects onto high-rise walls or arbitrary objects in physical space as digital signage, a system called a drone-projector, consisting of a beam projector mounted on a drone, has been investigated. Vibration during hovering, caused by the drone's propeller motors, distorts the projected image. In this paper, we extend an existing sensor-based stabilization method by compensating for the varying scale of the projected image caused by changes in the distance between the drone-projector and the projection surface. Our experimental results show that the distortion of the projected image is greatly attenuated by the proposed stabilization method.
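As a rough illustration of the scale compensation idea (not the paper's actual sensor pipeline), here is a minimal sketch assuming a simple pinhole projector whose projected size grows linearly with throw distance; the function and variable names are hypothetical:

```python
# Minimal sketch: pre-scale the source image so the projected size stays
# constant as the drone-to-surface distance varies. Assumes a simple pinhole
# projector model in which projected size grows linearly with throw distance.

def compensation_scale(reference_distance_m: float, current_distance_m: float) -> float:
    """Scale factor to apply to the source image before projection."""
    return reference_distance_m / current_distance_m

# Example: calibrated at 3 m; the drone drifts to 3.6 m, so shrink the source.
scale = compensation_scale(3.0, 3.6)
print(f"pre-scale factor: {scale:.3f}")  # ~0.833
```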
Versatile Video Coding (VVC) is a new state-of-the-art video compression technology currently under standardization. It targets roughly twice the coding efficiency of the existing HEVC, supporting HD/UHD/8K video and high dynamic range (HDR) video. It also targets versatile functionalities such as screen content coding, adaptive resolution change, and independent sub-pictures. To develop an effective coding method for the chroma intra prediction mode, in this paper we investigate its binarization process in CABAC (context-adaptive binary arithmetic coding) and test a method that assigns shorter bin strings to more frequent chroma intra modes and longer ones to less frequent modes, based on chroma mode statistics.
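As an illustration of the tested binarization idea, here is a minimal sketch assuming a truncated-unary-style code and hypothetical mode names and counts; VVC's actual chroma mode list and CABAC contexts are not reproduced here:

```python
# Minimal sketch: order chroma intra modes by observed frequency and assign
# shorter bin strings to more frequent modes (truncated-unary-style code).
# Mode names and counts are hypothetical placeholders.

mode_counts = {"DM": 620, "LM": 250, "PLANAR": 70, "DC": 30, "HOR": 20, "VER": 10}

# Sort modes from most to least frequent.
ranked = sorted(mode_counts, key=mode_counts.get, reverse=True)

# Truncated unary: rank k -> k ones followed by a terminating zero
# (the last mode needs no terminator).
def bins_for_rank(k: int, n_modes: int) -> str:
    return "1" * k if k == n_modes - 1 else "1" * k + "0"

table = {m: bins_for_rank(k, len(ranked)) for k, m in enumerate(ranked)}
for mode, bin_string in table.items():
    print(mode, bin_string)
```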
High-quality depth estimation from light field (LF) images is an important and challenging task for which many algorithms have been developed. While compression is inevitably required in practice for LF data due to its huge volume, most depth estimation methods have paid little attention to the effect of compression. In this paper, we investigate various LF depth estimation methods with a view to designing an LF compression method that preserves good depth estimation. Noting that building the data cost is the very first step in most depth estimation algorithms and that the data cost computation has a great impact on the eventual quality of the depth image, we present an in-depth analysis of data cost computation for LF depth estimation in the context of compression. Our results show that building the data cost on the Epipolar Plane Image (EPI) outperforms the other methods tested and is more robust to compression.
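As a rough illustration of an EPI-based data cost, here is a minimal sketch in which the cost of a disparity hypothesis is the intensity variance along the corresponding sheared EPI line; the exact cost definitions in the compared methods differ, and the shapes and names here are assumptions:

```python
import numpy as np

# Minimal sketch: EPI-based data cost at one spatial position. epi has shape
# (n_views, width): the same horizontal image row taken from each view. Under
# disparity hypothesis d, a scene point at column x in the center view appears
# near column x + d * (v - v_center) in view v; a good hypothesis yields
# nearly constant intensity along that sheared line, i.e., low variance.

def epi_data_cost(epi, x, disparity):
    n_views, width = epi.shape
    v_center = n_views // 2
    samples = [epi[v, int(round(x + disparity * (v - v_center)))]
               for v in range(n_views)
               if 0 <= int(round(x + disparity * (v - v_center))) < width]
    return float(np.var(samples))

# Pick the disparity with minimum data cost from a candidate set.
epi = np.random.rand(9, 64)                       # toy EPI
candidates = np.linspace(-2.0, 2.0, 17)
best = min(candidates, key=lambda d: epi_data_cost(epi, x=32, disparity=d))
print(best)
```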
In video-based light field coding, sub-aperture images (SAIs) are ordered to form a pseudo video sequence, and the sequence is encoded by a video compression algorithm, for example HEVC. When the SAI size is not divisible by the minimum coding tree unit size, a proper boundary handling method is required. This paper investigates several boundary handling methods. To maintain high quality in the central SAIs, we combine rotation and U scans into a new hybrid scan order. The random access configuration is used instead of the low-delay one for better coding efficiency. The proposed methods are evaluated with the latest coding tools.
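As an illustrative sketch of such a hybrid order, assuming the "rotation" part is an outward spiral over the central SAIs and the "U" part is a serpentine raster over the rest (the paper's exact ordering is not reproduced):

```python
# Minimal sketch: order the SAIs of a rows x cols grid so that the central
# views come first in an outward spiral ("rotation") scan, followed by the
# remaining views in a serpentine ("U") raster. Grid size and the central
# radius are hypothetical.

def spiral_from_center(rows, cols):
    """Yield (r, c) positions in an outward spiral starting at the center."""
    r, c = rows // 2, cols // 2
    yield r, c
    step, dr, dc = 1, 0, 1          # start by moving right
    emitted = 1
    while emitted < rows * cols:
        for _ in range(2):          # two legs per step length
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < rows and 0 <= c < cols:
                    yield r, c
                    emitted += 1
            dr, dc = dc, -dr        # rotate the direction by 90 degrees
        step += 1

def hybrid_scan(rows, cols, central):
    order = list(spiral_from_center(rows, cols))
    head = [(r, c) for r, c in order
            if abs(r - rows // 2) <= central and abs(c - cols // 2) <= central]
    serpentine = [(r, c if r % 2 == 0 else cols - 1 - c)
                  for r in range(rows) for c in range(cols)]
    return head + [p for p in serpentine if p not in head]

print(hybrid_scan(5, 5, central=1))  # 9 central SAIs first, then the rest
```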
Limited computing resources in portable multimedia devices are an obstacle to real-time decoding of high-resolution and/or high-quality video content. Ordinary H.264/AVC video decoders cannot decode video content that exceeds the limits set by their processing resources. However, in many real applications, especially on portable devices, simplified decoding with some acceptable degradation may be preferable to simply refusing to decode such content. For this purpose, a complexity-scalable H.264/AVC video decoding scheme is investigated in this paper. First, several simplified versions of decoding tools with different characteristics are investigated to assess their reduction in decoding complexity and the consequent degradation of the reconstructed video. Then a complexity-scalable H.264/AVC decoding scheme is designed by selectively combining the effective simplified methods to achieve minimal degradation. Experimental results with H.264/AVC Main profile bitstreams show that decoding complexity can be scalably controlled and reduced by up to 44% without subjective quality loss.
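As a minimal sketch of the combining step, assuming each simplified tool comes with an estimated complexity saving and quality degradation (the numbers below are hypothetical, and savings are treated as additive for simplicity):

```python
from itertools import combinations

# Minimal sketch: given simplified decoding tools, each with an estimated
# complexity saving and quality degradation (illustrative values), pick the
# combination meeting a target saving with the least total degradation.
# Exhaustive search is fine for a handful of tools.

TOOLS = {  # name: (complexity saving %, PSNR loss dB) -- hypothetical
    "simple_deblocking":  (12, 0.10),
    "bilinear_subpel_mc": (18, 0.25),
    "reduced_idct":       (10, 0.15),
    "fast_intra_pred":    ( 8, 0.05),
}

def best_combination(target_saving):
    best, best_loss = None, float("inf")
    for k in range(1, len(TOOLS) + 1):
        for combo in combinations(TOOLS, k):
            saving = sum(TOOLS[t][0] for t in combo)
            loss = sum(TOOLS[t][1] for t in combo)
            if saving >= target_saving and loss < best_loss:
                best, best_loss = combo, loss
    return best, best_loss

print(best_combination(target_saving=30))
```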
The quarter-pel motion vector accuracy supported by H.264/advanced video coding (AVC) in motion estimation (ME) and compensation (MC) provides high compression efficiency. However, it also increases the computational complexity. While various well-known fast integer-pel ME methods are already available, the lack of a good fast subpel ME method leaves the relatively high computational complexity of subpel ME unaddressed. This paper presents one way of solving this complexity problem by making adaptive motion vector (MV) accuracy decisions in inter-mode selection. The proposed MV accuracy decision is made using inter-mode selection of a macroblock with two decision criteria. Pixels are classified as stationary (and/or homogeneous) or nonstationary (and/or nonhomogeneous). To avoid unnecessary interpolation and processing, a proper subpel ME level is chosen among four combinations, each with a different MV accuracy and number of subpel ME iterations, based on the classification. Simulation results using the open source x264 software encoder show that, without noticeable degradation (−0.07 dB on average), the proposed method reduces total encoding time and subpel ME time by 51.78% and 76.49% on average, respectively, as compared to the conventional full-pel pixel search.
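A hedged sketch of such a level decision follows, with hypothetical thresholds and level definitions; the paper's actual criteria are tied to its inter-mode selection statistics:

```python
# Minimal sketch: choose a subpel ME level per macroblock. A block classified
# as stationary/homogeneous gets a cheap level (coarse MV accuracy, few
# iterations); nonstationary blocks get finer levels. Thresholds and the four
# level definitions below are illustrative placeholders.

LEVELS = {0: (1.0, 0),      # integer-pel only
          1: (0.5, 1),      # half-pel, one refinement pass
          2: (0.25, 1),     # quarter-pel, one pass
          3: (0.25, 2)}     # quarter-pel, two passes

def choose_level(sad_integer: int, mv_pred_magnitude: float,
                 t_stationary: int = 512, t_motion: float = 2.0) -> int:
    stationary = sad_integer < t_stationary     # flat/static content
    slow = mv_pred_magnitude < t_motion         # small predicted motion
    if stationary and slow:
        return 0    # skip subpel ME and its interpolation entirely
    if stationary or slow:
        return 1
    return 3 if mv_pred_magnitude > 8.0 else 2

accuracy, iterations = LEVELS[choose_level(sad_integer=300, mv_pred_magnitude=1.2)]
print(accuracy, iterations)
```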
This paper proposes a motion vector coding scheme that uses the optimal predictive motion vector, selected from the surrounding causal motion vectors in the minimum rate-distortion sense. The signaling overhead for the selected predictive motion vector is reduced by contradiction testing, which operates under a predefined criterion at both the encoder and the decoder to prune the candidate predictive motion vectors.
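Since the abstract does not spell out the criterion, the sketch below only illustrates the signaling principle: both sides build and prune the same candidate list, and an index is sent only when more than one candidate survives (duplicate removal stands in for the actual contradiction test):

```python
# Minimal sketch of the signaling idea: encoder and decoder build the same
# candidate list, prune it with the same predefined test, and an index is
# coded only when more than one candidate survives. Dropping duplicates is an
# illustrative stand-in for the paper's actual pruning criterion.

def prune_candidates(candidates):
    """Shared pruning: identical candidates are indistinguishable, drop them."""
    survivors = []
    for mv in candidates:
        if mv not in survivors:
            survivors.append(mv)
    return survivors

def encode_mv(mv, candidates):
    survivors = prune_candidates(candidates)
    # The rate-distortion sense reduces here to the cheapest MV difference.
    best = min(range(len(survivors)),
               key=lambda i: abs(mv[0] - survivors[i][0]) + abs(mv[1] - survivors[i][1]))
    mvd = (mv[0] - survivors[best][0], mv[1] - survivors[best][1])
    index = best if len(survivors) > 1 else None  # overhead saved when pruned to one
    return mvd, index

print(encode_mv((5, -2), [(4, -2), (4, -2), (0, 0)]))
```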
The H.264/AVC deblocking filter pays little attention to intracoded blocks. We enhance this filter by extending it to use intraprediction mode information in its adaptive application to intracoded blocks. Experiments show higher coding efficiency, with blocking artifacts in intracoded blocks substantially reduced.
A new motion vector coding method with optimal predictive motion vector selection is proposed. To improve compression performance, the proposed encoder selects the optimal predictive motion vector, i.e., the one that produces the minimum number of bits for motion vector coding. The proposed decoder estimates the optimal predictive motion vector without any additional information indicating which predictor was used at the encoder side. Experimental results show that, compared to the H.264/AVC standard, the proposed scheme improves coding efficiency for various video sequences.
Motion vectors correlate strongly with neighboring motion vectors. As a result, many macroblocks have zero residual motion vectors within their blocks after differential pulse code modulation using their individually predicted motion vectors. Motivated by this observation, we develop a new joint motion vector encoding scheme by defining a new macroblock coding mode, called pooled zero motion vector difference coding, to code such cases more efficiently. Experimental results with several well-known video test sequences verify that the proposed method improves coding efficiency by up to 6.2% compared to H.264/advanced video coding (AVC).
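A minimal sketch of the mode, with illustrative bit costs:

```python
# Minimal sketch: if every subblock of a macroblock has a zero motion vector
# difference (MVD) after prediction, signal one pooled mode instead of coding
# an all-zero MVD per subblock. Bit costs are illustrative placeholders.

def macroblock_mvd_coding(mvds, pooled_mode_bits=1, per_mvd_bits=4):
    if all(mvd == (0, 0) for mvd in mvds):
        return "POOLED_ZERO_MVD", pooled_mode_bits
    return "REGULAR", per_mvd_bits * len(mvds)

print(macroblock_mvd_coding([(0, 0)] * 4))   # ('POOLED_ZERO_MVD', 1)
print(macroblock_mvd_coding([(0, 0), (1, 0), (0, 0), (0, 0)]))  # ('REGULAR', 16)
```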
In order to achieve high computational performance and low power consumption, modern microprocessors are usually equipped with special multimedia instructions, multi-threading, and/or multi-core processing capabilities. Parallelizing the H.264/AVC algorithm is therefore crucial for implementing a real-time encoder on multi-threaded (or multi-core) processors. There is also a significant need to investigate complexity reduction algorithms such as fast inter mode selection. A multi-core system makes it possible to distribute the H.264/AVC workload uniformly over a number of simpler processor cores rather than relying on a single high-performance processor. In this paper, we therefore propose a new adaptive slice size selection technique for efficient slice-level parallelism of the H.264/AVC encoder on multi-core (or multi-threaded) processors, using fast inter mode selection as a pre-processing step. Simulation results show that the proposed adaptive slice-level parallelism achieves better parallel performance than fixed slice size parallelism. The experimental methods and results can be applied to many multiprocessor systems for real-time H.264 video encoding.
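As a minimal sketch of adaptive slice sizing, assuming per-row complexity estimates obtained from the fast inter mode pre-processing (the row costs below are hypothetical):

```python
# Minimal sketch: split macroblock rows into one slice per core so that the
# estimated complexity is balanced, instead of giving every slice the same
# number of rows. Row costs are hypothetical pre-processing estimates.

def adaptive_slices(row_costs, n_cores):
    target = sum(row_costs) / n_cores
    slices, current, acc = [], [], 0.0
    for row, cost in enumerate(row_costs):
        current.append(row)
        acc += cost
        if acc >= target and len(slices) < n_cores - 1:
            slices.append(current)
            current, acc = [], 0.0
    slices.append(current)  # remaining rows form the last slice
    return slices

# Rows 0-7 with a complexity hot spot in the middle; 4 cores.
print(adaptive_slices([1, 1, 4, 6, 6, 4, 1, 1], n_cores=4))
# -> [[0, 1, 2], [3], [4], [5, 6, 7]]
```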
In this paper, we propose a macroblock-level adaptive dynamic resolution conversion (DRC) technique that allows the encoder to reduce the resolution of the input image on a block-by-block basis for better compression efficiency. Reducing the spatial resolution of a block provides additional compression. Since the proper resolution of each block is selected adaptively in a rate-distortion optimized way, more flexible coding is supported, adapting to the characteristics of the image. Simulations based on the state-of-the-art H.264 standard demonstrate that the proposed scheme outperforms H.264 in rate-distortion terms.
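A minimal sketch of the per-block decision, using hypothetical distortion and rate numbers:

```python
# Minimal sketch: code each macroblock at full and at reduced resolution, then
# keep whichever minimizes the rate-distortion cost J = D + lambda * R.
# The cost numbers below stand in for the encoder's actual measurements.

def choose_resolution(d_full, r_full, d_reduced, r_reduced, lam):
    j_full = d_full + lam * r_full
    j_reduced = d_reduced + lam * r_reduced
    return ("REDUCED", j_reduced) if j_reduced < j_full else ("FULL", j_full)

# A flat block: reduced resolution barely hurts distortion but saves many bits.
print(choose_resolution(d_full=100, r_full=400, d_reduced=120, r_reduced=150, lam=0.5))
```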
The flickering effect is a serious problem in intra-only coding; it is caused by frame-to-frame variation in the accuracy loss of transform coefficients during quantization. Nevertheless, its solution has received insufficient study. In this paper, we analyze why the flickering effect occurs and illustrate our observations using the intra-only coding scheme of the H.264/AVC standard. Based on our analysis, we propose a flickering effect reduction scheme, a pre-processing method based on the Kalman filtering algorithm. Simulation results show that the proposed scheme increases subjective visual quality by removing the flickering effect.
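As an illustration of Kalman-based pre-processing, here is a minimal per-pixel temporal sketch that treats the true intensity as a nearly static state observed through noisy frames; the noise variances are hypothetical tuning parameters, not the paper's model:

```python
import numpy as np

# Minimal sketch: a per-pixel temporal Kalman filter applied as pre-processing
# so that frame-to-frame intensity fluctuations (a source of flicker) are
# smoothed before intra-only encoding. q and r are hypothetical variances.

def kalman_prefilter(frames, q=1.0, r=25.0):
    frames = [f.astype(np.float64) for f in frames]
    x = frames[0].copy()            # state estimate (filtered frame)
    p = np.full_like(x, r)          # estimate variance per pixel
    out = [x.copy()]
    for z in frames[1:]:
        p = p + q                   # predict: state nearly static, noise grows
        k = p / (p + r)             # Kalman gain
        x = x + k * (z - x)         # update with the newly observed frame
        p = (1.0 - k) * p
        out.append(x.copy())
    return out

# Toy example: a constant image plus frame-to-frame noise gets smoothed.
noisy = [np.full((4, 4), 128.0) + np.random.randn(4, 4) * 5 for _ in range(10)]
filtered = kalman_prefilter(noisy)
```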
Compared to conventional video standards, the main features of the H.264 standard are its high coding efficiency and network friendliness. In spite of these outstanding merits, it is not easy to implement an H.264 codec as a real-time system due to its large memory bandwidth and intensive computation requirements. Although variable-block-size motion compensation using multiple reference frames is one of the key coding tools behind its main performance gain, its optimal use demands substantial computation for the rate-distortion calculation of all possible combinations of coding modes and for the estimation of the best motion vector. Many existing fast motion estimation algorithms are not suitable for H.264, which employs variable motion block sizes. We propose an adaptive motion search scheme utilizing the hierarchical block structure, based on the deviation of subblock motion vectors. The proposed fast scheme adjusts the search center and search pattern according to the subblock motion vector distribution.
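A hedged sketch of the adjustment rule, with illustrative thresholds and pattern names:

```python
# Minimal sketch: measure how much the motion vectors of neighboring subblocks
# deviate; coherent motion allows a small search pattern around the predicted
# center, while scattered motion triggers a larger pattern. Thresholds and
# pattern names are illustrative placeholders.

def mv_deviation(mvs):
    n = len(mvs)
    cx = sum(mv[0] for mv in mvs) / n
    cy = sum(mv[1] for mv in mvs) / n
    return sum(abs(mv[0] - cx) + abs(mv[1] - cy) for mv in mvs) / n

def pick_search(mvs, t_small=1.0, t_large=4.0):
    center = (round(sum(mv[0] for mv in mvs) / len(mvs)),
              round(sum(mv[1] for mv in mvs) / len(mvs)))
    dev = mv_deviation(mvs)
    if dev < t_small:
        return center, "small_diamond"   # coherent motion: refine locally
    if dev < t_large:
        return center, "large_diamond"
    return (0, 0), "full_window"         # scattered motion: search widely

print(pick_search([(3, 1), (3, 2), (4, 1), (3, 1)]))
```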
The standardization of the scalable extension of H.264 calls for additional functionality, built on the H.264 standard, to support combined spatio-temporal and SNR scalability. For the entropy coding of the H.264 scalable extension, the Context-based Adaptive Binary Arithmetic Coding (CABAC) scheme has been considered so far. In this paper, we present a new context modeling scheme that uses the inter-layer correlation between syntax elements, thereby improving the entropy coding efficiency of the H.264 scalable extension. Simulation results of applying the proposed scheme to encoding of the syntax element mb_type show that the coding efficiency improvement reaches up to 16% in bit savings, owing to the estimation of a more adequate probability model.
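As an illustration of inter-layer context modeling (the context indices and the exact rule below are assumptions, not the proposal itself):

```python
# Minimal sketch: the context used to code an enhancement-layer mb_type bin is
# chosen partly from the co-located base-layer mb_type, so the arithmetic
# coder can use a probability model conditioned on what the base layer
# already reveals. Context indices are hypothetical.

def mb_type_context(base_mb_type: str, left_is_intra: bool, top_is_intra: bool) -> int:
    # Neighbor-based context as in plain CABAC ...
    ctx = int(left_is_intra) + int(top_is_intra)
    # ... shifted into a separate context set when the co-located base-layer
    # macroblock is intra, exploiting the inter-layer correlation.
    if base_mb_type == "INTRA":
        ctx += 3
    return ctx  # one of 6 contexts: 0..2 (base inter), 3..5 (base intra)

print(mb_type_context("INTRA", left_is_intra=True, top_is_intra=False))  # -> 4
```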
In this paper, we propose a multiple description coder for motion vectors (MV-MDC) based on the data-partitioned bitstream of the H.264/AVC standard. The proposed multiple description (MD) encoder separates each motion vector (MV) into two parts of equal priority and transmits each part in an independent packet. The proposed MD decoding scheme uses two matching criteria to find an accurate MV estimate when one of the MV descriptions is lost. Simulation results show that, compared to simply duplicated bitstream transmission, the proposed MV-MDC scheme reduces the data amount considerably without serious visual quality loss in the reconstructed picture.
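Since the abstract does not specify the split, the sketch below uses an illustrative one (horizontal components in one description, vertical in the other) and leaves the matching criterion abstract:

```python
# Minimal sketch of the MD idea: split each motion vector into two equal-
# priority parts sent in separate packets. The horizontal/vertical split is an
# illustrative choice, not necessarily the paper's. If one packet is lost, the
# decoder searches the missing part under a matching criterion.

def make_descriptions(mvs):
    d1 = [mv[0] for mv in mvs]   # horizontal components
    d2 = [mv[1] for mv in mvs]   # vertical components
    return d1, d2

def recover_missing(known_x, candidate_ys, matching_cost):
    """Pick the lost vertical component minimizing a matching criterion
    (e.g., a boundary matching error against reconstructed neighbors)."""
    return min(candidate_ys, key=lambda y: matching_cost(known_x, y))

d1, d2 = make_descriptions([(4, -1), (0, 2)])
# If d2 is lost: mv_y = recover_missing(d1[i], range(-8, 9), cost_fn)
```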
Most fast block motion estimation algorithms reported in the literature aim to reduce computation in terms of the number of search points and thus do not fit multimedia processors well because of their irregular data flow. For multimedia processors, proper reuse of data is more important than reducing the number of absolute difference operations, because execution cycle performance depends strongly on the number of off-chip memory accesses. Therefore, in this paper, we propose a sub-sampling predictive line search (SS-PLS) algorithm that uses a line search pattern to increase data reuse from the on-chip local buffer and checks sub-sampled points within the line search pattern to avoid unnecessary SAD operations. Our experimental results show that the prediction error (MAE) performance of the proposed SS-PLS is similar to that of the full search block matching algorithm (FSBMA) and better than that of the hexagon-based search (HEXBS). The proposed SS-PLS also requires far fewer off-chip memory accesses than conventional fast motion estimation algorithms such as HEXBS and the predictive line search (PLS). As a result, the proposed SS-PLS algorithm requires fewer execution cycles on a multimedia processor.
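A minimal sketch of a line search with sub-sampled SAD; the block size, line length, and sub-sampling pattern are illustrative choices:

```python
import numpy as np

# Minimal sketch of the SS-PLS idea: candidates lie along a horizontal line
# (so the same reference rows are reused from the on-chip buffer), and SAD is
# evaluated only on a sub-sampled pixel grid to cut arithmetic.

def sad(block, ref, sub=1):
    return int(np.abs(block[::sub, ::sub] - ref[::sub, ::sub]).sum())

def line_search(cur, ref, bx, by, center_dx, bs=16, half_len=8):
    block = cur[by:by + bs, bx:bx + bs].astype(np.int32)
    best_dx, best_cost = None, None
    for dx in range(center_dx - half_len, center_dx + half_len + 1):
        x = bx + dx
        if not (0 <= x <= ref.shape[1] - bs):
            continue
        cand = ref[by:by + bs, x:x + bs].astype(np.int32)
        cost = sad(block, cand, sub=2)        # sub-sampled SAD: 1/4 of pixels
        if best_cost is None or cost < best_cost:
            best_dx, best_cost = dx, cost
    return best_dx

cur = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
ref = np.roll(cur, 3, axis=1)                 # reference shifted right by 3
print(line_search(cur, ref, bx=24, by=24, center_dx=0))  # likely 3
```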
Recent advances in video coding technology have resulted in rapid growth of applications in mobile communication. With this explosive growth, reliable transmission and error resilience techniques become increasingly necessary to offer high-quality multimedia services. This paper discusses the error resilience performance of the MPEG-4 Simple profile over H.324/M and of the H.264 Baseline profile over IP packet networks. The MPEG-4 Simple profile provides error resilience tools such as resynchronization marker insertion, data partitioning, and reversible VLC; the H.264 Baseline profile provides the flexible macroblock ordering scheme, among others. The objective and subjective quality of the decoded video is measured under various random bit and burst error conditions.
The adaptive coding schemes in the H.264 standard provide significant coding efficiency along with additional features such as error resilience and network friendliness. Variable block size motion compensation using multiple reference frames is one of the key H.264 coding elements providing a notable performance gain; however, it is also the main culprit behind the increased overall computational complexity. For this reason, this paper proposes a fast algorithm for variable block size motion estimation in H.264. In addition, we propose a fast mode decision scheme that classifies modes based on rate-distortion cost. The experimental results show that the combined methods provide a significant improvement in processing speed without noticeable coding loss.
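As a hedged sketch of RD-cost-based mode classification, assuming an early termination once the cost stops improving (the mode order, margin, and costs are hypothetical):

```python
# Minimal sketch: examine partition modes from coarse to fine and stop as soon
# as the best rate-distortion cost stops improving by a meaningful margin.
# rd_cost is a stand-in for the encoder's actual cost evaluation.

MODE_ORDER = ["SKIP", "16x16", "16x8", "8x16", "8x8"]  # coarse to fine

def fast_mode_decision(rd_cost, margin=0.02):
    best_mode, best_cost = None, float("inf")
    for mode in MODE_ORDER:
        cost = rd_cost(mode)
        if best_mode is None or cost < best_cost * (1 - margin):
            best_mode, best_cost = mode, cost
        else:
            break            # finer partitions are unlikely to pay off
    return best_mode, best_cost

# Toy cost model where 16x16 is already near-optimal.
costs = {"SKIP": 120.0, "16x16": 80.0, "16x8": 79.0, "8x16": 85.0, "8x8": 83.0}
print(fast_mode_decision(costs.get))   # -> ('16x16', 80.0)
```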
In this paper, we propose an RST (rotation, scaling, and translation)-robust watermarking algorithm that exploits the orientation feature of a host image using 2D Gabor kernels. From the viewpoint of watermark detection, host images are usually regarded as noise. However, since geometric manipulations affect the watermark and the host image simultaneously, evaluating the host image can help characterize the distortion. To make the most of this property, we first hierarchically find the orientation of the host image with 2D Gabor kernels and insert a modified reference pattern, aligned to the estimated orientation, in a selected transform domain. Since the pattern is generated in a repetitive manner along the orientation, in the detection step we can simply project the signal in the direction of the image orientation and average the projected values to obtain a 1-D average pattern. Finally, correlating the 1-D projected average pattern with the watermark reveals periodic peaks. Experimental results against geometric attacks, including aspect ratio changes and rotation, are analyzed.
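A minimal sketch of the detection step, assuming the region has already been rotated so that the estimated orientation is horizontal; the pattern and sizes are toy values:

```python
import numpy as np

# Minimal sketch: project the orientation-aligned region along its rows,
# average into a 1-D pattern, and correlate with the repetitive reference
# pattern; peaks recurring with the pattern period indicate the watermark.

def detect(aligned_region: np.ndarray, reference: np.ndarray) -> np.ndarray:
    profile = aligned_region.mean(axis=0)          # project/average along rows
    profile = profile - profile.mean()
    ref = reference - reference.mean()
    # Circular correlation over all shifts.
    return np.array([np.dot(profile, np.roll(ref, s)) for s in range(len(ref))])

period = 16
reference = np.tile(np.random.choice([-1.0, 1.0], period), 8)   # 128 samples
watermarked = np.random.randn(32, 128) * 0.5 + reference         # toy region
corr = detect(watermarked, reference)
print(int(np.argmax(corr)) % period)   # peak shift modulo the period -> 0
```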
This paper proposes a method of error detection and recovery that hides specific information in the video bitstream using fragile watermarking and checks it later. The proposed method adds no extra bits to the compressed bitstream, since it embeds a user-specific data pattern in the least significant bits of the LEVELs in VLC codewords. The decoder can extract this information to check whether there is an error in the received bitstream. We also propose using this method to embed essential data, such as motion vectors, that can be used for error recovery. The proposed method can detect corrupted MBs that usually escape conventional syntax-based error detection schemes, and it is quite simple and of low complexity.
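A minimal sketch of the embedding and checking steps, assuming nonzero LEVEL magnitudes and a toy keyed bit pattern:

```python
# Minimal sketch: constrain the least significant bit of each nonzero
# quantized LEVEL magnitude to a user-specific bit pattern; the decoder
# regenerates the same pattern and flags a macroblock as corrupted where the
# extracted LSBs disagree. The keyed pattern is an illustrative placeholder.

def pattern_bit(key: int, i: int) -> int:
    return (key >> (i % 16)) & 1               # toy keyed bit sequence

def embed(levels, key):
    out = []
    for i, lv in enumerate(levels):            # levels assumed nonzero
        mag, sign = abs(lv), (1 if lv >= 0 else -1)
        if mag & 1 != pattern_bit(key, i):
            mag += 1                           # flip LSB without zeroing a LEVEL
        out.append(sign * mag)
    return out

def check(levels, key):
    """Return True when the embedded pattern survives (no error detected)."""
    return all(abs(lv) & 1 == pattern_bit(key, i) for i, lv in enumerate(levels))

marked = embed([3, -1, 4, 2, -5], key=0xACE1)
assert check(marked, key=0xACE1)
corrupted = marked[:]
corrupted[2] += 1                              # simulate a transmission error
print(check(corrupted, key=0xACE1))            # -> False (error detected)
```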
Under the typical video communication configuration, in which a camera is placed on top of or beside a monitor, face-to-face video communication suffers from poor eye contact because users stare at the monitor screen rather than directly at the camera lens. In this paper, we propose an image warping technique for gaze correction that performs 3D warping of the face object in the given image by a certain correction angle. The correction angle, which is the angle between the eye gaze direction and the direction to the camera, is estimated in an unsupervised way using an eye tracking technique. Experimental results with real image data show the much enhanced naturalness that face-to-face video communication should offer.
A camera used in video communication over the Internet is usually placed on top of a monitor; it is therefore hard for a user to make natural eye contact with the peer, since the user gazes at the monitor, not the camera lens. In this paper, we propose a single 3D mesh warping technique for gaze correction. It performs a 3D rotation of the face image by a certain correction angle to obtain a gaze-corrected image. The correction angle is estimated in an unsupervised way using invariant face features, and a very simple face section model is used in the 3D rotation instead of precise 3D face models, which are not easily attainable in most cases. The method is computationally simple enough for real-time casual video communication applications.
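As a geometric sketch shared by the two gaze-correction entries above, assuming the correction angle comes from the camera's vertical offset and the viewing distance (the distances and rotation axis are illustrative; the face models are not reproduced):

```python
import math

# Minimal sketch: the correction angle is the angle subtended at the eyes
# between the screen center and the camera; rotating the face geometry by
# that angle about the horizontal axis yields the gaze-corrected view.

def correction_angle(camera_offset_m: float, viewing_distance_m: float) -> float:
    """Angle (radians) between the gaze direction and the camera direction."""
    return math.atan2(camera_offset_m, viewing_distance_m)

def rotate_about_x(point, angle):
    """Rotate a 3D point (x, y, z) about the horizontal (x) axis."""
    x, y, z = point
    return (x,
            y * math.cos(angle) - z * math.sin(angle),
            y * math.sin(angle) + z * math.cos(angle))

theta = correction_angle(camera_offset_m=0.15, viewing_distance_m=0.6)
print(math.degrees(theta))                 # ~14 degrees for a webcam on top
print(rotate_about_x((0.0, 0.0, 0.1), theta))
```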
Several new approaches are being investigated in conjunction with low bit rate coding, such as MPEG-4, to overcome the limitations imposed by block-based image compression. One solution is to use 'warping' prediction (or spatial transformation) based on a set of control points, where one of the most important issues is how to place the control points adequately without destroying salient features such as edges and corners. In this paper, we propose a new image representation scheme based on an irregular triangular mesh structure in which, considering the salient features, a considerably reduced number of control points is adaptively selected from an initial set of uniformly distributed control points. A new criterion based on the local representation error is defined for use in successive control point removal, exploiting global image features and thus providing better image representation. Computer simulations have shown that the proposed scheme gives significantly improved image representation performance compared with the conventional scheme based on regular meshes, in both objective and subjective quality.
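A minimal sketch of the successive removal loop, with the representation error left abstract (the toy error below is only for the demo):

```python
# Minimal sketch: starting from uniformly distributed control points,
# repeatedly remove the point whose removal increases the representation
# error the least, until the budget is met. The paper's criterion is based on
# local representation error; here it is an abstract callback.

def reduce_control_points(points, representation_error, target_count):
    points = list(points)
    while len(points) > target_count:
        # Cost of removing p = error of the representation without p.
        cost = {p: representation_error([q for q in points if q != p])
                for p in points}
        victim = min(cost, key=cost.get)
        points.remove(victim)
    return points

# Toy stand-in error: the largest horizontal gap left between remaining points.
def toy_error(pts):
    xs = sorted(x for x, _ in pts)
    return max(b - a for a, b in zip(xs, xs[1:])) if len(xs) > 1 else 1e9

grid = [(x, y) for x in range(5) for y in range(5)]
print(len(reduce_control_points(grid, toy_error, target_count=10)))  # -> 10
```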
This paper proposes a novel blocking artifact reduction method based on the observation that blocking artifacts appear in images due to heavy accuracy loss of transform coefficients in the quantization process. We define the block boundary discontinuity measure as the sum of the squared differences of pixel values along the block boundaries. The proposed method corrects selected transform coefficients so that the resulting image has minimal block boundary discontinuity. It does not prescribe a transform domain in which the correction should take place; therefore, an appropriate transform domain can be selected at the user's discretion. In the experiments, the scheme is applied to DCT-based compressed images to show its performance.
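A minimal sketch of the measure itself (the coefficient correction that minimizes it is omitted):

```python
import numpy as np

# Minimal sketch: sum of squared differences of pixel values across every
# block boundary, for 8x8 blocks. This is the objective the coefficient
# correction is meant to minimize.

def block_boundary_discontinuity(img: np.ndarray, block: int = 8) -> float:
    img = img.astype(np.float64)
    d = 0.0
    # Vertical boundaries: last column of one block vs. first of the next.
    for x in range(block, img.shape[1], block):
        d += np.sum((img[:, x] - img[:, x - 1]) ** 2)
    # Horizontal boundaries.
    for y in range(block, img.shape[0], block):
        d += np.sum((img[y, :] - img[y - 1, :]) ** 2)
    return d

smooth = np.tile(np.arange(32.0), (32, 1))          # ramp: small discontinuity
blocky = (smooth // 8) * 8                          # quantized: visible edges
print(block_boundary_discontinuity(smooth))         # 96.0
print(block_boundary_discontinuity(blocky))         # 6144.0
```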
In many image sequence compression applications, Huffman coding is used to reduce the statistical redundancy in quantized transform coefficients. The Huffman codeword table is often pre-defined to reduce coding delay and table transmission overhead. Local symbol statistics, however, may differ greatly from the global statistics manifested in the pre-defined table. In this paper, we propose a dynamic Huffman coding method that adaptively modifies the association between codewords and symbols according to the local statistics. Over a certain set of blocks, local symbol statistics are observed and used to re-associate symbols with codewords so that shorter codewords are assigned to more frequent symbols. This modified code table is then used to code the next set of blocks. A parameter controls the relative sensitivity to the local statistics versus the global statistics. By performing the same modification to the code table using the decoded symbols, the receiving side can track the code table changes, so the modification information need not be transmitted. Therefore, the method incurs no extra transmission overhead.
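A minimal sketch of the re-association step, with a blending weight standing in for the sensitivity parameter; the code table and counts are toy values:

```python
from collections import Counter

# Minimal sketch: keep the pre-defined codewords but re-associate them with
# symbols so that, based on counts from the previous set of blocks, more
# frequent symbols get shorter codewords. Decoder-side counts come from the
# decoded symbols, so no table update needs to be transmitted. The weight
# `alpha` plays the role of the sensitivity parameter in the text.

def reassociate(codewords, global_counts, local_counts, alpha=0.7):
    """codewords: symbol -> bitstring from the pre-defined table."""
    symbols = list(codewords)
    blended = {s: alpha * local_counts.get(s, 0) + (1 - alpha) * global_counts[s]
               for s in symbols}
    by_freq = sorted(symbols, key=lambda s: blended[s], reverse=True)
    by_len = sorted(codewords.values(), key=len)     # shortest codewords first
    return dict(zip(by_freq, by_len))

table = {"EOB": "0", "L1": "10", "L2": "110", "L3": "111"}
global_counts = {"EOB": 50, "L1": 30, "L2": 15, "L3": 5}
local = Counter(["L2"] * 30 + ["EOB"] * 5)           # local stats favor L2
print(reassociate(table, global_counts, local))      # L2 now gets codeword "0"
```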