Scene flow is a three-dimensional (3D) vector field that combines optical flow with motion in the depth direction, and it can be applied to inter prediction in 3D video coding. The conventional method regularizes the scene flow so that it locally approaches a constant function, which makes it difficult to handle spatial changes and motion boundaries. Regularizers known as the thin-plate spline or deformable model have been introduced in variational optical flow estimation; because they seek a solution that is locally linear, they may alleviate this problem. However, the partial differential equation derived from thin-plate spline regularization contains fourth-order partial derivatives, so it is not easy to obtain an analytical solution or to solve it with a numerically stable iterative method. Previous studies have proposed a numerically stable iterative method for scene flow estimation that does not include thin-plate spline regularization. Building on the framework of those studies, I derive a partial differential equation for scene flow estimation that uses thin-plate spline regularization.
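For one flow component u(x, y), the thin-plate spline regularizer and the fourth-order term it contributes to the Euler-Lagrange equation take the following standard form (a sketch of the variational setting described above, not the paper's exact derivation; λ denotes the regularization weight):

```latex
% Thin-plate spline (second-order) regularizer for one flow component u(x, y):
E_{\mathrm{tps}}(u) = \iint \left( u_{xx}^2 + 2\,u_{xy}^2 + u_{yy}^2 \right) \, dx\,dy
% Its Euler--Lagrange contribution is the biharmonic operator,
\lambda\,\Delta^2 u = \lambda \left( u_{xxxx} + 2\,u_{xxyy} + u_{yyyy} \right),
% which is fourth order, in contrast to the second-order term \lambda\,\Delta u
% obtained from the conventional locally constant (first-order) regularizer.
```

The jump from the second-order Laplacian to the fourth-order biharmonic operator is exactly what complicates both analytical treatment and the stability of iterative solvers.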
This paper describes a method of designing a 2D post filter for reducing coding artifacts caused by lossy image compression. Although the mean squared error (MSE) has typically been used in such filter design, it is not necessarily a good quality measure in terms of consistency with subjective perception. In this paper, we employ a more reliable quality measure called the Structural SIMilarity (SSIM) index and derive filter coefficients that maximize the SSIM score for each image.
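As a reference for what the filter design maximizes, here is a minimal sketch of the SSIM score computed over a single global window (the standard index averages a locally windowed version of this quantity; the function name and the single-window simplification are this sketch's assumptions, not the paper's formulation):

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM between two 8-bit images (simplified, global
    variant; c1 and c2 are the usual stabilizing constants for L = 255)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Unlike MSE, this score couples luminance, contrast, and structure terms, which is why coefficients maximizing it can differ noticeably from the MSE-optimal (Wiener-style) solution.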
Recently, convolutional neural network-based generative models of image signals have been proposed, mainly for image generation, restoration, and compression. For example, PixelCNN++ approximates the probability distribution of image intensity values as a parametric function pel-by-pel and can be used for lossless image coding. However, such an approach does not work well for images whose statistical properties differ from those of the dataset used for network training. In this paper, we improve the coding efficiency by introducing a few parameters that adjust the probability model generated by PixelCNN++. These parameters are numerically optimized to minimize the coding rate of the given image and are then encoded as side information so that the same adjustment can be applied at the decoder side.
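A toy illustration of the idea, assuming for simplicity that the adjustment is a single mixing weight toward a uniform distribution (the paper's actual adjustment parameters are different; `adjust_model` and its grid search are purely illustrative stand-ins):

```python
import numpy as np

def code_length_bits(pmf, symbols):
    # Ideal code length of the symbol sequence: -sum log2 p(symbol)
    return -np.sum(np.log2(pmf[symbols]))

def adjust_model(pmf, symbols, grid=np.linspace(0.0, 0.5, 51)):
    """Pick a single mixing weight eps so that the adjusted model
    (1 - eps) * pmf + eps * uniform minimizes the coding rate of the
    given symbols; eps would then be sent as side information so the
    decoder can apply the same adjustment."""
    uniform = np.full_like(pmf, 1.0 / len(pmf))
    return min(grid, key=lambda e: code_length_bits(
        (1 - e) * pmf + e * uniform, symbols))
```

Because eps = 0 is in the search grid, the adjusted model can never code worse than the unadjusted one (ignoring the few bits spent on the side information).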
Seam carving and its variants are popular content-aware image resizing methods. However, they often suffer from the problem that excessive downscaling causes perceptually annoying distortions, mainly because penetration of the seams into important objects becomes unavoidable at the later stages of processing. As a solution, we previously proposed a nonlinear downscaling technique that iteratively applies a DCT-based locally linear scaling operator within 'belt-like seams', i.e. seams with a certain width. To enhance this idea, in this paper we replace the later processing stage with a global linear scaling operator. The transition point between the nonlinear and linear processing stages is automatically determined based on a preservation measure for the important objects. Simulation results show that our approach produces subjectively better results than conventional nonlinear downscaling methods.
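For reference, the basic seam-carving step underlying all of the above is a dynamic program over an energy map; the belt-like seams of the proposed method widen the resulting minimum-energy path into a band. A minimal sketch of the classic vertical-seam DP (not the paper's belt variant):

```python
import numpy as np

def min_vertical_seam(energy):
    """Return the column index of the minimum-energy vertical seam for
    each row, via the classic cumulative-cost dynamic program."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack from the cheapest bottom-row cell, moving at most
    # one column left or right per row.
    seam = [int(np.argmin(cost[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    return seam[::-1]
```

When the energy inside important objects is high everywhere, every remaining seam must cut through them, which is precisely the failure mode that motivates switching to a linear scaling stage.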
This paper describes an efficient lossless coding method for HDR color images stored in the floating-point format called radiance RGBE. In this method, the three mantissa parts of the RGB components as well as the common exponent part, each represented in 8-bit depth, are encoded by a block-adaptive prediction technique. To improve the prediction accuracy, the mantissa parts of the RGB components used in the prediction are adjusted so that their exponent parts can be regarded as equal. Moreover, not only the same color component but also other, already encoded color components are used in the prediction to exploit inter-color correlations. Simulation results indicate that introducing the above exponent equalization together with inter-color prediction considerably improves the coding efficiency.
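A minimal sketch of the RGBE representation and the exponent-equalization idea, assuming the standard Radiance decoding convention (component = mantissa × 2^(e−136), zero exponent meaning black); the exact alignment rule used in the paper may differ:

```python
import math

def rgbe_decode(r, g, b, e):
    """Decode one radiance RGBE pixel to linear floats
    (standard Radiance convention: v = m * 2^(e - 136))."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    scale = math.ldexp(1.0, e - 136)
    return (r * scale, g * scale, b * scale)

def align_mantissa(m, e, e_ref):
    """Right-shift a mantissa so it can be treated as if it had the
    reference exponent e_ref >= e: after the shift, mantissas of pixels
    with different exponents become directly comparable, which is the
    'exponent equalization' that helps the block-adaptive predictor."""
    return m >> (e_ref - e)
```

The shift preserves the represented value (up to truncation of the discarded low bits): m · 2^(e−136) ≈ (m >> d) · 2^(e+d−136).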
We previously proposed a lossless video coding method based on intra/inter-frame example search and probability model optimization. In this method, several examples, i.e. a set of pels whose neighborhoods are similar to the local texture of the target pel to be encoded, are searched for in already encoded areas of the current and previous frames with integer-pel accuracy. The probability distribution of the image value at the target pel is then modeled as a weighted sum of Gaussian functions whose peak positions are given by the individual examples. Furthermore, the model parameters that control the shapes of the Gaussian functions are numerically optimized so that the resulting coding rate is minimized. In this paper, the above example search is extended to fractional-pel positions for more accurate probability modeling.
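Searching at fractional-pel positions requires sampling candidate patches between the integer grid points. A minimal sketch using bilinear interpolation (the interpolation filter actually used in the paper is not specified here; `bilinear_patch` is an illustrative assumption):

```python
import numpy as np

def bilinear_patch(img, y, x, size):
    """Sample a size x size patch at a fractional-pel position (y, x)
    by bilinear interpolation -- the kind of sub-pel sampling needed to
    extend the example search from integer to fractional positions."""
    iy, ix = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - iy, x - ix
    a = img[iy:iy + size, ix:ix + size].astype(np.float64)
    b = img[iy:iy + size, ix + 1:ix + size + 1]
    c = img[iy + 1:iy + size + 1, ix:ix + size]
    d = img[iy + 1:iy + size + 1, ix + 1:ix + size + 1]
    return ((1 - fy) * (1 - fx) * a + (1 - fy) * fx * b
            + fy * (1 - fx) * c + fy * fx * d)
```

The example search then compares the target pel's neighborhood against such interpolated patches at, e.g., half- and quarter-pel offsets, instead of only at integer positions.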
OpenCV is an open-source programming library for computer vision and image processing that has served as a de facto standard in industry and academia for more than 10 years, running on everything from large-scale computers to embedded devices such as smartphones. OpenCV provides the image data representation classes cvMat and cvUMat as image processing APIs; the latter supports parallel computation through heterogeneous computing frameworks such as OpenCL. Operator overloading, which enables intuitive programming with arithmetic and assignment operators, is defined for cvMat. It is not defined for cvUMat, however, so the programmer must call the appropriate functions explicitly. As a result, programming with cvUMat is cumbersome, and such code is not compatible with cvMat code that uses operator overloading. In addition, the operator overloading of cvMat suffers from extra memory reallocation at runtime, so it is not appropriate to apply it directly to cvUMat. In this paper, we therefore propose a method that realizes operator overloading without extra runtime memory reallocation and that can be converted equivalently into cvUMat function calls at compile time. This method enables intuitive programming with operator overloading and no runtime overhead.
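The underlying idea, deferring the evaluation of overloaded operators so no temporaries are allocated, can be illustrated in Python (the paper does this in C++ at compile time with expression templates mapped to cvUMat calls; the class and function names below are purely illustrative):

```python
import numpy as np

class E:
    """Deferred elementwise sum/difference built by operator overloading:
    '+' and '-' only record operands instead of computing temporaries,
    and eval_into() later evaluates the whole expression into a single
    preallocated buffer -- a conceptual analogue of converting overloaded
    operators into function calls with no extra runtime reallocation."""
    def __init__(self, terms):
        self.terms = list(terms)          # flat list of (sign, array)
    def __add__(self, other):
        return E(self.terms + [(+1, other)])
    def __sub__(self, other):
        return E(self.terms + [(-1, other)])
    def eval_into(self, out):
        np.copyto(out, 0.0)
        for sign, arr in self.terms:
            if sign > 0:
                np.add(out, arr, out=out)       # in place, no temporary
            else:
                np.subtract(out, arr, out=out)  # in place, no temporary
        return out

def lift(a):
    """Wrap the first operand so the overloaded operators apply."""
    return E([(+1, a)])
```

Writing `(lift(a) + b - c).eval_into(out)` keeps the familiar operator syntax while touching only the caller-supplied output buffer, which is the behavior the proposed method achieves for cvUMat.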
This paper describes a novel lossless video coding method that directly estimates the probability distribution of image values pel-by-pel. In the estimation process, several examples, i.e. a set of pels whose neighborhoods are similar to the local texture of the target pel to be encoded, are gathered from search windows located in already encoded areas of the current frame as well as in those of previous frames. The probability distribution is then modeled as a weighted sum of Gaussian functions whose center positions are given by the individual examples. Furthermore, the model parameters that control the shapes of the Gaussian functions are numerically optimized so that the resulting coding rate is minimized. Simulation results indicate that the coding performance can be improved by increasing the number of reference frames.
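A minimal sketch of such an example-driven probability model for an 8-bit pel, assuming a shared width sigma and per-example weights as the optimized parameters (the paper's exact parameterization and normalization may differ):

```python
import numpy as np

def example_pmf(examples, weights, sigma, levels=256):
    """Probability mass function over the integer levels 0..levels-1:
    a weighted sum of Gaussians centered at the example values,
    discretized and normalized."""
    v = np.arange(levels, dtype=np.float64)[:, None]
    g = np.exp(-0.5 * ((v - np.asarray(examples, float)) / sigma) ** 2)
    pmf = g @ np.asarray(weights, float)
    return pmf / pmf.sum()

def rate_bits(pmf, value):
    # Ideal code length of the actual pel value under the model.
    return -np.log2(pmf[value])
```

Good examples put probability mass near the true value, so the ideal code length of the target pel drops; the numerical optimization of sigma and the weights minimizes exactly this quantity summed over the frame.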
In general, "drawing collapse" is a term used when very low-quality animated content is broadcast: for example, the perspective of a scene is unnaturally distorted, or the sizes of people and buildings are abnormally unbalanced. In our research, we explore the possibility of automatically discriminating drawing collapse in order to reduce the workload of the content checks typically performed by the animation director. In this paper, as a preliminary task, we focus only on the faces of animated characters and use the distances and angles between several feature points on facial parts as input data. By training a support vector machine (SVM) on input data extracted from both positive and negative example images, we obtain a discrimination accuracy of about 90% when the same character is tested.
This paper describes a method for creating cel-style CG animations of waving hair. In this method, masses of air are modeled as virtual circles moving at a constant velocity, and hair bundles are modeled as elastic bodies. The deformation of the hair bundles is then calculated by simulating collisions between the virtual circles and the hair bundles. Since the method is based on the technique animators use in creating traditional cel animations, it is expected to suppress the feeling of strangeness that is often introduced by conventional procedural animation techniques.
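A toy sketch of the interaction: a circle moving at constant velocity pushes hair points out of its interior, while a damped spring force pulls each point back toward its rest shape, so the hair sways and then settles. All constants and the simple point-mass model are assumptions of this sketch, not the paper's elastic-body formulation:

```python
import numpy as np

def simulate(hair_rest, circle_x0, vx, radius, steps=200, k=0.1, damp=0.9):
    """Toy stand-in for the method: hair points are displaced by a
    virtual circle (center moving along y = 0 at constant velocity vx)
    and restored by a damped spring toward their rest positions."""
    pos = hair_rest.astype(np.float64).copy()
    vel = np.zeros_like(pos)
    for t in range(steps):
        cx = circle_x0 + vx * t
        # Push any point inside the circle out to its boundary.
        d = pos - np.array([cx, 0.0])
        r = np.linalg.norm(d, axis=1)
        inside = r < radius
        pos[inside] += d[inside] * (radius / r[inside] - 1.0)[:, None]
        # Damped spring force back toward the rest shape.
        vel = damp * (vel + k * (hair_rest - pos))
        pos += vel
    return pos
```

Even this toy version reproduces the qualitative behavior the animators' technique aims at: a traveling bulge in the hair as the air mass passes, followed by a smooth return to the rest shape.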
We previously proposed a machine-learning-based post filtering method for reducing image artifacts caused by lossy compression. The method classifies reconstructed image samples into three categories using a support vector machine (SVM) to roughly discriminate the magnitude of the reconstruction errors. An optimum offset value is then added to the samples belonging to each category, in a manner similar to the post filtering technique called sample adaptive offset (SAO) used in the H.265/HEVC standard. In this paper, two kinds of SVM classifiers are adaptively switched according to information about the block boundaries of transform units (TUs) in H.265/HEVC intra-frame coding. Furthermore, the samples used to form the feature vector fed to the SVM classifier are rotated at the block boundary to properly capture the local characteristics of the reconstruction errors.
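The offset step itself has a simple closed form: ignoring rate cost, the best offset for a category is the mean reconstruction error of its samples, as in SAO. A minimal sketch with the SVM classifier replaced by precomputed category labels (an assumption of this sketch):

```python
import numpy as np

def sao_offsets(recon, orig, labels, n_cat=3):
    """Per-category offsets in the SAO spirit: the rate-unconstrained
    optimal offset for each category is the mean reconstruction error
    (orig - recon) over its samples. Here `labels` stands in for the
    SVM classifier's per-sample category decisions."""
    offs = np.zeros(n_cat)
    for c in range(n_cat):
        m = labels == c
        if m.any():
            offs[c] = (orig[m] - recon[m]).mean()
    return offs

def apply_offsets(recon, labels, offs):
    # Add each sample's category offset to the reconstruction.
    return recon + offs[labels]
```

The better the classifier separates samples by error magnitude and sign, the more of the reconstruction error these per-category means can remove, which is why the TU-boundary-aware switching and feature rotation matter.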