This PDF file contains the front matter associated with SPIE Proceedings Volume 9029, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
High Efficiency Video Coding (HEVC) offers a significant compression performance benefit over previous standards. Thanks to its highly efficient prediction tools, blocks whose quantized transform coefficients are all zero are quite common in HEVC. The computational load of transform and quantization can be reduced considerably if these all-zero blocks are detected before transform and quantization. Based on a theoretical analysis of the integer transform and quantization process in HEVC, we propose SAD (sum of absolute differences) thresholds below which a block can be identified as all-zero. Simulation results show that the proposed method saves nearly 37% of the computation time of transform and quantization.
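The idea of a SAD threshold can be illustrated with a minimal sketch. It does not use the HEVC integer transform or the thresholds derived in the paper; it assumes a floating-point orthonormal DCT-II and a dead-zone quantizer with a hypothetical rounding offset `rounding`. Under those assumptions every basis product is at most 2/N in magnitude, so each coefficient is bounded by (2/N)·SAD, and a block is guaranteed to quantize to all zeros when SAD < (1 − rounding)·Qstep·N/2.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumption: stands in for the HEVC integer transform)."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def dct2(block):
    """Separable 2D transform T * block * T^T of a square residual block."""
    n = len(block)
    t = dct_matrix(n)
    tmp = [[sum(t[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * t[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]

def quantize(coeffs, qstep, rounding=1.0 / 3.0):
    """Dead-zone quantizer: level = sign(c) * floor(|c| / qstep + rounding)."""
    return [[int(math.copysign(math.floor(abs(c) / qstep + rounding), c))
             for c in row] for row in coeffs]

def sad(block):
    """Sum of absolute differences of the prediction residual."""
    return sum(abs(v) for row in block for v in row)

def is_all_zero_by_sad(block, qstep, rounding=1.0 / 3.0):
    """Sufficient (not necessary) test: True means transform and quantization can be skipped."""
    n = len(block)
    threshold = (1.0 - rounding) * qstep * n / 2.0
    return sad(block) < threshold
```

The test is conservative: a block that fails it may still quantize to all zeros, in which case the encoder simply performs the transform as usual.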
In conventional motion compensation, the prediction block of a P frame is associated with only one motion vector. Multihypothesis motion compensation (MHMC) was proposed to improve the prediction performance of conventional motion compensation. However, MHMC requires multiple motion vectors to be searched and coded. In this paper, we propose a new low-cost multihypothesis motion compensation (LMHMC) scheme. In LMHMC, a block is predicted from multiple hypotheses, but only one motion vector is searched and coded into the bitstream; the other motion vectors are predicted from the motion vectors of neighboring blocks. LMHMC therefore reduces both the encoding complexity and the bit rate of MHMC. With LMHMC added as an additional mode in the MPEG Internet Video Coding (IVC) platform, the BD-rate saving is up to 10%, and the average BD-rate saving is close to 5% in the Low Delay configuration. We also compare MHMC and LMHMC on the IVC platform: LMHMC improves on the performance of MHMC by about 2% on average.
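A two-hypothesis version of this scheme can be sketched as follows. The abstract does not say how the second motion vector is derived or how the hypotheses are combined, so the component-wise median of the neighboring vectors and the equal-weight rounded average used here are assumptions, and all function names are hypothetical.

```python
def median_mv(neighbor_mvs):
    """Component-wise median of neighboring motion vectors (assumed derivation rule)."""
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return (xs[mid], ys[mid])

def fetch_block(ref, top, left, mv, size):
    """Read a size x size block from the reference frame, displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def lmhmc_predict(ref, top, left, searched_mv, neighbor_mvs, size):
    """Average two hypotheses; only searched_mv would be coded into the bitstream."""
    derived_mv = median_mv(neighbor_mvs)  # costs no bits: the decoder can repeat this step
    h1 = fetch_block(ref, top, left, searched_mv, size)
    h2 = fetch_block(ref, top, left, derived_mv, size)
    return [[(a + b + 1) >> 1 for a, b in zip(r1, r2)]
            for r1, r2 in zip(h1, h2)]
```

Because the derived vector depends only on already-decoded neighbors, the decoder can reproduce the same prediction from the single transmitted vector.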
Template matching prediction is an established approach to intra-frame coding that uses previously coded pixels in the same frame as reference. It uses the previously reconstructed upper and left boundaries of a block as a template to search the reference area for the best-matching block, and hence eliminates the need to send additional information for the decoder to reproduce the same prediction. Viewing the image signal as an auto-regressive process, this work is premised on the fact that pixels closer to the known block boundary are better predicted than those farther away. It significantly extends the scope of the template matching approach, which is typically followed by a conventional discrete cosine transform (DCT) of the prediction residuals, by employing an asymmetric discrete sine transform (ADST), whose basis functions vanish at the prediction boundary and reach maximum magnitude at the far end, to fully exploit the statistics of the residual signals. Experiments show that the proposed scheme provides substantial coding performance gains on top of those that the conventional template matching method achieves over the baseline.
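The boundary behavior of the ADST basis can be checked numerically. The sketch below assumes the commonly used DST-VII form of the ADST, with entries (2/√(2N+1))·sin((2j−1)iπ/(2N+1)) for j, i = 1..N; the abstract does not state which variant the authors use.

```python
import math

def adst_matrix(n):
    """ADST (DST-VII form, an assumption): row j is the j-th basis function."""
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [[scale * math.sin((2 * j - 1) * i * math.pi / (2 * n + 1))
             for i in range(1, n + 1)]
            for j in range(1, n + 1)]

def transform(matrix, signal):
    """Apply a 1D transform, with index 0 being the sample nearest the known boundary."""
    return [sum(row[i] * signal[i] for i in range(len(signal))) for row in matrix]
```

The first basis function grows monotonically with distance from the boundary (it would be exactly zero at position i = 0), so a residual whose magnitude increases away from the boundary is captured almost entirely by the first coefficient.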
Recent developments in hardware and software have enabled a new generation of video quality. However, developments in networking and digital communication are lagging behind. This prompted the establishment of the Joint Collaborative Team on Video Coding (JCT-VC), with the objective of developing a new high-performance video coding standard. A primary reason for developing HEVC was to enable efficient processing and transmission of HD videos, which normally contain large smooth areas; HEVC therefore uses larger encoding blocks than the previous standard to enable more effective encoding, while smaller blocks are still used to encode fast or complex areas of video more efficiently. Hence, an encoder implementation investigates all the possible block sizes. This and many other features added in the new standard have led to a significant increase in the complexity of the encoding process. Furthermore, there is no automated process for deciding when large blocks or small blocks should be used. To overcome this problem, this research proposes a set of optimization tools that reduce the encoding complexity while maintaining the same quality and compression rate. The method automates this decision through a set of hierarchical steps while still using the standard refined coding tools.
This paper presents a two-layer CODEC architecture for high dynamic range video compression. The base layer contains the tone-mapped video stream, encoded with 8 bits per component, which can be decoded using conventional equipment. The base layer content is optimized for rendering on low dynamic range displays. The enhancement layer contains the image difference, in a perceptually uniform color space, between the inverse tone-mapped base layer content and the original video stream. Predicting the high dynamic range content reduces the redundancy in the transmitted data while still preserving highlights and out-of-gamut colors. The perceptually uniform color space enables the use of standard rate-distortion optimization algorithms.
We present techniques for efficient implementation and encoding of non-uniform tone mapping operators with low overhead in terms of bitstream size and number of operations. The transform representation is based on a human visual system model and is suitable for global and local tone mapping operators. The compression techniques include predicting the transform parameters from previously decoded frames and from already decoded data for the current frame. Different video compression techniques are compared: backwards compatible and non-backwards compatible, using AVC and HEVC codecs.
A large number of health-related applications are being developed using web infrastructure. Video is increasingly used in healthcare applications to enable communication between patients and care providers. We present a video conferencing system designed for healthcare applications. In the face of network congestion, the system uses role-based adaptation to ensure seamless service. A new web technology, WebRTC, is used to enable seamless conferencing applications. We present the video conferencing application and demonstrate the usefulness of role-based adaptation.
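The layer split can be sketched on a list of luminance samples. Everything specific here is an assumption made for illustration: the logarithmic tone curve, the peak luminance `L_MAX`, and the use of log luminance as a stand-in for the perceptually uniform color space. No actual video codec is involved, so both layers are carried losslessly.

```python
import math

L_MAX = 10000.0  # assumed peak luminance in cd/m^2 (hypothetical value)

def tone_map(lum):
    """Map linear luminance to an 8-bit base-layer code (assumed log operator)."""
    code = 255.0 * math.log1p(lum) / math.log1p(L_MAX)
    return max(0, min(255, int(round(code))))

def inverse_tone_map(code):
    """Predict HDR luminance from the 8-bit base-layer code."""
    return math.expm1(code / 255.0 * math.log1p(L_MAX))

def to_perceptual(lum):
    """Stand-in for a perceptually uniform space (assumption: log luminance)."""
    return math.log1p(lum)

def from_perceptual(value):
    return math.expm1(value)

def encode(hdr):
    """Return (base layer codes, enhancement layer residuals)."""
    base = [tone_map(v) for v in hdr]
    enhancement = [to_perceptual(v) - to_perceptual(inverse_tone_map(b))
                   for v, b in zip(hdr, base)]
    return base, enhancement

def decode(base, enhancement):
    """Full HDR reconstruction; a legacy decoder would display base alone."""
    return [from_perceptual(to_perceptual(inverse_tone_map(b)) + e)
            for b, e in zip(base, enhancement)]
```

In this sketch the enhancement residual never exceeds half a base-layer quantization step in the perceptual domain for inputs within range, which illustrates why predicting from the base layer reduces the data left to transmit.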
Established work in the literature has demonstrated that, with accurate knowledge of the corresponding blur kernel (or point spread function, PSF), an unblurred prior image can be reliably estimated from one or more blurred observations. It has also been demonstrated, however, that an incorrect PSF specification leads to inaccurate image restoration. In this paper, we present a novel metric that relates the discrepancy between a known PSF and a chosen approximate PSF to the effect that this discrepancy has on the reconstruction of an unblurred image. Such a metric is essential to the accurate development and application of a parameterized PSF model.
Several error measures are proposed, which quantify the inaccuracy of image deblurring using a particular incorrect PSF. Using a set of simulation results, it is shown that the desired metric is feasible even without specification of the unblurred prior image or the radiometric response of the camera. It is also shown that the proposed metric accurately and reliably predicts the resulting deblurring error from the use of an approximate PSF in place of an exact PSF.
Camera motion blur is a common problem in low-light imaging applications. It is difficult to apply image restoration techniques without an accurate blur kernel. Recently, inertial sensors have been successfully used to estimate the blur function. However, the effectiveness of these restoration algorithms has been limited by the lack of access to unprocessed raw image data obtained directly from the Bayer image sensor.
In this work, raw CFA image data is acquired in conjunction with 3-axis acceleration data using a custom-built imaging system. The raw image data records the redistribution of light but is affected by camera motion and the rolling shutter mechanism. Using the acceleration data, the spread of light to neighboring pixels can be determined. We propose a new approach that jointly performs deblurring and demosaicking of the raw image. This approach adopts an edge-preserving sparse prior in a MAP framework. The improvements brought by our algorithm are demonstrated by processing the data collected from the imaging system.
In an ultrasound imaging system, the blurring introduced by the ultrasound scanner is characterized by the point spread function (PSF), which describes the response of the imaging system to a point source. De-blurring can therefore be achieved by de-convolving the ultrasound images with an estimate of the corresponding PSF. However, it is hard to obtain an accurate estimate of the PSF because the properties of the human tissues through which the ultrasound signal propagates are unknown. In this paper, we present a new method for PSF estimation in the Fourier domain (FD) based on parametric minimum phase information; the method simultaneously performs fast 2D de-convolution in the ultrasound imaging system. Whereas most complex cepstrum methods rely on complex 2D phase unwrapping to estimate the FD-phase information of the PSF, our algorithm estimates the 2D PSF from 2D FD-phase information with parametric weighting factors α and β, which affect the shape of the PSF. This makes the computations much simpler and the estimation more accurate. Our algorithm works on beam-formed uncompressed radio-frequency data, with a database of pre-measured and estimated 2D PSFs from the actual probe used. We have tested our algorithm with a vera-sonic system and a commercial ultrasound scanner (Philips C4-2), on phantoms with known speed of sound and on in vivo scans with unknown speeds of sound.
The presence of shadows degrades the performance of any computer vision system, as a number of shadow points are always misclassified as object points. Various algorithms for shadow detection and removal exist for still images, but very few have been developed for moving objects. This paper introduces a new method for shadow detection and removal from moving objects, based on the dual-tree complex wavelet transform. We chose the dual-tree complex wavelet transform because it is shift invariant and has better edge detection properties than the real-valued wavelet transform. In the present work, shadow detection and removal is performed by thresholding the dual-tree complex wavelet coefficients of the difference between the reference frame and the current frame. The standard deviation of the wavelet coefficients is used as an optimal threshold. Visual results and quantitative performance metrics show that the proposed method for shadow detection and removal is better than other state-of-the-art methods.
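The thresholding step alone can be sketched as below. The sketch takes the complex coefficients of the frame difference as given (computing a dual-tree complex wavelet transform is outside its scope), and it makes two assumptions that the abstract does not confirm: the threshold is the standard deviation of the coefficient magnitudes, and coefficients at or below it are treated as shadow or background and set to zero.

```python
import statistics

def threshold_coefficients(coeffs):
    """Hard-threshold complex coefficients at the standard deviation of their magnitudes.

    coeffs: coefficients of (reference frame - current frame) in one subband.
    Returns (thresholded coefficients, threshold used).
    """
    magnitudes = [abs(c) for c in coeffs]
    threshold = statistics.pstdev(magnitudes)  # population standard deviation
    kept = [c if m > threshold else 0 for c, m in zip(coeffs, magnitudes)]
    return kept, threshold
```

Because the threshold is computed from the data itself, it adapts to each subband without a hand-tuned parameter.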
This paper presents a method for tracking human poses in real time from depth image sequences. The key idea is to use recognition to generate the model to be tracked. In contrast to traditional methods, which use a single type of 3D body model, we define the human body model directly from the body part recognition result of the captured depth image, which leads to reliable tracking regardless of the user's appearance. Moreover, the proposed method efficiently reduces tracking drift by exploiting the joint information inserted into our body model. Experimental results in real-world environments show that the proposed method is effective for estimating various human poses in real time.
Prematurely born infants receive special care in the Neonatal Intensive Care Unit (NICU), where various physiological parameters, such as heart rate, oxygen saturation and temperature, are continuously monitored. However, there is no system for monitoring and interpreting their facial expressions, the most prominent discomfort indicator. In this paper, we present an experimental video monitoring system for automatic discomfort detection in infants’ faces based on the analysis of their facial expressions. The proposed system uses an Active Appearance Model (AAM) to robustly track both the global motion of the newborn’s face and its inner features. The system detects discomfort by applying a Support Vector Machine (SVM) classifier to the AAM representations of the face on a frame-by-frame basis. Three contributions increase the performance of the system. First, we extract several histogram-based texture descriptors to improve the AAM appearance representations. Second, we fuse the outputs of various individual SVM classifiers, which are trained on features with complementary qualities. Third, we improve the temporal behavior and stability of the discomfort detection by applying an averaging filter to the classification outputs. Additionally, for higher robustness, we explore the effect of different image pre-processing algorithms for illumination correction and image enhancement to evaluate possible detection improvements. The proposed system is evaluated on 15 videos of 8 infants, yielding an AUC of 0.98. As a bonus, the system offers monitoring of the infant’s expressions when the infant is left unattended, and it additionally provides an objective judgment of discomfort.
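The third contribution, temporal smoothing of the classifier outputs, can be sketched as a causal moving average over per-frame SVM decision scores. The window length, the zero decision threshold, and the function names are assumptions; the abstract does not specify the filter parameters.

```python
def moving_average(scores, window):
    """Causal moving average; early frames average over the frames seen so far."""
    smoothed = []
    total = 0.0
    for i, score in enumerate(scores):
        total += score
        if i >= window:
            total -= scores[i - window]  # drop the sample that left the window
        smoothed.append(total / min(i + 1, window))
    return smoothed

def detect_discomfort(scores, window=3, threshold=0.0):
    """Per-frame decision after smoothing (positive score = discomfort, by assumption)."""
    return [avg > threshold for avg in moving_average(scores, window)]
```

For example, with scores `[1.0, 1.0, -0.5, 1.0, 1.0]` the single negative frame would flip a frame-by-frame decision, whereas the smoothed decision stays positive throughout.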
A method for extracting onomatopoeia characters from comic images was developed based on the stroke width feature of characters, since in many cases they have a nearly constant stroke width. An image was segmented with a constrained Delaunay triangulation. Connected component grouping was performed based on the triangles generated by the constrained Delaunay triangulation. The stroke width of the connected components was calculated from the altitudes of these triangles. The experimental results demonstrate the effectiveness of the proposed method.
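The altitude computation can be sketched for a single triangle. The abstract does not say which altitude is used, so taking the shortest one (the altitude onto the longest side) as the local stroke width is an assumption: for a thin triangle with two vertices on one contour of a stroke and the third on the opposite contour, this altitude spans the stroke.

```python
import math

def triangle_altitudes(a, b, c):
    """Altitudes onto sides bc, ca and ab of a non-degenerate triangle with 2D vertices."""
    # Twice the triangle area, from the cross product of two edge vectors.
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    sides = [
        math.hypot(c[0] - b[0], c[1] - b[1]),
        math.hypot(a[0] - c[0], a[1] - c[1]),
        math.hypot(b[0] - a[0], b[1] - a[1]),
    ]
    return [twice_area / side for side in sides]

def stroke_width_estimate(a, b, c):
    """Shortest altitude, used here as a local stroke width estimate (assumption)."""
    return min(triangle_altitudes(a, b, c))
```

Collecting this estimate over all triangles of a connected component gives a distribution whose spread indicates how constant the stroke width is.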
This paper addresses the problem of natural image matting, in which the goal is to softly segment a foreground from a background. Given an input image and some known foreground (FG) and background (BG) pixels, an alpha value indicating partial foreground coverage is calculated for every other pixel in the image. The proposed algorithm belongs to the family of sampling-based matting techniques, in which the alpha of every unknown pixel is calculated using FG/BG pairs that are sampled according to certain criteria. Current sampling-based matting techniques suffer from critical disadvantages, leaving the problem open for further development. By adopting a novel FG/BG pair-selection strategy, we propose a technique that overcomes critical pitfalls in the state-of-the-art methods, with a performance that is comparable, and in certain cases superior, to theirs. Our results were evaluated according to the online matting benchmark.
We explore the relevance of Heterogeneous System Architecture (HSA) in Computer Vision, both as a long-term vision and as a near-term emerging reality via the recently ratified OpenCL 2.0 Khronos standard. After a brief review of OpenCL 1.2 and 2.0, including HSA features such as Shared Virtual Memory (SVM) and platform atomics, we identify which genres of Computer Vision workloads stand to benefit from those features, and we suggest a new mental framework that replaces GPU compute with hybrid HSA APU compute. As a case in point, we discuss, in some detail, popular object recognition algorithms (part-based models), emphasizing the interplay and concurrent collaboration between the GPU and CPU. We conclude by describing how OpenCL has been incorporated in OpenCV, a popular open source computer vision library, emphasizing recent work on the Transparent API, to appear in OpenCV 3.0, which unifies the native CPU and OpenCL execution paths under a single API, allowing the same code to execute either on a CPU or on an OpenCL-enabled device, without even recompiling.
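The alpha computation shared by sampling-based methods can be sketched as follows. Under the compositing model I = αF + (1 − α)B, projecting the pixel color onto the line from B to F gives α = ((I − B)·(F − B)) / |F − B|². The pair-selection rule shown, minimum color reconstruction error, is a common baseline criterion and not the novel strategy proposed in the paper; the function names are hypothetical.

```python
import math

def estimate_alpha(pixel, fg, bg):
    """Alpha from one FG/BG color pair, clamped to [0, 1]."""
    direction = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        return 0.0  # identical FG and BG colors carry no coverage information
    num = sum((p - b) * d for p, b, d in zip(pixel, bg, direction))
    return min(1.0, max(0.0, num / denom))

def color_error(pixel, fg, bg, alpha):
    """Distance between the pixel and the color composited from the pair."""
    return math.sqrt(sum((p - (alpha * f + (1.0 - alpha) * b)) ** 2
                         for p, f, b in zip(pixel, fg, bg)))

def best_pair(pixel, fg_samples, bg_samples):
    """Baseline selection: the pair with the smallest color error (assumed criterion)."""
    best = None
    for fg in fg_samples:
        for bg in bg_samples:
            alpha = estimate_alpha(pixel, fg, bg)
            err = color_error(pixel, fg, bg, alpha)
            if best is None or err < best[0]:
                best = (err, fg, bg, alpha)
    return best[1], best[2], best[3]
```

Exhaustive search over all pairs is quadratic in the number of samples, which is one reason practical methods restrict or rank the candidate samples before pairing them.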