With the advent of light field acquisition technologies, the captured information of a scene is enriched with both angular and spatial information. This additional information enables new capabilities in the post-processing stage, e.g. refocusing, 3D scene reconstruction, and synthetic aperture. Light field capturing devices are classified into two categories: in the first, a single plenoptic camera captures a densely sampled light field, and in the second, multiple traditional cameras capture a sparsely sampled light field. In both cases, the size of the captured data grows with the additional angular information. The recent call for proposals on the compression of light field data by the Joint Photographic Experts Group (JPEG), also called JPEG Pleno, reflects the need for a new and efficient light field compression solution. In this paper, we propose a compression solution for sparsely sampled light field data. Each view of the multi-camera system is interpreted as a frame of a multi-view sequence. The pseudo multi-view sequences are compressed using the state-of-the-art Multiview extension of High Efficiency Video Coding (MV-HEVC). A subset of four light field images from the Stanford dataset is compressed at four bit-rates in order to cover low- to high-bit-rate scenarios. The comparison is made with the state-of-the-art reference encoder HEVC and its real-time implementation x265. The rate-distortion analysis shows that the proposed compression scheme outperforms both reference schemes in all tested bit-rate scenarios for all test images. An average BD-PSNR gain of 1.36 dB over HEVC and 2.15 dB over x265 is achieved with the proposed compression scheme.
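The core of the pseudo-sequence idea can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a regular camera grid and a serpentine scan order (the actual view ordering fed to MV-HEVC is not specified above), and simply arranges the views so that consecutive frames are spatially adjacent.

```python
import numpy as np

def views_to_pseudo_sequence(views, rows, cols):
    """Arrange a rows x cols grid of light field views into a pseudo
    video sequence using a serpentine (boustrophedon) scan, so that
    consecutive frames come from spatially adjacent cameras and a
    video codec can exploit the inter-view redundancy.

    `views` maps (row, col) -> image array. The scan order here is an
    assumption for illustration; other orderings are possible.
    """
    sequence = []
    for r in range(rows):
        col_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in col_order:
            sequence.append(views[(r, c)])
    return sequence

# Toy example: a 2x3 grid of 4x4 "views", each filled with 10*row + col
views = {(r, c): np.full((4, 4), 10 * r + c)
         for r in range(2) for c in range(3)}
seq = views_to_pseudo_sequence(views, 2, 3)
# Serpentine order visits (0,0),(0,1),(0,2),(1,2),(1,1),(1,0)
```

The resulting frame list would then be handed to an encoder; the choice of scan order trades off inter-frame similarity against encoder reference structure.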
The ongoing success of three-dimensional (3D) cinema fuels increasing efforts to spread the commercial success of 3D to new markets. The possibilities of a convincing 3D experience at home, such as three-dimensional television (3DTV), have generated a great deal of interest within the research and standardization community. A central issue for 3DTV is the creation and representation of 3D content. Acquiring scene depth information is a fundamental task in computer vision, yet complex and error-prone. Dedicated range sensors, such as the Time-of-Flight (ToF) camera, can simplify the scene depth capture process and overcome shortcomings of traditional solutions, such as active or passive stereo analysis. Admittedly, currently available ToF sensors deliver only a limited spatial resolution. However, sophisticated depth upscaling approaches use texture information to match depth and video resolution. At Electronic Imaging 2012 we proposed an upscaling routine based on error energy minimization, weighted with edge information from an accompanying video source. In this article we develop our algorithm further. By adding temporal consistency constraints to the upscaling process, we reduce disturbing depth jumps and flickering artifacts in the final 3DTV content. Temporal consistency in depth maps enhances the 3D experience, leading to a wider acceptance of 3D media content. More content in better quality can boost the commercial success of 3DTV.
Multi-view three-dimensional television relies on view synthesis to reduce the number of views being transmitted. Arbitrary views can be synthesized by utilizing corresponding depth images with textures. The depth images obtained from stereo pairs or range cameras may contain erroneous values, which entail artifacts in a rendered view. Post-processing of the data may then be utilized to enhance the depth image with the aim of reaching a better quality of synthesized views. We propose a Partial Differential Equation (PDE)-based interpolation method that reconstructs the smooth areas in depth images while preserving significant edges. We modeled the depth image by adjusting thresholds for edge detection and a uniform sparse sampling factor, followed by second-order PDE interpolation. The objective results show that a depth image processed by the proposed method can achieve a better quality of synthesized views than the original depth image. Visual inspection confirmed the results.
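A minimal sketch of the interpolation step, under stated assumptions: the known pixels are the detected edges plus a uniform sparse sampling, and the smooth areas between them are filled by iteratively solving Laplace's equation, a standard second-order PDE. The Jacobi-style solver and the toy boundary data are illustrative choices; the paper's exact solver and edge-detection thresholds are not given above.

```python
import numpy as np

def laplace_interpolate(depth, known_mask, iters=2000):
    """Second-order PDE (Laplace) interpolation: pixels flagged in
    `known_mask` (e.g. detected edges plus uniformly sampled points)
    are kept fixed, and the remaining pixels are filled by iterating
    the 4-neighbour average, which converges to a harmonic (smooth)
    surface between the fixed values."""
    d = depth.astype(float).copy()
    for _ in range(iters):
        p = np.pad(d, 1, mode='edge')  # replicate borders
        avg = (p[:-2, 1:-1] + p[2:, 1:-1]
               + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        d = np.where(known_mask, depth, avg)  # re-impose known pixels
    return d

# Toy example: reconstruct a smooth ramp from its two end columns only
depth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
mask = np.zeros_like(depth, dtype=bool)
mask[:, 0] = mask[:, -1] = True        # keep only boundary samples
rec = laplace_interpolate(depth * mask, mask)
```

With only the outer columns fixed, the harmonic solution is exactly the linear ramp, so the smooth interior is recovered from very few samples, which is the property the coding/enhancement scheme exploits.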
Multi-view three-dimensional television requires many views, which may be synthesized from two-dimensional images with accompanying pixel-wise depth information. This depth image, which typically consists of smooth areas and sharp transitions at object borders, must be consistent with the acquired scene in order for synthesized views to be of good quality. We have previously proposed a depth image coding scheme that preserves significant edges and encodes smooth areas between these. An objective evaluation considering the structural similarity (SSIM) index for synthesized views demonstrated an advantage of the proposed scheme over the High Efficiency Video Coding (HEVC) intra mode in certain cases. However, there were some discrepancies between the outcomes of the objective evaluation and of our visual inspection, which motivated this study of subjective tests. The test was conducted according to the ITU-R BT.500-13 recommendation with stimulus-comparison methods. The results from the subjective test showed that the proposed scheme performs slightly better than HEVC, with statistical significance at the majority of the tested bit rates for the given contents.
Integral Imaging is a technique to obtain true color 3D images that can provide full and continuous motion parallax for several viewers. The depth of field of these systems is mainly limited by the numerical aperture of each lenslet of the microlens array. A digital method has been developed to increase the depth of field of Integral Imaging systems in the reconstruction stage. By means of the disparity map of each elemental image, it is possible to classify the objects of the scene according to their distance from the microlenses and apply a selective deconvolution for each depth of the scene. Topographical reconstructions with enhanced depth of field of a 3D scene are presented to support our proposal.
Depth-Image-Based Rendering (DIBR) of virtual views is a fundamental method in three-dimensional (3D) video applications to produce different perspectives from texture and depth information, in particular from the multi-view-plus-depth (MVD) format. Artifacts are still present in virtual views as a consequence of imperfect rendering using existing DIBR methods. In this paper, we propose an alternative DIBR method for MVD. In the proposed method we introduce edge pixels and interpolate pixel values in the virtual view using the actual projected coordinates from two adjacent views, by which cracks and disocclusions are automatically filled. In particular, we propose a method to merge pixel information from two adjacent views in the virtual view before the interpolation; we apply a weighted averaging of projected pixels within the range of one pixel in the virtual view. We compared virtual view images rendered by the proposed method to the corresponding view images rendered by state-of-the-art methods. Objective metrics demonstrated an advantage of the proposed method for most investigated media contents. Subjective test results showed preference for different methods depending on media content, and the test could not demonstrate a significant difference between the proposed method and state-of-the-art methods.
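The merging step can be illustrated with a small sketch. The 1D reduction and the triangular (1 - distance) weight are assumptions made for the example; the abstract only states that projected pixels within the range of one pixel are merged by weighted averaging.

```python
import numpy as np

def blend_projected_row(samples, width):
    """Merge pixels projected from two adjacent views into one row of
    the virtual view. Each sample is (x, value) with a real-valued
    projected coordinate x; every sample within one pixel of an integer
    grid position contributes with weight (1 - distance), so cracks
    between warped pixels are filled without explicit hole detection."""
    acc = np.zeros(width)
    wsum = np.zeros(width)
    for x, value in samples:
        base = int(np.floor(x))
        for xi in (base, base + 1):          # the two nearest grid pixels
            w = 1.0 - abs(x - xi)
            if 0 <= xi < width and w > 0:
                acc[xi] += w * value
                wsum[xi] += w
    # Pixels that received no sample remain disocclusions (0 here)
    return np.where(wsum > 0, acc / np.maximum(wsum, 1e-12), 0.0)

# Two views project slightly offset samples around virtual pixel 2
row = blend_projected_row([(1.8, 10.0), (2.2, 14.0)], width=5)
```

Pixel 2 receives both samples at equal weight 0.8, so its value is the plain average 12.0, while untouched pixels stay empty and would be handled by disocclusion filling.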
Complex multidimensional capturing setups such as plenoptic cameras (PC) introduce a trade-off between various
system properties. Consequently, established capturing properties, like image resolution, need to be described
thoroughly for these systems. Therefore, models and metrics that assist in exploring and formulating this trade-off
are highly beneficial for studying as well as designing complex capturing systems. This work demonstrates the
capability of our previously proposed sampling pattern cube (SPC) model to extract the lateral resolution for
plenoptic capturing systems. The SPC carries both ray information as well as focal properties of the capturing
system it models. The proposed operator extracts the lateral resolution from the SPC model throughout an
arbitrary number of depth planes giving a depth-resolution profile. This operator utilizes focal properties of the
capturing system as well as the geometrical distribution of the light containers, which are the elements of the SPC
model. We have validated the lateral resolution operator for different capturing setups by comparing the results
with those from Monte Carlo numerical simulations based on the wave optics model. The lateral resolution
predicted by the SPC model agrees with the results from the more complex wave optics model better than both
the ray based model and our previously proposed lateral resolution operator. This agreement strengthens the
conclusion that the SPC fills the gap between ray-based models and the real system performance, by including
the focal information of the system as a model parameter. The SPC thus proves to be a simple yet efficient model for
extracting the lateral resolution as a high-level property of complex plenoptic capturing systems.
New display technologies enable the use of 3D visualization in a medical context. Even though user performance seems to be enhanced with respect to 2D thanks to the addition of recreated depth cues, human factors, and more particularly visual comfort and visual fatigue, can still be a barrier to the widespread use of these systems. This study aimed at evaluating and comparing two different 3D visualization systems (a commercially available stereoscopic display and a state-of-the-art multi-view display) in terms of quality of experience (QoE) in the context of interactive medical visualization. An adapted methodology was designed in order to subjectively evaluate the experience of users. 14 medical doctors and 15 medical students took part in the experiment. After solving different tasks using the 3D reconstruction of a phantom object, they were asked to judge the quality of their experience according to specific features. They were also asked to give their opinion about the influence of 3D systems on their work conditions. Results suggest that medical doctors are open to 3D visualization techniques and are confident concerning their beneficial influence on their work. However, visual comfort and visual fatigue are still an issue with 3D displays. Results obtained with the multi-view display suggest that the use of continuous horizontal parallax might be the future response to these current limitations.
Autostereoscopic multi-view displays require multiple views of a scene to provide motion parallax. When an observer
changes viewing angle, different stereoscopic pairs are perceived. This allows new perspectives of the scene to be seen,
giving a more realistic 3D experience. However, capturing an arbitrary number of views is at best cumbersome, and on some
occasions impossible. Conventional stereo video (CSV) operates on two video signals captured by two cameras at two
different perspectives. Generation and transmission of two views is more feasible than that of multiple views. It would be
more efficient if the multiple views required by an autostereoscopic display could be synthesized from this sparse set of views.
This paper addresses the conversion of stereoscopic video to multi-view video using image morphing. Different
morphing algorithms are implemented and evaluated. Contrary to traditional conversion methods, these algorithms disregard
explicit physical depth and instead generate intermediate views using sparse sets of correspondence features
and image morphing. A novel morphing algorithm is also presented that uses the scale-invariant feature transform (SIFT) and
segmentation to construct robust correspondence features and high-quality intermediate views. All algorithms are evaluated
on a subjective and objective basis, and the comparison results are presented.
Presentations on multiview and lightfield displays have become increasingly popular. The restricted number of views
implies a non-smooth transition between views when objects with sharp edges are far from the display plane. The
phenomenon is explained by inter-perspective aliasing. This is undesirable in applications where a correct perception of
the scene is required, such as in science and medicine. Anti-aliasing filters have been proposed in the literature, and are
defined according to the minimum and maximum depth present in the scene. We suggest a method that subdivides the
ray-space and adjusts the anti-aliasing filter to the scene contents locally. We further propose new filter kernels based on
the ray-space frequency domain that assure no aliasing while keeping the maximum amount of information unaltered. The proposed
method outperforms the filters of earlier works. Different filter kernels are compared: details of the output are sharper using
a proposed filter kernel, which also preserves the most information.
Accurate depth maps are a prerequisite in three-dimensional television, e.g. for high-quality view synthesis, but
this information is not always easily obtained. Depth information gained by correspondence matching from two or
more views suffers from disocclusions and low-textured regions, leading to erroneous depth maps. These errors
can be avoided by using depth from dedicated range sensors, e.g. time-of-flight sensors. Because these sensors
only have restricted resolution, the resulting depth data need to be adjusted to the resolution of the appropriate
texture frame. Standard upscaling methods provide only limited quality results. This paper proposes a solution
for upscaling low resolution depth data to match high resolution texture data. We introduce
the Edge Weighted Optimization Concept (EWOC) for fusing low resolution depth maps with corresponding
high resolution video frames by solving an overdetermined linear equation system. Similar to other approaches,
we take information from the high resolution texture, but additionally validate this information with the low
resolution depth to accentuate correlated data. Objective tests show an improvement in depth map quality in
comparison to other upscaling approaches. This improvement is subjectively confirmed in the resulting synthesized views.
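The overdetermined system behind this kind of edge-weighted fusion can be illustrated in 1D. This is a hedged sketch, not the published EWOC formulation: the data equations tie block means of the high-resolution result to the low-resolution depth samples, and smoothness equations between neighbours are down-weighted across texture edges. The weighting function `edge_lambda / (1 + grad)` and the 1D reduction are invented for the example.

```python
import numpy as np

def edge_weighted_upscale_1d(low_depth, texture, factor, edge_lambda=10.0):
    """Solve an overdetermined least-squares system for a high
    resolution 1D depth signal: each low-res sample constrains the
    mean of a `factor`-sized block, and neighbour-smoothness terms
    are weakened wherever the high-res texture has a strong gradient,
    so depth discontinuities align with texture edges."""
    n = len(low_depth) * factor
    rows, rhs = [], []
    # Data terms: block mean equals the low resolution depth sample
    for i, d in enumerate(low_depth):
        r = np.zeros(n)
        r[i * factor:(i + 1) * factor] = 1.0 / factor
        rows.append(r); rhs.append(d)
    # Smoothness terms, validated against the texture gradient
    grad = np.abs(np.diff(texture.astype(float)))
    for j in range(n - 1):
        w = edge_lambda / (1.0 + grad[j])   # weak across texture edges
        r = np.zeros(n)
        r[j], r[j + 1] = w, -w
        rows.append(r); rhs.append(0.0)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol

# Texture with one sharp edge; the upscaled depth jumps at the same place
tex = np.array([0, 0, 0, 0, 255, 255, 255, 255])
up = edge_weighted_upscale_1d(np.array([1.0, 5.0]), tex, factor=4)
```

Because the smoothness weight collapses at the texture edge, the solution is nearly piecewise constant with a sharp step between the two blocks instead of the ramp a plain interpolator would produce.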
Broadcasting of high definition (HD) stereo-based 3D (S3D) TV is planned, or has already begun, in Europe, the US,
and Japan. Specific data processing operations such as compression and temporal and spatial resampling are commonly
used tools for saving network bandwidth when IPTV is the distribution form, as this results in more efficient recording
and transmission of 3DTV signals; however, at the same time it inevitably brings quality degradations to the processed
video. This paper investigated observers' quality judgments of state-of-the-art video coding schemes (simulcast
H.264/AVC or H.264/MVC), with or without added temporal and spatial resolution reduction of S3D videos, in
subjective experiments using the Absolute Category Rating (ACR) method. The results showed that a certain
spatial resolution reduction combined with high-quality video compression was the most bandwidth-efficient way
of processing video data when the required video quality is to be judged as "good". As the subjective experiment
was performed in parallel in two different laboratories in two different countries, a detailed analysis of the inter-lab
differences was performed.
Different compression formats for stereo and multi-view based 3D video are being standardized, and software players capable
of decoding and presenting these formats on different display types are a vital part of the commercialization and evolution
of 3D video. However, the number of publicly available software video players capable of decoding and playing multi-view
3D video is still quite limited. This paper describes the design and implementation of a GPU-based real-time 3D
video playback solution, built on top of cross-platform, open source libraries for video decoding and hardware-accelerated
graphics. A software architecture is presented that efficiently processes and presents high definition 3D video in real-time
and flexibly supports both current 3D video formats and emerging standards. Moreover, a set of bottlenecks in
the processing of 3D video content in a GPU-based real-time 3D video playback solution is identified and discussed.
Autostereoscopic multiview 3D displays have been available for a number of years, capable of producing a perception of
depth in a 3D image without requiring user-worn glasses. Different approaches to compress these 3D images exist. Two
compression schemes, and how they affect the 3D image with respect to induced distortion, are investigated in this paper:
JPEG 2000 and H.264/AVC. The investigation is conducted in three parts: objective measurement, qualitative subjective
evaluation, and a quantitative user test. The objective measurement shows that the Rate-Distortion (RD) characteristic
of the two compression schemes differ in character as well as in level of PSNR. The qualitative evaluation is performed
at bitrates where the two schemes have the same RD fraction and a number of distortion characteristics are found to be
significantly different. However, the quantitative evaluation, performed using 14 non-expert viewers, indicates that the
different distortion types do not significantly contribute to the overall perceived 3D quality. The bitrate used and the
content of the original 3D image are the two factors that most significantly affect the perceived 3D image quality. In
addition, the evaluation results suggest that viewers prefer less apparent depth and motion parallax when being exposed to
compressed 3D images on an autostereoscopic multiview display.
The two-dimensional quality metric Peak Signal-to-Noise Ratio (PSNR) is often used to evaluate the quality of coding schemes for different types of light field based 3D images, e.g. integral imaging or multi-view. The metric results in a single accumulated quality value for the whole 3D image. Evaluating single views -- seen from specific viewing angles -- gives a quality matrix that presents the 3D-image quality as a function of viewing angle. However, these two approaches do not capture all aspects of the induced distortion in a coded 3D image. We have previously shown that coding schemes of a similar kind can distribute coding artifacts differently with respect to the 3D image's depth. In this paper we propose a novel metric that captures the depth distribution of coding-induced distortion. Each element in the resulting quality vector corresponds to the quality at a specific depth. First, we introduce the proposed full-reference metric and the operations on which it is based. Second, the experimental setup is presented. Finally, the metric is evaluated on a set of differently coded 3D images, and the results are compared both with previously proposed quality metrics and with visual inspection.
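A simplified version of such a depth-resolved quality vector can be sketched by grouping pixels into depth intervals with a per-pixel depth map and computing a PSNR per interval. The binning is an illustrative stand-in for the paper's depth-separation operations, which are not detailed above.

```python
import numpy as np

def depth_quality_vector(ref, coded, depth, bins):
    """Full-reference sketch of a depth-resolved quality vector: pixels
    are grouped into `bins` depth intervals using a per-pixel depth
    map, and a PSNR value is computed over each group, giving quality
    as a function of scene depth instead of one accumulated value."""
    qualities = []
    edges = np.linspace(depth.min(), depth.max() + 1e-9, bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (depth >= lo) & (depth < hi)
        if not sel.any():
            qualities.append(np.nan)        # no pixels at this depth
            continue
        mse = np.mean((ref[sel].astype(float) - coded[sel]) ** 2)
        psnr = np.inf if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
        qualities.append(psnr)
    return np.array(qualities)

# Toy image: distortion only in the "far" half of the scene
ref = np.full((8, 8), 128.0)
coded = ref.copy()
coded[:, 4:] += 10.0
depth = np.tile(np.arange(8.0), (8, 1))      # depth grows with column
qv = depth_quality_vector(ref, coded, depth, bins=2)
```

The near bin is undistorted (infinite PSNR) while the far bin carries all the error, which is exactly the kind of depth-dependent artifact distribution a single accumulated PSNR would hide.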
To provide sufficient 3D-depth fidelity, integral imaging (II) requires an increase in spatial resolution of several orders of
magnitude from today's 2D images. We have recently proposed a pre-processing and compression scheme for still II-frames
based on forming a pseudo video sequence (PVS) from sub images (SI), which is later coded using the H.264/MPEG-4
AVC video coding standard. The scheme has shown good performance on a set of reference images. In this paper we first
investigate and present how five different ways to select the SIs when forming the PVS affect the scheme's compression
efficiency. We also study how the II-frame structure relates to the performance of a PVS coding scheme. Finally we
examine the nature of the coding artifacts that are specific to the evaluated PVS schemes. We can conclude that for all
except the most complex reference image, all evaluated SI selection orders significantly outperform JPEG 2000, achieving
compression ratios of up to 342:1 while still keeping PSNR > 30 dB. We can also confirm that when selecting a
PVS scheme, the scheme which results in a higher PVS-picture resolution should be preferred to maximize compression
efficiency. Our study of the coded II-frames also indicates that the SI-based PVS, contrary to other PVS schemes, tends to
distribute its coding artifacts more homogeneously over all 3D-scene depths.
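The SI rearrangement underlying an SI-based PVS can be sketched as follows; the square s x s elemental-image size and the axis ordering are assumptions for the example.

```python
import numpy as np

def elemental_to_sub_images(frame, s):
    """Rearrange an integral-imaging frame into sub images (SIs): with
    elemental images of size s x s, sub image (u, v) collects pixel
    (u, v) from every elemental image. Each SI then resembles the scene
    from one viewing direction, which is what makes an SI-based pseudo
    video sequence compress well with a video codec."""
    h, w = frame.shape
    rows, cols = h // s, w // s              # number of elemental images
    # reshape to (rows, s, cols, s), then move the intra-lenslet axes out
    sis = frame[:rows * s, :cols * s].reshape(rows, s, cols, s)
    return sis.transpose(1, 3, 0, 2)         # shape (s, s, rows, cols)

# Toy frame: a 4x4 II-frame made of 2x2 elemental images
frame = np.arange(16).reshape(4, 4)
sis = elemental_to_sub_images(frame, 2)
```

Iterating over the first two axes of `sis` yields the s*s sub images; feeding them to a video encoder in some scan order forms the PVS discussed above.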
The next evolutionary step in enhancing video communication fidelity over wired and wireless networks is taken by adding scene depth. Three-dimensional video using integral imaging (II) based capture and display subsystems has shown promising results and is now in the early prototype stage. We have created a ray-tracing based interactive simulation environment to generate II video sequences as a way to assist in the development, evaluation and quick adoption of these new emerging techniques into the whole communication chain. A generic II description model is also proposed as the base for the simulation environment. This description model facilitates optically accurate II rendering using MegaPOV, a customized version of the
open-source ray tracing package POV-Ray. By using MegaPOV as the rendering engine, the simulation environment fully incorporates the scene description language of POV-Ray to exactly define a virtual scene. The generation and comparability of II video sequences adhering to different II-techniques are thereby greatly assisted, compared to experimental research. The initial development of the simulation environment is focused on generating and visualizing II source material adhering to the optical properties of different II-techniques published in the literature. Both temporally static and dynamic systems are considered. The simulation environment's potential for easy deployment of integral imaging video sequences adhering to different II-techniques is demonstrated.