A rapidly growing number of productions from the entertainment industry are aimed at 3D movie theatres. These productions use a two-view format, primarily intended for eye-wear-assisted viewing in a well-defined environment. To bring this 3D content into the home environment, where a large variety of 3D viewing conditions exists (e.g. different display sizes, display types, and viewing distances), we need a flexible 3D format that can adjust the depth effect. Such a format is the image-plus-depth format, in which a video frame is enriched with depth information for all pixels in the video. This format can be extended with an additional layer for occluded video and associated depth, which contains what is behind objects in the video. To produce 3D content in this extended format, one has to deduce what is behind objects. This occluded data can be obtained along various axes. This paper presents a method to automatically detect and fill the occluded areas by exploiting the temporal axis. To obtain visually pleasing results, it is of utmost importance to make the inpainting globally consistent. To this end, we start by analyzing data along the temporal axis and compute a confidence for each pixel. Then, pixels from the future and the past that are not visible in the current frame are weighted and accumulated based on the computed confidences. These results are then fed to a generic multi-source framework that computes the occlusion layer from the available confidences and occlusion data.
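As a rough illustration of this accumulation step, the minimal sketch below blends motion-compensated candidate pixels from past and future frames into the occluded regions, each weighted by its per-pixel confidence. All names and the simple weighted average are our own assumptions for illustration; they stand in for the more elaborate analysis and multi-source framework described above.

```python
import numpy as np

def fill_occlusions(current, candidates, confidences, occlusion_mask):
    """Confidence-weighted temporal filling of occluded areas (sketch).

    current        : HxWx3 current frame
    candidates     : list of HxWx3 motion-compensated past/future frames
    confidences    : list of HxW per-pixel confidences in [0, 1]
    occlusion_mask : HxW boolean, True where the current frame is occluded
    """
    acc = np.zeros_like(current, dtype=np.float64)        # weighted colour sum
    weight = np.zeros(occlusion_mask.shape, dtype=np.float64)

    for frame, conf in zip(candidates, confidences):
        acc += frame.astype(np.float64) * conf[..., None]
        weight += conf

    # Normalise where at least one temporal candidate contributed.
    filled = np.divide(acc, weight[..., None],
                       out=np.zeros_like(acc), where=weight[..., None] > 0)

    # Replace only the occluded pixels; the accumulated weight doubles as
    # the confidence handed to the downstream multi-source framework.
    out = current.astype(np.float64).copy()
    out[occlusion_mask] = filled[occlusion_mask]
    return out, weight
```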
Depth maps are used in many applications, e.g. 3D television, stereo matching, and segmentation. Often, depth maps are available at a lower resolution than the corresponding image data, so for these applications the depth maps must be upsampled to the image resolution. Recently, joint bilateral filters have been proposed to upsample depth maps in a single step. In this approach, a high-resolution output depth value is computed as a weighted average of surrounding low-resolution depth values, where the weights depend on a spatial distance function and an intensity range function evaluated on the related image data. Compared to that approach, we present two novel ideas. Firstly, we apply anti-alias prefiltering to the high-resolution image to derive an image at the same low resolution as the input depth map; the upsampling filter then uses samples from both the high-resolution and the low-resolution images in the range term of the bilateral filter. Secondly, we propose to perform the upsampling in multiple stages, refining the resolution by a factor of 2×2 at each stage. We show experimental results on the consequences of the aliasing issue, and we apply our method to two use cases: a high-quality ground-truth depth map and a real-time generated depth map of lower quality. For the first use case, a relatively small filter footprint is applied; the second use case benefits from a substantially larger footprint. These experiments show that the dual-image-resolution range function alleviates the aliasing artifacts and thereby improves the temporal stability of the output depth map. In both use cases, we achieve comparable or better image quality with respect to single-step upsampling with the joint bilateral filter, with a reduction in computational cost of a factor of 5 for the former use case and a factor of 50 for the latter.
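The sketch below shows what one 2×2 stage of such a scheme could look like, assuming Gaussian spatial and range terms and a range term that compares the high-resolution pixel against the prefiltered low-resolution image. Function names and parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def upsample_stage(depth_lo, img_lo, img_hi, radius=2,
                   sigma_s=1.0, sigma_r=10.0):
    """One 2x2 joint-bilateral depth upsampling stage (illustrative sketch).

    depth_lo : HxW low-resolution depth map
    img_lo   : HxW anti-alias prefiltered luminance at the depth resolution
    img_hi   : 2Hx2W luminance at the target resolution of this stage
    """
    H, W = depth_lo.shape
    out = np.zeros((2 * H, 2 * W), dtype=np.float64)
    for y in range(2 * H):
        for x in range(2 * W):
            cy, cx = y // 2, x // 2              # corresponding low-res sample
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Spatial term, evaluated on the low-resolution grid.
                        ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        # Dual-resolution range term: high-res pixel versus
                        # prefiltered low-res neighbourhood.
                        diff = float(img_hi[y, x]) - float(img_lo[ny, nx])
                        wr = np.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        num += ws * wr * depth_lo[ny, nx]
                        den += ws * wr
            out[y, x] = num / den if den > 0 else depth_lo[cy, cx]
    return out
```

Repeating such a stage log2(f) times realises an overall factor-f upsampling while keeping the filter footprint small at every stage, which is consistent with the cost savings reported above.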
Philips is developing a product line of multi-view auto-stereoscopic 3D displays [1]. For interfacing, the image-plus-depth format is used [2, 3]. Being independent of specific display properties, such as the number of views, the view mapping on the pixel grid, etc., this interface format allows optimal multi-view visualisation of content from many different sources, while maintaining interoperability between display types. A rapidly growing number of productions from the entertainment industry are aimed at 3D movie theatres. These productions use a two-view format, primarily intended for eye-wear-assisted viewing. It has been shown [4] how to convert these sequences into the image-plus-depth format. This results in a single-layer depth profile, lacking information about areas that are occluded but can be revealed by the stereoscopic parallax. Recently, it has been shown how to compute intermediate views for a stereo pair [4, 5]. Unfortunately, these approaches are not compatible with the image-plus-depth format, which might hamper their applicability to broadcast 3D television [3].
In video systems, the introduction of 3D video might be the next revolution after the introduction of color. Nowadays, multi-view auto-stereoscopic displays are entering the market. Such displays offer various views at the same time. Depending on their position, a viewer's eyes see different images. Hence, the left eye receives a signal that is different from what the right eye gets; provided the signals have been properly processed, this gives the impression of depth.
New auto-stereoscopic products use an image-plus-depth interface. On the other hand, a growing number of 3D productions from the entertainment industry use a stereo format. In this paper, we show how to compute depth from the stereo signal so as to comply with the display interface format. Furthermore, we present a realisation suitable for a real-time, cost-effective implementation on an embedded media processor.
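To make the underlying problem concrete, the sketch below estimates per-pixel disparity for a rectified stereo pair with naive block matching; depth then follows from the disparity. This baseline is our own illustration and is far simpler than the real-time algorithm the paper targets.

```python
import numpy as np

def disparity_map(left, right, max_disp=32, block=8):
    """Naive block-matching disparity estimation (illustrative sketch).

    left, right : HxW grayscale images of a rectified stereo pair
    Returns a per-pixel disparity map.
    """
    H, W = left.shape
    disp = np.zeros((H, W), dtype=np.float32)
    for y in range(0, H - block, block):
        for x in range(0, W - block, block):
            patch = left[y:y + block, x:x + block].astype(np.float32)
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x) + 1):
                cand = right[y:y + block, x - d:x - d + block].astype(np.float32)
                sad = np.abs(patch - cand).sum()   # sum of absolute differences
                if sad < best:
                    best, best_d = sad, d
            disp[y:y + block, x:x + block] = best_d
    return disp
```

With the disparity d in hand, depth for a calibrated, rectified rig follows from the textbook relation Z = f·b/d, with focal length f and camera baseline b.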
Philips provides autostereoscopic three-dimensional display systems that will bring the next leap in visual experience, adding true depth to video systems. We identified three challenges specific to 3D image processing: 1) the bandwidth and complexity of 3D images, 2) the conversion of 2D to 3D content, and 3) object-based image/depth processing. We discuss these challenges and our solutions via several examples. In conclusion, these solutions have enabled the market introduction of several professional 3D products, and progress is being made rapidly towards consumer 3DTV.
In video systems, the introduction of 3D video might be the next revolution after the introduction of color. Nowadays, multi-view autostereoscopic displays are in development. Such displays offer various views at the same time, and the image content observed depends upon the viewer's position with respect to the screen. The left eye receives a signal that is different from what the right eye gets; provided the signals have been properly processed, this gives the impression of depth. The various views produced on the display differ with respect to their associated camera positions. A possible video format suited for rendering from different camera positions is the usual 2D format enriched with a depth-related channel, i.e. for each pixel in the video not only its color is given, but also, for example, its distance to the camera. In this paper, we provide a theoretical framework for the parallactic transformations, which relates captured and observed depths to screen and image disparities. Moreover, we present an efficient real-time rendering algorithm that uses forward mapping to reduce aliasing artefacts and that deals properly with occlusions. For improved perceived resolution, we take the relative positions of the color subpixels and the optics of the lenticular screen into account. Sophisticated filtering techniques result in high-quality images.
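To indicate the kind of relation such a framework captures, the textbook equations below connect captured depth to image disparity on the camera side, and screen parallax to perceived depth on the viewing side, for a rectified camera pair and a viewer facing the screen. The notation (b, f, e, D) is ours and does not necessarily match the paper's derivation.

```latex
% Capture side: rectified cameras with baseline b and focal length f
% see a point at depth Z with image disparity d.
% Viewing side: eyes with separation e at distance D from the screen
% perceive a point with screen parallax p at depth z_p.
\[
  d = \frac{f\,b}{Z}, \qquad
  p = e\left(1 - \frac{D}{z_p}\right)
  \;\Longleftrightarrow\;
  z_p = \frac{e\,D}{e - p}.
\]
```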