Disparity estimation has been extensively investigated in recent years. Though several algorithms have been reported to achieve excellent performance on the Middlebury website, few of them reach a satisfactory balance between accuracy and efficiency, and few of them consider the problem of temporal coherence. In this paper, we introduce a novel disparity estimation approach, which improves the accuracy for static images and the temporal coherence for videos. For static images, the proposed approach is inspired by the adaptive support weight method proposed by Yoon et al. and the dual-cross-bilateral grid introduced by Richardt et al. Principal component analysis (PCA) is used to reduce the color dimensionality in the cost aggregation step. This simple but efficient technique makes the proposed method comparable to the best local algorithms on the Middlebury website, while still allowing real-time implementation. A computationally efficient method for enforcing temporal consistency is also proposed. Moreover, in the user evaluation experiment, the proposed temporal approach achieves the best overall user experience among the selected comparison algorithms.
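The PCA step can be illustrated with a minimal sketch. It assumes, hypothetically, that each pixel's RGB color is projected onto the first principal axis of the image's color distribution, so that cost aggregation can work with one color value per pixel instead of three. The function names and the power-iteration solver are illustrative choices and are not taken from the paper.

```python
import math


def first_principal_axis(pixels, iterations=100):
    """Return (mean color, unit vector along the dominant color axis).

    Uses power iteration on the 3x3 RGB covariance matrix. Assumes the
    start vector (1, 1, 1) is not orthogonal to the dominant axis.
    """
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    centered = [[p[c] - mean[c] for c in range(3)] for p in pixels]
    cov = [[sum(q[i] * q[j] for q in centered) / n for j in range(3)]
           for i in range(3)]
    axis = [1.0 / math.sqrt(3.0)] * 3
    for _ in range(iterations):
        product = [sum(cov[i][j] * axis[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:  # flat image: no dominant axis, keep the start vector
            break
        axis = [x / norm for x in product]
    return mean, axis


def project_to_axis(pixels, mean, axis):
    """Reduce each RGB triple to one scalar along the principal axis."""
    return [sum((p[c] - mean[c]) * axis[c] for c in range(3)) for p in pixels]


# Toy data: colors that vary along the direction (1, 2, 2).
pixels = [(10 + t, 20 + 2 * t, 30 + 2 * t) for t in range(5)]
mean, axis = first_principal_axis(pixels)
reduced = project_to_axis(pixels, mean, axis)
```

With one scalar per pixel, a bilateral-grid style aggregation needs only a single range dimension for color, which is one plausible reason such a reduction helps real-time performance.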
We introduce supervised disparity estimation in which an operator can steer the disparity estimation process.
Instead of correcting errors, we view the estimation process as a constrained process where the constraints are
indicated by the user in the form of control points, scribbles and contours. Control points are used to obtain
accurate disparity estimates that can be fully controlled by the operator. Scribbles are used to force regions
to have a smooth disparity, while contours create a disparity discontinuity in places where diffusion or energy
minimization fails. Control points, scribbles and contours are propagated through the video sequence to create
temporally stable results.
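The role of control points and contours as constraints can be sketched on a one-dimensional toy problem. This is an assumption-laden illustration, not the method of the paper: disparity is diffused by repeated neighbor averaging, control points are treated as fixed values, and a contour simply cuts the link between two neighboring pixels. Scribbles are not modeled here, and all names are hypothetical.

```python
def diffuse_with_constraints(initial, control_points, contours, iterations=500):
    """Diffuse a 1D disparity row under user constraints.

    control_points: dict mapping pixel index -> fixed disparity value.
    contours: set of indices i meaning "no diffusion between i and i + 1".
    """
    disparity = list(initial)
    n = len(disparity)
    for index, value in control_points.items():
        disparity[index] = value
    for _ in range(iterations):
        updated = disparity[:]
        for i in range(n):
            if i in control_points:
                continue  # control points are never modified
            neighbors = []
            if i > 0 and (i - 1) not in contours:
                neighbors.append(disparity[i - 1])
            if i < n - 1 and i not in contours:
                neighbors.append(disparity[i + 1])
            if neighbors:
                updated[i] = sum(neighbors) / len(neighbors)
        disparity = updated
    return disparity


# Two control points and a contour between pixels 2 and 3.
row = diffuse_with_constraints([0.0] * 6, {0: 2.0, 5: 10.0}, {2})
```

In this toy setting the contour keeps the two control values from blending, so the row settles into two flat segments with a sharp disparity step between pixels 2 and 3.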
A rapidly growing number of productions from the entertainment industry are aiming at 3D movie theaters. These productions
use a two-view format, primarily intended for eyewear-assisted viewing in a well-defined environment. To get this
3D content into the home environment, where a large variety of 3D viewing conditions exists (e.g. different display sizes,
display types, viewing distances), we need a flexible 3D format that can adjust the depth effect. This can be provided by
the image plus depth format, in which a video frame is enriched with depth information for all pixels in the video frame.
This format can be extended with additional layers, such as an occlusion layer or a transparency layer. The occlusion layer
contains information on the data that is behind objects, and is also referred to as occluded video. The transparency layer, on
the other hand, contains information on the opacity of the foreground layer. This allows rendering of semi-transparencies
such as haze, smoke, windows, etc., as well as transitions from foreground to background. These additional layers are only
beneficial if the quality of the depth information is high. High quality depth information can currently only be achieved
with user assistance. In this paper, we discuss an interactive method for depth map enhancement that allows adjustments
during the propagation over time. Furthermore, we will elaborate on the automatic generation of the transparency layer,
using the depth maps generated with an interactive depth map generation tool.
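How a transparency layer could be used at render time can be sketched with standard alpha compositing. This is only an assumed rendering model: the foreground color is blended over the occlusion (background) layer using the opacity stored in the transparency layer. The function names are hypothetical and the paper's actual renderer is not described here.

```python
def composite_pixel(foreground, background, alpha):
    """Blend one foreground color over one background color.

    alpha is the foreground opacity in [0, 1] from the transparency layer.
    """
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


def composite_row(foreground_row, occlusion_row, alpha_row):
    """Composite a row of foreground pixels over the occlusion layer."""
    return [composite_pixel(f, b, a)
            for f, b, a in zip(foreground_row, occlusion_row, alpha_row)]


# A quarter-opaque foreground pixel (e.g. haze) over a background pixel.
blended = composite_pixel((200.0, 100.0, 0.0), (0.0, 100.0, 200.0), 0.25)
```

Intermediate alpha values are what allow soft transitions at object boundaries instead of a hard foreground/background cut.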
From image retrieval to image classification, all research shares one common requirement: a good image database to test or train the algorithms. In order to create a large database of images, we set up a project that allowed gathering a collection of more than 33000 photographs with keywords and tags from all over the world. This project was part of the "We Are All Photographers Now!" exhibition at the Musee de l'Elysee in Lausanne, Switzerland. The "Flux," as it was called, gave all photographers, professional or amateur, the opportunity to have their images shown in the museum. Anyone could upload pictures to a website. We required that some simple tags be filled in. Keywords were optional. The information was collected in a MySQL database along with the original photos. The pictures were projected at the museum in five-second intervals. A webcam snapshot was taken and sent back to the photographers via email to show how and when their image was displayed at the museum.
During the 14 weeks of the exhibition, we collected more than 33000 JPEG pictures with tags and keywords. These pictures come from 133 countries and were taken by 9042 different photographers. This database can be used for non-commercial research at EPFL. We present some preliminary analysis here.
If multiple images of a scene are available instead of a single image, we can use the additional information
conveyed by the set of images to generate a higher quality image. This can be done along multiple dimensions.
Super-resolution algorithms use a set of shifted and rotated low resolution images to create a high resolution
image. High dynamic range imaging techniques combine images with different exposure times to generate an
image with a higher dynamic range. In this paper, we present a novel method to combine both techniques and
construct a high resolution, high dynamic range image from a set of shifted images with varying exposure times.
We first estimate the camera response function, and convert each of the input images to an exposure invariant
space. Next, we estimate the motion between the input images. Finally, we reconstruct a high resolution, high
dynamic range image using an interpolation from the non-uniformly sampled pixels. Applications of such an
approach can be found in various domains, such as surveillance cameras, consumer digital cameras, etc.
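The conversion to an exposure invariant space can be sketched as follows. The sketch assumes a simple gamma curve as the camera response function, which is a stand-in for the response the paper estimates from the images; the function names and the reliability thresholds are likewise hypothetical.

```python
def simulate_capture(irradiance, exposure_time, gamma=2.2):
    """Toy camera: gamma-type response with clipping at 1.0 (assumed model)."""
    exposure = irradiance * exposure_time
    return min(1.0, exposure ** (1.0 / gamma))


def to_exposure_invariant(pixel_value, exposure_time, gamma=2.2,
                          low=0.02, high=0.98):
    """Invert the assumed response and divide out the exposure time.

    Returns (irradiance estimate, reliable), where reliable is False for
    pixels that are nearly black or nearly saturated.
    """
    irradiance = (pixel_value ** gamma) / exposure_time
    reliable = low <= pixel_value <= high
    return irradiance, reliable


# The same scene point seen with three exposure times maps to one value.
estimates = [to_exposure_invariant(simulate_capture(0.2, t), t)
             for t in (0.5, 1.0, 2.0)]
```

Once all images live in this common space, their pixels can be registered and merged regardless of exposure, with saturated or underexposed samples excluded or down-weighted.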
We present a new algorithm that performs demosaicing and super-resolution jointly from a set of raw images
sampled with a color filter array. Such a combined approach allows us to compute the alignment parameters between the images on the raw camera data before interpolation artifacts are introduced. After image registration, a high resolution color image is reconstructed at once using the full set of images. For this, we use normalized
convolution, an image interpolation method from a nonuniform set of samples. Our algorithm is tested and
compared to other approaches in simulations and practical experiments.
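Normalized convolution from nonuniform samples can be illustrated in one dimension. The sketch below shows only the zeroth-order form (often called normalized averaging) with a Gaussian applicability function; the order, kernel and parameters used in the paper are not stated here, so these are assumptions, as are the function and parameter names.

```python
import math


def normalized_convolution(sample_positions, sample_values, grid_positions,
                           sigma=1.0, certainties=None):
    """Interpolate nonuniform samples onto a regular grid.

    Each output is a weighted mean of the samples, with weights given by
    a Gaussian applicability function times a per-sample certainty.
    """
    if certainties is None:
        certainties = [1.0] * len(sample_values)
    result = []
    for x in grid_positions:
        numerator = 0.0
        denominator = 0.0
        for p, v, c in zip(sample_positions, sample_values, certainties):
            weight = c * math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))
            numerator += weight * v
            denominator += weight
        # No usable sample near this grid point: mark it as undefined.
        result.append(numerator / denominator if denominator > 0.0 else math.nan)
    return result


# Registered samples from several images fall at irregular positions.
positions = [0.0, 0.7, 1.9, 3.2]
values = [5.0, 5.0, 100.0, 5.0]
# A certainty of zero removes the outlier sample from the estimate.
grid = normalized_convolution(positions, values, [0.0, 1.0, 2.0, 3.0],
                              certainties=[1.0, 1.0, 0.0, 1.0])
```

For color filter array data, one would presumably run such an interpolation per color channel, using only the samples of that channel from all registered raw images.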
Super-resolution imaging techniques reconstruct a high resolution image from a set of low resolution images that are taken from almost the same point of view. The problem can be subdivided into two main parts: an image registration part where the different input images are aligned with each other, and a reconstruction part, where the high resolution image is reconstructed from the aligned images. In this paper, we mainly consider the first step: image registration. We present three frequency domain methods to accurately align a set of undersampled images. First, we describe a registration method for images that have an aliasing-free part in their spectrum. The images are then registered using that aliasing-free part. Next, we present two subspace methods to register completely aliased images. Arbitrary undersampling factors are possible with these methods, but they have an increased computational complexity. In all three methods, we only consider planar shifts. We also show the results of these three algorithms in simulations and practical experiments.
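The idea of registering on the aliasing-free part of the spectrum can be sketched in one dimension. A shift by delta multiplies Fourier coefficient k by exp(-2*pi*i*k*delta/N), so the phase difference at low frequencies is linear in k and its slope gives the shift. The code below is a simplified 1D illustration under the assumption that frequencies 1 to max_frequency are free of aliasing and that the phase does not wrap; it is not the paper's planar method, and all names are hypothetical.

```python
import cmath
import math


def dft_coefficient(signal, k):
    """Single DFT coefficient of a real signal at frequency index k."""
    n = len(signal)
    return sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
               for t in range(n))


def estimate_shift(reference, shifted, max_frequency):
    """Least-squares fit of the phase slope over the low frequencies."""
    n = len(reference)
    numerator = 0.0
    denominator = 0.0
    for k in range(1, max_frequency + 1):
        a = dft_coefficient(reference, k)
        b = dft_coefficient(shifted, k)
        phase = cmath.phase(b * a.conjugate())  # equals -2*pi*k*delta/n
        numerator += k * phase
        denominator += k * k
    return -n * numerator / (2.0 * math.pi * denominator)


def scene(x, n=16):
    """Band-limited periodic test signal (harmonics 1 to 3 only)."""
    return (1.0 + math.cos(2 * math.pi * x / n)
            + 0.5 * math.sin(2 * math.pi * 2 * x / n)
            + 0.3 * math.cos(2 * math.pi * 3 * x / n))


reference = [scene(t) for t in range(16)]
shifted = [scene(t - 0.37) for t in range(16)]
delta = estimate_shift(reference, shifted, max_frequency=3)
```

In this noise-free test the estimate recovers the subpixel shift of 0.37 samples; with real images, one would typically weight the frequencies by their magnitude to limit the influence of noise.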
In this paper, we present a super-resolution method to approximately double image resolution in both dimensions from
a set of four low resolution, aliased images. The camera is shifted and rotated by small amounts between the different
image captures. Only the low frequency, aliasing-free part of the images is used to find the shift and rotation parameters.
When the images are registered, it is possible to reconstruct a higher resolution, aliasing-free image from the four
low resolution images using cubic interpolation. We applied our algorithm in a simulation, where all
parameters are known and controlled, as well as in a practical experiment using images taken with a real digital camera.
The results obtained in both tests support the validity of our method.
In this paper, we present a simple method to almost quadruple the spatial resolution of aliased images. From a set of four low resolution, undersampled and shifted images, a new image is constructed with almost twice the resolution in each dimension. The resulting image is aliasing-free. A small aliasing-free part of the frequency domain of the images is used to compute the exact subpixel shifts. When the relative image positions are known, a higher resolution image can be constructed using the Papoulis-Gerchberg algorithm. The proposed method is tested in a simulation where all simulation parameters are well controlled, and where the resulting image can be compared with its original. The algorithm is also applied to real, noisy images from a digital camera. Both
experiments show very good results.
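The reconstruction step can be sketched with a one-dimensional version of the Papoulis-Gerchberg iteration: known samples are placed on the high resolution grid, the estimate is band-limited in the frequency domain, the known samples are restored, and the two steps are repeated. The example assumes two low resolution images of four samples each, offset by exactly one high resolution pixel, and a known band limit; these settings and all names are illustrative, not the paper's configuration.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]


def papoulis_gerchberg(known, length, max_frequency, iterations=300):
    """known: dict mapping high resolution index -> measured sample value."""
    estimate = [known.get(i, 0.0) for i in range(length)]
    for _ in range(iterations):
        spectrum = dft(estimate)
        for k in range(length):
            # Indices k and length - k carry the same absolute frequency.
            if min(k, length - k) > max_frequency:
                spectrum[k] = 0.0
        estimate = [value.real for value in idft(spectrum)]
        for index, value in known.items():
            estimate[index] = value  # re-impose the measured samples
    return estimate


def truth(x, n=16):
    """Band-limited test signal with harmonics up to 3."""
    return (1.0 + math.cos(2 * math.pi * x / n)
            + 0.5 * math.sin(2 * math.pi * 2 * x / n)
            + 0.25 * math.cos(2 * math.pi * 3 * x / n + 0.4))


# Image A samples positions 0, 4, 8, 12; image B is shifted by one pixel.
known = {i: truth(i) for i in (0, 4, 8, 12, 1, 5, 9, 13)}
restored = papoulis_gerchberg(known, length=16, max_frequency=3)
```

Each four-sample image alone is aliased for this signal, yet the eight combined samples suffice to determine its seven frequency coefficients, which loosely mirrors the "almost twice the resolution" claim in one dimension.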