In this paper, we propose an efficient framework for edge-preserving stereo matching. Local methods for stereo matching are more suitable than global methods for real-time applications, and accurate depth maps can be obtained by using an edge-preserving filter in the cost aggregation step of local stereo matching. However, even if the edge-preserving filter runs in constant time with respect to its kernel size, it must be applied once per disparity candidate, so the total computational cost is high. Therefore, we propose an efficient iterative framework that propagates edge-awareness using a single pass of edge-preserving filtering. In our framework, box filtering is used for cost aggregation, and the edge-preserving filter is applied once to refine the depth map obtained from the box aggregation. After that, we iteratively estimate a new depth map by local stereo matching that feeds the previous depth map back into the matching cost. Note that the kernel size of the box filter is varied in a coarse-to-fine manner at each iteration. Experimental results show that both small and large incorrect regions are gradually corrected. The accuracy of the depth map estimated by our framework is comparable to that of state-of-the-art stereo matching methods based on global optimization, and our method is faster than such optimization-based methods.
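As a rough illustration, the iteration described above can be sketched in NumPy as follows; the function and parameter names (edge_preserving_filter, radii, lam) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def box_aggregate(cost_volume, radius):
    """Box-filter each disparity slice of the float (D, H, W) cost volume."""
    return np.stack([uniform_filter(c, size=2 * radius + 1) for c in cost_volume])

def iterative_matching(cost_volume, guide, edge_preserving_filter,
                       radii=(15, 7, 3), lam=0.1):
    """Box aggregation + winner-take-all, a single edge-preserving refinement
    pass, then coarse-to-fine iterations that feed the previous depth map
    back into the matching cost."""
    depth = np.argmin(box_aggregate(cost_volume, radii[0]), axis=0).astype(np.float32)
    depth = edge_preserving_filter(depth, guide)       # applied only once
    d_axis = np.arange(cost_volume.shape[0], dtype=np.float32)[:, None, None]
    for r in radii[1:]:                                # shrinking box kernel
        feedback = lam * np.abs(d_axis - depth[None])  # penalize deviation from prior depth
        depth = np.argmin(box_aggregate(cost_volume + feedback, r),
                          axis=0).astype(np.float32)
    return depth
```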
In this paper, we propose a generalized framework of cost volume refinement filtering for visual correspondence problems. An estimated correspondence map, e.g., a depth map, an optical flow field, or a segmentation, often contains noise and blur. One solution is post-filtering: edge-preserving filtering, such as joint bilateral filtering, can remove the noise, but at the same time it blurs object boundaries. Cost volume refinement filtering (CVRF) removes noise without such blurring and is an effective solution for refining the labels of correspondence problems. Several previously proposed methods for various applications can be categorized as CVRF. These methods use various cost reconstruction metrics, such as the L1 norm, the L2 norm, or an exponential function, and various edge-preserving filters, such as joint bilateral filtering and guided image filtering. In this paper, we generalize these factors and add a range-spatial domain resizing factor to CVRF. Experimental results show that our generalized formulation outperforms the conventional approaches, and also show which configuration of CVRF is appropriate for various applications of stereo matching and optical flow estimation.
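A compact sketch of the CVRF pattern the abstract generalizes, assuming NumPy; `metric` and `ep_filter` stand in for the reconstruction metric and edge-preserving filter that the paper treats as interchangeable factors:

```python
import numpy as np

def cvrf(label_map, guide, labels, metric, ep_filter):
    """Cost volume refinement filtering (sketch): rebuild a cost volume from
    a noisy label map with a reconstruction metric, smooth each slice with an
    edge-preserving filter, then take the per-pixel minimum-cost label.
    `labels` is a 1-D array of candidate labels (e.g. disparities)."""
    cost = np.stack([metric(label_map - l) for l in labels])    # (L, H, W)
    cost = np.stack([ep_filter(s, guide) for s in cost])        # per-slice filtering
    return labels[np.argmin(cost, axis=0)]

# example reconstruction metrics named in the abstract
l1 = np.abs
l2 = np.square
exp_metric = lambda d, sigma=2.0: 1.0 - np.exp(-np.abs(d) / sigma)
```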
In general, free-viewpoint images are generated from images captured by a camera array aligned on a straight line or a circle. A camera array can capture a synchronized dynamic scene, but it is expensive and requires great care to align exactly. In contrast, a handheld camera is easily available and can easily capture a static scene. We propose a method that generates free-viewpoint images from a video of a static scene captured by a handheld camera. To generate free-viewpoint images, images from several viewpoints and the camera poses/positions of those viewpoints are needed. In one previous work, a checkerboard pattern has to be captured in every frame to calculate these parameters; in another, a pseudo-perspective projection is assumed, which limits the camera movement. In this paper, we calculate these parameters by structure from motion. Additionally, we propose a method for selecting reference images from the many captured frames, and a method that uses projective block matching and the graph-cuts algorithm together with the reconstructed feature points to estimate a depth map at a virtual viewpoint.
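The reference-image selection step might, for example, look like the following sketch; the criterion (near the virtual viewpoint, but with a minimum baseline between chosen frames) is an illustrative assumption, since the abstract does not specify it:

```python
import numpy as np

def select_reference_frames(centers, target, n_refs=4, min_baseline=0.05):
    """Pick reference frames near the virtual viewpoint `target`, while
    keeping a minimum baseline between chosen frames so that block matching
    has usable parallax. `centers` are SfM camera centers, shape (N, 3)."""
    order = np.argsort(np.linalg.norm(centers - target, axis=1))
    chosen = []
    for i in order:
        if all(np.linalg.norm(centers[i] - centers[j]) >= min_baseline
               for j in chosen):
            chosen.append(int(i))
        if len(chosen) == n_refs:
            break
    return chosen
```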
We are developing technologies for FTV, in which the viewer can freely change the viewpoint. Free-viewpoint images can be generated from images captured by a static multi-camera system. However, it is hard to render an object that moves widely in the scene. In this paper, we address this problem by proposing a moving camera array and a corresponding free-viewpoint image synthesis algorithm. Our synthesis method uses temporal and spatial information together in order to further improve the view generation quality. Experiments on sequences captured by a simulated moving multi-camera system demonstrate the improvement in view synthesis quality compared with conventional view synthesis methods.
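The abstract states only that temporal and spatial cues are combined; one plausible form such a fusion step could take is a confidence-weighted blend, as in this hypothetical sketch:

```python
import numpy as np

def fuse_temporal_spatial(spatial_view, temporal_view, spatial_conf, temporal_conf):
    """Hypothetical fusion: blend a spatially synthesized view (neighboring
    cameras, same time) with a temporally propagated view (same camera,
    neighboring times), weighted per pixel by confidence maps. The weighting
    scheme is an illustrative assumption, not the paper's method."""
    w = spatial_conf / np.maximum(spatial_conf + temporal_conf, 1e-8)
    return w * spatial_view + (1.0 - w) * temporal_view
```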
Ray-Space is categorized as image-based rendering (IBR), so the generated views have photo-realistic quality. Although the method renders high-quality images, it needs many images or cameras, because Ray-Space requires views from various directions and positions instead of 3D depth information. In this paper, we reduce this flood of information using view-centered ray interpolation, which estimates a view-dependent depth (or disparity) map at the viewpoint being generated and interpolates pixel values from the multi-view images and the depth information. This combination of depth estimation and interpolation renders photo-realistic images efficiently. Unfortunately, however, if the depth estimation is weak or mistaken, many artifacts appear in the generated images, so a robust depth estimation method is required. When rendering free-viewpoint video, depth must be estimated at every frame, so the computational cost must be kept low. Our depth estimation method is based on dynamic programming (DP), which resolves depth in weakly matching areas at high speed. However, scan-line noise appears because of a limitation of DP: each scan line is optimized independently. We therefore run DP passes in multiple directions and sum the results of the multi-directional passes. Our method achieves both low computational cost and high depth estimation accuracy.
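The multi-directional DP aggregation can be sketched as follows (an SGM-style recursion, shown only for the two horizontal directions for brevity; the penalties p1 and p2 are illustrative assumptions):

```python
import numpy as np

def dp_pass(cost, direction, p1=0.5, p2=2.0):
    """One scan-line DP pass over a float (H, W, D) cost volume.
    direction = +1 sweeps left-to-right, -1 sweeps right-to-left."""
    h, w, d = cost.shape
    agg = cost.copy()
    cols = range(1, w) if direction > 0 else range(w - 2, -1, -1)
    for x in cols:
        prev = agg[:, x - direction]                   # (H, D) previous column
        best = prev.min(axis=1, keepdims=True)         # best cost over disparities
        up = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :-1]
        dn = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:]
        step = np.minimum(np.minimum(prev, best + p2), np.minimum(up, dn) + p1)
        agg[:, x] += step - best                       # keep values bounded
    return agg

def multi_direction_dp(cost):
    """Sum DP passes from multiple directions to suppress the scan-line
    noise of a single pass (vertical/diagonal passes are analogous)."""
    return sum(dp_pass(cost, d) for d in (+1, -1))

# depth = np.argmin(multi_direction_dp(cost_volume), axis=2)
```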
We propose an image-based rendering (IBR) technique using a circular camera array. By recording the scene from all around, we can synthesize more dynamic arbitrary-viewpoint images as well as wide-angle, panorama-like images. The method is based on Ray-Space, an image-based rendering representation similar to Light Field. Ray-Space describes a ray by the position (x, y) and the direction (θ, φ) at which it passes a reference plane. When the cameras are arranged on a circle, the trajectory of a scene point, which corresponds to a straight line in the epipolar plane image (EPI) of a linear arrangement, becomes a sine-like curve in this space. Although this form is very clear, determining which pixel of which camera to use during rendering becomes complicated. We therefore redescribe the space with the Light Field parameterization of camera position (s, t) and pixel position (u, v), replacing the camera position with polar coordinates (r, θ) to bring it close to the Ray-Space description. The trajectory of a point then becomes a complicated periodic function with period 2π, but rendering becomes easy to handle. From this space, as with the linear arrangement, arbitrary-viewpoint images can be synthesized using only the geometric relationships between cameras. Moreover, taking advantage of the property that rays of all directions converge on a point of the circle, we propose a technique for generating wide-angle, panorama-like images; this is possible because rays of all directions at the same position are recorded redundantly. So far we have assumed that the cameras are arranged densely enough to satisfy plenoptic sampling; we now describe the discrete case, in which the sampling condition is not satisfied. When cameras are arranged on a straight line and an image is synthesized, a focus-like effect appears in spite of the pinhole camera model. This effect, peculiar to Light Field rendering with insufficient sampling, is called a synthetic aperture. We have previously synthesized all-in-focus images with a process called an "adaptive filter", which builds a fully view-dependent disparity map centered on the viewpoint to be generated. The same phenomenon occurs with a circular arrangement, so we extend the adaptive filter to the circular camera arrangement and synthesize all-in-focus images. Although the circular case raises problems, e.g., the epipolar lines are no longer parallel, we show that the extension can be obtained from geometric information alone. With this approach, we succeed in wide-angle and arbitrary-viewpoint image synthesis from both discretely and fully sampled spaces.
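To make the periodic trajectory concrete, here is a sketch of the geometry under assumed conventions (camera on a circle of radius r at angle θ, optical axis toward the center, focal length f, scene point at distance R and azimuth φ0 from the center); the paper's exact parameterization may differ:

```latex
% Assumed setup: camera center C(\theta) = (r\cos\theta,\, r\sin\theta),
% optical axis pointing at the origin, scene point
% P = (R\cos\varphi_0,\, R\sin\varphi_0).
% The image coordinate u of P then traces, as a function of the camera angle,
\[
  u(\theta) \;=\; f \,\frac{R \sin(\theta - \varphi_0)}
                           {r - R \cos(\theta - \varphi_0)},
\]
% a periodic function of \theta with period 2\pi: the circular-array
% analogue of the straight EPI line obtained with a linear camera array.
```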
We propose a 3D live video system that generates arbitrary viewpoints in real time based on Ray-Space, an image-based rendering method. With this system, a remote user can freely change the viewpoint, not only to the captured camera positions but also to views where no camera is physically present, using Ray-Space interpolation. The basic idea of Ray-Space rendering is collecting and rearranging portions of simultaneously captured images according to an arbitrarily specified virtual view. If hundreds of cameras were arranged densely enough, synthesizing a free viewpoint away from the camera baseline would require only the geometric information of the cameras. Since we cannot capture rays densely enough to satisfy plenoptic sampling, arbitrary view generation necessitates interpolation of the slightly missing rays. However, the computational cost of such view interpolation is very high. Therefore, we introduce three novel view interpolation techniques: a view-centered interpolation framework, disparity estimation with smoothing, and hierarchical correspondence search for fast computation, sketched below. Moreover, we implemented an experimental system with these algorithms. This free-view generation system consists of sixteen cameras arranged in a straight line. Each camera is connected to its own consumer PC, and all of the computers are connected to a server computer via an Ethernet star network. The system carries out four processes in real time: capturing images, correcting the camera positions by projective transformation, interpolating images on the baseline, and rendering the arbitrary viewpoint.
The experimental results show that the system renders arbitrary viewpoints at 12 fps (frames per second) at an image resolution of 320x240, and it succeeds in synthesizing highly photo-realistic images.
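The hierarchical correspondence search could be sketched as a coarse-to-fine pyramid like the following; the per-pixel absolute-difference cost and the fixed refinement window are illustrative assumptions:

```python
import numpy as np

def downsample(img):
    """2x2 average pooling (assumes even image dimensions)."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def hierarchical_disparity(left, right, max_d, levels=3, radius=2):
    """Coarse-to-fine search: full disparity search only at the coarsest
    level, then refinement within +/-radius of the upsampled estimate.
    Image dimensions are assumed divisible by 2**(levels - 1)."""
    pyramid = [(left, right)]
    for _ in range(levels - 1):
        l, r = pyramid[-1]
        pyramid.append((downsample(l), downsample(r)))

    disp = None
    for level, (l, r) in enumerate(reversed(pyramid)):
        scale = levels - 1 - level
        h, w = l.shape
        if disp is None:                                  # coarsest level: full range
            candidates = np.arange(max_d >> scale)[:, None, None]
        else:                                             # finer level: local refinement
            disp = 2 * np.repeat(np.repeat(disp, 2, axis=0), 2, axis=1)
            candidates = disp[None] + np.arange(-radius, radius + 1)[:, None, None]
            candidates = np.clip(candidates, 0, (max_d >> scale) - 1)
        cand = np.broadcast_to(candidates, (candidates.shape[0], h, w))
        rows = np.arange(h)[None, :, None]
        cols = np.clip(np.arange(w)[None, None, :] - cand, 0, w - 1)
        cost = np.abs(l[None] - r[rows, cols])            # per-candidate matching cost
        disp = np.take_along_axis(cand, np.argmin(cost, axis=0)[None], axis=0)[0]
    return disp
```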