In this paper, we present a free viewpoint video generation system with billboard representation for soccer games. Free viewpoint video generation is a technology that enables users to watch 3-D objects from their desired viewpoints. Practical implementations of free viewpoint video for sports events are in high demand, but a commercially acceptable system has not yet been developed. The main obstacles are the insufficient quality of the synthesized images delivered to the end user and highly complex procedures that sometimes require manual operations. In this work, we aim to develop a commercially acceptable free viewpoint video system with a billboard representation. The intended scenario is that a soccer game played during the day can be broadcast in 3-D by the evening of the same day. Our work is still ongoing, but we have already developed several techniques toward this goal. First, we captured an actual soccer game at an official stadium using 20 full-HD professional cameras. Second, we have implemented several tools for free viewpoint video generation, as follows. Free viewpoint video generation requires all cameras to be calibrated, so we calibrated them using checkerboard images and feature points on the field (the intersections of the soccer field lines). We extract each player region from the captured images manually. The background region is estimated automatically by observing the chrominance changes of each pixel in the temporal domain. Additionally, we have developed a user interface for visualizing free viewpoint video using a graphics library (OpenGL), which is suitable not only for commercial TV sets but also for devices such as smartphones. A practical system has not yet been completed.
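The automatic background estimation described above can be illustrated with a minimal sketch. The abstract does not state the estimator that was used, so the per-pixel temporal median of the chrominance channels shown here is an assumption, as are the function names, the (Cb, Cr) frame layout, and the threshold value.

```python
from statistics import median


def estimate_background_chroma(frames):
    """Estimate a background chrominance image from a sequence of frames.

    frames: list of frames; each frame is a 2-D list of (cb, cr) tuples.
    Assumption: the background is the per-pixel temporal median of each
    chrominance channel. The original system may use a different rule.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    background = []
    for y in range(height):
        row = []
        for x in range(width):
            cb = median(frame[y][x][0] for frame in frames)
            cr = median(frame[y][x][1] for frame in frames)
            row.append((cb, cr))
        background.append(row)
    return background


def is_background(pixel, bg_pixel, threshold=10.0):
    """Return True if a pixel's chrominance is close to the background.

    threshold is a hypothetical tuning parameter (Euclidean distance in
    the Cb-Cr plane), not a value taken from the paper.
    """
    d_cb = pixel[0] - bg_pixel[0]
    d_cr = pixel[1] - bg_pixel[1]
    return (d_cb * d_cb + d_cr * d_cr) ** 0.5 <= threshold
```

A median is used in this sketch because a player who crosses a pixel for only a minority of frames does not change the estimate, whereas a mean would be pulled toward the player's colors.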
This paper presents a method of virtual view synthesis using view plus depth data from multiple viewpoints. Intuitively,
virtual view generation from those data can be easily achieved by simple 3D warping. However, 3D points reconstructed
from those data are isolated, i.e., not connected with each other. Consequently, the images generated by existing methods
contain many visually disturbing holes caused by occlusions and limited sampling density. To tackle this problem, we
propose a two-step algorithm. In the first step, the view plus depth data from each viewpoint are 3D warped to the
virtual viewpoint. In this process, we determine which neighboring pixels should be connected or kept isolated. For this
determination, we use depth differences among neighboring pixels, and SLIC-based superpixel segmentation that
considers both color and depth information. Pixel pairs that have small depth differences or reside in the same
superpixels are connected, and the polygons enclosed by the connected pixels are inpainted, which greatly reduces the
holes. This warping process is performed individually for each viewpoint from which view plus depth data are provided,
resulting in several images at the virtual viewpoint that are warped from different viewpoints. In the second step, we
merge those warped images to obtain the final result. Thanks to the data provided from different viewpoints, the final
result has less noise and fewer holes than a result obtained from a single viewpoint. Experimental results using
publicly available view plus depth data are reported to validate our method.
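
The two steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the superpixel labels are assumed to be precomputed (for example by a SLIC implementation run on color and depth), the threshold `max_depth_gap` is a hypothetical parameter, only horizontal neighbors are checked, and the merge rule (averaging the non-hole values at each pixel) is a stand-in because the abstract does not specify how the warped images are combined.

```python
def should_connect(depth_a, depth_b, label_a, label_b, max_depth_gap=0.05):
    """Decide whether two neighboring pixels should be connected.

    Follows the rule stated in the abstract: connect when the depth
    difference is small OR both pixels lie in the same superpixel.
    max_depth_gap is a hypothetical threshold.
    """
    return abs(depth_a - depth_b) <= max_depth_gap or label_a == label_b


def horizontal_links(depth, labels, max_depth_gap=0.05):
    """For each row, flag whether pixel (y, x) connects to pixel (y, x + 1).

    depth and labels are 2-D lists of the same shape.
    """
    links = []
    for depth_row, label_row in zip(depth, labels):
        row = []
        for x in range(len(depth_row) - 1):
            row.append(should_connect(depth_row[x], depth_row[x + 1],
                                      label_row[x], label_row[x + 1],
                                      max_depth_gap))
        links.append(row)
    return links


def merge_warped(images):
    """Merge images warped from several viewpoints into one image.

    images: list of 2-D lists of gray values, with None marking a hole.
    Assumption: each output pixel is the mean of the non-hole values,
    and remains a hole (None) if no viewpoint covers it.
    """
    height = len(images[0])
    width = len(images[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [img[y][x] for img in images if img[y][x] is not None]
            row.append(sum(values) / len(values) if values else None)
        merged.append(row)
    return merged
```

In this sketch, a pixel that is a hole in one warped image is filled by any other viewpoint that covers it, which mirrors the reason given in the abstract for merging several viewpoints.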