FTV (Free-viewpoint Television) is the ultimate 3DTV, with an infinite number of views, and ranks at the top of visual media. It enables users to view 3D scenes by freely changing the viewpoint. MPEG has been developing FTV standards since 2001. MVC (Multiview Video Coding) is the first phase of FTV, which enables efficient coding of multiview video. 3DV (3D Video) is the second phase of FTV, which enables efficient coding of multiview video and depth data for multiview displays. In 3DV, views in between linearly arranged cameras are synthesized from the multiview video and depth data. Based on recent developments in 3D technology, MPEG has started the third phase of FTV, targeting super multiview and free navigation applications. This new FTV standardization will achieve more flexible camera arrangements, more efficient coding, and new functionality. Users can enjoy very realistic 3D viewing and walk-through/fly-through experiences of 3D scenes in the super multiview and free navigation applications of FTV.
FTV (Free-viewpoint Television) is 3DTV with an infinite number of views and ranks at the top of visual media. It enables users to view a 3D world by freely changing the viewpoint. MPEG has been promoting the international standardization of FTV since 2001. The first phase of FTV is MVC (Multi-view Video Coding) and the second phase is 3DV (3D Video). MVC, completed in 2009, encodes multiple camera views efficiently and has been adopted by Blu-ray 3D. 3DV is a standard that targets serving a variety of 3D displays and is currently in progress. 3DV employs MVD (Multi-View and Depth) as its data format. MVD is a set of views and depths at various viewpoints. 3DV sends MVD data at a few viewpoints and synthesizes many views at other viewpoints to be displayed on various types of multi-view displays at the receiver side. The 3DV activity moved to the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) of MPEG and ITU in July 2012. We propose GVD (Global View and Depth) as an alternative data format. GVD consists of a base view, a base depth, residual views, and residual depths. GVD is a more compact 3D representation than MVD, since the redundancy of MVD is removed in GVD. GVD has been accepted by JCT-3V.
In this paper, we discuss free-viewpoint synthesis with the view-plus-depth format for multiview applications such as 3DTV and Free-viewpoint Television (FTV) [1]. When generating a virtual image, 3D warping is applied using the view and depth of a reference camera. This process suffers from holes appearing in the virtual image. In the conventional method, the holes were all treated collectively with a median filter. However, holes appear for different reasons in this process, so it is inappropriate to treat them all at once without distinction, as the conventional method does. We analyze the factors and identify two: boundaries between foreground and background, and reduction of resolution. In this paper, we propose a new hole-filling method that considers these factors. In the first step, we classify neighboring pixels into boundary or same-object areas according to the gradient of the depth value. For the boundary case, we leave the holes and fill them by referring to two other real cameras. For the same-object case, we set up sub-pixels between neighboring pixels and warp them if the depth changes gradually or if the virtual viewpoint of the warped image is closer to the object than the original view position, because such pixels probably cause holes due to the reduction of resolution. We implement these methods in simulation. As a result, we prevent boundaries in the virtual image from becoming ambiguous, and we confirm the effectiveness of the proposed method.
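As an illustration of the classification step, the following is a minimal sketch (not the authors' implementation) that distinguishes the two hole causes by thresholding the depth gradient across a hole; the threshold value is a hypothetical assumption:

```python
import numpy as np

# Hypothetical threshold separating a foreground/background boundary
# from a gradual depth change within the same object.
BOUNDARY_THRESHOLD = 8.0  # assumption; depends on depth quantization

def classify_hole_neighbors(depth, y, x0, x1):
    """Classify the pixels adjacent to a hole spanning columns x0..x1
    on scanline y as 'boundary' or 'same_object', following the idea
    of thresholding the depth gradient."""
    d_left, d_right = depth[y, x0 - 1], depth[y, x1 + 1]
    if abs(d_right - d_left) > BOUNDARY_THRESHOLD:
        # Large depth jump: the hole is a disocclusion at an object
        # boundary; defer to the two other real cameras.
        return "boundary"
    # Gradual depth change: the hole likely comes from resolution
    # reduction; fill by warping sub-pixels between the neighbors.
    return "same_object"

# Example: a synthetic scanline with a depth discontinuity
depth = np.tile(np.concatenate([np.full(8, 40.0), np.full(8, 90.0)]), (4, 1))
print(classify_hole_neighbors(depth, 2, 7, 8))  # -> 'boundary'
```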
In this paper, we discuss a multiview video and depth coding system for multiview video applications such as 3DTV and Free Viewpoint Television (FTV) [1]. We target an appropriate multiview and depth compression method, and then investigate the effect on free-viewpoint synthesis quality of changing the allocation of transmission rate between the multiview video and depth sequences. In the simulations, we employ MVC in parallel to compress the multiview video and depth sequences at different bitrates, and compare the virtual view sequences generated from the decoded data with the original video sequences captured at the same viewpoint. Our experimental results show that the bitrate of the multiview depth stream has less effect on view synthesis quality than that of the multiview video stream.
Visible Light Communication (VLC) is a wireless communication method using LEDs. LEDs can respond at high speed, and VLC exploits this characteristic. In VLC research, there are mainly two types of receivers: photodiode receivers and high-speed cameras. A photodiode receiver can communicate at high speed and achieves a high transmission rate because of its fast response. A high-speed camera can detect and track the transmitter easily because it is not necessary to move the camera. In this paper, we use a hybrid sensor designed for VLC that has the advantages of both the photodiode and the high-speed camera, that is, a high transmission rate and easy detection of the transmitter. The light-receiving section of the hybrid sensor consists of communication pixels and video pixels, which realize these advantages. In previous research, this hybrid sensor communicated in a static environment. In a dynamic environment, however, high-speed tracking of the transmitter is essential for communication. We therefore realize high-speed tracking of the transmitter by using the information from the communication pixels. Experimental results show the possibility of communication in a dynamic environment.
In general, free-viewpoint images are generated from images captured by a camera array aligned on a straight line or a circle. A camera array can capture a dynamic scene synchronously. However, a camera array is expensive and requires great care to align exactly. In contrast, a handheld camera is easily available and can capture a static scene easily. We propose a method that generates free-viewpoint images from a video captured by a handheld camera in a static scene. To generate free-viewpoint images, view images from several viewpoints and the camera poses/positions of those viewpoints are needed. In one previous work, a checkerboard pattern had to be captured in every frame to calculate these parameters. In another work, a pseudo-perspective projection was assumed to estimate the parameters, which limits the camera movement. In this paper, we calculate these parameters by Structure from Motion. Additionally, we propose a method for selecting reference images from the many captured frames, and we propose a method that uses projective block matching and a graph-cuts algorithm with reconstructed feature points to estimate the depth map of a virtual viewpoint.
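As a rough illustration of the Structure-from-Motion step (the actual pipeline may differ), the relative camera pose between two frames of a static scene can be recovered from feature correspondences. The sketch below uses OpenCV primitives and assumes the intrinsic matrix K is known:

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate the relative camera pose (R, t) between two frames of a
    static scene from ORB feature correspondences."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Essential matrix with RANSAC rejects outlier matches.
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=mask)
    return R, t  # translation t is recovered only up to scale
```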
We are developing technologies for FTV, in which the viewer can freely change the viewpoint. Free-viewpoint images can be generated from images captured by a static multi-camera system. However, it is hard to render an object that moves widely in the scene. In this paper, we address this problem by proposing a moving camera array and a corresponding free-viewpoint image synthesis algorithm. Our synthesis method uses temporal and spatial information together in order to further improve the view generation quality. Experiments using a sequence captured by a simulated moving multi-camera system demonstrate the improvement of view synthesis quality in comparison with conventional view synthesis methods.
We have developed a new type of television named FTV (Free-viewpoint TV). FTV is the ultimate 3DTV that enables us to view a 3D scene by freely changing our viewpoint. We proposed the concept of FTV and constructed the world's first real-time system including the complete chain of operation from image capture to display. FTV is based on the ray-space method, which represents one ray in real space with one point in the ray-space. We have developed ray capture, processing, and display technologies for FTV. FTV can be carried out today in real time on a single PC or on a mobile player. We also realized FTV with free listening-point audio. The international standardization of FTV has been conducted in MPEG. The first phase of FTV was MVC (Multi-view Video Coding) and the second phase is 3DV (3D Video). MVC was completed in May 2009. The Blu-ray 3D specification has adopted MVC for compression. 3DV is a standard that targets serving a variety of 3D displays. The view generation function of FTV is used to decouple capture and display in 3DV. FDU (FTV Data Unit) is proposed as a data format for 3DV. FDU can compensate for errors in the synthesized views caused by depth errors.
In this paper, we present a new image acquisition system for FTV (Free-viewpoint TV). The proposed system can capture a dynamic scene from all-around views. It consists of two ellipsoidal mirrors, a high-speed camera, and a rotating tilted mirror. The two ellipsoidal mirrors differ in size and ellipticity. The object is placed at the focus of the ellipsoidal mirror. The size of this system is smaller than that of the earlier system, since the ellipsoidal mirrors can reduce the size of the virtual images. The high-speed camera acquires multi-viewpoint images by mirror scanning. We simulated this system with ray tracing and confirmed the principle.
In this paper, we propose a method for highly efficient acquisition of Ray-Space for FTV (Free-viewpoint TV). In this research, incomplete data is directly captured by a novel device, i.e. a photodiode/lens array, and transformed into full information by the Radon transform. Conventional acquisition of Ray-Space using multiple cameras requires capturing a large amount of data. However, Ray-Space is redundant because it consists of sets of lines whose slopes depend on the depth of objects. We use the Radon transform to exploit this redundancy. The Radon transform is a set of projection data along different directions, so the Ray-Space can be reconstructed from projection data over a limited range by the inverse Radon transform. Capturing part of the projection data corresponds to capturing sums of several rays with one pixel. We simulated the reconstruction of Ray-Space from projection data computed by a computer simulation of the capturing device. As a result, using fewer pixels than rays, we could reduce the amount of information needed to reconstruct the Ray-Space.
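The idea can be illustrated numerically with off-the-shelf tools. The following sketch (a simplification of the described simulation, using a standard test image as a stand-in for a 2D Ray-Space slice) reconstructs the slice from a limited angular range of projections:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

# Stand-in for a 2D Ray-Space slice (an EPI); the real data is a
# set of lines whose slopes depend on object depth.
ray_space = resize(shepp_logan_phantom(), (128, 128))

# Forward Radon transform over a limited range of directions:
# each projection sample is a sum of several rays on one "pixel".
theta = np.linspace(60.0, 120.0, 45, endpoint=False)
projections = radon(ray_space, theta=theta)

# The inverse Radon transform recovers the slice from the partial data.
recon = iradon(projections, theta=theta, filter_name="ramp")
err = np.sqrt(np.mean((recon - ray_space) ** 2))
print(f"RMSE of limited-angle reconstruction: {err:.4f}")
```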
The availability of multi-view images of a scene makes possible new and exciting applications, including Free-viewpoint TV (FTV). FTV allows us to change the viewpoint freely in a 3D world, where the virtual viewpoint images are synthesized by Depth-Image-Based Rendering (DIBR). In this paper, we propose a new DIBR method using multi-view images acquired in a linear camera arrangement. The proposed method improves virtual viewpoint images by predicting the residual errors. For virtual viewpoint image synthesis, it is necessary to estimate depth maps from the multi-view images. Several depth estimation algorithms have been proposed, but it is difficult to estimate an accurate depth map. As a result, rendered virtual viewpoint images contain errors due to the depth errors. Our proposed method therefore takes those depth errors into account and improves the quality of the rendered virtual viewpoint images. In the proposed method, virtual images at each camera position are generated using the real images from the other cameras. Then, residual errors can be calculated between the generated images and the real images acquired by the actual cameras. The residual errors are processed and fed back to predict the residual errors that would occur in virtual viewpoint images generated by the conventional method. In the experiments, PSNR was improved by a few decibels compared with the conventional method.
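A minimal 1D sketch of the residual feedback idea follows (a loose simplification of the described method: rectified cameras, purely horizontal disparity, and nearest-pixel forward warping are assumed):

```python
import numpy as np

def warp(img, disparity):
    """Forward-warp an image by a per-pixel horizontal disparity
    (nearest-pixel splatting; holes are left as zeros)."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            xd = x + int(round(disparity[y, x]))
            if 0 <= xd < w:
                out[y, xd] = img[y, x]
    return out

def synthesize_with_residual(cam_a, cam_b, disp_a_to_b, disp_a_to_v):
    """Render a virtual view from camera A, then correct it with the
    residual observed when A's image is predicted from camera B."""
    # Residual at a real viewpoint: prediction error caused by depth errors.
    pred_a = warp(cam_b, -disp_a_to_b)
    residual_a = cam_a - pred_a
    # Feed the residual forward: warp it to the virtual viewpoint and
    # add it to the conventionally rendered view.
    virtual = warp(cam_a, disp_a_to_v)
    return virtual + warp(residual_a, disp_a_to_v)
```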
A novel 360-degree 3D image acquisition system that captures multi-view images with a narrow view interval is proposed. The system consists of a scanning optics system and a high-speed camera. The scanning optics system is composed of a double-parabolic mirror shell and a rotating flat mirror tilted at 45 degrees to the horizontal plane. The mirror shell produces a real image of an object that is placed at the bottom of the shell. The mirror shell is modified from the usual system, used as a 3D illusion toy, so that the real image can be captured from a horizontal viewing direction. The rotating mirror placed in the real image reflects the image toward the camera axis. The reflected image observed by the camera varies according to the angle of the rotating mirror. This means that the camera can capture the object from various viewing directions determined by the angle of the rotating mirror. To acquire the time-varying reflected images, we use a high-speed camera that is synchronized with the angle of the rotating mirror. We used a high-speed camera whose resolution is 256×256 and whose maximum frame rate at that resolution is 10,000 fps. The rotating speed of the tilted flat mirror is about 27 revolutions per second. The number of views is 360. The focal length of the parabolic mirrors is 73 mm and their diameter is 360 mm. Objects whose size is less than about 30 mm can be acquired. The captured images are compensated for the rotation and distortion caused by the double-parabolic mirror system, and reproduced as 3D moving images on the Seelinder display.
We have developed a new type of television named FTV (Free-viewpoint TV). FTV is an innovative visual medium that enables us to view a 3D scene by freely changing our viewpoint. We proposed the concept of FTV and constructed the world's first real-time system including the complete chain of operation from image capture to display. We also realized FTV on a single PC and FTV with free listening-point audio. FTV is based on the ray-space method, which represents one ray in real space with one point in the ray-space. We have also developed new types of ray capture and display technologies, such as a 360-degree mirror-scan ray capturing system and a 360-degree ray-reproducing display. MPEG regarded FTV as the most challenging 3D medium and started international standardization activities for FTV. The first phase of FTV is MVC (Multi-view Video Coding) and the second phase is 3DV (3D Video). MVC was completed in March 2009. 3DV is a standard that targets serving a variety of 3D displays. It will be completed within the next two years.
In this paper, we propose a method for compressive acquisition of Ray-Space. Briefly speaking, incomplete data captured directly by a specific device is transformed into full information by the Radon transform. Ray-Space, which represents 3D images, describes the position and direction of rays on a reference plane in real space, and thus contains the information of many rays. In conventional acquisition of Ray-Space, multiple cameras are used and one pixel of a camera captures one ray. Thus many pixels are needed and a large amount of data must be captured. However, Ray-Space is redundant because it consists of sets of lines whose slopes depend on the depth of objects. We use the Radon transform to exploit this redundancy. The Radon transform is a set of projection data along different directions, and the Radon transform of Ray-Space shows an uneven distribution. Thus, the Ray-Space can be reconstructed from projection data over a limited range by the inverse Radon transform. Capturing part of the projection data corresponds to capturing sums of several rays with one pixel, where a sum of several rays means a sum of their brightness values. In this paper, we simulated the reconstruction of Ray-Space from projection data computed by the Radon transform of the Ray-Space. This experiment showed that the Ray-Space could be reconstructed from parts of the projection data. As a result, using fewer pixels than rays, we could reduce the amount of data needed to reconstruct the Ray-Space.
This paper presents a novel 3D display using a new principle that has the features of both Integral Imaging (II) and a volumetric display. The proposed display consists of two lens arrays, a convex lens array and a concave lens array, and one 2D display moving back and forth. The two lens arrays are placed between the 2D display and the observer. The concave lens array forms elemental images, and the convex lens array and the formed elemental images reproduce a depth-division image as in the II method. When the observer watches the 2D display through the two lens arrays, the displayed image appears to be reproduced not at the position of the 2D display but at a certain depth determined by the position of the 2D display. So when the 2D display is moved, the reproduced image also moves to another depth position. Therefore, images at various depths can be reproduced by moving the 2D display. This is how the proposed display reconstructs 3D space. Here we introduce an optical system that can reconstruct a wireframe cube by oscillating the 2D display over only a few centimeters. We also show simulation results of the proposed display obtained with a ray-tracing method to confirm the motion parallax.
This paper presents a novel 3D display using a new principle that has the features of both Integral Imaging (II) and a volumetric display. The proposed display consists of one 2D display and two lens arrays, a convex lens array and a concave lens array. The two lens arrays are placed between the 2D display and the observer. When the observer watches the 2D display through the two lens arrays, the displayed image appears to be reproduced at a position different from that of the 2D display. Furthermore, by changing the position of the 2D display, the image is reproduced at a different position than before. Therefore, images at various depths can be reproduced by moving the 2D display. This is how the proposed display reconstructs 3D space. Here, we simulated this display with ray tracing and checked its validity.
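The depth shift can be illustrated with the thin-lens equation. This is a simplification (the actual system uses two lens arrays, and the focal length below is an assumed value), but a single thin lens already shows how a small movement of the display produces a large movement of the reproduced image:

```python
# Thin-lens illustration: 1/f = 1/d_o + 1/d_i, so the image distance
# d_i changes as the display (object) distance d_o is varied.
f = 50.0  # focal length in mm (assumed value)

for d_o in (60.0, 70.0, 80.0):  # display positions in mm
    d_i = 1.0 / (1.0 / f - 1.0 / d_o)
    print(f"display at {d_o:.0f} mm -> image reproduced at {d_i:.1f} mm")
# Output: 300.0, 175.0, 133.3 mm -- a 2 cm display shift moves the
# reproduced image by more than 15 cm in this configuration.
```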
In this paper, we introduce a new Ray-Space acquisition system that we developed. The Ray-Space method records the position and direction of rays traveling through space as ray data. Composing arbitrary viewpoint images with the Ray-Space method enables the generation of realistic arbitrary-viewpoint pictures. However, a dense Ray-Space must be acquired to apply the Ray-Space method. The conventional method of acquiring ray data uses a camera array. This method can capture a dynamic scene; to acquire a dense Ray-Space with it, however, interpolation is necessary. Another common method for ray data acquisition uses a rotating stage. This method captures images without requiring interpolation, but only static scenes can be captured. Therefore, we developed a new Ray-Space acquisition system. This system uses two parabolic mirrors. Incident rays that are parallel to the axis of a parabolic mirror gather at its focus. Hence, rays that come out of an object placed at the focus of the lower parabolic mirror gather at the focus of the upper parabolic mirror. A real image of the object is formed there, and a rotating tilted mirror scans the rays at the focus of the upper parabolic mirror. Finally, the image from the tilted mirror is captured by a camera. Using this system, we were able to acquire an all-around image of an object.
The 3D display using the light beam reconstruction method has some great advantages. Special glasses are not needed, and the observation point is not fixed. Some researchers claim that a viewer may be able to focus on 3D images under the super multi-view condition. However, such a 3D display needs to reconstruct a great number of light beams. Usually, the number of light beams is limited by the resolution of a flat-panel display because only the space-division method is used; therefore, it is difficult to improve the performance of the 3D display beyond that of the flat-panel display. Thus, using the time-multiplexing method is important.
In this paper, we discuss a light-beam-reconstruction 3D display that uses a fast light shutter for time multiplexing. We consider the relationship between the performance of the 3D display and that of the devices that comprise it. The simulation results for the super multi-view condition suggest that the number of light beams that enter the pupil of the viewer's eye and the width of the slit are important for the accommodation function.
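The super multi-view condition can be checked with simple geometry. The sketch below is illustrative only; the pupil diameter, viewing distance, and beam pitch are assumed values, not the paper's simulation parameters:

```python
import math

# Super multi-view condition: at least two light beams should enter
# the pupil so the eye can accommodate on the 3D image.
pupil_diameter = 5.0      # mm, typical pupil size (assumed)
viewing_distance = 600.0  # mm (assumed)
beam_angle_pitch = 0.2    # degrees between adjacent beams (assumed)

beam_pitch_at_eye = viewing_distance * math.radians(beam_angle_pitch)
beams_in_pupil = pupil_diameter / beam_pitch_at_eye
print(f"beam pitch at the eye: {beam_pitch_at_eye:.2f} mm")
print(f"beams entering the pupil: {beams_in_pupil:.1f}")
# The condition is satisfied when beams_in_pupil >= 2.
```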
In this paper, we analyze the distortion of images acquired with a novel Ray-Space acquisition system. To generate an arbitrary viewpoint picture with the Ray-Space method, dense ray data must be acquired. Conventional methods for acquiring Ray-Space data use rotating stages or a camera array. We developed a system consisting of two parabolic mirrors, a synchronized galvanometric mirror, and a high-speed camera. The principle is as follows: if an object is placed at the bottom of the parabolic mirror, the rays coming out of the object are imaged in the upper part and form a real image. The galvanometric mirror is placed at the position of the real image and made to scan horizontally. Images of the object from different angles (directions) can then be generated and are captured by the high-speed camera. By capturing many images in each scan, the Ray-Space can be acquired. However, distortion arises in the real image of the object, and consequently distortion appears in the captured image. Therefore, it is necessary to correct the captured image. Here, we examine a method to generate corrected images from the acquired Ray-Space.
Ray-Space is categorized as Image-Based Rendering (IBR); thus, the generated views have photo-realistic quality. While this method offers high-quality imaging, it needs many images or cameras, because Ray-Space requires views from various directions and positions instead of 3D depth information. In this paper, we reduce that flood of information using view-centered ray interpolation. View-centered interpolation means estimating a view-dependent depth (or disparity) map at the viewpoint being generated and interpolating the pixel values using the multi-view images and the depth information. This combination of depth estimation and interpolation renders photo-realistic images effectively. Unfortunately, however, if the depth estimation is weak or wrong, many artifacts appear in the created images, so a powerful depth estimation method is required. When rendering free-viewpoint video, we perform depth estimation at every frame, so we want to keep the computing cost down. Our depth estimation method is based on dynamic programming (DP). This method optimizes the depth image quickly, even in weakly matching areas. However, scan-line noise appears because of the limits of DP, so we run the DP along multiple directions and sum the results of the multi-pass DPs. Our method achieves low computation cost and high depth estimation accuracy.
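A minimal sketch of a single-pass scanline DP disparity estimation follows (the described method adds multi-directional passes and sums them; the smoothness penalty is a hypothetical value, and float grayscale images are assumed):

```python
import numpy as np

def dp_scanline_disparity(left, right, max_disp=16, smooth=4.0):
    """Disparity by per-scanline dynamic programming: minimize matching
    cost plus a smoothness penalty on disparity changes along the row."""
    h, w = left.shape
    n = max_disp + 1
    disp_map = np.zeros((h, w), dtype=np.int32)
    # Penalty for jumping between disparities on adjacent pixels.
    jump = smooth * np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    for y in range(h):
        # cost[x, d] = |left(x) - right(x - d)|; invalid shifts get a big cost.
        cost = np.full((w, n), 255.0)
        for d in range(n):
            cost[d:, d] = np.abs(left[y, d:] - right[y, : w - d])
        acc = cost.copy()
        parent = np.zeros((w, n), dtype=np.int32)
        for x in range(1, w):
            total = acc[x - 1][None, :] + jump        # (d_now, d_prev)
            parent[x] = np.argmin(total, axis=1)
            acc[x] += total[np.arange(n), parent[x]]
        # Backtrack the optimal disparity path for this scanline.
        d = int(np.argmin(acc[w - 1]))
        for x in range(w - 1, -1, -1):
            disp_map[y, x] = d
            d = parent[x, d]
    return disp_map
```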
We propose an Image-Based Rendering (IBR) technique using a circular camera array. By recording the scene from all around, we can synthesize more dynamic arbitrary-viewpoint images and wide-angle, panorama-like images. The method is based on Ray-Space, an image-based rendering representation like the Light Field. Ray-Space is described by the position (x, y) and direction (θ, φ) of each ray as it passes through a reference plane. In this space, when the cameras are arranged on a circle, the trajectory of a scene point, which forms a straight line in the Epipolar Plane Image (EPI) of a linear arrangement, draws a sine-like curve. Although this form is very clear, rendering becomes complicated, since determining which pixel of which camera to use requires extra work. Therefore, we re-describe the space by the camera position (s, t) and pixel position (u, v), as in the Light Field, with the camera position expressed in polar coordinates (r, θ) so that the description stays close to that of Ray-Space. The trajectory of a point then becomes a complicated periodic function with period 2π, but rendering becomes easy. From this space, just as with a linear arrangement, arbitrary viewpoint images are synthesized using only the geometric relationship between cameras. Moreover, taking advantage of the property that rays converge on each point of the circle, we propose a technique for generating wide-angle, panorama-like pictures; this is possible because rays of all directions at the same position are recorded redundantly. The above holds when the cameras are densely arranged and the plenoptic sampling condition is satisfied. We next consider the discrete case, in which the sampling condition is not satisfied. When cameras are arranged on a straight line and a picture is synthesized, a focus-like effect appears in spite of the pinhole camera model; this effect, peculiar to an under-sampled Light Field, is called the synthetic aperture. We have previously synthesized all-in-focus images with a process called the "adaptive filter," which builds a fully view-dependent disparity map centered on the viewpoint to be generated. The same phenomenon occurs with a circular arrangement, so we extend the adaptive filter to the circular camera arrangement and synthesize all-in-focus pictures. Although problems arise, such as the epipolar lines no longer being parallel in the circular arrangement, we clarify that the extension can be obtained from geometric information alone. With this approach, we succeeded in wide-angle and arbitrary-viewpoint image synthesis from both fully sampled and discrete spaces.
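The periodic trajectory can be verified numerically. The sketch below (with assumed geometry: circle radius, focal length, and scene point are all illustrative) computes the horizontal image coordinate of a fixed scene point as a camera moves around a circle, each camera looking at the center:

```python
import numpy as np

R = 2.0                      # camera circle radius (assumed)
focal = 1.0                  # focal length in image units (assumed)
P = np.array([0.5, 0.3])     # fixed scene point inside the circle

for theta in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False):
    cam = R * np.array([np.cos(theta), np.sin(theta)])
    fwd = -cam / np.linalg.norm(cam)         # optical axis: toward the center
    right = np.array([-fwd[1], fwd[0]])      # image x-axis
    rel = P - cam
    u = focal * np.dot(rel, right) / np.dot(rel, fwd)  # perspective projection
    print(f"theta = {theta:5.2f} rad -> u = {u:+.3f}")
# u(theta) traces a smooth periodic curve (period 2*pi); for points near
# the center of the circle it approaches a sine curve.
```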
A ray-based cylindrical display is proposed that allows multiple viewers to see 3D images over a 360-degree horizontal arc without wearing 3D glasses. This technique uses a cylindrical parallax barrier and a one-dimensional light source array constructed from semiconductor light sources such as LEDs aligned in a vertical line. The light source array rotates along the inside of the cylindrical parallax barrier, and the intensity of each light is modulated synchronously with the rotation.
Since this technique is based on the parallax panoramagram, the density of rays is limited by diffraction at the parallax barrier. To solve this problem, we employed a revolving parallax barrier. We have developed two prototype displays, and they showed highly realistic 3D images. In particular, the newer one can display color images 200 mm in diameter, making it suitable for displaying real objects such as a human head.
We therefore acquired ray-space data using a video camera rotating around an object and successfully reconstructed the object on the prototype display. In this paper, we describe the details of the system and discuss the ray control method used to reconstruct objects from ray-space data.
We propose a 3D live video system that generates arbitrary viewpoints in real time based on Ray-Space, an image-based rendering method. With this system, a remote user can freely change the viewpoint, not only to the captured camera positions but also to viewpoints where no camera is physically present, using Ray-Space interpolation. The basic idea of Ray-Space rendering is to collect and rearrange portions of simultaneously captured images according to an arbitrarily specified virtual view. If hundreds of cameras were arranged densely enough, synthesizing a free viewpoint away from the camera baseline would require only geometric camera information. Since we cannot obtain such complete ray information as plenoptic sampling requires, arbitrary view generation necessitates interpolation of the slightly missing rays. However, the cost of such view interpolation is particularly high. Therefore, we introduce three novel view interpolation techniques: first, a view-centered interpolation framework; second, disparity estimation with smoothing; and third, hierarchical correspondence search for fast computation. Moreover, we implemented an experimental system with these algorithms. This free-view generation system includes sixteen cameras arranged on a straight line. Each camera is connected to a consumer computer, and all the computers are connected to a server computer via an Ethernet star network. The system carries out four processes in real time: capturing images, correcting the camera positions with a projective transformation, interpolating images on the baseline, and rendering the arbitrary viewpoint.
The experimental results show that this system renders arbitrary viewpoints at 12 fps (frames per second) with the image resolution set to 320×240. We succeeded in synthesizing highly photo-realistic images.
This paper proposes a novel multiple-image coding technique using Ray-Space interpolation. Ray-Space, an image-based rendering technique that generates arbitrary views from multiple cameras, describes three-dimensional space based only on ray information from a large number of cameras; therefore, data compression is needed. We leverage the correlation in time and space, aiming for high compression. H.264/AVC is employed for video coding, and studies have been conducted on using AVC in the time domain. Here we propose a novel algorithm that uses view interpolation for coding in the space domain. Interpolation is a method to generate the middle view of a stereoscopic setup. By generating interpolated images from coded images and using them as references, coding performance should improve; therefore, interpolation accuracy is important for coding performance. In this paper, we propose an interpolation technique using geometric information in a linear camera arrangement. By calculating the trace of each point according to the camera arrangement and obtaining its corresponding point, the middle image is generated. The interpolation method is thus an intensity-based scheme, constrained by smoothness in the disparity domain. Coding experiments using interpolation outperform standard AVC by 1-2 dB at all bitrates. Moreover, we handle occlusion regions by means of extrapolation using four images. To detect occlusion regions, we use two criteria: one is the minimum error, and the other is the ratio of minimum errors among the four images. In occlusion regions, the intensity of the middle image is generated using the extrapolated images. This method gives up to 1-3 dB improvement compared to the occlusion-ignoring algorithm.
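A minimal sketch of using an interpolated middle view as a coding reference follows. This is simplified far beyond the described scheme (a single global disparity instead of per-point traces, and image-edge wrap-around is ignored for brevity):

```python
import numpy as np

def predict_middle(left, right, disparity):
    """Predict the middle view by averaging disparity-compensated
    left and right images (single global disparity for simplicity)."""
    half = disparity // 2
    return 0.5 * (np.roll(left, -half, axis=1) + np.roll(right, half, axis=1))

def code_middle_view(middle, left, right, disparity, q_step=8):
    """Encode the middle view as a quantized residual against the
    interpolated prediction; a better prediction means a smaller residual."""
    pred = predict_middle(left, right, disparity)
    residual = middle - pred
    q_residual = np.round(residual / q_step)   # to be entropy coded
    decoded = pred + q_residual * q_step       # decoder-side reconstruction
    return q_residual, decoded
```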
In a multiple camera system that consists of a large number of cameras, each camera has to be calibrated in order to use the image information obtained from them effectively. When the target scene is large, conventional calibration methods using a 3D or 2D object are difficult to apply because setting up such objects is an elaborate task. Although another approach called self-calibration, which uses only image point correspondences, seems suitable in such a situation, it is often susceptible to noise. In this paper, we propose a new camera calibration method for such systems using a 1D object, which has three collinear points with known distances between them. The main reason for using a 1D object as the calibration object is that it is more flexible than a 3D or 2D object in a large scene. By using a freely moving 1D object without knowledge of its position, together with only one calibrated camera, we can calibrate multiple cameras simultaneously, so the proposed method presents an easy and practical solution. Experimental results on computer-simulated data are shown in this paper. In the presentation, experimental results on real image data will be presented.
This paper proposes novel Ray-Space acquisition systems that capture dynamic dense Ray-Space at video rate. Most previous work on Ray-Space acquisition targeted "static" Ray-Space because of the difficulty of dealing with dynamic dense Ray-Space. In this paper, we investigate two types of real-time Ray-Space acquisition systems. One uses multiple video cameras: we developed a 16-camera setup and a capturing system using a PC cluster, and Ray-Space interpolation is introduced to generate dense Ray-Space from the sparsely placed video cameras. The other acquisition system captures a "real" dense Ray-Space without interpolation: using a synchronized galvanometric mirror and a high-speed camera, we succeeded in capturing more than 100 view images in 1/30 second. We also developed special hardware that receives the digital high-speed camera signal and outputs an arbitrary viewpoint image based on the Ray-Space method.
We have been developing a new television system named Free Viewpoint Television (FTV) that can generate free-viewpoint views. We propose a new user interface for FTV using an automultiscopic display (a multi-view autostereoscopic display) and a head tracking system. We built a head tracking system that uses cameras and does not require attaching any sensors to the viewer. We succeeded in extending the viewing zone of the automultiscopic display, which lets users interact with FTV naturally.
Considering node energy and channel bandwidth limitations in a multiview image network, avoiding inter-node communication in the coding scheme is necessary and makes communication efficient in comparison with conventional coding methods that use inter-node communication. In our system, we consider a multiview image network as an array of equally spaced nodes on a line. Each node includes a camera and has limited processing and communication abilities. We propose multiview image coding without inter-node communication that gains the advantage of inter-view correlation at the decoder side, for two network configurations; the main concept of the proposed coding is the same for both. The two network configurations are distinguished by the presence of a parent node in each cluster, which sends full information to the central node (i.e., the joint decoder); the other nodes, called child nodes, send their partial information to the central node. In this paper, a coding scheme is proposed for multiview image networks with parent nodes, based on the number of parent nodes in each cluster (i.e., "one-parent-node cluster" and "two-parent-node cluster"). Note that, because we deal with multiview images, the decoding procedure searches for correspondences between the partially transmitted multiview image data. In the coding scheme with parent nodes, if a parent node fails, the decoding of its child nodes fails, or the error caused by the correspondence search during the decoding of the child nodes increases if the network uses the parent nodes of other clusters to decode the cluster without a parent node. To make the network robust against node failure, we avoid parent nodes and develop a network configuration in which all nodes are child nodes (i.e., "without-parent-node cluster"). For such a network, we also propose a coding algorithm for multiview images in which the joint decoder at the central node can decode the partial information of each child node using the side information of the other child nodes. Thus, even if any viewpoint is lost, the joint decoder is still able to decode the other received partial information of the child nodes. Finally, we compared the coding schemes of networks with and without inter-node communication for the different cluster configurations mentioned here and for the "all-parent-node cluster" configuration, considering the communication rate in the network, the symmetry of communication, the decoding quality of the child nodes, and their robustness against node failure.
Free-viewpoint TeleVision (FTV) is a next-generation television in which users can move their viewpoint freely. In a previous paper, we reported an FTV system based on the Ray-Space representation. In this paper, we focus on the acquisition and display systems for FTV. As acquisition systems, we investigated two configurations: (1) multiple cameras with interpolation, and (2) a single high-speed camera with an optical scanning system. As a display system, we developed a display with head tracking, where the position of the user is detected by image processing.
In this paper, we propose a new real-time dynamic ray data acquisition and rendering system named "Free Viewpoint Television" or "FTV". With this system, the user can freely control the viewpoint position in any dynamic real-world scene in real time. The basic idea of this system is based on the ray-space method, in which an arbitrary photo-realistic view can be generated from a collection of real view images. Since the system is aimed at real-time operation, the collection of images is obtained through an array of cameras, and the missing ray information is generated by interpolating data between cameras.
The prototype system includes 16 CCD cameras forming a camera array. The interpolation is based on the adaptive-filtering ray-space data interpolation technique. Between each pair of cameras, up to 15 interpolated views can be generated to ensure that no aliasing occurs. The system runs entirely on consumer-class hardware. The results achieved with the system are good in terms of both image quality and rendering speed.
A camera sensor network is a newly emerging technology in which each sensor node can capture video signals, process them, and communicate with other nodes. The processing task in this network is to generate an arbitrary view, which can be requested by the central node or a user. To avoid unnecessary communication between nodes in the camera sensor network and to speed up processing, we distribute the processing tasks among the nodes. In this method, each sensor node executes part of the interpolation algorithm to generate the interpolated image, with only local communication between nodes. The processing task in the camera sensor network is Ray-Space interpolation, which is an object-independent method based on MSE minimization using adaptive filtering. We propose two methods for distributing the processing tasks, Fully Image Shared Decentralized Processing (FIS-DP) and Partially Image Shared Decentralized Processing (PIS-DP), which share image data locally. Comparison of the proposed methods with the Centralized Processing (CP) method shows that FIS-DP has the highest processing speed, followed by PIS-DP, while CP has the lowest. The communication rates of CP and PIS-DP are almost the same and better than that of FIS-DP. PIS-DP is therefore recommended because of its better overall performance than CP and FIS-DP.
This paper describes a novel Free-Viewpoint TV system based on the Ray-Space representation. The system consists of a multi-view camera system for 3-D data capturing, a PC cluster with 16 PCs for data processing such as data compression and view interpolation, an input device to specify a viewpoint, and a conventional 2-D display to show the arbitrary viewpoint image. To generate an arbitrary viewpoint image, the Ray-Space method is used. First, the multi-view image is converted to Ray-Space data. Then, interpolation of the Ray-Space using an adaptive filter is applied. Finally, an arbitrary view image is generated from the interpolated dense Ray-Space. This paper also describes various compression methods, such as model-based compression, arbitrary-shaped DCT (Discrete Cosine Transform), VQ (Vector Quantization), and subband coding. Finally, a demonstration of a full real-time system from capture to display is explained.
In recent years, research on arbitrary view image generation using multiple cameras has attracted wide attention. Arbitrary view image generation techniques are classified into Model-Based Rendering (MBR) and Image-Based Rendering (IBR). Here, we propose a new method based on IBR that uses MBR for interpolation. The concept of this method is to use a model appropriately according to the camera density for the interpolation in IBR. In particular, we show a concrete realization of the algorithm for medium camera density. Computer simulation of arbitrary view image generation using our algorithm verifies that this method is simple and useful for generating arbitrary view images at high speed. Moreover, the algorithm was implemented on a PC cluster system, which runs it in real time.
In this paper, we describe a system that creates a virtual bird's-eye view from multi-camera images in real time and its application to the 'HIR (Human-Oriented Information Restructuring) system for ITS' that we have proposed. In recent years, studies on AHS (Advanced Highway Systems) have appeared in many fields. However, many problems remain in realizing automated driving, the goal of AHS. To overcome these problems, we have proposed the HIR system, which assists drivers by providing them with integrated and restructured images. The striking point of the proposed system, compared with the conventional one, is that it is the human, not the automated car, who recognizes the situation and controls the car; for that purpose, we simply generate and show easy-to-understand images to the human. The steps are to integrate and restructure numerous camera images from the whole driving environment, such as cars and roads, together with non-image information such as VICS (Vehicle Information and Communication Systems) data, and then to pick out the most important information according to the situation and show it in the form of an image. This paper proposes a bird's-eye view system as an example of HIR. We describe the algorithm and hardware that create a bird's-eye view in real time. In the experiment, we show that the bird's-eye view is useful as a driver-assisting image in the situation of a right turn at an intersection.
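A minimal sketch of the core warping step follows (the real-time system uses dedicated hardware; here OpenCV is used, and the ground-plane correspondences are hypothetical values):

```python
import cv2
import numpy as np

def birds_eye_view(frame, src_pts, dst_pts, out_size):
    """Warp a road camera image to a virtual bird's-eye (top-down) view
    via the homography between the image and the ground plane."""
    H = cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(dst_pts))
    return cv2.warpPerspective(frame, H, out_size)

# Hypothetical correspondences: four points on the road surface as seen
# in the camera image, and their positions in the top-down map (pixels).
src = [(420, 560), (860, 560), (1180, 900), (100, 900)]
dst = [(300, 100), (500, 100), (500, 700), (300, 700)]
# top_down = birds_eye_view(frame, src, dst, (800, 800))
```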
Image compression is needed to communicate images effectively. Mobile communication, in particular, needs high compression. It also needs high error tolerance, because highly compressed data are strongly affected by transmission errors. In fractal coding based on the Iterated Function System, we found that the relation between a range block and its domain block can, in most cases, also be applied to the extended range and domain blocks. We use this feature of fractal coding and propose a new robust coding scheme with good error tolerance. We perform fractal compression experiments based on the proposed scheme to verify its effectiveness. Computational experiments show that it has nearly the same performance as the conventional scheme when no errors occur, and achieves a large improvement in image quality, without increasing the amount of data, when errors occur.
3D shape measurement using images is employed in various fields, for example computer vision, robot vision, and CAD. In these applications, 3D measurement is often required to be fast. For targets with a diffuse surface reflection characteristic, active methods are effective for high-speed measurement. Active methods include slit light projection, space encoding with pattern projection, and time-series encoded pattern projection. However, each method has a problem in improving speed: at least several projections are necessary for a measurement. In this paper, concentric circles are used as the projection pattern, and an object can be measured with a single projection. Experiments confirmed that 3D measurement using concentric circles is possible.
Fractal image coding based on the Iterated Function System (IFS) has been attracting much interest because of the possibility of drastic data compression. It achieves compression by using the self-similarity in an image. One of the weak points of IFS is its huge computation time; in particular, the computation of the scaling parameter and the RMSE is very expensive. In this paper, we propose two schemes to reduce the computation time while keeping the image quality. The first reduces the computation time of the parameters, the affine transform, and the RMSE by using the maximum amplitude ratio, the ratio between the maximum amplitude range of the range block and that of the domain block. Using the maximum amplitude ratio, domain blocks that are unlikely to be chosen are excluded before the parameters are calculated. The second reduces the computation time of the scaling parameter by using the ratio between the variance of the range block and that of the domain block; the variance ratio is used instead of the scaling parameter. We perform fractal compression experiments based on the proposed schemes to verify their effectiveness. Computational experiments show that about 50% of the computation time is saved by using both schemes.
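A sketch of the first speed-up follows. It is hedged: block extraction and the affine search are simplified, the domain blocks are assumed to be already downsampled to the range-block size, and the pruning tolerance is an illustrative value:

```python
import numpy as np

def amplitude(block):
    """Maximum amplitude of a block: max(block) - min(block)."""
    return float(block.max() - block.min())

def best_domain(range_block, domain_blocks, ratio_tol=2.0):
    """Search for the domain block that best approximates the range block,
    skipping candidates whose maximum amplitude ratio is implausible."""
    r_amp = amplitude(range_block) + 1e-6
    best, best_err = None, np.inf
    for i, dom in enumerate(domain_blocks):
        # Pruning: if the amplitudes differ too much, the scaling
        # parameter would be extreme, so skip the expensive computation.
        ratio = (amplitude(dom) + 1e-6) / r_amp
        if ratio > ratio_tol or ratio < 1.0 / ratio_tol:
            continue
        # Least-squares scaling s and offset o minimizing ||s*dom + o - range||.
        d, r = dom.ravel(), range_block.ravel()
        s = ((d - d.mean()) * (r - r.mean())).mean() / (d.var() + 1e-6)
        o = r.mean() - s * d.mean()
        err = np.sqrt(np.mean((s * d + o - r) ** 2))  # RMSE
        if err < best_err:
            best, best_err = (i, s, o), err
    return best, best_err
```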
We propose a new ray-space interpolation scheme using an adaptive filter. Unlike previous work on view interpolation, which detects disparities, our scheme adopts a simple signal processing method: adaptive filtering of the Epipolar Plane Image (EPI). First, the original EPI is up-sampled. Then, for each pixel to be interpolated, the block surrounding the pixel is analyzed and the best filter is selected according to the analysis. The filter set includes various interpolation filters with different directionality. Finally, we apply the selected filter to the up-sampled EPI and generate the intermediate ray-space data. Since our scheme does not need pattern matching, it requires less computation than conventional interpolation schemes; therefore, it is very fast and suitable for hardware implementation. Experimental results show that the proposed scheme interpolates ray-space data with higher PSNR than other interpolation methods, such as nearest-neighbor, linear, and block-matching interpolation.
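A sketch of the per-pixel directional filter selection follows (a simplification: the hypothetical filter set contains only three directions, and the missing row is assumed to lie between two known EPI rows):

```python
import numpy as np

def interpolate_epi_row(epi, y):
    """Fill the missing row y of an up-sampled EPI by choosing, per pixel,
    the directional interpolation filter with the best local fit."""
    h, w = epi.shape
    out = np.zeros(w)
    for x in range(1, w - 1):
        # Candidate directional interpolations: left-slant, vertical, right-slant.
        candidates = {
            "left":  0.5 * (epi[y - 1, x - 1] + epi[y + 1, x + 1]),
            "vert":  0.5 * (epi[y - 1, x]     + epi[y + 1, x]),
            "right": 0.5 * (epi[y - 1, x + 1] + epi[y + 1, x - 1]),
        }
        # Select the direction along which the known samples agree best.
        errors = {
            "left":  abs(epi[y - 1, x - 1] - epi[y + 1, x + 1]),
            "vert":  abs(epi[y - 1, x]     - epi[y + 1, x]),
            "right": abs(epi[y - 1, x + 1] - epi[y + 1, x - 1]),
        }
        out[x] = candidates[min(errors, key=errors.get)]
    return out
```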
Conventional ray-space acquisition systems required very precise mechanisms to control small movements of cameras or objects. Most of them used a camera on a gantry or a turntable. Although these are good for acquiring the ray-space of small objects, they are not suitable for ray-space acquisition of very large structures such as buildings and towers. This paper proposes a new ray-space acquisition system that consists of a camera and a 3D position and orientation sensor. It is not only a compact, easy-to-handle system, but is also, in principle, free from limitations of size or shape. It can obtain any ray-space data as long as the camera is located within the coverage of the 3D sensor. This paper describes our system and its specifications. Experimental results are also presented.
We examined data compression by transform coding of hologram patterns generated on a computer, to realize effective compression of hologram patterns, which carry an extremely large amount of information. Conventional 2-D image compression techniques cannot be applied directly to hologram patterns, since their statistical properties are quite different from those of 2-D images. Furthermore, what is essential in holography is not the hologram pattern itself but the image reproduced from it, so hologram patterns must be compressed with the reproduced image in mind. We found that hologram patterns contain a large component that is unnecessary for reproducing the image and should be removed for effective coding. The unnecessary component can be distinguished clearly from the necessary component in the frequency domain, and we successfully removed it by bandpass filtering the hologram patterns. After this preprocessing, we apply the Karhunen-Loeve Transform (KLT) to the hologram patterns. In the case of high compression, it is better to allocate more bits to the lower-order KLT coefficients than the conventional power-based allocation method assigns. Effective coding of hologram patterns is thereby realized and better images are reproduced.
The Ray-Space method proposed by us is one of the key tools that can be applied to wide areas of 3D image processing and handling. It will contribute to the creation of 3D spatial communication and virtual societies. The Ray-Space is defined as the intensity function F(P), where P represents the 4D ray parameters. In this paper, we present a novel Ray-Space coding scheme using an arbitrary-shaped DCT. The scheme consists of two steps: the Ray-Space data is first segmented into quadrilateral-shaped primitives, and then the texture data in each primitive is coded by the arbitrary-shaped DCT. In the experiments, the coding performance of the proposed scheme was examined. The results show that the proposed scheme achieves better coding performance than a block-based Ray-Space coder.
Fractal image coding based on the Iterated Function System has been attracting much interest because of the possibility of drastic data compression. It performs compression by using the self-similarity contained in an image. In conventional schemes, under the assumption of self-similarity in the image, each range block is mapped from the larger block considered most suitable to approximate it. However, even if exact self-similarity is found in the image at the encoder, it hardly holds at the decoder, because the domain pool of the encoder differs from that of the decoder. In this paper, we propose a fractal image coding scheme in which the domain pools are replaced with decoded or transformed values to reduce the difference between the domain pools of the encoder and the decoder. The proposed scheme performs two-stage encoding: the domain pool is replaced first with decoded non-contractive blocks and then with transformed values for contractive blocks. The proposed scheme is expected to reduce the errors of contractive blocks in the reconstructed image while keeping those of non-contractive blocks unchanged. The experimental results show the effectiveness of the proposed scheme.
This paper presents a novel 3-D image coding scheme based on the 'Ray Space' representation of 3-D spatial information. First, we give the definition of Ray Space and show that the Ray Space representation can be a common data format for integrated 3-D visual communication. Then, we introduce a vector field in the Ray Space and propose a novel compression scheme in which the Ray Space data is compressed into the vector data at the divergent points. Finally, experimental results on the reconstruction of Ray Space data are presented.
This paper proposes a new scheme to reduce the quantization noise in DCT-coded images by estimating optimal quantized values. In the proposed scheme, the quantized value of the DCT coefficients is shifted from the middle of the quantization step to the mean value of the amplitude distribution of the DCT coefficients within the quantization step. The values of the shift that minimize the quantization noise power are obtained experimentally. A simple scheme approximating these values is derived so that the modification factors of the quantized values can be estimated at the decoder. About 0.5 dB improvement in SNR is achieved experimentally by the proposed scheme. By further smoothing the boundaries of the DCT blocks in the decoded images, a 1.0 dB improvement in SNR is achieved and visually better images are obtained.
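The shift can be illustrated analytically: for a coefficient with a Laplacian-like amplitude distribution, the conditional mean within a quantization bin lies below the bin center. The sketch below uses an assumed exponential model and illustrative parameters, not the paper's experimental values:

```python
import numpy as np

# For a DCT coefficient with density p(x) ~ exp(-lam * x) on a positive
# bin [a, b), the best reconstruction is the conditional mean
# E[x | a <= x < b], which lies below the midpoint (a + b) / 2.
lam, a, b = 0.05, 16.0, 32.0  # illustrative values

xs = np.linspace(a, b, 100001)
pdf = np.exp(-lam * xs)
centroid = np.trapz(xs * pdf, xs) / np.trapz(pdf, xs)
print(f"bin midpoint: {(a + b) / 2:.2f}, conditional mean: {centroid:.2f}")
# Shifting the dequantized value from the midpoint toward the conditional
# mean reduces the quantization noise power.
```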
To transmit facsimile images through a very low bit-rate channel such as the half-rate mobile channel, a very efficient coding scheme for data compression is required. Lossy coding is expected to achieve more data reduction than conventional lossless coding schemes. This paper discusses approximate representation of scanned character patterns for data reduction. First, the quality of character patterns is considered in terms of the size of the patterns. Based on this consideration, the attributes of scanned character patterns and the quality associated with them are defined. To preserve quality under approximation, a character pattern is described by a set of strokes in a tree data structure.
Recently, new types of man-machine interfaces that lighten the burden on users have been studied vigorously. One important example is a system that lets users give instructions in a non-contact way. In such systems, users give instructions to a virtual environment by gestures captured through stereo cameras. However, previously proposed systems place several restrictions on the environment in which they are used. Here we note two of them, the arrangement of the stereo cameras and the background of the user, both of which are very important in applications. We therefore propose a new system for giving instructions to a virtual environment in a non-contact way through stereo cameras while reducing these two restrictions. First, we outline the proposed system, and then we describe a new algorithm for estimating the 3D motion of the user's hand and the position of the head. Experiments on estimating the user's hand motion are conducted and the results are shown.
The optimal analysis/synthesis filters giving the maximum coding gain are derived for subband schemes. The optimal analysis filters consist of an emphasis of the picture signal and ideal band splitting, where the emphasis characteristic is determined by the spectrum of the picture signal. A large improvement in coding gain is achieved by the subband scheme with the optimal subband filters obtained here. An approximate emphasis characteristic determined from a spectral model of picture signals can be used, and the ideal band-splitting filters can be replaced by conventional subband filters, since the degradation of coding gain due to these approximations is small. Computer simulation of super HD image coding with the proposed scheme is performed. The SNR of the reconstructed image is increased and edges are reconstructed very well compared to the conventional subband scheme. The proposed scheme is well suited to super HD image coding, since the improvement in SNR is large for images with high correlation between neighboring pixels.
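For orientation, the coding gain maximized here is the standard subband gain over PCM; for M equally subsampled bands with subband variances σ_k², it takes the textbook form (stated under the usual high-rate, optimal bit-allocation assumptions):

```latex
G_{SB} = \frac{\frac{1}{M}\sum_{k=1}^{M}\sigma_k^{2}}
              {\left(\prod_{k=1}^{M}\sigma_k^{2}\right)^{1/M}}
```

The gain grows as the emphasis and band splitting make the subband variances more unequal, which is why it is large for highly correlated images.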
The subband scheme is one of the most promising schemes for super HDTV coding. In this paper, we propose two types of multidimensional multichannel subband schemes. Using the concept of complementary subsampling, the properties of the analysis and synthesis filters are analyzed. The coding gain of the proposed scheme is calculated using a picture model, and the proposed scheme shows a higher coding gain than the conventional scheme.