We conducted research on highly realistic communication systems using natural three-dimensional (3-D) images that do not require special glasses for viewing. Highly realistic communication systems will enable communication with people in remote locations as if one were actually in their presence. Such systems would allow us to present, appreciate, and learn from valuable cultural assets as if they were actually present, greatly enriching our lives.
In recent years, augmented reality (AR) and virtual reality (VR), which are highly realistic technologies, have spread throughout games, entertainment, and other applications, and new sensors, displays, and other equipment for general consumer use are now being commercialized. At the same time, studies are also underway on next-generation highly realistic 3-D images that do not require special glasses1–6 and that reproduce multiple parallax images and groups of light rays in an attempt to achieve 3-D image displays for practical use. In addition to the technology for capturing and displaying light fields, comprehensive efforts are now underway to proceed with the standardization of image formats, compression coding, and evaluation methods.
Natural 3-D image reconstruction technology without special glasses is undergoing steady progress. An integral 3-D system based on integral photography7 is classified as a 3-D system of the spatial image reconstruction type. An integral 3-D system that features horizontal as well as vertical parallax is being developed for future TV broadcasting.8 An integral 3-D system requires many pixels to achieve high image quality. Therefore, ultrahigh-definition (UHD) video with 8 K resolution is used to capture and display integral 3-D images consisting of .9 Furthermore, multiple display devices are used to enhance the resolution of integral 3-D images.10,11 However, to achieve high presence communication using integral 3-D images, we must enhance the resolution and size of 3-D images that allow multiple viewers to simultaneously observe life-size 3-D images of large objects, such as human beings or cars.
Lee et al.12 studied an optimal configuration design of a simple multiprojection 3-D display using only a vertical diffuser film for the screen. However, uneven brightness in 3-D images is likely to occur, which is reduced by suitable arrangements of the projectors. Efrat et al.13 developed a 3-D display for the viewing environment similar to that at a movie theater. Here, a 3-D image can be displayed to observers at their respective seat positions of different viewing distances using slanted barriers in front of a flat panel display. However, the viewing area is limited to the width of the seat, and it is difficult to display 3-D images with a large viewing angle.
In this study, to realize a highly realistic communication system based on multiview 3-D video, we proposed a projection-type 3-D display method that can display glasses-free 3-D images on a large screen using multiple projectors. We have designed and fabricated a 200-in. 3-D display system using HD projector units to enlarge the viewing area and to enable many observers to simultaneously see 3-D images from various angles. The resolution and color reproducibility of the displayed 3-D images are of HD image quality.
We have also developed a super multiview 3-D capture system with compact HD camera units that include real-time video signal correction circuits for camera calibration and image rectification. First, we analyzed the optimal camera arrangements for multiprojection-type 3-D displays. Second, based on experiments, we developed a super multiview camera system using HD cameras to capture moving objects. The captured images were registered by high-accuracy camera adjustment and geometrically compensated by image processing for the multiprojection 3-D display. Experimental results showed that we successfully captured and displayed 3-D videos of moving objects using our system.
System Design of Multiview Three-Dimensional Video
Basic System Configuration
Figure 1 shows the basic configuration of our multiview 3-D video capture and display system. A projector array is used as an image display device and consists of several small projectors called “projector units” arranged horizontally and vertically. From each projector unit, images with horizontal parallax are projected and superimposed on the plane for image display that combines an optical film that has anisotropic diffusion characteristics with a condenser lens. The diffuser film is a rear screen that has a small diffusion angle in the horizontal direction relative to the incident light, and a wide diffusion angle in the vertical direction. These diffusion characteristics enable the system to produce different images at various horizontal angles, thereby allowing the observer to see parallax images based on the observer’s position. Conversely, in the vertical direction, the incident light becomes widely diffused, thereby eliminating the effects of the projection angle in that direction. One can therefore arrange the projector units vertically, so that increasing the number of units increases the image density, which is defined as the horizontal number of parallax images per unit viewing angle. Increasing the parallax image density can be expected to produce a high-quality 3-D display at high resolution with smooth motion parallax over a wide viewing zone.
A 3-D camera system consists of numerous camera units arranged in a single horizontal row, unlike the projector units, so as not to cause vertical parallax. It is difficult to reduce the camera pitch to several centimeters or less because of the limitations imposed by the size of the cameras. Therefore, we generate interpolated images to increase the number of parallax images. Image quality correction through high-accuracy camera calibration and geometrical compensation is a common problem in the multiview 3-D camera method. In this work, we study suitable 3-D camera conditions for a 3-D display system using multiple projectors and an optical screen consisting of an anisotropic diffusion film and a Fresnel lens.
Optimal Arrangement of Projector Array and Camera Array
First, we study the camera and display system arrangement to reconstruct the correct viewing angle and visual field. Figure 2 shows the viewing angle and visual angle in our multiview 3-D display system. The optical axes of images projected from the projector units converge at one point in the center of the display screen as shown by the dashed lines in the figure. All projector units have a projection lens that moves laterally to shift the optical axes of the light sources. In the figure, the viewing angle is expressed as
The width of viewing area is expressed as
For example, in the case of , , and , we require a minimum of 30 parallax images.
On the other hand, the visual angle in Fig. 2 is expressed as
Next, we consider the capture angle in the 3-D camera system. In Fig. 3, the dashed lines show the optical axes of each camera that converge at point in plane . The convergence angle is thus expressed as
For the condition of when capture angle is equal to visual angle , we obtain
From Eq. (6) to reconstruct the correct viewing angle in the display system, i.e., (ratio of the width of the projector array and one of the camera arrays) must be equal to (ratio of the projector distance and camera-to-object distance). From Eq. (7), to reconstruct the correct visual angle in the display system, i.e., (ratio of the size of the projected image and one of the capture images) must be equal to (ratio of the viewing distance and one of the capture distances).
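The two matching conditions stated above in words can be checked numerically. The following is a minimal sketch, assuming that the viewing (or capture) angle is the full angle subtended by an array of a given width seen from a point on its perpendicular bisector; the function names and the numerical values in the example are hypothetical and are chosen only for illustration.

```python
import math


def full_angle_deg(width_m, distance_m):
    """Full angle (deg) subtended by a segment of the given width,
    seen from a point on its perpendicular bisector at the given distance."""
    return math.degrees(2.0 * math.atan(width_m / (2.0 * distance_m)))


def scale_by_distance_ratio(display_side_width_m, display_side_distance_m,
                            capture_side_distance_m):
    """Width on the capture side that keeps width/distance equal on both sides.

    This applies the ratio condition described in the text: the ratio of the
    two widths must equal the ratio of the two distances.
    """
    return display_side_width_m * capture_side_distance_m / display_side_distance_m


if __name__ == "__main__":
    # Hypothetical example values (not taken from the paper's equations).
    projector_array_width = 5.6   # m
    projection_distance = 8.0     # m
    camera_distance = 5.517       # m

    camera_array_width = scale_by_distance_ratio(
        projector_array_width, projection_distance, camera_distance)
    print("camera array width [m]:", round(camera_array_width, 3))
    print("display angle [deg]:",
          round(full_angle_deg(projector_array_width, projection_distance), 2))
    print("capture angle [deg]:",
          round(full_angle_deg(camera_array_width, camera_distance), 2))
```

When the width ratio equals the distance ratio, the two printed angles coincide, which is the sense in which the captured viewing angle is reconstructed correctly on the display side. The same helper can be applied to the image-size and viewing-distance ratio for the visual angle.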
Resolution Characteristics of Three-Dimensional Video System
Camera units are arranged so that the optical axes converge at one point , as shown in Fig. 4. A grating pattern is placed at a distance from the convergence plane. With one cycle of the grating being , the spatial frequency of the grating from each camera is [cycle/rad]. The spatial frequency [cycle/rad] of the grating viewed from point on the convergence plane is then expressed as
When the camera units are spaced at a pitch , the angular pitch of sampling is rad, and the Nyquist frequency [cycle/rad] of the sampling is . To prevent aliasing with this sampling, the following condition must be met:
From Eqs. (8) and (9), the condition to prevent aliasing is expressed as
The resolution is also limited by the camera’s maximum frequency that depends on the pixel pitch of the capture device.
Next, the sampling effect of the display system, as well as the camera system, is examined. Projector units are arranged to converge and the optical axes converge at the center of the display screen, as shown in Fig. 4. A grating pattern is displayed at a distance from the display screen. The spatial frequency of the grating viewed from the observer’s position is [cycle/rad]. The spatial frequency of the grating viewed from the display screen is expressed as follows:
When the projector units are spaced at a pitch , the angular pitch of sampling is rad, and the Nyquist frequency of the sampling is . To prevent aliasing with this sampling, the following condition must be met:
From Eqs. (11) and (12), the condition to prevent aliasing is expressed as
The resolution is also limited by the display’s maximum frequency , which depends on the pixel pitch of the display device.
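The sampling argument for both the camera array and the projector array can be illustrated with a short numerical sketch. It assumes the small-angle approximation, in which units spaced at a pitch and seen from a distance sample the angular direction at a pitch of approximately (pitch / distance) rad, and it uses the standard result that the Nyquist frequency is one-half of the sampling frequency. The function names are hypothetical, and the non-strict inequality in the aliasing check is an assumption.

```python
def angular_sampling_pitch_rad(unit_pitch_m, distance_m):
    """Angular sampling pitch in rad (small-angle approximation)."""
    return unit_pitch_m / distance_m


def nyquist_cycles_per_rad(unit_pitch_m, distance_m):
    """Nyquist frequency in cycle/rad: half of the angular sampling frequency."""
    return 1.0 / (2.0 * angular_sampling_pitch_rad(unit_pitch_m, distance_m))


def is_alias_free(spatial_frequency_cycles_per_rad, unit_pitch_m, distance_m):
    """True if the given angular spatial frequency does not exceed the Nyquist limit."""
    return spatial_frequency_cycles_per_rad <= nyquist_cycles_per_rad(
        unit_pitch_m, distance_m)


if __name__ == "__main__":
    # Projector pitch and projection distance reported later in the paper.
    projector_pitch = 0.033      # m
    projection_distance = 8.0    # m
    limit = nyquist_cycles_per_rad(projector_pitch, projection_distance)
    print("Nyquist limit [cycle/rad]:", round(limit, 1))
    print("100 cycle/rad alias free:",
          is_alias_free(100.0, projector_pitch, projection_distance))
```

A finer unit pitch or a longer distance raises the Nyquist limit, which is why dense arrays are needed to display objects located far from the screen plane without aliasing.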
Figure 5 shows the relationships between the depth position of the object and the capture spatial frequency and depth position of the reconstructed image and the display spatial frequency. From Fig. 5, we examine the required camera conditions.
The maximum spatial frequency of the camera system must be higher than that of the display system
The depth positions of , , , and shown in Fig. 5 must meet the following conditions:
From Eqs. (15) and (16), we obtain the following conditions that must be met:
Characteristics of Three-Dimensional Display Using Projector Array
Gap of Parallax Images
Figure 6(a) shows an overhead view of the basic arrangement of the 3-D system using projector units, where the number of projector units is almost the same as the number of horizontal parallax images. Thus, the horizontal pitch between the projector units is set to a value that matches the number of parallax images per horizontal unit angle. The viewing angle of the 3-D image displayed at the center of the screen in Eq. (1) will become
Equations (19) to (21) make it possible to determine the number of projector units , arrangement pitch , distance between the projector array and display screen, and other parameters based on the viewing angle for the 3-D image, viewing area , and other factors that are required for the 3-D image. These parameters are used as guidelines during system design.
The arrangement of the projector array is limited by the size of each unit, thereby making it difficult to arrange the projectors densely in the horizontal direction alone. For this reason, the projector units are arranged not only horizontally but also vertically, in a 2-D array, as shown in Fig. 6(b).
The gap between displayed parallax images is expressed as
The ideal gap of parallax images is less than the size of a pupil, but it is difficult to create a practical 3-D system based on this requirement. The gap of parallax images in our previous system was 29.4 mm, which could be used to display 3-D images with natural motion parallax.16 We designed the system parameters in the proposed 200-in. 3-D display system to give a gap of 22.8 mm, which is approximately one-third of the interpupillary distance, to enhance smooth motion parallax in 3-D images. The number of required projectors was estimated from the gap of the parallax images, the optimal viewing distance, and the required viewing angle.
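As a rough numerical check of the design value, the gap can be estimated from figures reported elsewhere in this paper (a projector pitch of 33 mm, a projection distance of 8 m, and a viewing distance of 5.517 m). The sketch below assumes that the gap scales as the projector pitch multiplied by the ratio of the viewing distance to the projection distance; this relation and the interpupillary distance of 65 mm are assumptions made for illustration, and the function name is hypothetical.

```python
def parallax_gap_mm(projector_pitch_mm, projection_distance_m, viewing_distance_m):
    """Estimated gap between adjacent parallax images at the viewing distance.

    Assumes the gap is the projector pitch scaled by
    (viewing distance / projection distance).
    """
    return projector_pitch_mm * viewing_distance_m / projection_distance_m


if __name__ == "__main__":
    gap = parallax_gap_mm(33.0, 8.0, 5.517)
    interpupillary_distance_mm = 65.0  # assumed typical adult value
    print("gap [mm]:", round(gap, 2))
    print("gap / interpupillary distance:",
          round(gap / interpupillary_distance_mm, 2))
```

Under these assumptions the estimate is about 22.76 mm, which rounds to the 22.8-mm design value and is roughly one-third of the assumed interpupillary distance.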
Uniformity of Three-Dimensional Image’s Brightness
We experienced problems related to the appearance of stripe noise, i.e., areas of nonuniform brightness and color in 3-D images, as well as reduced 3-D image resolution and unnatural images owing to observer movements, which made it difficult to increase the screen size using conventional methods. Stripe noise is caused by the angular pitch of the parallax images, the nonuniformity of brightness and color in the projected images, and the diffusion angle of the diffuser film.
As shown in Fig. 7(a), the incident light from a projector unit is emitted at a horizontal angle relative to the optical axis of the condenser lens. Let the horizontal diffusion be a Gaussian distribution; the optical distribution of the output beams from each projector unit is then a Gaussian function of the output angle, and the synthesized distribution obtained by summing the output beams of all projector units is shown in Fig. 7(b).
The flatness and crosstalk in the brightness distribution between parallax images will be affected by the screen’s horizontal diffusion characteristics in a 3-D image synthesized from parallax images. A small diffusion angle causes irregular brightness between parallax images in terms of the brightness value of the synthesized light shown in Fig. 7(b), resulting in uneven brightness in the image. In contrast, increasing the diffusion angle increases the proportion of the crosstalk area in Fig. 7(b).
For example, let the number of parallax images be 50. First, we evaluate the degree of synthesized light brightness modulation using [%] (where, as shown in Fig. 7(b), denotes the maximum brightness and the minimum brightness), before estimating the crosstalk rate as the proportion of the crosstalk relative to the entire output beam. The degree of modulation [%] of uneven brightness relative to the diffusion angle and crosstalk rate [%] is shown in Fig. 8.
When the horizontal diffusion angle is small, the degree of modulation () of the synthesized beam will be greater, thereby resulting in more uneven brightness. In contrast, when is high, the crosstalk () of the output beam will be higher between parallax images. Based on Fig. 8, we can balance the uneven brightness and crosstalk, where the appropriate diffusion angle can be projected in a range from 0.25 to 0.3. This corresponds to a half-value angle of 0.5 deg to 0.7 deg in the diffusion angle distribution of the beams. Another method involves selecting the optimal diffusion characteristics from these relationships to improve the screen material and structure, thereby optimizing the diffusion angle distribution characteristics of the beams.
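The trade-off between uneven brightness and crosstalk can be reproduced qualitatively with a small simulation. This is a sketch under stated assumptions rather than the computation used for Fig. 8: each output beam is modeled as a Gaussian with standard deviation `sigma_deg`, the beams are spaced at a hypothetical angular pitch of 0.8 deg, the degree of modulation is taken as (max - min) / (max + min), and the crosstalk rate is taken as the fraction of one beam that falls outside its own angular slot of one pitch. All names and both definitions are assumptions.

```python
import math


def synthesized_brightness(theta_deg, centers_deg, sigma_deg):
    """Sum of Gaussian output beams evaluated at the angle theta_deg."""
    return sum(math.exp(-((theta_deg - c) ** 2) / (2.0 * sigma_deg ** 2))
               for c in centers_deg)


def modulation_percent(pitch_deg, sigma_deg, n_units=50, samples=200):
    """Degree of modulation (%) of the synthesized brightness.

    The brightness is sampled over one pitch interval near the middle of the
    array so that the finite ends of the array do not affect the result.
    """
    centers = [i * pitch_deg for i in range(n_units)]
    start = centers[n_units // 2]
    values = [synthesized_brightness(start + pitch_deg * k / samples,
                                     centers, sigma_deg)
              for k in range(samples + 1)]
    i_max, i_min = max(values), min(values)
    return 100.0 * (i_max - i_min) / (i_max + i_min)


def crosstalk_percent(pitch_deg, sigma_deg):
    """Fraction (%) of a single Gaussian beam lying outside +/- pitch/2."""
    inside = math.erf(pitch_deg / (2.0 * math.sqrt(2.0) * sigma_deg))
    return 100.0 * (1.0 - inside)


if __name__ == "__main__":
    pitch = 0.8  # deg, hypothetical angular pitch between parallax images
    for sigma in (0.20, 0.25, 0.30, 0.40):
        print(sigma,
              round(modulation_percent(pitch, sigma), 1),
              round(crosstalk_percent(pitch, sigma), 1))
```

With these definitions the modulation falls and the crosstalk rises as the diffusion angle increases, so an intermediate diffusion angle balances the two effects, in line with the discussion of Fig. 8.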
In terms of the vertical direction shown in Fig. 9(a), the angle of the beam that enters the diffuser screen varies depending on which projector unit is involved, i.e., upper or lower. Consequently, a difference also arises in the diffusion direction of the beam depending on whether an upper or lower projector unit is involved. When observed from position , as shown in Fig. 9(a), the distribution of the brightness values in the -axis direction of the screen (as viewed by the observer) is as follows:Fig. 9(a), and is the diffusion angle in the vertical direction. For example, if we assume that , , , , and , then the distribution of brightness values in the -axis direction of the display image will be as shown in Fig. 9(b).
As indicated in Eq. (25), the image is shaded depending on the position on the image, the location of the observer, and the location of the projector unit. Note that a comparatively large vertical diffusion angle is needed to obtain an image with a reasonably uniform brightness distribution in the vertical direction.
Fabrication of 200-in. Three-Dimensional Display
We developed a compact projector unit measuring to pack the projector units densely, as shown in Fig. 10. One of the main impediments to image quality is the stripe noise between parallax images, which is influenced by the uniformity of brightness and color balance in the parallax images. To solve these problems, the brightness and color balance between projector units are compensated for using power-controlled LED light sources in the projector units. The uniformity of brightness in each projected image is also compensated for electrically by using a shading compensation circuit in each projector unit. The specifications of the projector unit are summarized in Table 1.
Table 1 Specifications of projector unit.
|Display device||0.7 in.,|
|Number of pixels|
|Projection lens||Focal length: 21.4 to 42.8 mm|
|F-number: 3.2 to 4|
|Light source||RGB LEDs|
|R: 617 nm, G: 525 nm, B: 465 nm|
|Luminous flux||20 lm|
Prototyped Three-Dimensional Display System
The accuracy of the light control on the display screen affects the resolution and natural motion parallax of 3-D images. We selected a suitable diffuser film with a 0.88 deg horizontal diffusion angle and 35 deg vertical diffusion angle and we combined it with an optimally designed 200-in. Fresnel lens with an aspherical surface (Fig. 11). As a result, approximately 200 parallax images could be displayed at a viewing distance of 5.517 m. The stripe noise in the displayed 3-D images, which reduced the image quality in our previous prototype system,16 as shown in Fig. 12(a), was reduced in our newly developed 200-in. display system, as shown in Fig. 12(b).
The specifications and performance of our newly developed 3-D display are listed in Table 2. The on-screen surface resolution of the 3-D images is 1920 pixels horizontally and 1080 pixels vertically. The viewing angle is 40 deg, and the width of the viewing area is 4 m at a viewing distance of 5.517 m. It is possible to display moving images with a frame rate of 60 fps.
Table 2 Specifications and performance of 200-in. 3-D display.
|Size||200 in. (16:9)|
|Fresnel lens||Aspherical surface|
|Projector units||Interval of units||33 mm|
|Number of units||201 units|
|3-D image||Size||200 in. (16:9)|
|Frame rate||60 fps|
|Number of parallax images||170|
|Interval of parallax images||22.8 mm (at viewing distance 5.517 m)|
|Viewing angle||40 deg (horizontal direction)|
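The values reported for the viewing zone can be cross-checked with simple geometry. The sketch below assumes that the viewing area is a segment centered on the screen normal, so that its width is 2 × (viewing distance) × tan(viewing angle / 2), and that the number of parallax images across the viewing area is the width divided by the interval of parallax images. Both relations and the function names are assumptions made for illustration.

```python
import math


def viewing_area_width_m(viewing_angle_deg, viewing_distance_m):
    """Width of the viewing area for a symmetric full viewing angle."""
    return 2.0 * viewing_distance_m * math.tan(math.radians(viewing_angle_deg) / 2.0)


def parallax_images_across(width_m, interval_mm):
    """Number of parallax-image intervals that fit across the viewing area."""
    return width_m / (interval_mm / 1000.0)


if __name__ == "__main__":
    width = viewing_area_width_m(40.0, 5.517)
    print("viewing area width [m]:", round(width, 2))
    print("parallax images across the width:",
          round(parallax_images_across(width, 22.8)))
```

Under these assumptions the width is about 4.02 m, consistent with the reported 4 m, and the estimated count is of the same order as the number of parallax images listed in the table.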
Figure 13 shows a reconstructed 3-D image of a life-size computer graphics (CG) car, and parallax images observed from the left, center, and right views. Motion parallax is present owing to the difference in the positional relationships of the car door and the interior depending on the observer’s location. We can also confirm that the reflections on the door change according to the viewing position just as for a real object as shown in Fig. 14.
Super Multiview Camera for Three-Dimensional Display
Camera Units and Field Test of Multiview Capture
We developed a compact camera unit to capture dense multiview video as shown in Fig. 15. The specifications of the camera unit are listed in Table 3. The width of the camera units is 30 mm to allow for a dense camera array. The output video has an HD resolution of . All camera units can operate with an external synchronization video signal. Each camera has a compact oblique stage to adjust the camera position accuracy as shown in Fig. 15.
Table 3 Specifications of camera unit.
|Camera unit||Number of pixels|
|Frame rate||60 fps|
|Output video signal||3G-SDI|
|Image sensor size||in.|
|Camera lens||Focal length||6 mm|
|Angular field of view|
|Iris range||F1.2 to F16|
Figure 16 shows the field test setup for capturing actual live moving objects using a camera array of 64 camera units at the Koyasan Kongobuji Temple in Japan. Each camera requires only one cable to control camera gain and white balance, and to transmit the captured video signal to an image processor and camera control unit. Moreover, camera unit pairs are connected by one cable in a daisy-chain fashion. Therefore, we can reduce the number of cables between the camera control units and the camera units to half of the number of camera units. To record 3-D sound, we also place small microphone arrays below and above the camera array, as shown in Fig. 16. The camera array system is mounted on a tripod, which enables changing the shooting angle and direction. The number of camera units can be changed according to the distance from the camera to the objects, as well as the size of the objects.
Real-Time Capture and Display System
Figure 17 shows the system configuration of the real-time capture and 3-D video display system. To capture and display 3-D images in real time, we developed a dynamic convergence compensation circuit for each camera control unit. The captured images were registered by high-accuracy camera adjustment and geometrically compensated by image processing for the multiprojection 3-D display.17 The output 3G-serial digital interface (SDI) signals from the camera control units are connected to the signal sources of the 3-D display system. Experimental results are shown in Fig. 18; we successfully captured 3-D images using the developed multiview camera array and displayed 3-D videos of life-size moving objects in real time. In this demonstration, we connected the camera side and display side with multiple metallic video cables. It will be possible to transmit these 3-D videos between distant places using a highly efficient compression method for multiview images.18
Super Multiview Camera System
Next, we designed a real-time capture system using the above-mentioned compact camera units for the dense multiview 3-D display described in Sec. 4. First, we consider the camera pitch and projector pitch required to reconstruct correct 3-D images. From a geometrical analysis using Fig. 19, to conserve the depth-to-distance ratio of the objects so that , we obtain the following relationship:19
In our developed 3-D display, the pitch of the projector array is 33 mm, the projection distance is 8 m, the distance between the camera and objects, , is 5.517 m, and is equal to the viewing distance. As mentioned previously, a camera pitch of 22.8 mm is required. The 22.8-mm value is less than the width of the camera units, so we developed a synthesized optical system consisting of a mirror and half mirror to combine a two-array camera system as shown in Fig. 20(a). A two-array system consisting of two sets of 96 camera units with 45.6-mm intervals [Fig. 20(b)] was combined with the synthesized optical system, and the pitch of the multiview camera became effectively 22.8 mm.
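The numbers in this paragraph can be verified with a short calculation. The sketch below assumes that the required camera pitch is the projector pitch multiplied by the ratio of the camera-to-object distance to the projection distance, and it models the combining optics simply as a lateral offset of one-half of the array pitch between the two camera arrays. The function names and the labels `array_a` and `array_b` are hypothetical.

```python
def required_camera_pitch_mm(projector_pitch_mm, projection_distance_m,
                             camera_distance_m):
    """Camera pitch that keeps pitch/distance equal on the capture and display sides."""
    return projector_pitch_mm * camera_distance_m / projection_distance_m


def interleaved_positions_mm(units_per_array, array_pitch_mm):
    """Effective horizontal viewpoint positions of two optically combined arrays.

    array_b is modeled as array_a shifted sideways by half of the array pitch.
    """
    array_a = [i * array_pitch_mm for i in range(units_per_array)]
    array_b = [x + array_pitch_mm / 2.0 for x in array_a]
    return sorted(array_a + array_b)


if __name__ == "__main__":
    pitch = required_camera_pitch_mm(33.0, 8.0, 5.517)
    print("required camera pitch [mm]:", round(pitch, 1))

    positions = interleaved_positions_mm(96, 45.6)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    print("number of viewpoints:", len(positions))
    print("effective pitch [mm]:", round(gaps[0], 1))
```

Two arrays of 96 units at 45.6-mm intervals therefore yield 192 viewpoints at an effective pitch of 22.8 mm, which is narrower than the 30-mm width of a single camera unit.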
We carried out social demonstration experiments for the highly realistic communication system by introducing the developed multiview 3-D camera and display systems in a new development area north of Osaka station in Japan from April 2013 to November 2015. In the demonstration experiments, we used the developed 200-in. 3-D display system and 3-D contents captured by the camera array.
Figure 21(a) shows the installed 200-in. 3-D display system, called ray emergent imaging (REI),20 and the super multiview camera array system. The height of the screen center is 2 m, to enable many people to observe 3-D images simultaneously, as shown in Fig. 21(b). The super multiview 3-D camera system described in Sec. 5.3 is installed under the screen, so that the appearance of visitors can be captured and displayed on the screen in real-time.
There were up to 2500 visitors per day to the exhibition during the summer vacation periods. We examined the reactions and behavior of visitors when viewing several types of 3-D content, as shown in Fig. 22, and surveyed the visitors about their impressions of the observed 3-D images.21
The results showed that the visitors stopped at and showed the most interest in the content of colorful real objects spread over a wide area, as shown in Fig. 22(a). The visitors also perceived the most motion parallax with the content of Fig. 22(b).22 We believe that the observers experienced a high degree of realism owing to the reproduction of natural glasses-free 3-D images on the large screen in life-size dimensions, as if the objects were really in front of them.
To achieve a highly realistic communication system based on a natural 3-D video, we proposed a projection-type 3-D display method that can display glasses-free 3-D images on a large screen using multiple projectors. We designed and fabricated the 200-in. 3-D display system to allow 40 deg viewing angles. The gap between the parallax images was 22.8 mm, which helped us to produce natural HD 3-D images with smooth motion parallax. We also developed the super multiview 3-D capture system with compact HD camera units that have real-time video signal correction circuits for camera calibration and image rectification.
We performed social demonstration experiments by introducing the developed 3-D video system in a public area. The large 3-D image, which does not require observers to wear special glasses, enables several people to observe large 3-D images and share a 3-D space and environment. By reproducing people, vehicles, and other familiar objects and environments in life-size dimensions, the observer can experience a high degree of realism. Such images are expected to find many applications in the verification of industrial designs, publicity, exhibitions of cultural heritage, works of art, and other areas, in addition to being used for digital cinema.
This research was conducted when the authors were employed by Universal Communication Research Labs in the National Institute of Information and Communications Technology. Part of this research was supported by the “Research and Development of Glasses-Free 3-Dimensional Image Technology, 3-Dimensional Image Support Technology” research program of the Ministry of Internal Affairs and Communications, Japan. The authors would like to extend special thanks to Masahisa Sakai, Yasuyuki Haino, and Masahito Sato of JVC KENWOOD, Inc. for their technical support in developing the 3-D display system. Our gratitude is also extended to Kongobuji Temple and NHK Enterprises, Inc. for their support in the capture field test at Koyasan and to ORIX Real Estate Corporation for their support in the capture test at Kyoto Aquarium. The authors would like to thank Knowledge Capital Association and KMO Corporation for their support of the experimental demonstration in GRAND FRONT OSAKA.
Masahiro Kawakita received his BS and MS degrees in physics from Kyushu University and his PhD in electronic engineering from the University of Tokyo in 1988, 1990, and 2005, respectively. In 1990, he joined NHK (Japan Broadcasting Corporation), Tokyo. Since 1993, he has been at the Science and Technical Research Laboratories of NHK, where he has been researching applications of liquid crystal devices and optically addressed spatial modulators, 3-D TV cameras, and display systems.
Shoichro Iwasawa received his PhD in electrical engineering and electronics from Seikei University, Japan. He is a senior researcher at the National Institute of Information and Communications Technology, Japan, and a member of ACM and IEEE. His fields of interest include image-based social listening, 3-D display, pervasive computing, computer graphics, and computer vision.
Roberto Lopez-Gulliver received his PhD of engineering in information and media science from Kobe University. He worked as a senior researcher at ATR and NICT research labs from 1994 to 2014, and he is currently an associate professor at Ritsumeikan University in Kyoto, Japan. His research interests include interactive autostereoscopic 3-D imaging and displays, virtual reality, and human computer interaction.
Naomi Inoue received his DEng, ME, and BE degrees from Kyoto University in 1998, 1984, and 1982, respectively. From 1987 to 1991, he was a researcher at ATR Interpreting Telephony Research Laboratories. In 1991, he joined KDD R&D Laboratories. He joined NICT Universal Media Research Laboratory in 2006, and became its director in 2010. He promoted research on ultrarealistic communication for 10 years. He rejoined KDDI Research, Inc. in 2016.