Light-field displays provide three-dimensional (3-D) perception to the observer by reproducing a light field, i.e., a set of light rays emitted from arbitrary points on a plane in arbitrary directions. These displays have attracted research attention,1–5 because they give a natural 3-D sensation by representing not only binocular parallax but also motion parallax. Technically, a light field consists of a dense collection of multiview images. Therefore, light-field displays have to display many views (typically dozens) at the same time. To develop such a display, researchers have devised several approaches, such as using parallax barriers1,6–8 and lenslet arrays.2,3,9–11 These approaches require a high-resolution display panel capable of displaying all pixels of all views simultaneously.
In contrast, we have focused on a newly emerging approach using a layered structure,12,13 i.e., a few light-attenuating panels (e.g., LCD panels) stacked in front of a backlight. As proposed by Wetzstein et al.,13 many views can be reproduced with reasonable quality using only a few layers having the same resolution as the displayed image for each view. Therefore, this type of display is called a “compressive display.” This remarkable technique is achieved by applying a light-field factorization, in which the target light field is factorized into a few transmittance patterns for different layers. This layered structure can also be applied to light-field projections14,15 and head-mounted displays.16,17
Although layered light-field displays achieve superior performance in terms of the efficiency of light-field representation, they are limited in resolution: an object with larger pop-out is reproduced with stronger blurring due to the upper bound of the spatial frequency for the display. In this paper, to solve this problem, we propose a structure composed of higher resolution (i.e., with finer pixels) binary layers and a lower resolution, multibit backlight. The purpose of this structure is to increase the upper bound of the spatial frequency and simultaneously to decrease the total number of bits for the display. Increasing the upper bound of the spatial frequency expands the range of depth over which objects are reproduced clearly and thus reduces the blurring of objects with large pop-out. Simply increasing the layer resolution, however, degrades the efficiency of the light-field representation. Hence, we limit the layers’ transmittance to binary (on/off) levels, which reduces the total number of bits for the display, i.e., improves the efficiency of light-field representation. In other words, we argue that we can enjoy the benefit of improved spatial frequency while maintaining the efficiency advantage of the layered display by simultaneously increasing the resolution and reducing the bit depth per pixel of the layers. Although increasing the layer resolution has already been proposed by Hirsch et al.,18 we believe that this paper is the first to also consider the layers’ bit depth and the display efficiency in relation to the total bits. Our research suggests what type of display panel is required for light-field displays: high-resolution binary layers introduce an interesting trade-off between the total bits of the display and the quality of light-field reproduction.
Moreover, we use a low-resolution backlight, whose pixels can take multibit values, to compensate for the number of intensity levels, which would be quite limited with only binary layers. [We are not the first to combine a special backlight with layers for a light-field display. Wetzstein et al.13 used a directional backlight (i.e., a low-resolution light-field emitter) composed of a lenslet array and a high-resolution display panel to expand the display’s field of view. In contrast, we use a simple low-resolution (nondirectional) backlight to compensate for the limited number of intensity levels while maintaining the efficiency of the light-field representation.] With the layers’ bit depth reduced to 1 (binary), the number of intensity levels that can be expressed by the layers is insufficient, causing artifacts around gradation regions of the displayed images. The low-resolution backlight thus efficiently increases the number of intensity levels of the display with only a slight increase in the total number of bits. In this configuration, a light field is effectively factorized into the high-resolution binary layers and the low-resolution, multibit backlight.
Overall, the contributions of this paper are mainly twofold:
1. For a layered light-field display, we show that increasing the layer resolution raises the upper bound of the spatial frequency, thus expanding the range of depth over which objects are reproduced clearly and decreasing the blurriness of objects with large pop-out.
2. We propose a structure for a layered light-field display composed of higher resolution binary layers and a lower resolution, multibit backlight, which improves both the upper bound of the spatial frequency and the display’s efficiency in terms of total bits.
This paper is an extension of our previous conference paper,19 where the structure for a light-field display using high-resolution and low-bit-depth layers was analyzed. We have added the idea of using a low-resolution multibit backlight to further improve the quality of displayed light fields.
Principles of Layered Light-Field Display
Figure 1 shows the structure and configuration of a layered light-field display, in which two light-attenuating layers, such as LCD panels, are stacked in front of a backlight. First, we describe a baseline structure consisting of layers with the same resolution as the displayed image for each view, as shown in Fig. 1(a). In this case, an outgoing light ray, a collection of which constitutes the light field produced by the display, can be described by
This optimization is formulated as nonnegative tensor factorization. Specifically, iteration is applied to alternately update the front and rear layers according to a multiplicative update rule. During updating, thresholding is also applied so that the transmittance values of the layers are limited within [0,1].
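The alternating multiplicative update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a simplified 1-D light field, a uniform backlight, integer per-view pixel shifts with wrap-around at the borders, and hypothetical function names (`reproduce`, `update_layer`, `factorize`).

```python
import numpy as np

def reproduce(front, rear, shifts_front, shifts_rear):
    """Simulate the light field emitted by two stacked layers (uniform backlight)."""
    views = []
    for sf, sr in zip(shifts_front, shifts_rear):
        # Assumption: ray x of this view crosses front[x + sf] and rear[x + sr],
        # with wrap-around at the borders for simplicity.
        views.append(np.roll(front, -sf) * np.roll(rear, -sr))
    return np.stack(views)

def update_layer(target, layer, other, shifts_layer, shifts_other, eps=1e-9):
    """One multiplicative update of `layer` while `other` is held fixed."""
    num = np.zeros_like(layer)
    den = np.zeros_like(layer)
    for view, sl, so in zip(target, shifts_layer, shifts_other):
        other_on_ray = np.roll(other, -so)
        estimate = np.roll(layer, -sl) * other_on_ray
        # Accumulate in ray coordinates, then shift back to layer coordinates.
        num += np.roll(view * other_on_ray, sl)
        den += np.roll(estimate * other_on_ray, sl)
    # Thresholding keeps the transmittance values within [0, 1].
    return np.clip(layer * num / (den + eps), 0.0, 1.0)

def factorize(target, shifts_front, shifts_rear, iterations=50, seed=0):
    """Alternately update the front and rear layers from a random start."""
    rng = np.random.default_rng(seed)
    n = target.shape[1]
    front = rng.uniform(0.2, 1.0, n)
    rear = rng.uniform(0.2, 1.0, n)
    for _ in range(iterations):
        front = update_layer(target, front, rear, shifts_front, shifts_rear)
        rear = update_layer(target, rear, front, shifts_rear, shifts_front)
    return front, rear
```

In this toy setting each ray crosses exactly one pixel per layer, so every update is a clipped least-squares solution for that layer, and the reproduction error cannot increase from one update to the next.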
As proposed by Wetzstein et al.,13 if layer devices that run faster than human visual perception are available, time-multiplexing over several consecutive frames can also improve the quality of the displayed light field, but this is outside the scope of this paper.
High-resolution layers and low-resolution backlight
Figure 1(b) shows the proposed structure composed of two high-resolution layers and a low-resolution backlight. We assume that the resolutions of each layer and the backlight are times finer and times sparser, respectively, in width and height than the displayed image for each view. Then, we model an outgoing light ray by
Although this optimization is formulated in a different form from the conventional one (Sec. 2.1, baseline structure), we can solve it similarly. Specifically, the two layer patterns and the backlight pattern are updated alternately according to the multiplicative update rule.
Reducing layer bit depth
The light-field factorization proposed by Wetzstein et al.13 assumes that the transmittance values of the layers are continuous. In contrast, we propose to reduce the bit depth per pixel to improve the display’s efficiency in terms of the total number of bits. When the transmittance values are not continuous but discrete, the optimization for the layers and backlight becomes a combinatorial optimization, which is NP-hard. Therefore, we use the following empirical method. As described above, the transmittance patterns of the layers are updated iteratively. In this iteration, we gradually tighten the threshold so that the pixels of the layers become discrete values. Figure 2 shows the case of a bit depth of two as an example. We tighten the threshold only for the layers because the backlight can take continuous values.
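One possible reading of the gradual threshold tightening is sketched below. The exact rule is given only in Fig. 2, so this is an assumption rather than the authors' method: each layer value may deviate from its nearest quantization level by at most a tolerance, and the tolerance shrinks to zero over the iterations. The names `tighten_toward_levels` and `tolerance_schedule` and the linear schedule are hypothetical.

```python
import numpy as np

def tighten_toward_levels(values, bit_depth, tolerance):
    """Limit each value to within `tolerance` of its nearest quantization level."""
    intervals = 2 ** bit_depth - 1  # e.g., 3 intervals (4 levels) for 2 bits
    nearest = np.round(values * intervals) / intervals
    return nearest + np.clip(values - nearest, -tolerance, tolerance)

def tolerance_schedule(bit_depth, iterations):
    """Assumed linear schedule: from half a quantization step (no effect) to zero."""
    half_step = 0.5 / (2 ** bit_depth - 1)
    return np.linspace(half_step, 0.0, iterations)
```

In a factorization loop, `tighten_toward_levels` would be applied to the layers (not to the backlight) after each update, using the tolerance for the current iteration; at the last iteration the tolerance is zero, so every layer pixel lies exactly on a discrete level.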
Upper Bound of Spatial Frequency
Following previous studies,13,20 we analyze the upper bound of the spatial frequency of a display consisting of two layers, as shown in Fig. 3. The horizontal axis represents the depth, with the two layers located at 0 and 1, and the vertical axis represents the spatial frequency. The layer resolution is specified by a scale factor, i.e., the number of times the layer resolution is finer than that of the displayed images in width and height. To clearly display an object at a certain depth, the spatial frequency’s upper bound at that depth should be greater than the Nyquist frequency of the displayed images. The blue line in the graph indicates the case of a scale factor of 1, for which the resolution of the layers is the same as that of the displayed images. The upper bound takes its maximum midway between the two layers and decreases with distance from that point.
Here, we are interested in the range of depth over which the upper bound of the spatial frequency is not below the Nyquist frequency of the images to be displayed (we call this the “effective range”), because we assume that the displayed images are band-limited below the Nyquist frequency. In the case of a scale factor of 1, the “effective range” is from 0 to 1. In other words, an object can be displayed clearly only within the two layers. When the layers have a higher resolution, however, the upper bound is multiplied by the scale factor accordingly. As a result, the “effective range” expands, and the object can be displayed clearly even outside the two layers.
Resolution and Bit Depth for Layers
First, we evaluated the effects of the layers’ resolution (indicated by the scale factor ) and bit depth on the reproduced light field. We calculated the transmittance patterns of the layers from a target light field, simulated the light field reproduced by the display, and evaluated the reproduction quality by measuring the peak signal-to-noise ratio (PSNR) for the target light field. A “Lego Truck” dataset21 was used as a target light field after converting it to grayscale and resizing it to . This dataset has views, of which views were used for these experiments. In this experiment, a uniform backlight was used. Figure 4 shows the results. The PSNR values in this figure were calculated from the mean square errors over all pixels () and all views (). With , the display quality greatly deteriorated as the bit depth decreased. With increased, however, reasonable quality could be achieved even with a low bit depth. From this result, we can state that the layers’ bit depth per pixel can be reduced without degrading the display quality if the layer has a higher resolution than the displayed image for each view. We also measured the PSNR values for individual views to evaluate the display quality in terms of view directions, as shown in Fig. 5. With , not all views were correctly reproduced even using eight bits. With , however, we achieved relatively good quality over all views. Directions closer to the center tended to have better quality.
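The PSNR measure described above, computed from the mean square error over all pixels and all views, can be sketched as follows. This is an illustrative helper, not the authors' code: the name `light_field_psnr` is hypothetical, and intensities are assumed to be normalized so that the peak value is 1.

```python
import numpy as np

def light_field_psnr(target, reproduced, peak=1.0):
    """PSNR in dB from the MSE pooled over all views and all pixels."""
    target = np.asarray(target, dtype=np.float64)
    reproduced = np.asarray(reproduced, dtype=np.float64)
    mse = np.mean((target - reproduced) ** 2)
    if mse == 0.0:
        return float("inf")  # identical light fields
    return 10.0 * np.log10(peak ** 2 / mse)
```

Passing a single view instead of the full stack of views gives the per-view PSNR used to compare viewing directions.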
We expect that higher PSNR values might be achieved by further increasing the resolution, but because of the heavy computational cost, we did not test any configurations above . Moreover, note that the quality of the layered display depends not only on the upper bound of the spatial frequency but also on the low-rankness of the target light field.20 We expect that the quality limitation of this compressive display will be further addressed in future work.
Next, to evaluate the efficiency of the light-field representation, we investigated the relation between the compression ratio of the display and the quality of the reproduced light field. The compression ratio of the display is defined as the total bits of the display divided by the total bits of a target light field. (If the target light field is composed of views and each view has and 8 bits/pixel, then the total bits of the target light field are . Then, if the display is composed of only two layers, with each layer having pixels and , then the total bits of the display are . In this case, the compression ratio of the light-field representation is .) We changed the horizontal axis in Fig. 4 to represent the compression ratio of the display and replotted the data as shown in Fig. 6. The baseline layered structure (, 8-bit layers) is plotted as a yellow star. It achieved a high compression ratio, but the upper bound of the spatial frequency was low because ; in other words, the effective range in which the object could be clearly represented was narrow, as described in Sec. 3. With the resolution of each layer set to times that of the displayed images to increase the upper bound of the spatial frequency, the total bits of the display also increased, by a factor of . The total bits, however, could be reduced by decreasing the layers’ bit depth while maintaining quality. Therefore, we conclude that it is possible to achieve a high-quality light-field display while maintaining the high efficiency of light-field representation by simultaneously increasing the resolution and reducing the bit depth of the layers.
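The compression ratio defined above (total bits of the display divided by total bits of the target light field) can be worked through with a small helper. The function name, parameter names, and the example values (25 views of 128 × 128 pixels) are hypothetical and are not the settings used in the experiments; a uniform backlight is assumed to contribute no bits.

```python
def display_compression_ratio(views, height, width, layer_scale, layer_bits,
                              num_layers=2, backlight_scale=None,
                              backlight_bits=8, target_bits=8):
    """Total bits of the display divided by total bits of the target light field."""
    target_total = views * height * width * target_bits
    # Each layer is `layer_scale` times finer than a view in width and height.
    layer_pixels = (layer_scale * height) * (layer_scale * width)
    display_total = num_layers * layer_pixels * layer_bits
    if backlight_scale is not None:
        # The backlight is `backlight_scale` times sparser in width and height.
        backlight_pixels = (height // backlight_scale) * (width // backlight_scale)
        display_total += backlight_pixels * backlight_bits
    return display_total / target_total

# Hypothetical example: 25 views of 128 x 128 pixels at 8 bits/pixel.
baseline = display_compression_ratio(25, 128, 128, layer_scale=1, layer_bits=8)
high_res = display_compression_ratio(25, 128, 128, layer_scale=6, layer_bits=8)
binary = display_compression_ratio(25, 128, 128, layer_scale=6, layer_bits=1)
proposed = display_compression_ratio(25, 128, 128, layer_scale=6, layer_bits=1,
                                     backlight_scale=8)
```

With these example values, six-times-finer 8-bit layers multiply the display bits by 36 relative to the baseline, binary layers bring that factor down to 4.5, and a backlight at one-eighth resolution adds only 2048 bits on top of the 1,179,648 bits of the binary layers.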
The effect of the backlight resolution on the quality of the reproduced light field was also evaluated. The resolution and bit depth of the layers were set to and 1, respectively, and only the resolution of the backlight was varied. Figure 7 shows the results. As expected, as the backlight resolution increased, the display quality improved. Even when the backlight had a much lower resolution () than the displayed image for each view, the display quality was sufficiently higher than the case with the uniform backlight. From this result, we conclude that it is beneficial to use a pixelized backlight instead of a uniform one, even with extremely low resolution.
Subjective Comparison of Different Structures
Finally, we used simulations to subjectively compare different light-field display structures, and Fig. 8 shows the results. The first group of images (a) was obtained with the baseline structure, in which the layers have the same resolution and bit depth as the displayed images. As shown in the green box, a region with large pop-out was reproduced with blurring. The second group (b) shows the results obtained with a structure composed of layers with six times higher resolution in width and height than the displayed images. Because of the higher resolution, the effective range of depth was expanded, and thus, the region with large pop-out could be clearly reproduced. The total bits of the display, however, increased by a factor of 36, and the display’s compression ratio became much worse. The third group (c) shows the results obtained with a structure composed of high-resolution binary layers. Both a high compression ratio and clear display throughout the depth range could be achieved with this structure. As shown in the red box, however, staircase artifacts appeared in gradation regions because the number of intensity levels that could be expressed by the display was limited. Finally, the fourth group (d) shows the results obtained with the combination of high-resolution binary layers and low-resolution backlight proposed in this paper. The backlight resolution was set to one-eighth that of the displayed image in width and height. The entire displayed image was clear, without blurring, and the gradation appeared natural. In addition, because the backlight had a low resolution, the efficiency decline was negligible compared with that of the structure with the uniform backlight. Figure 9 shows the images displayed by the proposed structure to different viewpoints. We confirmed that the parallax can be reproduced depending on the viewpoints.
From these results, we conclude that the combination of the high-resolution binary layers and the low-resolution backlight could efficiently reproduce the target light field with high quality.
For future work, we will develop a prototype of a layered light-field display with our proposed structure to demonstrate that it can clearly reproduce an object with large pop-out while keeping the total bits of the display low. In this paper, as a preliminary step, we constructed a display model composed of acrylic transparent sheets on which layer patterns and a low-resolution backlight pattern were printed, as shown in Fig. 10.
We used the “Lego Truck” dataset21 to generate the transmittance patterns of the layers and the luminance of the backlight for the baseline and proposed structures. The results photographed 1 m away from the display models and notable close-ups are shown in the top and bottom rows, respectively, of Fig. 11. (For all experiments reported in this paper, we assumed that the images composing the target light field were orthographic, i.e., strictly speaking, that the target light field could be observed if the display was viewed from infinity. Because the reproduced light field was sufficiently dense in the angular direction, however, the images did not deteriorate even when the display was observed from a finite distance. Specifically, we confirmed that we could observe good images both at 0.5 m and farther from the display.) Figure 11(a) shows the results for the baseline structure, in which the layer resolution was , the same as that of the displayed image for each view. To see the difference in pixel size between the two structures, we printed these layer patterns onto an -in. transparent sheet after they were resized to a resolution of by nearest neighbor interpolation. Meanwhile, Fig. 11(b) shows the results for the proposed structure composed of high-resolution () binary layers and a low-resolution () backlight. The resolutions of the layers and backlight were and , respectively. The layer patterns were printed directly onto the same size sheet (.), and the backlight pattern was printed after resizing to . The printed sheets were stacked on a uniform backlight with a 5-mm acrylic plate in between. For both structures, the viewing angle of the display was in both the horizontal and vertical directions. This was calculated from the pixel pitch of the layers, the distance between two layers, and the number of views for the target light field. The results demonstrated that the proposed structure (b) could represent the target light field with higher quality.
In particular, the blurriness of parts with large pop-out was reduced by the proposed structure as compared with the baseline structure.
We have proposed a structure for a layered light-field display with high-resolution binary layers and a low-resolution, multibit backlight. The increased layer resolution raises the upper bound of the spatial frequency, while limiting the layers’ transmittance to binary (on/off) values reduces the total number of bits for the display. The low-resolution backlight compensates for the number of intensity levels, which would be quite limited with only binary layers. We experimentally validated that the proposed structure can reproduce a light field with high quality and high efficiency.
For future work, we may consider colorization. The simplest structure for colorization would consist of color filter arrays inserted in all layers of the display. As another approach, Hirsch et al.18 proposed a layered structure in which only one layer has a color filter. We could also adopt field-sequential colorization,22 in which RGB channels are alternately displayed over time. In addition, to further validate the effectiveness of the proposed structure, we will develop a display prototype.
Yuto Kobayashi received his BE degree in electrical engineering from Nagoya University, Japan, in 2016. Currently, he is a graduate student in electrical engineering and computer science at Nagoya University. He is working on light field acquisition and rendering for 3-D displays.
Keita Takahashi received his BE, MS, and PhD degrees in information and communication engineering from the University of Tokyo, Tokyo, Japan, in 2001, 2003, and 2006. He was a project assistant professor at the University of Tokyo from 2006 to 2011 and was an assistant professor at the University of Electro-Communications from 2011 to 2013. Currently, he is an associate professor at the Graduate School of Engineering, Nagoya University, Nagoya, Japan. His research interests include computational photography, image-based rendering, and 3-D displays.
Toshiaki Fujii received his BE, ME, and Dr.E degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1990, 1992, and 1995. In 1995, he joined the Graduate School of Engineering, Nagoya University, where currently he is a professor. From 2008 to 2010, he was with the Graduate School of Science and Engineering, Tokyo Institute of Technology. His current research interests include multidimensional signal processing, multicamera systems, multiview video coding and transmission, free-viewpoint television, and their applications.