Using high-resolution binary layers and a low-resolution multibit backlight for a layered light-field display

Abstract. We propose a structure for a layered light-field display composed of high-resolution binary layers and a low-resolution, multibit backlight. This structure aims to increase the upper bound of the spatial frequency while also reducing the total number of bits for the display. The increased layer resolution raises the upper bound of the spatial frequency, meaning that the display can reproduce an object with a large amount of pop-out more clearly than a conventional layered light-field display can. Meanwhile, limiting the layers’ transmittance to binary (on/off) levels reduces the total number of bits for the display, thus maintaining the high efficiency of light-field representation. The low-resolution backlight, whose pixels can take multibit values, compensates for the small number of intensity levels available with only the binary layers. Through analytical and experimental results, we show that a display based on the proposed structure can reproduce a light field with high quality and high efficiency as a result of combining the high-resolution binary layers and low-resolution backlight.


Introduction
Light-field displays provide three-dimensional (3-D) perception to the observer by reproducing a light field, i.e., a set of light rays emitted from arbitrary points on a plane in arbitrary directions. These displays have attracted research attention [1][2][3][4][5] because they give a natural 3-D sensation by representing not only binocular parallax but also motion parallax. Technically, a light field consists of a dense collection of multiview images. Therefore, light-field displays have to display many views (typically dozens) at the same time.
[10][11] These approaches require a high-resolution display panel capable of displaying all pixels of all views simultaneously.
In contrast, we have focused on a newly emerging approach using a layered structure [12,13], i.e., a few light-attenuating panels (e.g., LCD panels) stacked in front of a backlight. As proposed by Wetzstein et al. [13], many views can be reproduced with reasonable quality using only a few layers having the same resolution as the displayed image for each view. Therefore, this type of display is called a "compressive display." This remarkable technique is achieved by applying a light-field factorization, in which the target light field is factorized into a few transmittance patterns for different layers. This layered structure can also be applied to light-field projection [14,15] and head-mounted displays [16,17].
Although layered light-field displays achieve superior performance in terms of the efficiency of light-field representation, they are limited in resolution: an object with larger pop-out is reproduced with stronger blurring because of the upper bound of the spatial frequency for the display. In this paper, to solve this problem, we propose a structure composed of higher resolution (i.e., with finer pixels) binary layers and a lower resolution, multibit backlight. The purpose of this structure is to increase the upper bound of the spatial frequency and simultaneously to decrease the total number of bits for the display. Increasing the upper bound of the spatial frequency expands the range of depth over which objects are reproduced clearly and thus reduces the blurring of objects with large pop-out. Simply increasing the layer resolution, however, degrades the efficiency of the light-field representation. Hence, we limit the layers' transmittance to binary (on/off) levels, which reduces the total number of bits for the display and thereby improves the efficiency of light-field representation. In other words, we argue that we can enjoy the benefit of improved spatial frequency while maintaining the efficiency advantage of the layered display by simultaneously increasing the resolution and reducing the bit depth per pixel of the layers. Although increasing the layer resolution has already been proposed by Hirsch et al. [18], we believe that this paper is the first to also consider the layers' bit depth and the display efficiency in relation to the total bits. Our research suggests what type of display panel is required for light-field displays: high-resolution binary layers introduce an interesting trade-off between the total bits of the display and the quality of light-field reproduction.
Moreover, we use a low-resolution backlight, whose pixels can take multibit values, to compensate for the small number of intensity levels available with only binary layers. (We are not the first to combine a special backlight with layers for a light-field display. Wetzstein et al. [13] used a directional backlight, i.e., a low-resolution light-field emitter composed of a lenslet array and a high-resolution display panel, to expand the display's field of view. In contrast, we use a simple, nondirectional low-resolution backlight to compensate for the number of intensity levels while maintaining the efficiency of the light-field representation.) With the layers' bit depth reduced to 1 (binary), the number of intensity levels that can be expressed by the layers is insufficient, causing artifacts around gradation regions of the displayed images. The low-resolution backlight thus efficiently increases the number of intensity levels of the display with only a slight increase in the total number of bits. In this configuration, a light field is effectively factorized into the high-resolution binary layers and the low-resolution, multibit backlight.
Overall, the contributions of this paper are mainly twofold:
1. For a layered light-field display, we show that increasing the layer resolution raises the upper bound of the spatial frequency, thus expanding the range of depth over which objects are reproduced clearly and decreasing the blurriness of objects with large pop-out.
2. We propose a structure for a layered light-field display composed of higher resolution binary layers and a lower resolution, multibit backlight, which improves both the upper bound of the spatial frequency and the display's efficiency in terms of total bits.
This paper is an extension of our previous conference paper [19], where the structure for a light-field display using high-resolution and low-bit-depth layers was analyzed. We have added the idea of using a low-resolution multibit backlight to further improve the quality of displayed light fields.
2 Principles of Layered Light-Field Display

Baseline Structure
Figure 1 shows the structure and configuration of a layered light-field display, in which two light-attenuating layers, such as LCD panels, are stacked in front of a backlight. First, we describe a baseline structure consisting of layers with the same resolution as the displayed image for each view, as shown in Fig. 1(a). In this case, an outgoing light ray, a collection of which constitutes the light field produced by the display, can be described by
$$L^{(i)}(s, t, x, y) = B_0 \, a(x, y) \, b(x + s, y + t), \qquad (1)$$
where $a(x, y)$ and $b(x, y)$ denote the transmittances of the rear and front layers, respectively, $B_0$ is a constant representing the luminance of a uniform backlight, and $(s, t)$ is the outgoing direction. To visualize a desired 3-D image on this display, we need to prepare light-field data composed of a set of multiview images. These images correspond to the views to be observed from different directions. More specifically, each image is associated with the target light field by
$$L(i, j, x, y) = I_{i,j}(x, y), \qquad (2)$$
where $I_{i,j}(x, y)$ is the image observed from the viewpoint (or viewing direction) $(i, j)$. Then, the transmittance patterns for the layers, $a(x, y)$ and $b(x, y)$, are optimized to reproduce the given data as accurately as possible, i.e., so that the reproduced light field $L^{(i)}$ of Eq. (1) approximates the target light field $L$ of Eq. (2). This optimization is formulated as nonnegative tensor factorization. Specifically, iteration is applied to alternately update the front and rear layers according to a multiplicative update rule. During updating, thresholding is also applied so that the transmittance values of the layers are limited to [0, 1].
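The baseline model and its alternating updates can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used for the experiments: it assumes a squared-error objective, wrap-around image borders, and directions (s, t) given in whole pixels, and the function names `shift`, `reproduce`, and `factorize` are hypothetical.

```python
import numpy as np

def shift(img, s, t):
    # Sample img at (x + s, y + t); wrap-around borders are an assumption.
    return np.roll(img, shift=(-t, -s), axis=(0, 1))

def reproduce(a, b, dirs, B0=1.0):
    # Eq. (1): one image per direction (s, t), stacked along axis 0.
    return np.stack([B0 * a * shift(b, s, t) for s, t in dirs])

def factorize(target, dirs, iters=30, B0=1.0, eps=1e-12, seed=0):
    """Alternately update rear layer a and front layer b (assumed squared error)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.5, 1.0, target.shape[1:])  # rear layer
    b = rng.uniform(0.5, 1.0, target.shape[1:])  # front layer
    errs = [np.mean((target - reproduce(a, b, dirs, B0)) ** 2)]
    for _ in range(iters):
        # Rear layer: multiplicative update, then clip to [0, 1].
        rep = reproduce(a, b, dirs, B0)
        bs = np.stack([shift(b, s, t) for s, t in dirs])
        a = np.clip(a * (target * bs).sum(0) / ((rep * bs).sum(0) + eps), 0.0, 1.0)
        # Front layer: bring each view back onto the front layer's pixel grid.
        rep = reproduce(a, b, dirs, B0)
        num = sum(shift(target[k] * a, -s, -t) for k, (s, t) in enumerate(dirs))
        den = sum(shift(rep[k] * a, -s, -t) for k, (s, t) in enumerate(dirs))
        b = np.clip(b * num / (den + eps), 0.0, 1.0)
        errs.append(np.mean((target - reproduce(a, b, dirs, B0)) ** 2))
    return a, b, errs

# Small synthetic example: 3 x 3 views of a light field that two layers can represent.
dirs = [(s, t) for t in (-1, 0, 1) for s in (-1, 0, 1)]
rng = np.random.default_rng(1)
a0, b0 = rng.uniform(0.2, 1.0, (8, 10)), rng.uniform(0.2, 1.0, (8, 10))
target = reproduce(a0, b0, dirs)
a, b, errs = factorize(target, dirs)
```

Because each pixel of one layer enters Eq. (1) linearly when the other layer is fixed, the multiplicative update in this two-layer sketch coincides with the per-pixel least-squares solution, and the clipping step keeps it within [0, 1].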
As proposed by Wetzstein et al. [13], if layer devices that run faster than human visual perception are available, time-multiplexing over several consecutive frames can also improve the quality of the displayed light field, but this is outside the scope of this paper.
For the proposed structure [Fig. 1(b)], an outgoing light ray is modeled by
$$L^{(ii)}(s, t, x, y) = B(\lfloor x/S \rfloor, \lfloor y/S \rfloor) \sum_{n_x=0}^{N-1} \sum_{n_y=0}^{N-1} a(Nx + n_x, Ny + n_y) \, b(Nx + n_x + s, Ny + n_y + t), \qquad (4)$$
where $B(x, y)$ denotes the luminance of the low-resolution backlight. It is assumed that a single light ray is generated by aggregating $N^2$ neighboring pixels on each layer. In this structure, the luminance of the low-resolution backlight $B(x, y)$ is optimized for a target light field together with the layer transmittance patterns $a(x, y)$ and $b(x, y)$. Although this optimization is formulated in a different form from the conventional one (Sec. 2.1, baseline structure), we can solve it similarly. Specifically, $a(x, y)$, $b(x, y)$, and $B(x, y)$ are updated alternately according to the multiplicative update rule.
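The forward model of the proposed structure can be sketched as follows. This is a minimal illustration under stated assumptions: the N × N layer pixels behind each displayed pixel are combined by a plain sum (any normalization is left to the backlight values), directions (s, t) are counted in layer pixels, borders wrap around, and the function name `reproduce_proposed` is hypothetical.

```python
import numpy as np

def reproduce_proposed(a, b, B, dirs, N, S):
    """Render one image per direction from high-resolution layers a, b
    (shape N*H x N*W) and a low-resolution backlight B (about H/S x W/S)."""
    H, W = a.shape[0] // N, a.shape[1] // N
    # Displayed pixel (x, y) is lit by backlight pixel (floor(x/S), floor(y/S)).
    ys, xs = np.arange(H) // S, np.arange(W) // S
    B_up = B[np.ix_(ys, xs)]
    views = []
    for s, t in dirs:
        # Product of rear pixel and the front pixel displaced by (s, t).
        prod = a * np.roll(b, shift=(-t, -s), axis=(0, 1))
        # Aggregate the N x N layer pixels that form one displayed pixel.
        agg = prod.reshape(H, N, W, N).sum(axis=(1, 3))
        views.append(B_up * agg)
    return np.stack(views)

# Tiny check: fully transparent layers pass the (upsampled) backlight, scaled by N^2.
N, S = 2, 2
a = np.ones((8, 8))
b = np.ones((8, 8))
B = np.array([[1.0, 2.0], [3.0, 4.0]])
views = reproduce_proposed(a, b, B, dirs=[(0, 0), (1, 0)], N=N, S=S)
```

With both layers fully transparent, every view equals the backlight pattern repeated over S × S blocks and multiplied by N², which is a convenient sanity check for the indexing.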

Reducing Layer Bit Depth
The light-field factorization proposed by Wetzstein et al. [13] assumes that the transmittance values of the layers are continuous. In contrast, we propose to reduce the bit depth per pixel to improve the display's efficiency in terms of the total number of bits. When the transmittance values are not continuous but discrete, the optimization for the layers and backlight becomes a combinatorial optimization, which is NP-hard. Therefore, we use the following empirical method. As described above, the transmittance patterns of the layers are updated iteratively. In this iteration, we gradually tighten the threshold so that the pixels of the layers become discrete values. Figure 2 shows the case of a bit depth of two as an example. We tighten the threshold only for the layers because the backlight can take continuous values.
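One possible way to realize such a gradually tightened threshold is sketched below. The exact threshold function of Fig. 2 is not reproduced here; the linear pull toward the nearest level, the `strength` parameter, and the function name `tighten` are all assumptions made for illustration.

```python
import numpy as np

def tighten(values, bit_depth, strength):
    """Pull transmittances toward the nearest of 2**bit_depth equally spaced levels.

    strength = 0 only clips to [0, 1]; strength = 1 fully quantizes.
    This is a hypothetical stand-in for the threshold function of Fig. 2.
    """
    v = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    steps = 2 ** bit_depth - 1             # number of intervals between levels
    nearest = np.round(v * steps) / steps  # nearest representable level
    return v + strength * (nearest - v)

# A schedule that tightens linearly over the iterations (also an assumption).
iters = 5
schedule = [k / (iters - 1) for k in range(iters)]
layer = np.array([0.1, 0.4, 0.6, 0.9])
for strength in schedule:
    layer = tighten(layer, bit_depth=1, strength=strength)
```

In a full optimization, `tighten` would be applied to each layer after its multiplicative update, while the backlight would be left continuous.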

Upper Bound of Spatial Frequency
Following previous studies [13,20], we analyze the upper bound of the spatial frequency of a display consisting of two layers, as shown in Fig. 3. The horizontal axis represents the depth, with the two layers located at 0 and 1, and the vertical axis represents the spatial frequency. Parameter N is a scale factor for the layer resolution; that is, the layer resolution is N times finer than the displayed images in width and height. To clearly display an object at a certain depth, the spatial frequency's upper bound at that depth should be greater than the Nyquist frequency of the displayed images. The blue line in the graph indicates the case of N = 1, for which the resolution of the layers is the same as that of the displayed images. The upper bound takes its maximum at the depth equidistant from the two layers and decreases with distance from that depth.
Here, we are interested in the range of depth over which the upper bound of the spatial frequency is not below the Nyquist frequency of the images to be displayed (we call this the "effective range"), because we assume that the displayed images are band-limited below the Nyquist frequency. In the case of N = 1, the effective range is from 0 to 1. In other words, an object can be displayed clearly only between the two layers. When the layers have N times higher resolution, however, the upper bound is also multiplied by N. As a result, the effective range expands, and the object can be displayed clearly even outside the two layers.

Resolution and Bit Depth for Layers
First, we evaluated the effects of the layers' resolution (indicated by the scale factor N) and bit depth on the reproduced light field. We calculated the transmittance patterns of the layers from a target light field, simulated the light field reproduced by the display, and evaluated the reproduction quality by measuring the peak signal-to-noise ratio (PSNR) against the target light field. A "Lego Truck" dataset [21] was used as a target light field after converting it to grayscale and resizing it to 512 × 384 pixels. This dataset has 17 × 17 views, of which 13 × 13 views were used for these experiments. In this experiment, a uniform backlight was used. Figure 4 shows the results. The PSNR values in this figure were calculated from the mean square errors over all pixels (512 × 384) and all views (13 × 13). With N = 1, the display quality greatly deteriorated as the bit depth decreased. With N increased, however, reasonable quality could be achieved even with a low bit depth. From this result, we can state that the layers' bit depth per pixel can be reduced without degrading the display quality if the layers have a higher resolution than the displayed image for each view. We also measured the PSNR values for individual views to evaluate the display quality in terms of view directions, as shown in Fig. 5. With N = 1, not all views were correctly reproduced even using eight bits. With N = 6, however, we achieved relatively good quality over all views. Directions closer to the center tended to have better quality.
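The quality measure described above can be written compactly. The following sketch assumes light fields stored as arrays of shape (views, height, width) with intensities normalized to a peak of 1.0; the function names `psnr` and `psnr_per_view` are hypothetical.

```python
import numpy as np

def psnr(target, reproduced, peak=1.0):
    """PSNR in dB from the mean square error over all given pixels and views."""
    err = np.asarray(target, dtype=float) - np.asarray(reproduced, dtype=float)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def psnr_per_view(target, reproduced, peak=1.0):
    """One PSNR value per view, for arrays shaped (views, height, width)."""
    return [psnr(t, r, peak) for t, r in zip(target, reproduced)]

# Example: a constant error of 0.1 on a unit-peak signal corresponds to about 20 dB.
target = np.zeros((4, 3, 5))
reproduced = np.full((4, 3, 5), 0.1)
overall = psnr(target, reproduced)
per_view = psnr_per_view(target, reproduced)
```

The overall value pools the squared errors of all views before taking the logarithm, as in Fig. 4, whereas the per-view values correspond to the kind of breakdown shown in Fig. 5.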
We expect that higher PSNR values might be achieved by further increasing the resolution, but because of the heavy computational cost, we did not test any configurations above N = 6. Moreover, note that the quality of the layered display depends not only on the upper bound of the spatial frequency but also on the low-rankness of the target light field [20]. We expect that the quality limitation of this compressive display will be further addressed in future work.
Next, to evaluate the efficiency of the light-field representation, we investigated the relation between the compression ratio of the display and the quality of the reproduced light field. The compression ratio of the display is defined as the total bits of the display divided by the total bits of a target light field. (If the target light field is composed of 13 × 13 views and each view has 512 × 384 pixels and 8 bits/pixel, then the total bits of the target light field are 13 × 13 × 512 × 384 × 8 = 265,814,016 bits. Then, if the display is composed of only two layers, with each layer having 512 × 384 pixels and 8 bits/pixel, then the total bits of the display are 2 × 512 × 384 × 8 = 3,145,728 bits. In this case, the compression ratio of the light-field representation is $1.18 \times 10^{-2}$.) We changed the horizontal axis in Fig. 4 to represent the compression ratio of the display and replotted the data as shown in Fig. 6. The baseline layered structure (N = 1, 8-bit layers) is plotted as a yellow star. It achieved a high compression ratio, but the upper bound of the spatial frequency was low because N = 1; in other words, the effective range in which the object could be clearly represented was narrow, as described in Sec. 3. With the resolution of each layer set to N times that of the displayed images to increase the upper bound of the spatial frequency, the total bits of the display also increased, by a factor of $N^2$. The total bits, however, could be reduced by decreasing the layers' bit depth while maintaining quality. Therefore, we conclude that it is possible to achieve a high-quality light-field display while maintaining the high efficiency of light-field representation by simultaneously increasing the resolution and reducing the bit depth of the layers.
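The bit counts in this paragraph are easy to reproduce. The short sketch below recomputes the compression ratio for the baseline and for two N = 6 configurations with a uniform backlight; the helper name `total_bits` and the configuration labels are hypothetical.

```python
def total_bits(width, height, bit_depth, count=1, scale=1):
    """Bits for `count` panels whose resolution is `scale` times (width x height)."""
    return count * (scale * width) * (scale * height) * bit_depth

W, H, VIEWS = 512, 384, 13 * 13
target_bits = total_bits(W, H, 8, count=VIEWS)  # 13 x 13 views, 8 bits/pixel

configs = {
    "baseline (N=1, 8-bit)": total_bits(W, H, 8, count=2, scale=1),
    "high-res (N=6, 8-bit)": total_bits(W, H, 8, count=2, scale=6),
    "high-res binary (N=6, 1-bit)": total_bits(W, H, 1, count=2, scale=6),
}
ratios = {name: bits / target_bits for name, bits in configs.items()}

for name, bits in configs.items():
    print(f"{name}: {bits} bits, compression ratio {ratios[name]:.3e}")
```

Raising N from 1 to 6 multiplies the layer bits by 36, while dropping from 8 bits to 1 bit divides them by 8, so the binary N = 6 layers need 4.5 times the bits of the baseline.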

Backlight Resolution
The effect of the backlight resolution on the quality of the reproduced light field was also evaluated. The resolution and bit depth of the layers were set to N = 6 and 1, respectively, and only the resolution of the backlight was varied. Figure 7 shows the results. As expected, as the backlight resolution increased, the display quality improved. Even when the backlight had a much lower resolution (S = 16) than the displayed image for each view, the display quality was sufficiently higher than in the case with the uniform backlight. From this result, we conclude that it is beneficial to use a pixelized backlight instead of a uniform one, even with extremely low resolution.

Subjective Comparison of Different Structures
Finally, we used simulations to subjectively compare different light-field display structures, and Fig. 8 shows the results. The first group of images (a) was obtained with the baseline structure, in which the layers have the same resolution and bit depth as the displayed images. As shown in the green box, a region with large pop-out was reproduced with blurring.
The second group (b) shows the results obtained with a structure composed of layers with six times higher resolution in width and height than the displayed images. Because of the higher resolution, the effective range of depth was expanded, and thus, the region with large pop-out could be clearly reproduced. The total bits of the display, however, increased by a factor of 36, and the display's compression ratio became much worse. The third group (c) shows the results obtained with a structure composed of high-resolution binary layers. Both a high compression ratio and clear display throughout the depth range could be achieved with this structure. As shown in the red box, however, staircase artifacts appeared in gradation regions because the number of intensity levels that could be expressed by the display was limited. Finally, the fourth group (d) shows the results obtained with the combination of high-resolution binary layers and low-resolution backlight proposed in this paper. The backlight resolution was set to one-eighth that of the displayed image in width and height. The entire displayed image was clear, without blurring, and the gradation appeared natural.
In addition, because the backlight had a low resolution, the efficiency decline was negligible compared with that of the structure with the uniform backlight. Figure 9 shows the images displayed by the proposed structure to different viewpoints. We confirmed that the parallax was reproduced according to the viewpoint. From these results, we conclude that the combination of the high-resolution binary layers and the low-resolution backlight could efficiently reproduce the target light field with high quality.
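To give a feel for how small the efficiency decline is, the following sketch estimates the extra bits introduced by the low-resolution backlight relative to the binary layers. The backlight bit depth of 8 is an assumption (the text only states that the backlight is multibit), and the helper name `backlight_bits` is hypothetical.

```python
def backlight_bits(width, height, S, bit_depth):
    """Bits for a backlight that is S times sparser than the displayed image."""
    # Ceiling division, so partially covered blocks still get a backlight pixel.
    bw, bh = -(-width // S), -(-height // S)
    return bw * bh * bit_depth

W, H, N, S = 512, 384, 6, 8
layer_bits = 2 * (N * W) * (N * H) * 1             # two binary layers, N = 6
extra_bits = backlight_bits(W, H, S, bit_depth=8)  # assumed 8-bit backlight
overhead = extra_bits / layer_bits

print(f"layers: {layer_bits} bits, backlight: {extra_bits} bits, "
      f"overhead: {100 * overhead:.2f}%")
```

Under these assumptions the backlight adds 64 × 48 × 8 = 24,576 bits to the 14,155,776 bits of the layers, i.e., an overhead of 1/576, or roughly 0.17%.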

Hardware
For future work, we will develop a prototype of a layered light-field display with our proposed structure to demonstrate that it can clearly reproduce an object with large pop-out while keeping the total bits of the display low. In this paper, as a preliminary step, we constructed a display model composed of transparent acrylic sheets on which layer patterns and a low-resolution backlight pattern were printed, as shown in Fig. 10. We used the "Lego Truck" dataset [21] to generate the transmittance patterns of the layers and the luminance of the backlight for the baseline and proposed structures. The results photographed 1 m away from the display models and notable close-ups are shown in the top and bottom rows, respectively, of Fig. 11. (For all experiments reported in this paper, we assumed that the images composing the target light field were orthographic, i.e., strictly speaking, that the target light field could be observed if the display was viewed from infinity. Because the reproduced light field was sufficiently dense in the angular direction, however, the images did not deteriorate even when the display was observed from a finite distance. Specifically, we confirmed that we could observe good images at 0.5 m and farther from the display.)

Conclusion
We have proposed a structure for a layered light-field display with high-resolution binary layers and a low-resolution, multibit backlight. The increased layer resolution raises the upper bound of the spatial frequency, while limiting the layers' transmittance to binary (on/off) values reduces the total number of bits for the display. The low-resolution backlight compensates for the small number of intensity levels available with only binary layers. We experimentally validated that the proposed structure can reproduce a light field with high quality and high efficiency.
For future work, we may consider colorization. The simplest structure for colorization would consist of color filter arrays inserted in all layers of the display. As another approach, Hirsch et al. [18] proposed a layered structure in which only one layer has a color filter. We could also adopt field-sequential colorization [22], in which RGB channels are alternately displayed over time. In addition, to further validate the effectiveness of the proposed structure, we will develop a display prototype.

Fig. 1 Structures and configurations of layered light-field displays. (a) Baseline structure and (b) proposed structure.

2.2
Figure 1(b) shows the proposed structure composed of two high-resolution layers and a low-resolution backlight. We assume that the resolutions of each layer and the backlight are N times finer and S times sparser, respectively, in width and height than the displayed image for each view. Then, we model an outgoing light ray by Eq. (4).

Fig. 2 Illustration of a threshold function for layer optimization (bit depth of 2).

Fig. 3 Upper bound of spatial frequency as a function of depth.

Fig. 4 Quality of displayed images with layers of various resolutions and bit depths.

Fig. 5 Quality of displayed images for individual view directions.

Fig. 6 Quality of displayed images as a function of the compression ratio of the display.

Fig. 7 Quality of displayed images as a function of backlight resolution.
Fig. 8 (a) shows the results for the baseline structure, in which the layer resolution was 512 × 384, the same as that of the displayed image for each view.
Panel labels: top-left view, top-right view, bottom-left view, bottom-right view.

Fig. 9 Images displayed to different viewpoints (lines were added to show the parallax among them).

Fig. 11 Displayed images obtained with stacked acrylic transparent sheets (brightness corrected). (a) Baseline structure and (b) proposed structure.