With the development of the video game and film industries, realistic lighting has become an active research topic, and many approaches have been proposed for producing realistic lighting effects. The plenoptic function proposed in Ref. 1 is a seven-dimensional function that defines the flux of a light ray with wavelength λ passing through the spatial location (x, y, z) in the direction (θ, φ) at time t. It can be reduced to a five-dimensional (5-D) function if we assume that the scene is static and that only three spectral bands (R, G, B) are considered. A further assumption is that the radiance of a light ray through a space free of internal occlusion remains constant. In that case, the 5-D function can be reduced to a four-dimensional (4-D) function with one spatial dimension eliminated. Contemporary light field representations are based on the 4-D function.
A light field describes the light traveling through every position in every direction in space. McMillan and Bishop2 proposed a 5-D light field representation by taking a set of panoramic images at different positions in a static scene. Levoy and Hanrahan3 presented the most commonly used 4-D light field representation, the light slab. As illustrated in Fig. 1, a light ray is uniquely determined by a pair of points on two parallel planes at arbitrary positions. The point on the first plane is parameterized by (u, v) and the point on the second plane by (s, t). The line connecting the two points indicates a light ray. A lot of work on light fields is based on the 4-D representation, such as synthetic aperture imaging,4 light field photography,5,6 and integral imaging.7–9
Light field-based lighting is one form of image-based lighting. Traditional image-based lighting can only simulate infinitely distant light with RGB values through environment mapping.10 Debevec11 captured a light probe, a high-dynamic range (HDR) spherical panoramic image, to sample the omnidirectional light intensity at its capturing position. Several panoramic images with different exposure times were assembled into a light probe to record more levels of light intensity.12 In addition, a complicated device, the light stage,13 was set up by Debevec’s team at the University of Southern California and has been used for producing realistic lighting effects in many famous movies, e.g., Avatar. However, a light probe only records the intensity of light rays incident at a single capturing position, so it can only represent spatially invariant illumination. The incident light field (ILF) was proposed by Unger et al.14–16 to generate spatially varying lighting effects. The illumination incident into a free space is captured and represented as a 4-D ILF. The ILF follows the plane-based representation and provides a realistic lighting effect in a relatively small area, but it is difficult to extend to a wider region because the complexity and data volume increase dramatically with spatially varying illumination. Although Unger discussed the possibility of extending ILF-based lighting to full room size in a Siggraph talk,17 the talk was too brief to show any technical details or results. Some work simplifies the light field representation by reducing the complexity of the data structure and the data volume. Mury et al.18,19 simplified the measurement and reconstruction of the light field structure in a finite three-dimensional (3-D) space. Hu et al.20 tried nonuniform illumination sampling and representation to improve light field capturing and rendering efficiency. Light sources play an important role in realistic lighting, and much work has concentrated on light source estimation.
Lehmann and Palm21 introduced an approach named color line search to estimate the color of a single illuminant. Han and Sohn22 introduced a method to estimate the light source direction for face relighting. Shim23 proposed using a face in an image as a natural light probe11 for relighting. Corsini et al.24 presented a method of estimating the position, direction, and intensity of a real light source by capturing illumination with two mirror balls. Goesele et al.25 employed near-field photometry to measure and reproduce real light sources in rendering.
In this paper, a novel scene surface light field representation is proposed to support realistic lighting with spatial variation. In comparison with the existing work, three contributions are made. First, a nonuniform illumination capturing and calibration strategy is employed to extend the light field sampling and rendering scope to full indoor space. Second, owing to the scene surface-based representation, an accurate estimation of the position, shape, direction, color, and intensity of the light sources is presented, which benefits both rendering quality and efficiency. Finally, a practical indirect light resampling method based on the scene surface is proposed to enhance the realism of the lighting effect.
The rest of the paper is organized as follows: Sec. 2 describes the nonuniform illumination capturing and calibration strategy. In Sec. 3, the scene surface light field representations of both direct light source and indirect light are presented. In Sec. 4, some rendering results are illustrated. Section 5 summarizes the work presented in this paper.
Illumination Capturing and Calibration
In this section, an illumination capturing and calibration strategy is proposed. The illumination samples, called light probes, are captured in a real scene and record the omnidirectional light intensity at their capturing positions. After the scene is reconstructed, the light probes are calibrated against the reconstructed 3-D model.
As shown in Fig. 2, the capture setup consists of a panorama camera placed on a translation stage driven by a numerical control device. The panorama camera is a Ladybug3 from Point Grey, and it contains six 2-Mpixel cameras that enable us to take panoramas covering most of the full 360 deg sphere. The missing view is toward the bottom of the camera, which is of less concern to us. The camera is able to capture 12-Mpixel fused panoramas by streaming them to disk through FireWire at 15 fps. If we lower the resolution of the captured panorama, the frame rate can be higher. We choose a reduced panorama resolution to shorten the capturing time.
The translation stage is driven by the numerical control equipment with submillimeter accuracy in movement. It is 1 m long with a minimum step of 2 mm, so panoramas can be captured at up to 500 different positions on the stage.
The panorama camera and translation stage are moved inside a room to take panoramas as raw illumination sample data. To capture the illumination variation across the room with a smaller data volume in a shorter time, we employ a nonuniform capturing strategy. The illumination distribution is assessed in advance so that the capturing locations can be selected manually. More panoramas are captured in areas with more illumination variation, such as the edge of a shadow. In addition, the selected locations should cover most of the room. This capturing strategy improves sampling efficiency and resampling precision because most of the illumination variation is recorded by a limited number of panoramas, which are used to extract the light rays incident at their capturing positions. Because of the nonuniform capturing strategy, the capturing positions of the panoramas are not known in advance, so a calibration procedure is required to calculate their relative positions in the room.
Light Probes Composition
We capture panoramas on the translation stage in groups. Each group covers 20 capturing positions with 5-cm spacing, with eight panoramas taken at each position using exposure times from 1/256 to 1 s. The eight panoramas at each position are composited into one light probe. Then 50 capturing locations are selected following the nonuniform capturing strategy to capture 50 groups around the room. In total, this amounts to 8000 captured panoramas (50 groups × 20 positions × 8 exposures). They are all captured automatically under the control of the numerical control equipment; we only need to move the translation stage and start the capturing program.
A light probe is an image that records the illumination incident at its capturing position. A light probe has two essential features: it is HDR and it is panoramic. A light probe samples light rays by its pixels, and each pixel indicates the direction and intensity of a light ray. A regular RGB picture cannot represent the luminance range of real scenes: white areas appear in the picture when it is overexposed, and black areas when it is underexposed. HDR composition is introduced to solve this problem. An HDR image is generally composed from several differently exposed pictures of the same scene. As shown in Fig. 3, eight panoramas are captured at the same location with different exposure times from 1/256 to 1 s. The eight panoramas are then composed into a light probe by the method proposed in Ref. 12.
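The idea behind HDR composition can be illustrated with a minimal per-pixel sketch. It assumes a linear sensor response and a simple triangular weighting, which is a simplification and not necessarily the procedure of Ref. 12; the function names and the exposure times in the usage note are illustrative assumptions.

```python
def hat_weight(value, low=0.05, high=0.95):
    """Weight a normalized pixel reading; clipped readings get zero weight."""
    if value <= low or value >= high:
        return 0.0
    # Triangular weight that peaks at mid-gray (0.5).
    return 1.0 - abs(2.0 * value - 1.0)


def merge_exposures(pixel_values, exposure_times):
    """Estimate the relative radiance of one pixel from several exposures.

    pixel_values: normalized readings in [0, 1], one per exposure.
    exposure_times: matching exposure times in seconds.
    Assumption: linear response, i.e. reading = radiance * time, clipped to [0, 1].
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, time in zip(pixel_values, exposure_times):
        weight = hat_weight(value)
        weighted_sum += weight * (value / time)
        weight_total += weight
    if weight_total == 0.0:
        # Every reading is clipped: fall back to the one closest to mid-gray.
        best = min(range(len(pixel_values)),
                   key=lambda i: abs(pixel_values[i] - 0.5))
        return pixel_values[best] / exposure_times[best]
    return weighted_sum / weight_total
```

For example, with hypothetical exposure times `[1/256, 1/64, 1/16, 1/4, 1]` and readings produced by a radiance of 2.0, only the two well-exposed readings contribute, and both agree on the same radiance estimate.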
Panoramas are captured to compose light probes that sample light rays in all directions. A commonly used method is to photograph a mirror sphere.11,15–17 The view behind the sphere cannot be seen in the picture, and the camera photographs itself on the mirror sphere surface. As a result, the illumination coming from behind the sphere and from behind the camera cannot be captured because of the missing view and the occlusion. In contrast, we use a panorama camera to capture the light probe, so that the illumination missed by mirror sphere photographs is also sampled.
Figure 4(a) shows an example of the light probes we captured. It is an HDR spherical panoramic image. Every pixel in the image except the black area indicates a light ray sample. The positions of the pixels determine the directions of the light ray samples, and the radiances of these light rays are represented by the corresponding pixel values. Most views of the scene are captured, except the view below the camera. This omission is acceptable because in most cases little light is emitted from below.
The imaging model of the panorama camera is a classical spherical projection model. Suppose that there is a unit sphere located at the optical center of the panorama camera. The vertical axis points upward, and the other two axes span the horizontal plane through the camera's optical center. The captured image can then be mapped onto the surface of the unit sphere. As shown in Fig. 4(b), a pixel on the unit sphere surface indicates the light ray incident from a scene point toward the sphere center. The light ray is uniquely determined by its pair of angles, and its radiance is given by the value of that pixel.
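As a minimal sketch of this spherical projection, the code below maps a pixel of a latitude–longitude panorama to an azimuthal and a polar angle and then to a unit direction vector. The pixel-to-angle convention (columns span the azimuth, rows span the polar angle measured from the upward axis) and the function names are assumptions for illustration, not taken from the camera's documentation.

```python
import math


def pixel_to_angles(col, row, width, height):
    """Map a latitude-longitude pixel to (azimuth, polar) in radians.

    Assumed convention: columns cover azimuth [0, 2*pi), rows cover the
    polar angle [0, pi] measured from the upward axis; pixel centers are used.
    """
    azimuth = 2.0 * math.pi * (col + 0.5) / width
    polar = math.pi * (row + 0.5) / height
    return azimuth, polar


def angles_to_direction(azimuth, polar):
    """Convert (azimuth, polar) to a unit vector with the third axis up."""
    sin_polar = math.sin(polar)
    return (sin_polar * math.cos(azimuth),
            sin_polar * math.sin(azimuth),
            math.cos(polar))
```

Each pixel therefore yields one light ray sample: the direction comes from the pixel position and the radiance from the pixel value.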
Scene Reconstruction from Kinect Fusion
We employ an open source library, the Point Cloud Library,26 which includes the Kinect fusion algorithm,27 to reconstruct the room. Kinect fusion is a 3-D reconstruction algorithm based on Microsoft Kinect. Kinect contains an infrared transmitter, an infrared receiver, and a color camera, and it is able to capture a color image and a depth image at the same time. Figure 5 shows an example of the color image and depth image captured by Kinect. Since the infrared sensors and the color camera are precalibrated, the pixels of the color image and the depth image are in one-to-one correspondence. With the depth information and the pose of the camera, which is estimated from the corresponding color image sequence, the object geometry can be reconstructed.
The truncated signed distance function is used as a volumetric representation of the scene built from the depth images, and the iterative closest point algorithm is used to estimate the pose of Kinect from two sequential color frames.28 Combining the two approaches yields a fused dense point cloud of the scene. After the point cloud is triangulated into a mesh, the scene surface in the view is generated. As Kinect fusion is unable to reconstruct an unbounded extended area, an incremental surface reconstruction algorithm29 is required to reconstruct the whole scene geometry. The reconstructed area is shifted with the movement of Kinect, so that the reconstructed surface can be expanded continuously through incremental updates. The whole scene is reconstructed after the area has been completely scanned.
Light Probes Calibration
The light rays sampled by the light probes are reprojected onto the reconstructed scene model to set up the scene surface light field representation. Therefore, the relative positions of the light probes in the scene must be known in advance. Two calibration processes are needed to obtain them. One is the calibration of the extrinsic camera parameters, and the other is the transformation between the camera coordinate system and the world coordinate system.
The calibration of the extrinsic camera parameters amounts to finding the relative positions of all the light probes. With our setup, light probes are captured on the translation stage with precise movements driven by the numerical control equipment, so their relative positions on the stage are easy to calculate.
Eq. (2) gives the position of each panoramic image in the capturing sequence in terms of the start position on the stage, the end position on the stage, and the number of panoramic images captured on the stage.
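A minimal sketch of this computation is given below. It assumes that the probes are equally spaced between the start and end positions, so that each position follows by linear interpolation; the function name and the zero-based indexing are illustrative choices.

```python
def stage_positions(start, end, count):
    """Return `count` equally spaced 3-D positions from `start` to `end`.

    Assumption: the stage moves along a straight line in equal steps,
    so position k is start + k / (count - 1) * (end - start).
    """
    if count < 2:
        raise ValueError("at least two positions are needed")
    return [
        tuple(s + (e - s) * k / (count - 1) for s, e in zip(start, end))
        for k in range(count)
    ]
```

With 20 positions over a 0.95-m span, consecutive positions are 5 cm apart, which matches the spacing used when capturing each group.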
We move the camera with the translation stage in the room to capture light probes. The subsequent problem is to calibrate all the light probes. As the relationship within a group has been given in Eq. (2), only the start position and the end position need to be calculated. As illustrated in Fig. 6, a panorama camera at different positions always has overlapping fields of view, so a structure from motion (SFM) algorithm is appropriate for estimating the extrinsic camera parameters. We use Bundler,30,31 which is an SFM toolkit, to calibrate the rotation and translation of the panorama camera. Bundler takes a set of images as input and outputs the extrinsic camera parameters in a camera coordinate system. One image is selected and its capturing position is set as the coordinate origin. The rotations and translations of the other images’ capturing positions are calculated by Bundler in the coordinates of the selected camera. Because a large number of light probes are captured around the room and must be calibrated into the camera coordinates, we perform the calibration incrementally in groups. Groups of images are calibrated into the camera coordinate system sequentially, and they share a common origin, so that they are in the same coordinate system and their translation vectors can be treated as their spatial coordinates.
The scene is reconstructed by the Kinect fusion algorithm, and the world coordinate system is defined by the pose of Kinect when it captures the first frame. The starting capturing position of Kinect is the origin of the world coordinate system. To calculate the relative positions of the light probes in the scene, the world coordinate system must be aligned with the camera coordinate system. We add the first frame of Kinect to Bundler to calculate the rotation and translation from the selected origin light probe to the Kinect start position. This rotation and translation form the transformation matrix between the two coordinate systems. To obtain the real position of a light probe in the scene, its pose is transformed from the camera coordinates to the world coordinates by this matrix. Each light probe is described by an extrinsic parameter matrix that records its rotation and translation in the camera coordinates. Equation (3) transforms the light probes into the world coordinates, and the resulting matrix is the real pose of the light probe in the world coordinates.
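The coordinate change can be sketched with 4 × 4 homogeneous matrices. The sketch assumes that the camera-to-world transformation is applied by left-multiplying the pose of the probe in camera coordinates; the actual multiplication order and any inversion depend on the conventions of the SFM output, so this is an illustration under stated assumptions. All function names are hypothetical.

```python
def make_pose(rotation, translation):
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector."""
    pose = [list(rotation[r]) + [translation[r]] for r in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose


def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def probe_pose_in_world(camera_to_world, probe_pose_in_camera):
    """Assumed convention: world pose = camera_to_world * camera pose."""
    return mat_mul(camera_to_world, probe_pose_in_camera)


def position_of(pose):
    """Extract the translation part of a 4x4 pose."""
    return tuple(pose[r][3] for r in range(3))
```

For instance, if the camera-to-world transformation is a 90 deg rotation about the vertical axis followed by a shift of one unit upward, a probe at (1, 0, 0) in camera coordinates ends up at (0, 1, 1) in world coordinates.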
Scene Surface Light Field Representation
In this section, the scene surface light field representation is presented, covering both the direct light source and the indirect light. It is more consistent with the actual emission and reflection in light transport than previous representations, and thus it provides accurate direct light source estimation and indirect light resampling.
Direct Light Source Representation
Light source extraction
As shown in Fig. 7(a), light sources occupy only a small area of the light probe, but they contribute most of the illumination in the scene. Extracting the light sources from the scene and representing them densely therefore greatly improves rendering efficiency and quality.
Figure 7(a) is a light probe sample in latitude–longitude image format. As shown in this image, the intensity of the pixels in the light source areas is much higher than in the other areas, so we employ Otsu's algorithm32 to determine a binarization threshold from the image’s pixel intensities. Figure 7(b) is the image segmentation result based on the binarization threshold, and the highlighted areas are extracted as light source areas. Note that not only the lamps but also some other highlighted areas, e.g., the mirror reflection of windows, are regarded as light sources. This extraction is performed on each light probe.
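The thresholding step can be sketched as follows. This is a plain implementation of Otsu's method on integer intensities; quantizing HDR intensities to a fixed number of levels beforehand is an assumption of the sketch, and the function name is illustrative.

```python
def otsu_threshold(values, levels=256):
    """Return the level that maximizes the between-class variance.

    values: integer intensities in [0, levels). Pixels with intensity
    greater than the returned level form the bright (light source) class.
    """
    histogram = [0] * levels
    for v in values:
        histogram[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for level in range(levels):
        weight_low += histogram[level]
        sum_low += level * histogram[level]
        weight_high = total - weight_low
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance up to a constant factor.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = level
    return best_threshold
```

A binary mask such as the one in Fig. 7(b) would then keep the pixels whose intensity exceeds the returned threshold.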
The relative positions between the light probes and the reconstructed scene have been solved. Consequently, the real emitting locations of the light sources can be extracted from the scene by reprojecting the white areas in Fig. 7(b). As illustrated in Fig. 8, the steps of light source extraction are as follows:
1. The light rays are reprojected from the extracted white areas on the light probes to the reconstructed scene, along their directions given by the spherical projection described in Sec. 2.2. As the scene is meshed with triangular patches, the reprojected light rays intersect some of the patches.
2. Step 1 is applied to each light probe.
3. Only a small fraction of the patches contain intersections. These patches, except those that contain few intersections, are regarded as light source areas. A threshold is determined by Otsu's algorithm according to the number of intersections each patch contains. Patches with fewer intersections than the threshold are discarded because they are likely caused by accuracy errors or discontinuities.
4. The remaining patches are selected to construct the light sources.
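The reprojection and counting in the steps above can be sketched as follows. The sketch uses the Möller–Trumbore ray–triangle test with a brute-force loop over all patches, which is a choice made for illustration; the source does not state which intersection method or acceleration structure is used. The threshold is passed in as a parameter, and all function names are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_distance(origin, direction, triangle, eps=1e-9):
    """Moller-Trumbore test; returns the ray parameter t or None."""
    v0, v1, v2 = triangle
    edge1, edge2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, edge2)
    det = _dot(edge1, p)
    if abs(det) < eps:
        return None  # Ray is parallel to the triangle plane.
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, edge1)
    v = _dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(edge2, q) * inv_det
    return t if t > eps else None


def count_hits(rays, triangles):
    """Count, per triangle, the rays whose nearest hit is that triangle."""
    counts = [0] * len(triangles)
    for origin, direction in rays:
        nearest_index, nearest_t = None, float("inf")
        for index, triangle in enumerate(triangles):
            t = ray_triangle_distance(origin, direction, triangle)
            if t is not None and t < nearest_t:
                nearest_index, nearest_t = index, t
        if nearest_index is not None:
            counts[nearest_index] += 1
    return counts


def select_light_patches(counts, threshold):
    """Keep the patches whose hit count reaches the threshold."""
    return [i for i, c in enumerate(counts) if c >= threshold]
```

Here each ray starts at a light probe position and points along a direction taken from a white pixel of the binary mask; the per-patch counts are then thresholded to discard sparsely hit patches.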
Anisotropic point light sources
Both the direct light source representation and the indirect light representation are set up on the scene surface. The basic representation of a light ray sample is given by Eq. (4).
In Eq. (4), the first pair of parameters defines a parameterization of the geometry surface, and the second pair represents the direction of a light ray. The function value indicates the radiance of the light ray and is stored as an HDR RGB color.
The direct light source representation can be seen as a collection of anisotropic point light sources located on the scene surface. Because the scene is reconstructed as a triangular mesh and the light source areas have been extracted, we can easily create such anisotropic point light sources at the vertices of the triangular patches in the light source areas.
Figure 9 is an example of an anisotropic point light source. All the rays from a given vertex of a triangular patch in the light source area are emitted toward the capturing positions of the light probes, and together they form an anisotropic point light source. Each light ray intersects a light probe at one pixel, and the pixel value is set as the radiance of the light ray. Another key issue is to solve the directions of these light rays. Since the positions of the vertices and the light probes have all been calibrated into the world coordinate system, the directions of their connecting rays can be calculated from the direction vector of each light ray. As in Fig. 4(b), each direction is described by an azimuthal angle and a polar angle, which are obtained from the direction vector, its projection onto the horizontal plane, and the unit vectors along two coordinate axes.
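A minimal sketch of this direction computation follows. It assumes that the third coordinate axis points upward, that the azimuthal angle is measured in the horizontal plane from the first axis, and that the polar angle is measured from the upward axis; the function name is illustrative.

```python
import math


def ray_angles(vertex, probe_position):
    """Return (azimuth, polar) of the ray from a vertex to a probe.

    Assumed convention: azimuth in [0, 2*pi) from the first axis within
    the horizontal plane, polar in [0, pi] from the upward (third) axis.
    """
    dx = probe_position[0] - vertex[0]
    dy = probe_position[1] - vertex[1]
    dz = probe_position[2] - vertex[2]
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("vertex and probe position coincide")
    # The azimuth uses the projection of the ray onto the horizontal plane.
    azimuth = math.atan2(dy, dx) % (2.0 * math.pi)
    polar = math.acos(dz / length)
    return azimuth, polar
```

Applying this to one vertex and every calibrated probe position yields the sparse set of emitted directions that make up one anisotropic point light source.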
Light source resampling
The number of light rays that each anisotropic point light source emits is equal to the number of light probe samples. This is too sparse to represent a direct light source, so a resampling process is needed. As illustrated in Fig. 10, the anisotropic point light source is resampled to be continuous in the angular dimension. This resampling process is applied to all the created point light sources.
The steps of an anisotropic point light source resampling are:
1. Record the direction (azimuthal and polar angles) and the radiance (the corresponding color value on the light probe) of each emitted light ray.
2. Map the radiances of all the light rays to an angular map. Because the light probes are captured at discrete positions, the map shows a distribution of discrete points.
3. Apply a Delaunay triangulation to these points to generate a triangular mesh.
4. Make the radiance of the light rays continuous in the angular distribution by barycentric interpolation within each triangle.
After resampling in the angular dimension, the point light sources are resampled in the spatial dimension to obtain a complete direct light source representation. This resampling is similar to that in the angular dimension. Assume that a light ray is emitted from the light source and that its start position is not at a mesh vertex; it must then start from an edge or the interior of a triangular patch. As the three vertices of that triangular patch have been resampled to be continuous in the angular dimension, the radiances of the three light rays emitted from the three vertices in the same direction as this ray are known. The radiance of the ray can then be solved by barycentric interpolation of the three known light rays. After the resampling of both the angular and spatial dimensions, a dense direct light source representation is established.
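Both the angular and the spatial resampling rely on barycentric interpolation within a triangle. The sketch below shows this interpolation for a 2-D parameterization (for example, the angular map or a patch flattened to its own plane); the function names are illustrative, and the Delaunay triangulation that produces the triangles is assumed to be done elsewhere.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2-D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0.0:
        raise ValueError("degenerate triangle")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_radiance(p, corners, radiances):
    """Blend the RGB radiances stored at the three triangle corners."""
    weights = barycentric_weights(p, *corners)
    return tuple(
        sum(w * radiance[channel] for w, radiance in zip(weights, radiances))
        for channel in range(3)
    )
```

At a corner, the interpolated value equals the radiance stored there, so the resampled representation stays consistent with the measured light rays.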
Since the direct light source representation is created on the reconstructed scene surface, the position and shape of light sources are accurate. The emission and reflection of light ray samples from the light sources are more consistent with the real situation of light transport than the existing planar parameterization. It improves the resampling accuracy of the direction and intensity of light rays.
Indirect Light Representation
Indirect light is the illumination that is not emitted directly from a light source, such as light reflected from walls, furniture, and windows. Its contribution to the illumination of an object is much smaller than that of the light source, but it is an essential part of realistic lighting, especially for rendering details in dark areas. Most light in space is indirect light, so it is infeasible to represent the indirect light in the same way as the direct light source because of the large data volume. A sparse data representation and a fast resampling method for the indirect light are therefore needed to balance rendering quality and time consumption. The indirect light representation is based on retrieval from the sequence of light probes. A kd-tree structure is used to index the light probes sampled in the scene for efficient searching. During rendering, the light probes nearest to the objects to be rendered are quickly found and used to resample light rays.
As illustrated in Fig. 11, the bunny is a virtual object rendered by indirect light. The three nearest light probes are quickly found by a kd-tree-based nearest-neighbor search within a maximum distance. A light ray is a random sample in ray tracing. Unlike in a traditional ray tracing algorithm, the whole scene is regarded as a light source. After the sampled ray is projected to the scene surface at its intersection point, the rays from that intersection point to the three nearest light probes are sampled to calculate the radiance of the sampled ray.
The radiance of a ray sample in ray tracing is solved by a weighted combination,33 in which a weight function defines the contribution of the radiance of each nearby light ray. The weight depends on the positions of the virtual object and the nearby light probes and on the maximum-distance threshold. The result is the radiance of the light ray sample in ray tracing, such as the randomly sampled ray in Fig. 11, and the inputs are the radiances of the light rays captured by the nearby light probes.
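The following sketch illustrates one possible form of such a distance-weighted combination. The linear falloff weight (one minus distance over maximum distance) and the normalization are assumptions made for illustration and are not claimed to be the weight function of Ref. 33. A brute-force search stands in for the kd-tree query to keep the example short; the function name is hypothetical.

```python
import math


def blend_probe_radiance(target, probes, max_distance):
    """Blend RGB radiances of the probes within max_distance of target.

    probes: list of (position, rgb) pairs.
    Assumed weight: w = 1 - d / max_distance, normalized over all
    contributing probes. Returns None if no probe is close enough.
    """
    blended = [0.0, 0.0, 0.0]
    weight_total = 0.0
    for position, rgb in probes:
        distance = math.dist(target, position)
        if distance >= max_distance:
            continue  # Probe is too far away to contribute.
        weight = 1.0 - distance / max_distance
        weight_total += weight
        for channel in range(3):
            blended[channel] += weight * rgb[channel]
    if weight_total == 0.0:
        return None
    return tuple(value / weight_total for value in blended)
```

With a maximum distance of 4, probes at distances 1 and 3 receive weights 0.75 and 0.25, and a probe at distance 5 is ignored. In a full renderer, the linear scan would be replaced by the kd-tree search.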
In this section, some rendering results are illustrated. Custom mental-ray shaders for the direct light source and indirect light are implemented, respectively, and then used in global illumination rendering.
Figure 12 is a comparison of rendering results based on different light field representations. Figure 12(a) is the traditional image-based lighting result illuminated by an environment map. Since the illumination is measured by a single point sample, only infinitely distant light can be simulated to light virtual objects. This approach is only suitable for uniform illumination because spatial variation is missing. By contrast, we show a spatially varying lighting effect in Fig. 12(b). The complicated illumination that is occluded by a bonsai is captured and represented in our light field representation and then used in rendering. As shown, the complicated shadow reveals that our light field representation works well for spatially varying lighting.
Figure 12(c) is rendered in Ref. 14 by an ILF, which is parameterized on a plane. Its illumination is captured with a uniform spatial sampling resolution, and a dense set of light probes is captured to compose the ILF. It shows a realistic lighting effect. Figure 12(d) is our result based on the nonuniform capturing strategy. After assessing the illumination distribution in advance, 160 light probes are sampled in the regions with diverse illumination, e.g., the edges of shadows and spotlighted areas, and only 40 light probes are sampled in the other regions. A realistic lighting effect almost the same as that in Fig. 12(c) is produced with less than one-fourth of the light probes. Moreover, smoother and softer shadow edges are displayed in Fig. 12(d), since more light probes are sampled there than in Fig. 12(c). This comparison figure is also presented in our previous work on nonuniform light field representation.20 The new representation presented in this paper retains the ability to handle spatial variation in illumination.
Figure 13 illustrates the lighting contributions of the direct light source and the indirect light. Direct light sources are extracted from the scene, and sunshine constitutes the main light source. Figure 13(a) is rendered only with indirect light, Fig. 13(b) is lit only by direct light, and Fig. 13(c) is the rendering result combining the light source and the indirect light. The virtual bunny in Fig. 13(b) has a brighter surface and a sharper shadow than in Fig. 13(a). Clearly, the lighting contribution of the indirect light is much smaller than that of the direct light source. However, most details in the dark area can be seen in Fig. 13(a) but not in Fig. 13(b) because of the light source occlusion. This indicates that the indirect light plays an important role in realistic lighting as a light source. Figure 13(c) is the final result of our light field lighting, and it shows a realistic lighting effect.
Figure 14 illustrates the fusion of a virtual object and a real scene with consistent lighting in a spatially varying illumination environment. The illumination is captured and represented on the reconstructed laboratory surface. The virtual bunny (Stanford bunny) is lit by the scene surface light field at different positions and then rendered into the captured scene. The shadow and appearance of the bunny differ noticeably between positions, and they are consistent with the scene illumination. The fusion results demonstrate a realistic lighting effect.
In this paper, a scene surface light field representation is proposed to generate a realistic lighting effect. A complete solution for capturing, processing, representing, and rendering spatially varying illumination in full indoor space is presented. In contrast to the existing work, a nonuniform capturing and calibration strategy is employed to extend the illumination sampling and rendering scope to full indoor space. The captured scene is then reconstructed by Kinect fusion, and a scene surface-based light field is created, which is more consistent with the physical properties of light transport than the existing planar representation. The rendering results based on the scene surface light field illustrate the realistic lighting effect, especially under spatially varying illumination.
The scene surface-based light field is represented on the reconstructed scene model, so the method is suitable for bounded scenes, such as interior spaces. It cannot handle unbounded scenes whose reconstruction is incomplete; for those, the regular planar representation is applicable. In addition, compared with the existing approaches, our method introduces an extra scene reconstruction step, which costs more time. In the future, we plan to optimize the scene reconstruction and the light probe capturing process to improve efficiency.
This paper is supported by the National High Technology Research and Development 863 Program of China under Grant (Nos. 2012AA011801 and 2012AA011803) and National Natural Science Foundation of China (No. 61300066).
Tao Yu received his BS degree from Beijing Institute of Technology and MS degree from Beijing Jiaotong University in 2008 and 2011, respectively. He is currently working toward his PhD at State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing. His research interests include light field-based lighting and HDR imaging.
Zhong Zhou received his BS degree from Nanjing University and PhD degree from Beihang University in 1999 and 2005, respectively. He is an associate professor and PhD adviser at State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China. His main research interests include augmented virtual environment, natural phenomena simulation, distributed virtual environment, and Internet-based VR technologies. He is a member of IEEE, ACM, and CCF.
Jian Hu received his BS degree from Sichuan University and MS degree from Beihang University in 2010 and 2013, respectively. His research interests include light field rendering.
Qinping Zhao is a professor, PhD adviser, the Director of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, and Director-general of Chinese Association for System Simulation. He has published more than 160 papers, 33 national authorized patents, and three books. His research interests mainly lie in virtual reality.