Three-dimensional (3-D) video is a promising technology that can lead the next generation of multimedia services and applications. Research on 3-D video has recently become a hot topic owing to the growing demand for 3-D applications. This trend is expected to continue until the related technologies, i.e., computer graphics, computer vision, video compression, high-speed processing units, high-resolution displays, and cameras, converge.1
The main issue in 3-D video production is how to provide a depth impression with minimal visual fatigue. One popular approach is the use of virtual views generated by a depth-image-based rendering technique.2 Since the depth data describe the distance between the camera and the objects in a scene, multiview images can be generated from them. With the generated multiview images, 3-D displays such as stereoscopic or autostereoscopic displays can provide a better depth impression.
To support such comfortable 3-D viewing, the Moving Picture Experts Group (MPEG) has investigated 3-D video coding technologies that compress video-plus-depth data.3 Through intensive investigation, MPEG experts have developed both depth estimation and view synthesis methods.4 Subsequently, they issued a call for proposals on 3-D video coding techniques.5 In response to the call, many coding tools were proposed for 3-D video coding.6
One of the important problems that researchers currently face in producing 3-D video is the generation of highly accurate depth data.7 Although software-based depth estimation algorithms have been investigated for decades, obtaining precise depth information indirectly in textureless or disoccluded regions is still an ill-posed problem.8 To resolve this problem, a variety of direct depth-sensing devices, such as structured light pattern sensors9 and depth cameras,10 have been developed to generate accurate depth data in real time. However, owing to the high cost of the equipment, these active sensors have been far from practical for producing 3-D applications. Fortunately, since relatively cheap and compact time-of-flight (TOF) cameras were released, many depth capturing methods using the TOF camera have been introduced.
Several camera fusion systems employing a TOF camera have been proposed to capture depth data in real time. Lindner et al.11 and Huhle et al.12 developed a fusion camera system configured with one RGB camera and one TOF camera to reconstruct a 3-D model. Kim et al.13 and Hahne et al.14 employed high-resolution stereo cameras and one TOF camera to improve depth quality. Zhu et al.15 presented a depth calibration method to improve depth accuracy. Lee et al.16 improved the depth quality using a combination of image segmentation and depth estimation. In our previous work, we introduced a framework for 3-D scene capturing with the camera fusion system that covers the whole chain from capturing to multiview rendering.17
Although the TOF camera in the camera fusion system provides real-time depth measurement, it suffers from three kinds of depth errors: depth flickering in the temporal domain, holes in the warped depth map, and the mixed pixel problem. First, temporal depth flickering is induced by non-Lambertian object surfaces; the captured depth values of a static object vary over time. Second, holes in the warped depth map are generated by viewpoint shifting. The camera fusion system usually relies on warped depth data generated by projecting the depth information obtained from the TOF camera onto the image plane of the RGB camera; hence, newly revealed areas have no measured depth values. Third, mixed pixels are generated by false depth measurements around object boundaries.18 If the infrared (IR) ray emitted from the TOF camera hits an object boundary, part of the ray is reflected by the foreground object and the rest by the background. Both reflections are received by the TOF camera, resulting in a mixed measurement; these mixed pixels seriously degrade the quality of the captured depth maps. In this paper, we introduce three depth error reduction methods to improve the accuracy of the captured depth maps.
The rest of this paper is organized as follows. In Sec. 2, the camera fusion system and its depth errors are introduced. In Sec. 3, three error reduction methods are proposed in detail. In Sec. 4, the experimental results are presented. Concluding remarks are given in Sec. 5.
Camera Fusion System and Depth Errors
Camera Fusion System with Depth Camera
As an extension of our previous work in Ref. 17, we configured a camera fusion system using two RGB cameras and one TOF camera, as shown in Fig. 1. The TOF camera is located in the center, with one RGB camera on either side. The center TOF camera captures depth video in real time at a relatively low resolution (200×200 in Table 7), and the two RGB cameras capture high-resolution color images (1280×960 in Table 7) in real time. The first objective of this camera fusion system is to capture two RGB videos and their corresponding depth videos simultaneously. The second objective is to generate multiview images from the captured data. In this work, depth error reduction methods are devised and applied because the quality of the generated multiview images is highly dependent on the accuracy of the depth data.
Figure 2 describes the overall framework of the camera fusion system. Building on the previous work in Ref. 17, the three proposed depth error reduction methods are added. As preprocessing, camera calibration and image rectification are performed beforehand. The first depth error reduction method is a temporal refinement applied when depth is captured by the TOF camera. The refined depth data are warped to the RGB camera positions using 3-D warping. Since 3-D warping generates hole regions, the proposed hole filling method is applied to the warped depth map. Then, the proposed mixed pixel removal compensates for the depth discontinuity regions. After the refined depth map is upsampled,19 the multiview generation part generates 36 views.
Temporal Flickering on Captured Depth Data
The principle of depth sensing with the TOF camera is to measure the arrival time of light reflected by an object in a scene.20 In detail, a light pulse is transmitted by an IR light source, and the range is determined from the round-trip time and the known speed of light. Therefore, the accuracy of the depth measurement is highly dependent on how closely the object’s surface follows Lambertian reflectance. If an object has a non-Lambertian surface, the reflected light pulse can vary over time. This is the main cause of temporal flickering.
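The range computation described above can be sketched in a few lines of Python. This is only an illustration of the pulse round-trip principle, not the camera's firmware: the function name and the example timing value are assumptions, and many TOF cameras estimate the delay from the phase shift of modulated light rather than by timing a single pulse.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact SI value


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the reflecting surface for a measured round-trip time."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the object and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# Hypothetical example: a 20 ns round trip corresponds to roughly 3 m.
distance = tof_distance_m(20e-9)
```

Under this model, a timing error of 1 ns shifts the measured range by about 15 cm, which suggests why small variations in the reflected signal appear as visible depth flickering.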
Figure 3 depicts the temporal flickering of the captured depth data. For the scene captured by the TOF camera in Fig. 3(a), we traced the depth values of three points, as shown in Fig. 3(b). The first check point was set in the middle of the screen, which is very reflective. The second check point was set on the plastic shoe rack. The third check point was set on the closet, which is the most stable point. Neither the camera nor the objects moved during capturing. Figure 3(c) shows the variations of the three points over 100 frames. The first check point showed the strongest flickering, whereas the third check point showed only minor depth variation. This flickering in the captured depth video induces severe visual artifacts in multiview video generation.
Holes in Warped Depth Maps
The depth maps warped from the center view to the left and right views have holes due to viewpoint shifting, as shown in Fig. 4. The holes consist of one-pixel-width holes and wide holes; the former are generated by rounding errors in pixel mapping, and the latter are induced by disoccluded regions at the target viewpoint. Figures 4(b) and 4(c) show wide holes (black regions) around foreground objects. The one-pixel-width holes can be filled by means of a median filter (MF). Wide holes, however, can only be filled by referring to neighboring pixels, since the original depth map at the center contains no referable depth values for them.
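How viewpoint shifting opens wide holes can be illustrated with a toy one-dimensional forward warp. The sketch below is not the 3-D warping used in the system: it assumes rectified views so that pixels move only horizontally, it assumes larger depth levels are nearer to the camera, and the `disparity_of` mapping is hypothetical.

```python
HOLE = None  # marker for a target pixel that received no depth sample


def warp_row(depth_row, disparity_of):
    """Forward-warp one row of a depth map along the horizontal axis.

    depth_row    : list of depth levels (larger = nearer, an assumed convention)
    disparity_of : function mapping a depth level to an integer pixel shift
    """
    width = len(depth_row)
    warped = [HOLE] * width
    for x, d in enumerate(depth_row):
        tx = x + disparity_of(d)
        if 0 <= tx < width:
            # Z-buffer test: when two samples collide, keep the nearer one.
            if warped[tx] is HOLE or d > warped[tx]:
                warped[tx] = d
    return warped


# A foreground object (level 200) in front of a background (level 10).
row = [10, 10, 10, 200, 200, 10, 10, 10]
warped = warp_row(row, lambda d: 2 if d > 100 else 0)
# warped == [10, 10, 10, None, None, 200, 200, 10]
```

In this example the foreground shifts by two pixels and uncovers two background pixels that were never measured from the original viewpoint, so the hole appears directly beside the foreground object.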
A typical hole filling method is image inpainting,21 which uses neighboring pixels and their gradient values. However, it does not consider the geometrical positions of objects; hence, it generates mixed depth values from both foreground and background objects. In the previous work in Ref. 17, the smaller of the valid depth values in the horizontal direction was used to fill holes; we call this horizontal-direction hole filling. However, this method generates inconsistent depth values.
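The horizontal-direction hole filling described above can be sketched as follows. This is a minimal reading of the description, not the code of Ref. 17: the function name and the use of `None` as the hole marker are assumptions.

```python
def fill_holes_horizontal(row, hole=None):
    """Fill each run of holes with the smaller of its two bounding values."""
    filled = list(row)
    n = len(row)
    x = 0
    while x < n:
        if row[x] != hole:
            x += 1
            continue
        start = x
        while x < n and row[x] == hole:
            x += 1  # x stops at the first valid pixel after the run, or at n
        left = row[start - 1] if start > 0 else hole
        right = row[x] if x < n else hole
        candidates = [v for v in (left, right) if v != hole]
        if candidates:
            # The smaller value is taken as the background depth.
            fill = min(candidates)
            for i in range(start, x):
                filled[i] = fill
    return filled


filled = fill_holes_horizontal([10, 10, 10, None, None, 200, 200, 10])
# filled == [10, 10, 10, 10, 10, 200, 200, 10]
```

Because every row is processed independently in this sketch, two vertically adjacent rows can pick different fill values for the same hole region.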
Mixed Pixels around Object Boundary
As mentioned in Sec. 1, mixed pixels in a depth map often arise around object boundaries when the TOF camera is used. Because of false depth sensing, the captured depth map leads to incorrect pixel mapping around object boundaries. Consequently, the edges of the depth map are not aligned with the object boundaries of the color image. Figure 5 shows an example of the mixed pixel problem. The black object in the image has a reflective surface; hence, the captured depth values along its boundaries are inconsistent and unstable, as shown in Fig. 5(b). Therefore, the mixed pixels in the warped depth map should be eliminated before multiview video generation.
Depth Error Reduction Methods
Temporal Enhancement for Depth Flickering
Temporal flickering is inevitable since the TOF camera can neither distinguish the reflectivity of objects nor automatically compensate for depth errors during capturing. Therefore, the goal of the proposed temporal enhancement is to reduce temporal flickering for static objects. For this, the intensity image provided by the TOF camera is utilized. Since the intensity image is not involved in depth measurement, it shows no flickering artifact; a static object under constant illumination has consistent intensity values in the temporal domain. Using this property, the proposed method detects the flickering pixels and refines the depth data using a modified joint bilateral filter (M-JBF).
The input data of the proposed temporal enhancement consist of the current intensity image, the current depth map to be enhanced, and the previously refined depth map, as shown in Fig. 6. First, the flickering depth values are detected by comparing two colocated depth values. In detail, for each pixel, the detector compares the colocated, temporally adjacent pixels in the two depth maps. If the difference is greater than a threshold value th, the depth value of the pixel is set to zero; otherwise, it retains the corresponding depth value, as expressed in Eq. (1).
During this process, the detector generates an alpha map indicating the flickering pixels; the alpha value is 1 for the temporally stable pixels and 0 for the flickering pixels.
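The detection step can be sketched as below. This is an illustrative reading under stated assumptions: the function name is hypothetical, depth maps are nested lists, a stable pixel is assumed to keep its current depth value, and the alpha map is assumed to use 1 for stable and 0 for flickering pixels.

```python
def detect_flicker(current_depth, previous_refined, threshold):
    """Return (cleaned depth map, alpha map) for one frame.

    A pixel whose depth differs from the previously refined depth by more
    than `threshold` is treated as flickering: its depth is set to zero and
    its alpha value is 0. Other pixels keep their depth and get alpha 1.
    """
    cleaned, alpha = [], []
    for cur_row, prev_row in zip(current_depth, previous_refined):
        c_row, a_row = [], []
        for cur, prev in zip(cur_row, prev_row):
            if abs(cur - prev) > threshold:
                c_row.append(0)  # flickering depth value is deleted
                a_row.append(0)
            else:
                c_row.append(cur)  # assumed: the current value is kept
                a_row.append(1)
        cleaned.append(c_row)
        alpha.append(a_row)
    return cleaned, alpha


cleaned, alpha = detect_flicker([[100, 140], [60, 61]],
                                [[101, 100], [60, 90]], threshold=5)
# cleaned == [[100, 0], [60, 0]] and alpha == [[1, 0], [1, 0]]
```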
Next, the enhancement method assigns newly defined depth values to the flickering pixels using M-JBF. Since the flickering depth values have been deleted from the captured depth map, M-JBF determines new depth values for the flickering pixels by referring to neighboring depth values. Let the target pixel to be determined and its referable neighboring pixels all belong to one filter kernel. Then M-JBF defines the new depth value of the target pixel as given in Eq. (2).
The two constants in Eq. (2) are the standard deviations for the intensity term and the range term, respectively; they control the similarity of intensity values and the range of neighboring pixels. In Eq. (2), the flickering pixels are excluded from the weight assignment since their alpha values are zero. Consequently, the proposed M-JBF determines a new depth value for the pixel that is similar to the neighboring depth values.
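A joint bilateral filter with alpha masking of this kind can be sketched as follows. The exact form of the published filter is not reproduced here: the Gaussian weights, the function name `mjbf_pixel`, and all parameter names are assumptions, chosen only to show how an intensity term, a range (spatial distance) term, and the alpha map could combine.

```python
import math


def mjbf_pixel(depth, alpha, intensity, y, x, radius, sigma_i, sigma_r):
    """New depth for pixel (y, x) from stable neighbors inside the kernel."""
    h, w = len(depth), len(depth[0])
    num = den = 0.0
    for qy in range(max(0, y - radius), min(h, y + radius + 1)):
        for qx in range(max(0, x - radius), min(w, x + radius + 1)):
            if alpha[qy][qx] == 0:
                continue  # flickering neighbors contribute no weight
            di = intensity[y][x] - intensity[qy][qx]
            ds2 = (qy - y) ** 2 + (qx - x) ** 2
            weight = (math.exp(-di * di / (2.0 * sigma_i ** 2))
                      * math.exp(-ds2 / (2.0 * sigma_r ** 2)))
            num += weight * depth[qy][qx]
            den += weight
    # Fall back to the existing value when no stable neighbor is available.
    return num / den if den > 0.0 else depth[y][x]


# Hypothetical 3x3 patch: the center pixel was flagged as flickering.
depth = [[100, 100, 20], [100, 0, 20], [100, 100, 20]]
alpha = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
intensity = [[50, 50, 200], [50, 50, 200], [50, 50, 200]]
new_depth = mjbf_pixel(depth, alpha, intensity, 1, 1, 1, 10.0, 1.0)
```

In this patch the center pixel shares its intensity with the left two columns, so the refined depth comes out very close to 100 and is barely influenced by the right column.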
Hole Filling for Warped Depth Data
The hole regions in the warped depth map are invisible in the depth camera view but visible in the RGB camera view. Therefore, the revealed hole regions in the RGB camera view arise around the foreground objects. Given this property, it is reasonable to use background depth values for the hole regions. The proposed hole filling method determines a depth value from the surrounding background depth values while minimizing noisy depth values. One constraint is that the background depth value should be the one closest to the foreground object. Figure 7 describes the flow of the proposed depth hole filling method.
The proposed depth hole filling proceeds from the top-left pixel to the bottom-right pixel in raster scan order. Its inputs are the warped depth map with holes and an alpha map indicating the holes. If the current pixel is a hole, its alpha value is set to 0; otherwise, its alpha value is set to 1. For a hole pixel, the hole filling method determines a virtual depth value by selecting the minimum value among the neighboring depth values in a certain window.
This virtual depth value distinguishes the foreground’s depth values among referable depth pixels.
Next, using the virtual depth value and the neighboring depth values, the proposed modified bilateral filter (M-BF) determines a new depth value for the hole pixel. M-BF determines the depth value for a hole at the center of the window as a weighted combination of the neighboring depth values. The second term of Eq. (6) is the range term, which considers the distance between the current pixel and the neighboring pixels.
Note that the proposed M-BF determines the hole’s depth value using the weighted averaging via the background depth value. The hole filled depth values are similar to the background depth value without abrupt depth change.
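The two steps above, choosing a virtual background depth and then averaging around it, can be sketched as follows. This is one possible interpretation, not the published filter: the Gaussian forms, the in-place reuse of already filled pixels during the raster scan, and the names `fill_holes_mbf`, `sigma_d`, and `sigma_s` are all assumptions.

```python
import math


def fill_holes_mbf(depth, alpha, radius, sigma_d, sigma_s):
    """Fill holes (alpha == 0) in raster scan order from valid neighbors."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    valid = [row[:] for row in alpha]
    for y in range(h):
        for x in range(w):
            if valid[y][x] == 1:
                continue
            neigh = [(qy, qx)
                     for qy in range(max(0, y - radius), min(h, y + radius + 1))
                     for qx in range(max(0, x - radius), min(w, x + radius + 1))
                     if valid[qy][qx] == 1]
            if not neigh:
                continue  # nothing to refer to yet; leave the hole
            # Virtual depth: the minimum (assumed background) neighbor value.
            virtual = min(out[qy][qx] for qy, qx in neigh)
            num = den = 0.0
            for qy, qx in neigh:
                dd = out[qy][qx] - virtual
                ds2 = (qy - y) ** 2 + (qx - x) ** 2
                weight = (math.exp(-dd * dd / (2.0 * sigma_d ** 2))
                          * math.exp(-ds2 / (2.0 * sigma_s ** 2)))
                num += weight * out[qy][qx]
                den += weight
            if den > 0.0:
                out[y][x] = num / den
                valid[y][x] = 1  # assumed: filled pixels become referable
    return out


filled = fill_holes_mbf([[10, 0, 0, 200]], [[1, 0, 0, 1]], 1, 5.0, 1.0)
```

In this example both holes end up very close to the background level 10, because the foreground neighbor at level 200 is far from the virtual depth and receives a negligible weight.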
Mixed Pixel Removal
As mentioned in Sec. 1, the mixed pixel problem causes depth discontinuities to be misaligned with object boundaries in color images. In multiview video generation, this problem generates boundary noise.22 As shown in Fig. 8, the proposed mixed pixel removal consists of three main steps: edge extraction, mixed pixel detection, and adaptive joint multilateral filtering.
In edge extraction, two edge maps are obtained via Canny edge detection, one from the warped and hole-filled depth map and one from its corresponding color image. Subsequently, mixed pixels are defined as follows: (1) for an edge pixel in one edge map, an edge pixel in the other edge map is chosen if it lies within the kernel centered on the first pixel, and (2) a mixed pixel is defined as a pixel between these two edge pixels. Figure 9 illustrates the procedure of mixed pixel detection.
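The detection rule can be sketched on binary edge maps as below. The sketch makes several simplifying assumptions that are not in the text: the edge maps are given as 0/1 nested lists (computing them with Canny edge detection is left out), the search starts from depth edge pixels, the matching color edge is searched only along the same row, and "between" is taken to exclude both edge pixels.

```python
def detect_mixed_pixels(depth_edges, color_edges, radius):
    """Mark pixels lying between a depth edge and the nearest color edge."""
    h, w = len(depth_edges), len(depth_edges[0])
    mixed = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not depth_edges[y][x]:
                continue
            # Nearest color edge pixel in the same row within the kernel.
            best = None
            for qx in range(max(0, x - radius), min(w, x + radius + 1)):
                if color_edges[y][qx]:
                    if best is None or abs(qx - x) < abs(best - x):
                        best = qx
            if best is None:
                continue  # no color edge nearby, so nothing is flagged
            lo, hi = sorted((x, best))
            for mx in range(lo + 1, hi):
                mixed[y][mx] = 1
    return mixed


mixed = detect_mixed_pixels([[0, 0, 1, 0, 0, 0, 0, 0]],
                            [[0, 0, 0, 0, 0, 1, 0, 0]], radius=3)
# mixed == [[0, 0, 0, 1, 1, 0, 0, 0]]
```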
To determine a new depth value for the mixed pixels, adaptive joint multilateral filtering (A-JMF) is proposed. A mixed pixel is replaced by the depth value calculated by weighted averaging over its neighboring pixels within the filter kernel.
Formally, the new depth value at a pixel is computed by A-JMF as given in Eq. (7). In Eq. (7), A-JMF is carried out if and only if the pixel is flagged as a mixed pixel; otherwise, the original depth value is assigned directly.
The filter weight of A-JMF in Eq. (7) is defined by Eq. (8).
By modeling them with the Gaussian function, its three component functions are represented by Eq. (9).
In particular, the proposed A-JMF excludes mixed pixels when calculating the new depth value. For this, a scaling factor in the filter weight controls the degree of reliability of each neighboring pixel; the closer a neighboring pixel is to a mixed pixel, the lower its degree of reliability. The scaling factor is represented by Eq. (10). In Eq. (10), when the neighboring pixel coincides with a mixed pixel, its degree of reliability is the lowest. Hence, setting the scaling factor to zero leads to the corresponding weight being zero. Therefore, the mixed pixel value is not used in the calculation of the new depth value. When the neighboring pixel does not coincide with a mixed pixel, its degree of reliability is determined by the distance between the two.
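The core idea, replacing only mixed pixels and giving zero weight to mixed neighbors, can be sketched with a simplified joint filter. This is not A-JMF itself: it uses a grayscale guide image, only a spatial and a color Gaussian term, and a binary reliability (0 for mixed neighbors, 1 otherwise) in place of the distance-dependent scaling factor. All names are hypothetical.

```python
import math


def filter_mixed_pixels(depth, gray, mixed, radius, sigma_s, sigma_c):
    """Replace each mixed pixel by a weighted average of reliable neighbors."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if not mixed[y][x]:
                continue  # non-mixed pixels keep their depth value
            num = den = 0.0
            for qy in range(max(0, y - radius), min(h, y + radius + 1)):
                for qx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if mixed[qy][qx]:
                        continue  # reliability 0: mixed values are not used
                    ds2 = (qy - y) ** 2 + (qx - x) ** 2
                    dc = gray[y][x] - gray[qy][qx]
                    weight = (math.exp(-ds2 / (2.0 * sigma_s ** 2))
                              * math.exp(-dc * dc / (2.0 * sigma_c ** 2)))
                    num += weight * depth[qy][qx]
                    den += weight
            if den > 0.0:
                out[y][x] = num / den
    return out


# The mixed pixel (depth 120) looks like the foreground in the guide image.
result = filter_mixed_pixels([[200, 200, 120, 20, 20]],
                             [[250, 250, 250, 30, 30]],
                             [[0, 0, 1, 0, 0]], 2, 1.0, 10.0)
```

Here the mixed pixel is pulled to the foreground depth of about 200 because its guide intensity matches the foreground side, which moves the depth edge onto the color edge.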
Experimental Results and Discussion
The proposed depth error compensation methods are designed for capturing high-quality depth maps with the camera fusion system shown in Fig. 1. Therefore, each method was applied to the captured depth maps, and the results were compared with those of conventional methods. After demonstrating the refined depth maps, we conducted additional experiments with multiview video sequences that are distributed with corresponding depth data; we compared the refined depth data with the original data. In addition, we measured the processing time of each method to evaluate its complexity. The simulation system consisted of an Intel Core processor, 8 GB of DDR2 RAM, and 64-bit Windows 7. A fixed window size was used for all filters. The three standard deviations of the filters were set to 0.1, 0.1, and 0.5.
Results on Temporal Enhancement
Since the temporal flickering pixels in the captured depth map have characteristics similar to noise, we applied typical noise filters to the depth map for evaluation. First, an MF and a bilateral filter (BF)23 were applied. Second, the joint bilateral filter (JBF),24 which uses the color image to enhance depth values around object boundaries, was used. Third, temporal depth filtering (TDF) with structural similarity (SSIM; TDF + SSIM)25 was performed, which uses an SSIM measure to suppress transition depth errors by considering color variation. The fourth filter is the combination of JBF and a Kalman filter (KF; JBF + KF).26 Since KF tracks the previous state of objects, the temporal consistency can be improved.
To compare the performance of each method, the five methods above and the proposed method were applied to the depth data captured by the depth camera, as shown in Fig. 3. We traced the variation of the depth values over 100 frames at the point that flickers most in Fig. 3(b). In addition, the processing time of each method was compared since the capturing complexity is important in the camera fusion system. The resolution of the depth map was 200×200, as listed in Table 7.
Figure 10 shows the resulting depth values. As can be seen, the proposed method stabilized the depth values over time. Table 1 lists the average depth value and standard deviation of each method at the point. The proposed method showed the closest average and the smallest standard deviation. Moreover, the proposed method ran fast enough for real-time processing. Since the proposed M-JBF is derived from JBF, its processing speed was similar to those of BF and JBF.
Table 1 Analysis of depth value and processing speed.
In addition, the results of enhanced depth map for the target object are shown in Fig. 11. To compare the depth errors clearly, all depth maps were converted to color images with a gray-to-RGB conversion method. As can be seen in Fig. 11(b), the depth values of the black screen are spatially inconsistent, whereas the proposed method, as shown in Fig. 11(h), suppressed depth errors and generated spatially consistent depth values. Although JBF and JBF + KF methods generated spatially consistent depth values, the depth values around object boundaries were smoothed severely.
For an objective evaluation, we applied the temporal enhancement methods to multiview video sequences that contain corresponding depth data, i.e., Book_arrival,27 Breakdancers, and Ballet.28 To produce noisy depth data, we added zero-mean Gaussian noise with a standard deviation of 20 to the region of interest of each frame. We then measured the average standard deviation over the whole region and the processing time in milliseconds for 100 frames, as presented in Tables 2 and 3. Importantly, the proposed method yielded smaller standard deviations than the other methods. In terms of processing speed, the proposed method is more complex than the MF, the BF, and the JBF. However, it remains competitive since it suppresses noisy depth values efficiently.
Table 2 Average standard deviation values for 100 frames.
Table 3 Average processing speed of the temporal compensation methods.
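The evaluation protocol above, adding Gaussian noise and measuring how much each pixel varies over time, can be sketched as follows. The sketch is an assumption-laden illustration: the function names are hypothetical, the noise is applied to the whole frame rather than a region of interest, values are clipped to the 8-bit range, and the population standard deviation is used.

```python
import random
import statistics


def add_gaussian_noise(frame, sigma, rng):
    """Return a noisy copy of a depth frame, clipped to [0, 255]."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in frame]


def mean_temporal_std(frames):
    """Average, over all pixels, of each pixel's standard deviation in time."""
    h, w = len(frames[0]), len(frames[0][0])
    stds = [statistics.pstdev([f[y][x] for f in frames])
            for y in range(h) for x in range(w)]
    return sum(stds) / len(stds)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
clean = [[128.0] * 4 for _ in range(4)]
noisy_frames = [add_gaussian_noise(clean, 20.0, rng) for _ in range(100)]
score = mean_temporal_std(noisy_frames)  # close to 20 for unfiltered noise
```

A temporal filter that works well on a static scene should drive this score toward zero, which is how the standard deviations in Table 2 can be read.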
To evaluate the subjective quality, we compared the reconstructed depth maps of the Breakdancers sequence. Figures 12(a) and 12(b) show the original color image and the corresponding depth data with noise (noisy depth, ND), respectively. The results of MF and TDF + SSIM still contain noise in the depth map, whereas BF, JBF, JBF + KF, and the proposed method removed the noise efficiently because those filters are based on BF. From the depth maps alone, it is hard to distinguish the improvement of the proposed method over the other BF-based methods. However, as the results in Table 2 indicate, the proposed method suppresses temporally flickering depth values more efficiently than the other methods.
Results on Depth Map Hole Filling
With the proposed hole filling method, holes in a warped depth map are filled with the M-BF as represented in Eq. (5). Figure 13 presents an example of hole filling. Using the 3-D warping technique, the depth map at the center is shifted to the viewpoints of the RGB cameras. In the depth map warped to the left view in Fig. 13(b), holes are generated on the left side of the foreground object. Since the mixed pixels around object boundaries generate pepper-like noise, we applied the MF before the hole-filling process, as shown in Fig. 13(c). As a result, the hole regions became clear. Figure 13(d) shows the resulting hole-filled depth map. As can be seen, the holes are completely filled while the shape of the foreground object is preserved.
To evaluate the proposed hole-filling method, we tested two other hole-filling methods. The first is the inpainting method,21 which is designed for reconstructing a damaged image. This method has recently been employed for filling holes in virtual view generation.4 The second method is horizontal hole filling (HOR),17 which fills each hole with the lower of the leftmost and rightmost available depth values bounding the hole. Figure 14 shows a comparison of the resulting depth maps. The inpainting method generated false depth values; the shape of the object was distorted. The second method, HOR, generated depth errors between the fingers. In contrast, the proposed method showed the best results in that the shape of the foreground object was preserved and the holes were successfully filled with the background depth values.
For the objective evaluation of the hole-filling method, we tested four multiview video sequences provided by the MPEG 3-D video group5: Newspaper, Book_arrival, Balloons, and Undo_dancer. Using the 3-D warping technique, we obtained viewpoint-shifted depth maps with holes and applied the three hole-filling methods. Then, we calculated the peak signal-to-noise ratio (PSNR) values of the hole-filled depth maps. Given the original depth map and the hole-filled depth map, the mean squared error between them is defined in Eq. (11).
The PSNR value of the hole-filled depth map is then defined from this error in Eq. (12).
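As an illustration, the standard PSNR computation for 8-bit maps can be sketched as below. It assumes the usual definition, PSNR = 10 log10(peak^2 / MSE) with the mean squared error taken over all pixels and a peak value of 255; whether the paper averages over the whole map or only over the hole regions is not stated here, and the function names are hypothetical.

```python
import math


def mean_squared_error(reference, test):
    """Mean squared difference between two equally sized depth maps."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr_db(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical maps."""
    mse = mean_squared_error(reference, test)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


original = [[10, 10], [200, 200]]
filled = [[10, 11], [200, 200]]
value = psnr_db(original, filled)  # one pixel off by one level
```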
Table 4 lists the PSNR values of the hole-filled depth maps calculated with Eq. (12). Overall, the PSNR values of the proposed method were higher than those of the other methods. Compared with the inpainting method, the proposed method improved the quality by as much as 3.83 dB for the Undo_dancer sequence. Table 5 shows the processing speed of each method for 100 frames. Overall, the proposed method was faster than the inpainting method and slower than the HOR method. This is because HOR searches for a reference depth value only in the horizontal direction, whereas the proposed method uses window-based filtering. In particular, the processing speed for the Newspaper sequence was about 14 fps because of its wide hole regions.
Table 4 Comparisons of PSNR values of the hole-filled depth maps.
|Test data||Hole depth map (dB)||Inpainting (dB)||HOR (dB)||Proposed (dB)|
Table 5 Processing speed of depth map hole filling.
For the subjective evaluation, we compared the hole-filled depth maps, as shown in Fig. 15. Figures 15(a) and 15(b) show the color image and its corresponding depth map at the target view (view 3), respectively. Figure 15(c) is the depth map with holes warped from the reference view (view 1). Figure 15(d) is the hole-filled depth map obtained with the inpainting method, in which depth values have expanded toward the background. Figure 15(e) is the resulting depth map of the HOR method, which has depth errors marked with a red circle. Figure 15(f) is the result of the proposed method, which shows neither expanded depth values nor depth errors around the foreground objects.
Results on Mixed Pixel Removal
After the holes in the warped depth map are filled, the proposed mixed pixel removal is applied. To show the performance of the proposed method, we captured a scene with a highly reflective object, as shown in Fig. 16. The black object in the color image, Fig. 16(a), is made of a highly reflective material; thus, its depth map, Fig. 16(b), has distorted depth values. The proposed method detected the mixed pixels, as shown in Fig. 16(c). Figure 16(d) shows the final depth map, in which the depth values along the object boundary are well aligned.
Additionally, the proposed method was applied to 12 ground truth depth maps provided by the Middlebury stereo data sets: art, barn, bull, cone, dolls, laundry, map, moebius, reindeer, rocks, sawtooth, and venus.29 To produce distorted depth maps from the ground truth data, we artificially added Gaussian noise with a standard deviation of 20 along object boundaries. The proposed method was compared with JBF (Ref. 24) and JMF (Ref. 30). For objective evaluation, the quality of the output depth maps was measured by the bad pixel rate in nonocclusion regions with respect to the ground truth data. The measure takes the ground truth depth data, the reconstructed depth data, and an alpha map representing the occlusion region, in which 0 indicates an occluded pixel and 1 indicates a nonocclusion pixel. The bad pixel rate is defined in Eq. (13).
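A bad pixel rate of this kind can be sketched as follows. The tolerance `tol` and the choice to normalize by the number of nonocclusion pixels are assumptions, since the exact definition is not reproduced here; the function name is hypothetical.

```python
def bad_pixel_rate(ground_truth, reconstructed, alpha, tol=1.0):
    """Percentage of nonocclusion pixels whose depth error exceeds `tol`."""
    bad = 0
    total = 0
    for gt_row, rec_row, a_row in zip(ground_truth, reconstructed, alpha):
        for gt, rec, a in zip(gt_row, rec_row, a_row):
            if a != 1:
                continue  # occluded pixels are excluded from the measure
            total += 1
            if abs(gt - rec) > tol:
                bad += 1
    return 100.0 * bad / total if total else 0.0


rate = bad_pixel_rate([[10, 20], [30, 40]],
                      [[10, 25], [30, 90]],
                      [[1, 1], [1, 0]])
# One of the three nonocclusion pixels is bad, so the rate is about 33.3%.
```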
Figure 17 shows the comparison of bad pixel rates for the test depth maps. The average bad pixel rates of JBF, JMF, and the proposed method were 13.46%, 12.89%, and 10.59%, respectively. The average bad pixel rate of the proposed method was therefore lower than those of JBF and JMF by 2.87 and 2.30 percentage points, respectively. Figures 18(a)–18(c) show the color image, the ground truth depth map, and the artificially distorted depth map of cone. Figures 18(d)–18(f) show the output depth maps of JBF, JMF, and the proposed A-JMF method. Since the proposed method filters mixed pixels based on the degree of reliability, the depth data on the edges of cone become sharp while the mixed pixels are removed. Further improvement is expected when the parameters of the proposed filter are optimized for each sequence.
Table 6 presents the processing speed of the mixed pixel removal. For this, we took 100 frames of the warped and hole-filled depth data captured by the camera fusion system, as shown in Fig. 16. The proposed method was the slowest. This is because the proposed method employs additional steps, such as edge detection and mixed pixel detection, on top of the filtering. Reducing this high complexity is a challenge for our future work; GPU programming is one possible solution.
Table 6 Processing speed of mixed pixel removal.
|Method||JBF||JMF||Proposed|
|Processing speed (ms)||44.65||34.27||116.28|
Generated Multiview Videos using Depth Data
The error-reduced depth data are utilized for the multiview generation process. After the mixed pixel removal, the depth maps are upsampled to the resolution of the color image via the multistep depth upsampling method.30 Figure 19 shows the multiview images generated from two color images and their depth data. The top-left image is the original left image, and the bottom-right image is the original right image. As can be seen in Fig. 19, the foreground object, a book, moves from right to left as the viewpoint is shifted from 0 to 35. The black regions are the disocclusions caused by viewpoint shifting.
Table 7 presents the processing speed of each part of the proposed system. As described above, the mixed pixel removal was the slowest of the proposed methods. In addition, depth upsampling and multiview generation consumed most of the computing power. Like the mixed pixel removal, efficient depth upsampling and multiview generation are important issues in a 3-D video system, but they are outside the scope of this work. In future work, we plan to develop fast multiview generation and hole filling on a graphics processing unit (GPU).
Table 7 Overall processing speed of each process.
|Parts||Process||Image resolution||Processing time (ms)||Frame rate (fps)|
|Color image||Capturing from RGB camera||1280×960||31.29||31.96|
|Depth map||Capturing from depth camera||200×200||27.87||35.88|
|Depth map||Depth temporal enhancement||200×200||22.99||43.50|
|Depth map||3-D depth warping||320×240||16.43||60.85|
|Depth map||Depth hole filling||320×240||12.91||77.46|
|Depth map||Mixed pixel removal||320×240||116.28||8.60|
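The two numeric columns of Table 7 appear to be related by fps = 1000 / (time in ms), so the last column can be read as a frame rate. The short sketch below checks that reading against three rows of the table; the function name is hypothetical and the relation is an inference from the numbers, not a statement from the text.

```python
def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame processing time in milliseconds to frames/s."""
    if ms_per_frame <= 0:
        raise ValueError("processing time must be positive")
    return 1000.0 / ms_per_frame


# (process, time in ms, tabulated second value) taken from Table 7
rows = [
    ("Capturing from RGB camera", 31.29, 31.96),
    ("Depth hole filling", 12.91, 77.46),
    ("Mixed pixel removal", 116.28, 8.60),
]
for name, ms, tabulated in rows:
    print(f"{name}: {frames_per_second(ms):.2f} fps (table: {tabulated})")
```

Under this reading, the mixed pixel removal at roughly 8.6 fps is the only listed stage that falls well below the capture rate of the cameras.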
In this paper, we have proposed three methods to reduce depth errors in the depth maps captured by the camera fusion system. We performed temporal refinement of the depth map captured by the depth camera, hole filling after depth map warping, and mixed pixel removal after hole filling. The temporal refinement method stabilized the depth values of static objects, the hole filling method determined depth values by referring to the background depth values, and the mixed pixel removal aligned depth discontinuities with the object boundaries of the color image. The experimental results showed that the processed depth map had less depth flickering and fewer depth errors after warping. The hole-filled and boundary-aligned depth map helps to generate more pleasing multiview images.
This work is supported in part by the project on “A Development of Interactive Wide Viewing Zone SMV Optics of 3D Display” of the MKE, and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (Grant No. 2012-0009228).
A. Kubotaet al., “Multiview imaging and 3DTV—special issue overview and introduction,” IEEE Signal Process. Mag. 24(6), 10–21 (2007).ISPRE61053-5888http://dx.doi.org/10.1109/MSP.2007.905873Google Scholar
C. Fehn, “Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV,” Proc. SPIE 5291, 93–104 (2004).PSISDG0277-786Xhttp://dx.doi.org/10.1117/12.524762Google Scholar
ISO/IEC JTC1/SC29/WG11, “Vision on 3D video,” in MPEG Output Document, N10357 (2009).Google Scholar
ISO/IEC JTC1/SC29/WG11, “Report on experimental framework for 3D video coding,” in MPEG Output Document, N11631 (2010).Google Scholar
ISO/IEC JTC1/SC29/WG11, “Call for proposals on 3D video coding technology,” in MPEG Output Document, N12036 (2011).Google Scholar
ISO/IEC JTC1/SC29/WG11, “Overview of 3DV coding tools proposed in the CfP,” in MPEG Output Document, N12348 (2011).Google Scholar
R. LarsenE. BarthA. Kolb, “Special issue on time-of-flight camera based computer vision,” Comput. Vis. Image Underst. 114, 1317 (2010).CVIUF41077-3142http://dx.doi.org/10.1016/j.cviu.2010.10.001Google Scholar
D. ScharsteinR. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47, 7–42 (2002).IJCVEQ0920-5691http://dx.doi.org/10.1023/A:1014573219977Google Scholar
D. ScharsteinR. Szeliski, “High-accuracy stereo depth maps using structured light,” in IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn., pp. I/195–I/202 (2003).Google Scholar
M. LindnerA. KolbK. Hartmann, “Data-fusion of PMD-based distance-information and high-resolution RGB-images,” in Int. Symp. on Signals, Circuits & Systems (ISSCS), pp. 121–124 (2007).Google Scholar
M. Lambooijet al., “Visual discomfort, and visual fatigue of stereoscopic displays: a review,” J. Imaging Sci. Technol. 53, 0302011–03020114 (2009).JIMTE61062-3701http://dx.doi.org/10.2352/J.ImagingSci.Technol.2009.53.3.030201Google Scholar
Y. M. Kim et al., “Design and calibration of a multi-view TOF sensor fusion system,” in IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn., pp. 1–7 (2008).
L. M. J. Meesters, W. A. Ijsselsteijn, and P. J. H. Seuntiens, “A survey of perceptual evaluations and requirements of three-dimensional TV,” IEEE Trans. Circuits Syst. Video Technol. 14, 381–391 (2004). http://dx.doi.org/10.1109/TCSVT.2004.823398
J. Zhu et al., “Fusion of time-of-flight depth and stereo for high accuracy depth maps,” in Comput. Vision Pattern Recogn. (CVPR), pp. 1–8 (2008).
E. K. Lee and Y. S. Ho, “Generation of multi-view video using a fusion camera system for 3D displays,” IEEE Trans. Consum. Electron. 56, 2797–2805 (2010). http://dx.doi.org/10.1109/TCE.2010.5681171
C. Lee et al., “3D scene capturing using stereoscopic cameras and a time-of-flight camera,” IEEE Trans. Consum. Electron. 57, 1370–1376 (2011). http://dx.doi.org/10.1109/TCE.2011.6018896
R. L. Larkins et al., “Surface projection for mixed pixel restoration,” in Int. Conf. Image Vision Comput., pp. 431–436 (2009).
J. Kopf et al., “Joint bilateral upsampling,” in Proc. of the SIGGRAPH Conf., ACM Trans. on Graphics (2007).
T. Moeller et al., “Robust 3D measurement with PMD sensors,” Technical report, PMDTec (2005).
A. Telea, “An image inpainting technique based on the fast marching method,” J. Graph. Tools 9, 25–36 (2004).
C. Lee and Y. S. Ho, “Boundary filtering on synthesized views of 3D video,” in Int. Conf. on Future Generation Communication and Networking, pp. 15–18 (2008).
C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in IEEE Int. Conf. Comput. Vision, pp. 839–846 (1998).
O. P. Gangwal and R. P. Berretty, “Depth map post-processing for 3D-TV,” in IEEE Int. Conf. Consum. Electron., pp. 1–2 (2009).
D. Fu, Y. Zhao, and L. Yu, “Temporal consistency enhancement on depth sequences,” in Picture Coding Symp., pp. 342–345 (2010).
M. Camplani and L. Salgado, “Adaptive spatio-temporal filter for low-cost camera depth maps,” in IEEE Int. Conf. Emerg. Sig. Processing, pp. 33–36 (2012).
ISO/IEC JTC1/SC29/WG11, “HHI test material for 3D video,” in MPEG Input Document, m15413 (2008).
Microsoft Research 3D Video Download, http://research.microsoft.com/en-us/um/people/sbkang/3dvideodownload/.
A. K. Riemens et al., “Multi-step joint bilateral depth upsampling,” Proc. SPIE 4298, 48–55 (2009).
Cheon Lee received his BS degree in electronic engineering and avionics from Korea Aerospace University (KAU), Korea, in 2005 and MS and PhD degrees in information and communication engineering from the Gwangju Institute of Science and Technology (GIST), South Korea, in 2007 and 2013, respectively. His research interests include digital signal processing, video coding, data compression, 3-D video coding, 3-D television and realistic broadcasting, and camera fusion systems with depth cameras.
Sung-Yeol Kim received his BS degree in information and telecommunication engineering from Kangwon National University, South Korea, in 2001, and MS and PhD degrees in information and communication engineering from the Gwangju Institute of Science and Technology (GIST), South Korea, in 2003 and 2008, respectively. From 2009 to 2011, he was with the Imaging, Robotics, and Intelligent System Lab at The University of Tennessee at Knoxville (UTK), USA, as a research associate. His research interests include digital image processing, depth image-based modeling and rendering, computer graphic data processing, and 3DTV and realistic broadcasting.
Byeongho Choi received his BS and MS degrees in electronic engineering from the University of Hanyang, Republic of Korea, in 1991 and 1993, respectively. From 1993 to 1997, he worked for LG Electronics Co. Ltd as a junior researcher. In 1997, he joined the Korea Electronics Technology Institute (KETI), where he was involved in the development of multiview video, stereo vision, and other video systems. He is currently a managerial researcher at the SoC Research Center. He is also pursuing a PhD degree in the Department of Image Engineering at Chung-Ang University. His research interests include digital image processing and its applications, especially 3DTV and stereo vision systems.
Yong-Moo Kwon received his PhD from Hanyang University in 1992. He is now a principal researcher at the Imaging Media Research Center at the Korea Institute of Science and Technology (KIST). He has done research in the areas of image-processing technology, multimedia databases, virtual heritage technology, and 3-D media technology. Currently, he is investigating tangible social media technology. The key research issues include real-time gaze tracking for application to human-computer interaction, image-based 3-D modeling, tangible and social media platforms, and networked collaboration.
Yo-Sung Ho received both BS and MS degrees in electronic engineering from Seoul National University (SNU), Korea, in 1981 and 1983, respectively, and a PhD degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990. He joined the Electronics and Telecommunications Research Institute (ETRI), Korea, in 1983. From 1990 to 1993, he was with Philips Laboratories, Briarcliff Manor, New York, where he was involved in the development of the advanced digital high-definition television (AD-HDTV) system. In 1993, he rejoined the technical staff of ETRI and was involved in the development of the Korea direct broadcast satellite (DBS) digital television and high-definition television systems. Since 1995, he has been with the Gwangju Institute of Science and Technology (GIST), where he is currently a professor in the Information and Communications Department. His research interests include digital image and video coding, advanced coding techniques, 3-D television, and realistic broadcasting.