Edge-preserving down/upsampling for depth map compression in high-efficiency video coding
Optical Engineering, 52(7), 071509 (2013). doi:10.1117/1.OE.52.7.071509
Abstract
An efficient down/upsampling method for compressing a depth map within the high-efficiency video coding (HEVC) framework is presented. A novel edge-preserving depth upsampling method is proposed that uses both texture and depth information. We take into account the edge similarity between depth maps and their corresponding texture images, as well as the structural similarity among depth maps, to build a weight model. Based on this weight model, the optimal minimum mean square error upsampling coefficients are estimated from the local covariance coefficients of the downsampled depth map. The upsampling filter is combined with HEVC to increase coding efficiency. The objective results demonstrate that we achieve a maximum bit rate saving of 32.2% compared to the full-resolution method and 27.6% compared to a competing depth down/upsampling method on the depth bit rate. The subjective evaluation shows that our proposed method achieves better quality in synthesized views than existing methods do.

1. Introduction

The research and development in three-dimensional (3-D) video are capturing the attention of the research community, application developers, and the game industry. Many interesting applications of 3-D video, such as 3-D television (3DTV), free-viewpoint television, 3-D cinema, gesture recognition systems, and other consumer electronics products, have been developed. An attractive 3-D video representation is the multiview video plus depth (MVD) format,1 which allows numerous viewing angles to be rendered from only two to three given input views. However, MVD results in a vast amount of data to be stored or transmitted, and efficient compression techniques for MVD are vital for achieving a high-quality 3-D visual experience with constrained bandwidth. In addition, the MVD format allows an arbitrary number of intermediate views to be generated with low-cost depth image-based rendering2 techniques, but the quality depends on the accuracy of the depth maps.3,4 Thus, in this article, we concentrate on the compression of depth information in the MVD format.

The new video coding standard for high-efficiency video coding (HEVC)5 is now being finalized with a primary focus on efficient compression of monoscopic video. Preliminary results have already demonstrated that this new standard provides the same subjective quality at 50% of the bit rate of H.264/AVC High Profile. Recently, JCT-3V has been formed for the development of new 3-D standards, including extensions of HEVC. Since depth maps generally have more spatial redundancy than natural images, depth down/upsampling can be combined with the HEVC framework to increase coding efficiency. Several works have proposed compressing a downsampled depth map at the encoder in the H.264/AVC framework.6–9 MPEG 3DV experiments also demonstrate that this down/upsampling-based depth coding approach can improve depth map coding efficiency.10 At the same time, the 3D-AVC Test Model11 successfully exploits the possibility of subsampling depth data by a factor of 2, which substantially increases compression efficiency. Since the quality of the synthesized views depends on the accuracy of the depth map information, depth coding-induced distortion affects not only the depth quality but also the synthesized view quality. Therefore, the depth down/upsampling method at the decoder needs to be carefully designed to guarantee synthesized view quality.

Classical techniques, such as pixel repetition and bilinear or bicubic interpolation, cause jagged boundaries, blurred edges, and annoying artifacts around edges. The bilateral filter is a widely used edge-preserving filtering technique, in which the filter weights are selected as a function of a photometric similarity measure of the neighboring pixels. In addition, a joint bilateral filter12 has been proposed that uses auxiliary information from high-resolution images, which is beneficial for edge preservation. The concepts of the bilateral and joint bilateral filter have been used for in-loop filtering13–15 and postfiltering16–18 of reconstructed depth images. Liu et al.15 designed a joint trilateral in-loop filter to reconstruct the depth map that takes into account both the similarity among depth samples and that among corresponding texture pixels. Wildeboer et al.16 proposed a joint bilateral upsampling algorithm that utilizes the high-resolution texture video in the process of depth upsampling; they calculated a weight cost based on pixel positions and intensity similarities. Ekmekcioglu et al.17 proposed an adaptive depth map upsampling algorithm guided by the corresponding color image in order to obtain coding gain while maintaining the quality of the synthesized view. Recently, Schwarz et al.18 introduced an adaptive depth filter utilizing edge information from the texture video to improve HEVC efficiency. However, texture-assisted joint bilateral filtering of depth images suffers from the texture copy problem. Edge-directed interpolation techniques recover sharp edges while suppressing pixel jaggedness and blurring artifacts by imposing accurate source models. Li and Orchard19 proposed a new edge-directed interpolation (NEDI) algorithm for natural images, which exploits image geometric regularity by using the covariance of a low-resolution image to estimate that of a high-resolution image. Asuni and Giachetti20 improved the stability of NEDI by using edge segmentation. Zhang et al.21 estimated the low-resolution covariance adaptively with an improved nonlocal edge-directed interpolation. Since NEDI needs a relatively large window to compute the covariance matrix for each missing sample, it may introduce spurious artifacts in local structures due to nonstationary structures and result in incorrect covariance estimates.

Preserving the edges of depth maps is important for improving the synthesized view quality. This article proposes a novel edge-preserving depth upsampling method for down/upsampling-based depth coding using both the texture and depth information. The optimal minimum mean square error (MMSE) upsampling coefficients are estimated from the local covariance matrix of the downsampled depth map. By using an adaptive weight model, which takes into account both the structural similarity within the depth map and the edge similarity between the depth map and its corresponding texture image, our proposed method is capable of suppressing artifacts caused by the different geometry structures in a local window.

The remainder of this article is organized as follows. Section 2 describes the depth map coding framework and details the proposed down- and upsampling algorithms. Section 3 presents experimental results and comparative studies, and Sec. 4 concludes the article.

2. Proposed Method

Figure 1 shows the framework of the proposed depth map encoder and decoder based on an HEVC codec. We utilize the efficiency of HEVC and concentrate on depth down/upsampling to increase coding efficiency and synthesized view quality. The encoder contains a preprocessing block that reduces the spatial resolution of the depth data. The resulting depth map is then encoded with HEVC. In the decoding process, a novel edge-preserving upsampling (EPU) is used to restore the spatial resolution of the decoded depth map, especially around object boundaries, by taking the depth and texture characteristics into account. The motivation is that, on one hand, with an efficient HEVC codec, encoding the depth data at the reduced resolution can reduce the bit rate substantially; on the other hand, with an efficient upsampling algorithm, encoding the depth data at the reduced resolution can still achieve a good synthesized view quality. The novelty of this approach lies in the two key components of the proposed depth map coding framework: reliable median downsampling and the EPU filter. In what follows, we give a detailed description of the down/upsampling algorithm. A conceptual sketch of the overall flow is given below.
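The end-to-end flow of Fig. 1 can be summarized as follows. This is a minimal sketch under assumptions: prefilter, downsample_reliable_median, and epu_upsample are hypothetical helper names standing in for Secs. 2.1 to 2.3, and hevc_encode/hevc_decode are hypothetical wrappers around an external HEVC codec (e.g., HTM); it is not the authors' implementation.

```python
# Conceptual sketch of the Fig. 1 coding scheme (hypothetical helper names).

def encode_depth(depth_full, qp):
    """Encoder side: prefilter, downsample by a factor of 2, then HEVC-encode."""
    depth_pre = prefilter(depth_full)                     # Sec. 2.1
    depth_low = downsample_reliable_median(depth_pre, 2)  # Sec. 2.2
    return hevc_encode(depth_low, qp)                     # reduced-resolution bitstream

def decode_depth(bitstream, texture_full):
    """Decoder side: HEVC-decode, then edge-preserving upsampling (EPU)."""
    depth_low_rec = hevc_decode(bitstream)
    return epu_upsample(depth_low_rec, texture_full)      # Sec. 2.3, guided by texture
```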

Fig. 1

Down/upsampling coding scheme in HEVC.

OE_52_7_071509_f001.png

2.1. Depth Prefiltering

We apply edge detection-based prefiltering before downsampling to preserve important object boundaries and remove potential high frequencies in constant depth regions. Figure 2 illustrates a block diagram of the prefiltering. It contains three blocks: boundary layer detection, Gaussian blur, and boundary enhancement. A Canny edge detector22 divides the input depth map into the smooth region and the boundary layer. The filtered depth map contains the enhanced boundaries and the blurred smooth region.

Fig. 2

Block diagram of the depth map prefiltering.

OE_52_7_071509_f002.png

The smooth depth region is then filtered using a bilateral filter. The bilateral filter is an edge-preserving filtering technique in which the kernel weights are modified as a function of the photometric similarity between pixels, thus giving higher weights to pixels belonging to similar regions and reducing the blurring effect at edges, where photometric discontinuities are present. Let $D_{\text{full}}(p)$ denote the intensity of the pixel at position $p$ and $\Omega_p$ its neighborhood. The filtered pixel $D_{\text{filt}}(p)$ obtained with the bilateral filter is

(1)

$$D_{\text{filt}}(p) = \frac{1}{k_p} \sum_{q \in \Omega_p} D_{\text{full}}(q)\, f(p,q)\, g[D_{\text{full}}(p) - D_{\text{full}}(q)],$$
where $f(p,q) = \exp\left(-\frac{\|p-q\|^2}{2\sigma_f^2}\right)$ is a two-dimensional (2-D) smoothing kernel, also known as the domain term, that measures the spatial closeness of the pixels, and $g(\cdot) = \exp\left\{-\frac{[D_{\text{full}}(p) - D_{\text{full}}(q)]^2}{2\sigma_g^2}\right\}$ is the range term that measures the intensity similarity of the pixels. The scalar $k_p = \sum_{q \in \Omega_p} f(p,q)\, g[D_{\text{full}}(p) - D_{\text{full}}(q)]$ is a normalization factor. In our experiment, the filter size is $15 \times 15$, $\sigma_f = 3.5$, and $\sigma_g = 15$.

The boundary layer is enhanced by Gaussian high-pass filtering. We mark a 7-pixel-wide area along depth edges as the boundary layer, which includes foreground and background boundary information. In our experiment, the boundary layer is enhanced by a Gaussian high-pass filter with a size of $3 \times 3$ and $\sigma = 0.5$. A sketch of the prefiltering stage is given below.
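The following is a minimal sketch of the prefiltering stage of Sec. 2.1 using OpenCV, under assumptions: the Canny thresholds (50, 150) are not specified in the text, the 7-pixel boundary band is obtained by dilating the edge map, and the boundary "enhancement" is interpreted as adding the Gaussian high-pass response back to the depth values (unsharp-mask style). It is an illustration, not the authors' implementation.

```python
import cv2
import numpy as np

def prefilter(depth):
    """Edge-detection-based depth prefiltering (Sec. 2.1), depth as uint8 array."""
    # Boundary-layer detection: Canny edges dilated into a roughly 7-pixel-wide band.
    edges = cv2.Canny(depth, 50, 150)                         # thresholds are assumptions
    band = cv2.dilate(edges, np.ones((7, 7), np.uint8)) > 0

    # Smooth region: bilateral filter, 15x15 window, sigma_f = 3.5 (space), sigma_g = 15 (range).
    smooth = cv2.bilateralFilter(depth, d=15, sigmaColor=15, sigmaSpace=3.5)

    # Boundary layer: 3x3 Gaussian high-pass (sigma = 0.5) added back to enhance edges.
    lowpass = cv2.GaussianBlur(depth, (3, 3), 0.5)
    highpass = depth.astype(np.int16) - lowpass.astype(np.int16)
    enhanced = np.clip(depth.astype(np.int16) + highpass, 0, 255).astype(np.uint8)

    # Compose: enhanced boundaries over the filtered smooth region.
    out = smooth.copy()
    out[band] = enhanced[band]
    return out
```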

2.2. Depth Downsampling

Reducing the resolution of the encoded depth can reduce the bit rate substantially, but the loss of resolution also degrades the quality of the depth map. Therefore, the downsampling method should be designed so that the quality of the high-resolution depth can be recovered well after decoding. Conventional linear downsampling filters create new, unrealistic pixel values that spread through the entire depth map in the upsampling procedure, further causing distortion in the synthesized view.

Considering the above, we propose a reliable median filter for depth downsampling. The proposed reliable median filter is a nonlinear downsampling filter. The downsampled results are obtained in two steps:

  • Step 1 We obtain the reliable depth values $R_{m\times n}$ of a block $W_{m\times n}$ of the depth map as follows.

    Define $W_{m\times n}$ as an $m\times n$ block of the depth map. We sort all pixels in $W_{m\times n}$ by intensity value, and the mean value of $W_{m\times n}$ is defined by

    (2)

    $$\operatorname{sort}[W(x,y)] = \{D_1, D_2, \ldots, D_{m\times n}\}, \qquad D_{\text{ave}} = \operatorname{mean}(W_{m\times n}).$$

    The pixels in $W_{m\times n}$ are categorized into low and high groups by $D_{\text{ave}}$ as

    (3)

    $$W(x,y) \in \begin{cases} S_{\text{fg}}, & \text{if } W(x,y) > D_{\text{ave}} \\ S_{\text{bg}}, & \text{otherwise}. \end{cases}$$

    Let $\max(W_{m\times n})$ and $\min(W_{m\times n})$ be the maximum and minimum values of the block $W_{m\times n}$, respectively. If the maximum and minimum values are very close, the local window $W_{m\times n}$ is a smooth region, so all pixels in the block are reliable candidates; otherwise, the local window contains foreground and background regions, so only the pixels belonging to the foreground region are chosen as the reliable candidates in order to avoid the background covering the foreground. The reliable candidates are formulated as follows:

    (4)

    $$R_{m\times n} = \begin{cases} W_{m\times n}, & \max(W_{m\times n}) - \min(W_{m\times n}) \le T_0 \\ S_{\text{fg}}, & \text{otherwise}, \end{cases}$$
    where the threshold $T_0 = 10$ in our experiment.

  • Step 2 The median of the reliable data is the filtering results. The reliable median filter for depth downsampling is

    (5)

    $$D_d(x,y) = \operatorname{median}(R_{m\times n}).$$

    The proposed reliable depth downsampling filter has the following merits over linear filters: (1) it is more robust against outliers, since a noisy neighboring pixel does not affect the median value significantly, and (2) median filtering does not create new, unrealistic pixel values when the filter straddles an edge, since the median must be the value of one of the pixels in the same object.

    The proposed downsampling excludes the nonsimilar neighboring pixels from the filtering process, thus discriminating against pixels that belong to different objects. It is a generalized form of the 2-D median downsampling filter.6,7 When the downsampling factor is 2, the reliable median filter reduces to the 2-D median downsampling filter. A compact sketch of this downsampling filter is given below.
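The reliable median downsampling of Eqs. (2) to (5) can be sketched as follows, assuming a single-channel numpy depth array; the explicit loops favor clarity over speed, and the sketch is not the authors' implementation.

```python
import numpy as np

def downsample_reliable_median(depth, factor=2, t0=10):
    """Reliable median downsampling (Sec. 2.2): per block, keep all pixels if the
    block is smooth (Eq. 4), otherwise keep only the foreground pixels above the
    block mean (Eq. 3), and output the median of the reliable set (Eq. 5)."""
    h, w = depth.shape
    out = np.empty((h // factor, w // factor), dtype=depth.dtype)
    for by in range(h // factor):
        for bx in range(w // factor):
            block = depth[by*factor:(by+1)*factor, bx*factor:(bx+1)*factor].ravel()
            if int(block.max()) - int(block.min()) <= t0:   # smooth block: all pixels reliable
                reliable = block
            else:                                            # edge block: foreground only
                reliable = block[block > block.mean()]
            out[by, bx] = np.median(reliable)
    return out
```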

2.3. Edge-Preserving Depth Upsampling

After HEVC encoding and decoding, the downsampled depth map $d$ needs to be recovered to the original full resolution for rendering virtual views. An EPU is proposed for depth map reconstruction, utilizing edge information from the corresponding texture frame.

Figure 3 gives a sketch map of the upsampling process. Let $d$ denote the input low-resolution depth map of size $M\times N$. We start with the simplest case of upsampling by a factor of 2 and assume $D$ is the high-resolution depth map after upsampling to size $2M\times 2N$. We first copy the low-resolution depth map $d$ directly to its high-resolution version $D$, i.e., $D_{2x,2y} = d_{x,y}$, and then interpolate $D_{2x+1,2y+1}$, $D_{2x+1,2y}$, and $D_{2x,2y+1}$ from $D$ in two steps. The first step is to interpolate $D_{2x+1,2y+1}$ from its four nearest neighbors $D_{2x,2y}$, $D_{2x+2,2y}$, $D_{2x,2y+2}$, and $D_{2x+2,2y+2}$ along the diagonal directions of a square lattice. The second step is to interpolate the other missing samples $D_{2x+1,2y}$ and $D_{2x,2y+1}$ from a rhombus lattice in the same way after a 45-deg rotation of the square grid. Therefore, the interpolation of all missing pixels is almost identical. For example, $D_{2x+1,2y+1}$ is calculated as

(6)

$$D_{2x+1,2y+1} = k_0 D_{2x,2y} + k_1 D_{2x,2y+2} + k_2 D_{2x+2,2y} + k_3 D_{2x+2,2y+2},$$
where $k_0$, $k_1$, $k_2$, and $k_3$ are interpolation coefficients. A skeleton of this two-step procedure is sketched below.
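The two-pass structure can be sketched as follows, assuming a hypothetical helper estimate_coeffs() that solves Eq. (14) for one missing sample (a sketch of it is given after Eq. (14)); border handling and the rotated second pass are abbreviated, so this is an outline rather than the authors' implementation.

```python
import numpy as np

def epu_upsample(d, texture):
    """Two-step edge-preserving upsampling (Sec. 2.3) by a factor of 2.
    d: low-resolution depth (M x N); texture: full-resolution texture frame (2M x 2N)."""
    m, n = d.shape
    D = np.zeros((2 * m, 2 * n), dtype=np.float64)
    D[0::2, 0::2] = d                                          # D_{2x,2y} = d_{x,y}

    # Pass 1: fill D_{2x+1,2y+1} from its four diagonal neighbors (square lattice).
    for x in range(m - 1):
        for y in range(n - 1):
            nbrs = np.array([D[2*x, 2*y], D[2*x, 2*y+2],
                             D[2*x+2, 2*y], D[2*x+2, 2*y+2]])
            k = estimate_coeffs(D, texture, 2*x + 1, 2*y + 1)  # Eq. (14)
            D[2*x + 1, 2*y + 1] = float(np.dot(k, nbrs))       # Eq. (6)

    # Pass 2: fill D_{2x+1,2y} and D_{2x,2y+1} from their four nearest known samples,
    # which form a rhombus; the procedure is identical after a 45-deg rotation (omitted).
    return D
```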

Fig. 3

Covariance estimation based on local statistics from a local window.

OE_52_7_071509_f003.png

Since natural images typically consist of smooth areas, textures, and edges, they are not globally stationary. A reasonable assumption is that the sample mean and variance of a pixel are equal to the local mean and variance of all pixels within a fixed surrounding range. This assumption underlies most statistical image representations in previous work, as shown by Kuan et al.23 and Lee.24 Moreover, depth maps are mostly more homogeneous than natural images; therefore, it is reasonable to treat depth maps as locally stationary. Furthermore, optimal MMSE linear interpolation is successful in image recovery in that it effectively removes noise while preserving important image features (e.g., edges). Thus, under the assumption that the depth image can be modeled as a locally stationary Gaussian process, according to classical Wiener filtering theory, the optimal MMSE linear interpolation coefficients $K = [k_0, k_1, k_2, k_3]^T$ are given by

(7)

$$K = R^{-1} r,$$
where $R = E[\mathbf{D}\mathbf{D}^T]$, with $\mathbf{D} = [D_{2x,2y}, D_{2x+2,2y}, D_{2x,2y+2}, D_{2x+2,2y+2}]^T$, and $r = E[D_{2x+1,2y+1}\mathbf{D}]$ are the local covariances at the high-resolution level.

By exploiting the similarity between the high-resolution covariance and the low-resolution covariance, $R$ and $r$ can be estimated from a local window of the low-resolution depth map. As shown in Fig. 3, we estimate $R$ and $r$ based on the local statistics from a local $w\times w$ window centered at the interpolated pixel location, leading to

(8)

$$\hat{R} = p_1 c_1^T c_1 + p_2 c_2^T c_2 + \cdots + p_{w^2} c_{w^2}^T c_{w^2} = \sum_{n=1}^{w^2} p_n c_n^T c_n, \qquad \hat{r} = p_1 c_1^T D_1 + p_2 c_2^T D_2 + \cdots + p_{w^2} c_{w^2}^T D_{w^2} = \sum_{n=1}^{w^2} p_n c_n^T D_n,$$
where $D_n$ is a known pixel from the low-resolution map $d$ (recall that $D_{2x,2y} = d_{x,y}$), $p_n$ is the weight of sample $D_n$, and $c_n$ is a $4\times 1$ vector whose elements are the four neighbors of $D_n$ along the diagonal directions, as shown in Fig. 3.

We note that the covariance estimation in NEDI (Ref. 19), with each sample inside the $w\times w$ window having the same weight $p_n = 1/w^2$, is a special case of ours. In edge-preserving depth map upsampling, the samples $\mathbf{D}_0 = [D_1, D_2, \ldots, D_{w^2}]^T$ used to calculate the coefficients should have a geometric structure (i.e., edge direction) similar to that of the region centered at the interpolated pixel $D_{2x+1,2y+1}$. Otherwise, in the presence of a sharp edge, if a sample is interpolated across instead of along the edge direction, large and visually disturbing artifacts will be introduced. In this article, we introduce a weight model for each sample and make the samples adaptive to the local characteristics of the depth map.

Aiming to take advantage of the geometric similarity within depth maps as well as the photometric similarity between the depth map and its corresponding texture sequence, we propose to use the pixel distance, intensity difference, and texture similarity to build a weight model with

(9)

$$p_n = \frac{p_n^c + p_n^d + p_n^t}{3},$$
where $p_n^c$ depends on the distance between the current pixel position $(x_n, y_n)$ and the center pixel position $(x_c, y_c)$, which is measured by the Euclidean distance as

(10)

$$\operatorname{dist}(n) = \sqrt{(x_c - x_n)^2 + (y_c - y_n)^2},$$
and given by

(11)

$$p_n^c = \frac{\text{max\_dist} - \operatorname{dist}(n)}{\text{max\_dist} - \text{min\_dist}},$$
where max_dist and min_dist are the maximum and minimum pixel distance within the window W, respectively.

The quantity $p_n^d$ in Eq. (9) is a function of the absolute difference $\operatorname{dif}_D(n) = |D_n - D_c|$ between the current pixel value $D_n$ and the center pixel value $D_c$ in the depth map, and is given by

(12)

$$p_n^d = \frac{\text{max\_dif}_D - \operatorname{dif}_D(n)}{\text{max\_dif}_D - \text{min\_dif}_D},$$
where max_difD and min_difD indicate the maximum and minimum depth intensity difference within the window W, respectively.

Unlike ordinary video sequences, depth maps usually come with an accompanying texture video. It is known that the two share similar structures, especially along edges. Therefore, an additional term $p_n^t$ measuring this similarity is introduced in Eq. (9). Similar to the depth similarity term $p_n^d$, the third subcost function $p_n^t$ measures the similarity of texture intensity between the current texture pixel value $I_n$ and the center texture pixel value $I_c$ in the texture image. It is measured by the absolute difference $\operatorname{dif}_T(n) = |I_n - I_c|$, as given in

(13)

$$p_n^t = \frac{\text{max\_dif}_T - \operatorname{dif}_T(n)}{\text{max\_dif}_T - \text{min\_dif}_T},$$
where $\text{max\_dif}_T$ and $\text{min\_dif}_T$ indicate the maximum and minimum texture intensity differences within the window $W$, respectively. With this texture similarity, even if the reconstructed depth map has certain artifacts around the edges, we can still utilize the corresponding texture information to help recover the depth boundaries.

With the weight model in Eq. (9), we can estimate $\hat{R}$ and $\hat{r}$ using Eq. (8). Consequently, the interpolation coefficients $K = [k_0, k_1, k_2, k_3]^T$ needed in Eq. (6) can be obtained from Eq. (7) as

(14)

$$K = \left(\sum_{n=1}^{w^2} p_n c_n^T c_n\right)^{-1} \sum_{n=1}^{w^2} p_n c_n^T D_n.$$
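A minimal sketch of this coefficient estimation for one missing sample is given below. Several details are assumptions where the text is not explicit: the window size ($w = 7$), the use of the mean of the four nearest known neighbors as an initial estimate of the unknown center depth $D_c$, and a small regularizer plus an averaging fallback to guard against singular windows. It should be read as an illustration of Eqs. (8) to (14), not the authors' implementation.

```python
import numpy as np

def estimate_coeffs(D, texture, i, j, w=7):
    """Estimate K = [k0, k1, k2, k3] for the missing sample D[i, j] (i, j odd)."""
    H, W = D.shape
    # Assumed initial estimate of the unknown center depth value D_c used in dif_D.
    Dc = np.mean([D[i-1, j-1], D[i-1, j+1], D[i+1, j-1], D[i+1, j+1]])
    Tc = float(texture[i, j])

    half = w // 2
    samples = []   # (c_n, D_n, dist, dif_D, dif_T) for each known sample in the window
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = i - 1 + 2 * dy, j - 1 + 2 * dx          # known (even, even) positions
            if y - 2 < 0 or x - 2 < 0 or y + 2 >= H or x + 2 >= W:
                continue
            # c_n: the four diagonal neighbors of D_n, matching the geometry of Eq. (6).
            cn = np.array([D[y-2, x-2], D[y-2, x+2], D[y+2, x-2], D[y+2, x+2]], float)
            samples.append((cn, float(D[y, x]),
                            np.hypot(y - i, x - j),                    # Eq. (10)
                            abs(float(D[y, x]) - Dc),                  # dif_D(n)
                            abs(float(texture[y, x]) - Tc)))           # dif_T(n)
    if not samples:
        return np.full(4, 0.25)                             # fallback: plain averaging

    def norm(vals):
        """Max-min normalization of Eqs. (11) to (13): closer/more similar -> larger weight."""
        vals = np.asarray(vals, float)
        span = vals.max() - vals.min()
        return np.ones_like(vals) if span == 0 else (vals.max() - vals) / span

    pc = norm([s[2] for s in samples])
    pd = norm([s[3] for s in samples])
    pt = norm([s[4] for s in samples])
    pn = (pc + pd + pt) / 3.0                               # Eq. (9)

    R = sum(p * np.outer(c, c) for p, (c, *_) in zip(pn, samples))     # Eq. (8)
    r = sum(p * c * Dn for p, (c, Dn, *_) in zip(pn, samples))
    return np.linalg.solve(R + 1e-6 * np.eye(4), r)         # Eq. (14), regularized
```

The small regularizer and the averaging fallback handle smooth windows, where the local covariance matrix becomes nearly singular and the plain average of the four neighbors is an adequate estimate.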

3. Experimental Results

We study the performance of the proposed depth down/upsampling method for depth map coding using test sequences of two resolutions (1920×1088 pixels: Poznan_Street,25 Undo_Dancer; 1024×768 pixels: Newspaper, Book_Arrival26), in YUV 4:2:0 format with 8 bits per pixel (bpp). The test materials are provided by MPEG, and the depth maps have been estimated from the original video using the depth estimation reference software.27 For the Poznan_Street sequence, view 3 and view 5 are selected as reference views and view 4 as the target view. For the Undo_Dancer sequence, view 4 and view 6 are selected as references and view 5 as the target view. For the Book_Arrival sequence, view 8 and view 10 are selected as references and view 9 as the target view.

For each reference depth map, we downsample it by a factor of 2 before encoding with the 3D-HEVC test model (HTM) version 4.1,28 using quantization parameters (QP) 24, 28, 32, 40, and 44. The texture video sequences are coded with a fixed QP of 32. Thirty frames are coded for each sequence. Other encoder configurations follow those specified in the common test conditions29 for 3-D video coding. No multiview video coding is applied. After decoding, the intermediate view is synthesized by the view synthesis reference software.30 The efficiency of the proposed method is evaluated through rate-distortion (RD) performance and the subjective quality of the synthesized view. For the RD curves, the x-axis is the total bit rate of the two depth maps and two texture sequences, and the y-axis is the Y_PSNR of the synthesized view compared to the original view.
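For reference, the Y_PSNR used on the y-axis of the RD curves is the standard peak signal-to-noise ratio of the synthesized luma frame against the original view. A minimal sketch, assuming 8-bit frames and per-sequence averaging over frames, is:

```python
import numpy as np

def y_psnr(ref_y, syn_y):
    """Y_PSNR in dB between an original-view luma frame and the synthesized one,
    PSNR = 10*log10(255^2 / MSE) for 8-bit samples."""
    mse = np.mean((ref_y.astype(np.float64) - syn_y.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```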

3.1. Coding Performance

First, the performance of the down/upsampling-based depth coding scheme is compared to that of full-scale coding. For the full-scale method, the depth maps are encoded without down/upsampling using the HTM reference software. Figure 4 shows the RD curve comparison between the proposed method and the full-resolution method.

Fig. 4

Rate distortion (RD) performance comparison of encoding depth maps between the full-scale and down/upsampling-based methods. (a) Book Arrival and (b) Newspaper.

OE_52_7_071509_f004.png

It can be seen that the down/upsampling-based depth map coding scheme outperforms full-scale depth map coding at lower bit rates. Specifically, as shown in Table 1, the bit rate saving is up to 32.2% for "BookArrival" and 27.6% for "Newspaper" on the depth maps, whereas it is 8.9% for "BookArrival" and 5.3% for "Newspaper" on the total bit rates. Since the bit rates of the depth maps are only about 10 to 20% of those of the texture sequences, the gain in bit rate saving is smaller for the total bit rate than for the depth bit rate. At higher bit rates, the frames are encoded with smaller QPs and preserve much more texture detail; therefore, the influence of the down/upsampling distortion becomes larger, and the RD performance falls below that of the full-scale case at high bit rates.
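The BD_rate values in Table 1 follow the Bjontegaard metric, which compares two RD curves by cubic-fitting log bit rate as a function of PSNR and averaging the gap over the common PSNR range. A minimal sketch is given below; by the usual convention it returns negative values for savings, whereas Table 1 reports the savings as positive percentages. This is an illustrative implementation, not the authors' measurement tool.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bit rate (%) of the test curve against the anchor."""
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)   # log-rate as cubic in PSNR
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))                 # common PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (10.0 ** (avg_t - avg_a) - 1.0) * 100.0             # negative = bit rate saving
```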

Table 1

Performances (bit rate versus synthesized view PSNR) of full-scale and down/upsampling depth map coding.

S1 (texture bit rate T1+T2 at QP 32: 667.6 kb/s)

        Full scale                                    Down/upsampling
QP      D1+D2 (kb/s)  2T+2D (kb/s)  Y_PSNR (dB)       D1+D2 (kb/s)  2T+2D (kb/s)  Y_PSNR (dB)
24      1186.9        1854.5        38.12             493.1         1160.7        37.69
28      638.3         1305.9        37.79             268.5         936.1         37.41
32      353.9         1021.5        37.35             151.9         819.5         36.98
40      122.8         790.4         36.34             56.2          723.8         36.04
44      74.9          742.5         35.61             35.4          703           35
BD_rate (2T+2D) = 8.9%
BD_rate (2D) = 32.2%

S2 (texture bit rate T1+T2 at QP 32: 651.6 kb/s)

        Full scale                                    Down/upsampling
QP      D1+D2 (kb/s)  2T+2D (kb/s)  Y_PSNR (dB)       D1+D2 (kb/s)  2T+2D (kb/s)  Y_PSNR (dB)
24      791.1         1442.7        35.93             339.3         990.9         34.67
28      447.9         1099.5        35.17             194.6         846.2         34.14
32      261.8         913.4         34.35             112.8         764.4         33.67
40      91            742.6         32.85             41.6          693.2         32.12
44      55.2          706.8         31.86             26.3          677.9         31.89
BD_rate (2T+2D) = 5.3%
BD_rate (2D) = 27.6%

Second, we evaluate the performance of the proposed downsampling, upsampling, and prefiltering methods separately. In order to test the effectiveness of the proposed downsampling algorithm, the original depth maps are downsampled using different downsampling methods while being upsampled with the same EPU algorithm. Figure 5(a) shows the RD curves of the different depth downsampling methods, where "Median downsc." stands for the downsampling proposed by Oh et al. in Ref. 6 and "Reliable Median downsc." for that described in Sec. 2.2.

Fig. 5

RD performance of proposed (a) downsampling, (b) upsampling, and (c) down/upsampling method.

OE_52_7_071509_f005.png

In order to test the effectiveness of the proposed upsampling algorithm, the decoded depth maps are upsampled using different interpolation algorithms while being downsampled with the same reliable median downsampling before encoding. Figure 5(b) shows the RD curves of the different upsampling methods, where "EPU upsc." stands for the upsampling method described in Sec. 2.3, "NEDI upsc." stands for the upsampling method in Ref. 9, and "EWOC upsc." and "JBU upsc." stand for the recently published upsampling algorithms in Refs. 17 and 18, respectively.

Figure 5(c) shows the RD curves to compare the coding efficiency of the proposed methods against two advanced down/upsampling-based depth coding methods. “Method 1” is the combined method, where depth maps are preprocessed as described in Sec. 2.1, then downsampled as described in Sec. 2.2 and upsampled as described in Sec. 2.3. “Method 2” is the result with the proposed downsampling and upsampling. “EWOC” stands for the depth map coding method in Ref. 18. “JBU” stands for the down/upsampling algorithm for depth maps in Ref. 17, where depth maps are downsampled with median filtering and upsampled with JBU. No prefiltering is applied to either “Method 2” or JBU method.

We can see that both the proposed downsampling and upsampling methods show good performance, as shown in Figs. 5(a) and 5(b). By combining the proposed prefiltering, downsampling, and upsampling methods, an additional gain can be achieved, as shown in Fig. 5(c).

3.2. Synthesized View Quality

Depth map downsampling and upsampling directly impact the subjective quality of synthesized views. Figures 6 to 8 compare our proposed upsampling method with EWOC upsampling and JBU in terms of the subjective quality of the synthesized views at the decoder after depth map encoding at the same rate. It can be seen that the synthesized images with EWOC interpolation and JBU upsampling exhibit strong jaggedness around object edges. Our proposed upsampling method, in contrast, employs the texture image to provide edge information in the upsampling procedure; therefore, it obtains clearer and smoother edges along object boundaries.

Fig. 6

The synthesized view [Undo_dancer, quantization parameter (QP)=40 of view 5] with depth map upsampled with (a) (d) proposed, (b) (e) JBU upsampling, and (c) (f) EWOC method at the decoder.

OE_52_7_071509_f006.png

Fig. 7

The synthesized view (Poznan_street, QP=28 of view 4) with depth map upsampled with (a) (d) proposed, (b) (e) JBU upsampling, and (c) (f) EWOC method at the decoder.

OE_52_7_071509_f007.png

Fig. 8

The synthesized view (Bookarrival, QP=28 of view 8) with depth map upsampled with (a) (d) proposed, (b) (e) JBU upsampling, and (c) (f) EWOC method at the decoder.

OE_52_7_071509_f008.png

3.3. Computational Complexity Analysis

We show the processing times in Table 2. The depth map coding time for the proposed method contains the downsampling time (low-pass filtering and downsampling procedures), HEVC encoding time, decoding time, and the proposed upsampling time. The depth coding time for the full-scale method contains the HEVC encoding and decoding times. S1 and S2 denote the Newspaper and Book-Arrival sequences.

Table 2

Processing times of full scale coding and down/upsampling coding.

        Full scale                       Down/upsampling
        Enc T [s]   Dec T [s]            Down T [s]   Enc T [s]   Up T [s]   Dec T [s]
S1      71646       289                  2620         18288       3360       94
        Sum T = 71935 s                  Sum T = 24362 s
S2      89712       314                  4014         22931       13800      144
        Sum T = 90026 s                  Sum T = 40899 s

Since the resolution of the encoded depth in the down/upsampling-based method is lower than in the full-scale method, the encoding time of the downsampled depth is far less than that of the full-scale depth. Although additional downsampling and upsampling procedures are needed for the down/upsampling method, its overall computation time is still less than that of the full-scale method, as shown in Table 2.

4. Conclusions

We have presented an edge-preserving depth upsampling method for down/upsampling-based depth coding within the HEVC framework. Different from the NEDI algorithm of Ref. 19, we introduced a weight model for each sample that incorporates geometric similarity as well as intensity similarity in both the depth map and its corresponding texture sequence, thus allowing an adaptation of interpolation coefficients to the edge orientation. An evaluation of performance in terms of coded data and synthesized views has been provided. Experimental results show that our proposed interpolation method for down/upsampling-based depth coding improves both the coding efficiency and synthesized view quality.

Acknowledgments

This work is supported in part by the National Science Foundation of China under Grant Nos. 61231010 and 61202301.

References

1. K. Muller, P. Merkle, and T. Wiegand, "3-D video representation using depth maps," Proc. IEEE 99(4), 643–656 (2011). http://dx.doi.org/10.1109/JPROC.2010.2091090

2. C. Fehn, "Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV," Proc. SPIE 5291, 93–104 (2004). http://dx.doi.org/10.1117/12.524762

3. K. Sharma, I. Moon, and S. G. Kim, "Depth estimation of features in video frames with improved feature matching technique using Kinect sensor," Opt. Eng. 51(10), 107002 (2012). http://dx.doi.org/10.1117/1.OE.51.10.107002

4. S. S. Zhang and S. Yan, "Depth estimation and occlusion boundary recovery from a single outdoor image," Opt. Eng. 51(8), 087003 (2012). http://dx.doi.org/10.1117/1.OE.51.8.087003

5. B. Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 8," ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Doc. JCTVC-J1003, Stockholm (2012).

6. K. J. Oh et al., "Depth reconstruction filter for depth coding," Electron. Lett. 45(6), 305–306 (2009). http://dx.doi.org/10.1049/el.2009.3182

7. K. J. Oh et al., "Depth reconstruction filter and down/up sampling for depth coding in 3-D video," IEEE Signal Process. Lett. 16(9), 747–750 (2009). http://dx.doi.org/10.1109/LSP.2009.2024112

8. H. P. Deng et al., "A joint texture/depth edge-directed up-sampling algorithm for depth map coding," in Proc. 2012 IEEE Int. Conf. Multimedia and Expo (ICME'12), pp. 646–650, IEEE, Melbourne (2012).

9. M. O. Wildeboer et al., "Color based depth up-sampling for depth compression," in Proc. IEEE Picture Coding Symposium (PCS 2010), pp. 170–173, IEEE, Nagoya (2010).

10. K. Klimaszewski, K. Wegner, and M. Domanski, "Influence of views and depth compression onto quality of synthesized views," ISO/IEC JTC1/SC29/WG11, Doc. M16758, UK (2009).

11. M. Hannuksela, Y. Chen, and T. Suzuki, "AVC Draft Text 3," Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Doc. JCT3V-A1002, 1st Meeting, Stockholm, SE (2012).

12. J. Kopf et al., "Joint bilateral upsampling," ACM Trans. Graph. 26(3), 96 (2007). http://dx.doi.org/10.1145/1276377.1276497

13. K. J. Oh, A. Vetro, and Y. S. Ho, "Depth coding using a boundary reconstruction filter for 3-D video systems," IEEE Trans. Circ. Syst. Video Technol. 21(3), 350–359 (2011). http://dx.doi.org/10.1109/TCSVT.2011.2116590

14. D. Min, J. Lu, and M. N. Do, "Depth video enhancement based on weighted mode filtering," IEEE Trans. Image Process. 21(3), 1176–1190 (2012). http://dx.doi.org/10.1109/TIP.2011.2163164

15. S. J. Liu et al., "New depth coding techniques with utilization of corresponding video," IEEE Trans. Broadcast. 57(2), 551–561 (2011). http://dx.doi.org/10.1109/TBC.2011.2120750

16. M. O. Wildeboer et al., "Depth up-sampling for depth coding using view information," in Proc. 3DTV Conf.: The True Vision—Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1–4, IEEE (2011).

17. E. Ekmekcioglu et al., "Utilisation of edge adaptive upsampling in compression of depth map videos for enhanced free-viewpoint rendering," in Proc. 2009 16th IEEE Int. Conf. Image Processing (ICIP), pp. 733–736, IEEE (2009).

18. S. Schwarz et al., "Adaptive depth filtering for HEVC 3D video coding," in Proc. 2012 Picture Coding Symposium (PCS 2012), pp. 49–52, IEEE (2012).

19. X. Li and M. T. Orchard, "New edge-directed interpolation," IEEE Trans. Image Process. 10(10), 1521–1527 (2001). http://dx.doi.org/10.1109/83.951537

20. N. Asuni and A. Giachetti, "Accuracy improvements and artifacts removal in edge based image interpolation," in Proc. 3rd Int. Conf. Computer Vision Theory and Applications (VISAPP'08), pp. 58–65, Springer (2008).

21. X. F. Zhang et al., "Nonlocal edge-directed interpolation," in Proc. 2009 Pacific Rim Conference on Multimedia (PCM'09), pp. 1197–1120, Springer, Bangkok, Thailand (2009).

22. J. F. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986). http://dx.doi.org/10.1109/TPAMI.1986.4767851

23. D. T. Kuan et al., "Adaptive noise smoothing filter for images with signal-dependent noise," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-7(2), 165–177 (1985). http://dx.doi.org/10.1109/TPAMI.1985.4767641

24. J. S. Lee, "Digital image enhancement and noise filtering by use of local statistics," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-2(2), 165–168 (1980). http://dx.doi.org/10.1109/TPAMI.1980.4766994

25. M. Domanski et al., "Poznan Multiview Video Test Sequences and Camera Parameters," ISO/IEC JTC1/SC29/WG11, Doc. M17050 (2009).

26. I. Feldmann et al., "HHI Test Material for 3D Video," ISO/IEC JTC1/SC29/WG11, Doc. M15413, Archamps, France (2008).

27. M. Tanimoto et al., "Depth Estimation Reference Software (DERS) 4.0," ISO/IEC JTC1/SC29/WG11, Doc. M16605, London, UK (2009).

28.

29. D. Rusanovskyy, K. Müller, and A. Vetro, "Common test conditions of 3DV core experiments," Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Doc. JCT3V-A1100, 1st Meeting, Stockholm, SE (2012).

30. M. Tanimoto, T. Fujii, and K. Suzuki, "View Synthesis Algorithm in View Synthesis Reference Software 3.0 (VSRS3.0)," ISO/IEC JTC1/SC29/WG11, Doc. M16090 (2009).

Biography

OE_52_7_071509_d001.png

Huiping Deng received a BS degree in electronics and information engineering and an MS degree in communication and information systems from Yangtze University, Jingzhou, China, in 2005 and 2008, respectively. She is currently working toward the PhD degree in the Department of Electronics and Information Engineering, Huazhong University of Science and Technology (HUST). Her research interests are video coding and computer vision, currently focusing on three-dimensional video (3DV).

OE_52_7_071509_d002.png

Li Yu received the BS degree in electronics and information engineering, the MS degree in communication and information systems, and the PhD degree in electronics and information engineering, all from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1995, 1997, and 1999, respectively. In 2000, she joined the Department of Electronics and Information Engineering, HUST, where she has been a professor since 2005. She is a co-sponsor of the China AVS standard special working group and works as a key member of that group. Her team has applied for more than 10 related patents and submitted 79 proposals to the AVS standard organization. Her current research interests include multimedia communication and processing, computer networks, and wireless communication.

Biographies and photographs of the other authors are not available.

Huiping Deng, Li Yu, Juntao Zhang, Bin Feng, Qiong Liu, "Edge-preserving down/upsampling for depth map compression in high-efficiency video coding," Optical Engineering 52(7), 071509 (17 July 2013). http://dx.doi.org/10.1117/1.OE.52.7.071509
KEYWORDS: Computer programming, Volume rendering, Digital filtering, Video coding, Linear filtering, Video, Optical engineering
