1 February 2013 Stereo matching algorithm based on illumination normal similarity and adaptive support weight
Author Affiliations +
Abstract
For the purpose of representing the feature of the gray image, illumination normal of pixels in a two-dimensional gray image plane is proposed, which can reflect the high-frequency information of the gray image. In order to get an accurate dense disparity map based on the adaptive support weight (ASW) approach in RGB vector space, a matching algorithm is proposed that combines the illumination normal similarity, gradient similarity, color similarity, and Euclidean distance similarity to compute the corresponding support weights and dissimilarity measurements. After testing by the Middlebury stereo benchmark, the result of the proposed algorithm shows more accurate disparity than many state-of-the-art stereo matching algorithms.

1.

Introduction

Dense stereo matching is one of the most challenging problems in the field of computer vision. It is an important requirement for many applications, such as three-dimensional (3-D) reconstruction and virtual view synthesis. Generally, the purpose of stereo matching is to find the corresponding pixels between the stereo image pairs captured by two or more cameras in the same scene, and get the disparity map composed by the coordinate difference of corresponding pixels in the stereo image pair.

There are plenty of algorithms available to solve the dense stereo problem, the choice of which depends on whether you want to get the area-based solution by a global method or local method. Stereo matching algorithms can be classified as either global or local. The typical global algorithms, such as graph cuts,1 belief propagation,2,3 and dynamic programming,4,5 can generate a dense disparity map precisely based on global energy function and suitable constraints. However, graph cuts and belief propagation usually consume a great deal of time and memory, and dynamic programming needs specific constraints at different times. Local matching algorithms are known for their simplicity and efficiency, and they also can achieve disparity more accurately. The basic idea of local matching is to estimate the disparity of a pixel in the target image by correlating a support window around the pixel with a similar support window in the reference image. One of the typical local matching algorithms is adaptive support weight (ASW), proposed by Yoon.6 The method in Ref. 6 adopts the fixed-size square windows and allocates a support weight to each pixel in the window according to pixel color and position similarities. The disparity map generated by Ref. 6 can get perfect effects similar to that obtained by global algorithms. The gradient information can indicate the variation between neighboring pixels and the structure of the information,7 as well as decreasing the noise presented in the disparity map more. The method that uses the gradient similarity and local ASW to compute the disparity is proposed.8 Considering that the information will be lost when converting the stereo images from RGB vector space to the CIELab color space, the ASW approach in RGB vector space is proposed.9 The gradient similarity is used to compute the support weight in Ref. 9. However, the difficulties in stereo matching are still at the boundaries of objects and fine texture area, which can be reflected by the high-frequency information. In this paper, we propose to utilize the illumination normal similarity of the two-dimensional (2-D) gray image to compute the support weight based on the ASW in RGB vector space. The experimental results prove that the proposed method can improve the accuracy of the disparity map.

This paper is organized as follows. Section 2 gives the definition of illumination normal in the image space. Section 3 provides a specific explanation for the proposed method. In Sec. 4, experimental results of the proposed method are compared to that of other methods. Conclusions and future work are provided in Sec. 5.

2.

Illumination Normal of Pixels in a 2-D Image Plane

A normal vector almost exists for each point of the object in 3-D space. Given a 2-D gray image, the gray value of every pixel can reflect the illumination information of the object. In order to obtain the illumination normal vector of pixels in a 2-D image, each pixel of the image is regarded as a point in 3-D space. This can be expressed as P[x,y,p(x,y)], where x and y are the horizontal and vertical coordinates, respectively, and p(x,y) is the pixel value at the position (x,y).

The current point and the points located below and to the right of it are used to compute its normal vector. Figure 1 illustrates how the illumination normal vector is calculated. Point A is the current point, B and C are the neighboring points used to compute the normal vector of point A. The 3-D vectors from A to C and from A to B are computed as follows:

(1)

vec1=P(C)P(A)=P[x+1,y,p(x+1,y)]P[x,y,p(x,y)]

(2)

vec2=P(B)P(A)=P[x,y+1,p(x,y+1)]P[x,y,p(x,y)].

Fig. 1

The illustration of calculating the illumination normal vector.

OE_52_2_027201_f001.png

The illumination normal vector of point A is obtained by the cross-product of vec1 and vec2:

(3)

vecN(A)=vec1×vec2=[vecNx(A),vecNy(A),vecNz(A)].

Normalize the illumination normal vector of point A:

(4)

normal(A)=n[nx(A),ny(A),nz(A)],
where

(5)

nx(A)=vecNx(A)[vecNx(A)]2+[vecNy(A)]2+[vecNz(A)]2

(6)

ny(A)=vecNy(A)[vecNx(A)]2+[vecNy(A)]2+[vecNz(A)]2

(7)

nz(A)=vecNz(A)[vecNx(A)]2+[vecNy(A)]2+[vecNz(A)]2.

The modulus images of the illumination normal vector of the image pairs, which are used to analyze the illumination normal similarity of the image pairs, are shown in Fig. 2. The features of the illumination normal vector in Fig. 2(b) and 2(d) reflect the high-frequency information of the gray image pairs. The high-frequency information reflects some small-scale details of the image, which is useful for searching the matching pixels in the stereo pair. In this paper, the character mentioned above is utilized, and the illumination normal similarity of the gray image is combined into the ASW method to compute the weights in the support window.

Fig. 2

The analysis of illumination normal vectors of image pairs. (a) left image (gray); (b) the modulus image of the illumination normal vector of the left image (gray); (c) right image (gray); (d) the modulus image of the illumination normal vector of the right image (gray).

OE_52_2_027201_f002.png

3.

Proposed Algorithm

To assign the support weight more accurately for each pixel in the support window, the similarity measurements are considered. Geng et al.9 adds the gradient similarity in RGB vector space to the gestalt group proposed by Yoon.6 Here, we propose to compute the support weight by a number of multi-similarity measurements, including color similarity, Euclidean distance similarity, gradient similarity, and illumination normal similarity. The support weight of a pixel in a support window can be expressed by

(8)

w(p,q)=exp(Δcpqτc)·exp(Δdispqτd)·exp(Δgradpqτg)·exp(Δnpqτn),
where p, q are the pixels in the reference image, which have the RGB components; q is the pixel in the support window centered at p; Δcpq, Δdispq, and Δgradpq are the color difference, spatial distance, and gradient difference between pixel p(x,y)={pR,pG,pB} and q(x,y)={qR,qG,qB}, respectively; Δnpq is the illumination normal difference between pGray and qGray; pGray and qGray are the gray values of p and q, respectively; and τc, τd,τg and τn are all constant. Δcpq, Δdispq, Δgradpq and Δnpq are calculated by

(9)

Δcpq=pq2=(pRqR)2+(pGqG)2+(pBqB)2

(10)

Δdispq=dispdisq2=(xpxq)2+(ypyq)2

(11)

Δgradpq=Δgradxpq+Δgradypq=gradxpgradxq2+gradypgradyq2

(12)

Δnpq=n(p)n(q)2=[nx(p)nx(q)]2+[ny(p)ny(q)]2+[nz(p)nz(q)]2.

The weights calculated for the pixels in the window between the reference image and the target image are combined in the aggregation step. The dissimilarity E can be expressed by

(13)

E(p,pd)=1NqNp,qdNpdematching(q,qd)·w(p,q),
where pd and qd are the corresponding pixels in the target image for p and q with the disparity d; Np and Npd are the support windows centered at p and pd, respectively; and ematching(q,qd) is the pixel-based matching cost between q and qd, which is obtained by

(14)

ematching(q,qd)=ec(q,qd)·edis(q,qd)·en(q,qd).
Here, ec(q,qd) is the color difference, edis(q,qd) is the gradient difference, and en(q,qd) is the illumination normal difference, as follows:

(15)

ec(q,qd)=exp(Δcqqdλc)

(16)

egrad(q,qd)=exp(ΔgradxqqdλgradxΔgradyqqdλgrady)

(17)

en(q,qd)=exp(Δnq,qdλn),
where Δcqqd, Δgradxqqd, Δgradyqqd, and Δnq,qd are obtained by Eqs. (9)–(12), and λc, λgradx, λgrady, and λn are constant.

Find the best disparity of pixel p by maximizing the dissimilarity function E(p,pd):

(18)

dp=arg maxdDE(p,pd),
where D={dmin,,dmax} is the search range of possible disparities, which is variable for different image pairs.

In order to refine the disparity, the consistency check is used to detect matching errors, as follows:

(19)

DL(x,y)=dR[xdL(x,y),y],
where dL(x,y) is the disparity of the pixels regarding the left image as the reference image and dR(x,y) is the disparity of the pixels regarding the right image as the reference image. Here, dL(x,y) and dR(x,y) are computed separately. The pixels that fail during the consistency check are classified as bad. The support weight for each neighboring pixel in the fixed-size support window centered on the bad pixel is recomputed using the proposed method. The disparity of the pixel with the largest support weight when recomputed is considered to be the disparity of the bad pixel.

4.

Experimental Results

4.1.

Performance Comparison

The stereo image pairs “tsukuba,” “venus,” “teddy,” and “cones,” which are provided by the Middlebury stereo benchmark, were used in our experiments. The size of the support window was fixed at 35×35pixels, and the constants τc=30, λc=40, τd=10, λgradx=20, λgrady=10, τg=30,9 τn=40, and λn=1 were fixed for all the test stereo image pairs. To evaluate the proposed algorithm, we obtained the ground truth provided by Scharstein and Szeliski10 and the disparity maps of the ASW method by Yoon6 in the Middlebury stereo benchmark. The subjective quality comparison of the disparity maps is shown in Fig. 3. Figure 3(a) and 3(b) are the color image and the ground truth, respectively; while Fig. 3(c), 3(e), 3(g) and Fig. 3(d), 3(f), and 3(h) are the disparity maps and the bad-pixel images of disparity maps produced by our algorithm, ASW,6 and ASW-RGB,9 respectively. The error threshold Th in our experiment was 0.5. We found that the smaller the area of gray and black was, the more accurate the disparity map was. Figure 3(d), 3(f), and 3(h) shows that the disparity map of our algorithm is more accurate than those of ASW6 and ASW-RGB.9

Fig. 3

The comparison of results. (a) Left image. (b) Ground truth. (c) Disparity map generated by our algorithm. (d) Bad-pixel image of our algorithm. (e) Disparity map generated by the ASW approach.6 (f) Bad-pixel image of the ASW approach.6 (g) Disparity map generated by the ASW-RGB approach.9 (h) Bad-pixel image of the ASW-RGB approach.9

OE_52_2_027201_f003.png

In order to measure the objective quality of the disparity map, the Middlebury stereo benchmark provides the quality metrics to evaluate the generated disparity map, which can be separated into three parts: all pixels (“all”), nonoccluded regions (“nonocc”), and pixels near depth discontinuities (“disc”). When the absolute difference between generated disparity and ground truth is less than Th, the generated disparity value can be considered correct. Tables 1 and 2 show two cases of Th=1 and Th=0.5. To evaluate the proposed algorithm objectively, it is compared to the results of the other local matching ASW methods.6,8,9,1112.13 The comparison of results is shown in Tables 1 and 2, and the results of the proposed algorithm (ASW-MS) improve the matching accuracy by different degrees.

Table 1

Performance comparison of the proposed method with the Middlebury stereo benchmark (error threshold: 1.0).

AlgorithmTsukubaVenusTeddyConesAverage percent of bad pixels
NonoccAllDiscNonoccAllDiscNonoccAllDiscNonoccAllDisc
ASW-MS (proposed)2.032.638.500.500.962.176.5711.917.12.968.557.935.98
AdaptDispCalib111.191.426.150.230.342.507.8013.617.33.629.339.726.10
VSW121.621.886.980.470.813.408.6713.318.03.379.879.776.29
GradAdaptWgt82.262.638.990.991.394.928.0013.118.22.617.677.436.55
Adaptweight61.381.856.900.711.196.137.8813.318.63.979.798.266.67
ASW-RGB92.563.199.890.911.564.468.4813.519.23.328.918.727.06
BioPsyASW133.625.5214.63.154.2020.411.518.223.24.9313.011.711.2

Table 2

Performance comparison of the proposed method with the Middlebury stereo benchmark (error threshold: 0.5).

AlgorithmTsukubaVenusTeddyConesAverage percent of bad pixels
NonoccAllDiscNonoccAllDiscNonoccAllDiscNonoccAllDisc
ASW-MS (proposed)9.239.9815.36.146.7511.112.218.827.17.8013.715.212.8
GradAdaptWgt87.678.2515.07.518.0512.313.519.628.57.3413.014.813.0
ASW-RGB99.8210.616.17.948.7013.414.320.529.58.1614.015.814.1
VSW1219.219.518.58.178.6513.217.423.231.413.118.320.417.6
AdaptDispCalib1124.624.721.37.147.5615.018.825.229.79.2115.116.717.9
Adaptweight618.118.818.67.778.4015.817.623.934.014.019.720.618.1
BioPsyASW1322.924.424.19.6910.824.518.526.134.512.620.222.320.9

4.2.

Influence of the Illumination Normal

In order to analyze the influence of the illumination normal in the algorithm, the experiment with illumination normal (ASW-MS) and without illumination normal (ASW-MS-outN) were tested. The comparison results are shown in Tables 3 and 4. The data in these tables are the percentages of the bad pixels. For a more accurate disparity map, the smallest percentage of bad pixels is needed. Tables 3 and 4 show that having the illumination normal similarity can get a more accurate result than being without the illumination normal similarity.

Table 3

Performance comparison of the proposed method with and without illumination normal in the Middlebury stereo benchmark (error threshold: 1.0).

AlgorithmTsukubaVenusTeddyConesAverage percent of bad pixels
NonoccAllDiscNonoccAllDiscNonoccAllDiscNonoccAllDisc
ASW-MS2.032.638.500.500.962.176.5711.917.12.968.557.935.98
ASW-MS-outN2.483.049.531.031.734.848.1313.219.13.228.928.556.98

Table 4

Performance comparison of the proposed method with and without illumination normal in the Middlebury stereo benchmark (error threshold: 0.5).

AlgorithmTsukubaVenusTeddyConesAverage percent of bad pixels
NonoccAllDiscNonoccAllDiscNonoccAllDiscNonoccAllDisc
ASW-MS9.239.9815.36.146.7511.112.218.827.17.8013.715.212.8
ASW-MS-outN13.113.717.38.228.9913.214.120.429.67.9413.915.614.7

4.3.

Performance Analysis of the Proposed Method with Different Sizes of the Support Window

The size (35×35) of the support window of the proposed method is the same as that of Refs. 8, 9, and 12. In order to compare our results to Refs. 6 and 11, which used different sizes of support window, we tested the proposed method using the same sizes of support window as those studies used. The other relevant constants in the algorithm are the same as in the case of the 35×35 support window. The size of the support window in our algorithm is odd and the central point of the window is a pixel, while the size (48×48) of the support window in Ref. 13 is even and there is no pixel at the central point of the window, so the comparison result between Ref. 13 and the proposed algorithm is not listed.

Tables 5 and 6 show the comparative results between the proposed method and that of Refs. 6 and 11 with the size (33×33, 21×21) of the support window and error threshold 1.0 and 0.5, respectively. The results show that the size of the support window can affect the matching precision when the other relevant constants are fixed in the algorithm. When the error threshold is 1.0, the average percentage of bad pixels of the proposed method is less than that of other references except Ref. 11. For error threshold 0.5, however, all results of the proposed method are better than those of Refs. 6 and 11.

Table 5

Performance comparison of the proposed method with Refs. 6 and 11 in terms of the size (33×33, 21×21) of the support window in the Middlebury stereo benchmark (error threshold: 1.0).

AlgorithmTsukubaVenusTeddyConesAverage percent of bad pixels
NonoccAllDiscNonoccAllDiscNonoccAllDiscNonoccAllDisc
Adaptweight6 (33×33)1.381.856.900.711.196.137.8813.318.63.979.798.266.67
ASW-MS (33×33)2.042.658.470.541.002.166.5511.917.02.918.547.825.96
AdaptDispCalib11 (21×21)1.191.426.150.230.342.507.8013.617.33.629.339.726.10
ASW-MS (21×21)2.653.327.680.841.402.886.9312.416.92.658.467.236.11

Table 6

Performance Comparison of the proposed method with Refs. 6 and 11 in terms of the size (33×33, 21×21) of the support window in the Middlebury stereo benchmark (error threshold: 0.5).

AlgorithmTsukubaVenusTeddyConesAverage percent of bad pixels
NonoccAllDiscNonoccAllDiscNonoccAllDiscNonoccAllDisc
Adaptweight6 (33×33)18.118.818.67.778.4015.817.623.934.014.019.720.618.1
ASW-MS (33×33)9.4610.215.46.206.8111.012.118.726.87.6913.615.012.7
AdaptDispCalib11 (21×21)24.624.721.37.147.5615.018.825.229.79.2115.116.717.9
ASW-MS (21×21)12.413.216.36.677.3311.011.718.525.36.6212.813.512.9

5.

Conclusions and Future Work

In this paper, based on the multi-similarity measure, we present a new ASW matching algorithm that includes color similarity, Euclidean distance similarity, gradient similarity, and illumination normal similarity. The experimental results show that the algorithm proposed here can improve the matching precision compared to other local ASW matching algorithms. In future research, we plan to investigate other similarity measures to improve our method further.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61271315 and 61171078, and in part by the Research Fund for Doctorial Program of Higher Education of China under Grant No. 20110061110084.

References

1. L. HongG. Chen, “Segment-based stereo matching using graph cuts,” in Proc. IEEE Conf. Comp. Vision Patt. Rec., Vol. 5, pp. I74–I78, IEEE, Harbin, China (2004). Google Scholar

2. Q. YangL. WangN. Ahuja, “A constant-space belief propagation algorithm for stereo matching,” in IEEE Conf. Comp. Vision Patt. Rec., pp. 1458–1465, IEEE, San Francisco (2010). Google Scholar

3. J. SunN.-N. ZhengH.-Y. Shum, “Stereo matching using belief propagation,” IEEE Trans. Pattern Anal. Machine Intell. 25(7), 787–800 (2003).ITPIDJ0162-8828 http://dx.doi.org/10.1109/TPAMI.2003.1206509 Google Scholar

4. C. LeiJ. SelzerY. Yang, “Region-tree based stereo using dynamic programming optimization,” inIEEE Comp. Soc. Conf. Comp. Vision Pattern Recogn., Vol. 2, pp. 2378–2385, IEEE, NewYork (2006). http://dx.doi.org/10.1109/CVPR.2006.251 Google Scholar

5. X. Changet al., “Real-time accurate stereo matching using modified two-pass aggregation and winner-take-all guided dynamic programming,” in Intl. Conf. 3D Imag., Model., Process., Visual., and Trans., pp. 73–79, IEEE, Hangzhou, China (2011). Google Scholar

6. K.-J. YoonI.-S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006).ITPIDJ0162-8828 http://dx.doi.org/10.1109/TPAMI.2006.70 Google Scholar

7. I. ParkH. Byun, “Depth map refinement using multiple patch-based depth image completion via local stereo warping,” Opt. Eng. 49(7), 077003 (2010).OPEGAR0091-3286 http://dx.doi.org/10.1117/1.3463013 Google Scholar

8. L. De-MaeztuA. VillanuevaR. Cabeza, “Stereo matching using gradient similarity and locally adaptive support-weight,” Pattern Recogn. Lett. 32(13), 1643–1651(2011).PRLEDG0167-8655 http://dx.doi.org/10.1016/j.patrec.2011.06.027 Google Scholar

9. Y. GengY. ZhaoH. Chen, “Stereo matching based on adaptive support-weight approach in RGB vector space,” J. Appl. Opt., 51(16), 3538–3545 (2012).JOAOF81464-4258 http://dx.doi.org/10.1364/AO.51.003538 Google Scholar

10. D. ScharstereinR. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” in Proc. IEEE Work. Stereo Multi-Baseline Vision, pp. 131–140, IEEE, Kauai, HI (2002). Google Scholar

11. Z. Guet al., “Local stereo matching with adaptive support-weight, rank transform and disparity calibration,” Pattern Recogn. Lett. 29(9), 1230–1235 (2008).PRLEDG0167-8655 http://dx.doi.org/10.1016/j.patrec.2008.01.032 Google Scholar

12. W. Huet al., “Virtual support window for adaptive-weight stereo matching,” in Vis. Commun. Image Proc., pp. 1–4, IEEE, Tainan (2011). Google Scholar

13. L. NalpantidisA. Gasteratos, “Biologically and psychophysically inspired adaptive support weight algorithm for stereo correspondence,” Robot. Auton. Syst. 58(5), 457–464 (2010).RASOEJ0921-8890 http://dx.doi.org/10.1016/j.robot.2010.02.002 Google Scholar

Biography

OE_52_2_027201_d001.png

Kai Gao received a BS degree in electronics and information engineering from Changchun University of Science and Technology in 2006, an MS degree in detection technology and automation devices from Changchun University of Science and Technology in 2009. Now he is pursuing his PhD at the College of Communication Engineering at Jilin University. His research interests include image and video coding, stereo matching, and virtual view synthesis.

OE_52_2_027201_d002.png

He-xin Chen received MS and PhD degrees in communication and electronics in 1982 and 1990, respectively, from the Jilin University of Technology. He was a visiting scholar at the University of Alberta from 1987 to 1988. In 1993, he was a visiting professor at the Tampere University of Technology in Finland. He currently is a professor of communication engineering at Jilin University. His research interests include image and video coding, multidimensional signal processing, image and video retrieval, and audio and video synchronization.

OE_52_2_027201_d003.png

Yan Zhao received a BS degree in communication engineering in 1993 from Changchun Institute of Posts and Telecommunications, an MS degree in communications and electronics in 1999 from the Jilin University of Technology, and a PhD in communications and information systems in 2003 from the Jilin University. She was a postdoctoral researcher at the Digital Media Institute of the Tampere University of Technology in Finland in 2003. In 2008, she was a visiting professor at the Institute of Communications and Radio-Frequency Engineering at the Vienna University of Technology. She currently is an associate professor of communication engineering. Her research interests include image and video coding, multimedia signal processing, and error concealment for audio and video transmitted over unreliable networks. She is a member of IEEE.

OE_52_2_027201_d004.png

Ying-nan Geng received BS and MS degrees at the College of Communication Engineering in Jilin University. At present, she is working toward her PhD at the College of Communication Engineering in Jilin University. Her research interests are stereo matching, image and video coding, and virtual view synthesis.

OE_52_2_027201_d005.png

Gang Wang received a BS degree in electronics engineering from Changchun University of Technology in 1999, and an MS degree in signal processing from Jilin University in 2005. Now he is pursuing his PhD at the College of Geo-Exploration Science and Technology of Jilin University. His research interests include wireless communication application on geo-exploration and hyperspectral image communication.

© The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Kai Gao, Kai Gao, He-xin Chen, He-xin Chen, Yan Zhao, Yan Zhao, Ying-nan Geng, Ying-nan Geng, Gang Wang, Gang Wang, "Stereo matching algorithm based on illumination normal similarity and adaptive support weight," Optical Engineering 52(2), 027201 (1 February 2013). https://doi.org/10.1117/1.OE.52.2.027201 . Submission:
JOURNAL ARTICLE
8 PAGES


SHARE
Back to Top