The recent decade has seen tremendous growth in the machine vision applications as they become the new driving forces in precision engineering. Many of these applications highly benefit from the improvements of camera calibration techniques, which primarily fall into two categories: photogrammetry calibration^{1} and self-calibration.^{2} Compared with the latter which does not use any specific calibration target, the former usually provides higher accuracy by using a precisely-constructed calibration target and is therefore the interest of this letter. A representative photogrammetry approach is Zhang's method, which uses a planar pattern as the calibration target and is well known for its flexibility, robustness, and low cost.^{1} This method and its extensions implemented in the OpenCV library are considered as the conventional technique in this letter.

The literature in the relevant field addresses two major sources of error that affect the camera calibration results. The first one is the imperfection of the calibration target. Since the assumptions made for the conventional camera calibration are based on a perfect planar target with ideal patterns, the imprecision of the calibration target may lead to inaccurate results.^{3} The second problem is the uncertainty in locating the control points directly from the geometries of the calibration patterns in the captured raw images which suffers from lens distortion as well as perspective distortion.^{4}

This letter proposes an advanced geometric camera calibration approach to overcome the aforementioned two problems without loss of the original advantages of the conventional technique. The proposed technique uses a sophisticated lens distortion model that takes the radial, tangential, and prism distortion into account, and achieves a precise localization of the control points with a novel refinement process using a frontal image concept and an advanced digital image correlation (DIC) scheme; in addition, the imperfection of the calibration target can be compensated.

For the ideal pinhole model, assuming that the world coordinate is placed on the calibration target with its surface as the *XY* plane, the relation between the three-dimensional world coordinate of a calibration target point
[TeX:]
\documentclass[12pt]{minimal}\begin{document}${\bm M} = [X_w, Y_w, Z_w]^T$\end{document}
$\mathit{M}={[{X}_{w},{Y}_{w},{Z}_{w}]}^{T}$
and its corresponding location
[TeX:]
\documentclass[12pt]{minimal}\begin{document}${\bm m} = [u,v]^T$\end{document}
$\mathit{m}={[u,v]}^{T}$
in the image plane can be expressed as:

## 1

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} s \left\lbrace \begin{array}{c}\boldsymbol m\\[5pt] 1\\ \end{array}\right\rbrace = \boldsymbol A\left[ \begin{array}{cr}\boldsymbol R &\boldsymbol T \end{array} \right] \left\lbrace \begin{array}{c}\boldsymbol M\\[5pt] 1\\ \end{array} \right\rbrace , \ \ \boldsymbol A = \left[ \begin{array}{c@{\quad}c@{\quad}c}\alpha & \gamma & u_0 \\[5pt] 0 & \beta & v_0 \\[5pt] 0 &0 & 1 \\ \end{array} \right], \end{equation} \end{document} $$s\left\{\begin{array}{c}\mathit{m}\\ 1\end{array}\right\}=\mathit{A}\left[\begin{array}{cc}\mathit{R}& \hfill \mathit{T}\end{array}\right]\left\{\begin{array}{c}\mathit{M}\\ 1\end{array}\right\},\phantom{\rule{0.33em}{0ex}}\phantom{\rule{0.33em}{0ex}}\mathit{A}=\left[\begin{array}{ccc}\hfill \alpha & \hfill \gamma & \hfill {u}_{0}\\ \hfill 0& \hfill \beta & \hfill {v}_{0}\\ \hfill 0& \hfill 0& \hfill 1\end{array}\right],$$*s*is a scale factor; [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm A}$\end{document} $\mathit{A}$ is the intrinsic matrix, with α and β the horizontal and vertical focal length in pixel unit, γ the skew factor, and (

*u*

_{0},

*v*

_{0}) the coordinates of the principal point; [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm R}$\end{document} $\mathit{R}$ and [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm T}$\end{document} $\mathit{T}$ are the extrinsic parameters that denote the rotation and translation relating the world coordinate system to the camera coordinate system. Due to nonlinear optical distortion, the above equation is not sufficient for accurate camera calibration. In spite of the fact that some very complex models exist,

^{5}in practice they induce more instability rather than accuracy because of the high order distortion components. Here, the lens distortion is compensated by:

## 2

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{eqnarray} u^{\prime } &=&(1+a_0r^2+a_1r^4+a_2r^6)u+s_0r^2 \nonumber \\ & &+\,(p_0+p_2r^2)(r^2+2u^2) ,\nonumber \\ v^{\prime } &=&(1+a_0r^2+a_1r^4+a_2r^6)v+s_1r^2\\ & &+\,(p_1+p_3r^2)(r^2+2v^2) ,\nonumber \\ r^2 &=& u^2+v^2,\nonumber \end{eqnarray} \end{document} $$\begin{array}{ccc}\hfill {u}^{\prime}& =& (1+{a}_{0}{r}^{2}+{a}_{1}{r}^{4}+{a}_{2}{r}^{6})u+{s}_{0}{r}^{2}\hfill \\ & & +\phantom{\rule{0.16em}{0ex}}({p}_{0}+{p}_{2}{r}^{2})({r}^{2}+2{u}^{2}),\hfill \\ \hfill {v}^{\prime}& =& (1+{a}_{0}{r}^{2}+{a}_{1}{r}^{4}+{a}_{2}{r}^{6})v+{s}_{1}{r}^{2}\hfill \\ & & +\phantom{\rule{0.16em}{0ex}}({p}_{1}+{p}_{3}{r}^{2})({r}^{2}+2{v}^{2}),\hfill \\ \hfill {r}^{2}& =& {u}^{2}+{v}^{2},\hfill \end{array}$$*a*

_{0},

*a*

_{1},

*a*

_{2}), (

*s*

_{0},

*s*

_{1}), and (

*p*

_{0},

*p*

_{1}) represent the radial, prism, and tangential distortion coefficients, respectively, (

*u*,

*v*) denotes the distortion-free pixel location, and (

*u*

^{′},

*v*

^{′}) is the corresponding distorted point. To avoid calculation instability during the optimization process, in Eq. 2, (

*u*,

*v*) and (

*u*

^{′},

*v*

^{′}) are usually converted to their normalized terms (

*x*

_{cn},

*y*

_{cn}) and [TeX:] \documentclass[12pt]{minimal}\begin{document}$(x^{\prime }_{cn},y^{\prime }_{cn})$\end{document} $({x}_{cn}^{\prime},{y}_{cn}^{\prime})$ by applying the intrinsic matrix [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm A}$\end{document} $\mathit{A}$ .

Given the locations of the control points in the world coordinate as [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm M_{\bm i}}$\end{document} ${\mathit{M}}_{\mathit{i}}$ and in the image plane as [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm m_{\bm {ij}}}$\end{document} ${\mathit{m}}_{\mathit{ij}}$ , the calibration process involves a nonlinear optimization with the cost function defined as:

## 3

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} S = \sum _{i=1}^k \sum _{j=1}^l ||{\bm m_{\bm {ij}}}-P({\bm A},{\bm \varphi },{\bm r},{\bm T},{\bm M_{\bm i}})||^2, \end{equation} \end{document} $$S=\sum _{i=1}^{k}\sum _{j=1}^{l}\left|\right|{\mathit{m}}_{\mathit{ij}}-P(\mathit{A},\mathit{\varphi},\mathit{r},\mathit{T},{\mathit{M}}_{\mathit{i}}){\left|\right|}^{2},$$*k*and

*l*denote the number of control points in the calibration target and the number of images, respectively, [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm \varphi } = (a_0, a_1, a_2, s_0, s_1, p_0, p_1)$\end{document} $\mathit{\varphi}=({a}_{0},{a}_{1},{a}_{2},{s}_{0},{s}_{1},{p}_{0},{p}_{1})$ ,

*P*denotes the projection of control points onto the image planes according to Eqs. 1, 2, and [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm r}$\end{document} $\mathit{r}$ is the Rodrigues vector presentation of [TeX:] \documentclass[12pt]{minimal}\begin{document}${\bm R}$\end{document} $\mathit{R}$ .

^{1, 6}The optimization is performed using the Levenberg–Marquardt (LM) algorithm.

The method described above relies on the intrinsic assumption that the control points on the calibration target are perfectly positioned. In reality, printing imprecision during calibration pattern fabrication is inevitable, especially for low-cost hardware systems. To deal with this issue, the proposed technique allows the world coordinates of the control points to be unknown, and they will be accurately determined together with the camera parameters. To ensure the uniqueness of the calibration target during optimization, a geometric constraint on three noncollinear control points, named markers, is applied.^{6} The constraint enforces planarity of the markers by setting their *Z* world coordinates to zero, and requires the distance between any two of the three markers to be accurately measured to solve for the scale factor. The markers, normally made at the corners of the calibration panel, also help determine the orientation of the panel.

An accurate ellipse fitting technique can be employed to detect the centers of the calibration circles.^{7} However, as the control points detected in the raw images suffer from lens and perspective distortion, their true locations cannot be accurately determined, and this will further lead to calibration errors. To cope with this issue, a two-step refinement with the frontal image concept, which is free from lens and perspective distortion, is employed. The refinement is conducted after the camera parameters are coarsely determined using the method described above. The raw images are successively undistorted by applying Eq. 2 and then reversely projected to the world coordinate system through using Eq. 1. The frontal image is generated from a direct scaling of the world coordinates. It is noteworthy that while the conventional subpixel edge and circle fitting method can be used to detect the control points in the frontal images, the method is not recommended because the gradient calculation involved in edge detection is sensitive to noise and the circle approximation does not yield high accuracy. On the other hand, as illustrated in Fig. 1, the transformation of each image from each location of the captured calibration board to the frontal image allows using the DIC concept to accurately locate the position of each control point by comparing the frontal image with the synthesized templates. Datta
^{4} achieved subpixel-location detection of the control points by carrying out a quadratic fitting in the neighborhood regions based on their correlation coefficients, but such a peak-finding approach is less accurate than the iterative schemes.^{8} Here, the proposed technique employs a high-accuracy cross-correlation algorithm, and the correlation coefficient function is:^{6, 8}

## 4

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} C = \sum _{i=1}^N {\left[af(x_i,y_i)+b-g\big(x^{\prime }_i,y^{\prime }_i\big)\right]^2}, \end{equation} \end{document} $$C=\sum _{i=1}^{N}{\left[af({x}_{i},{y}_{i})+b-g\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)\right]}^{2},$$*a*is a scale factor,

*b*is an intensity offset, and

*f*(

*x*

_{i},

*y*

_{i}) and [TeX:] \documentclass[12pt]{minimal}\begin{document}$g(x^{\prime }_i,y^{\prime }_i)$\end{document} $g({x}_{i}^{\prime},{y}_{i}^{\prime})$ denote the intensity values at the

*i*–th pixel in the template and frontal images, respectively. The template pattern is a square subimage of

*N*pixels with its center as the center of the circular target point. Denoting the subimage center as (

*x*

_{0},

*y*

_{0}) and its shift amount between two subimages as (

*u*,

*v*), the correlation shape function considering both translation and rotation is

## 5

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{eqnarray} x^{\prime }_i &=& x_i + u + u_x (x_i-x_0) + u_y (y_i-y_0), \nonumber \\[-6pt]\\[-6pt] y^{\prime }_i &=& y_i + v + v_x (x_i-x_0) + v_y (y_i-y_0),\nonumber \end{eqnarray} \end{document} $$\begin{array}{ccc}\hfill {x}_{i}^{\prime}& =& {x}_{i}+u+{u}_{x}({x}_{i}-{x}_{0})+{u}_{y}({y}_{i}-{y}_{0}),\hfill \\ \\ \hfill {y}_{i}^{\prime}& =& {y}_{i}+v+{v}_{x}({x}_{i}-{x}_{0})+{v}_{y}({y}_{i}-{y}_{0}),\hfill \end{array}$$*u*

_{x},

*u*

_{y},

*v*

_{x}, and

*v*

_{y}are coefficients of the shape function. To determine the eight unknowns (

*u*,

*v*,

*u*

_{x},

*u*

_{y},

*v*

_{x},

*v*

_{y},

*a*,

*b*), the Newton–Raphson algorithm is employed to minimize

*C*in Eq. 4. With the detected (

*u*,

*v*), the location of each control point in the frontal image can be directly determined. Then, these points in the frontal image are reversely projected back to the image plane to achieve hyperaccurate localization of the control points.

The rationale of the proposed method is that although the imprecise control points detected in the raw images lead to inaccurate camera calibration parameters, they offer good initial information about the relation between the camera and each scene which can be further processed to achieve very accurate localization of the control points. This information helps detect the camera intrinsic and extrinsic parameters as well as the world coordinates of the control points with high accuracies. The procedure of the proposed technique is summarized as follows:

Detect the control points in the raw images using the edge detection and ellipse fitting method.

Optimize the camera parameters and world coordinates of the control points using the LM algorithm. This involves a similar procedure as the conventional technique using Eqs. 1, 2, 3.

Obtain the frontal images, and use the calculated positions of the control points in the frontal images as initial guess to refine the positions using the DIC method.

Reversely project the detected control points in the frontal images back to the raw images.

Re-optimize the camera parameters together with the world coordinates of the control points.

To demonstrate the validity of the proposed technique, computer simulation along with real experiment have been conducted. The results suggest that no less than 40 control points and 15 positions are required to ensure a reliable calibration. Both simulation and experiment presented below use a flat panel with 10 × 7 ring patterns whose grid distance is 25.4 mm (as illustrated in Fig. 1), and 20 images at different positions are used in the calibration.

In the simulation, the images are synthesized with camera parameters obtained from a real calibration where the radial, tangential, and prism lens distortion are considered. In addition, Gaussian noise with a standard deviation of 0.2% of the 25.4 mm distance is added to the position of each ring pattern in the *x* and *y* directions, and the images are also blurred and Gaussian noise with σ = 2.0 is added. Figure 2 shows the errors in locating the positions of the control points against their true values. Using Heikkila's technique^{7} to detect the control points directly in the raw images, the effect of perspective distortion where the ellipse center is not the actual circle center is obvious. In contrast, with the frontal image and DIC concepts, the distortion effect is removed, and over 20 times localization improvement in terms of the root-mean-square error (RMSE) in both *x* and *y* directions is achieved. The residual of the calibration, named reprojection error and defined as the RMSE between the projection of the control points to the image planes and their measured locations, is 0.00086 pixels. Table 1 shows the results of the intrinsic parameters. It can be seen that the camera parameters can be accurately retrived once the control points are detected with high precision.

## Table 1

Intrinsic calibration parameters.

True | Retrieved | Error | |
---|---|---|---|

α | 57761 | 5775.99061 | 0.00016% |

β | 57761 | 5775.99101 | 0.00015% |

γ | 0.27671 | 0.27721 | −0.18% |

u_{0} | 10671 | 1067.02621 | 0.0025% |

v_{0} | 7511 | 750.96491 | 0.0047% |

a_{0} | 0.179 | 0.1789 | 0.012% |

a_{1} | −0.334 | −0.3339 | 0.33% |

a_{2} | 28.110 | 28.0892 | 0.074% |

p_{0} | −0.00371 | −0.0037067 | 0.19% |

p_{1} | −0.00229 | −0.0022943 | −0.14% |

s_{0} | 0.00631 | 0.0063065 | 0.16% |

s_{1} | 0.00627 | 0.0063018 | −0.36% |

Unit: pixel.

For the real experiment, Fig. 3 shows the displacement error vectors between the detected and projected positions at each control point of the calibration target. With the conventional method and the proposed one, the reprojection errors are 0.3848 and 0.0098 pixels, respectively. Because of the insufficiency of the lens distortion model and the imperfection of the calibration target, the reprojection error is noticeably polarized with the conventional method in spite of its control refinement. With the proposed method, the error behaves more isotropic, indicating that the lens distortion and the imprecision of the calibration target have been successfully compensated.

## Acknowledgments

This work was supported by the Army Research Office (ARO) under Grant No. W911NF-10-1-0502 and the National Science Foundation (NSF) under Grant No. CMMI-0825806.