## 1.

## Introduction

Vision-based measurement and control systems are now widely used in fields such as three-dimensional (3-D) reconstruction, manufacturing, motion estimation, and surveillance. These systems typically include multiple cameras working cooperatively. The geometrical relationships between cameras, which are basic knowledge for any multivision system, have been described in Refs. 1–4. To recover these relationships, traditional solutions^{5–8} usually place a calibration object with matching features in the cameras’ overlapping field of view (FOV). With these methods, both intrinsic and extrinsic camera parameters can be estimated well. For full-view or large-scale vision measurements, however, a common situation is that the cameras have nonoverlapping FOVs. Without a shared FOV, the feature correspondences that traditional calibration methods rely on cannot be obtained. Calibration of nonoverlapping cameras is therefore an important and challenging task.

Recently, several methods have been presented to solve this problem. A commonly used approach^{9–11} is based on large-scale surveying equipment such as theodolites or laser trackers. With such equipment, the 3-D points of multiple calibration objects for nonoverlapping cameras can be easily obtained. However, these methods require complex operation and high-precision equipment, which makes them cumbersome and inconvenient, especially for field calibrations. Moreover, the cost of this equipment is prohibitive. Alternatively, some studies use nonstandard calibration objects that can be “seen” by multiple nonoverlapping cameras. For example, Liu et al. and Zhang et al.^{12,13} use, respectively, a long one-dimensional target and two planar targets fixed together to calibrate cameras with nonoverlapping configurations. In these methods, the target can be moved freely and each camera needs only a partial view of the target. The main restriction in practice is the stability and precision of these large targets. In vision-based robotics, Lebraly et al.^{14} use a planar mirror to create an overlap between the views of the different cameras. The impact of the mirror refraction is also studied in the calibration algorithms. Their method is effective and easy to carry out. However, to avoid degeneracy, the mirror must be placed carefully and the calibration object must be small, which reduces precision. In vision-based surveillance, structure from motion has been studied and applied to calibrate multiple cameras.^{15–18} In these methods, the targets’ trajectories are estimated from a motion model generated from the positions measured in the FOV of each sensor. The relative orientation and location of the cameras are then calculated from the observed and estimated target positions. These methods are suitable for large-scale surveillance networks, but they need scene information that is hard to obtain in industrial measurements, and their precision remains to be improved.

To address this problem, a previous study^{19} used pairs of skew laser lines to calibrate nonoverlapping cameras. However, because the laser lines must be directed into the FOV of the respective cameras, a large number of line lasers must be added to the system as the number of cameras increases, which is inconvenient in practical applications. In this paper, a novel calibration method using light planes is proposed. The light planes, generated by a line laser projector or a rotary laser level, serve as the calibration objects. The coplanarity of the light planes provides constraints that are used to recover the camera geometry. Compared with laser lines, the image of a light plane contains more information, which can increase the accuracy of feature extraction; a light plane can also cover a larger space, which makes the method more flexible and better suited to field calibrations.

The remainder of this paper is organized as follows. A brief introduction to the camera model and projective transformation is presented in Sec. 2. Section 3 details the calibration method: the main principle and the coplanarity constraint are given in Sec. 3.1, the method of light plane 3-D reconstruction in Sec. 3.2, and the procedure of camera geometry estimation in Sec. 3.3. Section 4 provides the results on both synthetic and real data. The conclusions are given in Sec. 5.

## 2.

## Notations

In this paper, a two-dimensional (2-D) image point is denoted by $p={[u,v]}^{T}$ and a 3-D point in world coordinates by $P={[X,Y,Z]}^{T}$. The corresponding homogeneous coordinates are denoted by $\tilde{p}={[u,v,1]}^{T}$ and $\tilde{P}={[X,Y,Z,1]}^{T}$. Based on the pinhole camera model, the mapping of a 3-D world point to a 2-D image point is described as

## (1)

$$s\tilde{p}=A[R\ \ t]\tilde{P},\qquad A=\begin{bmatrix}{f}_{u}& \gamma & {u}_{0}\\ 0& {f}_{v}& {v}_{0}\\ 0& 0& 1\end{bmatrix},$$

where $s$ is an arbitrary scale factor, $[R\ \ t]$ is the rigid transform (rotation $R$ and translation $t$) from the world coordinate frame to the camera coordinate frame, and $A$ is the camera intrinsic matrix with focal lengths ${f}_{u}$ and ${f}_{v}$, skew $\gamma$, and principal point $({u}_{0},{v}_{0})$.

If the world coordinate frame is established on a plane (with the $z$-axis perpendicular to it), then a point on the plane is $\tilde{P}={[X,Y,0,1]}^{T}$. Let us redefine $\tilde{P}$ as $\tilde{P}={[X,Y,1]}^{T}$ and denote the $i$’th column of the rotation matrix $R$ by ${r}_{i}$. From Eq. (1), we have

## (2)

$$s\tilde{p}=A[{r}_{1}\ \ {r}_{2}\ \ t]\tilde{P}.$$

According to projective geometry, this plane-to-plane mapping can also be expressed by a projective transform

## (3)

$$s\tilde{p}=H\tilde{P},$$

where $H$ is a $3\times 3$ homography matrix defined up to a scale factor. Let us denote the $i$’th column of $H$ by ${h}_{i}$. From Eqs. (2) and (3), we have

## (4)

$$\lambda [{h}_{1}\ \ {h}_{2}\ \ {h}_{3}]=A[{r}_{1}\ \ {r}_{2}\ \ t],$$

where $\lambda$ is a scale factor whose magnitude is fixed by $\Vert {r}_{1}\Vert =1$. This gives

## (5)

$${r}_{1}=\lambda {A}^{-1}{h}_{1},\quad {r}_{2}=\lambda {A}^{-1}{h}_{2},\quad {r}_{3}={r}_{1}\times {r}_{2},\quad \text{and}\quad t=\lambda {A}^{-1}{h}_{3}.$$

## 3.

## Method

## 3.1.

### Main Principle and Coplanarity Constraint

A fixed light plane in space is expressed by different planar equations in the respective camera coordinate frames because of the different orientation and position of each camera. Conversely, after applying the rigid transforms that represent the geometry between the cameras, the individual planes should coincide with each other. We call this the coplanarity constraint. Based on this fact, the camera geometry can be recovered by placing the line laser projector, and reconstructing the light plane, several times.

Without loss of generality, two cameras are used to explain the principle, since a system with more cameras can be decomposed into camera pairs. The calibration setup is illustrated in Fig. 1. The two cameras are set up in the measuring field with orientations and positions such that their FOVs do not overlap. Let us denote the two cameras by Camera 1 and Camera 2, with camera coordinate frames ${O}_{c1}$ and ${O}_{c2}$, respectively. The geometry transform matrix between the cameras is denoted by $[R\ \ t]$. A line laser projector placed in the field projects a large light plane, denoted by $\pi $. The projector is positioned so that the light plane intersects both cameras’ views. To make the light plane visible and reconstructable, a planar pattern board is placed in front of each camera, so that a laser line is projected onto each board. By taking images of the planar board in different positions, the equation of plane $\pi $ in each camera coordinate frame can be obtained.

A plane can also be defined by a point and a normal vector. As shown in Fig. 2, the plane $\pi $ expressed in frame ${O}_{c1}$ is denoted by $\pi ({p}_{1},{n}_{1})$ and in frame ${O}_{c2}$ by $\pi ({p}_{2},{n}_{2})$, where the normals are taken as unit vectors with consistent orientation. After the rigid transformation under $[R\ \ t]$, $\pi ({p}_{2},{n}_{2})$ in frame ${O}_{c2}$ is denoted by $\pi ({p}_{2}^{\prime},{n}_{2}^{\prime})$ in frame ${O}_{c1}$. Since $\pi ({p}_{1},{n}_{1})$ and $\pi ({p}_{2},{n}_{2})$ represent the same plane in different coordinate frames, $\pi ({p}_{1},{n}_{1})$ and $\pi ({p}_{2}^{\prime},{n}_{2}^{\prime})$ should coincide with each other. Then, we have

## (6)

$${n}_{1}={n}_{2}^{\prime},$$

## (7)

$${n}_{1}^{T}\,\overrightarrow{{p}_{1}{p}_{2}^{\prime}}=0,$$

with ${n}_{2}^{\prime}=R{n}_{2}$ and $\overrightarrow{{p}_{1}{p}_{2}^{\prime}}=R{p}_{2}+t-{p}_{1}$, which yields

## (8)

$${n}_{1}=R{n}_{2},$$

## (9)

$${n}_{1}^{T}(R{p}_{2}+t-{p}_{1})=0.$$

Here, we get two constraints on the geometry transformation matrix $[R\ \ t]$ from one light plane. To solve for the rotation matrix $R$, at least two constraints like Eq. (8) are needed, and to solve for the translation vector $t$, at least three constraints like Eq. (9) are needed, which means that at least three light planes are required to solve for $[R\ \ t]$. Moreover, the light planes must not be parallel to each other, since parallel planes provide duplicate constraints.

## 3.2.

### Light Plane 3-D Reconstruction

Based on the principle described above, the first step of our method is to reconstruct the light plane in each camera’s coordinate frame. This is essentially the calibration problem of structured light vision, which is covered in the literature.^{20,21} Our solution is as follows:

Step 1: Capture an image of the planar pattern onto which the laser line is projected, and correct the lens distortion.

Step 2: Extract the feature points of the pattern and the laser line points in the image. In our case, we adopt a chessboard pattern and use a standard corner detection algorithm to extract the corner points. For the laser line points, we use the method presented by Steger.^{22}

Step 3: The correspondence between the image points and the actual planar pattern points can be used to compute a homography matrix $H$ from Eq. (3). Then, the geometry transform matrix $[R\ \ t]$, from the planar pattern coordinate frame to the camera coordinate frame, can be computed from Eq. (5). Because the data are noisy in practice, we use these results as the initial parameters of a nonlinear optimization routine. By minimizing the reprojection errors of the planar pattern points, a well-estimated $[R\ \ t]$ can be obtained.

Step 4: Transform the laser line points obtained in step 2 from the image coordinate frame to the planar pattern coordinate frame according to Eq. (3). Extend the points’ coordinates with $Z=0$, then transform them from the planar pattern coordinate frame to the camera coordinate frame using the rigid transform matrix $[R\ \ t]$.

Step 5: Place the planar pattern in another position or orientation, then repeat steps 1 to 4 until enough laser line points have been collected to fit the light plane. In theory, two placements of the planar pattern are enough for plane fitting. In practice, more placements are used to improve accuracy because the data are noisy.

Step 6: The equation of the light plane is described by $ax+by+cz+d=0$. Using all the laser line points $({x}_{i},{y}_{i},{z}_{i})$ obtained after step 5, the light plane can be fitted by minimizing the least-squares quantity

## (10)

$$\sum _{i}{(a{x}_{i}+b{y}_{i}+c{z}_{i}+d)}^{2},$$

where the coefficients must be normalized (for example, ${a}^{2}+{b}^{2}+{c}^{2}=1$) to exclude the trivial solution.

By carrying out the procedures listed above, each light plane can be reconstructed in the respective camera coordinates.
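Steps 3 to 6 can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names (`pose_from_homography`, `laser_points_in_camera`, `fit_plane`) are hypothetical, the homography is built from a known pose instead of being estimated from chessboard corners, the nonlinear refinement of step 3 is omitted, and the plane fit assumes the unit-normal normalization $a^2+b^2+c^2=1$, solved by singular value decomposition.

```python
import numpy as np

def pose_from_homography(A, H):
    """Eq. (5): recover [R t] (pattern frame -> camera frame) from H."""
    B = np.linalg.inv(A) @ H
    lam = 1.0 / np.linalg.norm(B[:, 0])        # scale fixed by ||r1|| = 1
    if lam * B[2, 2] < 0:                      # assumption: pattern lies in front of the camera
        lam = -lam
    r1, r2, t = lam * B[:, 0], lam * B[:, 1], lam * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)                # nearest rotation, useful with noisy H
    return U @ Vt, t

def laser_points_in_camera(A, H, uv):
    """Step 4: image points -> pattern plane (Z = 0) -> camera frame."""
    R, t = pose_from_homography(A, H)
    uv1 = np.column_stack([uv, np.ones(len(uv))])
    XY1 = (np.linalg.inv(H) @ uv1.T).T
    XY1 /= XY1[:, 2:3]                         # remove the projective scale
    P_board = np.column_stack([XY1[:, :2], np.zeros(len(uv))])
    return P_board @ R.T + t

def fit_plane(points):
    """Step 6: minimize sum (n.x + d)^2 subject to ||n|| = 1."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    n = Vt[-1]                                 # direction of least variance
    return n, -n @ centroid

def rodrigues(axis, deg):
    """Rotation matrix about a given axis (helper for the synthetic demo)."""
    k = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    th = np.deg2rad(deg)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * K @ K

# Synthetic demo: intrinsics taken from the synthetic experiment, everything else illustrative.
A = np.array([[2414.0, 0.0, 600.0], [0.0, 2414.0, 500.0], [0.0, 0.0, 1.0]])
n_true = np.array([0.1, 0.9, -0.2])
n_true /= np.linalg.norm(n_true)
d_true = -n_true @ np.array([0.0, 0.0, 800.0])   # plane through (0, 0, 800) mm

points = []
for axis, deg, t in [([1, 0, 0], 20, [-100, -80, 700]), ([0, 1, 0], -25, [-120, -60, 900])]:
    R, t = rodrigues(axis, deg), np.array(t, float)
    H = 3.7 * A @ np.column_stack([R[:, 0], R[:, 1], t])   # arbitrary overall scale
    # Laser line on the board: points (X, Y, 0) satisfying n.(X r1 + Y r2 + t) + d = 0.
    X = np.linspace(0.0, 150.0, 10)
    Y = -((n_true @ R[:, 0]) * X + n_true @ t + d_true) / (n_true @ R[:, 1])
    uv1 = (H @ np.column_stack([X, Y, np.ones_like(X)]).T).T
    points.append(laser_points_in_camera(A, H, uv1[:, :2] / uv1[:, 2:3]))

n_est, d_est = fit_plane(np.vstack(points))
if n_est @ n_true < 0:                          # a plane normal is defined up to sign
    n_est, d_est = -n_est, -d_est
print("normal error:", np.linalg.norm(n_est - n_true), "offset error:", abs(d_est - d_true))
```

With noise-free input the fitted plane should match the simulated one to numerical precision; with real data, the homography would come from the detected chessboard corners and the pose would be refined as described in step 3.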

## 3.3.

### Camera Geometry Estimation

This section details the camera geometry estimation procedure. Suppose $N$ ($N\ge 3$) light planes are reconstructed using the procedure described above. Then, we get $N$ constraints on the rotation matrix $R$; thus, $R$ can be estimated by minimizing the following least-squares quantity derived from Eq. (8)

## (11)

$$\sum _{i=1}^{N}{\Vert {n}_{1i}-R{n}_{2i}\Vert}^{2},\quad \text{subject to}\ \ {R}^{T}R=I.$$

This is a nonlinear minimization problem because of the orthogonality constraint on $R$. Rather than employing a nonlinear iterative algorithm, we linearize the problem by mapping rotation matrices to unit quaternions.^{23} Suppose the quaternion is defined as the four-dimensional vector $q={({q}_{0},{q}_{x},{q}_{y},{q}_{z})}^{T}$. Then, we minimize the following quantity

## (12)

$$\sum _{i=1}^{N}{|{A}_{i}q|}^{2},\qquad \text{subject to}\ \ {|q|}^{2}=1,$$

where

## (13)

$${A}_{i}=\begin{bmatrix}0& {n}_{2i}^{T}-{n}_{1i}^{T}\\ {n}_{1i}-{n}_{2i}& {({n}_{1i}+{n}_{2i})}_{\times}\end{bmatrix}.$$

Here ${(\cdot )}_{\times}$ denotes the skew-symmetric cross-product matrix of a 3-vector. The problem can be solved by the eigenvalue method: the solution is the eigenvector of $\sum _{i=1}^{N}{A}_{i}^{T}{A}_{i}$ associated with the smallest eigenvalue. After the best $q$ is estimated, $R$ can be computed from

## (15)

$$R=\begin{bmatrix}{q}_{0}^{2}+{q}_{x}^{2}-{q}_{y}^{2}-{q}_{z}^{2}& 2({q}_{x}{q}_{y}-{q}_{0}{q}_{z})& 2({q}_{x}{q}_{z}+{q}_{0}{q}_{y})\\ 2({q}_{x}{q}_{y}+{q}_{0}{q}_{z})& {q}_{0}^{2}-{q}_{x}^{2}+{q}_{y}^{2}-{q}_{z}^{2}& 2({q}_{y}{q}_{z}-{q}_{0}{q}_{x})\\ 2({q}_{x}{q}_{z}-{q}_{0}{q}_{y})& 2({q}_{y}{q}_{z}+{q}_{0}{q}_{x})& {q}_{0}^{2}-{q}_{x}^{2}-{q}_{y}^{2}+{q}_{z}^{2}\end{bmatrix}.$$

Similar to the rotation, we also get $N$ constraints on the translation vector $t$. Once $R$ is solved, $t$ can be estimated by minimizing the following least-squares quantity derived from Eq. (9)

## (16)

$$E=\sum _{i=1}^{N}{[{n}_{1i}^{T}(R{p}_{2i}+t-{p}_{1i})]}^{2}.$$

A necessary condition for minimizing Eq. (16) is

## (17)

$$\frac{\partial E}{\partial t}=2\sum _{i=1}^{N}[{n}_{1i}{n}_{1i}^{T}(R{p}_{2i}-{p}_{1i})+{n}_{1i}{n}_{1i}^{T}t]=0,$$

which is the linear system

## (18)

$$At=b,\quad \text{where}\quad A=\sum _{i=1}^{N}{n}_{1i}{n}_{1i}^{T}\quad \text{and}\quad b=\sum _{i=1}^{N}{n}_{1i}{n}_{1i}^{T}({p}_{1i}-R{p}_{2i}).$$

Here ${n}_{1i}$ is a column vector, so ${n}_{1i}{n}_{1i}^{T}$ is a $3\times 3$ matrix.

## 4.

## Experiment

## 4.1.

### Synthetic Data

The proposed method is first tested on synthetic data to evaluate its performance in the presence of noise. The synthetic data are created with two simulated cameras that have the following properties: ${f}_{u}={f}_{v}=2414$, ${u}_{0}=600$, and ${v}_{0}=500$. The image resolution is $1280\times 1024$. The rotation between the cameras (in Euler angles) is set to ($-4$, 65, $-5\,\mathrm{deg}$) and the translation vector is $[850,-22,-590]\,\mathrm{mm}$. In this experiment, five randomly oriented light planes are generated. Gaussian noise with zero mean and standard deviation $\sigma $ is added to the image points. Then, the estimated geometry is compared with the ground truth. We vary the noise level from 0 to 1.5 pixels. For each noise level, we perform 100 independent trials and average the results. Figure 3 shows the errors in the recovery of the camera geometry. All errors increase linearly with the noise level.
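The estimation of Sec. 3.3 and a simplified version of this noise test can be sketched as below. Several things here are assumptions made for illustration: noise is injected directly into the reconstructed plane normals and points instead of the image points, the Euler angles are composed as $R = R_z R_y R_x$ (the convention is not stated in the text), the noise levels are arbitrary, and $A$ in the translation step is taken as the $3\times 3$ matrix $\sum_i n_{1i} n_{1i}^T$ with normals as column vectors. All function names are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(a) @ b equals np.cross(a, b)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion (q0, qx, qy, qz), as in Eq. (15)."""
    q0, qx, qy, qz = q
    return np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qx*qy + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qx*qz - q0*qy), 2*(qy*qz + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]])

def estimate_rotation(n1, n2):
    """Eqs. (12)-(13): eigenvector of sum(A_i^T A_i) with the smallest eigenvalue."""
    M = np.zeros((4, 4))
    for a, b in zip(n1, n2):
        Ai = np.zeros((4, 4))
        Ai[0, 1:] = b - a
        Ai[1:, 0] = a - b
        Ai[1:, 1:] = skew(a + b)
        M += Ai.T @ Ai
    _, V = np.linalg.eigh(M)                   # eigenvalues come in ascending order
    return quat_to_rot(V[:, 0])

def estimate_translation(R, n1, p1, p2):
    """Eq. (18): solve A t = b with A = sum n n^T, b = sum n n^T (p1 - R p2)."""
    A = sum(np.outer(n, n) for n in n1)
    b = sum(np.outer(n, n) @ (p - R @ q) for n, p, q in zip(n1, p1, p2))
    return np.linalg.solve(A, b)

def euler_to_rot(deg_xyz):
    """Assumed convention: R = Rz @ Ry @ Rx."""
    ax, ay, az = np.deg2rad(deg_xyz)
    Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def unit(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def simulate(sigma_n, sigma_p, rng, n_planes=5):
    """One trial; returns (rotation error in deg, translation error in mm)."""
    R_true = euler_to_rot([-4.0, 65.0, -5.0])
    t_true = np.array([850.0, -22.0, -590.0])
    n2 = unit(rng.normal(size=(n_planes, 3)))          # planes in frame O_c2
    p2 = rng.uniform(-500, 500, size=(n_planes, 3))
    n1 = n2 @ R_true.T                                 # same planes in frame O_c1
    v = rng.uniform(-300, 300, size=(n_planes, 3))
    # p1 is a different point of the same plane: slide it inside the plane.
    p1 = p2 @ R_true.T + t_true + v - np.sum(v * n1, axis=1, keepdims=True) * n1
    n1 = unit(n1 + sigma_n * rng.normal(size=n1.shape))
    n2 = unit(n2 + sigma_n * rng.normal(size=n2.shape))
    p1 = p1 + sigma_p * rng.normal(size=p1.shape)
    p2 = p2 + sigma_p * rng.normal(size=p2.shape)
    R = estimate_rotation(n1, n2)
    t = estimate_translation(R, n1, p1, p2)
    cos_angle = np.clip((np.trace(R.T @ R_true) - 1) / 2, -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)), np.linalg.norm(t - t_true)

rng = np.random.default_rng(0)
for sigma_n, sigma_p in [(0.0, 0.0), (1e-4, 0.05), (1e-3, 0.5)]:
    errs = np.array([simulate(sigma_n, sigma_p, rng) for _ in range(100)])
    print(f"sigma_n={sigma_n:g}, sigma_p={sigma_p:g} mm -> mean errors (deg, mm):", errs.mean(axis=0))
```

Because the quaternion enters Eq. (15) quadratically, the sign ambiguity of the eigenvector does not affect the recovered rotation.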

Ideally, the light plane is perfectly flat, but in practice it is slightly curved, especially when generated by an off-the-shelf line laser projector. For simplicity, we use a sector of the quadratic cone ${x}^{2}+{y}^{2}-{\mathrm{tan}}^{2}a\,{z}^{2}=0$ to model the light plane distorted by the projector lens, where $a$ is the semiapex angle. According to most laser projector specifications, the curvature of the laser line is no more than $\pm 1\,\mathrm{mm}$ at 5 m, which means that $a$ is larger than 89.912 deg for a common projector. Based on this curvature model, our method is applied to distorted data. We vary the laser line curvature from $\pm 0.25$ to $\pm 3\,\mathrm{mm}$. For each curvature, Gaussian noise with zero mean and a standard deviation of 0.2 pixels is added to the image points, and 100 independent trials are performed. The averaged results are shown in Fig. 4. When the curvature is within $\pm 1\,\mathrm{mm}$, the relative errors are around 0.8%, which is slightly worse than the results with random noise alone. Even with a curvature of $\pm 3\,\mathrm{mm}$, which rarely occurs in practice, the relative errors are no more than 3%.

A third experiment investigates the performance with respect to the distance between the cameras. All parameters are kept the same except the translation vector. The distance is varied from 0.5 to 10 m. For each distance, Gaussian noise with zero mean and a standard deviation of 0.2 pixels is added to the image points, a curvature of $\pm 1\,\mathrm{mm}$ at 5 m is applied to the light plane, and 100 independent trials are performed. The averaged results are shown in Fig. 5. The distance has almost no influence on the rotation error, but the translation error increases. The reason is that, when calibrating widely separated cameras, the orientations of the light planes are restricted to a small range to ensure that every camera can “see” all the light planes. In other words, the normal vectors of the light planes differ only slightly. This results in near-degenerate configurations, especially in the computation of the translation vector. Specifically, the condition number of matrix $A$ in Eq. (18) gradually deteriorates as the range of orientations shrinks, which makes the results more sensitive to noise. Despite this degeneracy, our method is still usable. The rotation error hardly changes and stays below 0.005 deg for all trials. For $\text{distance}=6.5\,\mathrm{m}$, the baseline error is $<1\,\mathrm{mm}$, and for $\text{distance}=10\,\mathrm{m}$, the baseline error is around 2 mm. This is adequate for most practical applications.
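The conditioning argument can be illustrated with a small numerical sketch. It assumes a hypothetical arrangement in which the plane normals lie on a cone of a given half-angle around a common direction, evenly spaced in azimuth, and it takes $A$ as the $3\times 3$ matrix $\sum_i n_{1i} n_{1i}^T$ with normals as column vectors. The function name `translation_matrix_cond` is illustrative only.

```python
import numpy as np

def translation_matrix_cond(half_angle_deg, n_planes=5):
    """Condition number of A = sum n n^T for normals spread on a cone."""
    theta = np.deg2rad(half_angle_deg)
    phi = 2 * np.pi * np.arange(n_planes) / n_planes      # evenly spaced azimuths
    normals = np.column_stack([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.full(n_planes, np.cos(theta)),
    ])
    A = sum(np.outer(n, n) for n in normals)
    return np.linalg.cond(A)

# Shrinking the orientation range makes A increasingly ill-conditioned.
for half_angle in (40, 20, 10, 5, 1):
    print(f"half-angle {half_angle:>2} deg -> cond(A) = {translation_matrix_cond(half_angle):.1f}")
```

For this symmetric arrangement, $A = N\,\mathrm{diag}(\tfrac{1}{2}\sin^2\theta, \tfrac{1}{2}\sin^2\theta, \cos^2\theta)$, so the condition number is $2/\tan^2\theta$ for small half-angles $\theta$ and grows without bound as the normals approach a single direction.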

## 4.2.

### Real Data

The method is used to calibrate a two-camera vision system with nonoverlapping FOVs, which is shown in Fig. 6. The system consists of two CMOS cameras (Aigo DLC-130) with 12-mm lenses. The image resolution is $1280\times 1024$. The baseline between the two cameras is about 1000 mm. The light plane is generated by an ordinary line laser projector. The chessboard contains a pattern of $6\times 6$ squares, and the distance between adjacent square corners is 30 mm. The laser projector is placed in six random positions and orientations to generate six light planes. For each light plane, the chessboard is moved three times in front of each camera. Figure 7 shows the estimated camera geometry and the reconstructed light planes. To evaluate the calibration stability, we also applied our method to all quintuple combinations of the six light planes. The results are shown in Table 1. They are very consistent with each other, and the standard deviations of all the parameters are very small, which indicates that the proposed method is stable.

## Table 1

Stability of results in all quintuples of light planes.

Quintuple | Rotation (Euler angles) (deg) | Translation vector (mm) |
---|---|---|
(23456) | (−3.473, 64.779, −4.952) | [847.591, −22.327, −586.294] |
(13456) | (−3.472, 64.771, −4.934) | [846.935, −22.291, −586.779] |
(12456) | (−3.466, 64.785, −4.928) | [847.581, −22.233, −586.636] |
(12356) | (−3.468, 64.782, −4.930) | [847.370, −22.313, −586.259] |
(12346) | (−3.456, 64.785, −4.927) | [846.911, −22.251, −586.774] |
(12345) | (−3.477, 64.803, −4.943) | [847.375, −22.114, −586.886] |
Mean | (−3.469, 64.784, −4.936) | [847.294, −22.255, −586.605] |
SD | (0.00737, 0.0106, 0.00968) | [0.303, 0.0779, 0.267] |
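The summary rows of Table 1 can be checked from the six quintuple results with a short sketch. It assumes that the reported SD is the sample standard deviation (`ddof=1`), which is not stated in the text; small differences from the table are to be expected because the tabulated entries are rounded.

```python
import numpy as np

# Rows of Table 1, in the order (23456), (13456), (12456), (12356), (12346), (12345).
rotation_deg = np.array([
    [-3.473, 64.779, -4.952],
    [-3.472, 64.771, -4.934],
    [-3.466, 64.785, -4.928],
    [-3.468, 64.782, -4.930],
    [-3.456, 64.785, -4.927],
    [-3.477, 64.803, -4.943],
])
translation_mm = np.array([
    [847.591, -22.327, -586.294],
    [846.935, -22.291, -586.779],
    [847.581, -22.233, -586.636],
    [847.370, -22.313, -586.259],
    [846.911, -22.251, -586.774],
    [847.375, -22.114, -586.886],
])

rot_mean, rot_sd = rotation_deg.mean(axis=0), rotation_deg.std(axis=0, ddof=1)
trans_mean, trans_sd = translation_mm.mean(axis=0), translation_mm.std(axis=0, ddof=1)

print("rotation mean (deg):   ", rot_mean)
print("rotation SD (deg):     ", rot_sd)
print("translation mean (mm): ", trans_mean)
print("translation SD (mm):   ", trans_sd)
```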

To evaluate the calibration accuracy, the vision system is also calibrated by a double-theodolite-based method^{9} that uses two Leica T1800 theodolites (angle measurement accuracy $\le 0.5$ arcsec). Both results are listed in Table 2. The results of the two methods are comparable: the angle difference is $<0.01\,\mathrm{deg}$ and the baseline difference is $<0.2\,\mathrm{mm}$.

## Table 2

Comparison with double theodolites based method.

Method | Rotation (Euler angles) (deg) | Translation vector (mm) | Baseline (mm) |
---|---|---|---|
Double theodolites | (−3.468, 64.789, −4.945) | [846.811, −23.062, −587.479] | 1030.899 |
Our approach | (−3.471, 64.783, −4.941) | [847.612, −22.270, −586.686] | 1031.088 |
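The baseline column and the differences between the two methods follow directly from the tabulated values, as the following sketch shows. It only assumes that the baseline is the Euclidean norm of the translation vector; the variable names are illustrative.

```python
import numpy as np

# Values copied from Table 2.
rot_theodolite = np.array([-3.468, 64.789, -4.945])
rot_ours = np.array([-3.471, 64.783, -4.941])
t_theodolite = np.array([846.811, -23.062, -587.479])
t_ours = np.array([847.612, -22.270, -586.686])

baseline_theodolite = np.linalg.norm(t_theodolite)   # baseline = length of the translation vector
baseline_ours = np.linalg.norm(t_ours)
baseline_diff = abs(baseline_ours - baseline_theodolite)
max_angle_diff = np.max(np.abs(rot_ours - rot_theodolite))

print(f"baselines (mm): {baseline_theodolite:.3f}, {baseline_ours:.3f}")
print(f"baseline difference (mm): {baseline_diff:.3f}")
print(f"largest Euler angle difference (deg): {max_angle_diff:.3f}")
```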

## 5.

## Conclusion

In this paper, a calibration method for nonoverlapping cameras is presented. A large light plane, which can be generated by an ordinary line laser projector or a rotary laser level, is used as the calibration object. The method does not require any overlap between the cameras’ FOVs. Because the light plane has no mass and is not a solid object, it can be placed freely and easily made partly visible in all cameras’ views, which makes the method flexible and well suited to field calibrations. The experimental results with synthetic data show that the proposed method is robust to noise and can be used for large-scale calibration. The results with real data show reliability and accuracy comparable to those of the traditional double-theodolite-based method.

## Acknowledgments

This research has been supported by the National Natural Science Foundation of China under Grant Nos. 61275162 and 51175027.

## References

## Biography

**Qianzhe Liu** received his PhD degree from the School of Instrumentation Science and Opto-electronics Engineering at Beihang University, China, in 2012. He is currently a lecturer in the School of Instrumentation Science and Opto-electronics Engineering, Beijing Information Science & Technology University, China. His research interests are computer vision and optical fiber sensing.

**Junhua Sun** received his PhD degree from the School of Instrumentation Science and Opto-electronics Engineering at Beihang University, China, in 2006. He is currently an associate professor in the School of Instrumentation Science and Opto-electronics Engineering, Beihang University, China. His research interests are precision measurement and machine vision.

**Yuntao Zhao** received his BS degree from the School of Instrumentation Science and Opto-electronics Engineering at Beihang University, China, in 2006. He is currently pursuing the MS degree in the School of Instrumentation Science and Opto-electronics Engineering, Beihang University, China. His research interests are precision measurement and machine vision.

**Zhen Liu** received his PhD degree from the School of Instrumentation Science and Opto-electronics Engineering at Beihang University, China, in 2010. Since 2010, he has been a lecturer in the School of Instrumentation Science and Opto-electronics Engineering, Beihang University, China. His research interests are laser precision measurement and machine vision.