Open Access Paper
17 September 2019 Modelling and calibration of multi-camera-systems for 3D industrial supervision applications
Proceedings Volume 11144, Photonics and Education in Measurement Science 2019; 111440D (2019)
Event: Joint TC1 - TC2 International Symposium on Photonics and Education in Measurement Science 2019, 2019, Jena, Germany
With the advent of Industry 4.0 and the introduction of smart manufacturing and integrated production systems, the interest in 3D image-based supervision methods is growing. The aim of this work is to develop a scalable multi-camera system suitable for the acquisition of a dense point cloud representing the interior volume of a production machine for general supervision tasks as well as for navigation purposes, without a priori information regarding the composition of processing stations. To this end, multiple low-cost industrial cameras are mounted on the machine housing, observing the interior volume. In order to obtain a dense point cloud, this paper reviews aspects of metric stereo calibration and 3D reconstruction, with attention being focused on target-based calibration methods and block matching algorithms.



As low-cost image sensors are available in many different configurations, the approach of using multiple cameras to increase the depth information captured by a vision system is becoming more common. Stanford University pioneered this approach with the publication [1], presenting a dense array of CMOS image sensors used to capture high-speed videos, and later with [2], using 100 cameras for synthetic aperture imaging. Besides such camera arrays, which exploit the additional information from many sensors, binocular stereo computer vision systems are common. In principle, two views are enough to compute 3D information about an object depicted in both frames. Either a rig of two cameras can be used, or a single camera has to move around the object to capture frames from different viewpoints. The latter principle is commonly referred to as structure from motion, whereas the former is called stereoscopy. As both principles rely on at least two views with a common area and share the same mathematical foundation, software implementations for structure from motion can, at least in part, be utilized for stereoscopy and vice versa.

In the presented work the open source computer vision library OpenCV in version 3.4.4 is used. The multi-camera system referred to in this paper is designed to observe the interior of a production machine with tools moving on a gantry in X, Y and Z direction. Multiple processing stations may be placed in the range of motion of the gantry. Since the machine should be able to work its way through the interior without colliding with any object, the three-dimensional profile needs to be detected. Therefore multiple cameras, initially four, are mounted to the machine frame (see Figure 1). A stereovision approach is used, which leads to six stereo pairs for four cameras.

Figure 1.

The conceptual layout of the multi-camera system with four cameras mounted to the machine frame is depicted. From these four individual cameras, six different camera pairs result: 1-2; 1-3; 1-4; 2-3; 2-4; 3-4


Generally speaking, the aim of stereoscopy algorithms is to calculate a disparity map from two frames of the same scene captured by two cameras. The disparity map holds information about the different positions of common scene points projected into the individual camera frames. With this information each point can be reprojected to three-dimensional coordinates, resulting in a point cloud in which every point contains X, Y, Z real-world coordinates. In the presented setup this leads to six point clouds from six stereo pairs, depicting the machine interior from different viewpoints.



The calibration of a multi-camera system is an essential part of the process of reconstructing three-dimensional data from two-dimensional images, as it defines the achievable measurement accuracy as well as the scale. The aim of the calibration is to obtain the intrinsic and extrinsic parameters of every camera in order to transform a point from world coordinates to image pixel coordinates (see Figure 2). Later these parameters are used to undistort and rectify the images, as well as for the image transformations necessary for the reconstruction of 3D points.

Figure 2.

Overview of coordinate transformations for camera calibration.


The extrinsic parameters include the rotation matrix R and translation vector t. Intrinsic parameters are the camera matrix A, comprising the focal lengths fx, fy and the optical center (cx, cy), as well as the distortion coefficients d.

All calibration parameters remain valid as long as the position and orientation of the cameras, along with the focal settings, do not change. Therefore the calibration has to be carried out only once for a given camera setup.


Single Camera Calibration

To start the camera calibration, a camera model has to be chosen and described. The OpenCV calibration algorithm utilizes a pinhole camera model and introduces radial and tangential distortion [3]. Distortion correction is vital since the presented multi-camera system uses low-cost board-level cameras with S-mount lenses, which introduce an amount of distortion to the images that cannot be neglected.

At first a transformation from a three-dimensional point Pw(Xw, Yw, Zw) in world coordinates to a point P(u, v) in the image pixel coordinate system has to be found (see Figure 3). Equation (1) transforms a point Pw to a point Pc(Xc, Yc, Zc) in the camera coordinate system, where R is a 3x3 rotation matrix and t is a 3x1 translation vector.

Pc = R · Pw + t    (1)


Figure 3.

Coordinate system for camera calibration


The point Pc is now projected through the pinhole model in order to obtain normalized coordinates on the image plane as P(x, y), see (2).

x = Xc / Zc,  y = Yc / Zc    (2)


By introducing radial distortion coefficients (k1, k2, k3), the corrected point coordinates Pk(xk, yk) are defined as follows:

xk = x · (1 + k1·r^2 + k2·r^4 + k3·r^6)
yk = y · (1 + k1·r^2 + k2·r^4 + k3·r^6)    (3)


Here k1, k2 and k3 are the radial distortion coefficients and r^2 = x^2 + y^2. Since the lens is not aligned perfectly parallel to the image plane, tangential distortion is introduced. Its correction for a point Pp(xp, yp) can be described as:

xp = x + 2·p1·x·y + p2·(r^2 + 2·x^2)
yp = y + p1·(r^2 + 2·y^2) + 2·p2·x·y    (4)


In summary the distortion coefficients are defined as d = (k1, k2, p1, p2, k3). The tangential and radial distortion corrected point Pq(xq, yq) is, as a combination of equations (3) and (4), defined as:

xq = x · (1 + k1·r^2 + k2·r^4 + k3·r^6) + 2·p1·x·y + p2·(r^2 + 2·x^2)
yq = y · (1 + k1·r^2 + k2·r^4 + k3·r^6) + p1·(r^2 + 2·y^2) + 2·p2·x·y    (5)


To translate the image plane coordinates to image plane pixel coordinates, the camera matrix A is needed, which contains the focal lengths fx and fy expressed in pixel units, as well as the optical center in pixel coordinates as (cx, cy). γ represents the skewness, which is the angle error between the two axes of the pixel array. For industrial grade image sensors the skewness is usually small and can be neglected.

    | fx  γ   cx |
A = | 0   fy  cy |    (6)
    | 0   0   1  |


In conclusion a point P(u, v) in pixel coordinates is defined as:

(u, v, 1)^T = A · (xq, yq, 1)^T    (7)
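Taken together, the transformations of this section map a world point to pixel coordinates. The following NumPy sketch implements this chain with assumed, purely illustrative parameter values (identity pose, focal lengths of 800 px, small distortion coefficients) rather than the paper's calibration results; skew is neglected, as suggested above.

```python
import numpy as np

# Hypothetical parameters for illustration -- not the paper's calibration values.
R = np.eye(3)                      # rotation world -> camera
t = np.array([0.0, 0.0, 0.0])      # translation world -> camera
fx, fy = 800.0, 800.0              # focal lengths in pixel units
cx, cy = 320.0, 240.0              # optical center
k1, k2, k3 = 0.1, 0.01, 0.0        # radial distortion coefficients
p1, p2 = 0.001, 0.002              # tangential distortion coefficients

def project(Pw):
    """Project a 3D world point to pixel coordinates, following eqs. (1)-(7)."""
    Xc, Yc, Zc = R @ Pw + t                      # world -> camera coordinates
    x, y = Xc / Zc, Yc / Zc                      # pinhole projection
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xq = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yq = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    u = fx * xq + cx                             # camera matrix, skew neglected
    v = fy * yq + cy
    return u, v

print(project(np.array([0.1, -0.05, 1.0])))      # ~ (400.14, 199.95)
```

Undoing this mapping (undistortion) has no closed form and is solved iteratively, which is why OpenCV provides dedicated functions for it.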



Stereo Camera Calibration

Stereo camera calibration, or binocular calibration, yields, in addition to the already obtained intrinsic and extrinsic parameters, the relative position and orientation of one camera of a stereo pair with respect to the other, along with the matrices necessary for the reconstruction of the scene.

We suppose the left camera extrinsic parameters are the rotation matrix RL and translation vector tL; for the right camera these are RR and tR. From these matrices and vectors, the translation vector tLR from the left to the right camera coordinate system and the corresponding rotation matrix RLR are derived.

The key concept for stereoscopic reconstruction is epipolar geometry, which encapsulates the projective relation between two views. Its centrepiece is the fundamental matrix F, defined by the relation given in equation (8), with x as the projection of a real-world point X into the left camera frame and x′ as its projection into the right camera frame. The equation can be solved using corresponding image points obtained from both cameras of a stereo pair. For a more in-depth explanation of epipolar geometry refer to [4].

x′^T · F · x = 0    (8)


To transform the images of a pair in accordance with epipolar geometry, rectification homographies HL and HR need to be found. With a known fundamental matrix F, the algorithm described in [5] can be utilized.
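The epipolar constraint of equation (8) can be checked numerically. The sketch below builds F from an assumed, synthetic stereo geometry (identical intrinsics, right camera shifted along the baseline) via the essential matrix, and verifies that corresponding projections of a world point satisfy the constraint; in the paper, F is instead estimated from point correspondences.

```python
import numpy as np

# Synthetic stereo geometry for illustration (not the paper's setup):
# identical intrinsics, right camera translated along the x axis.
A = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
R = np.eye(3)                          # rotation left -> right
t = np.array([-0.2, 0.0, 0.0])         # translation left -> right

def skew(v):
    """Cross-product matrix [v]_x so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

E = skew(t) @ R                                  # essential matrix
F = np.linalg.inv(A).T @ E @ np.linalg.inv(A)    # fundamental matrix

# A world point projected into both views satisfies x'^T F x = 0 (eq. 8).
P = np.array([0.1, -0.05, 1.0])
x  = A @ P;           x  /= x[2]       # left projection (homogeneous)
xr = A @ (R @ P + t); xr /= xr[2]      # right projection (homogeneous)
print(xr @ F @ x)                      # numerically ~ 0
```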

Figure 6 shows frames of pair 2 after distortion correction and rectification, i.e. with the rectification homographies applied.

To be able to reproject image points to 3D coordinate space, disparity has to be introduced. Figure 4 depicts a simplified projection of a point P onto two already rectified images. Since these images are rectified, the projections P1 and P2 are located on one epipolar line; therefore v1 = v2 and u1 ≠ u2. The equation for disparity is given by (9), with B as the baseline, i.e. the distance between the optical centers of the cameras (C1 and C2), f as the focal length and z as the distance to point P in the 3D world coordinate system.

d = u1 − u2 = (B · f) / z    (9)
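The disparity relation of equation (9) can be illustrated with a small numeric sketch; the focal length, baseline and point position below are assumed values, not the paper's calibration.

```python
import numpy as np

# Assumed rectified geometry (illustrative values, not from the paper).
f = 800.0              # focal length in pixel units
B = 0.2                # baseline in metres
cx, cy = 320.0, 240.0  # shared principal point of the rectified pair

def project_rectified(P, camera_offset_x):
    """Pinhole projection for a camera shifted along the baseline (x axis)."""
    X, Y, Z = P
    u = f * (X - camera_offset_x) / Z + cx
    v = f * Y / Z + cy
    return u, v

P = np.array([0.3, 0.1, 2.0])          # point at z = 2 m
u1, v1 = project_rectified(P, 0.0)     # left camera at the origin
u2, v2 = project_rectified(P, B)       # right camera at x = B

# Rectified pair: same row in both images, column offset is the disparity.
print(v1 == v2, u1 - u2, f * B / P[2])  # disparity equals f*B/z
```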


Figure 4.

Left side: point P projected onto rectified frames of stereo pair; right side: point P projected onto rectified frames of stereo pair in top view.


One step in stereo calibration is still missing: the computation of the disparity-to-depth mapping matrix Q. Its definition is given by equation (10). Since the vectors are noted in homogeneous coordinates, their fourth entry refers to scale. With this equation a disparity map can be reprojected to the world coordinate system in 3D (see section 3).

(X, Y, Z, W)^T = Q · (u, v, disparity(u, v), 1)^T    (10)



Calibration Procedure

In order to apply the relations elucidated above, a set of points with known distances is needed. Therefore a chessboard-type calibration target is used, as shown in Figure 5. A set of images with different positions of the target is taken, where the target should be moved to every position in the measurement volume in order to obtain a strongly calibrated stereo vision system. The positions of the corners on the calibration target are detected using the OpenCV function findChessboardCorners() and, to refine the corner positions to subpixel precision, cornerSubPix() is called afterwards. This routine is carried out for every image during the calibration process, leading to a set of points for every camera where every point refers to a corner on the chessboard pattern. With these points and an array of distance values in mm corresponding to the dimensions of the squares in the calibration pattern, the cameras can be calibrated. For every stereo pair, at first the extrinsic and intrinsic parameters of the two participating cameras are calculated (see section 2.1) using the OpenCV function calibrateCamera(), which is based on [6] and [7]. This routine delivers the camera matrices AL and AR, the distortion coefficients dL and dR, a rotation matrix and translation vector for every pattern view, and a reprojection error. The distortion coefficients are later used to undistort images, as depicted in Figure 6 on the left side.
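The array of distance values in mm mentioned above is commonly constructed as a grid of corner coordinates in the target's own coordinate system. A sketch with assumed pattern dimensions (9x6 inner corners, 25 mm squares; not necessarily the paper's target):

```python
import numpy as np

# Assumed pattern geometry -- 9x6 inner corners, 25 mm squares (illustrative).
cols, rows = 9, 6
square_mm = 25.0

# One 3D point per inner corner, in the target's own coordinate system;
# the pattern is planar, so Z = 0 for every corner.
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_mm

# The same objp array is passed once per captured view, paired with the
# per-view corner pixel positions from findChessboardCorners()/cornerSubPix(),
# e.g. cv2.calibrateCamera([objp] * n_views, imgpoints, image_size, None, None).
print(objp.shape, objp[1], objp[cols])
```

Because the corner coordinates carry real units (mm), the resulting calibration is metric, which fixes the scale of the reconstructed point clouds.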

Figure 5.

Frames of camera 1 to 4 (left to right) with chessboard calibration pattern. The image frames are uncorrected, neither undistortion nor rectification transformations are applied.


From the rotation matrices and translation vectors for every pattern view, the rotation matrix RLR and translation vector tLR from the left to the right camera are derived.
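One standard way this derivation works (the function and variable names below are ours, for illustration): if both cameras observe the same pattern view, composing the inverse of the left extrinsic transform with the right one gives the left-to-right pose.

```python
import numpy as np

def relative_pose(RL, tL, RR, tR):
    """Pose of the right camera relative to the left, derived from
    per-view extrinsics of the same pattern view, where
    Pc_left = RL @ Pw + tL and Pc_right = RR @ Pw + tR."""
    RLR = RR @ RL.T                # rotation left -> right
    tLR = tR - RLR @ tL            # translation left -> right
    return RLR, tLR

# Synthetic check with arbitrary extrinsics and an arbitrary world point.
rng = np.random.default_rng(0)
RL, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
RR, _ = np.linalg.qr(rng.standard_normal((3, 3)))
tL, tR = rng.standard_normal(3), rng.standard_normal(3)

RLR, tLR = relative_pose(RL, tL, RR, tR)
Pw = rng.standard_normal(3)
left  = RL @ Pw + tL               # point in left camera coordinates
right = RR @ Pw + tR               # point in right camera coordinates
print(np.allclose(RLR @ left + tLR, right))   # True
```

In practice the per-view estimates are averaged or jointly optimized (as in OpenCV's stereoCalibrate()) to suppress noise from individual views.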

The above mentioned chessboard corner points in image pixel coordinates are now undistorted using the function undistortPoints() and then used to obtain the fundamental matrix F via the OpenCV function findFundamentalMat(). This function utilizes the random sample consensus (RANSAC) algorithm to solve the problem delineated in equation (8). With the obtained F matrix, the above mentioned rectification homographies can be calculated: in OpenCV the function stereoRectifyUncalibrated() computes the rectification homographies HL and HR using the algorithm presented in [5]. On the right side, Figure 6 depicts rectified images of stereo pair 2.

Figure 6.

Images obtained from stereo pair 2 (left: camera 1, right: camera 3). The image pair shown on the left side is distortion corrected, the pair on the right side is rectified. On the greyscale pattern two objects (a cube and a cone) are placed; see the yellow markers in the left image.


After rectification a disparity map can be calculated using a matching algorithm, for example semi-global block matching (SGBM); this procedure is explained in section 3. As mentioned above, the Q matrix is needed to reproject a disparity map to 3D. The connection between disparity, world points and Q is given in (10). Since we are using the OpenCV function stereoRectifyUncalibrated(), which does not return a Q matrix, the disparity-to-depth mapping matrix needs to be calculated in another way, for which equation (10) is utilized. At first the points obtained from the calibration target are triangulated using OpenCV's triangulatePoints(), which results in a matrix of points in homogeneous coordinates, denoted B in (11). The matrix D in the same equation is acquired by applying HL, respectively HR, to the image points obtained by findChessboardCorners() and cornerSubPix(). Equation (12) describes the derivation of a pseudoinverse matrix, which is used in (13) to invert equation (11) and obtain Q. In this way it is possible to calculate the disparity-to-depth mapping matrix based on the points retrieved from the different views of the calibration target. Assuming the calibration target was moved through the whole observed volume, the maximum and minimum disparity values can also be calculated from these points.

B = Q · D    (11)

D⁺ = D^T · (D · D^T)^(-1)    (12)

Q = B · D⁺    (13)
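This pseudoinverse construction amounts to a least-squares fit of Q to the point correspondences. A NumPy sketch with synthetic data (the Q entries below are made up for illustration): each column of D holds (u, v, disparity, 1) for one calibration point, and B holds the corresponding triangulated homogeneous points.

```python
import numpy as np

# Synthetic ground-truth disparity-to-depth matrix (illustrative values only).
Q_true = np.array([[1.0, 0.0, 0.0, -320.0],
                   [0.0, 1.0, 0.0, -240.0],
                   [0.0, 0.0, 0.0,  800.0],
                   [0.0, 0.0, 5.0,    0.0]])

# D: one column per calibration point, rows (u, v, disparity, 1); in the
# paper these come from the rectified chessboard corner positions.
rng = np.random.default_rng(1)
n = 100
D = np.vstack([rng.uniform(0, 640, n),     # u
               rng.uniform(0, 480, n),     # v
               rng.uniform(10, 100, n),    # disparity
               np.ones(n)])
B = Q_true @ D                             # triangulated homogeneous points

# Right pseudoinverse of D, then solve for Q.
D_pinv = D.T @ np.linalg.inv(D @ D.T)
Q = B @ D_pinv
print(np.allclose(Q, Q_true))              # True
```

With noisy real correspondences the recovered Q is the least-squares optimum rather than exact, which is why covering the whole measurement volume with target positions matters.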




The calibration as a whole delivers the information needed to rectify camera frames according to epipolar geometry. From this point a disparity map for a stereo pair needs to be calculated, which can then be reprojected to 3D world coordinates with the already obtained disparity-to-depth mapping matrix Q.

Section 3.1 exemplifies the computation of the disparity map; section 3.2 explains the reprojection to 3D world coordinates.


Constructing Disparity

The OpenCV implementation of the semi-global block matching (SGBM) algorithm is applied to the rectified images of stereo pair 2 (see Figure 6, right side). The OpenCV class StereoSGBM uses a modified algorithm from [8], where, instead of the mutual information cost function, a simpler sub-pixel metric from [9] is implemented.

In the first step pixelwise cost calculation is done: the cost is calculated as the absolute minimum difference of intensities in the range of half a pixel in five directions along the epipolar line. As the name of the algorithm suggests, a global smoothness constraint is approximated. The smoothed cost for a pixel (or block, depending on the chosen block size) and disparity is calculated by summing the costs of all minimum cost paths that end in the pixel or block. The disparity map is determined by selecting the disparity with the minimum corresponding cost for every pixel of the source image. This process is done twice, first from the left to the right image and then from the right to the left image, for which a left and a right matcher instance are created. The minimum disparity input value is obtained from the matrix D in equation (11), as is the number of disparities, which is the maximum disparity value minus the minimum disparity value. From these two maps one combined disparity map is obtained using a weighted least squares filter with a left-right-consistency-based confidence map. In OpenCV this functionality is implemented in the ximgproc.DisparityWLSFilter class. Figure 7 depicts the output disparity maps.
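The core idea of cost-based matching along epipolar lines can be illustrated with a deliberately simplified, winner-take-all block matcher; this toy sketch uses a plain sum-of-absolute-differences cost and omits SGBM's path-wise cost smoothing and sub-pixel metric entirely.

```python
import numpy as np

def sad_disparity(left, right, max_disp, block=5):
    """Toy winner-take-all block matcher: for each pixel of the left image,
    pick the horizontal shift minimising the sum of absolute differences.
    A simplified stand-in for SGBM, which additionally smooths the costs
    along several paths before selecting the minimum."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.int32)
            costs = [np.abs(patch - right[y-half:y+half+1,
                                          x-d-half:x-d+half+1].astype(np.int32)).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic check: the right view is the left view shifted 4 px, so the
# true disparity is 4 everywhere. Note the random texture -- as with the
# SGBM results in Figure 7, matching only works on structured surfaces.
rng = np.random.default_rng(2)
left = rng.integers(0, 255, (20, 40), dtype=np.uint8)
right = np.roll(left, -4, axis=1)
d = sad_disparity(left, right, max_disp=8)
print(np.median(d[3:-3, 13:-13]))
```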

Figure 7.

Disparity map of pair 2, corresponding to the image pairs shown in Figure 6. From left to right these images are the left-right disparity map, the right-left disparity map and the filtered disparity map. The disparity is mapped to values ranging from 0 to 255, hence dark pixels represent areas further away from the camera pair and brighter pixels represent areas closer to the camera pair (except for the right-left matched map). In order for the SGBM algorithm to work properly, the surface has to be structured or a structure has to be projected. In these frames this is not implemented, hence the only valid section is the structured area (see Figure 6).



Constructing Point Clouds

The obtained disparity maps (see Figure 7) need to be reprojected to 3D world coordinates using the disparity-to-depth mapping matrix Q calculated in section 2.2. In OpenCV the function reprojectImageTo3D() transforms a disparity map to a 3-channel point cloud by solving equation (10) for each pixel (u, v) and its corresponding disparity value disparity(u, v). This results in a dense point cloud with 5,038,848 points. Figure 8 depicts the valid part of the resulting point cloud.
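The per-pixel reprojection can be sketched in vectorized NumPy. The Q below is a canonical disparity-to-depth matrix for a rectified pair with assumed focal length, principal point and baseline (not the paper's calibrated Q; the sign of the 1/B entry depends on the rectification convention).

```python
import numpy as np

# Illustrative Q for a rectified pair: focal length f, principal point
# (cx, cy), baseline B -- assumed values, not the paper's calibration.
f, cx, cy, B = 800.0, 320.0, 240.0, 0.2
Q = np.array([[1, 0, 0,   -cx],
              [0, 1, 0,   -cy],
              [0, 0, 0,     f],
              [0, 0, 1/B,   0]])   # sign convention depends on rectification

def reproject(disparity, Q):
    """Apply eq. (10) to every pixel of a disparity map, analogous to
    reprojectImageTo3D(), returning an (h, w, 3) array of X, Y, Z."""
    h, w = disparity.shape
    v, u = np.mgrid[0:h, 0:w]
    homog = np.stack([u, v, disparity, np.ones_like(disparity)], axis=-1)
    XYZW = homog @ Q.T                     # (h, w, 4), still homogeneous
    return XYZW[..., :3] / XYZW[..., 3:]   # dehomogenise

disp = np.full((4, 4), 80.0)               # constant disparity of 80 px
cloud = reproject(disp, Q)
print(cloud[0, 0, 2])                      # depth z = f*B/d = 800*0.2/80 -> 2.0
```

A constant disparity map therefore yields a fronto-parallel plane at z = f·B/d, consistent with equation (9); invalid pixels (e.g. disparity at the matcher's minimum) should be masked out before building the final point cloud.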

Figure 8.

Two views of the point cloud resulting from the reprojection of the disparity map to 3D world coordinates. Only the valid area (cf. Figure 7) is depicted.




This paper has presented a calibration technique for stereoscopic applications utilizing the functions and algorithms implemented in the OpenCV library. The calibration procedure does not rely solely on these functions but is extended by our own approaches, for instance the computation of the disparity-to-depth mapping matrix. With the presented calibration procedure it is possible to calibrate an array of cameras, segregated into stereoscopic camera pairs, and retrieve 3D information about the observed scene.

In the future the individual point clouds will be registered and filtered to obtain one point cloud containing all information from all views. Another significant step is the use of structured light in order to make the SGBM algorithm work on all parts of the acquired images.


This paper was supported in part by the European Social Fund and Thüringer Aufbaubank.



Wilburn, B., Joshi, N., Vaish, V., Levoy, M., and Horowitz, M., "High-speed videography using a dense camera array," in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), (2004).

Vaish, V., Synthetic Aperture Imaging Using Dense Camera Arrays, Stanford University (2007).

Wang, Y., Li, Y., and Zheng, J., "A camera calibration technique based on OpenCV," in The 3rd International Conference on Information Sciences and Interaction Sciences, (2010).

Hartley, R. and Zisserman, A., Multiple View Geometry in Computer Vision, Cambridge University Press (2003).

Hartley, R., "Theory and practice of projective rectification," International Journal of Computer Vision, (1999).

Zhang, Z., "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, (2000).

Bouguet, J.-Y., "Camera calibration toolbox for Matlab," Caltech Vision, (2015).

Hirschmueller, H., "Stereo processing by semiglobal matching and mutual information," IEEE Transactions on Pattern Analysis and Machine Intelligence, (2008).

Birchfield, S. and Tomasi, C., "A pixel dissimilarity measure that is insensitive to image sampling," IEEE Transactions on Pattern Analysis and Machine Intelligence, (1998).
© (2019) Society of Photo-Optical Instrumentation Engineers (SPIE).
Guido Straube, Chen Zhang, Artem Yaroshchuk, Steffen Lübbecke, and Gunther Notni "Modelling and calibration of multi-camera-systems for 3D industrial supervision applications", Proc. SPIE 11144, Photonics and Education in Measurement Science 2019, 111440D (17 September 2019);
