Feature enhancement for a defocusing structured-light 3-D scanning system

Abstract. Structured-light systems consisting of a camera and projector are powerful and cost-effective tools for three-dimensional (3-D) shape measurements. However, most commercial projectors are unable to generate distinct patterns due to defocusing and shallow focusing issues. We propose a hybrid method for enhancing the calibration and scanning features of the defocusing structured-light 3-D scanning system. Instead of using conventional sequential binary patterns, we replace the highest-level binary pattern by a high-order sinusoidal pattern. In our proposed system, a pan-tilt stage carrying a checkerboard is used to assist the simultaneous calibration of the camera and projector. Initially, the camera is calibrated to obtain the extrinsic positions of the stage. In addition, we utilize the multiplication of vertical and horizontal stripe patterns to enhance the corresponding features between the camera and projector. The projector is then calibrated using the extrinsic features determined from the calibrated camera. The experimental results show that the use of the high-order sinusoidal pattern significantly improves reprojection error. Our proposed method can easily be incorporated in the defocusing projector for scanning various types of objects.


Introduction
Optical three-dimensional (3-D) measuring systems have reached the mainstream for 3-D shape measurements, and they have been widely adopted in various fields. 1 The existing 3-D measurement techniques can be categorized into several types, such as stereoscopy, laser triangulation, structured light or fringe projection, and time of flight (ToF). All of these methods have been extensively employed because they are noncontact and nondestructive, in contrast to the physical probes of a coordinate measurement machine (CMM). For example, a ToF camera obtains the distance by utilizing the traveling time required for a light signal to pass between the camera and subjects. The other methods, such as laser triangulation, stereoscopy, and structured light, utilize projective geometry to resolve distance measurements. The stereoscopy method usually utilizes the disparities of rectified stereo images to obtain 3-D shapes. Parallel binocular camera configurations, as well as stereo images after rectification, induce parallel epipolar lines for accelerating the feature matching process in the stereoscopy method. However, robustly determining the correspondences between two images, particularly at the subpixel level, is still a challenging task. To simplify this task, the structured-light system, as well as the laser triangulation method, actively projects multiple stripes on a calibration object, and the cast stripes are correlated to the given stripes. Moreover, precise subpixel localizations of stripes can be obtained using super-resolution algorithms. 2 The structured-light system is considered to be one of the most efficient techniques for recovering 3-D points of objects. In the last few decades, several coded structured-light systems have been presented for a variety of applications. For example, Salvi et al. 3 presented a definitive classification and comparison review of existing structured-light techniques. A comprehensive tutorial for assisting beginners in constructing various types of 3-D scanners was first provided by Lanman and Taubin. 4 Since the 3-D scanning technique is interdisciplinary, not only the hardware design but also the software algorithms are highly dependent on the particular system. Gorthi and Rastogi presented an overview of state-of-the-art fringe projection techniques that mostly utilize phase-shifting algorithms. 5 In addition, Geng 6 focused on a comprehensive comparison of structured-light 3-D imaging systems, and several practical applications were illustrated. Among the existing structured-light systems, the time-multiplexing binary code, which produces distinct transitional boundaries between neighboring stripes, is one of the most commonly used techniques.
In our proposed system, binary-coded patterns are used and sequentially projected, in a manner similar to that of Valkenburg and McIvor. 7 Moreover, our proposed scanning module consists of a monochromatic camera and a digital light projector (DLP) that are utilized for obtaining the 3-D shape of an object on a pan-tilt stage. For such a closed system, calibration using an external tool is difficult. Therefore, an automatic self-calibration method is necessary. Structured-light methods rely on a digital camera for calibration and measurement. To calibrate the camera, a perspective projection of the pinhole model along with radial lens distortion is used to obtain the intrinsic parameters. The most commonly used camera calibration method is the method proposed by Tsai. 8 To extend the flexibility of Tsai's method, Zhang 9 proposed an algorithm based on homography transformations of multiple flat planes. Circular control points were further utilized by Heikkilä 10 to improve the calibration accuracy. These calibration methods have developed into mature, well-established methods for precise 3-D computer vision applications.
In recent decades, several approaches have been developed for calibrating structured-light systems. Because the projector can function as an inverse camera, the projector has been commonly treated as a camera for calibration due to the mature techniques of the camera calibration. However, the calibration procedure must be modified due to the fact that the projector cannot directly measure the pixel coordinates of 3-D points in the same way as a camera. Therefore, to calibrate a structured-light system, several methods have been proposed, such as stereovision based on geometrical constraints 11 and plane constraints based on a calibrated camera. 12 An inverse camera calibration workflow, which adjusts the projected points so they are consistent with the physical cross corners, was proposed by Martynov et al. 13 On the other hand, Gao et al. 14 utilized an additional color cue to identify the real black-white and the cast red-blue checkerboards. Similarly, Ouellet et al. 15 proposed a geometric calibration method based on circular dots, whereas Anwar et al. 16 used virtual target images to calibrate the projector. Moreover, Orghidan et al. 17 utilized the vanishing points to simplify the calibration procedure. Markerless projector calibration, which only uses a white plate, has also been proposed as a flexible solution. 18 In these papers, the homography between the camera and a physical plane is initially determined. Then, the features projected from the projector are observed by the camera to determine their homography. The homography between the physical plane and projector is obtained by decomposing the first two homographies, and the calibration for the intrinsic and extrinsic parameters can be readily obtained. Nevertheless, in this method, the accuracy of the calibrated projector likely suffers from calibration errors of the camera.
Lens distortion is a nonlinear behavior due to optical refraction of lenses and the misalignment of assembled parts. Therefore, structured-light systems can suffer from lens distortions of both the camera and the projector. To overcome this nonlinear behavior, a local homography method was proposed by Moreno and Taubin. 19 In this method, the transformation between the camera and the projector is decomposed into several linear transformations. Because the camera lens distortion is usually formulated as a high-order polynomial, the distortion effect can be readily removed by correcting the captured images. 20 Lens distortion of the projector can be corrected by applying an inverse distorted image to the projector. 21,22 Therefore, the calibration accuracy can be considerably improved by eliminating the lens distortions in a structured-light system.
Unlike lens-distortion compensation, a look-up table (LUT), which records the transformation between the physical 3-D ground truth and the observed stripes, is also a feasible solution for achieving precise 3-D measurement. 23 Since the LUT can be regressed by high-order surface equations or a volumetric function, the distortion behavior in the camera and projector is precisely recorded. Therefore, without knowing the intrinsic and extrinsic parameters of both components, the structured-light system is still able to obtain accurate 3-D points.
Most existing methods require a well-focused projector to function as an inverse pinhole camera. However, the depth of focus is limited by the finite size of the aperture. For an out-of-focus projector, the projected features are difficult to identify. As a result, the system may suffer from ambiguous cast stripe boundaries, which can substantially enlarge the variance of the measurement results. To overcome this problem, an out-of-focus projector calibration, which can produce consistent results under various defocusing degrees, was proposed by Li et al. 24
A complete 3-D model usually requires several scans from multiple directions in addition to 3-D shape registration or extrinsic calibration. The iterative closest-point (ICP) method, which performs closed-form rigid-body transformations, is commonly utilized for 3-D shape registration. 25 The ICP method needs a good, close initial position to search for corresponding 3-D features. Barone et al. 26 utilized an external tracking device to achieve global registration. Therefore, a large object, such as a statue, can be completely integrated from multiple scans. For a specific distribution of the relative scanning positions, such as an object on a turntable, Pang et al. 27 utilized a global registration method. Similarly, our system uses a pan-tilt stage for carrying the scanned object, and the rotating axes are calibrated by retrieving the extrinsic parameters for each position on a checkerboard.
Finding correspondences is another critical task for a structured-light system. A variety of coded patterns have been proposed in which the projected pattern carries unique information of the position with respect to the projector coordinates. This camera-obtained information can be used to determine the correspondence between the camera and projector. 28 Generally speaking, the epipolar constraint, which allows the correspondence to be recognized in a specific region, is the most popular constraint. 29 Although the projected coded features should be found in a visible region, the reflected features sometimes interfere with the geometrical shapes and surface properties. As a result, the classification of some pixels or identification of coded features may fail. Therefore, several studies were proposed to reduce noise by colorimetric lights, 30 to improve the pixel classification, 31 and to remove interreflections of surface scattering. 32 Binocular vision is another common method for the structured-light systems consisting of two cameras and a projector. 33 For high-speed data acquisition, the two cameras are usually synchronized. In this configuration, the projector provides distinct coded features that are observed by both cameras. The corresponding features are then obtained under epipolar constraints. Therefore, the 3-D points can be readily obtained by the direct linear triangulation method under the known parameters of a calibrated stereo camera. The projector becomes a feature generator, and geometrical calibration of the projector is not always necessary. 34,35 Based on the calibrated stereo camera, 3-D equations of the stripes from the projector can be further estimated for consistent feature matching. 36,37 This configuration can also be considered as two individual structured-light systems that are able to reduce the shadow region. 38 The use of two cameras with different focal lengths allows for the acquisition of dense images at multiple spatial resolutions. 39

Proposed Method
In a structured-light 3-D scanning system, the projector is usually used to generate binary stripe patterns for measuring 3-D objects. However, the performance of the projector will dominate the measurement accuracy. This study considered a robust method for overcoming potential problems caused by the projector optics mechanism, such as the small depth of focus, nonuniform brightness distribution, low resolution, and low modulation transfer function (MTF). Specifically, in our proposed system, the projector was used to generate both binary stripe patterns and a sinusoidal pattern for producing features and measuring 3-D objects, which reduced the errors obtained with conventional projector optics.

Camera Model
The mathematical model of a physical camera is commonly treated as an ideal pinhole model. The mathematical formulation of a pinhole camera is the projective operation

$$x = K[R|t]X, \tag{1}$$

where $X$ denotes a point of the 3-D scene in the world coordinate system, and $x$ indicates where $X$ is projected onto the image plane. The extrinsic parameter $[R|t]$ consists of a rotation and a translation matrix and describes where the camera is in the world coordinate system. Therefore, the 3-D points in camera coordinates can be obtained by applying the $[R|t]$ matrix to the world coordinate $X$. Finally, $K$ is the intrinsic parameter representing the characteristics of the camera such as the focal length, optical center, and aspect ratio. Therefore, $x$ on the two-dimensional (2-D) image can be obtained after applying $K$ to the 3-D points in camera coordinates. In practice, the behavior of a camera does not perfectly fit the ideal pinhole model due to nonlinear optical distortion. The relationship between the measured distorted point $(u_d, v_d)$ and the ideal point $(u, v)$ in an image is usually governed by a high-order polynomial function [Eq. (2)], in which $(a_0, a_1, a_2)$, $(s_0, s_1)$, and $(p_0, p_1)$ represent the radial, prism, and tangential distortion coefficients, respectively.
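As an illustration of Eq. (1), the ideal pinhole projection can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' implementation: the function name `project_pinhole` and all numerical values (focal length, principal point, pose) are hypothetical, and lens distortion is deliberately left out.

```python
import numpy as np

def project_pinhole(K, R, t, X_world):
    """Project Nx3 world points to Nx2 pixel coordinates with x = K [R|t] X."""
    X_world = np.asarray(X_world, dtype=float)
    X_cam = X_world @ R.T + t          # world -> camera coordinates via [R|t]
    x_h = X_cam @ K.T                  # camera -> homogeneous image coordinates
    return x_h[:, :2] / x_h[:, 2:3]    # perspective division

# Hypothetical intrinsics and pose, chosen only for illustration.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 480.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 500.0])

pixels = project_pinhole(K, R, t, [[0.0, 0.0, 0.0], [50.0, -25.0, 0.0]])
print(pixels)
```

The world origin lands on the principal point because it lies on the optical axis in this assumed pose; the second point is shifted in proportion to focal length over depth.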

Camera Calibration
The common configuration of the structured-light 3-D system is shown in Fig. 1. The camera and projector were rigidly fixed on a rig. In practice, we placed a flat checkerboard on a pan-tilt turntable to automatically collect the calibration features. There were four coordinate systems used in this configuration, namely the camera, projector, world, and local coordinates.
The local coordinate represented the coordinate on each checkerboard, which can be determined by the corresponding extrinsic parameter of the camera. To estimate the relationship between the local coordinates and extrinsic camera parameters, we initially calibrated the camera to obtain the intrinsic, extrinsic, and distortion parameters of the camera based on Zhang's method. 9 There were a total of 18 × 13 corner features generated in a frame. The calibration procedure typically requires a number of pose images to obtain the best calibration result. In our implementation, at least 30 regularly distributed pose images were used.

Projector Calibration
To complete the projector calibration, we utilized the centroid features of the projected image plane and estimated the centroid features on a real checkerboard. Based on this framework, the projector calibration procedure could be considered to be exactly the same as the camera calibration procedure. Since the extrinsic parameters from the calibration patterns were determined in the camera calibration, the transformation between each local coordinate and the camera coordinate had already been established. Nevertheless, the projection matrix P, which is a transformation from camera coordinates to projector coordinates, was still unknown. In general, the mathematical model of a projector is similar to that of a camera. However, to remove the distortion effects for the projector, estimation of the intrinsic parameters is necessary. In a structured-light 3-D system, we cannot directly measure the features on the projector coordinate. Therefore, we collected the centroid features of both the vertical and horizontal bright stripes from the highest-level patterns. In particular, we used the projected centroid features observed by the camera in a second "camera calibration" to obtain the intrinsic, extrinsic, and distortion parameters of the projector. The projected centroid features were then converted into local coordinates using a homography transformation. Thus, the coordinate values of the projected centroid features on a real checkerboard were generated, establishing the correspondence between the projector image and the real checkerboard.

Centroid feature on the local coordinate system
In our structured-light 3-D system, the stripe patterns were sequentially projected. The most critical pattern for the projector calibration was the highest-level pattern. We already have the given centroid features on the projector image from the camera calibration. The next problem was to determine their physical positions on the local coordinate system. The procedure was performed in two steps. First, all candidate centroid features were determined, and then, the centroid features were transformed into local coordinates. Figure 3 shows an example of centroid feature generation using eighth-level patterns. To obtain distinct centroid features, we multiplied the vertical and horizontal stripe images of the eighth-level pattern element by element, i.e., we computed their Hadamard product. Figures 3(a) and 3(b) show the vertical and horizontal stripes, respectively, and Fig. 3(c) shows the result after multiplying the two images. The multiplication emphasizes the local intensity of each centroid feature, which was helpful in determining a distinct position at the subpixel level. In the calibration procedure, a printed checkerboard was used, and all candidate centroid features were determined using the white squares.
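The effect of multiplying the two stripe images can be sketched with synthetic data. The example below is only an illustration under stated assumptions: the blurred stripes are modeled as Gaussian profiles, the helper `blob_centroid` and the stripe positions are hypothetical, and the intensity-weighted centroid stands in for whatever subpixel estimator the actual system uses.

```python
import numpy as np

def blob_centroid(img):
    """Intensity-weighted centroid (x, y) of an image patch, in pixels."""
    img = np.asarray(img, dtype=float)
    total = img.sum()
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    return (xs * img).sum() / total, (ys * img).sum() / total

# Synthetic defocused stripes (assumed Gaussian blur): one bright vertical
# stripe centered at x = 22.5 and one bright horizontal stripe at y = 17.25.
h, w = 41, 61
x = np.arange(w)
y = np.arange(h)
vertical = np.tile(np.exp(-0.5 * ((x - 22.5) / 4.0) ** 2), (h, 1))
horizontal = np.tile(np.exp(-0.5 * ((y - 17.25) / 4.0) ** 2)[:, None], (1, w))

spot = vertical * horizontal        # element-wise (Hadamard) product
cx, cy = blob_centroid(spot)
print(cx, cy)
```

Each stripe alone constrains only one coordinate; their product is a compact spot whose centroid recovers both stripe centers at subpixel resolution, even though the individual stripes are blurred.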
However, several candidate centroid features at the checker boundaries suffered from destroyed spot shapes, as shown in Fig. 3(d). To remove these outliers, only the centroid feature closest to the center of the white block was selected, as shown in Fig. 3(e).
To convert the image coordinate to the local checkerboard coordinate, one additional conversion was required. In the camera calibration, the checkerboard was rotated by the turntable to obtain multiple poses. Each pose represented an extrinsic parameter of the camera. Using Eq. (1), the 3-D point $X$ in checkerboard coordinates was converted into the 2-D point $x$ in image coordinates. However, to obtain $X$ from a known $x$, the reverse conversion was required. Since our desired centroid features were given in local coordinates on the surface of the checkerboard, the z-component of $X$ was zero. The $K[R|t]$ matrix in Eq. (1) was reduced to a 3 × 3 matrix by removing the column that multiplies the z-component (the third column). Therefore, the conversion between the image coordinate and the surface of the checkerboard became a homography transformation. Thus, the centroid features were determined in local coordinates for every pose.
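The reduction of K[R|t] to a plane homography, and its use for mapping image points back to the checkerboard, can be sketched as follows. This is a minimal sketch assuming distortion-free image points; the function names and the pose values are hypothetical and only serve to check the round trip.

```python
import numpy as np

def plane_homography(K, R, t):
    """3x3 homography mapping plane points (X, Y, 1) with Z = 0 to image points."""
    P = K @ np.column_stack([R, t])   # 3x4 projection matrix K[R|t]
    return P[:, [0, 1, 3]]            # drop the third column (multiplies Z = 0)

def image_to_local(H, uv):
    """Map Nx2 image points back onto the checkerboard plane using H^-1."""
    uv = np.asarray(uv, dtype=float)
    uv_h = np.column_stack([uv, np.ones(len(uv))])
    xy_h = np.linalg.solve(H, uv_h.T).T
    return xy_h[:, :2] / xy_h[:, 2:3]

# Hypothetical intrinsics and one checkerboard pose (tilted 20 deg about x).
K = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1200.0, 512.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(20.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([10.0, -5.0, 400.0])
H = plane_homography(K, R, t)

# Forward-project known local points, then recover them from the image.
local_xy = np.array([[0.0, 0.0], [30.0, 40.0], [-25.0, 15.0]])
uv_h = (H @ np.column_stack([local_xy, np.ones(3)]).T).T
uv = uv_h[:, :2] / uv_h[:, 2:3]
recovered_xy = image_to_local(H, uv)
print(recovered_xy)
```

Because each pose has its own [R|t], a separate homography would be built per pose, which matches the statement that the centroid features are determined in local coordinates for every pose.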

Corresponding projector image features to the centroid features
Though the centroid features were determined in image coordinates, their corresponding features on the projector image were still unknown. To determine those features, a decoding procedure for each centroid feature was required. The decoding procedure involved finding the matching codes by comparing the observed camera image with the given projector image. Since the 2-D feature has two degrees of freedom, constraints for the decoding procedure in both directions were required. Therefore, vertical and horizontal stripe patterns were individually used for decoding the x- and y-components of the 2-D feature. Figure 4(a) shows the centroid features in one camera image, and Fig. 4(b) shows the determined corresponding features on the projector image.
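A decoding step of this kind can be sketched for one direction as follows. The sketch assumes a plain binary code projected coarsest level first, with each level accompanied by its inverse image; the source does not state whether plain binary or Gray code is used, so this choice, the function name `decode_binary_stack`, and the simulated brightness falloff are all assumptions.

```python
import numpy as np

def decode_binary_stack(patterns, inverses):
    """Decode per-pixel stripe indices from pattern/inverse image pairs.

    patterns, inverses: arrays of shape (levels, H, W), coarsest level first.
    A bit is 1 where the pattern image is brighter than its inverse image,
    which avoids a global threshold under nonuniform brightness.
    """
    patterns = np.asarray(patterns, dtype=float)
    inverses = np.asarray(inverses, dtype=float)
    bits = (patterns > inverses).astype(np.int64)
    code = np.zeros(bits.shape[1:], dtype=np.int64)
    for level_bits in bits:                 # most significant bit first
        code = (code << 1) | level_bits
    return code

# Simulate eight levels of vertical stripes over 256 projector columns,
# observed with a nonuniform brightness gain and a small ambient offset.
levels, width = 8, 256
columns = np.arange(width)
shifts = np.arange(levels - 1, -1, -1)[:, None]
bit_rows = (columns[None, :] >> shifts) & 1          # shape (levels, width)
gain = np.linspace(0.3, 1.0, width)
patterns = (bit_rows * gain)[:, None, :] + 0.05      # shape (levels, 1, width)
inverses = ((1 - bit_rows) * gain)[:, None, :] + 0.05

decoded = decode_binary_stack(patterns, inverses)
print(decoded[0, :10])
```

Running the same routine on the horizontal stripe images would give the second coordinate, so each centroid feature receives a projector (x, y) code.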

Projector calibration
After obtaining the centroid features on the local coordinates and their corresponding features on the projector image, the projector calibration, which is exactly the same procedure as the camera calibration, was readily performed. The flowchart of the projector calibration is provided in Fig. 5. In practice, the turntable rotated the checkerboard to 30 different poses.
In each pose, a total of 33 images were taken, including one white pattern, 16 vertical stripe patterns, and 16 horizontal stripe patterns. The white pattern images in all poses were used for calibrating the camera because the calibrated camera was used to derive the extrinsic parameters for determining the position of each checkerboard. The 16 vertical stripe patterns, comprising the eight level patterns and their inverse images, were used to robustly determine the transition boundaries of the stripes despite the nonuniform brightness distribution of the acquired images. Due to the limitation of the projector resolution, the highest level used in this study was eight. There are two reasons for using both the vertical and horizontal stripe patterns. The first is to generate the centroid features of the eighth-level images. The second is to assist in finding the corresponding features on the projector image. After the projector calibration, the intrinsic and distortion parameters were obtained in order to remove the distortion effect of the projector. The extrinsic parameters of the projector were also used for recovering its relationship to the camera coordinate.

Three-Dimensional Reconstruction
The direct-triangulation method was implemented in the 3-D reconstruction procedure. Before determining the 3-D points, the corresponding points between the camera image and projector image must be determined. In practice, we projected the vertical stripe patterns and determined the centroid lines in both the camera and projector images. All vertical centroid lines were determined at the eighth level and encoded using hierarchical binary patterns. Thus, the line correspondences between the camera and projector images were obtained by comparing the encoding at all pattern levels. For the horizontal constraints, we utilized a fundamental matrix to convert all features of the camera image into epipolar lines on the projector image. The fundamental matrix was obtained using the corresponding features from the camera and projector images. The intersections of the epipolar lines and centroid lines were collected as matching features. As a result, the 3-D coordinates of all corresponding features were determined using the direct-triangulation method.
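Once a camera pixel and its projector correspondence are known, a linear triangulation can be sketched as below. This is a generic direct linear transformation (DLT) formulation given as an assumption about what "direct triangulation" involves, not a transcription of the authors' code; the projection matrices, the 20 deg rotation, and the baseline are hypothetical values.

```python
import numpy as np

def triangulate_dlt(P_cam, P_proj, x_cam, x_proj):
    """Linear triangulation of one 3-D point from two 3x4 projection matrices."""
    A = np.vstack([
        x_cam[0] * P_cam[2] - P_cam[0],
        x_cam[1] * P_cam[2] - P_cam[1],
        x_proj[0] * P_proj[2] - P_proj[0],
        x_proj[1] * P_proj[2] - P_proj[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                     # right singular vector of the smallest singular value
    return X_h[:3] / X_h[3]

def project(P, X):
    """Project a 3-D point with a 3x4 projection matrix."""
    x_h = P @ np.append(X, 1.0)
    return x_h[:2] / x_h[2]

# Hypothetical calibrated camera (at the origin) and projector (rotated 20 deg).
K_cam = np.array([[1500.0, 0.0, 1024.0], [0.0, 1500.0, 768.0], [0.0, 0.0, 1.0]])
K_proj = np.array([[1100.0, 0.0, 456.0], [0.0, 1100.0, 570.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(20.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-150.0, 0.0, 30.0])
P_cam = K_cam @ np.column_stack([np.eye(3), np.zeros(3)])
P_proj = K_proj @ np.column_stack([R, t])

X_true = np.array([20.0, -10.0, 450.0])
X_est = triangulate_dlt(P_cam, P_proj, project(P_cam, X_true), project(P_proj, X_true))
print(X_est)
```

With noise-free correspondences the estimate reproduces the input point; with real data the same least-squares solution spreads the residual over both views.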
Experimental Results

System Configuration
The mechanism of our structured-light system is shown in Fig. 6. A 3M-pixel camera manufactured by Point Grey (FL3-U3-32S2M-CS) was used with an additional DLP (VIVITEK UMI-Q5), which has a native resolution of 912 × 1140 pixels. The inclined angle between the camera and projector was roughly 20 deg. To shorten the distance from the scanned objects, a mirror was inserted between the projector and the work zone. In the work zone, a pan-tilt stage driven by a stepper motor was used to automatically generate various poses of the checkerboard. In practice, the encoder for the motor was not required during calibration.

Comparison of the Cast Pattern in the Projector Calibration
To generate distinct centroid features, several patterns for the highest level are considered, as discussed in Sec. 3.3.1. The camera can be expected to observe more distinct features when the projector casts a sharper pattern. Figure 7 shows close-up images of five different high-level patterns, i.e., level 8. The intensity distribution observed by the camera is known to be a combination of three factors: the real emission distribution of the projection, the material reflection property, and the light response of the camera. From the result shown in Fig. 8, the highest-order pattern considered, i.e., $\sin^3\theta$, had more distinct peaks but induced a dimmer pattern.
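The trade-off between peak sharpness and brightness can be illustrated numerically. The sketch below assumes that each bright stripe spans the positive half-period of a sine and that the wave function is sin(θ) raised to an integer power, with power 0 reproducing the binary stripe; the exact wave functions, period, and sampling used in the experiments are not given here, so `stripe_profile` and its parameters are hypothetical.

```python
import numpy as np

def stripe_profile(num_pixels, period, power):
    """1-D stripe profile whose bright half-period follows sin(theta)**power.

    power = 0 gives the binary stripe; larger powers give narrower peaks.
    """
    x = np.arange(num_pixels) + 0.5          # sample at pixel centers
    theta = 2.0 * np.pi * x / period
    s = np.sin(theta)
    bright = s > 0
    profile = np.zeros(num_pixels)
    profile[bright] = s[bright] ** power
    return profile

period = 16                                   # assumed stripe period in pixels
profiles = {p: stripe_profile(64, period, p) for p in (0, 1, 2, 3)}
mean_brightness = {p: prof.mean() for p, prof in profiles.items()}
for p, m in mean_brightness.items():
    print(f"power {p}: mean brightness {m:.3f}")
```

The mean brightness drops monotonically as the power increases while the energy concentrates near the stripe center, which is consistent with the observation that the cubic pattern yields more distinct peaks but a dimmer image.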
To evaluate the calibration performance under different patterns, the camera calibration was initially performed based on Zhang's method. 9 The projector was then calibrated based on the known extrinsic parameters of the camera. The reprojection errors of the projector in the case of several patterns are listed in Table 1. The experimental results showed that the higher-order pattern had a smaller and more concentrated error. Based on the framework, the root mean square (RMS) error of the $\sin^3\theta$ pattern was as small as 0.15 pixels.
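For reference, an RMS reprojection error can be computed as sketched below. Conventions differ between toolkits (per point versus per coordinate), and the convention behind Table 1 is not stated, so the per-point definition used here is an assumption; the sample coordinates are made up for the example.

```python
import numpy as np

def rms_reprojection_error(observed, reprojected):
    """RMS of the Euclidean distances between observed and reprojected points."""
    observed = np.asarray(observed, dtype=float)
    reprojected = np.asarray(reprojected, dtype=float)
    residual = observed - reprojected
    return np.sqrt(np.mean(np.sum(residual ** 2, axis=1)))

# Two hypothetical features: one off by (0.3, 0.4) pixels, one exact.
observed = [[100.3, 200.4], [50.0, 60.0]]
reprojected = [[100.0, 200.0], [50.0, 60.0]]
rms = rms_reprojection_error(observed, reprojected)
print(rms)
```

Here the single 0.5-pixel residual averaged over two points gives an RMS of sqrt(0.125) pixels.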

Projector Calibration under Shallow Focus
The reason we obtained a stable result is that we do not rely on the highest-level pattern for decoding the correspondences. Since the resolution of the projector in our structured-light system is 912 × 1140 pixels, the projector was able to generate $2^{10}$ and $2^{11}$ vertical and horizontal patterns, respectively. However, for most commercial projectors, the depth of focus (DoF) is usually short because a relatively large aperture is used to increase the radiant flux. Therefore, blurred patterns are expected in the work zone. In addition, low MTF in the projector is a critical factor in the generation of a blurred image. For a prime lens camera, the DoF can be extended by adjusting the aperture size, and the camera does not usually suffer from low MTF. Therefore, with proper camera settings, the images in the work zone are unlikely to be blurred by the camera. In contrast, the DoF and MTF of the projector are often limited. Figure 9 shows a calibration board at a wide depth. In this condition, the low MTF of the projector induces blurred patterns, even for in-focus regions, as shown in the top row in Fig. 9. Another factor inducing blurred patterns is a shallow-focus projector, as shown in the bottom row of Fig. 9. In general, a higher-level pattern should induce more accurate correspondences. However, the use of high-level patterns may suffer from uncertainty due to the physical resolution limitations of both the camera and projector. To avoid this uncertainty, the cubic sinusoidal pattern is considered instead of the binary pattern in the projector calibration. The experiment in Table 2 shows the calibration results in different three-level patterns. Unlike Moreno and Taubin, 19 our proposed method was able to determine distinct features. Moreover, the calibration error in the projector was as small as 0.14 pixels. Our highest-level pattern was a sinusoidal wave pattern, and the remaining levels were binary patterns. Thus, the centroid features from the multiplication of vertical and horizontal patterns will be distinct even if the projected patterns are out of focus.
Fig. 6 The system configuration of our structured-light scanner.

Scan Benchmark
To evaluate the performance of the calibration beyond its reprojection errors, we scanned a qualified ceramic sphere having a diameter of 19.9824 mm (Mitutoyo CMM Masterball 06ABM944D). To suppress the reflection due to its glossy surface, we sprayed a very thin and uniform layer of paint on the sphere. In our system, the pan-tilt stage was installed in the center of the work zone only for automatically generating various poses of the calibration checkerboard. Extreme corners of the work zone may suffer from a lack of valid features. Nevertheless, the scan accuracy was still acceptable. Figure 10 shows the error distributions of the scanned spheres and their fitting spheres at different positions of the work zone, including the center and eight extreme corners. In each position, the scanned 3-D points were collected to estimate their fitting spheres. The overall mean error and standard deviation were 46.6 and 52.6 μm, respectively. For the center of the work zone, the mean error and standard deviation were as small as 23.9 and 23.6 μm, respectively. In Fig. 10, positions f and g, which are far from the projector, suffer from blurred images due to the out-of-focus patterns of the projector. As a result, their estimated dimension errors reached 115 μm, as shown in Fig. 11.
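The sphere-fitting step of such a benchmark can be sketched with an algebraic least-squares fit. The fitting method actually used is not specified, so this linear formulation is an assumption, and the synthetic scan below (sphere position, 0.02 mm noise level, number of points, random seed) is invented for the example; only the nominal diameter of 19.9824 mm comes from the text.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius).

    Uses |p|^2 = 2 c.p + (r^2 - |c|^2), which is linear in c and the constant.
    """
    p = np.asarray(points, dtype=float)
    A = np.column_stack([2.0 * p, np.ones(len(p))])
    b = np.sum(p ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

# Synthetic scan of the visible hemisphere of the master ball (assumed noise).
rng = np.random.default_rng(0)
true_center = np.array([5.0, -3.0, 400.0])
true_radius = 19.9824 / 2.0
n = 2000
phi = rng.uniform(0.0, 2.0 * np.pi, n)
cos_theta = rng.uniform(0.0, 1.0, n)
sin_theta = np.sqrt(1.0 - cos_theta ** 2)
dirs = np.column_stack([sin_theta * np.cos(phi), sin_theta * np.sin(phi), -cos_theta])
points = true_center + (true_radius + rng.normal(0.0, 0.02, n))[:, None] * dirs

center, radius = fit_sphere(points)
residuals = np.linalg.norm(points - center, axis=1) - radius
print(2.0 * radius, residuals.std())
```

The radial residuals give the error distribution at one position, and the difference between the fitted and nominal diameters gives the dimension error.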
To compare the error distribution with previous work, we scanned a cubic block based on the two sets of calibration data in Table 2. The block has a width of 70 mm, and all scanned points on two perpendicular surfaces of the block were collected to visualize the overall error distribution, as shown in Fig. 12. As mentioned, a projector under shallow focus will induce poor scan results. Figures 12(a) and 12(b) represent our proposed method and Moreno and Taubin's, 19 respectively. Compared with Moreno and Taubin's method, our proposed method has a wider depth of field. As a result, the overall error within the work zone is relatively small.
To verify the scanning of 3-D objects, we tested various materials, including gypsum, plastic with metal coating, rubber, and earthenware, as shown in Fig. 13. Some of the objects had rich colors. The gypsum sculpture surface has the property of Lambertian reflectance. Therefore, the apparent brightness of the reflected patterns was uniform regardless of the observed angle of view. The observed features from the centroids of the high-level stripes were convincing for

Conclusion
Because most commercial projectors usually suffer from shallow focus, our proposed method utilized the high-order sinusoidal pattern to enhance the corresponding features in the projector calibration. Calibration experimental results showed that the utilization of the $\sin^3\theta$ pattern as the highest-level pattern significantly reduced the calibration error. Compared to the existing calibration method, the proposed method was shown to be more robust for the defocusing structured-light 3-D scanning system. The scan benchmark experiment demonstrated qualitative comparisons for scanning various 3-D objects. The results showed that the proposed method was capable of performing quality 3-D scans of various materials using a shallow focus projector.

Disclosures
There are no conflicts of interest to declare.

Fig. 2 (a) The eighth-level patterns and (b) different wave functions of the eighth-level pattern.

Fig. 3 Determination of centroid features on camera images: (a) vertical stripes (eighth-level), (b) horizontal stripes (eighth-level), (c) multiplied patterns (eighth-level), (d) all candidate centroid features on the white checker regions, and (e) the centroid feature closest to the center of the checkerboard is selected.

Fig. 4 The corresponding features on the projector image: (a) the centroid features in one camera image and (b) corresponding features on the projector image after the decoding procedure.

Fig. 5 The flowchart of the projector calibration procedure.

Fig. 7 Five different patterns cast on a white checker.

Fig. 8 Observed pixel intensity on a white checker.

Fig. 9 Captured images with different levels of horizontal patterns. The first row figures indicate that projectors with a lower MTF will yield a blurred pattern. The second row figures show that, when the object is out of focus, the pattern becomes blurred as well.

Fig. 10 Comparison of the error distribution for various scanned positions in the work zone.

Fig. 12 Pair comparison of error distribution on the block.

Fig. 13 Scanning result for various objects, constructed of gypsum, plastic with metal coating, rubber, and earthenware, respectively.

Table 1 Calibration error for different eighth-level patterns.

Table 2 Reprojection error (RMS) of the projector.