Single-frame three-dimensional imaging using spectral-coded patterns and multispectral snapshot cameras

Abstract. We present an approach for single-frame three-dimensional (3-D) imaging using multiwavelength array projection and a stereo vision setup of two multispectral snapshot cameras. Thus a sequence of aperiodic fringe patterns at different wavelengths can be projected and detected simultaneously. For the 3-D reconstruction, a computational procedure for pattern extraction from multispectral images, denoising of multispectral image data, and stereo matching is developed. In addition, a proof-of-concept is provided with experimental measurement results, showing the validity and potential of the proposed approach.


Introduction
High-speed three-dimensional (3-D) imaging matches the increasing demands for real-time capability in nondestructive industrial inspection, human-machine interaction, biomedical and security applications, etc. The common real-time solutions for 3-D imaging, e.g., passive stereo, 1,2 time-of-flight (ToF) cameras, 3,4 and focal tomography, 5 are limited in their depth resolution and not suitable for tasks with high-accuracy requirements. A well-established high-accuracy 3-D imaging technique is pattern-projection-based stereo photogrammetry. It solves the stereo matching problem with pixel-level virtual features created by the projection of a sequence of light patterns. The pattern projection method can thus realize pixel matching even at a wide baseline, improving the depth resolution. Moreover, the virtual features ensure a high measurement robustness at nontextured or sparsely textured surfaces. For such 3-D imaging systems, there are two possible approaches to raise the 3-D frame rate. The first approach is to minimize the number N of patterns needed for the computation of a single 3-D image, ideally down to N = 1. Previous 3-D techniques based on single-frame pattern projection can be classified into two groups according to the pattern type: monochromatic patterns with phase modulation in the frequency domain [6][7][8] and composite RGB fringe patterns, in which the phase shift is coded with RGB colors. [9][10][11] However, the decoding of monochromatic patterns produces many artifacts at objects with shape discontinuities or very sharp edges, and a hard challenge of RGB fringe projection is that the phase map unwrapping becomes difficult without additional patterns (e.g., Gray codes). Moreover, these techniques are affected by nonuniform surface properties and thus are limited in their application scenarios.
The other way to real-time 3-D imaging is the speed-up of both pattern projection and single-image acquisition. Typical high-speed projection techniques are laser speckle projection with acousto-optical deflection, 12 multiaperture or array projection, 13,14 and GOBO projection technologies. 15,16 With these analog projection principles, a 2-D projection frequency of up to 100 kHz can be realized, enabling a 3-D frame rate of up to 10 kHz. However, a drawback is that expensive high-speed cameras matched to the projection frequency are needed, which also entail a high data-transfer effort.
Currently, multispectral cameras, 17 especially miniaturized multispectral snapshot cameras, 18-20 offer possibilities for real-time spectral imaging. They realize simultaneous image data acquisition in multiple spectral bands. In this contribution, we demonstrate an approach for active single-frame 3-D imaging based on multiwavelength pattern projection and a stereo vision setup of two multispectral snapshot cameras. Further, we present a proof-of-concept with first experimental results.

Approach to Multiwavelength Pattern Projection
The basic concept of the proposed approach is to project various patterns at different wavelengths and to detect these patterns simultaneously in the corresponding spectral bands of two multispectral snapshot cameras in a stereo arrangement (see Fig. 1). To this end, the projector's spectral characteristics must be adapted to the spectral bands of the cameras. In this way, the 3-D reconstruction from a single stereo image pair can be performed using a sequence of patterns, which enhances the stability and robustness of the stereo matching. At the same time, the data-transfer effort is much lower than that of high-speed 3-D imaging techniques using temporal pattern sequence projection, enabling a lower hardware utilization.

Multispectral Snapshot Camera
Nowadays, miniaturized multispectral snapshot cameras are mainly based on the principle of an on-chip multispectral filter array 21 (MSFA). Generally, MSFAs are available with up to 25 spectral bands in the visible and near-infrared (NIR) spectral range. They are composed of multiple mosaic filter elements, each subelement of which corresponds to a specific spectral band. The MSFA is pixel-synchronously mounted in front of a monochromatic image sensor, as demonstrated in Fig. 2(a). Because each sensor pixel captures only one spectral component, a demosaicing algorithm is necessary to recover the missing spectral components at the individual pixels. Figure 2(b) shows the spectral responses of the Silios multispectral cameras used in this work. 22 This MSFA is composed of one panchromatic neutral band and eight spectral bands in the red to NIR spectral range. Their central wavelengths lie between 650 and 930 nm, and they have a full-width at half-maximum (FWHM) of about 40 nm. Hence, the spectral characteristics of the projector should be designed according to this MSFA.
Supposing a linear transfer function of the image sensor, the spectral response value I of an ideal digital camera can be formulated as

I = ∫ L(λ) o(λ) s_cam(λ) r(λ) dλ,  (1)

where λ is the wavelength, L(λ) denotes the spectral power distribution of the illumination, o(λ) the spectral transmission of the camera optics, s_cam(λ) the spectral sensitivity of the image sensor, and r(λ) the spectral reflectance of the object. Using a discrete representation of these spectral functions, Eq. (1) can be written in the matrix notation

I = M r,  (2)

where the row vector M denotes the elementwise product of the vector forms of L(λ), o(λ), and s_cam(λ), and r is the spectral reflectance in the form of a column vector. Finally, the mathematical expression of the eight-band multispectral image acquisition can be formulated by extending Eq. (2) to multiple bands:

I_λ = M_λ r,  (3)

where I_λ is the vector of spectral response values at an image pixel, M_λ is the overall spectral sensitivity matrix of the multispectral camera, and each row of M_λ corresponds to a spectral response curve in Fig. 2(b).
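The discrete image-formation model of Eq. (3) can be sketched numerically. The following minimal example uses synthetic Gaussian band sensitivities as stand-ins for the measured curves of Fig. 2(b); the sampling grid, band centers, and reflectance are illustrative assumptions, not the authors' calibration data.

```python
import numpy as np

# Sketch of Eq. (3), I_lambda = M_lambda r, with illustrative spectral data.
wavelengths = np.arange(650, 931, 10)        # nm, sampling grid (assumption)
n_bands = 8

# Hypothetical Gaussian band sensitivities with ~40 nm FWHM (stand-ins
# for the measured curves in Fig. 2(b))
centers = np.linspace(650, 930, n_bands)
sigma = 40 / (2 * np.sqrt(2 * np.log(2)))    # FWHM -> standard deviation
M = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / sigma) ** 2)

r = np.full(wavelengths.size, 0.5)           # flat 50% reflectance (assumption)
I = M @ r                                    # one response value per band
```

Each row of M plays the role of one spectral response curve; in practice these rows also absorb the illumination spectrum and the optics transmission, as stated after Eq. (1).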

Multiwavelength Array Projector with Aperiodic Sinusoidal Fringe Patterns
For the realization of a simultaneous multiwavelength projection, the multiaperture principle 14 is applied. In the implementation of this principle, it is unavoidable that some projection units operate off-axis. The use of pseudostatistical patterns is therefore advantageous for the projection optics design and the alignment of the single projection units: with these patterns, the dispersion at the projector lens has no influence on the 3-D results, and neither the patterns' characteristics nor their relations to each other need to be controlled precisely. In this work, aperiodic sinusoidal fringe patterns 23 are used because their fabrication is simpler than that of the commonly used speckle patterns. Experimental investigations 24 have shown that reasonable 3-D measurement results can be obtained with N ≥ 6 vertical patterns. With the adaptation that these patterns are spectrally coded instead of being temporally projected, an array projector is developed that consists of N projection units projecting N different spectral-coded fringe patterns simultaneously, with spatially and spectrally varying properties a_λ(x), b_λ(x), c_λ(x), and d_λ(x).
In the case of aperiodic patterns, the offset a_λ(x) and amplitude b_λ(x) are properties of the spectral light source in each projection unit, and these light sources should be adjusted to the same brightness level. The period length 2π∕c_λ(x) and phase shift d_λ(x) of each pattern are generated using the random method in Ref. 23. First, N spatial-frequency spectra are generated with a pseudorandom number generator. Then a bandpass filter is applied to these spectra in order to control the maximal and minimal half-period lengths of the corresponding intensity profiles. At the middle working distance, the maximal and minimal half-period lengths of the projected fringes observed by the cameras should be 20 and 10 pixels, respectively. Finally, these filtered spectra are transformed back into the spatial domain to generate aperiodic intensity profiles. Figure 3 shows the intensity profiles at different wavelengths within the marked horizontal line segment. Nevertheless, the fabrication of such patterns with continuously varying transmission is technically difficult. As a simplification, the intensity profiles in Fig. 3 are binarized so that the patterns can be fabricated as binary slides, and the sinusoidal profile shapes are produced by a minor defocusing of the projection. Figure 4 shows the schematic of the multiwavelength array projector. Figure 4(a) shows the optical setup of a single projection unit. The concentrator with a gold coating shapes the light emitted by a light-emitting diode (LED) into a homogeneous beam for a slide with an aperiodic fringe pattern. In Fig. 4(b), the arrangement of the projection units is shown. The projector consists of eight pipe projection units with different high-power LEDs (3 to 5 W). Figure 4(c) shows the central wavelengths of the used LEDs, which have an FWHM of ca. 50 nm.
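The pattern-generation steps described above (pseudorandom spectrum, bandpass to bound the half-period lengths, inverse Fourier transform, binarization) can be sketched as follows. Profile length and random seed are illustrative assumptions; only the 10- and 20-pixel half-period bounds come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix = 1024                       # profile length in pixels (assumption)
p_min, p_max = 10, 20              # half-period bounds in pixels (from the text)

# Pseudorandom complex spatial-frequency spectrum
spectrum = rng.standard_normal(n_pix) + 1j * rng.standard_normal(n_pix)

# Bandpass: keep only frequencies whose period 2*p lies in [2*p_min, 2*p_max]
freqs = np.abs(np.fft.fftfreq(n_pix))               # cycles per pixel
period = np.divide(1.0, freqs, out=np.full(n_pix, np.inf), where=freqs > 0)
spectrum[(period < 2 * p_min) | (period > 2 * p_max)] = 0.0

# Back-transform into the spatial domain -> aperiodic intensity profile
profile = np.real(np.fft.ifft(spectrum))

# Binarize for fabrication as a slide; a slight projection defocus then
# restores approximately sinusoidal intensity profiles
binary = (profile > np.median(profile)).astype(np.uint8)
```

Repeating this N times with independent random spectra yields the N mutually uncorrelated fringe patterns, one per projection wavelength.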
The cameras and single projection units are adjusted for a specified middle working distance with overlapping viewing and illumination fields, whereas the projection units are slightly defocused in order to generate the sinusoidal intensity profiles. The LEDs illuminate permanently, so that all patterns can be extracted from a single multispectral image.
Experimental Setup
Figure 5 shows the experimental setup consisting of two Silios multispectral NIR cameras and one multiwavelength array projector at the center. The cameras are synchronized and work at a frame rate of 60 Hz; the full resolution of the CMOS sensor is 1280 × 1024 pixels. The subimages at each spectral band have a reduced resolution of about 0.146 megapixels and must be restored to the full image sensor resolution by demosaicing.
The stereo vision setup has a triangulation angle of ca. 18 deg and a baseline of ca. 480 mm. The measurement volume is about 300 mm × 300 mm × 300 mm, and the middle working distance is 1.5 m. Through the use of objectives that are optimized for the NIR spectral range, the cross-channel image distortion due to chromatic aberration can be neglected. The geometric calibration of the stereo camera setup is performed using Zhang's camera calibration method 25 with the middle spectral band at 730 nm.

Pattern Extraction and Image Data Denoising
At first, a dark signal correction of the multispectral images is performed, and the image resolution is recovered to the original sensor resolution by demosaicing, for which we used a simple bilinear interpolation method. Because of the high spectral crosstalk at some bands due to the irregular shapes of their sensor spectral responses [see Fig. 6(a)], the fringes appear smeared in the images as a result of the cross-channel mixture of the different spectral patterns, as shown in Fig. 7(a). In order to recover the designed spatial modulation of the patterns, a computational crosstalk compensation is performed. As a result, a virtual multispectral image cube with lower crosstalk is reconstructed by a linear combination of the spectral bands, whose weighting coefficients are determined based on the real spectral sensitivity data of the multispectral image sensors. For the crosstalk compensation, a virtual spectral sensitivity matrix M_λ^v is defined, in which the spectral response curves of each band have narrow Gaussian shapes with the same FWHM values 26 [see Fig. 6(b)]. Subsequently, a correction matrix T containing the weighting coefficients of the original spectral bands is calculated from a linear mapping between the original spectral sensitivity matrix M_λ and the ideal virtual spectral sensitivity matrix M_λ^v. This can be solved by minimizing the cost function

f = ‖M_λ^v − M_λ T‖².

As a verification of the obtained correction matrix T, it is applied to the original spectral response curves M_λ to calculate the corrected spectral response curves shown in Fig. 6(c). In our experiment, the corrected spectral response curves exhibit a mean correlation coefficient of 0.989 with the ideal Gaussian spectral response curves M_λ^v, verifying the linear method used.
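The minimization of f is an ordinary linear least-squares problem and can be sketched as follows. The sensitivity curves below are synthetic stand-ins (narrow Gaussians mixed by a random crosstalk matrix); in the real system, M_λ comes from the sensor characterization data.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.arange(600, 981)                      # nm sampling grid (assumption)
centers = np.linspace(650, 930, 8)
sig = 40 / 2.355                              # ~40 nm FWHM target bands

# Ideal narrow Gaussian virtual sensitivities M_v (columns = bands)
Mv = np.exp(-0.5 * ((wl[:, None] - centers[None, :]) / sig) ** 2)

# "Real" sensitivities: ideal bands mixed by a synthetic crosstalk matrix
cross = np.eye(8) + 0.15 * rng.random((8, 8))
M = Mv @ cross

# Least-squares solution of min_T ||Mv - M T||^2
T, *_ = np.linalg.lstsq(M, Mv, rcond=None)

# Per-pixel correction: measured band vector I -> virtual band vector I_v
r = rng.random(wl.size)                       # arbitrary reflectance spectrum
I = M.T @ r                                   # raw spectral responses
I_v = T.T @ I                                 # crosstalk-compensated responses
```

Because the same T is applied at every pixel, the compensation is a cheap 8 × 8 matrix multiplication per pixel, and verifying T amounts to comparing M_λ T against M_λ^v, as done via the correlation coefficient above.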
Using the correction matrix T, the virtual spectral response values at each pixel are computed as a linear combination of the original band values. By applying this crosstalk compensation at each pixel, the reconstructed virtual image appears as if the sensor had the corrected spectral response curves in Fig. 6(c). Figure 7 shows the performance of the crosstalk compensation for the 890-nm band, which suffers from a high spectral crosstalk. It can be seen that the image after crosstalk compensation exhibits a higher fringe contrast and less smearing. Furthermore, the sensitivity difference between the stereo multispectral cameras is reduced by performing the crosstalk compensation with respect to the same ideal spectral response curves. The following step is to filter out the ambient light and the high-frequency sensor noise. To this end, the bandpass filtering method by Guo and Huang 27 is implemented. Initially, the image is pixel-wise multiplied with a Hamming window in order to mitigate artifacts from the marginal areas of the image in the Fourier transform. Then, a horizontal bandpass filter, designed based on the known bandwidth range of the fringe patterns, is applied in the frequency domain to extract the projected patterns.
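The filtering stage can be sketched as follows: spatial Hamming windowing, 2-D FFT, a horizontal bandpass matched to the fringe band, and an inverse FFT. The image size, fringe period, noise level, and band limits are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

h, w = 256, 256
y, x = np.mgrid[0:h, 0:w]
img = 0.2 + 0.5 * np.sin(2 * np.pi * x / 30)    # vertical fringes + ambient offset
img = img + 0.05 * np.random.default_rng(2).standard_normal((h, w))  # noise

# Hamming window suppresses leakage artifacts from the image margins
window = np.outer(np.hamming(h), np.hamming(w))
F = np.fft.fftshift(np.fft.fft2(img * window))

# Horizontal bandpass: keep |fx| within the known fringe band, reject the
# DC component (ambient light) and high-frequency sensor noise
fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]  # cycles/pixel along x
band = (np.abs(fx) >= 1 / 40) & (np.abs(fx) <= 1 / 20)
filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F * band)))
```

The fringe period of 30 pixels lies inside the pass band [1/40, 1/20] cycles/pixel, so the pattern survives while the offset and most noise are removed; the same band limits follow directly from the designed half-period range of the projected fringes.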

Stereo Matching and 3-D Reconstruction
After the preprocessing, the projected fringe patterns are isolated from the raw image data. In preparation for the pixel matching, the rectification method by Hartley 28 is applied to the stereo cameras. In the stereo rectification, a pair of 2-D projective transformations is determined based on the fundamental matrix and a number of control point pairs and applied to the two images in order to align the epipolar lines, whereby the image distortion after the rectification should be minimal for both cameras. After the rectification, the stereo images are registered line by line, as shown in Fig. 8.
Furthermore, the depth range of the 3-D imaging system provides an additional restriction for the search of corresponding points. 29 As shown in Fig. 9, the ray through the 3-D object point P seen by the first camera is bounded by P_near and P_far, which correspond to the lower and upper limits of the depth range. Thus the search area for the corresponding point P^(2) of the image point P^(1) is restricted to an epipolar segment. By applying the rectification transformations, this epipolar segment is mapped to a horizontal line segment in the rectified image of the second camera. In this way, the stereo matching is accelerated, and the possibility of global matching errors, which could occur with a low number of statistical patterns, is reduced.
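For rectified stereo, this depth-range restriction reduces to a disparity interval, since d = f·B/Z. The helper below is a sketch with illustrative numbers loosely matching the setup (480-mm baseline, 1.5-m working distance, 300-mm depth range); the focal length in pixels is a hypothetical value.

```python
def disparity_bounds(f_px, baseline_mm, z_near_mm, z_far_mm):
    """Disparity interval (d_min, d_max) in pixels for a depth range
    [z_near, z_far], using d = f * B / Z for rectified stereo."""
    return f_px * baseline_mm / z_far_mm, f_px * baseline_mm / z_near_mm

d_min, d_max = disparity_bounds(f_px=1500.0, baseline_mm=480.0,
                                z_near_mm=1350.0, z_far_mm=1650.0)
# Only the (d_max - d_min)-pixel epipolar segment must be searched per pixel.
```

Restricting the correlation search to this segment is what yields both the speed-up and the reduced risk of global mismatches mentioned above.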
The stereo pixel matching is realized by calculating the normalized cross correlation between the sequences of spectral intensity values of the candidate pixel pairs. Moreover, a subpixel accuracy of up to 1/10 pixel is achieved in the pixel matching by a linear interpolation of the spectral values along the same row in all spectral subimages. The difference between the x-coordinates of a corresponding point pair is the so-called disparity value. The disparity map is then denoised by median filtering and completed by filling small gaps with extrapolation. In the end, the disparity values are converted into homogeneous 3-D points via a mapping matrix 20 that can be calculated from the calibration data of the stereo vision system.
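The matching step can be sketched on synthetic 1-D data: the similarity between a left-image pixel and a subpixel position in the right image is the normalized cross correlation (NCC) of their eight spectral values, with the right-image spectra linearly interpolated at 1/10-pixel steps. Data sizes, the disparity range, and the ground-truth shift are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_bands, width = 8, 200
right = rng.random((n_bands, width))          # one rectified row, 8 bands
true_disp = 12.3                              # ground-truth shift in pixels
cols = np.arange(width - 40, dtype=float)
# Left row = right row resampled at x + true_disp (linear interpolation)
left = np.stack([np.interp(cols + true_disp, np.arange(width), right[b])
                 for b in range(n_bands)])

def ncc(a, b):
    """Normalized cross correlation of two spectral value sequences."""
    a = a - a.mean(); b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match(p1, d_lo=5.0, d_hi=20.0, step=0.1):
    """Best subpixel disparity for left pixel p1 within [d_lo, d_hi)."""
    cand = np.arange(d_lo, d_hi, step)
    scores = [ncc(left[:, p1],
                  np.array([np.interp(p1 + d, np.arange(width), right[b])
                            for b in range(n_bands)]))
              for d in cand]
    return float(cand[int(np.argmax(scores))])
```

Using only the spectral value sequence at a single pixel position as the matching vector is exactly what the spectral coding enables; a temporal sequence of patterns would require N separate exposures to build the same vector.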

3-D Measurement Results and Discussion
For the characterization of the system accuracy, we measured a prism and a sphere with opaque, diffuse surfaces [see Fig. 10(a)]; Fig. 10(b) shows the 3-D measurement results. The obtained 3-D point clouds exhibit a high measurement completeness, but with some microscopic periodic artifacts, possibly a consequence of the remaining cross-channel spectral crosstalk. For the prism, the plane-fitting standard deviations of the flats F1 to F5 were calculated, resulting in a mean standard deviation of 0.284 mm, while the measurement of the sphere delivers a sphere-fitting standard deviation of 0.337 mm. Additionally, Fig. 11 shows the qualitative measurement result of a free-form figure of a woman. These experimental results indicate that the quality of the spectral-coded aperiodic fringe patterns is sufficient for the cross-correlation-based stereo matching and 3-D reconstruction, and that the proposed approach can achieve a reasonable 3-D accuracy and robustness for nontextured diffuse objects in a wide-baseline camera configuration.
The current main restriction of the system is the reduced spatial resolution of the multispectral snapshot cameras in each band. The reduced camera resolution lowers the achievable resolution of the disparity values and thereby the depth resolution of the reconstructed 3-D image. Thus an improved demosaicing algorithm, such as in Ref. 30, is needed to enhance the depth accuracy. Further possible approaches for a high-quality restoration of spatially downsampled images may be the adaptation of compressive sensing methods. [31][32][33] However, these require customized MSFAs with spatially pseudorandomized filter arrangements or synthetic coded apertures, which demand more expensive fabrication techniques and lead to higher costs.
In addition, the fixed-pattern noise, especially the photoresponse nonuniformity in each spectral band of the image sensors, could also cause minor artifacts in the stereo matching and thus in the 3-D measurement results. To address this problem, an in-depth sensor characterization in compliance with the EMVA 1288 standard 34 will be needed as a basis for a pixel-wise correction of the spatial nonuniformity.
Moreover, we observed that the spectral crosstalk in the 850-, 890-, and 930-nm bands of the multispectral cameras is markedly stronger. The achievable signal-to-noise ratio of these bands is lower due to the limited saturation capacity of the sensor. This results in artifacts during the pixel matching process and a degradation of the matching precision. Hence, progress in multispectral sensor technology regarding the spectral characteristics is another crucial condition for further development of the proposed approach.

Summary
In this work, we presented the principle and design of an optical 3-D sensor based on multispectral snapshot cameras and multiwavelength pattern projection. Its benefit is the realization of single-frame 3-D imaging with high spatial resolution, good depth accuracy, and high measurement robustness even at nontextured objects. With this sensor principle, the 3-D frame rate corresponds directly to the 2-D frame rate of the applied multispectral cameras and can be raised significantly by the use of high-speed image sensors. The first measurement results obtained with the prototype setup show the validity and potential of the proposed approach. Future research will aim to improve the image processing algorithms, especially the demosaicing and the fixed-pattern-noise correction, as well as the multispectral camera technology with respect to the signal-to-noise ratio and the spectral crosstalk between the individual bands.