Histogram formation and noise reduction in biaxial MEMS-based SPAD light detection and ranging systems

Abstract. In many applications, there is a great demand for reliable, small, and low-cost three-dimensional imaging systems. Promising candidates for automotive applications as well as safe human-robot collaboration are light detection and ranging (lidar) systems based on the direct time-of-flight principle. Especially for covering a large field of view or achieving long-range capabilities, the previously used polygon scanners are being replaced by microelectromechanical systems (MEMS)-scanners. A more recent development is to replace the typically used avalanche photodiodes with single-photon avalanche diodes (SPADs). The combination of both technologies into a MEMS-based SPAD lidar system promises a significant performance increase and cost reduction compared with other approaches. To distinguish between signal and background/noise photons, SPAD-based detectors have to form a histogram by accumulating multiple time-resolved measurements. In this article, a signal and data processing method is proposed that considers the time-dependent scanning trajectory of the MEMS-scanner during the histogram formation. Based on known reconstruction processes used in stereo vision setups, an estimate for an accumulated time-resolved measurement is derived, which allows it to be classified as signal or noise. In addition to the theoretical derivation of the signal and data processing, an implementation is experimentally verified in a proof-of-concept MEMS-based SPAD lidar system.


Introduction
For the realization of reliable, small, and low-cost three-dimensional (3D) imaging systems, light detection and ranging (lidar) systems based on the direct time-of-flight (dtof) principle are considered to be one of the most promising technologies. Especially for automotive applications as well as safe human-robot collaboration, many proof-of-concept systems are currently being built and tested extensively. A recent trend in scanning lidar systems is the replacement of the bulky and expensive polygon scanners by microelectromechanical systems (MEMS)-scanners. These MEMS-scanners have the advantage that they can be fabricated using standard CMOS processes, and their incorporation offers the opportunity to greatly reduce the overall size and cost of the system. Furthermore, many systems are now testing the replacement of the typically used avalanche photodiodes with single-photon avalanche diodes (SPADs). Major advantages of SPADs are that they can also be fabricated using standard CMOS processes and that they can be integrated into large photodetector matrices, which significantly increases the spatial resolution of the lidar system. Especially in challenging applications that require long-range capabilities or a large field of view (FOV) to be covered, the combination of an MEMS-scanner and a SPAD detector promises a significant performance increase. One of the most recent examples of a proof-of-concept MEMS-based SPAD lidar system was reported by Sony in Ref. 1, with a range of up to 300 m.
Even though MEMS-based SPAD lidar systems are becoming more and more prominent, the authors are not aware of any prior publication that connects the statistical detection process necessary for the working principle of SPAD-based detectors with the time-dependent scanning trajectory of the MEMS-scanner. On the one hand, these systems often utilize a pointwise illumination of the FOV, especially for long-range applications. On the other hand, SPAD-based detectors have to form a histogram by accumulating multiple time-resolved measurements. Since a SPAD cannot distinguish between a signal and a noise/background photon, further distinguishing criteria based on the time-dependent scanning trajectory must be considered during the formation of a histogram.
In Sec. 2, the acquisition statistics of a system utilizing an MEMS-scanner driven in resonance are formulated. Furthermore, the concepts used in the reconstruction process of triangulation-based sensors are briefly summarized, which will be used to derive an analogous concept for MEMS-based SPAD lidar systems.
Based on the results of Sec. 2, Sec. 3 describes a proof-of-concept MEMS-based SPAD lidar system and proposes an implementation of a signal and data processing chain for the formation of a histogram that exploits the biaxial system configuration and utilizes the time-dependent scanning trajectory of the MEMS-scanner to further discriminate signal and background photons.
Section 4 applies the proposed signal and data processing chain to measured values. For the validation, two different experiments with varying lighting conditions are conducted. Section 5 summarizes the results and provides an outlook for further improvements.

Model and Method
The following extends the statistical detection process of SPADs to consider the time-dependent scanning trajectory of the MEMS-scanner. For simplicity, the geometry is reduced to a two-dimensional (2D) problem, but the same arguments and correspondences hold for the 3D case. Furthermore, the imaging optics are assumed to be distortion-free. After a brief overview of a reconstruction method used in triangulation-based stereo vision setups, an analogous concept is derived for the detection process of MEMS-scanner systems. This method combines the spatial and timing information of a SPAD with the time-dependent scanning trajectory of the MEMS-scanner. To provide a further distinguishing feature between signal and noise/background photons, this information is checked for consistency.

Acquisition Statistics Considering the Time-Dependent Scanning Trajectory of MEMS-Scanners
SPAD-based detectors have to form a histogram by accumulating multiple time-resolved measurements. The statistical detection method for SPAD-based detectors is extensively covered in recent publications, see, e.g., Refs. 2 and 3. In contrast to SPAD-based flash lidar systems, where the number of accumulations per histogram in every pixel is simply given by the laser pulse repetition frequency f_rep and the frame rate, in a system utilizing a scanning illumination it also depends on the scan trajectory, the FOV, and the required spatial resolution. The definition of and correspondences between the mechanical scan angle θ_mech(t), an initial rotation angle θ_0, and the resulting scan angle θ(t) of an MEMS-scanner are shown in Fig. 1. In addition, the normalized direction of the laser emission d_laser, the normal vector n_MEMS of the MEMS-scanner, and the resulting scan direction d_scan are given.
The determination of the mean number of accumulations for an MEMS-scanner driven in resonance can be considered as a sampling problem. The sampling points of the mechanical scan angle θ_mech, which can be represented in the time domain by a periodic cosinusoidal oscillation, are given by the reciprocal of the laser pulse repetition frequency f_rep and may be expressed as

θ_mech(k/f_rep) = θ_mech,max · cos(2π f_mech k/f_rep + φ_0),  (1)

where k is an integer, φ_0 represents an arbitrary constant phase, and θ_mech,max is the maximum mechanical scan angle. In the case of an electrostatically driven scanner, the mechanical scan angle is a function of the geometry of the scanner, its driving voltage, and its scan frequency f_mech. Using the sampled points, a distribution of the number of measurements may be stated or, by weighting it with the number of measurements, a mean number of accumulations per histogram and pixel may be obtained. For a one-dimensional oscillation, a frequency ratio of f_mech/f_rep = 0.0785, and 400 consecutive accumulations, both representations are exemplarily shown in Fig. 2.
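As an illustration, the sampling of the cosinusoidal scan trajectory and the resulting accumulation counts per angular pixel can be sketched as follows. All parameter values (e.g., f_mech = 785 Hz, a 48-pixel angular grid, and the maximum scan angle) are hypothetical and chosen only to reproduce the stated frequency ratio of 0.0785:

```python
import numpy as np

# Sketch: distribute the laser shots of a resonant MEMS-scanner over angular
# pixels and count the accumulations contributing to each pixel's histogram.
f_mech = 785.0        # mechanical scan frequency in Hz (assumed)
f_rep = 10_000.0      # laser pulse repetition frequency in Hz -> f_mech/f_rep = 0.0785
theta_max = 15.0      # maximum mechanical scan angle in degrees (assumed)
n_shots = 400         # consecutive accumulations considered
phi_0 = 0.0           # arbitrary constant phase

k = np.arange(n_shots)
# Sampled mechanical scan angle: cosinusoidal oscillation sampled at k/f_rep
theta_mech = theta_max * np.cos(2 * np.pi * f_mech * k / f_rep + phi_0)

# Bin the sampled angles into angular pixels to obtain the number of
# accumulations per pixel; the scanner dwells longest near the turning points,
# so the outer pixels receive markedly more accumulations than the center
n_pixels = 48
edges = np.linspace(-theta_max, theta_max, n_pixels + 1)
acc_per_pixel, _ = np.histogram(theta_mech, bins=edges)

print(acc_per_pixel.sum())  # every shot lands in exactly one pixel -> 400
```

The uneven distribution visible in `acc_per_pixel` is exactly the effect that makes the per-pixel accumulation count scan-trajectory dependent, in contrast to flash systems.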

3D Reconstruction in Triangulation Sensors
The following gives a brief summary of the reconstruction of 3D points using a stereo camera setup. The reconstruction based on a purely geometric solution is used here as an example because of its simplicity and to illustrate the basic concepts for the subsequent discussion. The basic geometric relations and notations required for the geometric solution are shown in Fig. 3. Assume a 3D point x is observed from two different camera locations, where the projection centers O′ and O″ are separated by the baseline b. This point corresponds to two image points u′ = P′(x) and u″ = P″(x) in the image planes, where the correspondence is given by the camera-specific projection matrices P′ and P″. The line equations l′ and l″ in 3D space, given in Eqs. (2) and (3), can be constructed. Their origins are the respective image points, and their direction vectors d′ and d″ are determined using the respective projection centers. If the epipolar constraint l′ · (b × l″) = 0 is fulfilled, both lines l′ and l″ intersect in 3D space and an exact solution for the scalars k_1 and k_2 exists. If the epipolar constraint is violated, the geometrical solution solves the system of linear equations given in Eq. (4) and estimates the 3D point x as the midpoint of the shortest line segment that joins both lines l′ and l″.4 More sophisticated estimates for the point x take into account uncertainties in the imaging process and can be shown to be statistically optimal. Examples of these estimators can be found in Refs. 4 and 5.
Apart from the estimation of the intersection point, the epipolar constraint can be used to reduce the search space for correspondences in the image pairs from a 2D space to a one-dimensional (1D) space.4 The line equations are

l′ = u′ + k_1 d′,  (2)

l″ = u″ + k_2 d″,  (3)

and the system of linear equations for the shortest connecting segment, whose direction is given by d′ × d″, reads

u′ + k_1 d′ + k_3 (d′ × d″) = u″ + k_2 d″.  (4)
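The midpoint estimate can be sketched as follows. The ray origins and directions are hypothetical example values; the function solves the linear system for the scalars k_1 and k_2 and the offset k_3 along the common perpendicular d′ × d″:

```python
import numpy as np

# Sketch of the purely geometric reconstruction: given two image rays
# l' = u' + k1*d' and l'' = u'' + k2*d'', solve for the closest points on both
# rays and return the midpoint of the shortest connecting segment.
def midpoint_triangulation(u1, d1, u2, d2):
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    n = np.cross(d1, d2)                      # direction of the shortest segment
    # Solve u1 + k1*d1 + k3*n = u2 + k2*d2, i.e. [d1, -d2, n] [k1,k2,k3]^T = u2 - u1
    A = np.column_stack((d1, -d2, n))
    k1, k2, k3 = np.linalg.solve(A, u2 - u1)
    p1 = u1 + k1 * d1                         # closest point on the first ray
    p2 = u2 + k2 * d2                         # closest point on the second ray
    return 0.5 * (p1 + p2)

# Two rays that intersect exactly at [0, 0, 5]: the epipolar constraint is
# fulfilled, so k3 vanishes and the midpoint is the true intersection
x = midpoint_triangulation(np.array([-0.5, 0.0, 0.0]), np.array([0.5, 0.0, 5.0]),
                           np.array([ 0.5, 0.0, 0.0]), np.array([-0.5, 0.0, 5.0]))
print(x)  # -> approximately [0, 0, 5]
```

If the measured image points violate the epipolar constraint, k_3 becomes nonzero and the returned midpoint is the geometric estimate described above.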

3D Reconstruction in the dtof Measurement
In the following, a reconstruction method for MEMS-based scanning lidar systems utilizing the dtof method is outlined. The reconstruction process may be formulated analogously to the reconstruction presented in the previous subsection, where one of the projection centers is replaced by the MEMS-scanner. The top view of this geometry is shown in Fig. 4. As shown in Fig. 4, the global coordinate frame is fixed to the center of the sensor and the optical axis of the receiving optics coincides with the z axis. The optical axis of the transmitter is defined by the MEMS-scanner at a mechanical scan angle θ_mech of zero. The rotation angle θ_0 of the MEMS-scanner is chosen such that the optical axes of the transmitter and receiver intersect at half the maximum distance, defined here as the working distance W. If the MEMS-scanner and the projection center are separated by a baseline b = [x_M0, 0, 0]^T, the rotation angle θ_0 may be expressed as

θ_0 = (1/2) arctan(x_M0 / W).  (5)

Using the time-dependent deflection angle θ(t), the normal vector n_MEMS = [cos(θ(t)), sin(θ(t))]^T may be defined. The normalized reflection direction d_scan follows from the vector reflection law and is determined as

d_scan = d_laser − 2 (d_laser · n_MEMS) n_MEMS,  (6)

where d_laser is the normalized direction of the laser emission.
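The vector reflection law above can be sketched in a few lines; the laser direction d_laser used here is an assumed example, and the check below illustrates that a mirror rotation by Δθ rotates the reflected beam by 2Δθ:

```python
import numpy as np

# Sketch of the scan direction from the vector reflection law:
# d_scan = d_laser - 2 (d_laser . n_MEMS) n_MEMS (2D geometry, d_laser assumed)
def scan_direction(theta, d_laser=np.array([-1.0, 0.0])):
    n_mems = np.array([np.cos(theta), np.sin(theta)])   # mirror normal at angle theta
    return d_laser - 2.0 * np.dot(d_laser, n_mems) * n_mems

# A rotation of the mirror by 1 deg rotates the reflected beam by 2 deg
d0 = scan_direction(np.deg2rad(45.0))
d1 = scan_direction(np.deg2rad(46.0))
angle = np.degrees(np.arccos(np.clip(np.dot(d0, d1), -1.0, 1.0)))
print(angle)  # -> approximately 2.0
```

Because the reflection preserves the norm, d_scan stays normalized whenever d_laser is.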
Using the approximation of an ideal projection with a thin lens, the line l′ may be expressed analogously to Eq. (2).
In the 1D pointwise scanning case with an active illumination and a single line sensor, the epipolar constraint must be fulfilled and may be stated as d_scan · (b × l′) = 0. The importance of the epipolar constraint becomes obvious when applying it to the case of a 2D pointwise scanning system with an active illumination and a 2D array detector. Since the current scan angle and therefore the scan direction d_scan are known, only the pixels fulfilling this constraint must be read out, which greatly reduces the amount of data that needs to be transferred, stored, and processed.
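The pixel selection for a 2D array detector can be sketched as follows. The focal length, the tolerance, and the scan direction are assumed values; the pixel pitch and array notion are taken from the sensor described in Sec. 3, generalized to a hypothetical 192 × 192 array for illustration:

```python
import numpy as np

# Sketch: select only those pixels of a 2D array whose viewing rays fulfil the
# epipolar constraint d_scan . (b x d') = 0 within a tolerance, so that only
# these pixels need to be read out for the current scan angle.
f = 25e-3                          # focal length in m (assumed)
pitch = 40.56e-6                   # pixel pitch in m (from the sensor in Sec. 3)
b = np.array([0.125, 0.0, 0.0])    # baseline between receiver and scanner

u = (np.arange(192) - 96.5) * pitch
v = (np.arange(192) - 96.5) * pitch
uu, vv = np.meshgrid(u, v, indexing="ij")
# Viewing ray direction of every pixel (thin-lens approximation)
d_prime = np.stack((-uu, -vv, np.full_like(uu, f)), axis=-1)
d_prime /= np.linalg.norm(d_prime, axis=-1, keepdims=True)

d_scan = np.array([-0.05, 0.0, 1.0])   # current scan direction (assumed)
d_scan /= np.linalg.norm(d_scan)

residual = np.abs(np.dot(np.cross(np.broadcast_to(b, d_prime.shape), d_prime), d_scan))
mask = residual < 1e-3             # pixels that must be read out (tolerance assumed)
print(mask.sum(), mask.size)       # a small fraction of all 192*192 pixels
```

Only the rows near the epipolar plane spanned by b and d_scan survive the test, which is the data reduction argued for above.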
In the absence of background radiation and noise in the sensor, invoking the epipolar constraint would be sufficient to uniquely specify the point x. Since this is usually not the case, a further criterion needs to be specified. A measurement utilizing the dtof method contains timing information for every pixel. Considering the geometry shown in Fig. 4, the time of flight t_TOF must be equal to

t_TOF = (‖x − x_scanner‖ + ‖x‖) / c,  (7)

where c is the speed of light through the medium and x_scanner is the position of the scanner. Without any prior knowledge of the scene, an analytical solution for the distance Z can be derived for a given pixel position u′, a measured time of flight t_TOF, and scan angle θ. For the 2D geometry, this solution is given in Eq. (8):

Z = d_scan,z · (c² t_TOF² − ‖x_scanner‖²) / [2 (c t_TOF + x_scanner · d_scan)],  (8)

where d_scan,z denotes the z component of d_scan. Combining the distance Z with the imaging equation, given in Eq. (9), the 2D point x = [X, Z]^T can be determined:

X = −(u′ / f) Z,  (9)

where f is the focal length of the receiving optics.
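The range solution of Eq. (8) can be sketched and checked for self-consistency as follows; the baseline and the target position are hypothetical example values:

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s (vacuum value used for the sketch)

# Sketch of the analytical range solution: given the scanner position
# x_s = [x_M0, 0], the scan direction d_scan, and a measured time of flight,
# the distance follows from the ellipse defined by the total path length c*t_TOF.
def reconstruct_point(t_tof, d_scan, x_m0=0.125):
    dx, dz = d_scan / np.linalg.norm(d_scan)
    r = (C**2 * t_tof**2 - x_m0**2) / (2.0 * (C * t_tof + x_m0 * dx))
    return np.array([x_m0 + r * dx, r * dz])   # [X, Z]

# Self-consistency check with a hypothetical target at [1, 20] m:
# forward-compute the time of flight, then invert it with the formula
x_m0 = 0.125
target = np.array([1.0, 20.0])
d_scan = target - np.array([x_m0, 0.0])
t_tof = (np.linalg.norm(target - np.array([x_m0, 0.0])) + np.linalg.norm(target)) / C
x_rec = reconstruct_point(t_tof, d_scan, x_m0)
print(x_rec)  # -> approximately [1.0, 20.0]
```

The inversion is exact because the quadratic terms in the path-length condition cancel, leaving a closed-form expression for the range along the scan ray.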

Spatial Uncertainties in the dtof Measurement
Spatial uncertainties in the object space of the dtof measurement can arise from either spatial or temporal measurement uncertainties. Spatial uncertainties arise from the uncertainty involved in the determination of the current scan angle and the finite pixel size. Temporal uncertainties arise from the pulse-to-pulse timing jitter between subsequent laser pulse emissions and the minimum resolvable time given by the time-to-digital converter of the detector. The current scan angle is monitored with a sampling frequency much higher than the oscillation frequency. Therefore, this uncertainty is neglected in the following. In addition, we consider the case where the pulse-to-pulse timing jitter between the laser pulse emissions is less than the minimum resolvable time, so this is also neglected.
The finite pixel size gives rise to a spatial uncertainty in the x direction of the received photon. Under the assumption of a laterally uniform pixel response, this yields a uniform distribution over the active pixel area. The mean μ_pix is equal to the center of the pixel and can be expressed for the sensor, which will be considered later, as

μ_pix = (u − 96.5) · w_pix,  (10)

where u is the pixel coordinate and w_pix is the spacing between two pixels as given schematically in Fig. 6. Its variance σ²_pix is given as

σ²_pix = d²_SPAD / 12,  (11)

where d_SPAD is the diameter of an SPAD. Usually, this assumption is not valid for conventional photodiodes,6 but for SPADs the uniformity of the response to photons impinging at different positions in the active area is a key parameter. Through careful design of the device, a uniform pixel response, in terms of the photon detection efficiency and the detection delay, may be achieved.7

The minimum resolvable time, represented by the bin width t_bin, gives rise to a spatial uncertainty in the z direction of the received photon. The discretized time of flight t_TOF is equal to the bin number N_bin multiplied by the minimum resolvable time t_bin of the time-to-digital converter. This discretization causes a quantization error. In a time-to-digital converter, and with only minor assumptions about the underlying statistics of the photon detection, the time of arrival within a bin is uniformly distributed. A necessary and sufficient condition for this may be found in Ref. 8, and its application to a commonly used time-to-digital converter architecture may be found in Ref. 9.
Therefore, the mean μ_TOF is the center of the bin, given as

μ_TOF = (N_bin − 1/2) · t_bin,  (12)

and its variance σ²_TOF may be expressed as

σ²_TOF = t²_bin / 12.  (13)

In the following, the first-order second-moment method is used to propagate the uncertainties of the measured pixel coordinate u and its time of flight t_TOF from the image into the object space. To achieve this, the point x = [X, Z]^T is first expressed in polar coordinates using the known correspondences. The mean of the 2D point in (r, φ)^T space is then

μ_r = √(X(μ_pix, μ_TOF)² + Z(μ_TOF)²)  (14)

and

μ_φ = atan2[Z(μ_TOF), X(μ_pix, μ_TOF)],  (15)

where atan2 is the two-argument arctangent function. Under the assumption of negligible covariance between μ_pix and μ_TOF, which is the case if the pixel response is assumed to be uniform in terms of the photon detection efficiency and the detection delay, applying the first-order second-moment method to the point in (r, φ)^T space results in an uncertainty in the r direction of

σ²_r = (∂r/∂u)² σ²_pix + (∂r/∂t_TOF)² σ²_TOF  (16)

and in the φ direction of

σ²_φ = (∂φ/∂u)² σ²_pix + (∂φ/∂t_TOF)² σ²_TOF.  (17)

As an example, resulting uncertainty bounds in Cartesian coordinates for a bin width t_bin of 312.5 ps and different pixel numbers u and distances Z are shown in Fig. 5.
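The propagation can be sketched numerically as follows. For brevity, the sketch replaces the full biaxial range solution with a simplified monostatic approximation Z = c·t_TOF/2 and uses the thin-lens relation for X; the focal length F is an assumed value, while the bin width, pixel pitch, and SPAD diameter are taken from the sensor described in Sec. 3:

```python
import numpy as np

C = 299_792_458.0
T_BIN = 312.5e-12          # bin width of the time-to-digital converter
W_PIX = 40.56e-6           # pixel pitch (sensor from Sec. 3)
D_SPAD = 12e-6             # SPAD diameter (sensor from Sec. 3)
F = 25e-3                  # focal length of the receiving optics (assumed)

# Means and variances of the uniform distributions in pixel and time
def pixel_stats(u):
    return (u - 96.5) * W_PIX, D_SPAD**2 / 12.0

def tof_stats(n_bin):
    return (n_bin - 0.5) * T_BIN, T_BIN**2 / 12.0

# Simplified image-to-object mapping used for the propagation sketch
def to_polar(mu_pix, mu_tof):
    z = C * mu_tof / 2.0            # monostatic approximation (assumption)
    x = -mu_pix * z / F             # thin-lens imaging equation
    return np.hypot(x, z), np.arctan2(z, x)

# First-order second-moment propagation with numerical partial derivatives
def propagate(u, n_bin, eps_u=1e-9, eps_t=1e-12):
    mu_p, var_p = pixel_stats(u)
    mu_t, var_t = tof_stats(n_bin)
    r0, phi0 = to_polar(mu_p, mu_t)
    dr_du = (to_polar(mu_p + eps_u, mu_t)[0] - r0) / eps_u
    dr_dt = (to_polar(mu_p, mu_t + eps_t)[0] - r0) / eps_t
    dphi_du = (to_polar(mu_p + eps_u, mu_t)[1] - phi0) / eps_u
    dphi_dt = (to_polar(mu_p, mu_t + eps_t)[1] - phi0) / eps_t
    var_r = dr_du**2 * var_p + dr_dt**2 * var_t
    var_phi = dphi_du**2 * var_p + dphi_dt**2 * var_t
    return (r0, phi0), (np.sqrt(var_r), np.sqrt(var_phi))

(mu_r, mu_phi), (sigma_r, sigma_phi) = propagate(u=81, n_bin=100)
```

In this sketch, the radial uncertainty is dominated by the bin width (σ_r on the order of c·t_bin, i.e., centimeters), whereas the angular uncertainty is dominated by the SPAD diameter, mirroring the behavior of Eqs. (16) and (17).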

System Description and Implementation of the Signal and Data Processing
After a brief description of a proof-of-concept MEMS-based SPAD lidar system, the implementation of a signal and data processing chain based on the results of the previous section follows.

Sensor and System Description
The sensor used here is the SPADEye2 from the Fraunhofer Institute for Microelectronic Circuits and Systems.10 It is a 2 × 192 pixel SPAD-based dtof line sensor, where only one of these lines is actively illuminated. For timing measurements, a time-to-digital converter with a resolution of t_bin = 312.5 ps and a full range of 1.28 μs, which corresponds to a total dtof detection range of 192 m, is implemented in each pixel. As schematically depicted in Fig. 6, each pixel consists of four vertically arranged SPADs with a diameter d_SPAD of 12 μm. The height h_pix of a pixel is 209.6 μm and its width w_pix is 40.56 μm.11 The coordinate system for the following discussion is fixed to the center of the active sensor area. The relation between the Cartesian coordinates (x, y)^T and the pixel units (u, v)^T is given as

[x, y]^T = [(u − 96.5) · w_pix, (v − 1.5) · h_pix]^T,  (18)

where u is the pixel number in the range of 1 to 192. Since only the line in the center of the sensor is actively illuminated here, the y- or v-component of the vector is zero.
To illuminate the scene, a collimated laser beam is deflected in the horizontal direction by a single-axis resonant MEMS-scanner with an electrostatic drive. The scanner is the Fovea3D sending mirror, and its driving and monitoring electronics SiMEDri are fabricated at the Fraunhofer Institute for Photonic Microsystems.12,13 The laser source is a pulsed laser diode with a center wavelength of 659 nm, an optical peak power of 80 nW, and a temporal pulse width of 15 ns. Although this center wavelength is not commonly used in lidar applications, it greatly simplifies the alignment procedure, and the discussion is valid for arbitrary optical wavelengths. The overall lidar system is a biaxial arrangement, and the distance between the center of the sensor and the center of the MEMS-scanner is 12.5 cm. A list summarizing the component parameters is given in Table 1.

Signal and Data Processing
A block diagram of the proposed signal and data processing chain is shown in Fig. 7. For every measurement, the sensor outputs a vector N_bin that contains the measured bin values of every pixel and a measurement timestamp T_meas, which is used to determine the current scan angle θ. The zero crossings of the mechanical scan angle θ_mech are monitored using a piezoresistive sensor placed on the torsional bar of the scanner. For the determination of the current scan angle θ, a vector containing these zero crossings ZC_mech is used. As outlined in Sec. 2.4, the determined scan angle θ and the vector of measured bin values N_bin are used to estimate the mean distances μ_r and angles μ_φ as well as their respective uncertainties σ_r and σ_φ for every pixel. These estimates are further tested for consistency using the estimated distance Z, the known position of the scanner x_scanner, and its current scan angle θ. If the determined point lies within the uncertainty bound, the measured value is classified as signal S. Otherwise, the measured value is stored in a vector N and labeled as noise, which may, for example, be further used to estimate the background radiation impinging on the sensor.

Experimental Verification
To verify the signal and data processing, different experiments were conducted, which are described in the following. A first proof-of-concept measurement with room lighting as background radiation was conducted. The scene, with annotated distances to the objects, and the prevailing lighting conditions are shown in Fig. 8(a). For better visibility in the picture, the laser source was set to a constant optical output power of 10 mW. The raw measurement data of 800 consecutive accumulations are shown in Fig. 8(b). For better visibility, only the first 256 bins are displayed. As expected in room lighting conditions, the three different objects are clearly visible even without any further processing or classification. Applying the signal and data processing as outlined in Sec. 3 yields the histogram of the measured values classified as signal S shown in Fig. 8(c). The measurements labeled as noise N are shown in Fig. 8(d). Comparing Figs. 8(b) and 8(c), it can be seen that most of the noise is removed and only some spurious outliers that randomly satisfy the conditions of the uncertainty bounds remain. Furthermore, the total number of counts in the raw measurement data of 2737 is reduced to 879, which corresponds closely to the number of accumulations, while the peak counts, for example, those encountered in pixels 14 and 81, are not altered.
A second measurement was conducted with the same scene as shown in Fig. 8(a), but additionally a 1-kW halogen floodlight was used to artificially increase the background radiation. Using a maximum likelihood estimator, the average background rate generated by the additional illumination was estimated to be around 6 MHz per pixel. (For comparison, without the additional illumination, a background rate of about 100 kHz per pixel was estimated.) For better visibility, 3200 consecutive accumulations were used, and the raw measurement data are shown in Fig. 9(a). As can be seen, the scene is barely visible and dominated by the background noise. Figure 9(b) shows the histogram of the measured values classified as signal S. Before further processing with a simple peak detector, the same moving average filter and thresholding were applied to both histograms. The resulting filtered outputs are shown in Figs. 9(c) and 9(d), where the processed histogram using the data processing as outlined in Sec. 3 closely resembles the measurement without any additional background radiation as shown in Fig. 8(c).
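The post-processing applied to both histograms can be sketched as follows; the window length, the threshold rule, and the synthetic histogram are assumed and serve only to illustrate the filter-and-detect step:

```python
import numpy as np

# Sketch of the post-processing: a moving average filter followed by
# thresholding and a simple argmax peak detector (parameters assumed).
def detect_peak(hist, window=5, k_threshold=3.0):
    kernel = np.ones(window) / window
    smoothed = np.convolve(hist, kernel, mode="same")
    threshold = smoothed.mean() + k_threshold * smoothed.std()
    if smoothed.max() < threshold:
        return None                     # no significant peak found
    return int(np.argmax(smoothed))     # bin index of the detected peak

# Synthetic histogram: uniform Poisson background plus a narrow signal peak
rng = np.random.default_rng(0)
hist = rng.poisson(2.0, size=256)       # background counts per bin
hist[120:123] += 40                     # signal photons concentrated near bin 121
print(detect_peak(hist))  # -> a bin close to 121
```

Applied after the consistency-based classification, the background pedestal entering this detector is much lower, which is why the filtered signal histogram in Fig. 9(c) resembles the low-background case.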

Conclusion and Outlook
In this article, an extension of the statistical detection process of SPAD-based detectors considering the time-dependent scanning trajectory of MEMS-scanners was derived. Based on this, a signal and data processing strategy was presented using principles known from the 3D reconstruction process of triangulation sensors to distinguish between signal and noise/background photons. The signal and data processing strategy was implemented in a proof-of-concept MEMS-based SPAD lidar system, and its functionality was verified experimentally. Furthermore, it was shown that the influence of strong background radiation is largely attenuated and an evaluation of the data remains possible.
Utilizing an SPAD-based 2D array detector and an MEMS-scanner with a 2D scan trajectory, the presented method may easily be extended to distinguish between signal and noise/background photons in the 3D case. As briefly mentioned in Sec. 2.3, by checking the epipolar constraint, the amount of data to be transmitted, stored, and processed can be greatly reduced.
Roman Burkard received his BSc and MSc degrees in electrical engineering and information technology from the University of Duisburg-Essen in 2016 and 2018, respectively, where he is currently working toward his PhD in electrical engineering at the Chair of Electronic Components and Circuits. His research focuses on the development of light detection and ranging systems and on the challenges posed by the combination of single-photon avalanche diodes and a scanning illumination.
Manuel Ligges received his diploma and doctorate in physics from the University of Duisburg-Essen. Until 2019, he worked as a research assistant and assistant professor in the field of solid-state physics. Currently, he leads the group of optical systems at the Fraunhofer Institute for Microelectronic Circuits and Systems (IMS) in Duisburg.
Thilo Sandner studied electrical engineering at the Technical University of Dresden, Germany, where he received his doctorate in 2003. Since 2003, he has been working as a scientist at the Fraunhofer IPMS, where he headed the R&D group for MEMS scanning mirrors for more than 10 years. Currently, he works as a project manager and key researcher for the development of innovative MOEMS components, system design, and new applications of photonic microsystems such as MEMS-based LiDAR.
Reinhard Viga received his diploma degree in electrical engineering and the Dr.-Ing. degree from Gerhard Mercator University of Duisburg in 1990 and 2003, respectively. Since 1990, he has been with the Chair of Electromechanical System Design, working on medical sensor system topologies and application aspects. Currently, he is the group manager in the Chair of Electronic Components and Circuits of the University of Duisburg-Essen. Besides sensor technology, his research interests cover the design of embedded systems for medical diagnostics and medical image processing.
Anton Grabmaier studied physics at the University of Stuttgart and specialized in semiconductor physics and measurement technology. His dissertation was focused on laser diodes. Since 2006, he has been a professor at the University of Duisburg-Essen and is working as the director of the Fraunhofer Institute for Microelectronic Circuits and Systems (IMS) in Duisburg.
André Merten: Biography is not available.