Pointwise classification of mobile laser scanning point clouds of urban scenes using raw data

Abstract. Mobile laser scanning (MLS), which can quickly collect a high-resolution and high-precision point cloud of the surroundings of a vehicle, is an appealing technology for three-dimensional (3D) urban scene analysis. In this regard, the classification of MLS point clouds is a common and core task. We focus on pointwise classification, in which each individual point is categorized into a specific class by applying a binary classifier involving a set of local features derived from the neighborhoods of the point. To speed up the neighbor search and enhance feature distinctiveness for pointwise classification, we exploit the topological and semantic information in the raw data acquired by light detection and ranging (LiDAR) and recorded in scan order. First, a two-dimensional (2D) scan grid for data indexing is recovered, and the relative 3D coordinates with respect to the LiDAR position are calculated. Subsequently, a set of local features is extracted using an efficient neighbor search method with a low computational complexity independent of the number of points in a point cloud. These features are further merged to produce a variety of binary classifiers for specific classes via a GentleBoost supervised learning algorithm combining decision trees. The experimental results on the Paris-rue-Cassette database demonstrate that the proposed approach outperforms the state-of-the-art methods with a 10% improvement in the F1 score, whereas it uses simpler geometric features derived from a spherical neighborhood with a radius of 0.5 m.


Introduction
With the development of light detection and ranging (LiDAR) technology, mobile laser scanning (MLS) systems, which deploy one or multiple LiDARs on a ground-based vehicle, 1 can quickly collect a high-resolution and high-precision point cloud of the surroundings of the vehicle and have gained increasing attention in three-dimensional (3D) urban scene analysis, 2 including urban 3D modeling 3 and automated urban driving. 4 Classification of MLS point clouds, in which each point in an MLS point cloud is determined to belong to a specific class, e.g., ground, 5 road, 6 road markings, 7 vehicles, 8 power lines, 9 and street trees, 10,11 is a common and core task for various applications of 3D urban scene analysis. 12 Weinmann et al. 13 proposed a pointwise classification framework, whereby each individual point is classified by a binary classifier involving a set of local geometric features derived from the neighborhoods of the point. The framework does not require expert knowledge in the specific domain 12 and thus can be applied to label a variety of urban object classes. However, the main challenges of pointwise classification include a low distinctiveness of local geometric features and a high computational complexity of the neighbor search. The commonly used neighbor search approaches for MLS point clouds are based on the k-D tree algorithm, in which a k-dimensional index tree is constructed and the average complexity for the nearest-neighbor search is O(log N) for a point cloud with N points. To enhance the discrimination of local low-level geometric features, multiple neighborhood scales [14][15][16] or a selected optimal neighborhood scale 13,17,18 are recovered via the k-D tree, resulting in a higher computational cost. To address the above issues, Hackel et al. 14 downsampled the point cloud and built a multiscale pyramid of k-D trees to help improve the efficiency of neighbor searching.
An MLS point cloud is usually georeferenced by merging data from LiDAR and other sensors, such as inertial measurement units and global positioning systems. 1 However, more semantic information exists among the raw data acquired by the LiDAR, i.e., the relative positions of the measured points with respect to the LiDAR, which can be viewed as the relative positions of these points with respect to the road surface. This information can help infer the presence of most constructed objects, since the spatial distributions of these objects along the road are generally regular. In addition, the raw data recorded in scan order are beneficial for organizing the point cloud. 19 To reduce the neighbor search time and enhance the distinctiveness of local geometric features for pointwise classification, this study fully exploits the contextual and topological information in the raw data of the MLS point clouds, as shown in Fig. 1. First, a two-dimensional (2D) scan grid for data indexing is recovered, and the relative 3D coordinates with respect to the LiDAR position are calculated. Subsequently, a set of local features is extracted using an efficient neighbor search method with a low computational complexity independent of the number of points N in a point cloud. These features are further merged to produce a variety of binary classifiers for specific classes via a boosting supervised learning algorithm combining decision trees.

Methods
This study considers an MLS system with a single 2D LiDAR sensor used in push-broom mode; 20 i.e., the scan plane of the sensor is orthogonal to the direction of vehicle movement.

Scan Grid Construction
A scan line is defined as a 2D profile acquired by a single rotation of the 2D LiDAR mirror. 19,21 Thus, an MLS point cloud can be organized by constructing a scan grid in which each row represents a scan line, as shown in Fig. 2(a). In the grid, a point p measured by the i'th beam on the j'th scan line can be indexed by (i, j). The scan grid provides a compact representation of the point cloud with a size of N_sl × N_b, where N_sl is the number of scan lines and N_b is the number of laser beams per scan line.
To construct a scan grid from the MLS data, the scan angles of the point cloud should be recorded. Some LiDARs do not record a point if the laser beam does not return. In this case, we can segment the point cloud into scan lines and obtain the scan line index j for point p by finding a significant jump between the scan angles of adjacent points. Then, the beam index i for point p can be calculated as

i = round((θ − θ_0)/Δθ) + 1, (1)

where θ is the scan angle of point p, θ_0 is the start scan angle of a scan line, and Δθ is the scan resolution, i.e., the nominal angular increment between adjacent beams on a scan line.
If the scan resolution Δθ is not provided, it can be estimated using the mean or median of the differences between scan angles of adjacent points on the same scan line.
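The scan-line segmentation, resolution estimation, and beam indexing described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation; the wrap-around jump threshold and the 0-based beam index are our assumptions:

```python
import numpy as np

def build_scan_grid(theta, delta_theta=None, theta0=-180.0):
    """Recover (beam index i, scan line index j) for each point from its
    scan angle theta (degrees), assuming points are stored in scan order.

    A new scan line starts where the angle jumps back sharply between
    consecutive points (the mirror wraps around).
    """
    theta = np.asarray(theta, dtype=float)
    # Scan line index j: increment at each large negative jump in scan angle.
    jumps = np.diff(theta) < -180.0          # wrap-around threshold (assumed)
    j = np.concatenate(([0], np.cumsum(jumps)))
    # Scan resolution: median angular step within a line, if not provided.
    if delta_theta is None:
        steps = np.diff(theta)[~jumps]
        delta_theta = np.median(steps)
    # Beam index: rounded offset from the start angle (0-based here).
    i = np.rint((theta - theta0) / delta_theta).astype(int)
    return i, j, delta_theta
```

Using the median step makes the Δθ estimate robust to missing returns, which appear as angular gaps larger than one nominal increment.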

Relative Coordinate Calculation
To obtain more contextual information from LiDAR data, a relative 3D coordinate system, where x represents the moving distance from the origin along the trajectory of the LiDAR, y represents the horizontal displacement with respect to the LiDAR position, and z represents the vertical displacement with respect to the LiDAR position, is constructed for the MLS point cloud. The relative coordinates of point p(i, j) are calculated as

x(i, j) = Σ_{k=1}^{j} v(k)Δt,
y(i, j) = r(i, j) cos θ(i, j),
z(i, j) = r(i, j) sin θ(i, j), (2)

where v(k) is the vehicle speed at the time when the k'th scan line is acquired by the LiDAR, Δt is the time interval between adjacent scan lines, r(i, j) is the radial distance of point p(i, j), and θ(i, j) is the scan angle of point p(i, j).
Since the attitude of the LiDAR (vehicle) over a short period of time can be regarded as constant, the relative coordinates can describe a local spatial distribution of points as well as the georeferenced coordinates, as shown in Figs. 2(b) and 2(c).
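Eq. (2) maps directly to array operations over the scan grid. The following sketch assumes the ranges, scan angles, and per-line speeds are already arranged on the grid (array layout is our assumption):

```python
import numpy as np

def relative_coordinates(r, theta_deg, v, dt):
    """Relative 3D coordinates of Eq. (2) on a scan grid.

    r, theta_deg : (N_sl, N_b) arrays of range and scan angle per point
    v            : (N_sl,) vehicle speed when each scan line was acquired
    dt           : time interval between adjacent scan lines

    Returns x (along-track distance), y (horizontal), and z (vertical
    displacement with respect to the LiDAR position).
    """
    r = np.asarray(r, float)
    theta = np.deg2rad(np.asarray(theta_deg, float))
    # x: cumulative distance travelled up to each scan line.
    x_line = np.cumsum(np.asarray(v, float) * dt)       # shape (N_sl,)
    x = np.broadcast_to(x_line[:, None], r.shape)
    # y, z: polar-to-Cartesian conversion within the scan plane.
    y = r * np.cos(theta)
    z = r * np.sin(theta)
    return x, y, z
```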

Neighbor Search
The neighborhood of each point should be recovered for local feature extraction. A spherical neighborhood S(i, j) of point p(i, j) is defined as the set of points within a sphere centered at p(i, j) with a radius of δ. We propose a fast procedure for searching S(i, j) with a computational complexity of O(1), i.e., independent of the number of points N in a point cloud. Consider the measurement resolutions Δi and Δj of the scan grid at point p(i, j). The resolution Δi, the minimum distance between the point p(i, j) and its adjacent points p(i ± 1, j) on the j'th scan line, can be estimated by the distance between the point p(i, j) and the (i ± 1)'th beams on the j'th scan line, as shown in Fig. 3. The resolution Δj, the distance between the point p(i, j) and its adjacent points p(i, j ± 1), can be estimated by the distance between the j'th scan line and the (j ± 1)'th scan lines along the trajectory of the LiDAR. Thus, the measurement resolutions Δi and Δj are calculated as

Δi = r(i, j) sin Δθ,  Δj = v(j)Δt. (3)

Then, a candidate neighborhood R(i, j) can be derived from the scan grid:

R(i, j) = {p(i_R, j_R) | |i_R − i| ≤ ⌈δ∕Δi⌉, |j_R − j| ≤ ⌈δ∕Δj⌉}. (4)

The spherical neighborhood S(i, j) is finally determined as

S(i, j) = {p(i_S, j_S) | ‖p(i_S, j_S) − p(i, j)‖ ≤ δ, p(i_S, j_S) ∈ R(i, j)}. (5)

The computational complexity of the proposed neighbor search method can be measured by the number of points in R(i, j):

N_R(i, j) = (2⌈δ∕Δi⌉ + 1)(2⌈δ∕Δj⌉ + 1), (6)

which depends on the measurement resolutions of the scan grid at point p(i, j) and is independent of the number of points in the whole point cloud. Note that, owing to the variation in point density, the computational complexity of the proposed method becomes high when the range r(i, j) is small, since Δi shrinks and more candidate beams fall within δ.
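A minimal sketch of the grid-based search: a small candidate window is cut from the scan grid using the measurement resolutions Δi and Δj, and the exact spherical test is applied only inside it. The array layout and variable names are our assumptions, not the authors' implementation:

```python
import numpy as np

def spherical_neighbors(points, i, j, delta, di, dj):
    """Fast spherical neighbor search on a scan grid.

    points : (N_sl, N_b, 3) relative coordinates on the scan grid
    i, j   : grid indices of the query point (beam, scan line)
    delta  : search radius (m)
    di, dj : measurement resolutions at (i, j)

    Returns the grid coordinates of points within delta of p(i, j).
    """
    n_sl, n_b, _ = points.shape
    # Candidate window: only a few rows/columns of the grid are visited,
    # so the cost is independent of the total number of points.
    wi, wj = int(np.ceil(delta / di)), int(np.ceil(delta / dj))
    i0, i1 = max(i - wi, 0), min(i + wi + 1, n_b)
    j0, j1 = max(j - wj, 0), min(j + wj + 1, n_sl)
    block = points[j0:j1, i0:i1]                 # rows are scan lines
    # Exact spherical test inside the candidate block only.
    d = np.linalg.norm(block - points[j, i], axis=-1)
    jj, ii = np.nonzero(d <= delta)
    return list(zip(jj + j0, ii + i0))
```

The window size (2⌈δ/Δi⌉ + 1) × (2⌈δ/Δj⌉ + 1) grows as the range shrinks, which illustrates the note above about small ranges increasing the cost.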

Feature Extraction
To demonstrate the effectiveness of the relative 3D coordinates, a set of intuitive geometric features is extracted from the spherical neighborhood with a radius of δ = 0.5 m, as shown in Table 1. The density feature, based on the number of points N_S(i, j) in the spherical neighborhood, is corrected with the measurement resolutions Δi and Δj to eliminate the influences of varying vehicle speeds and measurement ranges. In addition, radiometric and penetrating features are derived from the intensities I and numbers of echoes n measured by the LiDAR, providing further distinctive properties not covered by the geometric features.

Fig. 3 Resolution Δi at point p(i, j), i.e., the minimum distance between point p(i, j) and its adjacent points on the j'th scan line.

Classifier Learning
The local features should be merged into a series of binary classifiers for a variety of specific classes by using supervised learning algorithms. The decision tree adopts a divide-and-conquer strategy by partitioning the feature space into subregions with a high class purity on the training set. 22 A top-down tree structure is constructed, and each node chooses the best feature from the feature set to split the training data. Since a single decision tree has a low bias but an extremely high variance, we use the boosting framework to combine multiple decision trees into an ensemble, improving the classification accuracy and generalization ability.
We chose the GentleBoost algorithm, in which the total exponential loss on the training set is minimized using a functional Newton-like numerical optimization method. 23 The ensemble classifier is

F(x) = Σ_{m=1}^{M} T_m(x), (8)

where T_m is the decision tree generated as an incremental function in the m'th iteration and M is the number of iterations.
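Common libraries ship AdaBoost rather than GentleBoost, so the following is a minimal sketch of GentleBoost over regression trees, following the additive logistic regression recipe: each iteration fits a tree to the ±1 labels by weighted least squares, adds it to the ensemble, and reweights with the exponential loss. The tree depth, iteration count, and clipping of the weak-learner output are our choices, not the paper's settings:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gentle_boost(X, y, M=50, max_depth=2):
    """Minimal GentleBoost: y must be in {-1, +1}."""
    y = np.asarray(y, float)
    w = np.full(len(y), 1.0 / len(y))
    trees = []
    for _ in range(M):
        t = DecisionTreeRegressor(max_depth=max_depth)
        t.fit(X, y, sample_weight=w)             # weighted LS fit of f_m
        f = np.clip(t.predict(X), -1.0, 1.0)     # keep the update "gentle"
        w *= np.exp(-y * f)                      # exponential loss reweighting
        w /= w.sum()
        trees.append(t)
    return trees

def predict(trees, X):
    """Ensemble score F(x) = sum_m T_m(x); classify by its sign."""
    return np.sign(sum(t.predict(X) for t in trees))
```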

Dataset
To compare our method with prior state-of-the-art methods, we use a publicly available labeled database, namely Paris-rue-Cassette. 20 It contains 12 million points recorded on a street section in Paris with a length of ∼200 m. A 2D LiDAR sweeps from −180 deg to 180 deg with a time interval of Δt = 10 ms between scan lines. The starting scan angle θ_0 is −180 deg, i.e., the upward direction. All coordinates are georeferenced (E, N, U) in the Lambert 93 and IGN1969 altitude (grid RAF09) reference systems. For the points with a laser beam return, the range r, scan angle θ, intensity I, number of echoes n, and georeferenced LiDAR position (x_0, y_0, z_0) are recorded in scan order in addition to the georeferenced coordinates. The object classes for the experiments include façade, ground, cars, two-wheelers, road inventory, pedestrians, and vegetation. See Table 2 for details.

Recovery of the Scan Grid
The point cloud was first divided into scan lines at the positions where the sign of the scan angle changed from positive to negative. Then, the angle resolution Δθ = 0.12 deg was estimated by analyzing the distribution of the differences between the scan angles of adjacent points on the same scan line. Finally, a scan grid with a size of 4642 × 3000 was constructed, as shown in Fig. 4. Given the real-time speeds v of the LiDAR estimated from the georeferenced LiDAR positions, the relative 3D coordinates were computed using Eq. (2).

Neighbor Search Efficiency
To compare the computational complexities of the k-D tree algorithm and the proposed neighbor search method in searching for spherical neighborhoods, we ran the two algorithms on a laptop PC with an AMD Ryzen 5 4600H CPU (hexa-core, 3.0 GHz) and 16 GB of RAM. The k-D tree algorithm was performed over the raw Paris-rue-Cassette dataset, while the proposed neighbor search method was performed over the scan grid of the Paris-rue-Cassette dataset. The Point Cloud Library was used for the k-D tree neighbor search. The search radius δ was set to 0.2, 0.5, and 0.8 m. As shown in Table 3, our method is much faster.

Effectiveness of the Relative 3D Coordinates
To demonstrate the effectiveness of the relative 3D coordinates, we compare the distinctiveness of the geometric features derived from the georeferenced and relative 3D coordinates. The geometric features in Table 1 are divided into three kinds for comparison: (i) single coordinates of the central point; (ii) basic features derived from the neighborhood using a single coordinate, i.e., the means, standard deviations, and ranges within the neighborhood; and (iii) shape features derived from the structure tensor of the neighborhood using all three coordinates, i.e., the linearity L_λ, planarity P_λ, scattering S_λ, and omnivariance O_λ.
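The shape features of kind (iii) follow from the eigenvalues of the structure tensor of the neighborhood. A small sketch; the normalization of linearity, planarity, and scattering by the largest eigenvalue is a common convention and an assumption here:

```python
import numpy as np

def shape_features(neighborhood):
    """Shape features from the structure tensor (covariance matrix) of a
    neighborhood: linearity, planarity, scattering, and omnivariance
    from the sorted eigenvalues l1 >= l2 >= l3.
    """
    P = np.asarray(neighborhood, float)          # (N, 3) points
    C = np.cov(P.T)                              # 3x3 structure tensor
    l = np.sort(np.linalg.eigvalsh(C))[::-1]     # descending eigenvalues
    l = np.maximum(l, 1e-12)                     # numerical guard
    L = (l[0] - l[1]) / l[0]                     # linearity
    Pl = (l[1] - l[2]) / l[0]                    # planarity
    S = l[2] / l[0]                              # scattering
    O = np.cbrt(l[0] * l[1] * l[2])              # omnivariance
    return L, Pl, S, O
```

For a facade-like planar patch, planarity dominates; for foliage, scattering dominates. This is what makes these features useful for separating constructed objects from vegetation.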
The Bayes error 22 for each feature is numerically estimated as

e = (1∕2) Σ_{n_h=1}^{N_h} min{h_+(n_h), h_−(n_h)}, (9)

where h_+(n_h) and h_−(n_h) are the n_h'th bins of the probability histograms with N_h = 100 bins for a single feature, corresponding to an object class and a nonobject class, respectively. All features are mapped to the interval [0, 1]. The search radius δ is set to 0.5 m. Figure 6 shows the average Bayes error for each kind of geometric feature on the seven classes. Figures 6(a) and 6(b) show that single relative coordinates are more distinctive than single georeferenced coordinates, implying that the relative coordinates introduce more semantic information. Figure 6(c) shows that shape features using the three relative coordinates have a similar distinctiveness as those using the three georeferenced coordinates, implying that the relative 3D coordinates maintain the local spatial distribution of points as well as the georeferenced 3D coordinates.
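The histogram-overlap estimate of the Bayes error can be computed as follows (a sketch with equal-width bins on [0, 1], as stated above):

```python
import numpy as np

def bayes_error(f_pos, f_neg, n_bins=100):
    """Numerical Bayes error: half the overlap of the two class-conditional
    probability histograms of a single feature mapped to [0, 1].
    """
    h_pos, _ = np.histogram(f_pos, bins=n_bins, range=(0, 1))
    h_neg, _ = np.histogram(f_neg, bins=n_bins, range=(0, 1))
    h_pos = h_pos / h_pos.sum()                  # probability histograms
    h_neg = h_neg / h_neg.sum()
    return 0.5 * np.minimum(h_pos, h_neg).sum()
```

Perfectly separated feature distributions give an error of 0; identical distributions give 0.5, i.e., no better than chance for balanced classes.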

Pointwise Classification Accuracy
The best results of pointwise classification we are aware of are those of Refs. 14, 24, and 25. In Ref. 25, a hierarchical framework composed of ground filtering, structural segmentation, and contextual classification was proposed. In Refs. 14 and 24, geometric features are derived at multiple scales or at an optimal scale and are then combined into pointwise object classifiers. To improve the classification accuracy, statistical features derived from the 2D projection of the point cloud, as well as the 3D shape context and signature of histograms of orientations features, are also utilized in Refs. 14 and 24 in addition to the fundamental geometric features.
The experiments use the same training and test sets as in Refs. 14 and 24; i.e., 1000 points per class are randomly selected as training samples, and the remaining data are used as test samples. The number of iterations for GentleBoost is M = 500. Table 4 shows the classification results. Results of a variety of pointwise classification approaches are provided in Ref. 22, and the best result for each class is used for comparison. Compared with the prior state-of-the-art methods, our approach achieves an approximately 10% improvement in terms of the F_1 score. The improvement increases to 15% when the radiometric and penetrating features mentioned in Sec. 2.4 are added to the classifiers.
The F_1 score depends on the decision threshold: as the threshold changes, so does the score. The area under the receiver operating characteristic (ROC) curve (AUC) does not depend on the decision threshold, since it summarizes the relationship between the true- and false-positive rates of a binary classifier over all thresholds; it is therefore a complementary measure of classifier performance. Figure 7 shows the ROC curves for the proposed approach with all features, and we also report the AUC for our approach in Table 4.
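The threshold dependence of the F_1 score versus the threshold independence of the AUC can be seen on a toy example. The scores and labels below are illustrative only, not from the experiments:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy scores from a binary classifier (labels in {0, 1}).
y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])

# F1 changes with the chosen decision threshold ...
f1_at_half = f1_score(y_true, scores >= 0.5)
f1_at_03 = f1_score(y_true, scores >= 0.3)

# ... whereas the AUC integrates over all thresholds at once.
auc = roc_auc_score(y_true, scores)
```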

Discussion and Conclusion
This study aims to speed up the neighbor search and enhance feature distinctiveness for pointwise classification by exploiting the topological and contextual information among raw data. Considering an MLS system with a single 2D LiDAR sensor, the cores of our approach are: (i) to construct a scan grid according to the scan pattern to organize an MLS point cloud; (ii) to compute the relative 3D coordinates with respect to the LiDAR position; and (iii) to recover the neighborhood by a fast search method. The computational complexity of the proposed neighbor search strategy is independent of the number of points in a point cloud, with an average search rate of 49.49%. In terms of the Bayes error, geometric features using the relative coordinates are more distinctive than those using the georeferenced coordinates. Compared with the state-of-the-art methods, the proposed pointwise classification achieves an approximately 10% improvement in terms of the F_1 score, while using simpler geometric features derived for a search radius of 0.5 m. Furthermore, our approach is straightforward to parallelize and would be faster when taking advantage of parallel programming. The proposed approach has several limitations: (i) the proposed neighbor search only works for an MLS system with a single 2D LiDAR sensor used in push-broom mode; (ii) to construct or recover a scan grid from the MLS data, the scan angles of the point cloud must be recorded, and the MLS data must be recorded in scan order or with time stamps; and (iii) because the proposed approach uses local features derived from several adjacent scan lines, it will not work well if the vehicle makes multiple drive-runs in different directions (e.g., driving forward and then backward along the road).
In future work, the proposed approach will be extended to MLS systems with a single 3D LiDAR or multiple 2D/3D LiDARs by exploring the measurement geometry of 3D LiDARs as well as the spatial relationship of multiple LiDARs. Furthermore, postprocessing, such as soft labeling, 24 will be considered to improve the classification accuracy.