Building a 3D local surface feature with a local reference frame (LRF) achieves rotational invariance and makes use of 3D spatial information, thereby boosting the distinctiveness of the feature. However, this benefit rests on the assumption that the LRF is stable and repeatable. Owing to disturbances such as noise, point density variation, occlusion, and clutter, an LRF may become ambiguous, thereby limiting the ability of an LRF-based 3D local feature. This paper presents an efficient method for LRF construction. Experimental results on several popular datasets show that our proposed LRF outperforms state-of-the-art methods in terms of repeatability and robustness. Moreover, our method is computationally efficient.
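As an illustration of the general idea, a common covariance-based LRF construction can be sketched as follows. This is a generic eigen-decomposition scheme with distance weighting and sign disambiguation, shown only as background; it is not the specific method proposed in the paper.

```python
import numpy as np

def build_lrf(points, center, radius):
    """Generic covariance-based local reference frame (sketch).

    points: (N, 3) neighbors of `center` within `radius`.
    Returns a 3x3 rotation whose columns are the x, y, z axes.
    The distance weighting and sign rules are common choices,
    not the paper's method.
    """
    d = points - center
    dist = np.linalg.norm(d, axis=1)
    # Closer points count more (a frequently used weighting).
    w = np.clip(radius - dist, 0.0, None)
    cov = (d * w[:, None]).T @ d / w.sum()
    # Eigenvectors sorted by ascending eigenvalue:
    # the smallest-variance direction serves as the z axis.
    vals, vecs = np.linalg.eigh(cov)
    z = vecs[:, 0]
    x = vecs[:, 2]
    # Sign disambiguation: orient each axis toward the neighbor majority.
    if np.sum(d @ z) < 0:
        z = -z
    if np.sum(d @ x) < 0:
        x = -x
    y = np.cross(z, x)  # completes a right-handed frame
    return np.stack([x, y, z], axis=1)
```

The repeatability problem the abstract refers to arises exactly in the eigenvector signs and in near-equal eigenvalues, which disturbances like noise and occlusion can flip.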
Currently, feature-based visual Simultaneous Localization and Mapping (SLAM) has reached a mature stage. Feature-based visual SLAM systems usually calculate the camera poses without producing a dense surface, even if a depth camera is provided. In contrast, dense SLAM systems simultaneously output camera poses and a dense surface of the reconstructed region. In this paper, we propose a new RGB-D dense SLAM system. First, the camera pose is calculated by minimizing a combination of the reprojection error and the dense geometric error. We construct a new type of edge in g2o, which adds the extra constraints built from the dense geometric error to the graph optimization. The cost function is minimized in a coarse-to-fine strategy on the GPU, which increases the system frame rate and improves convergence under large camera motion. Second, to generate dense surfaces and give users feedback on the scanned surfaces, we use the surfel model to fuse the RGB-D stream and generate dense surface models in real time. After the system performs essential graph optimization and full Bundle Adjustment (BA), the surfels in the dense model are updated with an embedded deformation graph to keep them consistent with the optimized camera poses. Third, a better 3D model is achieved by re-merging the stream with the optimized camera poses once the user ends the reconstruction. We compare the accuracy of the generated camera trajectories and reconstructed surfaces with state-of-the-art systems on the TUM and ICL-NUIM RGB-D benchmark datasets. Experimental results show that the accuracy of dense surfaces produced online is very close to that of later re-fusion, and that our system outperforms state-of-the-art systems in terms of the accuracy of the produced camera trajectories.
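The combined objective can be sketched as follows: a reprojection term over matched features plus a point-to-plane geometric term over dense depth correspondences. The balancing weight `lam` and all variable names are illustrative assumptions; the paper's exact residuals and parameterization may differ.

```python
import numpy as np

def combined_cost(T, feats_3d, feats_2d, K, src_pts, dst_pts, dst_normals, lam):
    """Sketch of a combined reprojection + dense geometric cost.

    T:        4x4 camera pose (world -> camera), an assumed convention.
    feats_3d: (N, 3) map points matched to (N, 2) pixels feats_2d.
    src_pts, dst_pts, dst_normals: dense depth correspondences.
    lam:      balance between the two terms (hypothetical weight).
    """
    R, t = T[:3, :3], T[:3, 3]
    # Reprojection term: pixel error of projected map points.
    cam = feats_3d @ R.T + t
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    e_rep = np.sum((uv - feats_2d) ** 2)
    # Dense geometric term: point-to-plane distances.
    src_cam = src_pts @ R.T + t
    e_geo = np.sum(((src_cam - dst_pts) * dst_normals).sum(axis=1) ** 2)
    return e_rep + lam * e_geo
```

In the system itself this sum is minimized coarse-to-fine on the GPU; the extra g2o edge mentioned above contributes the geometric term to the graph optimization.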
Simultaneous Localization and Mapping (SLAM) plays an important role in navigation and augmented reality (AR) systems. While feature-based visual SLAM has reached a mature stage, RGB-D-based dense SLAM has become popular since the advent of consumer RGB-D cameras. Different from feature-based visual SLAM systems, RGB-D-based dense SLAM systems, for example KinectFusion, calculate camera poses by registering the current frame with the images raycast from the global model, and produce a dense surface by fusing the RGB-D stream. In this paper, we propose a novel reconstruction system. Our system is built on ORB-SLAM2. To generate the dense surface in real time, we first propose to use a truncated signed distance function (TSDF) to fuse the RGB-D frames. Because camera tracking drift is inevitable, it is unwise to represent the entire reconstruction space with a single TSDF model or to use the voxel hashing approach to represent the entire measured surface. We instead use the moving volume proposed in Kintinuous to represent the reconstruction region around the current frame frustum. Different from Kintinuous, which corrects the points with an embedded deformation graph after pose graph optimization, we re-fuse the images with the optimized camera poses and produce the dense surface again after the user ends the scanning. Second, we use the reconstructed dense map to filter out outlier features in the sparse feature map. The depth maps of the keyframes are raycast from the TSDF volume according to the camera poses. The feature points in the local map are projected into the nearest keyframe. If the discrepancy between the depth value of the feature and that of the corresponding point in the depth map exceeds a threshold, the feature is considered an outlier and removed from the feature map. The discrepancy value is also combined with the feature pyramid layer to calculate the information matrix when minimizing the reprojection error.
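The outlier test described above can be sketched as follows: project each map feature into a keyframe and compare its depth against the depth map raycast from the TSDF volume. Function and variable names, and the single-threshold rule, are assumptions for illustration.

```python
import numpy as np

def filter_outliers(features, T_cw, K, depth_map, thresh):
    """Sketch of depth-discrepancy outlier filtering.

    features:  (N, 3) map points in world coordinates.
    T_cw:      4x4 world -> camera pose of the nearest keyframe.
    depth_map: depth image raycast from the TSDF volume.
    thresh:    maximum allowed depth discrepancy (assumed constant
               here; the system ties it to the feature pyramid layer).
    Returns a boolean mask of features to keep.
    """
    R, t = T_cw[:3, :3], T_cw[:3, 3]
    H, W = depth_map.shape
    keep = []
    for p in features:
        pc = R @ p + t                      # world -> camera
        if pc[2] <= 0:
            keep.append(False)
            continue
        u = int(round(K[0, 0] * pc[0] / pc[2] + K[0, 2]))
        v = int(round(K[1, 1] * pc[1] / pc[2] + K[1, 2]))
        if not (0 <= u < W and 0 <= v < H):
            keep.append(False)
            continue
        d = depth_map[v, u]
        # Keep the feature only if its depth agrees with the raycast map.
        keep.append(d > 0 and abs(d - pc[2]) <= thresh)
    return np.array(keep)
```

The same discrepancy value is reused to scale the information matrix in the reprojection-error minimization, so features that agree closely with the dense surface carry more weight.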
The features in the sparse map reconstructed near the produced dense surface thus exert a large influence on camera tracking. We compare the accuracy of the produced camera trajectories as well as the 3D models with state-of-the-art systems on the TUM and ICL-NUIM RGB-D benchmark datasets. Experimental results show that our system achieves state-of-the-art results.
We present an approach for real-time camera tracking with a depth stream. Existing methods are prone to drift in scenes without sufficient geometric information. First, we propose a new weighting method for the iterative closest point (ICP) algorithm commonly used in real-time dense mapping and tracking systems. By detecting uncertainty in the pose and increasing the weight of points that constrain unstable transformations, our system achieves accurate and robust trajectory estimation. Our pipeline can be fully parallelized on the GPU and incorporated seamlessly into current real-time depth camera tracking systems. Second, we compare state-of-the-art weighting algorithms and propose a weight degradation algorithm based on the measurement characteristics of a consumer depth camera. Third, we use NVIDIA Kepler shuffle instructions during warp and block reduction to improve the efficiency of our system. Results on the public TUM RGB-D benchmark demonstrate that our camera tracking system achieves state-of-the-art results in both accuracy and efficiency.
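For context, one Gauss-Newton step of the weighted point-to-plane ICP that such systems build on can be sketched as below, with per-point weights `w` standing in for the proposed weighting. This uses the standard small-angle linearization and is a generic sketch, not the paper's exact scheme.

```python
import numpy as np

def weighted_icp_step(src, dst, normals, w):
    """One Gauss-Newton step of weighted point-to-plane ICP (sketch).

    src, dst: (N, 3) matched points; normals: (N, 3) at dst.
    w:        (N,) per-point weights (the paper's weighting would
              down-weight points that constrain the pose poorly).
    Returns xi = [rotation (3), translation (3)], the pose update.
    """
    # Point-to-plane residual and its Jacobian rows [src x n, n].
    r = ((src - dst) * normals).sum(axis=1)
    J = np.hstack([np.cross(src, normals), normals])
    # Weighted normal equations: (J^T W J) xi = -J^T W r.
    A = (J * w[:, None]).T @ J
    b = -(J * w[:, None]).T @ r
    return np.linalg.solve(A, b)
```

In a GPU pipeline, the 6x6 system `A` and vector `b` are accumulated with warp and block reductions, which is where the Kepler shuffle instructions mentioned above apply.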