In recent years, many SLAM (simultaneous localization and mapping) systems have demonstrated impressive dense scene reconstruction. However, conventional SLAM systems build 3D scenes at the point level, without any semantic information. Many computer vision applications demand a high degree of scene understanding, and purely point-based SLAM falls short in these applications. This paper studies fusing 3D object recognition into a SLAM system: we use a hand-held RGB-D camera and RTAB-Map to reconstruct a dense point cloud of a 3D indoor scene, and then apply supervoxel-based point cloud segmentation to over-segment the scene. A 3D object classification model trained with PointNet is added to merge the over-segmented regions and recognize objects. Our experiments in indoor environments show the effectiveness of this system.
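The fusion step the abstract describes, merging over-segmented supervoxels under predicted object labels, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the supervoxel IDs, adjacency graph, and precomputed class labels are invented assumptions, and in the actual system the labels would come from a PointNet classifier run on each segment's points.

```python
# Hedged sketch: merge over-segmented supervoxels whose predicted object
# class agrees, using union-find over the supervoxel adjacency graph.
# All data below is a toy example; the paper obtains labels from PointNet.

def find(parent, x):
    """Find the root of x with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge_supervoxels(labels, adjacency):
    """labels: {supervoxel_id: predicted class name}
       adjacency: iterable of (id_a, id_b) neighbor pairs
       Returns {supervoxel_id: root id of its merged segment}."""
    parent = {sv: sv for sv in labels}
    for a, b in adjacency:
        if labels[a] == labels[b]:          # same predicted object class
            parent[find(parent, a)] = find(parent, b)
    return {sv: find(parent, sv) for sv in parent}

# Toy scene: four adjacent supervoxels, two predicted "chair", two "table".
labels = {0: "chair", 1: "chair", 2: "table", 3: "table"}
adjacency = [(0, 1), (1, 2), (2, 3)]
segments = merge_supervoxels(labels, adjacency)
# Supervoxels 0 and 1 merge, 2 and 3 merge; the chair/table boundary stays.
```

In a real pipeline the adjacency graph would come from the supervoxel segmentation (e.g. PCL's VCCS produces one), so only neighboring regions with agreeing labels are fused into object-level segments.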