Three-dimensional environment perception and reconstruction is a technique in which a subject equipped with specific sensors builds a model of its environment while simultaneously estimating its own motion, without prior information about that environment. The technique is widely used on platforms such as driverless vehicles, unmanned aerial vehicles, and indoor robots. Research on mobile indoor robots in particular has attracted wide attention owing to the diversity of their application scenarios. In recent years, as sensor technology has matured, a variety of sensors have been deployed on mobile robots to implement this function, and multi-sensor fusion aimed at improving the quality of environment perception and reconstruction has become a focus of research in the field. In this review, we discuss the sensor types and hardware characteristics involved in multi-sensor fusion, and argue that configurations built around a camera as the core sensor are the most widely used and the most effective. Furthermore, based on published results in the field, we divide multi-sensor fusion methods into three levels: the sensor data level, the feature level, and the decision level. We discuss the implementation characteristics and performance of the methods at each level with reference to typical technologies and representative papers. Finally, we identify the key issues that remain to be solved in multi-sensor fusion, which indicate directions for follow-up research.