Understanding the 3D structure of the environment is advantageous for many tasks in the field of robotics and autonomous vehicles. From the robot’s point of view, 3D perception is often formulated as a depth image reconstruction problem. In the literature, dense depth images are often recovered deterministically from stereo image disparities. Other systems use an expensive LiDAR sensor to produce accurate, but semi-sparse depth images. With the advent of deep learning there have also been attempts to estimate depth by only using monocular images. In this paper we combine the best of the two worlds, focusing on a combination of monocular images and low cost LiDAR point clouds. We explore the idea that very sparse depth information accurately captures the global scene structure while variations in image patches can be used to reconstruct local depth to a high resolution. The main contribution of this paper is a supervised learning depth reconstruction system based on a deep convolutional neural network. The network is trained on RGB image patches reinforced with sparse depth information and the output is a depth estimate for each pixel. Using image and point cloud data from the KITTI vision dataset we are able to learn a correspondence between local RGB information and local depth, while at the same time preserving the global scene structure. Our results are evaluated on sequences from the KITTI dataset and our own recordings using a low cost camera and LiDAR setup.