Colorectal cancer is the second leading cause of cancer deaths in the United States and causes over 50,000 deaths annually. The standard of care for colorectal cancer detection and prevention is an optical colonoscopy and polypectomy. However, over 20% of the polyps are typically missed during a standard colonoscopy procedure and 60% of colorectal cancer cases are attributed to these missed polyps. Surface topography plays a vital role in identification and characterization of lesions, but topographic features often appear subtle to a conventional endoscope. Chromoendoscopy can highlight topographic features of the mucosa and has shown to improve lesion detection rate, but requires dedicated training and increases procedure time. Photometric stereo endoscopy captures this topography but is qualitative due to unknown working distances from each point of mucosa to the endoscope. In this work, we use deep learning to estimate a depth map from an endoscope camera with four alternating light sources. Since endoscopy videos with ground truth depth maps are challenging to attain, we generated synthetic data using graphical rendering from an anatomically realistic 3D colon model and a forward model of a virtual endoscope with alternating light sources. We propose an encoder-decoder style deep network, where the encoder is split into four branches of sub-encoder networks that simultaneously extract features from each of the four sources and fuse these feature maps as the network goes deeper. This is complemented by skip connections, which maintain spatial consistency when the features are decoded. We demonstrate that, when compared to monocular depth estimation, this setup can reduce the average NRMS error for depth estimation in a silicone colon phantom by 38% and in a pig colon by 31%.