The ongoing success of three-dimensional (3D) cinema fuels increasing efforts to spread the commercial success of 3D to new markets. The possibilities of a convincing 3D experience at home, such as three-dimensional television (3DTV), has generated a great deal of interest within the research and standardization community. A central issue for 3DTV is the creation and representation of 3D content. Acquiring scene depth information is a fundamental task in computer vision, yet complex and error-prone. Dedicated range sensors, such as the Time of-Flight camera (ToF), can simplify the scene depth capture process and overcome shortcomings of traditional solutions, such as active or passive stereo analysis. Admittedly, currently available ToF sensors deliver only a limited spatial resolution. However, sophisticated depth upscaling approaches use texture information to match depth and video resolution. At Electronic Imaging 2012 we proposed an upscaling routine based on error energy minimization, weighted with edge information from an accompanying video source. In this article we develop our algorithm further. By adding temporal consistency constraints to the upscaling process, we reduce disturbing depth jumps and flickering artifacts in the final 3DTV content. Temporal consistency in depth maps enhances the 3D experience, leading to a wider acceptance of 3D media content. More content in better quality can boost the commercial success of 3DTV.