In this paper, we present a convenient, unified framework for generating a photo-realistic interactive virtual environment (piVE) using heterogeneous multiview cameras, without relying on bluescreen techniques or special rendering hardware. Despite rapid advances in computer hardware, rendering a photo-realistic virtual environment on the fly remains a challenging problem. By exploiting stereo images and videos, the proposed framework renders piVE in real time without an expensive high-end computer equipped with rendering hardware. The framework consists of three main parts: (1) photo-realistic virtual space generation using a camera with a stereoscopic adapter, (2) generation of a video avatar (a special object representing the user) using a multiview camera, and (3) graphics object rendering according to the given camera parameters and the user's interaction. We also address z-keying issues among the background video, graphics objects, and the video avatar.
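As a rough illustration of the general z-keying idea, and not the specific procedure developed in this paper, the following minimal sketch composites several layers by keeping, at each pixel, the color of the layer nearest to the camera. The function name `z_key`, the layer representation (a color image paired with a depth map), and the convention that smaller depth means nearer are all assumptions made for this example; pixels where a layer has no content are assumed to carry infinite depth.

```python
from math import inf

def z_key(layers):
    """Composite layers by per-pixel depth comparison (z-keying).

    `layers` is a list of (color, depth) pairs. Each `color` is a 2D list of
    RGB tuples and each `depth` is a 2D list of floats of the same size.
    Assumption: smaller depth means closer to the camera, and a layer marks
    empty pixels (e.g. outside the avatar silhouette) with depth = inf.
    """
    height = len(layers[0][0])
    width = len(layers[0][0][0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pick the (color, depth) candidate with the smallest depth.
            color, _ = min(
                ((c[y][x], d[y][x]) for c, d in layers),
                key=lambda candidate: candidate[1],
            )
            row.append(color)
        output.append(row)
    return output

# Hypothetical 1x2 example: background video, video avatar, graphics object.
background = ([[(0, 0, 255), (0, 0, 255)]], [[5.0, 5.0]])
avatar = ([[(255, 0, 0), (0, 0, 0)]], [[2.0, inf]])
graphics = ([[(0, 255, 0), (0, 255, 0)]], [[3.0, 4.0]])

composite = z_key([background, avatar, graphics])
```

In this toy input, the avatar is nearest at the first pixel and absent at the second, so the graphics object, which lies in front of the background, is visible there instead.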