Traditionally, three dimension models have been used for building virtual worlds, and a data structure called the "scene graph" is often employed to organize these 3D objects in the virtual space. On the other hand, image-based rendering has recently been suggested as a probable alternative VR platform for its photo-realism, however, due to limited interactivity, it has only been used for simple navigation systems. To combine the merits of these two approaches to object/scene representations, this paper proposes for a scene graph structure in which both 3D models and various image-based scenes/objects can be defined, traversed, and rendered together. In fact, as suggested by Shade et al., these different representations can be used as different LOD's for a given object. For instance, an object might be rendered using a 3D model at close range, a billboard at an intermediate range, and as part of an environment map at far range. The ultimate objective of this mixed platform is to breath more interactivity into the image based rendered VE's by employing 3D models as well. There are several technical challenges in devising such a platform: designing scene graph nodes for various types of image based techniques, establishing criteria for LOD/representation selection, handling their transitions, implementing appropriate interaction schemes, and correctly rendering the overall scene. Currently, we have extended the scene graph structure of the Sense8's WorldToolKit, to accommodate new node types for environment maps billboards, moving textures and sprites, "Tour-into-the-Picture" structure, and view interpolated objects. As for choosing the right LOD level, the usual viewing distance and image space criteria are used, however, the switching between the image and 3D model occurs at a distance from the user where the user starts to perceive the object's internal depth. Also, during interaction, regardless of the viewing distance, a 3D representation would be used, it if exists. Before rendering, objects are conservatively culled from the view frustum using the representation with the largest volume. Finally, we carried out experiments to verify the theoretical derivation of the switching rule and obtained positive results.