This paper presents a new framework for an immersive kendo game with an intelligent cyber-fighter that has its own internal needs, motivations, sets of multimodal sensors, a motor system, and a behavior system. Unlike conventional interfaces such as a keyboard or joystick, the proposed system provides a more natural and comfortable interface by exploiting multimodal inputs such as 3D vision and speech recognition. In addition, the proposed 3D vision-based interface allows relatively free movement in 3D space compared with wired tracker-based interfaces. As a result, a user holding a real sword can experience an immersive fight with the cyber-fighter in a virtual environment. The proposed framework will have a wide variety of applications in VR-based edutainment.