Security and intelligence services are increasingly turning to multi-sensor video surveillance, which relies on the human ability to fuse and comprehend the information provided by multiple videos. A training system that uses the same front end as the real multi-sensor system can significantly improve this ability. Such a training system needs scenarios that replicate stressful situations, and these are videotaped in advance and played back later. This not only limits the available training scenarios but also incurs a high cost. This paper introduces a new framework, a virtual video capture device, for such training systems. Using the latest graphics processing unit (GPU) technology, multiple video streams composed of computer graphics (CG) are generated on a single high-end PC and published to a video stream server. Users can thus be trained with both real and virtual video streams on one system. The framework also enables the training system to use real video streams that incorporate augmented reality to improve the user's situation awareness.