The Night Vision Tactical Trainer - Shadow (NVTT-S) is a U.S. Army-developed training tool designed to improve critical Manned-Unmanned Teaming (MUMT) communication skills for payload operators in Unmanned Aerial Sensor (UAS) crews. The trainer is composed of several Government Off-The-Shelf (GOTS) simulation components and takes the trainee through a series of escalating engagements using tactically relevant, realistically complex, scenarios involving a variety of manned, unmanned, aerial, and ground-based assets. The trainee is the only human player in the game and he must collaborate, from his web-based mock operating station, with various non-human players via spoken natural language over simulated radio in order to execute the training missions successfully. Non-human players are modeled in two complementary layers - OneSAF provides basic background behaviors for entities while NVTT provides higher level models that control entity actions based on intent extracted from the trainee’s spoken natural dialog with game entities. Dialog structure is modeled based on Army standards for communication and verbal protocols. This paper presents an architecture that integrates the U.S. Army’s Night Vision Image Generator (NVIG), One Semi- Automated Forces (OneSAF), a flight dynamics model, as well as Commercial Off The Shelf (COTS) speech recognition and text to speech products to effect an environment with sufficient entity counts and fidelity to enable meaningful teaching and reinforcement of critical communication skills. It further demonstrates the model dynamics and synchronization mechanisms employed to execute purpose-built training scenarios, and to achieve ad-hoc collaboration on-the-fly between human and non-human players in the simulated environment.