As the military increasingly adopts semi-autonomous unmanned systems for military operations, utilizing redundant and intuitive interfaces for communication between Soldiers and robots is vital to mission success. Currently, Soldiers use a common lexicon to verbally and visually communicate maneuvers between teammates. In order for robots to be seamlessly integrated within mixed-initiative teams, they must be able to understand this lexicon. Recent innovations in gaming platforms have led to advancements in speech and gesture recognition technologies, but the reliability of these technologies for enabling communication in human robot teaming is unclear. The purpose for the present study is to investigate the performance of Commercial-Off-The-Shelf (COTS) speech and gesture recognition tools in classifying a Squad Level Vocabulary (SLV) for a spatial navigation reconnaissance and surveillance task. The SLV for this study was based on findings from a survey conducted with Soldiers at Fort Benning, GA. The items of the survey focused on the communication between the Soldier and the robot, specifically in regards to verbally instructing them to execute reconnaissance and surveillance tasks. Resulting commands, identified from the survey, were then converted to equivalent arm and hand gestures, leveraging existing visual signals (e.g. U.S. Army Field Manual for Visual Signaling). A study was then run to test the ability of commercially available automated speech recognition technologies and a gesture recognition glove to classify these commands in a simulated intelligence, surveillance, and reconnaissance task. This paper presents classification accuracy of these devices for both speech and gesture modalities independently.